##### ANOVA vs. ANCOVA vs. Interactions
#####   working with gravid data.
##### Source: started with my hw3-stats file, but then fiddled.

### read it in, take a look at it.

> gravid.data=read.table("/Users/burnett/MMB/MiscResearch/Empirical-S+howto/Example-st515Assigns/DataForAssigns/gravid.txt",header=TRUE)
> attach(gravid.data)
> names(gravid.data)
[1] "cohort" "size"   "nembr" 
> boxplot(size~cohort)

###
### Look at the data: this plot gives the best picture of what's happening:
### nembr seems to differ by cohort, but also seems related to size
### (2 parallel lines).

> plot(size,nembr,type="n"); text(size,nembr,labels=cohort)

## Here's where the lines came from:  

> gravid.regression=lm(nembr~size)
> abline(gravid.regression)
> gravid.regression=lm(nembr~cohort)
> abline(gravid.regression)

### Start with just the simple model.  (We've seen this before.)
### Model1 cohort: nembr ~ cohort.

> cohort.f=factor(cohort)
> aov(nembr~cohort.f)
Call:
   aov(formula = nembr ~ cohort.f)

Terms:
                 cohort.f Residuals
Sum of Squares   468.0863 1979.1348
Deg. of Freedom         1        93

Residual standard error: 4.613135 
Estimated effects may be unbalanced

> summary(aov(nembr~cohort.f))
            Df  Sum Sq Mean Sq F value    Pr(>F)    
cohort.f     1  468.09  468.09  21.995 9.353e-06 ***
Residuals   93 1979.13   21.28                      
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1 


### ----------------------------------------------------
### Model 2: nembr ~ (cohort and size). Main effects only.
### This is the ANCOVA model.

> gravid.lm.model2 = lm(nembr ~ cohort.f + size)
> summary(gravid.lm.model2)

Call:
lm(formula = nembr ~ cohort.f + size)

Residuals:
      Min        1Q    Median        3Q       Max 
-11.81823  -1.99596   0.03492   2.27726  17.60081 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  -9.1307     5.6970  -1.603   0.1124     ##<- ignore
cohort.f2     2.1240     0.9668   2.197   0.0305 *   ##<- ignore
size          6.4794     1.3098   4.947 3.39e-06 *** ##<- ignore
---
Signif. codes:  0 Ô***Ő 0.001 Ô**Ő 0.01 Ô*Ő 0.05 Ô.Ő 0.1 Ô Ő 1 

Residual standard error: 4.122 on 92 degrees of freedom
Multiple R-squared: 0.3612,	Adjusted R-squared: 0.3473 
F-statistic: 26.01 on 2 and 92 DF,  p-value: 1.115e-09 	
	##this p-value we care about (tells if we trust the model)


> summary(aov(gravid.lm.model2))
            Df  Sum Sq Mean Sq F value    Pr(>F)    
cohort.f     1  468.09  468.09  27.547 9.792e-07 ***
size         1  415.82  415.82  24.471 3.388e-06 ***
Residuals   92 1563.31   16.99                      
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1 

-----------------------------------------------------------------------
###
### Model 3: nembr ~ (cohort and size). WITH interaction.

> gravid.lm.model3 = lm(nembr ~ cohort.f + size + (cohort.f*size))
> summary(gravid.lm.model3)

> gravid.lm.model3 = lm(nembr ~ cohort.f + size + (cohort.f*size))
> summary(gravid.lm.model3)

Call:
lm(formula = nembr ~ cohort.f + size + (cohort.f * size))

Residuals:
    Min      1Q  Median      3Q     Max 
-9.5412 -2.4557  0.2548  2.1780 14.9670 

Coefficients:
               Estimate Std. Error t value Pr(>|t|)    
(Intercept)      10.507      7.727   1.360 0.177275      ##<- ignore
cohort.f2       -37.168     11.152  -3.333 0.001245 **   ##<- ignore
size              1.939      1.782   1.088 0.279359      ##<- ignore
cohort.f2:size    8.738      2.472   3.535 0.000643 ***  ##<- ignore
---
Signif. codes:  0 Ô***Ő 0.001 Ô**Ő 0.01 Ô*Ő 0.05 Ô.Ő 0.1 Ô Ő 1 

Residual standard error: 3.887 on 91 degrees of freedom
Multiple R-squared: 0.4383,	Adjusted R-squared: 0.4198 
F-statistic: 23.67 on 3 and 91 DF,  p-value: 2.059e-11 

> summary(aov(gravid.lm.model3))
              Df  Sum Sq Mean Sq F value    Pr(>F)    
cohort.f       1  468.09  468.09  30.989 2.599e-07 ***
size           1  415.82  415.82  27.529 1.003e-06 ***
cohort.f:size  1  188.75  188.75  12.496 0.0006434 ***
Residuals     91 1374.56   15.11                      
---
Signif. codes:  0 Ô***Ő 0.001 Ô**Ő 0.01 Ô*Ő 0.05 Ô.Ő 0.1 Ô Ő 1 


### -------------------------------------------------
### which of the above is the best model?

## Answer: Looks like Model 3 is best: everything came out
##  significant, which seems to say that it's all highly relevant,
##  ie that there are different means for each of these factors
##  that we need to account for separately.  (Using the most
##  complicated model that shows signif, which turns out to be what we did here,
##  is always the safest choice too.

## Note: in some cases you have to run an extra F-test to find out the
##  answer to this question.  (To find out why, take stats-515.)