Skip to main content

Factorial ANOVA with control treatment not integrated into the factorial

Factorial treatment designs are popular, due to advantages of research on multiple treatment factors and how they interact.  But if the design includes a control treatment that is not part of the factorial, problems occur in estimation of least squares means.  A typical example is shown here, with 2 fertilizer and 3 irrigation treatments, giving 6 factorial treatment combinations, plus a control that is defined by a 3rd level of fertilizer, and a 4th level of irrigation:
Fert1:Irrig2               Fert2:Irrig1             Fert1:Irrig1
Fert2:Irrig3               Fert1:Irrig3             Fert2:Irrig2           Control
Other situations might have the control sharing a level of one of the factors, for example the control might be defined as Fert2:Irrig4.  But this still causes problems with estimation of least squares means due to the levels of one factor not occurring with all levels of the other factor.

Let's jump into a SAS example, but using random numbers to allow easy creation of the example data.  This code creates 13 treatments, a 4 by 3 factorial plus a control,  each tested in 3 blocks of a randomized block design.
data fake;
 do block=1,2,3;
 treatment='Control'; timing='control';  y=100 + 20*block + 10*rannor(46390); output;
 do treatment='A','B','C','D';
 do timing='early','mid','late';
  y=100 + 20*block + 10*rannor(46390);
  output;
 end;end;end;
run;

If we run a standard mixed model ANOVA, you will see that all of the main effect means are missing.  The control treatment creates a level that all of the other levels are missing, thus the means are undefined.
proc mixed data=fake;
 class block treatment timing;
 model y = treatment | timing;
 random block;
 lsmeans treatment | timing ;
run;

An academic solution is to change the model to
 model y = treatment * timing;
which has the effect of analyzing just the 13 treatments, and then using contrast and estimate statements to extract the desired information.  But imagine the difficulty, as there will be 3 contrasts to recreate the main effect and interaction tests of the ANOVA table, 7 estimates to get the main effect means,  then 10+6 contrasts to compare the main effect means to each other if mean separation information is needed.
A simpler solution is to use the bylevel option, which requests that least squares means only average over the levels that exist.  The code is
proc mixed data=fake;
 class block treatment timing;
 model y = treatment | timing;
 random block;
 lsmeans treatment | timing / bylevel ;
run;
and you now see all means being estimated, and the ANOVA has the expected factorial DF.  We can verify that the means are averaged over the expected treatments by requesting raw means, and find them identical. 
proc means data=fake;  ways 1 2;
 class treatment timing;
 var y;
run;
BE VERY CAREFUL with the bylevel option, as it happily averages over whatever data exists, and can produce results that are not easily interpretable (for example means can be biased by including or not including comparable levels).  In this particular setting, the control treatment is excluded from the factorial means, but included in the list of means so mean separation can be conducted as usual, providing a perfect solution to our problem.

However, suppose the control treatment does share a level of the factorial.  We can create this situation by taking the data above and redefining the control, then rerunning the analysis.
data fake2; set fake;
 if treatment='Control' then timing='early';
run;
proc mixed data=fake2;
 class block treatment timing;
 model y = treatment | timing;
 random block;
 lsmeans treatment | timing /bylevel;
run;
Now you will see that the DF for treatment in the ANOVA table is incorrect (includes control differences), and the least squares mean for early timing is incorrect, biased by inclusion of the control.
Therefore this bylevel solution only works if the control is defined as distinct from the factorial, not sharing any of the factorial levels.

Finally, for those who use the %danda macro, the SAS code is
%include 'danda.sas';
%mmaov(fake,y,class=block treatment timing, fixed=treatment|timing, random=block, options=bylevel);


Comments

Popular posts from this blog

DANDA - A macro collection for easier SAS statistical analysis

Objective :  You are running ANOVAs or regressions in SAS, and wish there was a way to avoid writing the dozens of commands needed to conduct the analysis and generate recommended diagnostics and summary of results, not to mention the hundreds of possible options that might be needed to access recommended methods.  A possible solution is to download a copy of danda.sas below, and use this macro collection to run the dozens of commands with one statement.  We will also have future posts covering various uses of danda.sas, giving examples as always. danda.sas is under continued development, check this page for updates. Date                       Version               Link 2021/03/15             2.12.030          danda.sas 2021/03/15             2.12                UserManual.pdf     2012/08/30                 2.11                danda211.sas Example :  You have an RBD split-plot design, so typical SAS code for mixed model ANOVA is proc mixed data=one;   class block treat week;   m

UTF character data, encoding of text

Objective and Background :  You have text data that is UTF encoded and need SAS/R to read and write datasets with that encoding.  If you have ever printed or viewed text information, and seen something like Giuffr?Ÿ’e?ƒe?Ÿƒ?ÿ?›ƒ?ªƒ?›?Ÿ’e›ƒ?ª­?Ÿƒeee, then you are running into this encoding issue.  Computers store text using numbers, with each number assigned to a particular character.  See  https://en.wikipedia.org/wiki/ASCII  to find that the character & is stored as 38 when using the ASCII encoding.  Unicode is popular internationally because it encodes special characters such as accented letters, and UTF-8 is a widely used version ( https://en.wikipedia.org/wiki/UTF-8 ).  In UTF-8 the & character is stored as 26, and you can imagine how the jumbled example above arises from the confusion of what letters are being stored. Solution 1 :  Use options to request that individual datasets be read and written in a particular encoding.  In SAS, specify encoding options on the vario

Obtain coefficients for orthogonal polynomial contrasts (SAS and R)

Objective : We are comparing means using ANOVA, and our treatment levels are amounts of something.  Thus regression hypotheses may shed light on how the treatments differ, for example is there an overall linear trend for the response variable to increase or decrease with treatment level.  This is addressed by adding orthogonal polynomial contrasts to our ANOVA, which may require that we add contrast coefficients. Example :  Treatments are amounts of corn in the diet, specifically 62%, 65%, 68%, 71% and 74%. SAS :  IML product has an orthogonal polynomial calculator.  Additional code here attempts to make the coefficients whole numbers by dividing by the smallest non-zero number.  Note IML may not be available, depending on your license. proc iml; trtlevels={0.62, 0.65,0.68,0.71,0.74}; **this is only user input; ntrt=nrow(trtlevels); coeff=orpol(trtlevels); coeff = coeff[,2:ntrt]; div=abs(coeff); zerloc=loc(div<1e-14); if nrow(zerloc)>0 then div[zer