Skip to main content

Obtain coefficients for orthogonal polynomial contrasts (SAS and R)

Objective: We are comparing means using ANOVA, and our treatment levels are amounts of something.  Thus regression hypotheses may shed light on how the treatments differ, for example is there an overall linear trend for the response variable to increase or decrease with treatment level.  This is addressed by adding orthogonal polynomial contrasts to our ANOVA, which may require that we add contrast coefficients.

Example:  Treatments are amounts of corn in the diet, specifically 62%, 65%, 68%, 71% and 74%.

SAS:  IML product has an orthogonal polynomial calculator.  Additional code here attempts to make the coefficients whole numbers by dividing by the smallest non-zero number.  Note IML may not be available, depending on your license.

 proc iml;  
  trtlevels={0.62, 0.65,0.68,0.71,0.74}; **this is only user input;  
  ntrt=nrow(trtlevels);  
  coeff=orpol(trtlevels);  
  coeff = coeff[,2:ntrt];  
  div=abs(coeff); zerloc=loc(div<1e-14);   
  if nrow(zerloc)>0 then div[zerloc]=1e10; div=div[><,];  
  coeff=round(t(coeff/div),1e-9);  
  clabels={"linear","quadratic","cubic","quartic","quintic","6th","7th","8th", "9th","10th","11th","12th"};  
  ulab=clabels[1:ntrt];  
  print coeff [rowname=ulab format=best14.9 ];  
 quit;  


Danda.sas:  If you use this collection of SAS macros, then run
%include 'd:\danda.sas';
%orthpoly(0.62 0.65 0.68 0.71 0.74);
Output is shown here.  Advantages of the macro approach are the output is formatted into contrast statements, so can be copied directly into SAS ANOVA procedures (you will have to change the generic "Treat" to match your variable name).  And there are two checks at the bottom to test if the coefficients produce valid contrasts (coefficients sum to zero) which are orthogonal to each other.  Otherwise essentially the same SAS code above is inside the macro, so IML must be available.
The SAS System 
Orthogonal Polynomial Coefficients 

Contrast 'Linear' Treat 
  -2 -1 0 1 2 ; 

Contrast 'Quadratic' Treat 
  2 -1 -2 -1 2 ; 

Contrast 'Cubic' Treat 
  -1 2 0 -2 1 ; 

Contrast 'Quartic' Treat 
  1 -4 6 -4 1 ; 

Checks that contrast coefficients sum to zero 
0 
0 
0 
0 

This matrix should have zeros except on diagonal, if contrasts are orthogonal 
10 0 0 0 
0 14 0 0 
0 0 10 0 
0 0 0 70 
R:  Code essentially identical to SAS above is
x=c(0.62,0.65,0.68,0.71,0.74) ###user enters up to 12 treatments, no other changes degree= length(x)-1 label=c("linear","quadratic","cubic","quartic","quintic","6th","7th","8th", "9th","10th","11th","12th") uselabel=label[1:degree] coeff=matrix(t(poly(x,degree=degree)),nrow=degree,dimnames=list(uselabel,NULL)) ac=abs(coeff) print(round(coeff,digits=9),digits=9) zeroloc=(ac < 1e-14) ac[zeroloc]=1e20 div=apply(ac,1, min) div=matrix(1,degree, degree+1)*div print(round(coeff,digits=9),digits=9)

Notes:

  • Relative spacing of the treatments is all that matters, we could have entered the treatment levels as (62, 65, ...), (0, 3, 6...), or even (0, 1, 2...).
  • Unequal spacing is allowed, coefficients will adjust correctly.  However the values will generally not be whole numbers, be sure to retain about 8 decimal places in order to  accuracy.
  • Some ANOVA programs allow the user to just specify an option like contrasts=orthpoly, in which case the coefficients are created for you.  No need for this post.

Comments

Popular posts from this blog

Estimation of the Peak in Quadratic Regression

 Problem:   You are running a standard quadratic (polynomial) regression analysis, and are specifically interested in the X and Y values at the peak.  If you use standard regression software, typically there will be no option that allows the peak to be estimated, with standard errors. Example:   You are studying Growth as a function of Age.  Of particular interest is when maximum Growth occurs, and at what Age. SAS code to generate artificial data, and run the analysis is: data one; do Age=1 to 20; Growth=95 + 2.7*Age - .3*Age*Age + 5*rannor(22); end; proc nlin plots=fit; parms int=2 lin=1 quad=1; model Growth = int + lin*Age + quad*Age*Age; estimate 'Age at peak' -lin/(2*quad); estimate 'Growth at peak' int + lin*(-lin/(2*quad)) + quad*(-lin/(2*quad))*(-lin/(2*quad)); run; The standard quadratic regression model with intercept, linear and quadratic slopes, is coded into Proc NLIN which has the ability to estimate any fun...

Factorial ANOVA with control treatment not integrated into the factorial

Factorial treatment designs are popular, due to advantages of research on multiple treatment factors and how they interact.  But if the design includes a control treatment that is not part of the factorial, problems occur in estimation of least squares means.  A typical example is shown here, with 2 fertilizer and 3 irrigation treatments, giving 6 factorial treatment combinations, plus a control that is defined by a 3rd level of fertilizer, and a 4th level of irrigation: Fert1:Irrig2               Fert2:Irrig1             Fert1:Irrig1 Fert2:Irrig3               Fert1:Irrig3             Fert2:Irrig2           Control Other situations might have the control sharing a level of one of the factors, for example the control might be defined as Fert2:Irrig4.  But this still causes problems with estim...

UTF character data, encoding of text

Objective and Background :  You have text data that is UTF encoded and need SAS/R to read and write datasets with that encoding.  If you have ever printed or viewed text information, and seen something like Giuffr?Ÿ’e?ƒe?Ÿƒ?ÿ?›ƒ?ªƒ?›?Ÿ’e›ƒ?ª­?Ÿƒeee, then you are running into this encoding issue.  Computers store text using numbers, with each number assigned to a particular character.  See  https://en.wikipedia.org/wiki/ASCII  to find that the character & is stored as 38 when using the ASCII encoding.  Unicode is popular internationally because it encodes special characters such as accented letters, and UTF-8 is a widely used version ( https://en.wikipedia.org/wiki/UTF-8 ).  In UTF-8 the & character is stored as 26, and you can imagine how the jumbled example above arises from the confusion of what letters are being stored. Solution 1 :  Use options to request that individual datasets be read and written in a particular encodin...