Research Statistics Central

Posts

Factorial ANOVA with control treatment not integrated into the factorial

Factorial treatment designs are popular, due to advantages of research on multiple treatment factors and how they interact. But if the design includes a control treatment that is not part of the factorial, problems occur in estimation of least squares means. A typical example is shown here, with 2 fertilizer and 3 irrigation treatments, giving 6 factorial treatment combinations, plus a control that is defined by a 3rd level of fertilizer, and a 4th level of irrigation: Fert1:Irrig2 Fert2:Irrig1 Fert1:Irrig1 Fert2:Irrig3 Fert1:Irrig3 Fert2:Irrig2 Control Other situations might have the control sharing a level of one of the factors, for example the control might be defined as Fert2:Irrig4. But this still causes problems with estim...

Nonlinear dummy regression

Objective : We are fitting nonlinear regression lines to data, but have multiple groups (treatments), each with its own line. Since groups are a factor of interest, particularly in how they change the lines (parameters of the model), we want to compare parameter estimates among the groups. First approach is to fit the nonlinear model to each group separately, then compare the parameter estimates using t-tests. The code below generates a random example dataset, with 8 replicates for each of 5 treatments, all measured over 12 days. Then Proc Nlmixed is used to fit the model explaining change in prate with water changes over the days, "by treat", and parameter estimates are output to data ppp. This ppp dataset is processed to collect the estimates and standard errors, and t-tests are calculated for all 5*(5-1)/2 comparisons. Code will need to be customized for new data, including number and values of treatments, degrees of freedom, and parameter names...

Can I look at reported standard errors (SE) and decide if means differ?

No guarantees, but roughly if means differ by 3*SE then they are statistically significant. This is based on the Least Significant Difference, which is 2*sqrt(2)*SE. Often people use non-overlapping confidence intervals as a decision rule, but this is equivalent to 4*SE, which is a bit conservative. Things that make 3*SE fail: 1) Actually statistical differences depend on the standard error of difference, SED, not SE. Anything in the model that makes these differ will make the rule fail, such as covariates and blocking factors. 2) In general, mixed models with random effects will make the rule fail, because random variance is included in SE, but not in SED. But this will make 3*SE rule conservative, 3*SED will be even smaller. If 3*SE suggests a statistical difference, difference most likely exists. Also take a look at Error Bars paper .

UTF character data, encoding of text

Objective and Background : You have text data that is UTF encoded and need SAS/R to read and write datasets with that encoding. If you have ever printed or viewed text information, and seen something like Giuffr?Ÿ’e?ƒe?Ÿƒ?ÿ?›ƒ?ªƒ?›?Ÿ’e›ƒ?ª?Ÿƒeee, then you are running into this encoding issue. Computers store text using numbers, with each number assigned to a particular character. See https://en.wikipedia.org/wiki/ASCII to find that the character & is stored as 38 when using the ASCII encoding. Unicode is popular internationally because it encodes special characters such as accented letters, and UTF-8 is a widely used version ( https://en.wikipedia.org/wiki/UTF-8 ). In UTF-8 the & character is stored as 26, and you can imagine how the jumbled example above arises from the confusion of what letters are being stored. Solution 1 : Use options to request that individual datasets be read and written in a particular encodin...

Reporting results from transformed analyses

Objective : Transformed data, for example log(y), is analyzed to correct normality or equal variance requirements. But we want to report means and standard errors in the original units. SAS example : data one; do treat=1 to 3; do rep=1 to 5; y=10 + treat+ exp(rannor(111)); logy=log(y); output; end;end; run; proc mixed plots=all; class treat; model y=treat; lsmeans treat/pdiff; run; proc mixed plots=all; class treat; model logy=treat; lsmeans treat/pdiff; run; The original data, variable y, might have units of pounds. If a transformation is needed, we simply calculate a new variable by applying a mathematical function known to improve normality or equal variance, and run the same analysis on the new variable. Commonly used choices are listed in the second table below. However, looking at the results for both analyses we see treat Mean Y S...

Getting higher quality default graphs in SAS

Objective : I am running a statistical analysis in SAS, and the default ODS graphics look good, but I need them to be publication quality. SAS can automatically create some nice graphs, and has greatly increased the availability of graphs within procedures. If you like what you see, you might copy graphs directly from the SAS output window, or possibly you save graphs and output to a pdf or other external file format. But this output will be low quality, generally 75 dpi. Instead, add the following statements to write graphics directly to files, allowing control of format and quality. ods graphics on / width=7in imagefmt=tiff imagemap=off imagename="MyPlot" border=off; ods listing file="Body.rtf" style=journal gpath="." dpi=600; Once these statements have been submitted, all graphs created by subsequent procedures will be written to files na...