
Reporting results from transformed analyses

Objective:  Transformed data, for example log(y), are analyzed to satisfy normality or equal variance requirements, but we want to report means and standard errors in the original units.

SAS example:
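data one;
 /* simulate 3 treatments x 5 reps with positively skewed errors */
 do treat=1 to 3;
  do rep=1 to 5;
   y=10 + treat + exp(rannor(111));
   logy=log(y);   /* natural log transformation */
   output;
  end;
 end;
run;
/* analysis of the untransformed response */
proc mixed data=one plots=all;
  class treat;
  model y=treat;
  lsmeans treat/pdiff;
run;
/* same analysis on the log-transformed response */
proc mixed data=one plots=all;
  class treat;
  model logy=treat;
  lsmeans treat/pdiff;
run;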

The original data, variable y, might have units of pounds.  If a transformation is needed, we simply calculate a new variable by applying a mathematical function known to improve normality or equal variance, and run the same analysis on the new variable.  Commonly used choices are listed in the second table below.
However, looking at the results for both analyses we see:

| treat | Mean Y | SE Y | Mean logY | SE logY | BT Mean | BT SE |
|---|---|---|---|---|---|---|
| 1 | 12.55 | 0.771 | 2.52 | 0.054 | 12.43 | 0.67 |
The mean and standard error for logY look completely "wrong" and do not match the data, because they have units of "log pounds", which is not very useful for scientific interpretation.  There are two common remedies:
1) report means and standard errors from the untransformed analysis, but use statistical test p-values from the transformed analysis (normality and equal variance are primarily needed to make p-values accurate).
2) back-transform the transformed results to give them the original units.  We need to calculate the BT values in the above table.
Both choices are in common use and both are statistically acceptable, but always clearly state which one you used, as it does make a difference in the results.
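Remedy 1 can be sketched with ODS OUTPUT, which saves a procedure's printed tables as datasets. This is only a minimal sketch, assuming the dataset one from the SAS example above; the output dataset names lsm_y and diff_logy are illustrative and not part of the original example.

```sas
/* Means and SEs in original units from the untransformed analysis */
ods output lsmeans=lsm_y;          /* lsm_y is an illustrative name */
proc mixed data=one;
  class treat;
  model y=treat;
  lsmeans treat;
run;

/* Pairwise p-values from the log-transformed analysis */
ods output diffs=diff_logy;        /* diff_logy is an illustrative name */
proc mixed data=one;
  class treat;
  model logy=treat;
  lsmeans treat/pdiff;
run;

/* Report the two pieces side by side */
proc print data=lsm_y noobs;
  var treat estimate stderr;
run;
proc print data=diff_logy noobs;
  var treat _treat probt;
run;
```

When reporting this way, state that the means and SEs come from the untransformed analysis and the p-values from the log analysis.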
Back-transformation of the mean is straightforward: we simply apply the inverse function.  For example exp(log(5))=5, so the exp function will "undo" the log transformation.  In the above table, BT Mean=exp(Mean logY)=exp(2.52)=12.43, reasonably close to the untransformed mean.  Back-transformed means from a log transformation will never be larger than the original means: the result is the geometric mean, which corrects for the positive skew and gives a value that is more like the median.
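A quick way to see this on the example data is to compute both quantities per treatment directly. The sketch below assumes the dataset one created above; the column names mean_y and bt_mean are illustrative. In this balanced one-way design the raw treatment means match the lsmeans, so the comparison mirrors the table.

```sas
proc sql;
  select treat,
         mean(y)         as mean_y,   /* arithmetic mean, original units */
         exp(mean(logy)) as bt_mean   /* back-transformed mean of log(y) */
  from one
  group by treat;
quit;
```

For every treatment, bt_mean should come out no larger than mean_y.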
Back-transformation for standard errors is less intuitive.  By the delta method, the variance of a function of Y is approximately the variance of Y times the square of the first derivative of the function, evaluated at the mean of Y (see Wikipedia or other statistical theory sources), so the standard error is multiplied by the absolute value of that derivative.  The following table lists common transformations, their purpose, and back-transformation formulas.  Using the table, we calculate the example BT SE as 0.054*12.43 = 0.67, again believably close to the untransformed SE.
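For the log transformation, the first-order (delta method) approximation can be written out as follows. The symbols are notation introduced here for illustration, not taken from the original post: m-hat is the mean on the log scale, g is the back-transformation, and tv is the constant added before taking the log (0 in the example).

```latex
\begin{align*}
\operatorname{Var}\bigl[g(\hat m)\bigr] &\approx \bigl[g'(\hat m)\bigr]^{2}\,\operatorname{Var}(\hat m), \\
\operatorname{SE}\bigl[g(\hat m)\bigr] &\approx \bigl\lvert g'(\hat m)\bigr\rvert\,\operatorname{SE}(\hat m), \\
g(m) = e^{m} - tv \;&\Rightarrow\; g'(m) = e^{m} = \text{BT mean} + tv, \\
\text{BT SE} &\approx 0.054 \times 12.43 \approx 0.67 .
\end{align*}
```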

| Transform | Formula | Good for... | BT Mean | BT SE |
|---|---|---|---|---|
| Sqrt | sqrt(Y + tv) | Slight +Skew | mean**2 - tv | 2*SE*mean |
| Log | ln(Y + tv) | Strong +Skew | exp(mean) - tv | SE*(Btmean+tv) |
| Log10 | log10(Y + tv) | Strong +Skew | 10**mean - tv | SE*log(10)*(Btmean+tv) |
| Arcsinsqrt | arcsine(sqrt(Y/tv)) | Percentage Data | tv*sine(mean)**2 | SE*sqrt(1-Btmean/tv)*sqrt(Btmean/tv)*2*tv |
| Power | Y**tv | Anything else | mean**(1/tv) | SE*(1/abs(tv)) * mean**[(1-tv)/tv] |
| Rank | rank(Y) | Last resort | NA | NA |
Table notes:
a) tv is a constant used in the first 4 transformations to avoid illegal mathematical operations, such as log or sqrt of a negative number.
b) tv for the Power transformation can be any nonzero number, but usually is between -3 and 3; larger values are considered to produce too drastic a transformation of the data.  Note that sqrt is the special case tv=0.5, and the log transformation corresponds to the limiting case as tv approaches 0.  Negative values might work for negative skew, but there are no guarantees.
c) For some BT SEs, the BT mean is calculated first, then used in the BT SE formula.  Otherwise the mean and SE from the transformed analysis are used.
d) Log and Log10 have identical transformation properties; either can be used.
e) Linear transformations of the form a*Y+b do not help normality or equal variance, but if used can be back-transformed by Btmean=(mean-b)/a and BT SE=SE/abs(a).
f) Rank transformations cannot be back-transformed, so report means and SE from the untransformed analysis.  Rank is used if no other transformation can be found that corrects normality or equal variance issues.
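The Log row of the table can be applied by hand to the lsmeans from the transformed analysis. The following is a minimal sketch, assuming the dataset one from the SAS example; the dataset names lsm_log and bt and the variables tv, btmean and btse are illustrative choices, not part of the original example.

```sas
/* Save the lsmeans table from the log analysis as a dataset */
ods output lsmeans=lsm_log;
proc mixed data=one;
  class treat;
  model logy=treat;
  lsmeans treat;
run;

/* Apply the Log row of the table: BT Mean first, then BT SE */
data bt;
  set lsm_log;
  tv     = 0;                      /* constant added before taking the log */
  btmean = exp(estimate) - tv;     /* BT Mean = exp(mean) - tv */
  btse   = stderr*(btmean + tv);   /* BT SE = SE*(Btmean+tv) */
run;

proc print data=bt noobs;
  var treat estimate stderr btmean btse;
run;
```

Other rows of the table can be handled the same way by swapping the two formulas in the data step.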

DANDA.sas:
If using the macro collection, simply choose transtype=[name from first column] and transvalue=tv from second column in the above table.  For the SAS example code above, running
%include 'd:\danda.sas';
%mmaov(one, y, class=treat, fixed=treat, transtype=log, transvalue=0);
will produce all untransformed, transformed, and back-transformed results. 



