
Sample size to estimate a mean with given precision (SAS and R)

Objective:  We need to know how many observations to collect so that our estimate of the mean has a useful precision. For example, how many animals should be measured to have an 80% chance that the 95% confidence interval for weight will be no wider than 20 kg? In addition to those three numbers, we also need an estimate of the standard deviation. Suppose the most optimistic expectation is SD=20 kg, but we also want to see what changes if SD=40 kg.

SAS:  Run this code
proc power;
   onesamplemeans ci=t
      alpha = 0.05
      halfwidth = 10
      stddev = 20 40
      probwidth = 0.80
      ntotal = .;
run;
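data adjust;
 samplesize=22;       /* from PROC POWER with SD=20 */
 population=1200;     /* size of the finite population being sampled */
 /* finite population correction: n/(1 + (n-1)/N), rounded up */
 adjsamplesize=ceil(samplesize/(1 + ((samplesize-1)/population)));
run;
proc print data=adjust; run;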

The 95% confidence interval is requested by setting alpha=0.05.
The 80% chance that our experiment will satisfy our objectives is specified by probwidth=0.80.
The "no wider than 20" objective is addressed with halfwidth=10, assuming the 20 kg refers to the full width of the confidence interval, i.e. 10 kg on either side of the mean.
The two SDs we want to explore are listed after stddev=.
Ntotal is set to missing, as it is the unknown quantity to be calculated.

The results show that 22 observations are needed for SD=20 kg, but 73 would be needed if the SD turned out to be 40 kg.
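PROC POWER does not print the formula behind probwidth=, so here is a minimal R sketch of the same search. It assumes that the probability being controlled is the unconditional chance that the t-interval half-width, t × s/√n, comes out at or below the target. Because (n-1)s²/σ² follows a chi-square distribution with n-1 degrees of freedom, that chance is a chi-square probability, and the smallest n reaching probwidth can be found by stepping n upward. The function name n_for_precision is hypothetical and does not come from any package.

```r
# Hypothetical helper: smallest n for which the (1 - alpha) t-interval half-width
# is <= halfwidth with probability >= probwidth (assumed PROC POWER criterion)
n_for_precision <- function(halfwidth, sd, alpha = 0.05, probwidth = 0.80,
                            nmax = 100000) {
  for (n in 2:nmax) {
    df <- n - 1
    tcrit <- qt(1 - alpha / 2, df)
    # half-width = tcrit * s / sqrt(n) <= halfwidth  <=>
    # df * s^2 / sd^2 <= df * n * halfwidth^2 / (tcrit^2 * sd^2),
    # and df * s^2 / sd^2 has a chi-square distribution with df degrees of freedom
    p <- pchisq(df * n * halfwidth^2 / (tcrit^2 * sd^2), df)
    if (p >= probwidth) return(n)
  }
  NA_integer_  # target not reachable within nmax observations
}

# Same inputs as the PROC POWER run: halfwidth 10 kg, SD 20 or 40 kg
sapply(c(20, 40), function(s) n_for_precision(halfwidth = 10, sd = s))  # 22 73
```

Under this assumption the sketch reproduces the 22 and 73 from SAS. That makes it a possible stand-in for R users, who otherwise lack a probwidth option.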
The data step after the PROC POWER code applies an adjustment for finite population size. This adjustment can be used when the population being sampled is small. Here we might be dealing with a rare species with only 1200 animals still alive. The adjustment makes no difference: 22/(1 + 21/1200) = 21.6, which still rounds up to 22. The population must be quite small before it affects sample size requirements.
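The finite population adjustment from the data step is easy to explore for other population sizes. In this sketch fpc_adjust is a hypothetical helper. The population sizes 200 and 50 are made-up values, chosen only to show where the adjustment starts to matter.

```r
# Hypothetical helper: finite population adjustment from the SAS data step,
# n_adj = n / (1 + (n - 1) / N), rounded up to a whole animal
fpc_adjust <- function(n, N) ceiling(n / (1 + (n - 1) / N))

# 1200 is the rare-species example; 200 and 50 are made-up comparisons
sapply(c(1200, 200, 50), function(N) fpc_adjust(22, N))  # 22 20 16
```

With N=200 the requirement drops only from 22 to 20. The requirement falls substantially, to 16, only when N=50.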

R:  There are several power and sample size packages in R, but none appears to offer the option of requiring an 80% chance that the experiment will meet its objective.
For example, the code below returns a sample size of 16, smaller than the 22 from SAS. It is smaller because it ignores the fact that the sample SD, and therefore the width of the confidence interval, varies from one experiment to the next. With this sample size we cannot be 80% sure that our particular experiment will succeed.
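library(samplingbook)   # install.packages("samplingbook") if not yet installed
# half-width e = 10 kg, SD S = 20 kg, infinite population
sample.size.mean(10, S=20, N = Inf, level = 0.95)
# same, with a finite population of N = 1200
sample.size.mean(10, S=20, N = 1200, level = 0.95)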
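The 16 is consistent with the familiar normal-approximation formula n = (z × S / e)², rounded up. This formula treats the SD as known, so it says nothing about the chance that a particular experiment achieves the target width. Below is a minimal sketch; n_simple is a hypothetical name. That samplingbook uses exactly this formula internally is an assumption, though the formula gives the same 16.

```r
# Hypothetical helper: normal-approximation sample size, SD treated as known
# n = (z * S / e)^2 rounded up, where e is the desired half-width
n_simple <- function(e, S, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)  # 1.96 for level = 0.95
  ceiling((z * S / e)^2)
}

n_simple(10, 20)  # 16
n_simple(10, 40)  # 62, versus 73 from PROC POWER with probwidth = 0.80
```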

Where do I get SD?  Values may be available in publications, or you may have preliminary data from which to calculate SD. If not, useful guesses are SD=0.2*mean to SD=0.4*mean, based on coefficients of variation commonly observed for biological data. The last resort is to take the expected range of observations and divide by 4. This works because about 95% of normally distributed data fall within plus or minus 2 SD of the mean, a span of 4 SD.
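The two rules of thumb are simple arithmetic. The inputs here are hypothetical: an expected mean weight of 100 kg, and observations expected to fall between 20 and 100 kg.

```r
# Hypothetical inputs: expected mean weight 100 kg, values expected in 20-100 kg
sd_from_cv <- function(mean, cv = c(0.2, 0.4)) cv * mean  # CV rule of thumb
sd_from_range <- function(low, high) (high - low) / 4     # range / 4 rule

sd_from_cv(100)         # 20 40
sd_from_range(20, 100)  # 20
```

Either guess can then be passed as stddev= in PROC POWER.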
