Skip to main content

Nonlinear dummy regression

Objective:  We are fitting nonlinear regression lines to data, but have multiple groups (treatments), each with its own line.  Since groups are a factor of interest, particularly in how they change the lines (parameters of the model), we want to compare parameter estimates among the groups.

First approach is to fit the nonlinear model to each group separately, then compare the parameter estimates using t-tests.  The code below generates a random example dataset, with 8 replicates for each of 5 treatments, all measured over 12 days.  Then Proc Nlmixed is used to fit the model explaining change in prate with water changes over the days, "by treat", and parameter estimates are output to data ppp.  This ppp dataset is processed to collect the estimates and standard errors, and t-tests are calculated for all 5*(5-1)/2 comparisons.  Code will need to be customized for new data, including number and values of treatments, degrees of freedom, and parameter names.  Proc NLIN could be used, but Nlmixed allows addition of random effects to the model if needed.  Note that df= must be specified for Proc Nlmixed, otherwise it simply uses the number of observations, not subtracting off the number of parameters estimated.


data one; a=15; b=.05; water0=0.11; drop a b water0; do treat=1 to 5; do day=1 to 12; do rep=1 to 8; water=(28-2*day-treat +ranuni(33))/30; prate=a/(1+exp(-(water-water0)/b)) + 0.2*rannor(22); output; end;end;end; run; proc sgplot data=one; scatter y=prate x=day/group=treat; run; ***Method 1, run separate analyses and calculate t-tests; %let ngrp=5; ***set the number of treatments; proc nlmixed data=one df=92; by treat; parms aa=15,bb=.05,water0=.2; model prate~normal(aa/(1+exp( -(water-water0)/bb)),sigma); ods output ParameterEstimates=ppp; run; data dottest; set ppp end=atlast; retain aa1-aa&ngrp bb1-bb&ngrp ww1-ww&ngrp aase1-aase&ngrp bbse1-bbse&ngrp
wwse1-wwse&ngrp; array aa aa1-aa&ngrp; array aase aase1-aase&ngrp; array bb bb1-bb&ngrp; array bbse bbse1-bbse&ngrp; array ww ww1-ww&ngrp; array wwse wwse1-wwse&ngrp; ***set the values of your treatments ; array ttt $ tv1-tv&ngrp; tv1='1'; tv2='2'; tv3='3'; tv4='4'; tv5='5';
do ii=1 to &ngrp; if treat=ttt[ii] then do; if parameter='aa' then do; aa[ii]=estimate; aase[ii]=StandardError; end; if parameter='bb' then do; bb[ii]=estimate; bbse[ii]=StandardError; end; if parameter='water0' then do; ww[ii]=estimate; wwse[ii]=StandardError; end; end;end; if atlast then do; do ii=1 to &ngrp-1; do jj=2 to &ngrp; comparison=compbl(ttt[ii]||' vs '||ttt[jj]); aa_tvalue=(aa[ii]-aa[jj])/sqrt(aase[ii]**2 + aase[jj]**2); aa_pvalue=(1-probt(abs(aa_tvalue),92))*2; ***2-tailed pvalue. Need DF; bb_tvalue=(bb[ii]-bb[jj])/sqrt(bbse[ii]**2 + bbse[jj]**2); bb_pvalue=(1-probt(abs(bb_tvalue),92))*2; ww_tvalue=(ww[ii]-ww[jj])/sqrt(wwse[ii]**2 + wwse[jj]**2); ww_pvalue=(1-probt(abs(ww_tvalue),92))*2; output; end;end; end; run; proc print; var comparison aa_tvalue aa_pvalue bb_tvalue bb_pvalue ww_tvalue ww_pvalue; run;

Second approach is to fit all parameters within one model, then use estimate statements to perform the t-tests.  Advantages are this pools the data, increasing degrees of freedom and statistical power.  Implicit assumption is the error variance (sigma) is common to all groups.  If you do not want to make that assumption, then sigma1-sigma5 must be defined.  A difference, good or bad, between the two approaches is the first enforces independence among groups, while the second may estimate correlations between group parameter estimates, which could affect comparisons.  A disadvantage of this second approach is typing in all the estimate statements, only one comparison of the 30 performed by the first approach is shown (a1-a2).


proc nlmixed data=one df=464; parms aa1=15,aa2=15,aa3=15,aa4=15,aa5=15,bb1=.05,bb2=.05,bb3=.05,bb4=.05,bb5=.05, water01=.2,water02=.2,water03=.2,water04=.2,water05=.2,; if treat=1 then mean=aa1/(1+exp( -(water-water01)/bb1)); if treat=2 then mean=aa2/(1+exp( -(water-water02)/bb2)); if treat=3 then mean=aa3/(1+exp( -(water-water03)/bb3)); if treat=4 then mean=aa4/(1+exp( -(water-water04)/bb4)); if treat=5 then mean=aa5/(1+exp( -(water-water05)/bb5)); model prate~normal(mean,sigma); estimate 'aa1 vs aa2' aa1-aa2; run;
For the one example, results are almost identical.
You might also be interested in a SAS example that uses a full versus reduced modeling approach to compare parameters.  The basic idea is to fit two models, one with the same parameters for groups, one with different parameters, then compare the models using residual error, or likelihood, or AIC to decide which model fits the data better.  This approach would be good for deciding if a common sigma is appropriate, as assumed in the second approach.  But it would be very cumbersome for testing specific comparisons between pairs of parameters.
As always, when making multiple comparisons as above, a warning about adjusting the tests for multiplicity must be made.  Bonferroni or other adjustments are recommended in order to control the Type I error rate.

Comments

Popular posts from this blog

DANDA - A macro collection for easier SAS statistical analysis

Objective :  You are running ANOVAs or regressions in SAS, and wish there was a way to avoid writing the dozens of commands needed to conduct the analysis and generate recommended diagnostics and summary of results, not to mention the hundreds of possible options that might be needed to access recommended methods.  A possible solution is to download a copy of danda.sas below, and use this macro collection to run the dozens of commands with one statement.  We will also have future posts covering various uses of danda.sas, giving examples as always. danda.sas is under continued development, check this page for updates. Date                       Version               Link 2021/03/15             2.12.030          danda.sas 2021/03/15             2.12                UserManual.pdf     2012/08/30                 2.11                danda211.sas Example :  You have an RBD split-plot design, so typical SAS code for mixed model ANOVA is proc mixed data=one;   class block treat week;   m

UTF character data, encoding of text

Objective and Background :  You have text data that is UTF encoded and need SAS/R to read and write datasets with that encoding.  If you have ever printed or viewed text information, and seen something like Giuffr?Ÿ’e?ƒe?Ÿƒ?ÿ?›ƒ?ªƒ?›?Ÿ’e›ƒ?ª­?Ÿƒeee, then you are running into this encoding issue.  Computers store text using numbers, with each number assigned to a particular character.  See  https://en.wikipedia.org/wiki/ASCII  to find that the character & is stored as 38 when using the ASCII encoding.  Unicode is popular internationally because it encodes special characters such as accented letters, and UTF-8 is a widely used version ( https://en.wikipedia.org/wiki/UTF-8 ).  In UTF-8 the & character is stored as 26, and you can imagine how the jumbled example above arises from the confusion of what letters are being stored. Solution 1 :  Use options to request that individual datasets be read and written in a particular encoding.  In SAS, specify encoding options on the vario

Obtain coefficients for orthogonal polynomial contrasts (SAS and R)

Objective : We are comparing means using ANOVA, and our treatment levels are amounts of something.  Thus regression hypotheses may shed light on how the treatments differ, for example is there an overall linear trend for the response variable to increase or decrease with treatment level.  This is addressed by adding orthogonal polynomial contrasts to our ANOVA, which may require that we add contrast coefficients. Example :  Treatments are amounts of corn in the diet, specifically 62%, 65%, 68%, 71% and 74%. SAS :  IML product has an orthogonal polynomial calculator.  Additional code here attempts to make the coefficients whole numbers by dividing by the smallest non-zero number.  Note IML may not be available, depending on your license. proc iml; trtlevels={0.62, 0.65,0.68,0.71,0.74}; **this is only user input; ntrt=nrow(trtlevels); coeff=orpol(trtlevels); coeff = coeff[,2:ntrt]; div=abs(coeff); zerloc=loc(div<1e-14); if nrow(zerloc)>0 then div[zer