
Showing posts from March, 2018

Clustering subjects based on multiple measurements (SAS)

Objective: Individuals are measured for several characteristics, and we want to group the individuals based on their similarity across all characteristics. Statistical options include cluster analysis, principal components analysis (PCA), and biplots. PCA has the advantage of combining correlated variables into a few components, which makes it easier to explain why individuals cluster together. Biplots build on PCA by plotting the individuals and the variables on the same axes. This post compares these choices.

Example: 10 farms are measured for nutrients in grass, and the primary question is whether any farms have similar nutrient profiles, and if so, which ones. Create a random dataset with 4 nutrients measured on 4 pastures in each of the 10 farms.

Run the SAS code for producing biplots. Here we restrict the number of principal components to 2 (n=2); in general, you would use PCA to decide how many components are needed. PROC PRINQUAL requires every variable to be processed by a TRANSFORM statement; here we use the identity transformation, so the variables are unchanged. Id statement s
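The workflow in this post is written in SAS (PROC PRINQUAL). As a language-neutral illustration of what a PCA biplot is built from, here is a minimal Python/NumPy sketch. It assumes PCA on standardized variables (correlation scale) and a rank-2 SVD biplot. The random data, the function names `simulate`, `pca_biplot` and `farm_means`, and the scaling parameter `alpha` are hypothetical choices made for this sketch. They are not the post's SAS code, and they do not reproduce PRINQUAL's exact output.

```python
import numpy as np

# Hypothetical layout mirroring the example: 10 farms x 4 pastures, 4 nutrients.
N_FARMS, N_PASTURES, N_NUTRIENTS = 10, 4, 4


def simulate(seed=1):
    """Random nutrient data (made-up values, for illustration only)."""
    rng = np.random.default_rng(seed)
    farm_ids = np.repeat(np.arange(1, N_FARMS + 1), N_PASTURES)
    # Farm-level offsets so pastures on the same farm tend to look alike.
    farm_effect = rng.normal(0.0, 1.0, size=(N_FARMS, N_NUTRIENTS))
    noise = rng.normal(0.0, 0.5, size=(N_FARMS * N_PASTURES, N_NUTRIENTS))
    X = farm_effect[farm_ids - 1] + noise
    return farm_ids, X


def standardize(X):
    """Center each variable and scale it to unit standard deviation."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)


def pca_biplot(X, n_components=2, alpha=1.0):
    """PCA via SVD of the standardized data, returned as biplot coordinates.

    With Z = U S V^T, row markers are U S^alpha and variable arrows are
    V S^(1 - alpha). alpha=1 (an assumed default here) makes the row
    markers equal to the ordinary PC scores Z V.
    """
    Z = standardize(X)
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = s**2 / np.sum(s**2)  # share of total variance per component
    k = n_components
    rows = U[:, :k] * s[:k] ** alpha  # one point per pasture
    arrows = Vt[:k].T * s[:k] ** (1.0 - alpha)  # one arrow per nutrient
    return rows, arrows, explained[:k]


def farm_means(farm_ids, rows):
    """Average the pasture coordinates within each farm."""
    farms = np.unique(farm_ids)
    centers = np.array([rows[farm_ids == f].mean(axis=0) for f in farms])
    return farms, centers


farm_ids, X = simulate()
rows, arrows, explained = pca_biplot(X, n_components=2)
farms, centers = farm_means(farm_ids, rows)
for f, (pc1, pc2) in zip(farms, centers):
    print(f"farm {int(f):2d}: PC1={pc1:6.2f}  PC2={pc2:6.2f}")
print("variance explained by PC1, PC2:", np.round(explained, 2))
```

Farms whose averaged points sit close together on PC1 and PC2 have similar nutrient profiles. The nutrient arrows show which variables separate them. Keeping all four components reproduces the standardized data exactly, because the row and arrow coordinates multiply back to U S V^T.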