Assessing Uses of the MIXED Procedure of SAS Software

Sponsoring Institution

National Institute of Food and Agriculture

Project Status

COMPLETE

Funding Source

HATCH

Reporting Frequency

Annual

Accession No.

0197709

Grant No.

(N/A)

Cumulative Award Amt.

(N/A)

Proposal No.

(N/A)

Multistate No.

(N/A)

Project Start Date

Oct 1, 2003

Project End Date

Sep 30, 2007

Grant Year

(N/A)

Program Code

[(N/A)]- (N/A)

Recipient Organization
UNIVERSITY OF ARKANSAS
(N/A)
FAYETTEVILLE,AR 72703

Performing Department
AGRICULTURAL STATISTICS LAB

Non Technical Summary
Linear mixed models are important in most areas of agricultural research. These models include random factors (1) that allow the researcher to increase the breadth of inference by conducting the same experiment in multiple environments, and (2) that are of interest in their own right, such as a population of varieties that is sampled for an experiment. The research will provide information that will assist the data analyst in choosing among alternatives when analyzing a mixed model.

Animal Health Component

90%

Research Effort Categories

Basic

10%

Applied

90%

Developmental

(N/A)

Classification

Knowledge Area (KA)	Subject of Investigation (SOI)	Field of Science (FOS)	Percent
901	7310	2090	100%

Knowledge Area
901 - Program and Project Design, and Statistics;

Subject Of Investigation
7310 - Experimental design and statistical methods;

Field Of Science
2090 - Statistics, econometrics, and biometrics;

Keywords

gene environment interaction

genotypes

monte carlo method

data analysis

Goals / Objectives
The research will carry out an in-depth study of the consequences of options in the MIXED procedure of the SAS/STAT software. Some of the objectives are the following: (1) To compare the degrees of freedom options as they affect the properties of statistical inferences. (2) To determine the probability of choosing the correct variance structure using the minimum AIC as the selection criterion. (3) To assess the consequences of estimation errors in the variance structure on the statistical properties of fixed effect estimators and to compare them to the properties reported in the MIXED output. An additional objective is to use MIXED to answer the sample size question for a particular mixed model: (4) To determine the number of environments needed to obtain reliable estimates of stability variances in genotype x environment studies.

Project Methods
The method for obtaining results will be Monte Carlo simulation. Data will be generated using the DATA step of SAS and the RANNOR function, which generates independent values from the standard normal distribution. These values will be transformed as needed to give the random effects in the mixed model the desired properties. An array of mixed models will be chosen to cover as many cases as possible, and 10,000 data sets will be created for each model and each set of parameter values. Each group of 10,000 data sets will be used for all of the first three objectives. The fourth objective will be accomplished by making its mixed model one of those used for the first three objectives.
Each data set will be analyzed by the MIXED procedure, and the results of the analysis will be stored in a data set. The 10,000 sets of results will become the data set for analysis to achieve the four objectives. Parameter sets will also be chosen to add breadth to the conclusions from the analysis. The primary limitation on the number of parameter sets will be the time required to generate and analyze the data. It is anticipated that many more results than those referred to in the objectives will be obtained, so that additional objectives may be identified and achieved by the same simulation.
One feature of using MIXED is that not every data set results in convergence. This has two consequences for the analysis of the results. First, the results are conditional on convergence. Second, the resulting sample size for this analysis will be smaller than 10,000. With 10,000 sets of results, the standard error of a probability estimate would be no larger than 0.005, which gives a margin of error of about 0.01 at 95% confidence. Losing results to non-convergence will therefore cause this margin of error to increase. The choice of 10,000 was made in the hope of still being able to obtain useful results.
An additional reason for using 10,000 is to obtain good estimates of the distributions of important statistics whose distributional properties will be investigated.
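The margin-of-error arithmetic above can be checked with a short calculation. The following is a minimal Python sketch, not part of the project's SAS code. It uses the worst case p = 0.5 for the standard error of a proportion; the function name and the reduced counts of 8,000 and 5,000 converged data sets are hypothetical, chosen only to show how the margin widens when results are lost to non-convergence.

```python
import math


def margin_of_error(n_converged, p=0.5, z=1.96):
    """Approximate 95% margin of error for a probability estimated
    from n_converged independent simulation results.

    p = 0.5 is the worst case, since p * (1 - p) is largest there.
    """
    if n_converged <= 0:
        raise ValueError("n_converged must be positive")
    standard_error = math.sqrt(p * (1.0 - p) / n_converged)
    return z * standard_error


# 10,000 is the planned number of data sets; the smaller counts are
# hypothetical examples of what might remain after non-convergence.
for n in (10_000, 8_000, 5_000):
    print(n, round(margin_of_error(n), 4))
```

With all 10,000 results the worst-case standard error is 0.005, so the margin is 1.96 x 0.005, or roughly 0.01 as stated above.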

Progress 10/01/03 to 09/30/07

Outputs
OUTPUTS: Stability analysis was carried out to compare variety means of traits and the relative stability of their responses in many environments. This statistical analysis makes use of mixed model methodology in which environments are random and varieties are fixed. The relative stability of a variety is modeled by a variance component for that variety's interaction effects in the environments. This analysis was applied to a data set of yields obtained in cotton variety tests in several states and to a data set including many traits of new apple varieties being evaluated in most of the apple growing areas in the U.S. and Canada. The use and interpretation of mixed models for analyzing variety stability and level of performance for traits of interest to plant breeders and crop producers were reported at the Beltwide Cotton Conference and to the regional meetings of apple researchers. A key feature of these reports was the use of a special graphical display to represent the relationship of stability to mean performance for a set of varieties.
PARTICIPANTS: R.W. McNew and Kevin C. Thompson
TARGET AUDIENCES: The specific target audiences are researchers who conduct variety tests. However, the broader audience would include any researcher whose objective is to compare responses among a fixed set of factor levels across a wide array of environmental conditions.
PROJECT MODIFICATIONS: None

Impacts
The use of stability analysis will provide the researchers that conduct the variety tests with information about both mean variety performance and the variety's stability. The researchers can communicate this to producers in order that they have more information for selecting appropriate varieties for their operations.

Publications

Miller, S., McNew, R., Crassweller, R., Greene, D., Hampson, C., Azarenko, A., Berkett, L., Cowgill, W., Garcia, E., Lindstrom, T., Stasiak, M., Cline, J., Fallahi, B., Fallahi, E., and Greene II, G. (2007) Fruit Quality Characteristics: Performance of Apple Cultivars in the 1999 NE-183 Regional Project Planting. Journal of the American Pomological Society, 61(2):97-114.
Hampson, C., McNew, R., Crassweller, R., Greene, D., Miller, S., Berkett, L., Garcia, M.E., Azarenko, A., Lindstrom, T., Stasiak, M., Cowgill, W., and Greene II, G. (2007) Fruit Sensory Characteristics: Performance of Apple Cultivars in the 1999 NE-183 Regional Project Planting. Journal of the American Pomological Society, 61(2):115-126.
Greene, D., Crassweller, R., Hampson, C., McNew, R., Miller, S., Azarenko, A., Barritt, B., Berkett, L., Brown, S., Clements, J., Cowgill, W., Cline, J., Embree, C., Fallahi, E., Fallahi, B., Garcia, E., Greene, G., Lindstrom, T., Merwin, I., Obermiller, J.D., Rosenberger, D., and Stasiak, M. (2007) Multidisciplinary Evaluation of New Apple Cultivars: the NE-183 Regional Project 1999 Planting. Journal of the American Pomological Society, 61(2):78-83.
Crassweller, R., McNew, R., Greene, D., Miller, S., Cline, J., Azarenko, A., Barritt, B., Berkett, L., Brown, S., Cowgill, W., Fallahi, E., Fallahi, B., Garcia, E., Hampson, C., Lindstrom, T., Merwin, I., Obermiller, J.D., Stasiak, M., and Greene II, G. (2007) Growth and Yield Characteristics: Performance of Apple Cultivars in the 1999 NE-183 Regional Project Planting. Journal of the American Pomological Society, 61(2):84-96.

Progress 01/01/06 to 12/31/06

Outputs
A typical practice in analyzing mixed linear models for which different variance structures are plausible a priori is to fit each of these structures and choose the one that gives the best fit based on the AIC. There do not seem to be any evaluations of the likelihood that this practice will identify the correct structure from among the candidates. A Monte Carlo study was initiated to address this issue for some simple models. For this study, the Monte Carlo simulation was implemented using a SAS program that produced 10,000 data sets and used the MIXED procedure to fit several variance structures for each data set. From these fits, the probability that the correct model would have the best fit was estimated. The standard error of this estimate is usually much smaller than 0.01. An example of the results obtained thus far in this study is the following: We compared the fit of a homogeneous variance structure to a heterogeneous variance structure for a one-way classification with three classes and equal sample sizes per class. When the equal-variance structure was correct, the probability of it being chosen increased from 0.82 to 0.86 as the sample size per class increased from 4 to 20. When the class variances were unequal, the same range of sample sizes resulted in probabilities of correct selection from 0.19 to 0.26. The implication from these preliminary results is that one should not place strong confidence in the correctness of a chosen structure. Many more cases will be investigated in order to assess the generality of this implication.
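The one-way comparison described above can be imitated outside SAS. The following is a minimal Python sketch under stated assumptions, not the project's program. It assumes the AIC is computed from the REML log-likelihood with a penalty of twice the number of variance parameters (1 for the homogeneous structure, one per class for the heterogeneous structure); for this simple model the REML variance estimates are the pooled and per-class sample variances, so no iterative fitting is needed. The function names, the seed, and any unequal standard deviations passed in are hypothetical, since the report does not state the variance ratios that were used.

```python
import math
import random


def sample_variance(values):
    """Unbiased sample variance (divisor n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)


def aic_prefers_homogeneous(groups):
    """Return True when the equal-variance structure has the smaller AIC.

    Assumption: AIC = -2 * (REML log-likelihood) + 2 * (number of
    variance parameters). Terms common to both structures are dropped.
    """
    dfs = [len(g) - 1 for g in groups]
    variances = [sample_variance(g) for g in groups]
    pooled = sum(d * v for d, v in zip(dfs, variances)) / sum(dfs)
    m2ll_hom = sum(dfs) * (math.log(pooled) + 1.0)
    m2ll_het = sum(d * (math.log(v) + 1.0) for d, v in zip(dfs, variances))
    aic_hom = m2ll_hom + 2 * 1
    aic_het = m2ll_het + 2 * len(groups)
    return aic_hom <= aic_het


def prob_correct_selection(std_devs, n_per_class, n_sets=10_000, seed=12345):
    """Estimate the probability that minimum AIC picks the true structure.

    Class means are set to zero because the variance estimates do not
    depend on them.
    """
    rng = random.Random(seed)
    truth_is_homogeneous = len(set(std_devs)) == 1
    hits = 0
    for _ in range(n_sets):
        groups = [[rng.gauss(0.0, sd) for _ in range(n_per_class)]
                  for sd in std_devs]
        if aic_prefers_homogeneous(groups) == truth_is_homogeneous:
            hits += 1
    return hits / n_sets


# Equal variances, three classes, 4 and 20 observations per class.
for n in (4, 20):
    print(n, prob_correct_selection([1.0, 1.0, 1.0], n))
```

Calling prob_correct_selection with unequal standard deviations, for example [1.0, 2.0, 3.0], gives the corresponding estimate when the heterogeneous structure is the correct one; the result depends strongly on the ratios chosen.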

Impacts
This will add to the data analyst's knowledge of the use of the MIXED procedure and will improve the quality of inferences made from the analyses of mixed models.

Publications

No publications reported this period

Progress 01/01/05 to 12/31/05

Outputs
The MIXED procedure of SAS statistical software is a valuable tool for analyzing mixed linear models. If a mixed model has large numbers of random factors and levels of those factors, the requirements of the analysis can exceed available memory or require very long computing times. The latter can be an inconvenience for the analyst, but the former prevents completion of the analysis. There is a technique in MIXED programming that can overcome this problem but that may not be widely recognized. This technique involves using the SUBJECT effect in the RANDOM statement. When applicable, this technique will greatly reduce memory requirements and also computing time. In order for this technique to be successful, the SUBJECT effect must be a common factor of all or most of the random effects; these random effects are then redefined with the SUBJECT effect factored out of them. The consequence of this is that smaller matrices are needed in memory during the analysis. As an example, MIXED was used to analyze data from a split-plot study in which the random main-plot factor with 5 levels was completely randomized to 15 main plots, the fixed subplot factor had 35 levels, and there were from 1 to 10 subsamples from each subplot. Using the main-plot factor as the SUBJECT effect reduced the computing time to 10% of the time required without a SUBJECT effect.
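The reason smaller matrices suffice can be seen from a rough count of matrix elements. The following Python sketch is only an illustration of block-diagonal storage, not a description of the internal algorithm of MIXED. It assumes, hypothetically, that each of the 5 levels of the main-plot factor contributes 525 observations (3 main plots x 35 subplot levels x 5 subsamples); the actual study had from 1 to 10 subsamples per subplot, so the true counts differ.

```python
def dense_elements(block_sizes):
    """Total elements held if each block is stored as a dense square matrix."""
    return sum(size * size for size in block_sizes)


# Hypothetical layout: 5 subjects (levels of the main-plot factor),
# each with 525 observations.
obs_per_subject = [525] * 5

# Without a SUBJECT effect the covariance matrix of the observations is
# treated as one block covering all observations.
without_subject = dense_elements([sum(obs_per_subject)])

# With a SUBJECT effect the matrix is block diagonal, one block per subject.
with_subject = dense_elements(obs_per_subject)

print(without_subject, with_subject, with_subject / without_subject)
```

Under these assumed counts the block-diagonal form holds one fifth as many elements, and each individual block is 25 times smaller than the single full matrix.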

Impacts
Recognition of the importance of the SUBJECT effect by mixed model analysts can lead to their being able to complete a mixed model analysis or to complete it more efficiently.

Publications

Crassweller, R., McNew, R., Azarenko, A., Barritt, B., Belding, R., Berkett, L., Brown, S., Clements, J., Cline, J., Cowgill, W. 2005. Performance of apple cultivars in the 1995 NE-183 regional project planting. I. Growth and yield characteristics. Journal of the American Pomological Society 59:18-27.
Miller, S., Hampson, C., McNew, R., Berkett, L., Brown, S., Clements, J., Crassweller, R., Garcia, E., Greene, D., Greene, G. 2005. Performance of apple cultivars in the 1995 NE-183 regional project planting: III. Fruit sensory characteristics. Journal of the American Pomological Society 59:28-43.
Miller, S.S., McNew, R. W., Barritt, B. H., Berkett, L., Brown, S. K., Cline, J. A., Clements, J. M., Cowgill, W. P., Crassweller, R. M., Garcia, M. E. 2005. Effect of cultivar and site on fruit quality as demonstrated by the NE-183 regional project on apple cultivars. HortTechnology 15:886-895.

Progress 01/01/04 to 12/30/04

Outputs
A variety test is a replicated field experiment for the purpose of comparing newly developed varieties of a crop. Yield is the primary trait of interest to producers who will use results of a variety test to choose a variety for planting. When the same variety test is conducted at multiple sites and in multiple years, the site-year combinations serve as environments and allow the evaluation of variety-environment interactions. Producers seek varieties that are stable over environments. One measure of the stability of a variety is the variance of its interaction effects across environments. A stable variety is one for which the variance of its interaction effects is zero. Yield data for a variety test in multiple environments can be analyzed by a mixed model in which variety is a fixed factor and environment is a random factor. The interaction effects are entered into the model as independent random effects with heterogeneous variances by variety; these are the stability variances of the varieties. This type of analysis was implemented for cotton lint yields using the MIXED procedure of SAS Statistical Software. For a data set with 30 environments, estimates of the stability variances had standard errors that were small enough to make the variance estimates useful in evaluating variety stability. On the other hand, when the analysis was conducted with only 15 of the environments, the standard errors were too large to give any credibility to the stability variance estimates. This data set will be used to create a model for use in a simulation study to investigate the effects of number of environments on the quality of inferences about the means and stability variances of the varieties.
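A simple approximation suggests why the number of environments matters so much for stability variances. The following Python sketch assumes an idealized case in which a variety's interaction effects are normally distributed and observed directly, so that the variance estimate behaves like a sample variance with n - 1 degrees of freedom and has standard error sigma^2 * sqrt(2 / (n - 1)). This is a best-case figure and not the standard error reported by MIXED; in a real analysis the residual error and the estimation of other parameters make the standard errors larger. The function name is hypothetical.

```python
import math


def relative_se_of_variance(n_environments):
    """Approximate SE(s^2) / sigma^2 for a variance estimated from
    n_environments independent normal effects (n - 1 degrees of freedom)."""
    if n_environments < 2:
        raise ValueError("at least two environments are needed")
    return math.sqrt(2.0 / (n_environments - 1))


# 30 and 15 are the numbers of environments discussed above.
for n in (30, 15):
    print(n, round(relative_se_of_variance(n), 3))
```

Even in this best case the standard error is a substantial fraction of the variance itself, and it grows as environments are removed, which is consistent with the loss of precision observed when only 15 of the 30 environments were used.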

Impacts
Using efficient experimental designs will provide better quality data for use by cotton producers who are evaluating yield stability of potential varieties.

Publications

No publications reported this period