Source: UNIVERSITY OF ARKANSAS submitted to NRP
SMALL SAMPLE INFERENCE IN GENERALIZED LINEAR MIXED MODELS
Sponsoring Institution
National Institute of Food and Agriculture
Project Status
COMPLETE
Funding Source
Reporting Frequency
Annual
Accession No.
0223569
Grant No.
(N/A)
Cumulative Award Amt.
(N/A)
Proposal No.
(N/A)
Multistate No.
(N/A)
Project Start Date
Oct 1, 2010
Project End Date
Sep 30, 2015
Grant Year
(N/A)
Program Code
[(N/A)]- (N/A)
Recipient Organization
UNIVERSITY OF ARKANSAS
(N/A)
FAYETTEVILLE,AR 72703
Performing Department
Agricultural Statistics Lab
Non Technical Summary
Generalized linear mixed models represent a unified statistical theory for the analysis of variance of data involving discrete, categorical, or continuous non-normally distributed response variables from the complete range of experimental and observational designs. Inference is based on the assumption of sufficiently large sample sizes. Large sample sizes are often not available in agricultural experimentation. Guidelines are needed that will allow scientists to know when the assumptions are at least approximately satisfied so that their conclusions are statistically valid. This project will develop such guidelines for a set of commonly used experimental designs and probability distributions.
Animal Health Component
(N/A)
Research Effort Categories
Basic
(N/A)
Applied
(N/A)
Developmental
(N/A)
Classification

Knowledge Area (KA) | Subject of Investigation (SOI) | Field of Science (FOS) | Percent
901 | 7310 | 2090 | 100%
Goals / Objectives
The objective of the project is to study the properties of likelihood based inference in generalized linear mixed models for small sample sizes commonly found in many agricultural applications. This includes the statistical properties of the inference procedures and the properties of the numerical algorithms on which they depend. Expected output would be guidelines on sample sizes required for large sample inference properties to be approximately attained.
Project Methods
A series of simulation studies will be conducted to study the issues involved in small sample estimation and inference in generalized linear mixed models for several commonly used designs in agricultural applications. Designs to be considered would include randomized complete blocks, completely randomized two factor factorial mixed models, and simple split plots. One set of studies will examine optimization algorithm convergence issues under the correct model. Representative distributions for generating data would include the binomial, Poisson, beta and gamma distributions. For each design - distribution combination, the effects of sample size, parameter values, starting values for the algorithm, and convergence criterion on the convergence rate and number of iterations required for convergence will be studied. Samples for which convergence was not obtained will be examined to attempt to determine the reason for non-convergence. Another set of studies using the samples generated for the convergence studies will address estimation and inference issues. For fixed effects parameters and covariance parameters, comparison of the estimators and their standard errors to the true values would be evaluated using average bias, confidence interval coverage, and confidence interval half width. In addition, the shape of the sampling distribution of each estimator would be compared to its theoretical asymptotic distribution. For standard analysis of variance null hypotheses, estimated type I error rates would be compared to nominal levels. For selected alternative hypotheses, p-values and power would be evaluated.
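The evaluation criteria named above (average bias, confidence interval coverage, and confidence interval half width) can be sketched as follows. This is a minimal illustration only, not the project's code: the function names `summarize` and `simulate_normal_mean` are hypothetical, the interval is a simple Wald interval with a normal critical value, and a normal sample mean stands in for a generalized linear mixed model fit so that the example runs with the Python standard library alone.

```python
import random
import statistics


def summarize(estimates, std_errors, true_value, z=1.96):
    """Average bias, Wald interval coverage, and mean interval half width.

    Assumes one estimate and one standard error per simulated sample.
    """
    n_rep = len(estimates)
    bias = sum(e - true_value for e in estimates) / n_rep
    half_widths = [z * se for se in std_errors]
    # An interval covers the truth when the estimate lies within one half width.
    covered = sum(
        1 for e, h in zip(estimates, half_widths) if abs(e - true_value) <= h
    )
    return {
        "bias": bias,
        "coverage": covered / n_rep,
        "half_width": sum(half_widths) / n_rep,
    }


def simulate_normal_mean(n, true_mean=0.0, sd=1.0, n_rep=2000, seed=1):
    """Stand-in simulation (hypothetical): estimate a normal mean n_rep times."""
    rng = random.Random(seed)
    estimates, std_errors = [], []
    for _ in range(n_rep):
        sample = [rng.gauss(true_mean, sd) for _ in range(n)]
        estimates.append(statistics.fmean(sample))
        std_errors.append(statistics.stdev(sample) / n ** 0.5)
    return summarize(estimates, std_errors, true_mean)


small = simulate_normal_mean(5)    # small sample: coverage falls short of 95 percent
large = simulate_normal_mean(100)  # larger sample: coverage approaches 95 percent
```

Even in this simple stand-in, the large-sample critical value gives intervals that are too narrow at n = 5, which is the kind of shortfall the proposed studies are designed to quantify for generalized linear mixed models.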

Progress 10/01/10 to 09/30/15

Outputs
Target Audience: Agricultural scientists; Experiment Station and USDA-ARS statisticians; general Statistics community.
Changes/Problems: Nothing Reported
What opportunities for training and professional development has the project provided? The results have been reported to the Multi-state Project NCCC-170 participants as well as to a broader audience of statisticians and applied scientists at the Kansas State Conference on Applied Statistics in Agriculture on two occasions. The results have been used in my collaborative work with scientists and statistical consulting with students.
How have the results been disseminated to communities of interest? The results have been disseminated through conference and meeting presentations, conference proceedings, and personal communications in collaborative and consulting projects. A manuscript describing the simulation results is in the final draft stage prior to submission to a refereed journal.
What do you plan to do during the next reporting period to accomplish the goals? Nothing Reported

Impacts
What was accomplished under these goals? The simulation study of the two sample test for means of beta distributions found that the nominal type I error rates for common sample sizes up to 100 were greatly exceeded for the region of the parameter space in which the common mean and scale parameter are "small." The distributions of the estimators of both the common mean and the scale parameter were skewed to the right, with skewness decreasing as a function of sample size for the estimator of the mean. The scale parameter was over-estimated for 50 to 70 percent of the samples, depending on the sample size, when the scale parameter was "small." These results are similar to those reported previously for the one sample problem.

Publications

  • Type: Conference Papers and Presentations Status: Other Year Published: 2015 Citation: Inference issues related to the two sample test for equality of means of beta distributions. Edward Gbur and Kevin Thompson. 2015 Annual Meeting of the Multi-state Project NCCC-170 "Research Advances in Agricultural Statistics."
    Small sample properties of the two independent sample test for means from beta distributions. Kevin Thompson and Edward Gbur. In Proceedings of the 2015 Kansas State University Conference on Applied Statistics in Agriculture. ed. W. Song. In press.


Progress 10/01/13 to 09/30/14

Outputs
Target Audience: Agricultural scientists, Experiment Station statisticians and the general Statistics community.
Changes/Problems: Nothing Reported
What opportunities for training and professional development has the project provided? Nothing Reported
How have the results been disseminated to communities of interest? A simulation study of the small sample properties of likelihood based inference for the beta distribution. Presentation at the annual meeting of Multistate Project NCCC-170 in Lincoln, NE in July 2014.
What do you plan to do during the next reporting period to accomplish the goals? Plans to expand the simulation to a one fixed factor design have been completed. Efforts are focused on both type I error rates under the null hypothesis and power under various configurations in the alternative hypothesis. Initially a common value of the scale parameter phi will be assumed. A follow-up study will consider the effect of different scale parameters when using SAS’ PROC GLIMMIX.

Impacts
What was accomplished under these goals? A manuscript on the simulation results from the one sample beta distribution is being finalized for submission to a refereed journal for publication. Plans to expand the simulation to a one fixed factor design have been completed. Efforts are focused on both type I error rates under the null hypothesis and power under various configurations in the alternative hypothesis. Initially a common value of the scale parameter phi will be assumed. A follow-up study will consider the effect of different scale parameters when using SAS’ PROC GLIMMIX.

Publications

  • Type: Conference Papers and Presentations Status: Published Year Published: 2014 Citation: Thompson, K. and E.E. Gbur (2014). A simulation study of the small sample properties of likelihood based inference for the beta distribution. In Proceedings of the 2013 Conference on Applied Statistics in Agriculture. ed. Weixsing Song. Manhattan, KS: Department of Statistics, Kansas State University. pp 136-145.


Progress 01/01/13 to 09/30/13

Outputs
Target Audience: Agricultural scientists and Experiment Station statisticians.
Changes/Problems:
What opportunities for training and professional development has the project provided? Nothing Reported
How have the results been disseminated to communities of interest? Through presentations at professional meetings.
What do you plan to do during the next reporting period to accomplish the goals? The simulation results represent the first step in the study of small sample inference in generalized linear mixed models with a beta distributed response and logit link. The next step will be to consider a one factor generalized linear model.

Impacts
What was accomplished under these goals? Results from the simulation study of small sample inference in a single beta distributed population reported last year were presented as a poster at the 2013 Conference on Applied Statistics in Agriculture. This year the scope of the simulation was expanded to include 504 combinations of n, mu and phi. Two thousand samples of sizes n = 5, 10, 15, 20, 50 and 100 were generated for a 7 x 12 array of mu and phi values. Mu values were 0.01, 0.05, 0.1 to 0.5 by 0.1 and phi values were 0.25, 0.5, 1, 1.5, 2, 3, 5, 10, 25 to 100 by 25. Each sample was fit using SAS PROC GLIMMIX with a logit link and both the Laplace and pseudo-likelihood methods. As in the original study, convergence problems for phi less than 5 increased as mu and/or phi became small. In those situations, for the samples that did converge, the confidence interval coverage for mu unexpectedly decreased as n increased. For fixed mu and phi, it was found that the average bias of mu-hat increased as n increased and the average confidence interval width for mu decreased as n increased. It was also found that negatively biased estimates of mu tended to have narrower confidence intervals and positively biased estimates tended to have wider confidence intervals. The role of phi-hat in the behavior of the confidence intervals for mu was explored. It appears that there may be a problem with the estimation of phi using the versions of the algorithms in GLIMMIX when phi and mu are small, regardless of the value of n. The results of the expanded simulation were presented at the annual meeting of the Multi-state Project NCCC-170 and were shared with the SAS personnel responsible for GLIMMIX. They provide users of GLIMMIX with valuable information on the behavior to be expected when analyzing data in these more complex models.
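The simulation grid described above (6 sample sizes by 7 mu values by 12 phi values) and the generation of beta samples can be sketched as follows. This is an illustration under stated assumptions, not the project's SAS code: the helper names `shape_parameters` and `draw_sample` are hypothetical, and the conversion from (mu, phi) to the usual beta shape parameters assumes the mean-scale form a = mu * phi, b = (1 - mu) * phi. Fitting each sample (done with PROC GLIMMIX in the study) is not reproduced here.

```python
import random
from itertools import product

# Design values as listed in the report.
SAMPLE_SIZES = [5, 10, 15, 20, 50, 100]
MU_VALUES = [0.01, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5]
PHI_VALUES = [0.25, 0.5, 1, 1.5, 2, 3, 5, 10, 25, 50, 75, 100]


def shape_parameters(mu, phi):
    """Convert (mean, scale) to beta shape parameters.

    Assumption: a = mu * phi and b = (1 - mu) * phi.
    """
    return mu * phi, (1.0 - mu) * phi


def draw_sample(n, mu, phi, rng):
    """Draw one sample of size n from the beta distribution with mean mu."""
    a, b = shape_parameters(mu, phi)
    return [rng.betavariate(a, b) for _ in range(n)]


# Every (n, mu, phi) combination in the design.
grid = list(product(SAMPLE_SIZES, MU_VALUES, PHI_VALUES))

rng = random.Random(2013)  # arbitrary seed, for reproducibility of the sketch
sample = draw_sample(20, 0.3, 10, rng)
```

Under the assumed conversion, the smallest design point (mu = 0.01, phi = 0.25) has a first shape parameter of 0.0025, so the density is unbounded near zero. This is the region of the grid where the report describes convergence and coverage problems.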

Publications

  • Type: Conference Papers and Presentations Status: Awaiting Publication Year Published: 2014 Citation: Thompson, K. and E.E. Gbur. A simulation study of the small sample properties of likelihood based inference for the beta distribution. In Proceedings of the 2013 Conference on Applied Statistics in Agriculture. ed. Weixsing Song. Manhattan, KS: Kansas State University. In press.


Progress 01/01/12 to 12/31/12

Outputs
OUTPUTS: A simulation study of small sample inference in a single beta distributed population was conducted. The distribution was parameterized in terms of its mean (mu) and scale parameter (phi), which is the parameterization used for generalized linear mixed models (GLMM) in SAS's GLIMMIX procedure. Two thousand samples of sizes n = 5, 10, 15, 20, 50 and 100 were generated for a 7 x 7 array of mu ranging from 0.0588 to 0.5 and phi from 0.5 to 10.0. Each sample was fit using GLIMMIX with a logit link and both the Laplace and pseudo-likelihood methods. Convergence problems arose when mu and/or phi were small, regardless of the sample size n, with as few as 72 percent of the fits converging and producing estimated standard errors for mu-hat under pseudo-likelihood and as few as 81 percent under Laplace. For samples for which standard error estimates were available to construct confidence intervals for mu, the sample coverage percentages for nominal 90, 95, and 98 percent confidence levels were significantly lower than the nominal levels for nearly all n, mu and phi combinations. For small mu and phi, the sample percent coverages tended to decrease as the sample size n increased. The reasons for this counterintuitive behavior are being investigated.
PARTICIPANTS: Not relevant to this project.
TARGET AUDIENCES: Agricultural scientists and Experiment Station statisticians.
PROJECT MODIFICATIONS: Not relevant to this project.
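For reference, the mean and scale parameterization of the beta distribution mentioned above can be written out as below. The form shown is an assumption based on the commonly documented mean-scale beta density; the report itself does not state the formula. With a logit link and a single population, the linear predictor eta reduces to an intercept.

```latex
\begin{align*}
f(y \mid \mu, \phi) &= \frac{\Gamma(\phi)}{\Gamma(\mu\phi)\,\Gamma\big((1-\mu)\phi\big)}\,
  y^{\mu\phi - 1}\,(1 - y)^{(1-\mu)\phi - 1}, \qquad 0 < y < 1, \\
\mathrm{E}(Y) &= \mu, \qquad \mathrm{Var}(Y) = \frac{\mu(1-\mu)}{1+\phi}, \\
\operatorname{logit}(\mu) &= \log\frac{\mu}{1-\mu} = \eta .
\end{align*}
```

In this form, smaller phi means larger variance for a given mean, so the "small mu and phi" region corresponds to highly dispersed responses concentrated near zero.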

Impacts
The book by Gbur et al. on generalized linear mixed models will provide agricultural scientists with practical information that will enable them to become familiar with modern, theoretically sound approaches to the statistical analysis of non-normally distributed data that are commonly collected in agricultural studies.

Publications

  • Gbur, E.E., W.W. Stroup, K.S. McCarter, S. Durham, L.J. Young, M. Christman, M. West and M. Kramer (2012). Analysis of Generalized Linear Mixed Models in the Agricultural and Natural Resources Sciences. Madison, WI: American Society of Agronomy, Soil Science Society of America, Crop Science Society of America. 283 pp. (Correction list and selected data sets at www.uark.edu/misc/ncr170/)


Progress 01/01/11 to 12/31/11

Outputs
OUTPUTS: A survey of the published literature on the numerical algorithms used in the SAS procedure GLIMMIX has been conducted. The initial set of simulation studies for the gamma and beta distributions has been set up. SAS will be used to conduct the simulation. A book on the application of generalized linear mixed models to agricultural studies using GLIMMIX has been written and will be published jointly by the American Society of Agronomy, Crop Science Society of America, and Soil Science Society of America in the early part of 2012. The book includes sections on the selection of algorithms, examples of power studies, and comments on small sample issues in the context of various data analyses.
PARTICIPANTS: Not relevant to this project.
TARGET AUDIENCES: Agricultural scientists and experiment station statisticians.
PROJECT MODIFICATIONS: Not relevant to this project.

Impacts
The book on generalized linear mixed models will provide agricultural scientists with practical information that will enable them to become familiar with modern, theoretically sound approaches to the statistical analysis of non-normally distributed data that are commonly collected in agricultural studies.

Publications

  • Gbur, E.E., W.W. Stroup, K.S. McCarter, S. Durham, L.J. Young, M. Christman, M. West and M. Kramer (2012). Analysis of Generalized Linear Mixed Models in the Agricultural and Natural Resources Sciences. Madison, WI: American Society of Agronomy, Soil Science Society of America, Crop Science Society of America. In press.