2.1.13

ESTIMATION OF THE APPARENT INFECTION RATE USING LOGISTIC REGRESSION

JE YUEN^{1}, AM DJURLE^{2}

^{1}Unit of Applied Plant Protection and ^{2}Unit of Plant Pathology 1, Swedish University of Agricultural Sciences, SE 750 07 Uppsala, Sweden

**Background and objectives**

The apparent infection rate (*r*) is one of the most widely used descriptors of a disease epidemic. It is the rate constant of the logistic model, in which the instantaneous rate of disease increase is the product of the current amount of disease, *r*, and a correction factor equal to 1.0 (or the maximum amount of disease) minus the current amount of disease. The classic methods for estimating *r* rely on the logistic transformation, i.e. the logit of p (the natural logarithm of p/(1-p), where p is the proportion of disease). The transformed variable increases linearly over time, with a slope equal to *r*. The transformed data can either be graphed and *r* estimated visually, or linear regression can be used.
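In symbols, the verbal description above corresponds to the standard logistic growth equation. The notation $p_0$ for the proportion of disease at time $t = 0$ is introduced here for illustration and does not appear in the original text; the maximum amount of disease is taken to be 1.0.

$$
\frac{dp}{dt} = r\,p\,(1 - p)
\quad\Longrightarrow\quad
\operatorname{logit}\bigl(p(t)\bigr) = \ln\frac{p(t)}{1 - p(t)} = \operatorname{logit}(p_0) + r\,t
$$

The right-hand expression is linear in $t$ with slope $r$, which is why a straight line fitted to the logit-transformed data estimates the apparent infection rate.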

This method works as long as the dependent variable (the amount of disease) is greater than 0 and less than 1.0, since the logit of 0 or 1.0 is not defined. We propose another method to estimate *r* directly from observed incidence data, where the incidence is the number of diseased plants, divided by the number of sampled plants.

**Materials and methods**

We formulate the model as a generalized linear model (GLM) with a logit link and binomial errors (referred to here as logistic regression) [1]. The number of plants used to determine disease incidence (N) is included in the model specification, and the number of diseased plants can take any value from 0 up to and including N. If time is the only independent variable, then the regression coefficient for time is an estimate of *r*.
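As a rough illustration of the model described above, the following Python sketch fits logit(p) = a + r·t with binomial errors by Newton-Raphson iteration (which coincides with iteratively reweighted least squares for this model). The function name `fit_logistic_rate`, its arguments and the example counts are hypothetical and are not taken from the study; the abstract does not say which software the authors used. The sketch has no safeguards against complete separation (for example, data consisting only of 0 and N).

```python
import math

def fit_logistic_rate(times, diseased, sampled, max_iter=50, tol=1e-10):
    """Maximum-likelihood fit of logit(p) = a + r*t with binomial errors.

    times    -- observation times
    diseased -- number of diseased plants at each time (0..N allowed)
    sampled  -- number of plants sampled at each time (N)
    Returns (a, r); r is the estimate of the apparent infection rate.
    """
    # Centre time to improve numerical conditioning; the slope is unaffected.
    t_mean = sum(times) / len(times)
    tc = [t - t_mean for t in times]
    a, r = 0.0, 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for t, y, n in zip(tc, diseased, sampled):
            p = 1.0 / (1.0 + math.exp(-(a + r * t)))
            w = n * p * (1.0 - p)      # binomial variance (GLM weight)
            resid = y - n * p          # observed minus expected count
            g0 += resid
            g1 += resid * t
            h00 += w
            h01 += w * t
            h11 += w * t * t
        # Solve the 2x2 Newton system (information matrix times step = score).
        det = h00 * h11 - h01 * h01
        da = (h11 * g0 - h01 * g1) / det
        dr = (h00 * g1 - h01 * g0) / det
        a += da
        r += dr
        if abs(da) + abs(dr) < tol:
            break
    # Convert the intercept back to the uncentred time scale.
    return a - r * t_mean, r

# Made-up example: 30 plants assessed weekly; note the 0 and the 30 out of 30.
example_times = [0, 7, 14, 21, 28, 35]
example_diseased = [0, 1, 4, 14, 27, 30]
example_sampled = [30] * 6
a_hat, r_hat = fit_logistic_rate(example_times, example_diseased, example_sampled)
print(f"estimated r per day: {r_hat:.3f}")
```

The observations with 0 and 30 diseased plants enter the likelihood directly, so no observation has to be discarded or adjusted before fitting.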

The method was tested by generating data sets with known values for *r* and N. Simulated incidence data were generated by sampling from the binomial distribution with N trials, using the true disease incidence as the binomial probability. The resulting incidence data were then analysed using either least-squares regression on the logit-transformed data or logistic regression. A historical data set was also analysed using both methods.
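The simulation step and the classic least-squares analysis might be sketched as follows. This is only an illustration under stated assumptions: the function names, the initial incidence `p0`, the seed and all parameter values are hypothetical, and the abstract does not report the settings actually used. Observations with 0 or N diseased plants are dropped before the least-squares fit because their logit is undefined.

```python
import math
import random

def true_incidence(t, r, p0):
    """Logistic disease progress curve with rate r and initial incidence p0."""
    odds = p0 / (1.0 - p0) * math.exp(r * t)
    return odds / (1.0 + odds)

def simulate_counts(times, r, p0, n, rng):
    """Draw the number of diseased plants out of n at each time (binomial)."""
    return [
        sum(rng.random() < true_incidence(t, r, p0) for _ in range(n))
        for t in times
    ]

def logit_least_squares(times, counts, n):
    """Least-squares slope of logit(y/n) on time.

    Observations with y = 0 or y = n are dropped (logit undefined).
    Returns (slope, number of observations used); slope is None if
    fewer than two observations remain.
    """
    pts = [(t, math.log(y / (n - y))) for t, y in zip(times, counts) if 0 < y < n]
    if len(pts) < 2:
        return None, len(pts)
    mt = sum(t for t, _ in pts) / len(pts)
    ml = sum(l for _, l in pts) / len(pts)
    sxx = sum((t - mt) ** 2 for t, _ in pts)
    sxy = sum((t - mt) * (l - ml) for t, l in pts)
    return sxy / sxx, len(pts)

# Hypothetical settings: small sample size, so counts of 0 and N are likely.
rng = random.Random(1)
sim_times = list(range(0, 50, 5))
sim_counts = simulate_counts(sim_times, r=0.2, p0=0.01, n=10, rng=rng)
slope, used = logit_least_squares(sim_times, sim_counts, n=10)
print("counts:", sim_counts)
print("observations used by least squares:", used, "of", len(sim_times))
print("least-squares estimate of r:", slope)
```

With small N, several simulated counts are typically 0 or N, so the least-squares fit uses fewer observations than were generated, whereas logistic regression can use them all.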

**Results and conclusions**

More disease observations were used in logistic regression, since the method accommodates disease incidences of 0 and 1, as well as intermediate levels. When large samples were used to determine disease incidence, the results obtained with the logistic regression were generally comparable with least-squares regression of the transformed data. Logistic regression performed better when smaller samples were used to determine disease incidence, or when many values of disease incidence were either 0 or 1.

**References**

1. McCullagh P, Nelder JA. 1993. Generalized Linear Models, 2nd Edition.