Abstract
A biological community usually has a large number of species with relatively small abundances. When a random sample of individuals is selected and each individual is classified according to species identity, some rare species may not be discovered. This paper is concerned with the estimation of Shannon’s index of diversity when the number of species and the species abundances are unknown. The traditional estimator, which ignores the missing species, underestimates the index when there is a non-negligible number of unseen species. We provide a different approach based on unequal probability sampling theory, because species have different probabilities of being discovered in the sample. No parametric forms are assumed for the species abundances. The proposed estimation procedure combines the Horvitz–Thompson (1952) adjustment for missing species with the concept of sample coverage, which is used to properly estimate the relative abundances of species discovered in the sample. Simulation results show that the proposed estimator works well under various abundance models even when a relatively large fraction of the species is missing. Three real data sets, two from biology and one from numismatics, are given for illustration.
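The abstract does not state the formulas, so the following Python sketch is only an illustration of how the two ingredients it names are commonly combined. It assumes the sample coverage is estimated by the Good–Turing form 1 - f1/n (f1 being the number of species seen exactly once), that each observed relative abundance is shrunk by that coverage, and that each term of the entropy sum is divided by the species' estimated probability of appearing in the sample (the Horvitz–Thompson step). The function names and the handling of an all-singleton sample are assumptions made for this example, not details taken from the paper.

```python
import math


def plugin_shannon(counts):
    """Traditional estimator: plug observed proportions into -sum p*log(p)."""
    counts = [x for x in counts if x > 0]
    n = sum(counts)
    return -sum((x / n) * math.log(x / n) for x in counts)


def coverage_adjusted_shannon(counts):
    """Coverage-adjusted, Horvitz-Thompson-weighted Shannon index (sketch)."""
    counts = [x for x in counts if x > 0]
    n = sum(counts)
    f1 = sum(1 for x in counts if x == 1)  # number of singleton species
    if f1 == n:
        # Assumption for this sketch: avoid a zero coverage estimate
        # when every observed species is a singleton.
        f1 = n - 1
    coverage = 1.0 - f1 / n  # Good-Turing estimate of sample coverage
    total = 0.0
    for x in counts:
        p = coverage * x / n              # coverage-adjusted relative abundance
        p_seen = 1.0 - (1.0 - p) ** n     # estimated chance the species is sampled
        total += -p * math.log(p) / p_seen  # Horvitz-Thompson weighting
    return total


if __name__ == "__main__":
    sample = [10, 5, 3, 1, 1]  # hypothetical species counts
    print("plug-in:          ", plugin_shannon(sample))
    print("coverage-adjusted:", coverage_adjusted_shannon(sample))
```

On the hypothetical counts above, which include two singletons, the adjusted value comes out larger than the plug-in value, which matches the direction of the bias the abstract describes.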
References
Ashbridge, J. and Goudie, I.B.J. (2000) Coverage-adjusted estimators for mark-recapture in heterogeneous populations. Communications in Statistics-Simulation, 29, 1215–37.
Basharin, G.P. (1959) On a statistical estimate for the entropy of a sequence of independent random variables. Theory of Probability and Its Applications, 4, 333–6.
Batten, L.A. (1976) Bird communities of some Killarney woodlands. Proceedings of the Royal Irish Academy, 76, 285–313.
Bunge, J. and Fitzpatrick, M. (1993) Estimating the number of species: a review. Journal of the American Statistical Association, 88, 364–73.
Bunge, J., Fitzpatrick, M., and Handley, J. (1995) Comparison of three estimators of the number of species. Journal of Applied Statistics, 22, 45–59.
Chao, A. and Lee, S.-M. (1992) Estimating the number of classes via sample coverage. Journal of the American Statistical Association, 87, 210–17.
Chao, A., Hwang, W.-H., Chen, Y.-C., and Kuo, C.-Y. (2000) Estimating the number of shared species in two communities. Statistica Sinica, 10, 227–46.
Chao, A., Ma, M.-C., and Yang, M.C.K. (1993) Stopping rules and estimation for recapture debugging with unequal failure rates. Biometrika, 80, 193–201.
Colwell, R.K. and Coddington, J.A. (1994) Estimating terrestrial biodiversity through extrapolation. Philosophical Transactions of the Royal Society, London B, 345, 101–18.
Efron, B. and Tibshirani, R.J. (1993) An Introduction to the Bootstrap, Chapman and Hall, New York.
Engen, S. (1978) Stochastic Abundance Models, Halsted Press, New York.
Esty, W. (1986) The efficiency of Good's nonparametric coverage estimator. The Annals of Statistics, 14, 1257–60.
Good, I.J. (1953) The population frequencies of species and the estimation of population parameters. Biometrika, 40, 237–64.
Haas, P. and Stokes, L. (1998) Estimating the number of classes in a finite population. Journal of the American Statistical Association, 93, 1475–87.
Holst, L. (1981) Some asymptotic results for incomplete multinomial or Poisson samples. Scandinavian Journal of Statistics, 8, 243–6.
Horvitz, D.G. and Thompson, D.J. (1952) A generalization of sampling without replacement from a finite universe. Journal of the American Statistical Association, 47, 663–85.
Hutcheson, K. and Shenton, L.R. (1974) Some moments of an estimate of Shannon's measure of information. Communications in Statistics, 3, 89–94.
Janzen, D.H. (1973a) Sweep samples of tropical foliage insects: description of study sites, with data on species abundances and size distributions. Ecology, 54, 659–86.
Janzen, D.H. (1973b) Sweep samples of tropical foliage insects: effects of seasons, vegetation types, elevation, time of day, and insularity. Ecology, 54, 687–708.
MacArthur, R.H. (1957) On the relative abundances of bird species. Proceedings of the National Academy of Sciences, U.S.A., 43, 293–5.
Magurran, A.E. (1988) Ecological Diversity and Its Measurement, Princeton University Press, Princeton, New Jersey.
Mandelbrot, B. (1977) Fractals, Form, Chance and Dimension, Freeman, San Francisco.
Norris III, J.L. and Pollock, K.H. (1998) Non-parametric MLE for Poisson species abundance models allowing for heterogeneity between species. Environmental and Ecological Statistics, 5, 391–402.
Peet, R.K. (1974) The measurement of species diversity. Annual Review of Ecology and Systematics, 5, 285–307.
Pielou, E.C. (1975) Ecological Diversity, Wiley, New York.
Smith, W. and Grassle, J.F. (1977) Sampling properties of a family of diversity measures. Biometrics, 33, 283–92.
Solow, A.R. (1993) A simple test for change in community structure. Journal of Animal Ecology, 62, 191–3.
Thompson, S.K. (1992) Sampling, Wiley, New York.
Zahl, S. (1977) Jackknifing an index of diversity. Ecology, 58, 907–13.
Zipf, G.K. (1965) Human Behavior and the Principle of Least Effort, Addison-Wesley, New York.
Cite this article
Chao, A., Shen, TJ. Nonparametric estimation of Shannon’s index of diversity when there are unseen species in sample. Environmental and Ecological Statistics 10, 429–443 (2003). https://doi.org/10.1023/A:1026096204727
DOI: https://doi.org/10.1023/A:1026096204727