Journal of Experimental Marine Biology and Ecology
Testing of null hypotheses in exploratory community analyses: similarity profiles and biota-environment linkage
Introduction
Professor John Gray was a strong advocate of the insights obtainable from exploratory studies of gradients and did much to demonstrate their efficacy in the contexts of monitoring for pollution and studying biodiversity (see Gray et al., 1988, Gray et al., 1990, Ellingsen and Gray, 2002, amongst many others). For multivariate community analyses, however, gradient studies (broadly characterisable as adopting a regression approach) have sometimes suffered in comparison to studies involving factorial designs (broadly speaking, an analysis of variance approach) by their perceived lack of hypothesis testing for structure elucidated only a posteriori. Such criticism is often justified: for example, a search through large numbers of environmental variables, for combinations which 'explain' the among-sample structure of a biotic assemblage, is almost guaranteed to find a combination with some apparent explanatory power, even where there is no real linkage. The process of searching through many solutions for the one that optimises some criterion inevitably involves strong selection bias. At the least, what is required here is a formal test of the null hypothesis that there is no link between the sample patterns of biota and environment, adjusting for this selection bias. If the null can be decisively rejected then there is some objective basis for interpreting the observed correlative links.
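The selection-bias adjustment called for above can be sketched as a permutation test in which the whole subset search is repeated on every permuted data set. What follows is a minimal illustration under stated assumptions, not the PRIMER implementation: Bray-Curtis dissimilarity for the biota, Euclidean distance for the environmental variables, Spearman's ρ as the matching statistic, an exhaustive search over all subsets, and the function names (best_rho, global_test) are all choices made here for illustration.

```python
import itertools
import math
import random


def ranks(values):
    """Average ranks (1-based); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out


def spearman(x, y):
    """Spearman rank correlation: the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def pairwise(rows, dist):
    """Resemblances for all sample pairs (i < j), as a flat list."""
    return [dist(rows[i], rows[j])
            for i, j in itertools.combinations(range(len(rows)), 2)]


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity (assumed biotic coefficient); 0 if both samples are empty."""
    total = sum(a) + sum(b)
    return sum(abs(u - v) for u, v in zip(a, b)) / total if total else 0.0


def euclidean(a, b):
    """Euclidean distance (assumed environmental coefficient)."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


def best_rho(bio_d, env):
    """Largest rho over all non-empty subsets of the environmental variables."""
    p = len(env[0])
    best = -1.0
    for k in range(1, p + 1):
        for cols in itertools.combinations(range(p), k):
            sub = [[row[c] for c in cols] for row in env]
            best = max(best, spearman(bio_d, pairwise(sub, euclidean)))
    return best


def global_test(bio, env, n_perm=99, seed=0):
    """Permutation p-value for the best rho, redoing the full search each time."""
    rng = random.Random(seed)
    bio_d = pairwise(bio, bray_curtis)
    observed = best_rho(bio_d, env)
    exceed = 0
    for _ in range(n_perm):
        shuffled = env[:]
        rng.shuffle(shuffled)  # break the pairing of biotic and environmental samples
        if best_rho(bio_d, shuffled) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)
```

Because the search for the best subset is repeated for every permutation, the null distribution carries the same selection bias as the observed statistic, which is the property the argument above requires.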
In a similar vein, application of hierarchical cluster analysis to a set of a priori unstructured samples of assemblage data yields a dendrogram, whether agglomerative or divisive, in which ever finer distinctions are drawn between groups of samples, ultimately terminating with each sample placed in its own group. Given that a cluster analysis will produce a grouping even from data consisting entirely of random numbers, and thus with no meaningful sample structure, the question naturally arises as to what objective basis there is for interpreting particular groups or subgroups displayed by the dendrogram. Again, statistical testing is needed, this time in the form of a series of tests of the null hypothesis that particular groups displayed in the dendrogram have no meaningful internal structure. Only if such a hypothesis can be rejected is it permissible to interpret a further subdivision of an existing group.
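One way to construct a test of 'no meaningful internal structure' is to compare the ordered among-sample similarities of a group (its similarity profile) with profiles obtained after destroying any structure by permutation. The sketch below is illustrative only and rests on assumptions not stated in this part of the text: Bray-Curtis similarity on a 0 to 100 scale, a null distribution generated by shuffling each variable independently across samples, and a test statistic equal to the summed absolute departure of a profile from the mean null profile. The names profile, permuted, departure and simprof are hypothetical.

```python
import itertools
import random


def bray_curtis_similarity(a, b):
    """Bray-Curtis similarity on a 0-100 scale (0 if both samples are empty)."""
    total = sum(a) + sum(b)
    if total == 0:
        return 0.0
    return 100.0 * (1.0 - sum(abs(u - v) for u, v in zip(a, b)) / total)


def profile(data):
    """Similarities for all sample pairs, ordered from smallest to largest."""
    n = len(data)
    return sorted(bray_curtis_similarity(data[i], data[j])
                  for i, j in itertools.combinations(range(n), 2))


def permuted(data, rng):
    """Shuffle each variable (column) independently across the samples (rows)."""
    cols = [list(col) for col in zip(*data)]
    for col in cols:
        rng.shuffle(col)
    return [list(row) for row in zip(*cols)]


def departure(prof, mean_prof):
    """Summed absolute distance between a profile and the mean null profile."""
    return sum(abs(p - m) for p, m in zip(prof, mean_prof))


def simprof(data, n_mean=200, n_test=199, seed=0):
    """Return the observed departure statistic and its permutation p-value."""
    rng = random.Random(seed)
    # First set of permutations: estimate the mean profile expected under the null.
    null_profiles = [profile(permuted(data, rng)) for _ in range(n_mean)]
    mean_prof = [sum(vals) / n_mean for vals in zip(*null_profiles)]
    observed = departure(profile(data), mean_prof)
    # Second, independent set: null distribution of the departure statistic.
    exceed = sum(
        departure(profile(permuted(data, rng)), mean_prof) >= observed
        for _ in range(n_test)
    )
    return observed, (exceed + 1) / (n_test + 1)
```

A small p-value suggests that the group contains structure worth subdividing; a large one gives no grounds for interpreting any further split of that group.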
This paper describes such tests, for analysis of any similarity, distance or dissimilarity matrices (generically referred to as 'resemblance' measures, following Legendre and Legendre, 1998). It places these tests in the context of the non-parametric approach to analysing species-by-samples matrices described by Clarke (1993), which has been widely adopted in marine community ecology in particular, largely through availability of the PRIMER package (v6, Clarke and Warwick, 2001, Clarke and Gorley, 2006). A notable early step in the latter was Professor Gray's enthusiastic encouragement of development of these techniques through a series of workshops held under the auspices of the UNESCO/IOC Group of Experts on the Effects of Pollutants (Bayne et al., 1988) and the FAO/UNEP Mediterranean Pollution Programme. The core routines in this approach include non-metric MDS ordination of samples and ANOSIM tests of a priori factors defined on them, together with indirect gradient analyses linking biotic assemblage patterns to 'best' subsets of environmental variables, exhibiting matching sample structure (BEST routine). These routines are based on unconstrained choice of a resemblance matrix appropriate to the data type and question of interest, and the ANOSIM and BEST routines utilise only the rank values of the among-sample resemblances.
Within the existing framework, this paper adds, firstly, a 'global BEST test' which examines whether the highest rank correlation (ρ), obtainable between the biotic similarity matrix and the matching distance matrix from the optimal subset of environmental variables, exceeds values that would be expected by chance under the null hypothesis (of no real biota-environment link). Secondly, a 'similarity profile' (SIMPROF) test is described, in which the biotic similarities from a group of a priori unstructured samples are ordered from smallest to largest, plotted against their rank (the similarity profile), and this profile compared with that expected under a simple null hypothesis of no meaningful structure within that group. Repeated application of this test generates a stopping rule for a posteriori division of the samples into ever smaller subgroups, as in hierarchical cluster analysis. These two analytical strands converge in a third routine, a counterpart to the BEST procedure of matching environmental information to species patterns, which adapts the Multivariate Regression Trees of De'ath (2002) to the non-parametric framework in PRIMER. The LINKTREE procedure is a form of constrained cluster analysis involving a divisive partition of the biotic community samples into ever smaller groups, but in which each division has an 'explanation' in terms of a threshold on one of the environmental variables. As with agglomerative hierarchical clustering, such linkage trees also need stopping rules to avoid random sampling variation among samples from a single assemblage being interpreted as further sub-group structure. These are again provided by a series of similarity profile (SIMPROF) tests.
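A single step of such a constrained divisive partition can be sketched as follows. The criterion used here to choose among candidate divisions, the ANOSIM R statistic computed between the two resulting groups from the ranks of the biotic dissimilarities, is an assumption of this sketch (the passage above does not state the criterion), as are the midpoint convention for thresholds and the names anosim_r and best_split.

```python
def average_ranks(values):
    """Average ranks (1-based); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out


def anosim_r(dissim, groups):
    """ANOSIM R for a two-group split.

    dissim maps sample-index pairs (i, j), i < j, to a biotic dissimilarity;
    groups[i] is the group label of sample i.
    """
    pairs = sorted(dissim)
    rk = dict(zip(pairs, average_ranks([dissim[p] for p in pairs])))
    between = [rk[(i, j)] for i, j in pairs if groups[i] != groups[j]]
    within = [rk[(i, j)] for i, j in pairs if groups[i] == groups[j]]
    if not between or not within:
        return None  # R is undefined without both kinds of pair
    mean_between = sum(between) / len(between)
    mean_within = sum(within) / len(within)
    return (mean_between - mean_within) / (len(pairs) / 2)


def best_split(dissim, env):
    """Best binary split of the samples by a threshold on one environmental variable."""
    best = None
    for var in range(len(env[0])):
        levels = sorted({row[var] for row in env})
        for lo, hi in zip(levels, levels[1:]):
            cut = (lo + hi) / 2  # midpoint threshold: an illustrative convention
            groups = [row[var] > cut for row in env]
            r = anosim_r(dissim, groups)
            if r is not None and (best is None or r > best[0]):
                left = [i for i, g in enumerate(groups) if not g]
                right = [i for i, g in enumerate(groups) if g]
                best = (r, var, cut, left, right)
    return best
```

Applying best_split recursively to the left and right groups, and halting on any group for which a similarity profile test gives no evidence of internal structure, would yield a tree of the kind described.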
It should be borne in mind throughout that, although the above outline and the examples in this paper are couched in terms of tests on species assemblages and their relation to environmental variables, nothing in the formulation of the methods restricts their use to this context. The SIMPROF test will provide stopping rules for any a posteriori subdivision of a group of samples, based on multiple variables of taxa, physical environment, chemical water-quality, measures of diversity, biomarkers, distributions of particle sizes, etc. The global BEST test, rather than matching subsets of environmental variables to a fixed pattern of resemblances for whole communities, can be applied to testing whether subsets of species show a pattern among samples that significantly matches that of a fixed environmental gradient. Similarly, an optimal subset of biomarkers (or other metrics) can be tested for its match to an observed chemical gradient, or to manipulated levels of contaminants; patterns in subsets of one group of biotic variables can be tested for their ability to 'explain' patterns in another set (corals structuring assemblages of reef fish, infaunal macrobenthos structuring meiobenthic communities, etc.); and many other 'linkage' problems could be formulated and tested in this way.
Global BEST test
The idea behind this test is outlined in the schematic diagram of Fig. 1. This shows, in normal typeface, the routine for linking of biota to environment (Bio-Env) described by Clarke and Ainsworth (1993). The data consist of two matrices (left-hand side), both referring to the same set of n samples (locations/times/treatments, or whatever context determines the sampling programme). For the biotic variables (top row), a triangular matrix of resemblances between samples is calculated for
Results and specific discussion
The analyses are not presented in the contexts and under the hypotheses of the original studies. It is not the purpose of this methodological paper to discuss and interpret the specific data in any detail. They are used merely to illustrate the techniques with realistic examples of possible outcomes. The associated discussion, similarly, focuses on general caveats and corollaries of the three methods and their inter-relationships.
Global BEST test
In an indirect way, the BEST routine is trying to solve the same problem for multivariate community data as standard multiple regression does for single response variables (though more direct multivariate analogies of multiple regression are given by the dbRDA and DISTLM procedures of Legendre and Anderson, 1999, and McArdle and Anderson, 2001, since these employ explicit linear models). In multiple linear regression, 'all subset' regressions and basic stepwise selection methods (Efroymson, 1960
Acknowledgments
We thank the referees and the guest editor (RMW) for their most helpful and positive comments. This work is a contribution to the biodiversity component of the Plymouth Marine Laboratory's core strategic research programme. It was supported by the UK Natural Environment Research Council (NERC) and the UK Department for Environment, Food and Rural Affairs (DEFRA) through the AMBLE project ME3109. KRC acknowledges his position as honorary fellow of the Plymouth Marine Laboratory and of the Marine
References
- et al. On resemblance measures for ecological studies, including taxonomic dissimilarities and a zero-adjusted Bray-Curtis coefficient for denuded assemblages. J. Exp. Mar. Biol. Ecol. (2006)
- et al. Fish fauna of the Severn Estuary. Are there long-term changes in abundance and species composition and are the recruitment patterns of the main marine species correlated? J. Exp. Mar. Biol. Ecol. (2001)
- New look at statistical model identification. IEEE Trans. Autom. Contr. (1974)
- et al.
- et al. Classification and Regression Trees
- Non-parametric multivariate analyses of changes in community structure. Aust. J. Ecol. (1993)
- et al. Statistical design and analysis for a 'biological effects' study. Mar. Ecol. Prog. Ser. (1988)
- et al. A method of linking multivariate community structure to environmental variables. Mar. Ecol. Prog. Ser. (1993)
- et al. Quantifying structural redundancy in ecological communities. Oecologia (1998)
- Change in marine communities: an approach to statistical analysis and interpretation
- An index showing breakdown of seriation, related to disturbance, in a coral-reef assemblage. Mar. Ecol. Prog. Ser.
- Dispersion-based weighting of species counts in assemblage analyses. Mar. Ecol. Prog. Ser.
- Zooplankton communities in the Bristol Channel and Severn Estuary. Mar. Ecol. Prog. Ser.
- Regression, prediction and shrinkage. J. Roy. Statist. Soc. B
- Multivariate regression trees: a new technique for modeling species environment relationships. Ecology
- Applied Regression Analysis
- Multiple regression analysis