eDNA Data exploration

Welcome to ranacapa

Hi there! This app is designed to help explore biodiversity using results from environmental DNA (eDNA) analyses. Click the buttons above to navigate between visualizations and exploratory analyses. The app comes with a demo dataset that you can use to get a sense of the types of information you can explore. Once you have explored the app using the demo dataset, we encourage you to explore your own eDNA data by uploading a taxonomy table and a metadata table on the “Data Import” tab.

This app is maintained by Gaurav Kandlikar, in collaboration with the CALeDNA program. Please feel free to email us with any questions!

Please verify that the files below look as expected, and click on 'Run the app!' to get started!

Input taxonomy file

Input metadata file

Step 1- Explore sequencing depth

An important point to keep in mind when interpreting results from eDNA metabarcoding studies is that our estimate of species diversity in a community can depend a lot on how “deeply” we sequence the extracted DNA. If we extract DNA from a given sample and sequence 100 randomly selected DNA fragments from it, we might estimate that there are only 5 species in the community, but if we were to sequence 100000 randomly selected DNA fragments, we might find that there is actually DNA from 50 species in the same sample.

In eDNA sequencing, such variation in how deeply a given sample is sequenced can happen for a variety of reasons: PCR amplification may have been more successful in one sample than another, the sequencing machine may have worked less efficiently on certain samples than others, etc. This makes comparison between samples difficult. As in the example above, you might find more species in one sample than another simply because it has been sequenced more deeply.

One approach in this scenario is to ‘rarefy’ your samples by subsampling a defined number of sequences from each sample. You can choose a specific depth to rarefy to, or can choose to rarefy down to the minimum number of reads sequenced in any single sample (e.g. if you have 50000 reads in the least well-sequenced sample, all samples will be subsampled down to 50000 reads). Replicating this subsampling many times allows us to have better estimates of the diversity in the rarefied samples.
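The subsampling described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the code the app runs: the sample names, taxon names, and read counts are made up, and the helper functions `rarefy` and `mean_rarefied_richness` are hypothetical names introduced here.

```python
import random
from collections import Counter


def rarefy(counts, depth, rng):
    """Subsample `depth` reads without replacement from a {taxon: reads} dict."""
    # Expand the counts into one entry per read, then draw `depth` of them.
    pool = [taxon for taxon, n in counts.items() for _ in range(n)]
    if depth > len(pool):
        raise ValueError("depth exceeds the number of reads in the sample")
    return Counter(rng.sample(pool, depth))


def mean_rarefied_richness(counts, depth, replicates=100, seed=1):
    """Average number of taxa observed across repeated subsamples."""
    rng = random.Random(seed)
    richness = [len(rarefy(counts, depth, rng)) for _ in range(replicates)]
    return sum(richness) / replicates


# Hypothetical read counts for two samples sequenced to different depths.
samples = {
    "sample_deep": {"taxon_a": 600, "taxon_b": 300, "taxon_c": 90, "taxon_d": 10},
    "sample_shallow": {"taxon_a": 70, "taxon_b": 25, "taxon_c": 5},
}

# Rarefy every sample to the depth of the least well-sequenced sample.
min_depth = min(sum(counts.values()) for counts in samples.values())
for name, counts in samples.items():
    print(name, mean_rarefied_richness(counts, min_depth))
```

Rare taxa in the deeper sample (here `taxon_d`) are missed in some replicates, which is why averaging over many subsamples gives a steadier estimate than a single draw.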

We note that there has been considerable discussion regarding the best way of dealing with unequal sampling. eDNA is an evolving field, and the scientific community has not reached consensus on this topic. We refer users to Weiss et al. 2017, Microbiome, and to McMurdie & Holmes 2014, PLoS Comp. Biol. We have not yet implemented alternative options to rarefying in ranacapa, but you are welcome to continue without rarefying the samples by selecting the “none” option on the left.

Unrarefied samples - taxon accumulation

Rarefied samples

Background on alpha diversity

In ecology, the term alpha diversity refers simply to the diversity observed in a single site (or a single sample). Although this may seem like a very simple concept, it turns out that there are many ways to consider diversity. The most obvious way to measure alpha diversity is to count the number of species (or genera, or families, etc.) found in a sample. This is the metric calculated with the ‘Observed’ option on the left.

We may want to calculate diversity in a way that accounts for not only the number of species (or higher taxonomic groups) present in a sample, but also the relative abundance of each group. For example, consider the following situation:

You are comparing two communities. Each community has 300 individuals in total, representing 3 species. In Community A, each species is represented by 100 individuals. In Community B, there are 290 individuals of Species 1, nine individuals of Species 2, and just one individual of Species 3. Clearly, Communities A and B have the same species richness, but the abundances are more evenly distributed among species in Community A.

This is the type of diversity captured by a metric known as Shannon diversity, which you can select on the left.
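To make the comparison concrete, here is a minimal sketch that computes observed richness and Shannon diversity for Communities A and B from the example above. It assumes the common natural-log form of the Shannon index, H = -Σ p_i ln(p_i); the function names are illustrative and this is not the app's own code.

```python
import math


def observed_richness(abundances):
    """Number of taxa with at least one individual."""
    return sum(1 for n in abundances if n > 0)


def shannon_diversity(abundances):
    """Shannon index H = -sum(p * ln(p)) over taxa that are present."""
    total = sum(abundances)
    proportions = [n / total for n in abundances if n > 0]
    return -sum(p * math.log(p) for p in proportions)


# Abundances of Species 1, 2, and 3 from the example above.
community_a = [100, 100, 100]
community_b = [290, 9, 1]

for name, community in [("A", community_a), ("B", community_b)]:
    print(name, observed_richness(community), round(shannon_diversity(community), 3))
```

Both communities have a richness of 3, but the Shannon index is much higher for the evenly distributed Community A (ln 3, about 1.10) than for Community B (about 0.16).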

For more information on alpha diversity, check out the following resources:

Measurement of Biodiversity, from the World Register of Marine Species.
Introduction to Alpha Diversity from Dr. Dan Knights, University of Minnesota.
For a lot more information, the book Biological Diversity: Frontiers in Measurement and Assessment is a fantastic resource.

In addition to inspecting alpha diversity per sample, you can also view the alpha diversity summarized by one of the characteristics of your samples. To do so, please choose a characteristic from the dropdown list on the left.

Statistical tests of differences in diversity

A frequent question in ecology is whether different communities (or different types of communities) are statistically different from one another in terms of their diversity. For example, do soils from grasslands, forests, and shrubby communities all have equally diverse bacterial communities?

One basic way of asking this question is to perform an ANOVA test between the groups we are interested in. In its simplest form, an ANOVA asks whether the difference in diversity between groups is statistically significant. A statistically significant difference would be indicated by a very small P-value in the table below. It’s important to keep in mind that two communities can have entirely different species assemblages and yet have the same diversity (e.g. Community A holds Species 1, 2, and 3; Community B holds Species 4, 5, and 6. Each community has a species richness of 3, and thus these communities are not “different from” each other in this context).

Notes:

There are many assumptions of an ANOVA, and the dataset you are investigating may break some (or even many) of these assumptions. We encourage you not to over-interpret the results below.
ANOVAs only work when each group is represented by at least a few samples. In other words, if your data set only has one sample from a grassland, one from a forest, and two from a shrubland, it is impossible to statistically tease apart the differences among these groups. You will see NAs in the table below if this is the case.
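The quantity at the heart of a one-way ANOVA, the F statistic, can be sketched as follows. This is an illustration only, not the app's implementation: the richness values are invented, `one_way_anova_f` is a hypothetical helper, and converting F into a P-value (which requires the F distribution) is left out.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a {group: [values]} dict.

    F is None when there are no within-group degrees of freedom or no
    within-group variation, i.e. when the ratio cannot be computed.
    """
    values = [v for group in groups.values() for v in group]
    n_total, n_groups = len(values), len(groups)
    grand_mean = sum(values) / n_total

    ss_between = 0.0  # variation of group means around the grand mean
    ss_within = 0.0   # variation of samples around their own group mean
    for group in groups.values():
        group_mean = sum(group) / len(group)
        ss_between += len(group) * (group_mean - grand_mean) ** 2
        ss_within += sum((v - group_mean) ** 2 for v in group)

    df_between, df_within = n_groups - 1, n_total - n_groups
    if df_within == 0 or ss_within == 0:
        return None, df_between, df_within
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical observed richness for three samples per habitat.
richness = {
    "grassland": [12, 15, 14],
    "forest": [22, 25, 24],
    "shrubland": [17, 16, 18],
}
print(one_way_anova_f(richness))

# With a single sample per group there is nothing to estimate
# within-group variation from, so F is undefined.
print(one_way_anova_f({"grassland": [12], "forest": [22], "shrubland": [17]}))
```

A large F means the group means differ by much more than samples differ within a group; the second call shows why too few samples per group leaves the test without an answer.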

These are the results from an ANOVA comparing diversity among groups in the variable you chose:

Post-hoc tests

The ANOVA table above can only reveal whether there is a difference in diversity among sites, so even if we find that diversity varies among sites, there is no way of knowing which sites are different from each other. In other words, we might know from the ANOVA above that Sites A, B, and C are not all equally diverse, but we don’t know whether diversity in Site A is different from that in Site B, or whether Site A differs only from Site C, and so on.

One way to explore this further is via a post-hoc Tukey test, which compares each group against each other and adjusts for running multiple comparisons. Note that the following results are meaningful only if the ANOVA table above suggests that sites are in fact unequal in terms of their diversity. The results from the Tukey test are presented below:

Alpha Diversity Tukey Tests

Background on Beta Diversity

A second way to consider the diversity among samples is to investigate the extent to which samples differ in their species composition. For example, consider a pair of sites, each of which contains the same ten species: both are equally diverse (Observed richness of 10) and have identical species lists. Now consider a second pair of sites, each of which contains ten species, but in this case the ten species in the first site are entirely different from the ten species in the second. In other words, the two sites in the second pair are equally diverse, but quite dissimilar in terms of their species composition. Quantifying differences in species composition between sites is at the crux of beta diversity.

Beta diversity is a complex topic and there are many ways to measure it. In fact, measuring and interpreting beta diversity is still an unresolved question in community ecology (e.g. Anderson et al. 2011), and new metrics for quantifying beta diversity continue to be developed (e.g. Ricotta 2017).

In this app we consider some simple ways of exploring beta diversity. All of these methods can be calculated using one of two measures of dissimilarity:

The Jaccard index, which incorporates differences in species presence or absence between sites, but not differences in species abundance, or
The Bray-Curtis index, which integrates information about species abundance.

We recommend using the Jaccard index, as eDNA-based abundance data might not always be reliable, but encourage you to explore both indices. Do you get the same results using both? If not, what might be driving the disparity?
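The difference between the two indices can be seen in a small sketch. It uses the standard textbook definitions (Jaccard on presence/absence, Bray-Curtis on abundances); the site names, taxa, and read counts are hypothetical, and the app's own calculations may differ in detail.

```python
def jaccard_dissimilarity(a, b):
    """1 - (shared taxa / all taxa), using presence or absence only."""
    present_a = {taxon for taxon, n in a.items() if n > 0}
    present_b = {taxon for taxon, n in b.items() if n > 0}
    union = present_a | present_b
    if not union:
        return 0.0
    return 1 - len(present_a & present_b) / len(union)


def bray_curtis_dissimilarity(a, b):
    """Sum of absolute abundance differences divided by total abundance."""
    taxa = set(a) | set(b)
    total = sum(a.get(t, 0) + b.get(t, 0) for t in taxa)
    if total == 0:
        return 0.0
    return sum(abs(a.get(t, 0) - b.get(t, 0)) for t in taxa) / total


# Hypothetical read counts per taxon at three sites.
site_1 = {"taxon_a": 50, "taxon_b": 30, "taxon_c": 20}
site_2 = {"taxon_a": 5, "taxon_b": 3, "taxon_c": 92}   # same taxa, different abundances
site_3 = {"taxon_c": 10, "taxon_d": 40, "taxon_e": 50}  # mostly different taxa

for label, other in [("site_1 vs site_2", site_2), ("site_1 vs site_3", site_3)]:
    print(label,
          round(jaccard_dissimilarity(site_1, other), 2),
          round(bray_curtis_dissimilarity(site_1, other), 2))
```

Sites 1 and 2 share every taxon, so their Jaccard dissimilarity is 0, yet their Bray-Curtis dissimilarity is high because the abundances differ sharply. This is one way the two indices can lead to different conclusions.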

In the PCoA plot below, samples are plotted such that points that are near each other on the plot are more similar in their taxonomic composition; samples that are distant in this plot have very different species lists.

PCoA plot

Clustering samples according to taxonomy

Another way to visualize similarity between sites is via a cluster analysis, which groups sites together according to their taxonomic composition. One way to do this is with Ward’s Hierarchical Clustering method, which we implement below. Sites with more similar taxonomic composition will cluster together in this analysis. Please note that this figure does not change according to the selected variable.

Beta diversity analyses

The previous tab lets us visually inspect whether sites are similar or dissimilar, but it would be useful to verify the similarity or dissimilarity between groups of samples using statistical tests. One way to do this is with a multivariate version of an ANOVA. This test lets us ask whether samples from within a single habitat (or within any other group of interest) have more similar compositions than samples from different habitat types. Here, we use a nonparametric version of a multivariate ANOVA test called a PERMANOVA.

We follow up this test with a subsequent test that compares the dissimilarity between particular factors; this is analogous to the “Post-hoc Tukey test” from the ANOVA page.
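The logic of a PERMANOVA can be sketched as follows: compute a pseudo-F statistic from a matrix of pairwise dissimilarities, then shuffle the group labels many times to see how often chance alone produces a value at least as large. This is a minimal illustration of the general technique, not the app's code; the distance matrix is made up, and `pseudo_f` and `permanova` are hypothetical helper names.

```python
import random
from itertools import combinations


def pseudo_f(dist, labels):
    """Pseudo-F from a square dissimilarity matrix and one label per sample."""
    n = len(labels)
    groups = sorted(set(labels))
    # Total sum of squares from all pairwise squared dissimilarities.
    ss_total = sum(dist[i][j] ** 2 for i, j in combinations(range(n), 2)) / n
    # Within-group sum of squares, computed group by group.
    ss_within = 0.0
    for group in groups:
        members = [i for i, label in enumerate(labels) if label == group]
        ss_within += sum(dist[i][j] ** 2 for i, j in combinations(members, 2)) / len(members)
    ss_among = ss_total - ss_within
    if ss_within == 0:
        return float("inf")
    return (ss_among / (len(groups) - 1)) / (ss_within / (n - len(groups)))


def permanova(dist, labels, permutations=999, seed=1):
    """Return (observed pseudo-F, permutation P-value)."""
    rng = random.Random(seed)
    observed = pseudo_f(dist, labels)
    shuffled = list(labels)
    at_least_as_large = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if pseudo_f(dist, shuffled) >= observed:
            at_least_as_large += 1
    return observed, (at_least_as_large + 1) / (permutations + 1)


# Hypothetical dissimilarities: 0.2 within a habitat, 0.8 between habitats.
labels = ["forest"] * 3 + ["grassland"] * 3
dist = [[0.0 if i == j else (0.2 if labels[i] == labels[j] else 0.8)
         for j in range(6)] for i in range(6)]

f_obs, p_value = permanova(dist, labels)
print(round(f_obs, 2), round(p_value, 3))
```

With only three samples per group there are very few distinct ways to reassign the labels, so the P-value cannot get much smaller than about 0.1 even though the groups are clearly separated. This is another reason to be cautious with small sample sizes.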

Multivariate ANOVA table

Multivariate ANOVA - Pairwise comparisons

Beta Diversity via Homogeneity of Variances

Beta diversity can also be thought of as the difference in heterogeneity within habitats (or other groups of samples). For example, consider a scenario in which ten samples are collected from Habitat 1. Each of the ten samples contains 50 species, but only 25 of those are shared across the ten samples. Ten samples are also collected from Habitat 2, and they too contain 50 species. But in this second case, 45 of the 50 species show up in all ten samples. Habitat 1 has greater variance in species composition than Habitat 2. To assess this formally, we can test whether the variance in species composition differs between two or more groups of samples. More details on this method are available in Anderson et al. 2006.

Multivariate homogeneity of groups dispersions

Multivariate homogeneity of groups dispersions - Post-hoc Tukey

Download the taxonomy tables for analyses with other pipelines

If you are interested in using other tools to explore and analyze your data, we encourage you to check out QIIME2 and the R package phyloseq.

You can download your taxonomy-by-site matrix as a file that is easily converted into a BIOM file for analyses in QIIME2 by clicking on the button below:

Download BIOM-formatted taxonomy table

You can convert this file into a BIOM file using the following command in the terminal (assuming you have the biom software installed; see here for installation instructions):

biom convert --to-hdf5 --table-type="OTU table" -i taxonomy-for-biom.txt -o taxonomy-as-biom.biom

Once converted, the BIOM file can be imported into QIIME2 using the steps outlined at this page.

Download taxonomy table for downstream analysis as a phyloseq object

You can also download your taxonomy table as a phyloseq object for downstream analyses in R. You can import this phyloseq object into R using the following command:

phyloseq_object <- readRDS("phyloseq-object.Rds")

Download phyloseq object