From metabarcoding to metaphylogeography: separating the wheat from the chaff

Turon, Xavier; Antich, Adrià; Palacín, Cruz; Præbel, Kim; Wangensteen, Owen S. Ecological Applications, doi:10.1002/eap.2036 (2019). DIGITAL CSIC

Metabarcoding is by now a well‐established method for biodiversity assessment in terrestrial, freshwater, and marine environments. Metabarcoding data sets are usually used for α‐ and β‐diversity estimates, that is, interspecies (or inter‐MOTU) patterns. However, the use of hypervariable metabarcoding markers may provide an enormous amount of intraspecies (intra‐MOTU) information, mostly untapped so far. The use of cytochrome c oxidase subunit I (COI) amplicons is gaining momentum in metabarcoding studies targeting eukaryote richness, and COI has long been the marker of choice in population genetics and phylogeographic studies. Therefore, COI metabarcoding data sets may be used to study intraspecies patterns and phylogeographic features for hundreds of species simultaneously, opening a new field that we propose to call metaphylogeography. The main challenge for the implementation of this approach is the separation of erroneous sequences from true intra‐MOTU variation. Here, we develop a cleaning protocol based on changes in entropy of the different codon positions of the COI sequence, together with co‐occurrence patterns of sequences. Using a data set of community DNA from several benthic littoral communities in the Mediterranean and Atlantic seas, we first tested, by simulation on a subset of sequences, a two‐step cleaning approach consisting of a denoising step followed by minimal abundance filtering. The procedure was then applied to the whole data set. We obtained a total of 563 MOTUs that were usable for phylogeographic inference. We used semiquantitative rank data instead of read abundances to perform AMOVAs and build haplotype networks. Genetic variability was mainly concentrated within samples, but with an important between‐seas component as well. There were inter‐group differences in the amount of variability between and within communities in each sea.
For two species the results could be compared with traditional Sanger sequence data available for the same zones, giving similar patterns. Our study shows that metabarcoding data can be used to infer intra‐ and interpopulation genetic variability of many species at a time, providing a new method with great potential for basic biogeography, connectivity and dispersal studies, and for the more applied fields of conservation genetics, invasion genetics, and design of protected areas.