Universal animal DNA barcodes and mini-barcodes

For 16S, COI and cyt-b the paper used two targets, a long barcode and a shorter mini-barcode. The same names have been used in the run.sh script provided, the output of which is referred to below.

16S - long marker

The 16S primer set output is disappointing at the default abundance threshold, with only a single unique sequence observed. I suspect the long product size is part of the issue: it is probably at the upper limit for overlapping MiSeq read pairs.

$ grep -v "^#" summary/16S.tally.tsv | cut -f 1,179
16S/1f2b15d58f9f40b862486676809d4744_20189  CACCTCCAGCATTCCCAGTATTGGAGGCATTGCCTGCCCAGTGACAACTGTTTAACGGCCGCGGTATCCTGACCGTGCAAAGGTAGCATAATCATTTGTTCTCTAAATAAGGACTTGTATGAATGGCCGCACGAGGGTTTTACTGTCTCTTACTTCCAATCAGTGAAATTGACCTTCCCGTGAAGAGGCGGGAATGCACAAATAAGACGAGAAGACCCTATGGAGCTTTAACTAACCAACCCAAAGAGAATAGATTTAACCATTAAGGAATAACAACAATCTCCATGAGTTGGTAGTTTCGGTTGGGGTGACCTCGGAGAATAAAAAATCCTCCGAGCGATTTTAAAGACTAGACCCACAAGTCAAATCACTCTATCGCTCATTGATCCAAAAACTTGATCAACGGAACAAGTTACCCTAGGGATAACAGCGCAATCCTATTCAAGAGTCCATATCGACAATAGGGTTTACGACCTCGATGTTGGATCAGGACATCCTGATGGTGCAACCGCTATCAAAGGTTCGTTTGTTCAACGATTAAAGTCCT

This perfectly matches Bos taurus and was found in most but not all of the expected samples, so perhaps the default abundance threshold is too high?
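
To check how widely a sequence was detected, the per-sample counts in the tally file can be summarised. The following is only a sketch. It assumes the layout implied by the cut command above: column 1 is the sequence identifier, the last column is the sequence itself, and every column in between is a per-sample read count. The function name samples_per_sequence is hypothetical, not part of THAPBI PICT.

```shell
# Hypothetical helper: for each sequence row in a tally TSV, report the
# identifier, the number of samples with a non-zero count, and the total
# number of sample columns.
# Assumes column 1 is the ID, the last column is the sequence, and all
# columns in between are per-sample counts.
samples_per_sequence() {
    grep -v "^#" "$1" \
        | awk -F '\t' '{
            hits = 0
            # Skip column 1 (ID) and column NF (sequence)
            for (i = 2; i < NF; i++) if (($i + 0) > 0) hits++
            print $1 "\t" hits "\t" (NF - 2)
        }'
}
```

For example, samples_per_sequence summary/16S.tally.tsv would print one line per unique sequence, which could then be compared against the number of samples expected to contain that species.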

Mini-16S - short marker

The output from the Mini-16S marker is far more diverse, with 84 unique sequences:

$ grep -c -v "^#" summary/Mini-16S.tally.tsv
84
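
With this many rows it helps to rank the sequences before looking at matches. The sketch below assumes that the number after the final underscore in the first column (such as the 20189 in the 16S identifier shown earlier) is the total read count for that sequence, and that the last column holds the sequence. The function name rank_by_abundance is hypothetical.

```shell
# Hypothetical helper: list tally rows as "count<TAB>length<TAB>ID",
# most abundant first.
# Assumes the ID in column 1 ends in _<total count> and that the last
# column is the sequence.
rank_by_abundance() {
    grep -v "^#" "$1" \
        | awk -F '\t' '{
            # Take the text after the final underscore as the count
            n = split($1, parts, "_")
            print parts[n] "\t" length($NF) "\t" $1
        }' \
        | sort -k1,1nr
}
```

Piping the result of rank_by_abundance summary/Mini-16S.tally.tsv through head would show the dominant sequences first, leaving the long tail of low-abundance entries for later inspection.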

The most common is again a perfect match to Bos taurus, which this time has no false negatives (but two false positives?).

We have all the expected Sus scrofa matches, and some of the Gallus gallus and Anguilla anguilla matches expected in six samples. Crocodylus niloticus is also found, but at far lower levels than expected.

We do see Homo sapiens, but happily only in the traditional medicine samples (multiple replicates within S3 and S8). Within those samples, the laboratory 16 replicates S3_Lab_16 and S8_Lab_16 also had Rattus tanezumi and Rattus norvegicus, respectively.

Overall, again perhaps the default abundance threshold is too high?

COI - long marker

Assuming I understood the paper correctly, this used a pool of four left primers and four right primers. That is not easily handled with THAPBI PICT at the time of writing.

Mini-COI - short marker

The output from the Mini-COI marker is quite diverse, with 22 unique sequences:

$ grep -c -v "^#" summary/Mini-COI.tally.tsv
22

The species matches are all reasonable. It detects all the Pieris brassicae, most of the Bos taurus, Pleuronectes platessa, Sus scrofa, and many of the Huso dauricus and Gallus gallus.

We have unexpected Acipenser schrenckii, which the paper also found and attributed to sample preparation.

There are also plenty of unclassified sequences from the traditional medicine samples. Based on an NCBI BLAST search, many are likely from undescribed fungi.

cyt-b - long marker

This gave no sequences at the default abundance threshold, nor at 50. Dropping to 10, we get a modest number of hits, but the only perfect match was unfortunately to plants in the Asteraceae family.

Mini-cyt-b - short marker

The output from the Mini-cyt-b marker had only 17 unique sequences:

$ grep -c -v "^#" summary/Mini-cyt-b.tally.tsv
17

This found all the expected Sus scrofa and Meleagris gallopavo, most of the Bos taurus, Crocodylus niloticus and Huso dauricus, and some of the Anguilla anguilla.

As above, we have the explained false matches for Acipenser schrenckii, and again Homo sapiens in the traditional medicine samples, but also in EM_8.