Unexpected sequences
In the previous section, we highlighted several unexpected contaminants in the
negative controls which could not be explained as cross-contamination from the
mock community. Likewise the read reports show plenty of unassigned sequences,
things which did not match the very narrow databases built from ITS1.fasta
or ITS2.fasta
containing markers expected from the mock community only.
Some unexpected sequences might reflect additional alternative copies of ITS1 or ITS2 in the genomes. Others are likely external contamination - after all there are fungi practically everywhere. This seems to have happened on amplicon library one in the high PCR cycle negative control at least. Meanwhile, amplicon library two does not have any obvious external contamination.
Amplicon library one - ITS1 (BITS/B58S3)
From the first amplicon library for ITS1 we saw the following sequences in the negative controls (and by chance, not in any mock community samples) - shown here with their highest single sample abundance, which supports using a minimum abundance threshold higher than 10:
MD5 checksum |
Max |
Species |
|
32 |
Wallemia muriae |
|
10 |
Wallemia muriae; Wallemia sebi |
|
17 |
Trichosporon asahii |
|
14 |
Candida tropicalis |
Here are the reads from entries with a maximum sample abundance over 75
which the onebp
and in some cases blast
based classifier failed to
match, along with the most likely match from reviewing an online NCBI BLAST
search. You can easily extract these entries (and their sequences) from the
bottom of the summary/AL1_BITS_B58S3.reads.*.tsv
files:
MD5 checksum |
Max |
Species |
|
4562 |
Epicoccum nigrum; Epicoccum layuense |
|
575 |
glomeromycetes, perhaps Rhizophagus |
|
236 |
Ascochyta or Neoascochyta? |
|
149 |
Fusarium |
|
114 |
glomeromycetes, perhaps Rhizophagus |
|
109 |
Fusarium |
The sequence with the top abundance, 5ca0acd7dd9d76fdd32c61c13ca5c881
,
perfectly matches fungus Epicoccum nigrum and Epicoccum layuense. Present
at low levels in multiple samples, this was the dominant sequence in
SRR5314339
aka FMockE.HC1_S178
, which was a high PCR cycle number
replicate of the even mixture. Perhaps this was a stray fragment of
Epicoccum which by chance was amplified early in the PCR? This example was
not highlighted in the original paper, but is exactly the kind of thing you
should worry about with a high PCR cycle number.
Next ee5382b80607f0f052a3ad3c4e87d0ce
and the less abundant sequence
cae29429b90fc6539c440a140494aa25
looks like glomeromycetes, perhaps
Rhizophagus (from the mock community), but could be from a Glomus species.
Using the blast
classifier and the minimal curated reference set matches
this to Rhizophagus irregularis, but the situation would be ambiguous in a
more complete database.
Sequence 880007c5a18be69c3f444efd144fc450
has perfect matches to lots of
unclassified fungi, and conflicting perfect matches including Ascochyta or
Neoascochyta. This was seen only in the high PCR cycle number sample
SRR5314339
as above.
Next 8e74f38b058222c58943fc6211d277fe
and
85775735614d45d056ce5f1b67f8d2b2
have good BLAST matches to several
different Fusarium species, so could also be from the mock community.
You can find all six of these sequence on the edit-graph, most as isolated grey
nodes along the bottom except cae29429b90fc6539c440a140494aa25
which is 3bp
away from Rhizophagus irregularis and linked to it with a dashed line.
So some of the ITS1 sequences in amplicon library one are likely external contamination - particularly with the high PCR cycle negative control (which was likely included exactly because of this risk).
Amplicon library two - ITS1 (ITS1f/ITS2)
Using our blast
classifier with the 19 species database, everything was
assigned a match. The default onebp
classifier was stricter. For example
while the very common f1b689ef7d0db7b0d303e9c9206ee5ad
(which with the
BITS/B58S3 primers gave bb28f2b57f8fddefe6e7b5d01eca8aea
) was matched to
Fusarium oxysporum, all the variations of this were too far away from the
database entries for a match.
These primers amplified a larger fragment to that in amplicon library one.
Focusing on those with a sample-abundance over 75 (as in the edit-graphs)
which the onebp
classifier did not match to the curated reference set:
Long sequence MD5 (ITS1f/ITS2). |
Max |
Species |
|
640 |
glomeromycetes, perhaps Rhizophagus |
|
315 |
glomeromycetes, perhaps Rhizophagus |
|
154 |
glomeromycetes, perhaps Rhizophagus |
|
132 |
Fusarium |
|
96 |
Fusarium |
|
93 |
Fusarium |
|
90 |
Fusarium |
|
88 |
Fusarium |
|
86 |
Fusarium |
|
82 |
Fusarium |
|
81 |
Fusarium |
You can find all of these sequence on the edit-graph, most of those labelled as
likely Fusarium are a 1bp edit away from large grey node f1b689
top left
(except 610caedb1a5699836310fce9dbb9c5fa
which is an isolated node placed
bottom middle). Those labelled glomeromycetes are in the middle near, and in
once case connected to, a dark red Rhizophagus irregularis node.
i.e. None of the ITS1 sequences in amplicon library two are clear cut external contamination.
Amplicon library two - ITS2
Finally, amplicon library two using the ITS3-KYO and ITS4-KYO3 primers for
ITS2. Again, the blast
based classifier matched everything to an entry in
the mock community database. The stricter onebp
classifier assigned most
reads. Here are those few it failed to match with a maximum read abundance
over 75:
MD5 checksum |
Max |
Species |
|
211 |
Fusarium |
|
133 |
Fusarium |
|
78 |
Aspergillus |
These three all appears on the edit-graph separated from a red node (database entry) by a dashed or dotted line indicating a 2bp or 3bp edit away.
Using an online NCBI BLAST search didn’t pin any of these down to species level, but they do all seem to be fungi. Again, quite a few Fusarium matches which could be alternative ITS2 sequences in the genomes but not in the curated reference set. Likewise the Aspergillus like sequence could be from the Aspergillus flavus in the mock community.
i.e. None of the ITS2 sequences in amplicon library two are clear cut external contamination.