Pipeline with defaults

Running thapbi-pict pipeline

First, we will run the THAPBI PICT pipeline command with largely default settings (including the default database and primers), other than including the metadata about the water samples. Note that this dataset has no blanks or negative controls, so we must trust the default minimum abundance threshold.

The key values which we will be changing later are the primers and database.

Assuming you have the FASTQ files in raw_data/, run the pipeline command as follows, and you should get the listed output report files:

$ mkdir -p intermediate_defaults/ summary/
$ thapbi_pict pipeline \
  -i raw_data/ -o summary/recycled-water-defaults \
  -s intermediate_defaults/ \
  -t metadata.tsv -x 7 -c 1,2,3,4,5,6
...
onebp classifier assigned species/genus to 431 of 794 unique sequences from 1 files
Wrote summary/recycled-water-defaults.ITS1.samples.onebp.*
Wrote summary/recycled-water-defaults.ITS1.reads.onebp.*
...
$ ls -1 summary/recycled-water-defaults.*
summary/recycled-water-defaults.ITS1.onebp.tsv
summary/recycled-water-defaults.ITS1.reads.onebp.tsv
summary/recycled-water-defaults.ITS1.reads.onebp.xlsx
summary/recycled-water-defaults.ITS1.samples.onebp.tsv
summary/recycled-water-defaults.ITS1.samples.onebp.xlsx
summary/recycled-water-defaults.ITS1.tally.tsv

Here we used -o to specify the stem for the report filenames (summary/recycled-water-defaults). The sample metadata options were described earlier. This is perhaps an idealised example, in that metadata.tsv was created so that -c 1,2,3,4,5,6 adds its first six columns to the table (sorted in that order), while -x 7 means the samples are indexed by the accession (filename prefix) in column seven.
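Before relying on the column numbers passed to -c and -x, it can help to number the fields of the metadata header. The following is a minimal sketch, assuming metadata.tsv is tab-separated with its column names on the first line; show_columns is a hypothetical helper name, not part of THAPBI PICT.

```shell
# Hypothetical helper (not part of THAPBI PICT): print each header field of a
# tab-separated file prefixed with its 1-based column number.
# Assumes the column names are on the first line of the file.
show_columns() {
    head -n 1 "$1" | tr '\t' '\n' | awk '{ print NR "\t" $0 }'
}

# Usage, assuming metadata.tsv is in the current directory:
# show_columns metadata.tsv
```

If the assumptions hold, the line numbered 7 should be the accession column, and lines 1 to 6 the columns added to the reports.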

Notice the output reported a taxonomic assignment for 431 of 794 unique sequences. That is 54%, but the proportion is considerably higher if we count reads rather than unique sequences.
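The percentage quoted above can be reproduced at the command line. This is only a sketch of the arithmetic using the two counts from the text; the variable names are illustrative.

```shell
# Counts taken from the text above: unique sequences with a taxonomic
# assignment, and the total number of unique sequences.
classified=431
total=794

# awk does the floating point division and rounds to a whole percentage.
percent=$(awk -v c="$classified" -v t="$total" 'BEGIN { printf "%.0f", 100 * c / t }')
echo "${percent}% of unique sequences classified"
```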

Results

We will compare and contrast the following four samples with the second run using different primers and a custom database. These were deliberately picked from the less diverse samples for clarity.

Here we pick out the four samples at the command line with grep. You can also look at the recycled-water-defaults.ITS1.samples.onebp.xlsx file in Excel:

$ cut -f 6,7,8 summary/recycled-water-defaults.ITS1.samples.onebp.tsv \
  | grep -E "(SRR6303586|SRR6303588|SRR6303596|SRR6303948)"
OSU482       SRR6303588  Phytophthora chlamydospora, Phytophthora x stagnum(*), Unknown
OSU483       SRR6303586  Phytophthora chlamydospora, Phytophthora x stagnum(*)
OSU536.s203  SRR6303948  Phytophthora ramorum
OSU121       SRR6303596  Phytopythium (unknown species)

Three of these four have Phytophthora (one of them also with an unknown), while the fourth has Phytopythium. However, this is discarding all the reads which do not match the default Phytophthora-centric primers.
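The grep filter used above matches an accession anywhere in the line. As an alternative, the selection can be restricted to an exact match on a single column with awk. This is a minimal sketch, assuming the accession is in column seven of the sample report (as the cut output above suggests); pick_samples is a hypothetical helper name.

```shell
# Hypothetical helper: print columns 6, 7 and 8 of a tab-separated report for
# rows whose seventh column exactly equals one of the given accessions.
# Assumes column 7 holds the accession, as in the cut/grep example above.
pick_samples() {
    tsv="$1"
    shift
    awk -F '\t' -v wanted="$*" '
        BEGIN {
            # Build a lookup table from the space-separated accession list.
            n = split(wanted, list, " ")
            for (i = 1; i <= n; i++) keep[list[i]] = 1
        }
        ($7 in keep) { print $6 "\t" $7 "\t" $8 }
    ' "$tsv"
}

# Usage, assuming the report from the run above exists:
# pick_samples summary/recycled-water-defaults.ITS1.samples.onebp.tsv \
#     SRR6303586 SRR6303588 SRR6303596 SRR6303948
```

Rows are printed in file order, and the header line is skipped because its seventh field is not one of the requested accessions.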