Command Line

THAPBI PICT is a command line tool, meaning you must open your command line terminal window and key in instructions to use the tool. The documentation examples use the $ (dollar sign) to indicate the prompt, followed by text to be entered. For example, this should run the tool with no instructions:

$ thapbi_pict
...

Rather than literally printing dot dot dot, the tool should print out some terse help, listing various sub-command names, and an example of how to get more help.

For example, -v (minus sign, lower case letter v) or --version (minus, minus, version in lower case) can be added to find out the version of the tool installed:

$ thapbi_pict -v
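
In a script, it can help to confirm that the tool is installed before asking for its version. The following is a minimal sketch only: have_tool is a hypothetical helper name (not part of THAPBI PICT), built on the standard shell "command -v" lookup.

```shell
# have_tool is a hypothetical helper, not part of THAPBI PICT.
# It succeeds only if the named command can be found on the PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

if have_tool thapbi_pict; then
    # Long form of the -v option.
    thapbi_pict --version
else
    echo "thapbi_pict was not found on the PATH" >&2
fi
```

If the tool is missing, the message goes to standard error so that it does not get mixed into any captured output.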

THAPBI PICT follows the sub-command style popularised in bioinformatics by samtools (also used in the version control tool git). This means most of the instructions take the form thapbi_pict sub-command ..., where the dots indicate some additional options.

The main sub-commands are to do with classifying sequence files and reporting the results, and these are described in the first worked example:

  • prepare-reads - turn paired FASTQ input files for each sample into de-duplicated FASTA files

  • fasta-nr and sample-tally - pool intermediate files for analysis

  • classify - produce genus/species level predictions as tab-separated values (TSV) files

  • summary - summarise a set of predictions by sample (with a human-readable report), and by unique sequence and sample (both with Excel reports)

  • edit-graph - draw the unique sequences as nodes on a graph, connected by edit-distance

  • assess - compare classifier output to known positive controls

  • pipeline - run all of the above in sequence

There are further sub-commands to do with making or inspecting an SQLite3 format barcode marker sequence database, most of which are covered in the second worked example, with a custom database:

  • dump - export a DB as TSV or FASTA format

  • load-tax - import a copy of the NCBI taxonomy

  • import - import a FASTA file, e.g. using the NCBI style naming

  • conflicts - report on genus or species level conflicts in the database

And some other miscellaneous commands:

  • ena-submit - write a TSV table of your paired FASTQ files for use with the ENA interactive submission system

Start by reading the help for any command, using -h or --help as follows:

$ thapbi_pict pipeline -h
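
To read the help for several sub-commands in one go, the same -h option can be used in a loop. This is a rough sketch: show_help_for is a hypothetical helper name, and the sub-command names in the usage comment are taken from the lists above.

```shell
# show_help_for is a hypothetical helper, not part of THAPBI PICT.
# First argument: the tool to run; remaining arguments: sub-command names.
show_help_for() {
    tool=$1
    shift
    for sub in "$@"; do
        # Runs e.g. "thapbi_pict classify -h"; stop at the first failure.
        "$tool" "$sub" -h || return 1
    done
}

# Intended usage (requires THAPBI PICT to be installed):
#   show_help_for thapbi_pict classify summary pipeline
```

Passing the tool name as an argument keeps the helper generic, so it can be tried out with any command that accepts a sub-command followed by -h.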

Most of the commands have required arguments. If you omit one, the tool will stop with an error:

$ thapbi_pict pipeline
thapbi_pict pipeline: error: the following arguments are required: -i/--input, -o/--output
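
When calling the tool from a script, an error like this can be caught by checking the exit status. The sketch below assumes the usual convention that a failed command returns a non-zero exit status; run_step is a hypothetical helper name, and the raw_data/ and results/ paths in the usage comment are placeholders rather than real defaults.

```shell
# run_step is a hypothetical helper, not part of THAPBI PICT.
# It runs the given command and reports whether it succeeded.
run_step() {
    if "$@"; then
        echo "OK: $*"
    else
        status=$?  # exit status of the command that just failed
        echo "FAILED (exit status $status): $*" >&2
        return "$status"
    fi
}

# Intended usage (paths are placeholders; other options may be needed):
#   run_step thapbi_pict pipeline -i raw_data/ -o results/ || exit 1
```

The failed command's own error message, such as the one shown above, is still printed, and the helper adds a line recording which step stopped.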