Command Line Interface

`haystac config`

Optional arguments:

-h, --help          Show this help message and exit
--cache <path>      Cache folder for storing genomes downloaded from NCBI
                    and other shared data (default:
                    /home/antony/haystac/cache)
--clear-cache       Clear the contents of the cache folder, and delete the
                    folder itself (default: False)
--api-key <code>    Personal NCBI API key (increases max concurrent requests
                    from 3 to 10,
                    https://www.ncbi.nlm.nih.gov/account/register/)
--use-conda <bool>  Use conda as a package manger (default: False)

`haystac database`

Required arguments:

--mode <mode>         Database creation mode for haystac [fetch, index,
                      build]
--output <path>       Path to the database output directory

Required choice:

--query <query>       Database query in the NCBI query language. Please
                      refer to the documentation for assistance with
                      constructing a valid query.
--query-file <path>   File containing a database query in the NCBI query
                      language.
--accessions-file <path>
                      Tab delimited file containing one record per row: the
                      name of the taxon, and a valid NCBI accession code
                      from the nucleotide, assembly or WGS databases.
--sequences-file <path>
                      Tab delimited file containing one record per row: the
                      name of the taxon, a user defined accession code, and
                      the path to the fasta file (optionally compressed).
--refseq-rep <table>  Use one of the RefSeq curated tables to construct a
                      DB. Includes all prokaryotic species (excluding
                      strains) from the representative RefSeq DB, or all the
                      species and strains from the viruses DB, or all the
                      species and subspecies from the eukaryotes DB. If
                      multiple accessions exist for a given species/strain,
                      the first pair of species/accession is kept. Available
                      RefSeq tables to use [prokaryote_rep, viruses,
                      eukaryotes].

Optional arguments:

--force-accessions    Disable validation checks for 'anomalous' assembly
                      flags in NCBI (default: False)
--exclude-accessions <accession> [<accession> ...]
                      List of NCBI accessions to exclude. (default: [])
--resolve-accessions  Pick the first accession when two accessions for a
                      taxon can be found in user provided input files
                      (default: False)
--bowtie2-scaling <float>
                      Rescaling factor to keep the bowtie2 mutlifasta index
                      below the maximum memory limit (default: 25.0)
--rank <rank>         Taxonomic rank to perform the identifications on
                      [genus, species, subspecies, serotype] (default:
                      species)
--genera <genus> [<genus> ...]
                      List of genera to restrict the abundance calculations.
--mtDNA               For eukaryotes, download mitochondrial genomes only.
                      Not to be used with --refseq-rep or queries containing
                      prokaryotes (default: False)
--seed <int>          Random seed for database indexing

Common arguments:

-h, --help            Show this help message and exit
--cores <int>         Maximum number of CPU cores to use (default: MAX_CPUs)
--mem <int>           Maximum memory (MB) to use (default: MAX_MEM)
--unlock              Unlock the output directory following a crash or hard
                      restart (default: False)
--debug               Enable debugging mode (default: False)
--snakemake '<json>'  Pass additional flags to the `snakemake` scheduler..

`haystac sample`

Required arguments:

--output <path>       Path to the sample output directory

Required choice:

--fastq <path>        Single-end fastq input file (optionally compressed).
--fastq-r1 <path>     Paired-end forward strand (R1) fastq input file.
--fastq-r2 <path>     Paired-end reverse strand (R2) fastq input file.
--sra <accession>     Download fastq input from the SRA database

Optional arguments:

--collapse <bool>     Collapse overlapping paired-end reads, e.g. for aDNA
                      (default: False)
--trim-adapters <bool>
                      Automatically trim sequencing adapters from fastq
                      input (default: True)

Common arguments:

-h, --help            Show this help message and exit
--cores <int>         Maximum number of CPU cores to use (default: MAX_CPUs)
--mem <int>           Maximum memory (MB) to use (default: MAX_MEM)
--unlock              Unlock the output directory following a crash or hard
                      restart (default: False)
--debug               Enable debugging mode (default: False)
--snakemake '<json>'  Pass additional flags to the ``snakemake`` scheduler.

`haystac analyse`

Required arguments:

--mode <mode>         Analysis mode for the selected sample [filter, align,
                      likelihoods, probabilities, abundances, reads,
                      mapdamage]
--database <path>     Path to the database output directory
--sample <path>       Path to the sample output directory
--output <path>       Path to the analysis output directory

Optional arguments:

--genera <genus> [<genus> ...]
                      List of genera to restrict the abundance calculations
                      (default: [])
--min-prob <float>    Minimum posterior probability to assign an aligned
                      read to a given species (default: 0.75)
--mismatch-probability <float>
                      Base mismatch probability (default: 0.05)

Common arguments:

-h, --help            Show this help message and exit
--cores <int>         Maximum number of CPU cores to use (default: MAX_CPUs)
--mem <int>           Maximum memory (MB) to use (default: MAX_MEM)
--unlock              Unlock the output directory following a crash or hard
                      restart (default: False)
--debug               Enable debugging mode (default: False)
--snakemake '<json>'  Pass additional flags to the `snakemake` scheduler.

Command Line Interface

haystac config

haystac database

haystac sample

haystac analyse

`haystac config`

`haystac database`

`haystac sample`

`haystac analyse`