Manual
Hobbes3 is a software package for efficiently mapping DNA
snippets (reads) against a reference DNA sequence. It can
map short and long reads, and supports Hamming distance
(only substitutions) and edit distance
(substitutions/insertions/deletions). Hobbes3 accepts both
single-end and paired-end reads for alignment, and can run
on multiple CPU cores using multithreading. It supports
three input formats (Fastq, Fasta, and text files) and the
SAM output format. Ambiguous
bases such as the 'N' character are treated as mismatches.
Manual page for the previous versions (Hobbes 1.x) is here
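To make the Hamming metric concrete, here is a minimal POSIX shell sketch that counts substitutions between two equal-length sequences. The function name hamming is hypothetical and this is not Hobbes3's implementation. It assumes that an 'N' in either sequence always counts as a mismatch, which is one reading of the rule stated above.

```shell
# Illustrative only: count substitutions between two equal-length sequences.
# Assumption: an 'N' in either sequence counts as a mismatch, even against 'N'.
hamming() (
  a=$1 b=$2 d=0
  if [ "${#a}" -ne "${#b}" ]; then
    echo "sequences must have equal length" >&2
    exit 1
  fi
  while [ -n "$a" ]; do
    x=${a%"${a#?}"}   # first character of a
    y=${b%"${b#?}"}   # first character of b
    a=${a#?}          # drop the first character
    b=${b#?}
    if [ "$x" != "$y" ] || [ "$x" = "N" ] || [ "$y" = "N" ]; then
      d=$((d + 1))
    fi
  done
  echo "$d"
)
```

Edit distance additionally allows insertions and deletions, so it can compare sequences of different lengths, which this sketch rejects.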
System Requirements
- libbz2 and libz (sudo apt-get install libbz2-dev libz-dev).
- boost-iostreams (sudo apt-get install libboost-iostreams-dev).
Compiling Hobbes3
- build.sh compress: compressed reads are supported.
(the "compress" option requires the libboost, libbz2, and libz libraries).
Constructing a Hobbes3 Index
Usage
./hobbes-index --sref <input fasta file> \
    -i <output index file> -g <gram length> -p <number of threads>
Example
./hobbes-index --sref hg18.fa -i hg18.hix -g 11 -p 4
Options
--sref <file> | Reference sequence file to index in fasta format. |
--dref <dir> | Uses all fasta files in given directory as reference sequence. File names become chromosome names. |
-i <file> | Create Hobbes3 index into given file. |
-g <int> | Use given gram length to build a Hobbes3 index. We recommend a gram length of 11. We support gram lengths up to 16, but the index size will increase dramatically after gram length 13. |
-p <int> | Use given number of parallel pthreads to construct the index. |
--noprogress | Disable progress indicator. |
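The gram-length guidance above can be turned into a small pre-flight check. The following is a sketch only: the helper name build_index_cmd, the default thread count of 1, and the lower bound of 1 are assumptions, not part of Hobbes3. It prints the hobbes-index command line instead of running it.

```shell
# Print (do not run) a hobbes-index command after checking the gram length.
# Hypothetical helper; the defaults and the lower bound of 1 are assumptions.
build_index_cmd() (
  ref=$1 out=$2 gram=${3:-11} threads=${4:-1}
  if [ "$gram" -lt 1 ] || [ "$gram" -gt 16 ]; then
    echo "gram length must be between 1 and 16" >&2
    exit 1
  fi
  if [ "$gram" -gt 13 ]; then
    echo "warning: index size grows dramatically above gram length 13" >&2
  fi
  echo "./hobbes-index --sref $ref -i $out -g $gram -p $threads"
)

# Example: prints the command for a gram length of 11 and 4 threads.
build_index_cmd hg18.fa hg18.hix 11 4
```

Printing the command first makes it easy to review the arguments before starting a long index build.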
Mapping Reads with Hobbes3
Single-End Reads
1) Hamming distance (substitutions only):
./hobbes -q <input fastq file> --sref <fasta reference file> \
    -i <hobbes index file> -a --hamming -v <hamming distance> \
    -n <number of reads> -p <number of threads>
2) Edit distance (substitutions/insertions/deletions):
./hobbes -q <input fastq file> --sref <fasta reference file> \
    -i <hobbes index file> -a --indel -v <edit distance> \
    -n <number of reads> -p <number of threads>
Examples:
./hobbes -q reads.fq --sref hg18.fa -i hg18.hix -a --hamming -v 2 -n 10000 -p 16
./hobbes -q reads.fq --sref hg18.fa -i hg18.hix -a --indel -v 2 -n 10000 -p 16
Paired-End Reads
1) Hamming distance (substitutions only):
./hobbes --pe \
    --seqfq1 <first read fastq file> --seqfq2 <second read fastq file> \
    --min <minimum insert size> --max <maximum insert size> \
    --sref <fasta reference file> -i <hobbes index file> \
    -a --hamming -v <hamming distance> -n <number of reads> \
    -p <number of threads>
2) Edit distance (substitutions/insertions/deletions):
./hobbes --pe \
    --seqfq1 <first read fastq file> --seqfq2 <second read fastq file> \
    --min <minimum insert size> --max <maximum insert size> \
    --sref <fasta reference file> -i <hobbes index file> \
    -a --indel -v <edit distance> -n <number of reads> \
    -p <number of threads>
Examples:
./hobbes --pe --seqfq1 reads1.fq --seqfq2 reads2.fq --min 50 --max 150 \
    --sref hg18.fa -i hg18.hix -a --hamming -v 2 -n 10000 -p 16
./hobbes --pe --seqfq1 reads1.fq --seqfq2 reads2.fq --min 50 --max 150 \
    --sref hg18.fa -i hg18.hix -a --indel -v 2 -n 10000 -p 16
Read Input Options
-q <file> | Map single-end reads in given fastq file. |
-r <file> | Map single-end reads in given line-by-line text file. |
-f <file> | Map single-end reads in given fasta file. |
-c <string> | Map given single-end read (only maps a single read). |
--seqfq1 <file> | First fastq file for paired-end reads. Requires --pe. |
--seqfq2 <file> | Second fastq file for paired-end reads. Requires --pe. |
--gzip | Reads file is compressed with gzip. |
--bzip2 | Reads file is compressed with bzip2. |
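When read files arrive in mixed formats, the compression flag can be chosen from the file extension. The sketch below is a hypothetical helper (compression_flag is not part of Hobbes3) and assumes the conventional .gz and .bz2 suffixes. Compressed input also requires a build made with the "compress" option.

```shell
# Pick the Hobbes3 compression flag from a reads file name.
# Hypothetical helper; assumes conventional .gz / .bz2 extensions.
compression_flag() {
  case $1 in
    *.gz)  echo "--gzip" ;;
    *.bz2) echo "--bzip2" ;;
    *)     echo "" ;;      # uncompressed: no extra flag
  esac
}

# Dry run: print the mapping command that would be executed.
reads=reads.fq.gz
echo ./hobbes -q "$reads" $(compression_flag "$reads") \
    --sref hg18.fa -i hg18.hix -a --hamming -v 2
```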
Reference Sequence Options
--sref <file> | Reference sequence file in fasta format. |
--dref <dir> | Uses all fasta files in given directory as reference sequence. File names become chromosome names. |
Index Options
-i <file> | Use given Hobbes3 index to perform mapping. |
Mapping Options
Hobbes3 can find all or at most k mappings per read. Note that the running time varies accordingly. If a read has exact mappings, Hobbes3 is guaranteed to find them. Otherwise, it finds mapping(s) within the specified distance. By default, Hobbes3 maps against both the forward and reverse reference (see --norc and --nofw).
-a | Find all mapping locations. |
-m <int> | Report only reads whose number of distinct mapping locations is at most the given threshold (single-end mapping only). |
-k <int> | Find up to 'k' mappings per read (single-end mapping only). |
--hamming | Map reads using Hamming distance. |
--indel | Map reads using edit distance. Uses heuristics to speed up the search, and is not guaranteed to find the best possible mapping locations (but very often it does). |
-v <int> | Distance threshold. Finds reads within given distance threshold (use --hamming for Hamming distance and --indel for edit distance). |
--pe | Enable paired-end mapping mode. See --seqfq1 and --seqfq2. |
--min <int> | Minimum insert size for paired-end mappings. |
--max <int> | Maximum insert size for paired-end mappings. |
-n <int> | Aligns given number of reads (first ones). By default, all the reads are aligned. |
--norc | Maps against forward reference only. |
--nofw | Maps against reverse reference only. |
Output Options
Hobbes3 produces results in the SAM output format with CIGAR strings. By default, mappings are printed to stdout.
--sam-nohead | Suppresses the header lines (starting with '@'). |
--sam-nosq | Suppresses the @SQ header lines. |
--mapout <file> | Prints the mappings to a specified file. |
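A quick way to inspect the output is to count how many mappings each read received. The sketch below relies only on two properties of the SAM format: header lines start with '@', and the first tab-separated field of every record is the read name. The function name count_mappings and the file name mappings.sam are hypothetical.

```shell
# Count mappings per read name in SAM output (file arguments or stdin).
# Hypothetical helper: skips '@' header lines and tallies field 1 (read name).
count_mappings() {
  awk -F '\t' '!/^@/ { n[$1]++ } END { for (r in n) print r "\t" n[r] }' "$@" | sort
}

# Usage with a file written via --mapout (file name is an example):
#   count_mappings mappings.sam
```

The function also reads from standard input, so it can be placed directly after a hobbes command in a pipeline. In paired-end output, both mates may share a read name, in which case the count covers both.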
Other Options
-p <int> | Runs given number of parallel pthreads to perform the mapping. |
--noprogress | Disable progress indicator. |
--version | Prints version information. |
--help | Prints usage information. |
The SAM Output Format
Hobbes3 produces results in the SAM format. It outputs one mapping per line. A single read may appear on multiple lines, in which case its primary mapping is placed first. Unmapped reads are not printed. Each line has the following tab-separated fields: