Manual

Hobbes3 is a software package for efficiently mapping DNA snippets (reads) against a reference DNA sequence. It can map short and long reads, and supports Hamming distance (only substitutions) and edit distance (substitutions/insertions/deletions). Hobbes3 accepts both single-end and paired-end reads for alignment, and can run on multiple CPU cores using multithreading. It supports three input formats (Fastq, Fasta, and text files) and the SAM output format. Ambiguous bases such as the 'N' character are treated as mismatches.
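
The two distance measures can be illustrated with a minimal Python sketch. It is not Hobbes3's implementation; the function names are hypothetical, and treating 'N' as a mismatch even against another 'N' is an assumption about how the rule for ambiguous bases is applied.

```python
def _mismatch(x: str, y: str) -> bool:
    # Assumption: an ambiguous base ('N') never matches, even against another 'N'.
    return x != y or x == "N" or y == "N"


def hamming_distance(a: str, b: str) -> int:
    """Count mismatching positions of two equal-length sequences (substitutions only)."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs sequences of equal length")
    return sum(_mismatch(x, y) for x, y in zip(a, b))


def edit_distance(a: str, b: str) -> int:
    """Minimum number of substitutions, insertions, and deletions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                      # delete x
                            curr[j - 1] + 1,                  # insert y
                            prev[j - 1] + _mismatch(x, y)))   # match or substitute
        prev = curr
    return prev[-1]


print(hamming_distance("ACGT", "ACCT"))  # one substitution
print(edit_distance("ACGT", "AGT"))      # one deletion
```

Hamming distance is only defined for sequences of equal length, which is why the first function rejects anything else, while edit distance also covers reads that gain or lose bases.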

The manual page for the previous versions (Hobbes 1.x) is available separately.

System Requirements

  • We developed and tested Hobbes3 under Ubuntu 14.04.2 LTS 64-bit.
  • GCC. Hobbes3 uses GCC builtin functions; we used GCC 4.8.2.
  • CMake. Required for compiling Hobbes3 (sudo apt-get install cmake).
  • The following libraries are required if you want to map compressed reads:
       - libbz2 and libz (sudo apt-get install libbz2-dev libz-dev).
       - boost-iostreams (sudo apt-get install libboost-iostreams-dev).

Compiling Hobbes3

  • Download the Hobbes3 source archive (hobbes-3.0.tar.gz).
  • Extract the contents of the archive (tar -zxvf hobbes-3.0.tar.gz).
  • cd into the Hobbes3 source root directory (cd hobbes3.0).
  • Run either "build.sh nocompress" or "build.sh compress".
       - build.sh nocompress: compressed reads are not supported.
       - build.sh compress: compressed reads are supported.
         (the "compress" option requires the libboost, libbz2, and libz libraries).
  • The Hobbes3 binaries are placed in the "build" directory.

Constructing a Hobbes3 Index

    Usage

    ./hobbes-index --sref <input fasta file> \
                   -i <output index file> -g <gram length> -p <number of threads>
    

    Example

    ./hobbes-index --sref hg18.fa -i hg18.hix -g 11 -p 4

    Options

    --sref <file> Reference sequence file to index in fasta format.
    --dref <dir> Uses all fasta files in given directory as reference sequence. File names become chromosome names.
    -i <file> Create Hobbes3 index into given file.
    -g <int> Use given gram length to build a Hobbes3 index. We recommend a gram length of 11. We support gram lengths up to 16, but the index size will increase dramatically after gram length 13.
    -p <int> Use given number of parallel pthreads to construct the index.
    --noprogress Disable progress indicator.
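
As a rough illustration of why the gram length matters, the sketch below lists the overlapping grams of a sequence and counts how many distinct grams exist over the alphabet A, C, G, T. The function names are hypothetical. How Hobbes3 lays out its index is not described in this manual, so the link between the number of possible grams and the index size is an assumption.

```python
def grams(sequence: str, gram_length: int) -> list:
    """All overlapping substrings of the given length, left to right."""
    return [sequence[i:i + gram_length]
            for i in range(len(sequence) - gram_length + 1)]


def gram_space(gram_length: int) -> int:
    """Number of distinct grams over the four-letter DNA alphabet."""
    return 4 ** gram_length


print(grams("ACGTA", 3))
for g in (11, 13, 16):
    # Each extra base multiplies the number of possible grams by four.
    print(g, gram_space(g))
```

Under that assumption, any per-gram structure grows by a factor of four with every extra base, which is consistent with the recommendation to stay near a gram length of 11.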

    Mapping Reads with Hobbes3

    Single-End Reads

    1) Hamming distance (substitutions only):
    ./hobbes -q <input fastq file> --sref <fasta reference file>       \
             -i <hobbes index file> -a --hamming -v <hamming distance> \
             -n <number of reads> -p <number of threads>
    
    2) Edit distance (substitutions/insertions/deletions):
    ./hobbes -q <input fastq file> --sref <fasta reference file>  \
             -i <hobbes index file> -a --indel -v <edit distance> \
             -n <number of reads> -p <number of threads>
    
    Examples:

    ./hobbes -q reads.fq --sref hg18.fa -i hg18.hix -a --hamming -v 2 -n 10000 -p 16
    ./hobbes -q reads.fq --sref hg18.fa -i hg18.hix -a --indel -v 2 -n 10000 -p 16

    Paired-End Reads

    1) Hamming distance (substitutions only):
    ./hobbes --pe                                                               \
             --seqfq1 <first read fastq file> --seqfq2 <second read fastq file> \
             --min <minimum insert size> --max <maximum insert size>            \
             --sref <fasta reference file> -i <hobbes index file>               \
             -a --hamming -v <hamming distance> -n <number of reads>            \
             -p <number of threads>
    
    2) Edit distance (substitutions/insertions/deletions):
    ./hobbes --pe                                                               \
             --seqfq1 <first read fastq file> --seqfq2 <second read fastq file> \
             --min <minimum insert size> --max <maximum insert size>            \
             --sref <fasta reference file> -i <hobbes index file>               \
             -a --indel -v <edit distance> -n <number of reads>                 \
             -p <number of threads>
    
    Examples:
    ./hobbes --pe --seqfq1 reads1.fq --seqfq2 reads2.fq --min 50 --max 150 \
             --sref hg18.fa -i hg18.hix -a --hamming -v 2 -n 10000 -p 16
    ./hobbes --pe --seqfq1 reads1.fq --seqfq2 reads2.fq --min 50 --max 150 \
             --sref hg18.fa -i hg18.hix -a --indel   -v 2 -n 10000 -p 16
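
When Hobbes3 is driven from a script, the command lines above can be assembled programmatically. The following Python sketch is one possible helper. The function hobbes_command and its parameter names are hypothetical; only the flags themselves come from this manual. It always requests all mappings (-a) and assumes the hobbes binary sits in the current directory.

```python
import shlex


def hobbes_command(reference, index, distance, *, reads=None, mates=None,
                   insert_size=None, edit=False, threads=1, num_reads=None,
                   binary="./hobbes"):
    """Return the argument list for one all-mappings (-a) Hobbes3 run."""
    if (reads is None) == (mates is None):
        raise ValueError("give either reads (single-end) or mates (paired-end)")
    argv = [binary]
    if mates is not None:
        # Paired-end mode: two fastq files, optionally an insert size range.
        argv += ["--pe", "--seqfq1", mates[0], "--seqfq2", mates[1]]
        if insert_size is not None:
            low, high = insert_size
            argv += ["--min", str(low), "--max", str(high)]
    else:
        argv += ["-q", reads]
    argv += ["--sref", reference, "-i", index, "-a",
             "--indel" if edit else "--hamming", "-v", str(distance)]
    if num_reads is not None:
        argv += ["-n", str(num_reads)]
    argv += ["-p", str(threads)]
    return argv


single = hobbes_command("hg18.fa", "hg18.hix", 2, reads="reads.fq",
                        threads=16, num_reads=10000)
print(shlex.join(single))
```

The returned list can be passed directly to subprocess.run, which avoids shell quoting problems with file names.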
    

    Read Input Options

    -q <file> Map single-end reads in given fastq file.
    -r <file> Map single-end reads in given line-by-line text file.
    -f <file> Map single-end reads in given fasta file.
    -c <string> Map given single-end read (only maps a single read).
    --seqfq1 <file> First fastq file for paired-end reads. Requires --pe.
    --seqfq2 <file> Second fastq file for paired-end reads. Requires --pe.
    --gzip Reads file is compressed with gzip.
    --bzip2 Reads file is compressed with bzip2.

    Reference Sequence Options

    --sref <file> Reference sequence file in fasta format.
    --dref <dir> Uses all fasta files in given directory as reference sequence. File names become chromosome names.

    Index Options

    -i <file> Use given Hobbes3 index to perform mapping.

    Mapping Options

    Hobbes3 can find all or at most k mappings per read. Note that the running time varies accordingly. If a read has exact mappings, Hobbes3 is guaranteed to find them. Otherwise, it finds mappings within the specified distance. By default, Hobbes3 maps against both the forward and the reverse reference (see --norc and --nofw).
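
Mapping against the reverse reference can be pictured as searching for the read's reverse complement on the forward reference. The Python sketch below shows this for exact matches only. It is a toy illustration with hypothetical function names, not a description of how Hobbes3 searches internally.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_exact(read: str, reference: str) -> list:
    """Return (1-based position, strand) for every exact occurrence of the read."""
    hits = []
    for strand, query in (("+", read), ("-", reverse_complement(read))):
        start = reference.find(query)
        while start != -1:
            hits.append((start + 1, strand))
            start = reference.find(query, start + 1)
    return hits


reference = "TTAACGGG"
print(find_exact("AACG", reference))  # occurs on the forward reference
print(find_exact("CCGT", reference))  # occurs only as a reverse complement
```

In this picture, restricting the search to one of the two queries corresponds to the --norc and --nofw options.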

    -a Find all mapping locations.
    -m <int> Report only those reads whose number of distinct mapping locations is less than or equal to the given threshold (single-end mapping only).
    -k <int> Find up to 'k' mappings per read (single-end mapping only).
    --hamming Map reads using Hamming distance.
    --indel Map reads using edit distance. Uses heuristics to speed up the search, and is not guaranteed to find the best possible mapping locations (but very often it does).
    -v <int> Distance threshold. Finds reads within given distance threshold (use --hamming for Hamming distance and --indel for edit distance).
    --pe Enable paired-end mapping mode. See --seqfq1 and --seqfq2.
    --min <int> Minimum insert size for paired-end mappings.
    --max <int> Maximum insert size for paired-end mappings.
    -n <int> Aligns only the given number of reads, starting from the first one. By default, all the reads are aligned.
    --norc Maps against forward reference only.
    --nofw Maps against reverse reference only.
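
The difference between -a, -k, and -m can be sketched on toy data. The Python code below encodes one reading of the option descriptions above: -k truncates each read's list of mappings, while -m drops reads that have too many mappings. The function name, the input dictionary, and the choice of keeping the first k entries are all assumptions made for illustration.

```python
def select_mappings(candidates: dict, mode: str = "all", limit: int = None) -> dict:
    """Filter per-read mapping positions the way -a, -k, or -m are described."""
    if mode == "all":                      # like -a: keep every mapping
        return {read: list(pos) for read, pos in candidates.items()}
    if limit is None:
        raise ValueError("modes 'k' and 'm' need a limit")
    if mode == "k":                        # like -k: at most `limit` mappings per read
        return {read: list(pos[:limit]) for read, pos in candidates.items()}
    if mode == "m":                        # like -m: drop reads with more than `limit` mappings
        return {read: list(pos) for read, pos in candidates.items()
                if len(pos) <= limit}
    raise ValueError("mode must be 'all', 'k', or 'm'")


# Hypothetical mapping positions for two reads.
candidates = {"r1": [10], "r2": [5, 40, 90]}
print(select_mappings(candidates, "k", 2))
print(select_mappings(candidates, "m", 2))
```

With a limit of 2, the -k style selection keeps two mappings for r2, whereas the -m style selection removes r2 entirely because it maps to three locations.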

    Output Options

    Hobbes3 produces results in the SAM output format with CIGAR strings.
    By default, mappings are printed to stdout.

    --sam-nohead Suppresses the header lines (starting with '@').
    --sam-nosq Suppresses the @SQ header lines.
    --mapout <file> Prints the mappings to a specified file.

    Other options

    -p <int> Runs given number of parallel pthreads to perform the mapping.
    --noprogress Disable progress indicator.
    --version Prints version information.
    --help Prints usage information.

    The SAM Output Format

    Hobbes3 produces results in the SAM format. It outputs one mapping per line. A single read may appear on multiple lines, in which case the primary mapping is placed first. Reads that are unmapped are not printed. Each line has the following tab-separated fields:

    1. Name of the read mapped.

    2. SAM bitwise FLAG.

    3. RNAME: Reference sequence name of the mapping. If @SQ header lines are present, RNAME must be present in one of the SQ-SN tags.

    4. POS: 1-based leftmost mapping POSition of the first matching base. The first base in a reference sequence has coordinate 1.

    5. MAPQ: Mapping Quality. A value of 255 indicates that the mapping quality is not available. Since Hobbes3 does not support this yet, it is set to 255.

    6. CIGAR: CIGAR string.

    7. RNEXT: Reference sequence name of the NEXT fragment in the template. Currently unavailable and hence set to `*'.

    8. PNEXT: Position of the NEXT fragment in the template. In single-end alignment, it is set to 0; in paired-end alignment, it is the position of the read's mate.

    9. TLEN: Signed insert size. It is set to 0 for single-end reads or when the information is unavailable.

    10. SEQ: Read sequence. If the current mapping is not a primary mapping, it is set to `*'.

    11. QUAL: ASCII of base QUALity plus 33 (same as the quality string in the Sanger FASTQ format). If the input file is not in the FASTQ format or the current mapping is not a primary mapping, it is set to `*'.

    12. Optional fields: For descriptions of all possible optional fields, see the SAM format specification. The field relevant to Hobbes3 is:

      1. NM:i:<N> - Mapped read has hamming/edit distance of <N>.
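
The field layout above can be read back with a few lines of Python. The sketch below is a minimal parser, not a replacement for a full SAM library. The Mapping type and the function names are hypothetical, and the sample line is made up for illustration rather than taken from real Hobbes3 output. The FLAG bits 0x10 (read reverse complemented) and 0x100 (secondary alignment) come from the SAM specification; whether Hobbes3 marks non-primary mappings with 0x100 is an assumption.

```python
from typing import Iterable, Iterator, NamedTuple, Optional


class Mapping(NamedTuple):
    qname: str
    flag: int
    rname: str
    pos: int
    mapq: int
    cigar: str
    rnext: str
    pnext: int
    tlen: int
    seq: str
    qual: str
    distance: Optional[int]  # value of the NM:i: tag, if present


def parse_sam_line(line: str) -> Mapping:
    """Split one tab-separated mapping line into its eleven fields plus NM."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM mapping line has at least 11 fields")
    distance = None
    for tag in fields[11:]:
        if tag.startswith("NM:i:"):
            distance = int(tag[5:])
    return Mapping(fields[0], int(fields[1]), fields[2], int(fields[3]),
                   int(fields[4]), fields[5], fields[6], int(fields[7]),
                   int(fields[8]), fields[9], fields[10], distance)


def read_mappings(lines: Iterable[str]) -> Iterator[Mapping]:
    """Yield mappings, skipping header lines (starting with '@') and blank lines."""
    for line in lines:
        if line.strip() and not line.startswith("@"):
            yield parse_sam_line(line)


def is_reverse_strand(flag: int) -> bool:
    return bool(flag & 0x10)


def is_secondary(flag: int) -> bool:
    return bool(flag & 0x100)


# Made-up example line, for illustration only.
sample = "read1\t16\tchr1\t1000\t255\t4M\t*\t0\t0\tACGT\t*\tNM:i:2"
mapping = parse_sam_line(sample)
print(mapping.rname, mapping.pos, mapping.distance, is_reverse_strand(mapping.flag))
```

Because mappings are printed to stdout by default, read_mappings can be fed the lines of a captured output stream or of a file written with --mapout.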


    2015 ISG | Website maintained by Jongik Kim | Created by Yun Huang | Original design Andreas Viklund