Output Formats

Hannes Hauswedell edited this page Nov 19, 2015 · 14 revisions

Lambda supports five output formats, specified by their file extensions:

  • .m0, .m8, .m9 (Blast formats)
  • .sam, .bam (Samtools' formats)

There is also support for transparent on-the-fly compression of all formats (other than BAM, which is already compressed). Just add .gz or .bz2 to the end of the output filename, e.g. output.m9.gz or output.sam.bz2. Depending on processor speed relative to disk speed, writing compressed output with .gz may even be faster than writing uncompressed output.
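
Compressed output can be read back just as transparently by downstream scripts. The following is a minimal Python sketch and not part of Lambda: the helper name `open_lambda_output` is hypothetical, and it assumes that compression is indicated purely by the .gz or .bz2 suffix, as described above.

```python
import bz2
import gzip


def open_lambda_output(path):
    """Open a text output file, decompressing by suffix (hypothetical helper)."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")  # "rt" = read as text
    if path.endswith(".bz2"):
        return bz2.open(path, "rt")
    return open(path, "r")
```

BAM files are not covered by this helper; they are a binary format and need a dedicated reader.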

BLAST output formats

Currently three of the native BLAST output formats are supported:

| Description | legacy BLAST | BLAST+ | lambda extension |
|---|---|---|---|
| pairwise | -m 0 | -outfmt 0 | .m0 |
| tabular | -m 8 | -outfmt 6 | .m8 |
| tabular with comment lines | -m 9 | -outfmt 7 | .m9 |
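
For scripts, the practical difference between .m8 and .m9 is the presence of comment lines. A minimal Python sketch that reads either, assuming fields are tab-separated and comment lines start with `#` as in BLAST+ `-outfmt 7` (the function name `read_tabular` and the example lines are made up for illustration):

```python
def read_tabular(lines):
    """Yield the tab-separated fields of each hit, skipping '#' comment lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # .m9 comment line or blank line
        yield line.split("\t")


# Illustrative input only; not real Lambda output.
example = [
    "# Fields: query id, subject id, ...\n",
    "q1\ts1\t98.5\t67\t1\t0\t1\t67\t10\t76\t1e-30\t120\n",
]
rows = list(read_tabular(example))
```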

Custom columns

specifier description
std Default 12 columns (Query Seq-id, Subject Seq-id, Percentage of identical matches, Alignment length, Number of mismatches, Number of gap openings, Start of alignment in query, End of alignment in query, Start of alignment in subject, End of alignment in subject, Expect value, Bit score)
qseqid Query Seq-id
qlen Query sequence length
sseqid Subject Seq-id
slen Subject sequence length
qstart Start of alignment in query
qend End of alignment in query
sstart Start of alignment in subject
send End of alignment in subject
evalue Expect value
bitscore Bit score
score Raw score
length Alignment length
pident Percentage of identical matches
nident Number of identical matches
mismatch Number of mismatches
positive Number of positive-scoring matches
gapopen Number of gap openings
gaps Total number of gaps
ppos Percentage of positive-scoring matches
frames Query and subject frames separated by a '/'
qframe Query frame
sframe Subject frame

For the tabular and tabular with comment lines formats you may specify the order and composition of the columns. The columns have the same specifiers as in BLAST+; currently all of the specifiers listed above are supported via the command line option --output-columns. Please note that it is recommended to keep the first 12 columns as they are for compatibility, i.e. you should always prefix "std": --output-columns "std score qframe".
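
When parsing output with custom columns, the same specifier string can be used to label the fields. This is a minimal Python sketch; `STD_COLUMNS`, `expand_columns` and `label_row` are hypothetical names, and the order of the 12 standard columns is taken from the description of `std` in the table above.

```python
# The 12 columns that "std" stands for, in the order listed above.
STD_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def expand_columns(spec):
    """Expand a specifier string such as "std score qframe" into column names."""
    names = []
    for token in spec.split():
        names.extend(STD_COLUMNS if token == "std" else [token])
    return names


def label_row(fields, spec):
    """Pair the fields of one tabular line with their column names."""
    names = expand_columns(spec)
    if len(fields) != len(names):
        raise ValueError(f"expected {len(names)} columns, got {len(fields)}")
    return dict(zip(names, fields))
```

Keeping `std` as the prefix means that tools which only know the default 12 columns can still read the first 12 fields of every line.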

SAMTOOLS formats

Since version 0.9.2 Lambda also supports the SAM and BAM formats (.sam and .bam). Since SAM and BAM were not originally designed for local alignments, especially of protein sequences, this document describes Lambda's implementation of the standard.

Please see the official specification if some of the terms used here are not clear to you.

column use in Lambda
QNAME name of the query sequence, truncated at first whitespace
FLAG bit 16 (reverse strand) and bit 256 (secondary alignment) implemented in a standard-conforming way
RNAME name of the subject sequence, truncated at first whitespace
POS begin position of alignment on subject sequence; begin position on original untranslated DNA sequence for TBlastN, TBlastX, end position if negative strand; begin position on protein sequence for BlastP, BlastX
MAPQ 255
CIGAR query DNA cigar (untranslated DNA sequence for BlastX, TBlastX); * for BlastP, TBlastN; reversed if negative strand/frame
RNEXT *
PNEXT 0
TLEN 0
SEQ clipped query DNA sequence (untranslated DNA sequence for BlastX, TBlastX); * for BlastP, TBlastN; reverse-complemented if negative strand/frame
QUAL *
OPT see below
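
The columns above can be read with a few lines of Python. This is a minimal sketch for plain-text SAM only; `parse_sam_record` is a hypothetical helper, and the meaning of the two FLAG bits (16 = reverse strand, 256 = secondary alignment) is taken from the SAM specification.

```python
SAM_COLUMNS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
               "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]


def parse_sam_record(line):
    """Split one SAM alignment line into its 11 mandatory columns plus tags."""
    fields = line.rstrip("\n").split("\t")
    record = dict(zip(SAM_COLUMNS, fields[:11]))
    flag = int(record["FLAG"])
    record["reverse"] = bool(flag & 16)     # bit 16: reverse strand
    record["secondary"] = bool(flag & 256)  # bit 256: secondary alignment
    record["tags"] = fields[11:]            # optional fields, still unparsed
    return record
```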

Following the recommendations of the specification, the SEQ field is only written if it differs from the previous line's SEQ field. This can be changed via Lambda's command line parameter --sam-bam-seq, which can be set to always or never (the latter saves the most space). This behaviour also applies to the ZQ tag defined below.
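
A script that needs the sequence on every line can carry the last written value forward. The sketch below rests on an assumption that is not stated on this page: that an omitted SEQ appears as `*`, the SAM placeholder for an absent sequence. The function name `restore_seq` is hypothetical, and the input is expected to be SAM lines already split at tabs.

```python
def restore_seq(records):
    """Yield (QNAME, SEQ) pairs, re-using the last written SEQ where it is '*'.

    Assumption: an omitted SEQ is written as '*'.
    """
    last_seq = None
    for fields in records:
        seq = fields[9]  # SEQ is the 10th mandatory column
        if seq != "*":
            last_seq = seq          # explicitly written: remember it
        elif last_seq is not None:
            seq = last_seq          # omitted: same as the previous line's SEQ
        yield fields[0], seq
```

In modes where SEQ is always `*` (BlastP, TBlastN), nothing is ever remembered and the placeholder is passed through unchanged.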

Optional tags

tag description
AS bit score
OC query protein cigar (* for BLASTN)
NM edit distance (in protein space unless BLASTN)
IH number of matches this query has
ZE expect value
ZR raw score
ZI % identity (in protein space unless BLASTN)
ZP % positive (in protein space unless BLASTN)
ZF query frame
YF subject frame
ZQ query protein sequence (* for BLASTN)

These tags can be specified with the command line argument --sam-bam-tags. If you would like to see any other tags supported, please don't hesitate to contact us.
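
In SAM text output, optional fields follow the `TAG:TYPE:VALUE` layout defined by the specification. A minimal Python sketch for turning them into a dictionary is shown below; `parse_tags` is a hypothetical helper, and the type codes and values in the example are made up for illustration, not taken from real Lambda output.

```python
def parse_tags(fields):
    """Convert SAM optional fields ("TAG:TYPE:VALUE") into a dict."""
    casts = {"i": int, "f": float}  # other types (e.g. Z) stay strings
    tags = {}
    for field in fields:
        # Split at the first two colons only; the value may contain ':'.
        tag, type_code, value = field.split(":", 2)
        tags[tag] = casts.get(type_code, str)(value)
    return tags


# Illustrative values only.
example_tags = parse_tags(["AS:i:120", "ZE:f:1e-30", "ZF:i:-2", "ZQ:Z:MKV"])
```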

Header

BAM files require all subject names to be written to the header. For SAM this is not required, so Lambda does not write them automatically in order to save space (especially for protein databases this is a lot!). If you still want them in SAM output, e.g. for better BAM compatibility, use the --sam-with-refheader option.
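
Whether a given SAM file carries the subject names can be checked by looking for `@SQ` header lines, which per the SAM specification hold the reference name in an `SN:` field. A minimal Python sketch, with `reference_names` as a hypothetical helper name:

```python
def reference_names(lines):
    """Return the SN values of all @SQ header lines of a SAM file."""
    names = []
    for line in lines:
        if not line.startswith("@"):
            break  # header lines precede all alignment records
        if line.startswith("@SQ"):
            for field in line.rstrip("\n").split("\t")[1:]:
                if field.startswith("SN:"):
                    names.append(field[3:])
    return names
```

An empty result indicates a SAM file without a reference header.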
