Output Formats
Lambda supports five output formats, specified by their file extensions:
- .m0, .m8, .m9 (BLAST formats)
- .sam, .bam (Samtools' formats)
All formats except BAM (which is already compressed) can also be compressed transparently on the fly. Just append .gz or .bz2 to the output filename, e.g. output.m9.gz or output.sam.bz2. Depending on processor speed relative to disk speed, writing compressed output may even be faster than writing uncompressed output.
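Downstream tools have to decompress these files when reading them. The following minimal Python sketch opens a Lambda output file as text and picks the decompressor from the extension, assuming the .gz and .bz2 files are ordinary gzip and bzip2 streams. The helper name open_lambda_output is hypothetical.

```python
import bz2
import gzip


def open_lambda_output(path):
    """Open a Lambda output file for reading as text.

    Decompression is chosen from the file extension (hypothetical helper,
    assuming standard gzip/bzip2 streams for .gz/.bz2).
    """
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    if path.endswith(".bz2"):
        return bz2.open(path, "rt")
    return open(path, "r")  # uncompressed output (including .bam is NOT handled here)
```

Because the function returns a regular text file object, code that reads output.m8 can read output.m8.gz unchanged. BAM is binary and needs a SAM/BAM library instead.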
Currently three of the native BLAST output formats are supported:
| Description | legacy BLAST | BLAST+ | Lambda extension |
|---|---|---|---|
| pairwise | -m 0 | -outfmt 0 | .m0 |
| tabular | -m 8 | -outfmt 6 | .m8 |
| tabular with comment lines | -m 9 | -outfmt 7 | .m9 |
| specifier | description |
|---|---|
| std | Default 12 columns (Query Seq-id, Subject Seq-id, Percentage of identical matches, Alignment length, Number of mismatches, Number of gap openings, Start of alignment in query, End of alignment in query, Start of alignment in subject, End of alignment in subject, Expect value, Bit score) |
| qseqid | Query Seq-id |
| qlen | Query sequence length |
| sseqid | Subject Seq-id |
| slen | Subject sequence length |
| qstart | Start of alignment in query |
| qend | End of alignment in query |
| sstart | Start of alignment in subject |
| send | End of alignment in subject |
| evalue | Expect value |
| bitscore | Bit score |
| score | Raw score |
| length | Alignment length |
| pident | Percentage of identical matches |
| nident | Number of identical matches |
| mismatch | Number of mismatches |
| positive | Number of positive-scoring matches |
| gapopen | Number of gap openings |
| gaps | Total number of gaps |
| ppos | Percentage of positive-scoring matches |
| frames | Query and subject frames separated by a '/' |
| qframe | Query frame |
| sframe | Subject frame |
For the tabular and tabular with comment lines formats, you may specify the order and composition of the columns via the command line option --output-columns. The column specifiers are the same as in BLAST+, and all of those listed above are currently supported. Please note that it is recommended to keep the first 12 columns unchanged for compatibility, i.e. you should always start with "std", for example: --output-columns "std score qframe".
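When reading such files back, the column names have to be known in advance, since the .m8 format carries no header. This Python sketch expands an --output-columns value into column names (with "std" standing for the 12 default columns in the order listed above) and turns each hit into a dict. It assumes tab-separated fields, as in BLAST+ tabular output, and treats lines starting with '#' as the comment lines of the .m9 format. The function names are hypothetical, and values are kept as strings.

```python
# The 12 columns that "std" stands for, in the order given in the table above.
STD_COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
               "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def expand_columns(spec):
    """Expand a value such as "std score qframe" into a list of column names."""
    columns = []
    for token in spec.split():
        if token == "std":
            columns.extend(STD_COLUMNS)
        else:
            columns.append(token)
    return columns


def parse_tabular(lines, spec="std"):
    """Yield one dict per hit; '#' comment lines (.m9) and blank lines are skipped.

    Assumes tab-separated fields, as in BLAST+ tabular output.
    """
    columns = expand_columns(spec)
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != len(columns):
            raise ValueError(f"expected {len(columns)} columns, got {len(fields)}")
        yield dict(zip(columns, fields))
```

Passing the same string that was given to --output-columns keeps reader and writer in sync. Keeping "std" first, as recommended above, also means that tools which only know the default 12 columns still find them in the expected positions.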
Since version 0.9.2, Lambda also supports the SAM and BAM formats (.sam and .bam). Because SAM and BAM were not originally designed for local alignments, especially of protein sequences, this document describes Lambda's implementation of the standard.
Please see the official specification if some of the terms used here are not clear to you.
| column | use in Lambda |
|---|---|
| QNAME | name of the query sequence, truncated at first whitespace |
| FLAG | flag values 16 (0x10, SEQ reverse-complemented) and 256 (0x100, secondary alignment) are implemented in a standard-conformant way |
| RNAME | name of the subject sequence, truncated at first whitespace |
| POS | begin position of alignment on subject sequence; begin position on original untranslated DNA sequence for TBlastN, TBlastX, end position if negative strand; begin position on protein sequence for BlastP, BlastX |
| MAPQ | 255 |
| CIGAR | query DNA cigar (untranslated DNA sequence for BlastX, TBlastX); * for BlastP, TBlastN; reversed if negative strand/frame |
| RNEXT | * |
| PNEXT | 0 |
| TLEN | 0 |
| SEQ | clipped query DNA sequence (untranslated DNA sequence for BlastX, TBlastX); * for BlastP, TBlastN; reverse-complemented if negative strand/frame |
| QUAL | * |
| OPT | see below |
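A minimal Python sketch of how a consumer might split one of these records into the 11 mandatory columns and decode the two FLAG values listed above. It assumes tab-separated lines as required by the SAM specification and expects header lines (those starting with @) to have been filtered out already. parse_sam_record is a hypothetical helper, not part of Lambda.

```python
# The 11 mandatory SAM columns, in specification order.
SAM_COLUMNS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
               "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]

FLAG_REVERSE = 0x10     # 16: SEQ is reverse-complemented
FLAG_SECONDARY = 0x100  # 256: secondary alignment


def parse_sam_record(line):
    """Split one SAM alignment line (not a header line) into a dict."""
    fields = line.rstrip("\n").split("\t")
    record = dict(zip(SAM_COLUMNS, fields[:11]))
    record["FLAG"] = int(record["FLAG"])
    record["POS"] = int(record["POS"])
    record["reverse"] = bool(record["FLAG"] & FLAG_REVERSE)
    record["secondary"] = bool(record["FLAG"] & FLAG_SECONDARY)
    record["OPT"] = fields[11:]  # optional TAG:TYPE:VALUE fields, unparsed
    return record
```

For example, a FLAG of 272 (256 + 16) decodes to a secondary alignment on the reverse strand.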
Following the recommendations of the specification, the SEQ field is only written if it differs from the previous line's SEQ field. This can be changed with Lambda's command line parameter --sam-bam-seq, which can be set to always or never (the latter saves more space). This behaviour also applies to the ZQ tag defined below.
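A tool that needs the sequence on every record can restore it by carrying the last written SEQ forward. This Python sketch assumes that an omitted SEQ appears as "*" (SAM's placeholder for an absent sequence) and that the default --sam-bam-seq behaviour was used; with never, there is nothing to restore. For BlastP and TBlastN, SEQ is always "*" (see the table above), so records pass through unchanged. fill_omitted_seq is a hypothetical name.

```python
def fill_omitted_seq(records):
    """Yield SAM records (lists of fields) with an omitted SEQ restored.

    Assumes '*' marks an omitted SEQ that equals the previous record's SEQ,
    i.e. the default --sam-bam-seq behaviour described above.
    """
    previous_seq = "*"
    for fields in records:
        fields = list(fields)        # copy, so the caller's records are not mutated
        if fields[9] == "*":         # SEQ is the 10th mandatory column
            fields[9] = previous_seq
        else:
            previous_seq = fields[9]
        yield fields
```

The same carry-forward idea applies to the ZQ tag, which follows the same rule.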
| tag | description |
|---|---|
| AS | bit score |
| OC | query protein cigar (* for BLASTN) |
| NM | edit distance (in protein space unless BLASTN) |
| IH | number of matches this query has |
| ZE | expect value |
| ZR | raw score |
| ZI | % identity (in protein space unless BLASTN) |
| ZP | % positive (in protein space unless BLASTN) |
| ZF | query frame |
| YF | subject frame |
| ZQ | query protein sequence (* for BLASTN) |
These tags can be specified with the command line argument --sam-bam-tags. If you would like to see any other tags supported, please don't hesitate to contact us.
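On the reading side, these tags appear as SAM optional fields of the form TAG:TYPE:VALUE. This document does not state which type code Lambda uses for each tag, so the following Python sketch dispatches on the type character itself. It converts i (integer) and f (float) values and keeps all other types (A, Z, H, B) as raw strings. parse_tags is a hypothetical helper.

```python
def parse_tags(opt_fields):
    """Turn SAM optional fields such as 'ZE:f:1e-10' into a {tag: value} dict."""
    tags = {}
    for field in opt_fields:
        # maxsplit=2 keeps any ':' inside the value (e.g. in Z strings) intact
        tag, type_code, value = field.split(":", 2)
        if type_code == "i":
            tags[tag] = int(value)
        elif type_code == "f":
            tags[tag] = float(value)
        else:  # A, Z, H, B: kept as the raw string
            tags[tag] = value
    return tags
```

Dispatching on the type character means the parser does not depend on which tags were selected with --sam-bam-tags.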
BAM files require all subject names to be written to the header. SAM does not require this, so Lambda does not do it automatically, in order to save space (especially for protein databases, this can be a very large number of names). If you still want them in SAM output, e.g. for better BAM compatibility, use the --sam-with-refheader option.
If anything is unclear, don't hesitate to contact me.