
BLAST+ (blastall) Tutorial

by Jared on January 15th, 2011

This is a tutorial for NCBI’s BLAST+ tools (the successor to the legacy package built around blastall), which allow users to run various BLAST searches on their own machines. BLAST+ can be downloaded from NCBI’s website.

BLAST is possibly one of the most popular and important bioinformatics tools. It is an algorithm that allows researchers to search a database of sequences (DNA or amino acid) for matches or similarities to their query sequence. BLAST is a heuristic, so unlike the Smith-Waterman algorithm it is not guaranteed to find the optimal alignment, but it is much faster and therefore generally more practical for searching large databases. For a detailed look into the BLAST algorithm check out the BLAST Wikipedia entry.

The first step is to download and install the BLAST+ tools onto your machine. Once you have done that you’ll find there are a number of different tools in the package. In the legacy NCBI BLAST package the main program for running searches was blastall. BLAST+ does not ship a blastall binary: each search program (blastn, blastp, blastx, …) is its own executable, and the options have been renamed (for example -db, -query and -out instead of -d, -i and -o). Here is an example of the legacy blastall usage:

$ blastall -p blastn -d nr -e 10 -i inputfile -o outputfile

Common options are:
-p: blast program to use (blastn, blastp, blastx, …) See table below.
-d: database to use (defaults to nr) See section below.
-e: expectation value threshold (defaults to 10)
-i: input query sequence (in FASTA format)
-o: output file name
All other options will be displayed if you run “blastall” without any arguments. For a BLAST+ program, run it with -help (for example “blastn -help”).
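
As a rough sketch of how the legacy options above map onto BLAST+, the shell function below prints the BLAST+ command line for a given set of blastall-style values. The function name blastplus_cmd is made up for this illustration. The flags -db, -evalue, -query and -out are the BLAST+ names for -d, -e, -i and -o, and the program formerly given to -p becomes the executable name. The function only prints the command, so it can be checked before anything is run.

```shell
# Hypothetical helper (not part of BLAST+): print the BLAST+ equivalent of
#   blastall -p PROGRAM -d DB -e EVALUE -i QUERY -o OUTPUT
# Note: file names containing spaces would need extra quoting.
blastplus_cmd() {
    program=$1   # value formerly passed to -p (blastn, blastp, ...)
    db=$2        # formerly -d
    evalue=$3    # formerly -e
    query=$4     # formerly -i
    output=$5    # formerly -o
    printf '%s -db %s -evalue %s -query %s -out %s\n' \
        "$program" "$db" "$evalue" "$query" "$output"
}

# The example from above, rewritten for BLAST+:
blastplus_cmd blastn nr 10 inputfile outputfile
# prints: blastn -db nr -evalue 10 -query inputfile -out outputfile
```

BLAST+ also ships a legacy_blast.pl script intended for converting legacy command lines, which may be a better choice than a hand-written helper for anything beyond these five options.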

BLAST Programs

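When running blastall you need to specify which blast you would like to perform (using the -p option). Below is a table of the main BLAST programs. The first five are selected with -p in blastall (or run directly as executables in BLAST+). PSI-BLAST and MegaBLAST are not -p values: they are run through separate programs (blastpgp and megablast in the legacy package, psiblast and “blastn -task megablast” in BLAST+).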

Program | Input-Output | Description
blastn | nucleotide-nucleotide | This program, given a DNA query, returns the most similar DNA sequences from the DNA database that the user specifies.
blastp | protein-protein | This program, given a protein query, returns the most similar protein sequences from the protein database that the user specifies.
blastx | nucleotide 6-frame translation-protein | This program compares the six-frame conceptual translation products of a nucleotide query sequence (both strands) against a protein sequence database.
tblastx | nucleotide 6-frame translation-nucleotide 6-frame translation | This program is the slowest of the BLAST family. It translates the query nucleotide sequence in all six possible frames and compares it against the six-frame translations of a nucleotide sequence database. The purpose of tblastx is to find very distant relationships between nucleotide sequences.
tblastn | protein-nucleotide 6-frame translation | This program compares a protein query against all six reading frames of a nucleotide sequence database.
psi-blast | position-specific iterative | This program is used to find distant relatives of a protein. First, a list of all closely related proteins is created. These proteins are combined into a general “profile” sequence, which summarises significant features present in these sequences. A query against the protein database is then run using this profile, and a larger group of proteins is found. This larger group is used to construct another profile, and the process is repeated. By including related proteins in the search, PSI-BLAST is much more sensitive in picking up distant evolutionary relationships than a standard protein-protein BLAST.
megablast | large numbers of query sequences | When comparing large numbers of input sequences via the command-line BLAST, “megablast” is much faster than running BLAST multiple times. It concatenates many input sequences together to form a large sequence before searching the BLAST database, then post-analyzes the search results to glean individual alignments and statistical values.
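
For the four basic programs, the table above boils down to a choice based on the type of the query and the type of the database. Here is a small sketch of that choice as a shell function. The name pick_blast_program and the nucl/prot labels are assumptions for this example. It leaves out tblastx, PSI-BLAST and megablast, which are chosen for sensitivity or speed rather than by sequence type alone.

```shell
# Hypothetical helper: suggest a BLAST program from the sequence types.
#   $1 = query type, $2 = database type; each is "nucl" or "prot"
pick_blast_program() {
    case "$1:$2" in
        nucl:nucl) echo blastn ;;    # DNA query against a DNA database
        prot:prot) echo blastp ;;    # protein query against a protein database
        nucl:prot) echo blastx ;;    # translated DNA query against proteins
        prot:nucl) echo tblastn ;;   # protein query against translated DNA
        *)
            echo "unknown sequence types: $1 / $2" >&2
            return 1
            ;;
    esac
}

# Example: a DNA query searched against a protein database
pick_blast_program nucl prot
```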

Database Usage

When you specify a database to query (using the -d option, or -db in BLAST+) you need to have a local copy of whichever database you want to use. If you want to use one of NCBI’s databases then you first need to download a copy of the database using the update_blastdb.pl script. Alternatively, if you have your own sequences that you want to use as your database then you need to use the makeblastdb tool (formerly formatdb) to format the sequences in a way that the BLAST programs can read. Here is an example for a FASTA file of nucleotide sequences:

$ makeblastdb -in myseqs.fa -dbtype nucl -out my_db

Common options are:
-in: input sequence file (formatdb: -i)
-out: name of database to be created (formatdb: -n)
-dbtype: type of sequences in the input, nucl or prot (formatdb: -p, with T for protein and F for nucleotide)

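Once you have created your local database you can then try performing a BLAST search on it. With BLAST+ the command is as follows (the equivalent of the legacy “blastall -p blastn -d my_db -i query_seq.fa -o outputfile”):

$ blastn -db my_db -query query_seq.fa -out outputfile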
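
To try the whole database workflow without touching real data, something like the following sketch could be used. It assumes BLAST+ (makeblastdb and blastn) is on your PATH and skips the search otherwise. The sequences are made up for the example, and the file names simply mirror the ones used above. The -outfmt 6 option asks blastn for tab-separated output, which is easier to post-process than the default report.

```shell
# Work in a scratch directory so nothing existing is overwritten.
workdir=$(mktemp -d)

# Two made-up nucleotide sequences to use as the database.
printf '>seq1 made-up example\n%s\n>seq2 made-up example\n%s\n' \
    "ATGGCGTACGTTAGCCTAGGCTTAACGGATCCGATTACGCTAGCTAGGATCCAAGTCGTA" \
    "TTGACCGGTAAGCTTCCGATAGGCATCGATCGGCTAAGCTTAGGCTACCGATCGATTGCA" \
    > "$workdir/myseqs.fa"

# The query is a copy of seq1, so the search should be able to find it.
printf '>query1 made-up example\n%s\n' \
    "ATGGCGTACGTTAGCCTAGGCTTAACGGATCCGATTACGCTAGCTAGGATCCAAGTCGTA" \
    > "$workdir/query_seq.fa"

# Only run the BLAST+ steps if the tools are installed.
if command -v makeblastdb >/dev/null 2>&1 && command -v blastn >/dev/null 2>&1; then
    # -dbtype nucl because the sequences are DNA; use prot for proteins.
    makeblastdb -in "$workdir/myseqs.fa" -dbtype nucl -out "$workdir/my_db"
    # Search the new database and write tab-separated hits to outputfile.
    blastn -db "$workdir/my_db" -query "$workdir/query_seq.fa" \
        -outfmt 6 -out "$workdir/outputfile"
    cat "$workdir/outputfile"
else
    echo "BLAST+ not found on PATH; skipping the search" >&2
fi
```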

Hopefully this tutorial will help you get started with command-line BLAST. If you have questions about functionality not covered here, leave a comment below and I will answer them.
