<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Bioinformatics Box</title>
	<atom:link href="http://www.bioinformaticsbox.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.bioinformaticsbox.com</link>
	<description>Free Bioinformatics Tools For Researchers</description>
	<lastBuildDate>Sun, 16 Jan 2011 01:04:43 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>BLAST+ (blastall) Tutorial</title>
		<link>http://www.bioinformaticsbox.com/blast-blastall-tutorial/</link>
		<comments>http://www.bioinformaticsbox.com/blast-blastall-tutorial/#comments</comments>
		<pubDate>Sun, 16 Jan 2011 01:03:29 +0000</pubDate>
		<dc:creator>Jared</dc:creator>
				<category><![CDATA[Bioinformatics Programming]]></category>
		<category><![CDATA[Sequence Analysis]]></category>
		<category><![CDATA[Tutorials]]></category>

		<guid isPermaLink="false">http://www.bioinformaticsbox.com/?p=77</guid>
		<description><![CDATA[This is a tutorial for NCBI&#8217;s BLAST+ tools (the successor to the legacy blastall program), which allow users to run various BLAST tools from their own machines. BLAST+ can be downloaded from NCBI&#8217;s website. BLAST is possibly one of the most popular and important bioinformatics tools. BLAST is an algorithm that allows researchers to search a [...]]]></description>
			<content:encoded><![CDATA[<p>This is a tutorial for <a href="http://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&#038;PAGE_TYPE=BlastDocs&#038;DOC_TYPE=Download" target="_blank">NCBI&#8217;s BLAST+</a> tools (the successor to the legacy blastall program), which allow users to run various BLAST tools from their own machines. BLAST+ can be downloaded from <a href="http://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&#038;PAGE_TYPE=BlastDocs&#038;DOC_TYPE=Download" target="_blank">NCBI&#8217;s</a> website.</p>
<p>BLAST is possibly one of the most popular and important bioinformatics tools. BLAST is an algorithm that allows researchers to search a database of sequences (DNA or amino acid) for matches or similarities to their query sequence. BLAST is a heuristic algorithm, so unlike the <a href="http://en.wikipedia.org/wiki/Smith-Waterman_algorithm" target="_blank">Smith-Waterman algorithm</a> it is not guaranteed to find the optimal local alignment, but it is much faster and therefore generally more useful for searching large databases. For a detailed look into the BLAST algorithm check out the <a href="http://en.wikipedia.org/wiki/BLAST#Algorithm" target="_blank">BLAST Wikipedia entry.</a></p>
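<p>To make the contrast with Smith-Waterman concrete, here is a minimal Python sketch of the Smith-Waterman scoring recurrence. It fills a dynamic-programming matrix cell by cell, so the work grows with the product of the two sequence lengths, which is what BLAST&#8217;s heuristics avoid. The scoring values (match 2, mismatch -1, linear gap -2) and the function name are assumptions chosen for illustration, not BLAST&#8217;s defaults.</p>

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between strings a and b.

    Scoring values are illustrative assumptions (linear gap penalty).
    """
    # prev holds the previous row of the dynamic-programming matrix
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Score for aligning a[i-1] with b[j-1]
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, so scores never drop below 0
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("TTACGTTT", "GGACGGG"))
```

<p>Every cell of the matrix is visited once, so comparing one query against every sequence in a large database this way quickly becomes expensive.</p>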
<p>The first step is to <a href="http://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&#038;PAGE_TYPE=BlastDocs&#038;DOC_TYPE=Download" target="_blank">download</a> and install the BLAST+ tools onto your machine. Once you have done that you&#8217;ll find there are a number of different tools in the package. The legacy <em>blastall</em> program selected the type of search with a -p option; BLAST+ instead provides a separate command for each search (<em>blastn</em>, <em>blastp</em>, <em>blastx</em>, &#8230;; see the table below). Here is an example of a nucleotide search with <em>blastn</em>:</p>
<p><code>$ blastn -db nt -evalue 10 -query inputfile -out outputfile</code></p>
<p>Common options are:<span id="more-77"></span><br />
-db: database to search; it must match the program, e.g. a nucleotide database such as nt for blastn. See section below.<br />
-evalue: expectation value threshold (defaults to 10)<br />
-query: input query sequence file (in FASTA format)<br />
-out: output file name<br />
All other options are displayed if you run a program with the -help option, for example &#8220;blastn -help&#8221;.</p>
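<p>If you are scripting searches, the command can be assembled as an argument list. The sketch below maps the blastall-style short options (-d, -e, -i, -o) onto their BLAST+ names (-db, -evalue, -query, -out), with the value formerly given to -p becoming the command itself. The helper name <em>blast_plus_command</em> and the mapping table are made up for this illustration, and nothing is executed here.</p>

```python
# Legacy blastall option -> BLAST+ option name (only the options discussed above)
LEGACY_TO_PLUS = {"-d": "-db", "-e": "-evalue", "-i": "-query", "-o": "-out"}


def blast_plus_command(program, legacy_options):
    """Build a BLAST+ argument list from blastall-style options.

    program: value formerly passed to -p (e.g. "blastn"); in BLAST+ it is the command.
    legacy_options: dict such as {"-d": "nt", "-i": "inputfile"}.
    """
    argv = [program]
    for flag, value in legacy_options.items():
        if flag not in LEGACY_TO_PLUS:
            raise ValueError(f"no mapping known for option {flag}")
        argv += [LEGACY_TO_PLUS[flag], str(value)]
    return argv


cmd = blast_plus_command(
    "blastn", {"-d": "nt", "-e": 10, "-i": "inputfile", "-o": "outputfile"}
)
print(" ".join(cmd))
```

<p>If BLAST+ is installed and on your PATH, the resulting list can be handed to Python&#8217;s subprocess.run(cmd, check=True) to run the search.</p>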
<h2>BLAST Programs</h2>
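<p>Each type of search has its own program. The legacy <em>blastall</em> selected it with the -p option, whereas in BLAST+ each one is run as its own command: <em>blastn</em>, <em>blastp</em>, <em>blastx</em>, <em>tblastn</em> and <em>tblastx</em> are commands of the same name, PSI-BLAST is run with <em>psiblast</em>, and megablast is a task of <em>blastn</em> (-task megablast). Below is a table of the different BLAST programs.</p>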
<table>
<tr style="background-color:#ddd;">
<td><strong>Program</strong></td>
<td><strong>Input-Output</strong></td>
<td><strong>Description</strong></td>
</tr>
<tr>
<td><strong>blastn</strong></td>
<td>nucleotide-nucleotide</td>
<td>This program, given a DNA query, returns the most similar DNA sequences from the DNA database that the user specifies.</td>
</tr>
<tr>
<td><strong>blastp</strong></td>
<td>protein-protein</td>
<td>This program, given a protein query, returns the most similar protein sequences from the protein database that the user specifies.</td>
</tr>
<tr>
<td><strong>blastx</strong></td>
<td>nucleotide 6-frame translation-protein</td>
<td>This program compares the six-frame conceptual translation products of a nucleotide query sequence (both strands) against a protein sequence database.</td>
</tr>
<tr>
<td><strong>tblastx</strong></td>
<td>nucleotide 6-frame translation-nucleotide 6-frame translation</td>
<td>This program is the slowest of the BLAST family. It translates the query nucleotide sequence in all six possible frames and compares it against the six-frame translations of a nucleotide sequence database. The purpose of tblastx is to find very distant relationships between nucleotide sequences.</td>
</tr>
<tr>
<td><strong>tblastn</strong></td>
<td>protein-nucleotide 6-frame translation</td>
<td>This program compares a protein query against all six reading frames of a nucleotide sequence database.</td>
</tr>
<tr>
<td><strong>psi-blast</strong></td>
<td>position-specific iterative</td>
<td>This program is used to find distant relatives of a protein. First, a list of all closely related proteins is created. These proteins are combined into a general &#8220;profile&#8221; sequence, which summarises significant features present in these sequences. A query against the protein database is then run using this profile, and a larger group of proteins is found. This larger group is used to construct another profile, and the process is repeated. By including related proteins in the search, PSI-BLAST is much more sensitive in picking up distant evolutionary relationships than a standard protein-protein BLAST.</td>
</tr>
<tr>
<td><strong>megablast</strong></td>
<td>large numbers of query sequences</td>
<td>When comparing large numbers of input sequences via the command-line BLAST, &#8220;megablast&#8221; is much faster than running BLAST multiple times. It concatenates many input sequences together to form a large sequence before searching the BLAST database, then post-analyzes the search results to glean individual alignments and statistical values.</td>
</tr>
</table>
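<p>The &#8220;6-frame translation&#8221; used by blastx, tblastn and tblastx can be sketched in a few lines of Python. This is only an illustration of the idea, assuming the standard genetic code and plain A/C/G/T input; the frame labels (+1 to -3) and function names are my own, and the BLAST programs themselves do this internally.</p>

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...)
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an A/C/G/T string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate whole codons; unknown codons (e.g. containing N) become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna):
    """Return a dict mapping frame labels (+1..+3, -1..-3) to protein strings."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


for label, protein in six_frame_translation("ATGGCCTAA").items():
    print(label, protein)
```

<p>Three frames come from the sequence as given and three from its reverse complement, which is why a translated search does roughly six times the work of a plain protein search, and why tblastx, which translates both the query and the database, is the slowest.</p>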
<h2>Database Usage</h2>
<p>When you specify a database to query (using the -db option, or -d in the legacy blastall) you need to have a local copy of whichever database you want to use. If you want to use one of NCBI&#8217;s databases then you first need to download a copy of the database using the <a href="http://www.ncbi.nlm.nih.gov/blast/docs/update_blastdb.pl" target="_blank">update_blastdb.pl</a> script. Alternatively, if you have your own sequences that you want to use as your database then you need to use the <em>makeblastdb</em> tool (formerly formatdb) in order to format the sequences in a way that the BLAST programs can read. Here is an example of how to do this:</p>
<p><code>$ makeblastdb -in myseqs.fa -dbtype nucl -out my_db</code></p>
<p>Common options are:<br />
-in: input sequence file (in FASTA format)<br />
-out: name of database to be created<br />
-dbtype: type of the sequences, nucl for nucleotide or prot for protein</p>
<p>Once you have created your local database you can then try performing a BLAST on it:</p>
<p><code>$ blastn -db my_db -query query_seq.fa -out outputfile</code></p>
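<p>Before building a database it can help to check what is actually in your FASTA file. The Python sketch below reads FASTA records, guesses whether they are nucleotide or protein, and assembles a makeblastdb argument list using the BLAST+ options -in, -dbtype and -out. The function names and the 90% threshold are assumptions made for this illustration; the guess is a rough heuristic and is not something makeblastdb does for you.</p>

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def guess_dbtype(sequences, threshold=0.9):
    """Guess 'nucl' or 'prot' from the fraction of A/C/G/T/U/N characters.

    The threshold is an arbitrary assumption for this sketch.
    """
    text = "".join(sequences).upper()
    if not text:
        raise ValueError("no sequence data")
    nucleotide_like = sum(text.count(base) for base in "ACGTUN")
    return "nucl" if nucleotide_like / len(text) >= threshold else "prot"


def makeblastdb_command(fasta_path, dbtype, db_name):
    """Return the argument list for building a BLAST+ database."""
    return ["makeblastdb", "-in", fasta_path, "-dbtype", dbtype, "-out", db_name]


example = [">seq1 example record", "ACGTACGT", "TTGACA", ">seq2", "GGGCCCAT"]
records = list(read_fasta(example))
dbtype = guess_dbtype(seq for _, seq in records)
print(" ".join(makeblastdb_command("myseqs.fa", dbtype, "my_db")))
```

<p>For a real file you would pass an open file handle to read_fasta instead of the example list. Remember that the database type has to match the search: blastn needs a nucleotide database, blastp a protein one.</p>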
<p>Hopefully this tutorial will help get you started using command line BLAST. If you have any questions about functionality not mentioned in this tutorial, leave a comment below and I will answer them.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.bioinformaticsbox.com/blast-blastall-tutorial/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>BioPerl Modules</title>
		<link>http://www.bioinformaticsbox.com/bioperl-modules/</link>
		<comments>http://www.bioinformaticsbox.com/bioperl-modules/#comments</comments>
		<pubDate>Mon, 01 Nov 2010 19:13:13 +0000</pubDate>
		<dc:creator>Jared</dc:creator>
				<category><![CDATA[Bioinformatics Programming]]></category>

		<guid isPermaLink="false">http://www.bioinformaticsbox.com/blog/?p=51</guid>
		<description><![CDATA[Perl has long been a popular programming language for bioinformatics programmers, due in part to its exceptional text searching and manipulation capabilities. It is also an easy-to-use yet powerful scripting language. Many people who have done any bioinformatics programming have written a bit of Perl and, hopefully, used BioPerl. BioPerl is a great [...]]]></description>
			<content:encoded><![CDATA[<p>Perl has long been a popular programming language for bioinformatics programmers, due in part to its exceptional text searching and manipulation capabilities. It is also an easy-to-use yet powerful scripting language. Many people who have done any bioinformatics programming have written a bit of Perl and, hopefully, used BioPerl.</p>
<p>BioPerl is a great set of open source modules for Perl programming. These modules simplify many of the common tasks that bioinformatics programmers regularly deal with. BioPerl saves the programmer a lot of time, so it is worth putting in a little effort to become familiar with the modules.</p>
<p>Here are some of the tasks that BioPerl has modules for: <span id="more-51"></span></p>
<ul>
<li>Accessing sequence data from local and remote databases</li>
<li>Transforming formats of database/file records</li>
<li>Manipulating individual sequences</li>
<li>Searching for similar sequences</li>
<li>Creating and manipulating sequence alignments</li>
<li>Searching for genes and other structures on genomic DNA</li>
<li>Developing machine-readable sequence annotations</li>
</ul>
<p>Head on over to <a href="http://www.bioperl.org" target="_blank">BioPerl&#8217;s Website</a> and learn more about BioPerl.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.bioinformaticsbox.com/bioperl-modules/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>What is Bioinformatics?</title>
		<link>http://www.bioinformaticsbox.com/what-is-bioinformatics/</link>
		<comments>http://www.bioinformaticsbox.com/what-is-bioinformatics/#comments</comments>
		<pubDate>Sun, 25 Jul 2010 21:47:46 +0000</pubDate>
		<dc:creator>Jared</dc:creator>
				<category><![CDATA[General Bioinformatics]]></category>

		<guid isPermaLink="false">http://www.bioinformaticsbox.com/blog/?p=35</guid>
		<description><![CDATA[I frequently get asked &#8220;What is Bioinformatics?&#8221; The question has no exact, easy answer because of the size and complexity of the Bioinformatics field, and I often find myself giving a different answer each time it is asked. I will attempt to answer this question in a way that won&#8217;t [...]]]></description>
			<content:encoded><![CDATA[<p>I frequently get asked &#8220;What is Bioinformatics?&#8221; The question has no exact, easy answer because of the size and complexity of the Bioinformatics field, and I often find myself giving a different answer each time it is asked. I will attempt to answer this question in a way that won&#8217;t be too complex but also not too shallow. Inevitably, though, I will have to leave some of the more complex parts of Bioinformatics out of this answer.</p>
<p>The basic answer is that Bioinformatics is the field where Computer Science, Biology, and Statistics meet. But even for a basic answer that is too vague. I like to think of it as using Computer Science and Statistics to find and solve biological problems. Personally, I see Bioinformatics as having two main focuses: <span id="more-35"></span>Data Management and Research.</p>
<h2>Data Management</h2>
<p>A good chunk of Bioinformatics is devoted purely to Data Management, which is not an easy task considering the amount of data that scientists generate. In the age of high-throughput machines, gigabytes of biological data can be produced in a matter of minutes or even seconds, particularly by machines that sequence DNA and proteins or measure gene expression levels in a cell. The challenge lies in storing this data in databases in a way that preserves it and keeps it easily accessible for current and future research. The use of a number of different file formats adds even more complexity to this challenge.</p>
<p>Keeping the data accessible involves programming front-end applications for these databases so that large amounts of data can be viewed and interpreted. Extra care is taken to use efficient search algorithms so that the stored data can be searched quickly. Computationally expensive algorithms are used to align sequences in order to find similar or exact matches. Statistical and data mining techniques sometimes have to be used to extract the desired information from these vast databases.</p>
<h2>Research</h2>
<p>Bioinformatics is a research field, so even within data management there is research being performed. Faster and more accurate computational algorithms are always sought after. New methods for biological data analysis are frequently being developed. Bioinformaticians use genomic and gene expression data to find new protein-protein interactions within cells. Algorithms and statistical methods are used to predict the location of genes within long DNA sequences. Computational methods are also used to predict the complex structures of DNA and protein molecules.</p>
<p>The list of research being performed goes on and on. The current research and future potential of Bioinformatics are what make this field exciting.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.bioinformaticsbox.com/what-is-bioinformatics/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

