<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Ion Flux</title>
	<atom:link href="http://ionflux.com/blog/feed/" rel="self" type="application/rss+xml" />
	<link>http://ionflux.com/blog</link>
	<description>Clinical Genomics + High-Performance Computing</description>
	<lastBuildDate>Wed, 28 Sep 2011 02:02:58 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.2</generator>
		<item>
		<title>ANNOVAR Performance Optimizations</title>
		<link>http://ionflux.com/blog/2011/09/27/annovar-performance-optimizations/</link>
		<comments>http://ionflux.com/blog/2011/09/27/annovar-performance-optimizations/#comments</comments>
		<pubDate>Wed, 28 Sep 2011 01:59:28 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://ionflux.com/blog/?p=59</guid>
		<description><![CDATA[Ion Flux contributed major performance improvements to the latest version (2011Sep11) of ANNOVAR. From the ANNOVAR website: 2011Sep11: New Version of ANNOVAR is released with significant speedup of filter operation for certain databases (dbSNP, SIFT, PolyPhen, 1000G, etc). In previous version of ANNOVAR, filter-based annotation for ex1.human (12 variants) requires ~10 minutes for snp132, sift [...]]]></description>
			<content:encoded><![CDATA[<p>Ion Flux contributed major performance improvements to the latest version (2011Sep11) of <a href="http://www.openbioinformatics.org/annovar/">ANNOVAR</a>.</p>
<p>From the ANNOVAR website:</p>
<blockquote><p>2011Sep11: New Version of ANNOVAR is released with significant speedup of filter operation for certain databases (dbSNP, SIFT, PolyPhen, 1000G, etc). <strong>In previous version of ANNOVAR, filter-based annotation for ex1.human (12 variants) requires ~10 minutes for snp132, sift or polyphen. In the new version, it takes 1 second only!</strong> [...]
</p></blockquote>
<p>We like to see open-source software projects thrive, and we&#8217;re happy to contribute where we&#8217;re able to do so.  Thanks to ANNOVAR&#8217;s author, Kai Wang, for providing such a useful piece of software, and to Ion Flux engineer Marine Huang for optimizing the implementation.</p>
]]></content:encoded>
			<wfw:commentRss>http://ionflux.com/blog/2011/09/27/annovar-performance-optimizations/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Elastic Analysis Pipelines at Amazon AWS</title>
		<link>http://ionflux.com/blog/2011/08/31/elastic-analysis-pipelines-at-amazon-aws/</link>
		<comments>http://ionflux.com/blog/2011/08/31/elastic-analysis-pipelines-at-amazon-aws/#comments</comments>
		<pubDate>Wed, 31 Aug 2011 08:09:23 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://ionflux.com/blog/?p=54</guid>
		<description><![CDATA[Ion Flux founder and CEO Allen Day will be presenting the topic &#8220;Elastic Analysis Pipelines&#8221; at Amazon&#8217;s AWS Genomics Event on September 22. UPDATE: Slides from Allen&#8217;s presentation are now available here]]></description>
			<content:encoded><![CDATA[<p>Ion Flux founder and CEO Allen Day will be presenting the topic &#8220;Elastic Analysis Pipelines&#8221; at Amazon&#8217;s <a href="http://aws.amazon.com/genomicsevent/">AWS Genomics Event</a> on September 22.</p>
<p>UPDATE: Slides from Allen&#8217;s presentation are now available <a href='http://ionflux.com/blog/wp-content/uploads/2011/08/Amazon-2011-09.pdf'>here<br/><br />
<img src="http://ionflux.com/blog/wp-content/uploads/2011/08/Screen-shot-2011-09-27-at-6.47.19-PM-300x224.png" alt="" title="Screen shot 2011-09-27 at 6.47.19 PM" width="300" height="224" class="alignnone size-medium wp-image-58" /></a></p>
]]></content:encoded>
			<wfw:commentRss>http://ionflux.com/blog/2011/08/31/elastic-analysis-pipelines-at-amazon-aws/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Consequence Analysis of Ion Torrent&#8217;s Gordon Moore data</title>
		<link>http://ionflux.com/blog/2011/07/28/consequence-analysis-of-ion-torrents-gordon-moore-data/</link>
		<comments>http://ionflux.com/blog/2011/07/28/consequence-analysis-of-ion-torrents-gordon-moore-data/#comments</comments>
		<pubDate>Fri, 29 Jul 2011 06:39:15 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://ionflux.com/blog/?p=37</guid>
		<description><![CDATA[Now that Gordon Moore&#8217;s genome variants are available as part of Ion Torrent&#8217;s publication in Nature this week, we&#8217;re announcing some work we&#8217;ve done analyzing an early, low-coverage version of these data. Ion Flux has been building highly scalable systems for analyzing personal genome data. One of these systems produces a gene panel consequence report. The [...]]]></description>
			<content:encoded><![CDATA[<p>Now that <a href="http://en.wikipedia.org/wiki/Gordon_Moore">Gordon Moore</a>&#8217;s genome variants are available as part of <a href="http://www.nature.com/nature/journal/v475/n7356/full/nature10242.html">Ion Torrent&#8217;s publication</a> in Nature this week, we&#8217;re announcing some work we&#8217;ve done analyzing an early, low-coverage version of these data.</p>
<p>Ion Flux has been building highly scalable systems for analyzing personal genome data.  One of these systems produces a gene panel consequence report.  The report&#8217;s purpose is to indicate where variants have been detected in a group of genes and what, if any, the consequences of those variants are.  We specifically chose to analyze a small panel of well-known cancer genes, many of which have treatment options that are available or in clinical trials.</p>
<p>You can have a look at some demo output of our <a href="http://ionflux.com/demo/genepanel/">gene panel consequence report</a>.  <strong>We emphasize that the information content of this report is based on pre-publication, low-coverage Ion Torrent data and is insufficient to make consequence claims with any acceptable level of confidence.</strong></p>
<p><a href="http://ionflux.com/blog/wp-content/uploads/2011/07/Screen-shot-2011-07-29-at-2.50.53-PM.png"><img src="http://ionflux.com/blog/wp-content/uploads/2011/07/Screen-shot-2011-07-29-at-2.50.53-PM-300x248.png" alt="" title="Screen shot 2011-07-29 at 2.50.53 PM" width="300" height="248" class="alignnone size-medium wp-image-50" /></a></p>
<p>We encourage you to also have a look at the <a href="http://snpedia.com/index.php/SNPedia">SNPedia</a>-based <a href="http://files.snpedia.com/reports/genome_gordon_moore.html">Promethease report</a> on Moore&#8217;s variants that were published.</p>
]]></content:encoded>
			<wfw:commentRss>http://ionflux.com/blog/2011/07/28/consequence-analysis-of-ion-torrents-gordon-moore-data/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Cloud-based Human Genome Alignment</title>
		<link>http://ionflux.com/blog/2011/06/24/cloud-based-human-genome-alignment/</link>
		<comments>http://ionflux.com/blog/2011/06/24/cloud-based-human-genome-alignment/#comments</comments>
		<pubDate>Fri, 24 Jun 2011 23:00:26 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://ionflux.com/blog/?p=32</guid>
		<description><![CDATA[Motivation The purpose of generating the data set described here is two-fold: We want to better understand the sensitivity and specificity of particular sequences/regions in the genome, ultimately leading to faster, higher quality genome alignments. We want to establish time and cost benchmarks for processing large volumes of Human genome data. Methods We processed a [...]]]></description>
			<content:encoded><![CDATA[<h3>Motivation</h3>
<p>The purpose of generating the data set described here is two-fold:</p>
<ol>
<li>We want to better understand the sensitivity and specificity of particular sequences/regions in the genome, ultimately leading to faster, higher quality genome alignments.</li>
<li>We want to establish time and cost benchmarks for processing large volumes of Human genome data.</li>
</ol>
<h3>Methods</h3>
<p>We processed a 1bp-step, 50bp sliding-window data set of Human Genome build 19 using Ion Torrent&#8217;s <a href="http://lifetech-it.hosted.jivesoftware.com/tags?tags=tmap">TMAP</a> algorithm (from author <a href="http://www.nilshomer.com/">Dr. Nils Homer</a>).  The specificity/sensitivity analysis is underway now and will be published here in a follow-up blog post.</p>
<h3>Results</h3>
<p>Let&#8217;s look at the gross statistics of the operation:</p>
<h4>How were the data generated?</h4>
<p>Input was a 1bp-step, 50mer tiling data set generated from HG19.  We&#8217;re not hosting these data; you can generate them yourself with this <a href='http://ionflux.com/blog/wp-content/uploads/2011/06/make_reads.pl_.txt'>make_reads.pl</a> script.</p>
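<p>The tiling itself is easy to reproduce. Below is a minimal Python sketch (not the make_reads.pl script itself) that emits every 50bp window at a 1bp step on both strands of an in-memory sequence. The function names, the 1-based start convention, and representing <code>-</code> strand reads as reverse complements are assumptions; how make_reads.pl handles N-runs and chromosome ends is not stated here, so this sketch does not special-case them.</p>

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (N stays N)."""
    return seq.translate(str.maketrans("ACGTNacgtn", "TGCANtgcan"))[::-1]


def tile_reads(seq: str, read_len: int = 50, step: int = 1):
    """Yield (strand, start, read) for each read_len-base window of seq.

    start is the 1-based position of the window's first base on the
    forward strand; '-' reads are the reverse complement of that window.
    """
    for i in range(0, len(seq) - read_len + 1, step):
        window = seq[i:i + read_len]
        yield "+", i + 1, window
        yield "-", i + 1, reverse_complement(window)
```

<p>For example, <code>list(tile_reads("ACGTACGTAC", read_len=4))</code> yields 7 windows on each strand, 14 reads in total. A real run over HG19 would stream chromosomes from a FASTA file rather than hold them as strings.</p>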
<p>Output was a set of SAM files from TMAP, which were then post-processed to a more compact form.  See the <a href="#Data">Data</a> section, below.</p>
<h4>How much data?</h4>
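<p>Each of the ~3 billion start positions in the genome yields one 50bp read per strand: 3 gigabases * 2 strands * 50 bases/read = 300 gigabases (0.3 terabases) of sequence input.  Because the reads overlap at a 1bp step, every base is covered by 50 reads per strand, i.e. a 50x uniform-coverage Human genome on each strand.</p>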
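<p>Counting one read per start position per strand, the input volume follows directly from the tiling parameters, and the 50x coverage is a consequence of the 50bp read length and 1bp step rather than an extra multiplier. A minimal sketch of that arithmetic, assuming a round 3 Gb genome (HG19 is slightly larger and contains N gaps) and treating the genome as one contiguous sequence; <code>tiling_volume</code> is a hypothetical helper name:</p>

```python
def tiling_volume(genome_bases: int, read_len: int = 50, step: int = 1, strands: int = 2):
    """Reads, total bases and per-strand coverage of a uniform tiling.

    One read starts at every step-th position of each strand, so each base
    is covered by read_len / step reads per strand (ignoring sequence ends).
    """
    windows_per_strand = (genome_bases - read_len) // step + 1
    reads = windows_per_strand * strands
    total_bases = reads * read_len
    coverage_per_strand = read_len / step
    return reads, total_bases, coverage_per_strand


reads, total_bases, coverage = tiling_volume(3_000_000_000)
print(f"{reads:.2e} reads, {total_bases / 1e12:.2f} terabases, {coverage:.0f}x per strand")
```

<p>This prints about 6e9 reads and 0.30 terabases at 50x per strand. Splitting the genome into separate chromosomes only trims read_len - 1 windows per chromosome end, which is negligible at this scale.</p>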
<h4>How did you do it?</h4>
<p>We&#8217;re using Amazon&#8217;s cloud.  Check out <a href="http://aws.amazon.com/solutions/case-studies/ion-flux/">Amazon&#8217;s case study of Ion Flux</a>.</p>
<h4>How long did it take?  How much did it cost?</h4>
<p>It took less than one business day, and we can easily bring that time down much further.  Contact us [<a href="mailto:media@ionflux.com">media</a>] [<a href="mailto:bd@ionflux.com">business</a>] if you want to know more about our pricing and methodology.</p>
<h3 id="Data">Data</h3>
<p>Raw outputs are in the S3 bucket here: <a href="http://tmap-mapability.s3.amazonaws.com/">http://tmap-mapability.s3.amazonaws.com/</a>.  Here&#8217;s a snippet:</p>
<pre>
12:+111149052	12:+111149052	3	100
12:-111149052	12:-111149052	1	100
12:+111149053	12:+111149053	3	100
12:-111149053	12:-111149053	3	100
12:-111149054	12:-111149054	1	100
12:+111149054	12:+111149054	3	100
</pre>
<p>The format is four tab-separated columns:</p>
<pre>
[readChromosomeName]:[readStrand][readPosition]	[targetChromosomeName]:[targetStrand][targetPosition]	[mappingAlgorithm]	[score]
</pre>
<p>The <code>*Strand</code> fields take the value <code>+</code> or <code>-</code>.  Read lengths are always 50bp, so you can figure out the actual sequence aligned to the target from the read coordinates.  The <code>mappingAlgorithm</code> field is one of <code>{1, 2, 3}</code>.</p>
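<p>A minimal Python sketch for reading these rows, assuming the four fields are tab-separated as in the snippet and that <code>score</code> is always an integer (true of the sample rows, but not stated for the full data set). The class and function names are hypothetical. Comparing the read locus with the target locus is one simple way to flag reads that did not map back to where they were cut from, which is the raw material for a mapability analysis.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Locus:
    chrom: str
    strand: str  # '+' or '-'
    pos: int


@dataclass(frozen=True)
class Hit:
    read: Locus     # where the 50bp read was generated from
    target: Locus   # where the aligner placed it
    algorithm: int  # mappingAlgorithm, one of {1, 2, 3}
    score: int      # assumed integer, as in the sample rows


def parse_locus(field: str) -> Locus:
    """Parse 'chrom:+pos' or 'chrom:-pos'."""
    chrom, rest = field.rsplit(":", 1)
    strand, pos = rest[:1], rest[1:]
    if strand not in ("+", "-"):
        raise ValueError(f"unexpected strand in {field!r}")
    return Locus(chrom, strand, int(pos))


def parse_hit(line: str) -> Hit:
    """Parse one tab-separated output row."""
    read, target, algorithm, score = line.rstrip("\n").split("\t")
    return Hit(parse_locus(read), parse_locus(target), int(algorithm), int(score))


def maps_to_origin(hit: Hit) -> bool:
    """True if the read was aligned back to the locus it was generated from."""
    return hit.read == hit.target
```

<p>Every row in the sample snippet maps back to its origin; rows where <code>maps_to_origin</code> is false would mark positions where a 50mer is ambiguous or misaligned.</p>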
<h3>Where is the analysis?</h3>
<p>We&#8217;ll be publishing data and some pretty pictures of the sensitivity/specificity analysis soon. Stay tuned!</p>
]]></content:encoded>
			<wfw:commentRss>http://ionflux.com/blog/2011/06/24/cloud-based-human-genome-alignment/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Human Genomics Crash Course 2</title>
		<link>http://ionflux.com/blog/2011/06/20/human-genomics-102/</link>
		<comments>http://ionflux.com/blog/2011/06/20/human-genomics-102/#comments</comments>
		<pubDate>Mon, 20 Jun 2011 20:26:03 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://ionflux.com/blog/?p=31</guid>
		<description><![CDATA[We’re posting a series of blog posts explaining genetics for the layperson in the context of their health. This is part two of the series, and describes how de novo genome sequencing is done. Other posts in this series: Part 1, genotypes, phenotypes, and polymorphism This video describes how DNA is extracted from a biological [...]]]></description>
			<content:encoded><![CDATA[<p>We’re posting a series of blog posts explaining genetics for the layperson in the context of their health.</p>
<p>This is part two of the series, and describes how de novo genome sequencing is done.  Other posts in this series:</p>
<ol>
<li><a href="/blog/2011/06/13/human-genomics-101/">Part 1, genotypes, phenotypes, and polymorphism</a></li>
</ol>
<p>This video describes how DNA is extracted from a biological tissue sample and prepared to form a &#8220;library&#8221; of bacterial clones that contain small pieces of the sample&#8217;s DNA.  These bacteria are then grown in culture and sequenced using the Sanger method:</p>
<p>After the DNA is read out, it needs to be assembled.  This traditional method of sequencing and assembly focuses on reading out longer sequences, which can be assembled with fewer readouts of each position:</p>
<p>Contrast traditional sequencing with shotgun sequencing and assembly, shown here:</p>
<p>The main considerations when choosing between traditional and shotgun methods of sequencing are, roughly:</p>
<ul>
<li>Price per base sequenced (physical materials)</li>
<li>Price per base assembled (computing costs)</li>
<li>The shotgun method produces sequence at a higher rate, but with more errors and more assembly problems</li>
</ul>
<p>Lower costs for sequencing each base favor the shotgun method over the traditional method.  Shotgun sequencing is the method of choice today because of decreases in price-per-base sequenced, to be discussed in a later article.</p>
]]></content:encoded>
			<wfw:commentRss>http://ionflux.com/blog/2011/06/20/human-genomics-102/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Human Genomics Crash Course 1</title>
		<link>http://ionflux.com/blog/2011/06/13/human-genomics-101/</link>
		<comments>http://ionflux.com/blog/2011/06/13/human-genomics-101/#comments</comments>
		<pubDate>Mon, 13 Jun 2011 20:18:49 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://ionflux.com/blog/?p=29</guid>
		<description><![CDATA[We&#8217;re posting a series of blog posts explaining genetics for the layperson in the context of their health. This is part one of the series, and introduces the fundamental genetics concepts of genotype, phenotype, polymorphism and a vignette of how genomic data can be applied in a clinical setting. We&#8217;ll use some of the excellent [...]]]></description>
			<content:encoded><![CDATA[<p>We&#8217;re posting a series of blog posts explaining genetics for the layperson in the context of their health.</p>
<p>This is part one of the series, and introduces the fundamental genetics concepts of <a href="http://en.wikipedia.org/wiki/Genotype">genotype</a>, <a href="http://en.wikipedia.org/wiki/Phenotype">phenotype</a>, <a href="http://en.wikipedia.org/wiki/Polymorphism_(biology)">polymorphism</a> and a vignette of how genomic data can be applied in a clinical setting.  We&#8217;ll use some of the excellent marketing material from <a href="https://www.23andme.com/">the personal genotyping company 23andMe</a>.</p>
<p><a href="http://en.wikipedia.org/wiki/Gene">Gene</a>s: background.</p>
<p><iframe width="425" height="349" src="http://www.youtube.com/embed/eOvMNOMRRm8" frameborder="0" allowfullscreen></iframe></p>
<p><a href="http://en.wikipedia.org/wiki/Single-nucleotide_polymorphism">SNP</a>s: some of what we find through our data analysis are SNPs.  A SNP is a specific type of <a href="http://en.wikipedia.org/wiki/Polymorphism_(biology)">polymorphism</a>, or sequence variant.  We&#8217;re looking for SNPs with some effect (a functional consequence).</p>
<p><iframe width="425" height="349" src="http://www.youtube.com/embed/5raJePXu0OQ" frameborder="0" allowfullscreen></iframe></p>
<p>Heredity: background.</p>
<p><iframe width="425" height="349" src="http://www.youtube.com/embed/lJzZ7p-47P8" frameborder="0" allowfullscreen></iframe></p>
<p>Genotypes, phenotypes and medicine.</p>
<p><iframe width="425" height="349" src="http://www.youtube.com/embed/jHWJqzlHl3w" frameborder="0" allowfullscreen></iframe><br />
<iframe width="560" height="349" src="http://www.youtube.com/embed/JTIY310FGBU" frameborder="0" allowfullscreen></iframe></p>
]]></content:encoded>
			<wfw:commentRss>http://ionflux.com/blog/2011/06/13/human-genomics-101/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>AWS Ion Flux Case Study</title>
		<link>http://ionflux.com/blog/2011/06/08/aws-ion-flux-case-study/</link>
		<comments>http://ionflux.com/blog/2011/06/08/aws-ion-flux-case-study/#comments</comments>
		<pubDate>Wed, 08 Jun 2011 22:02:43 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://ionflux.com/blog/?p=27</guid>
		<description><![CDATA[Our case-study with Amazon AWS was published today: http://aws.amazon.com/solutions/case-studies/ion-flux/]]></description>
			<content:encoded><![CDATA[<p><img alt="" src="http://d36cz9buwru1tt.cloudfront.net/logo_aws.gif" title="Amazon AWS Logo" class="alignnone" width="164" height="60" /></p>
<p>Our case-study with Amazon AWS was published today: <a href="http://aws.amazon.com/solutions/case-studies/ion-flux/">http://aws.amazon.com/solutions/case-studies/ion-flux/</a></p>
]]></content:encoded>
			<wfw:commentRss>http://ionflux.com/blog/2011/06/08/aws-ion-flux-case-study/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Genome Coverage Irregularities</title>
		<link>http://ionflux.com/blog/2011/05/06/genome-coverage-irregularities/</link>
		<comments>http://ionflux.com/blog/2011/05/06/genome-coverage-irregularities/#comments</comments>
		<pubDate>Sat, 07 May 2011 01:07:54 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://ionflux.com/blog/?p=16</guid>
		<description><![CDATA[We&#8217;ve recently been working with a next-gen sequencing data set. I want to point out a feature of the data that we didn&#8217;t expect to see, namely that genome coverage has very large inter-region variance.  Check out this z-score normalized distribution of read coverage per 1Mbase genomic region: Not beautiful.  There is some ~600sd [...]]]></description>
			<content:encoded><![CDATA[<p>We&#8217;ve recently been working with a next-gen sequencing data set.</p>
<p>I want to point out a feature of the data that we didn&#8217;t expect to see, namely that <em>genome coverage has very large inter-region variance</em>.  Check out this z-score normalized distribution of read coverage per 1Mbase genomic region:</p>
<p><a href="http://ionflux.com/blog/wp-content/uploads/2011/05/z.all_.png"><img class="alignnone size-full wp-image-21" title="z.all" src="http://ionflux.com/blog/wp-content/uploads/2011/05/z.all_.png" alt="" width="480" height="480" /></a></p>
<p>Not beautiful.  There is some ~600sd outlier causing trouble.  This is interesting simply because it is surprising.</p>
<p>It&#8217;s also interesting for practical reasons: some assumptions built into our software design aren&#8217;t suited to this kind of very irregular, very high coverage.  The good news is that we can clean this up without much effort.  Read counts per region are presumed to be Poisson-distributed, which is close to normal at higher levels of coverage.  Let&#8217;s see what happens if we trim off the top decile of data:</p>
<p><a href="http://ionflux.com/blog/wp-content/uploads/2011/05/z.lopass.png"><img title="z.lopass" src="http://ionflux.com/blog/wp-content/uploads/2011/05/z.lopass.png" alt="" width="480" height="480" /></a></p>
<p>Better.  What&#8217;s up with this blip at -10z?  Oh, and yes, it does start to look normal if we bandpass and chop off the bottom decile as well:</p>
<p><a href="http://ionflux.com/blog/wp-content/uploads/2011/05/z.bandpass.png"><img title="z.bandpass" src="http://ionflux.com/blog/wp-content/uploads/2011/05/z.bandpass.png" alt="" width="480" height="480" /></a></p>
<p>There&#8217;s some R code below for how I went about this.  I&#8217;m also attaching a <a href='http://ionflux.com/blog/wp-content/uploads/2011/05/genome_coverage_zscores.txt'>data file</a> of the overall z-score coverage for all ~3K 1Mbase genomic regions.</p>
<p>The format of the file is like:</p>
<pre>
chr1-20-143	-0.410984633519499
chr13-20-100	-0.271009467956187
chr14-20-69	-0.190734553729517
[...]
</pre>
<p>Column 2 is the z-score.  Column 1 is the genome region, labeled <code>&lt;chromosome name&gt;-&lt;bits&gt;-&lt;region offset&gt;</code>.  Bits determines the region size: here we use bits=20, so regions are 2<sup>20</sup> bases (~1 Mbase).</p>
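<p>A small Python sketch for turning a region label back into coordinates and pulling the outlying regions out of the data file. It assumes the region offset counts 2<sup>bits</sup>-sized regions from the start of the chromosome (0-based start, exclusive end); the function names are hypothetical, and the -10/30 cut-offs simply mirror the ones used in the R code below.</p>

```python
from typing import Iterable, Iterator, Tuple


def region_bounds(label: str) -> Tuple[str, int, int]:
    """Map a 'chrom-bits-offset' label to (chrom, start, end).

    Assumes regions are 2**bits bases long and offset counts regions from
    the start of the chromosome; start is 0-based, end is exclusive.
    """
    chrom, bits, offset = label.rsplit("-", 2)
    size = 2 ** int(bits)
    start = int(offset) * size
    return chrom, start, start + size


def outliers(lines: Iterable[str], low: float = -10.0, high: float = 30.0) -> Iterator[Tuple[str, float]]:
    """Yield (label, z) for 'label TAB z' lines whose z is outside [low, high]."""
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        label, z_text = line.split("\t")
        z = float(z_text)
        if z > high or low > z:
            yield label, z
```

<p>Under these assumptions, <code>region_bounds("chr1-20-143")</code> gives <code>("chr1", 149946368, 150994944)</code>, and the outlier list can be fed straight into a genome browser or an intersection with known copy-number regions.</p>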
<p>I might take a closer look at what portions of the genome are falling into this blip at -10z.  It&#8217;s at half coverage of the main data mode&#8230; maybe a copy number variant?</p>
<pre>
# Input: one row per 1Mbase region; column 1 = read count, column 2 = region label
bin.size = 2**20;   # region size in bases (bits = 20)
read.length = 90;   # read length of this data set, in bases
bc = read.table('~/Desktop/bin_counts.dat',header=FALSE);
# Drop the bottom decile (hipass), the top decile (lopass), or both (bandpass)
bc.hipass = bc[bc[,1]&gt;quantile(bc[,1],0.1),];
bc.lopass = bc[bc[,1]&lt;quantile(bc[,1],0.9),];
bc.bandpass = bc[bc[,1]&lt;quantile(bc[,1],0.9)&amp;bc[,1]&gt;quantile(bc[,1],0.1),];
# Per-region coverage = read count * read length / region size;
# mean and sd are estimated from the bandpassed regions only
coverage.mean = mean(read.length*bc.bandpass[,1]/bin.size);
coverage.sd = sd(read.length*bc.bandpass[,1]/bin.size);
# z-scores for each subset, all relative to the bandpass mean and sd
z.bandpass = ((read.length*bc.bandpass[,1]/bin.size) - coverage.mean) / coverage.sd;
z.hipass = ((read.length*bc.hipass[,1]/bin.size) - coverage.mean) / coverage.sd;
z.lopass = ((read.length*bc.lopass[,1]/bin.size) - coverage.mean) / coverage.sd;
z = ((read.length*bc[,1]/bin.size) - coverage.mean) / coverage.sd;
# Label z-scores with region names; write out regions with z &gt; 30 or z &lt; -10
z.label = as.matrix(z);
rownames(z.label) = bc[,2];
write(names(z.label[z.label&gt;30|z.label&lt;(-10),]),"/Users/allenday/Desktop/genome_coverage_z10.txt");
# Write every region's z-score as a tab-separated file (the attached data file)
write.table(cbind(as.character(bc[,2]),z),"/Users/allenday/Desktop/genome_coverage_zscores.txt",quote=FALSE,row.names=FALSE,col.names=FALSE,sep="\t");
hist(z.lopass);
hist(z.bandpass);
hist(z.hipass);
</pre>
]]></content:encoded>
			<wfw:commentRss>http://ionflux.com/blog/2011/05/06/genome-coverage-irregularities/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

<!-- Performance optimized by W3 Total Cache. Learn more: http://www.w3-edge.com/wordpress-plugins/

Page Caching using disk (enhanced)

Served from: ionflux.com @ 2012-05-19 17:31:31 -->
