June 18th, 2019 - BioFrontiers Institute - Boulder, CO

This year's conference will feature an exciting array of speakers who approach data visualization from many perspectives:

Maria Nattestad - DNAnexus -> Google - CV (Keynote)


Data Visualization for Genomic Insights


We will explore the many purposes of visualizing data and evaluate them as they apply to genomics and to scientific research in general. Through several case studies, we will see why visualization is hard to do well in genomics, where the data are complex, and why that same complexity makes an intuitive view of the data both powerful and necessary. Visualization is an important tool for shifting perspectives and reaching new insights, even from the same data. Our case studies will include seeing long-range structural variations line up with copy number variants to find a missing variant call that unlocks the history of an oncogene amplification. We will see that traditional genome browsers hide the best that long-read sequencing data has to offer. I will show you how I solved a painful problem from my thesis research with accessible software and built a business from it. We will also look at patterns of visualization and how we can align the way software works with how people actually think about exploring data, shortening the time from question to answer. Finally, we will look ahead to how data visualization can help us build a better future in bioinformatics.

Deanna Church - Inscripta Inc. - CV


Challenges in Exploring and Visualizing Mammalian Genomes


The past decade has seen great strides in exploring mammalian genomes. Despite these advances, significant challenges remain before we can fully realize the initial promises of the Human Genome Project. A critical component of this challenge relates to how we represent and visualize genomic sequence and annotation. As novel techniques for introducing perturbations and measuring their effects at the level of the single cell advance, we remain hampered by data representations that do not capture the true complexity of mammalian biology. The decision to represent mammalian genomes as single haploid (typically mosaic haploid) representations was understandable when we had a limited understanding of variation. We now have a richer understanding of variation and are beginning to explore how chromatin structure and interactions can also influence biology. It is time to consider models that more fully represent this complexity. However, the implementation of these models will require new levels of data abstraction and representation. During this talk, we will explore some of these challenges and opportunities.

Ryan Layer - BioFrontiers Institute - CU Boulder - CV


SAMPLOT: Rapid Structural Variant Visualization for Short, Long, Linked, and Phased Reads


Structural variant (SV) detection remains a challenging problem. Biological and technical artifacts complicate our ability to differentiate signal from noise, so the accuracy of SV detection remains worse than that for smaller variants. As a result, visualization has become a standard step in the SV validation process. Inspecting the raw alignment data, genome annotations, and the region surrounding a candidate SV can help remove false positives and confirm breakpoints and genotypes.

We created SAMPLOT to support rapid, focused visualization of SVs that highlights the alignment signals that are hallmarks of SVs, including sequence coverage, discordant paired-end reads, split reads, and alignment gaps. For each SV, SAMPLOT creates a subplot for each sample that includes a sample-specific coverage profile and alignments organized along the y-axis by “event size” (insert size for paired-end reads, gap size for split reads). This organization creates distinct clusters of normal and discordant alignments. Discordant alignments are color-coded by the type of SV they support. SAMPLOT also supports the simultaneous visualization of:

  • short Illumina paired-end reads, long reads from PacBio and Oxford Nanopore, and linked reads from 10X
  • phased BAMs with any number of haplotype assignments
  • genome annotations such as transcripts, repetitive regions, and mapping quality tracks

With SAMPLOT, users can quickly and easily inspect SVs across many samples and technologies to determine if any sample contains discordant alignments that support the candidate SV, which can be especially helpful when evaluating de novo variants in families and somatic SVs in tumor/normal samples.
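
The “event size” layout described above can be illustrated with a minimal sketch. This is an illustration only and is not taken from SAMPLOT itself: the PairedEnd and SplitRead classes, the plot_points function, and the fixed cutoff of 1000 are all hypothetical, and a real tool would read alignments from BAM files and derive what counts as "normal" from the data. The sketch only shows how placing each alignment at a height equal to its insert size or gap size separates normal alignments from discordant ones.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass(frozen=True)
class PairedEnd:
    # Hypothetical record for a read pair
    left_start: int   # leftmost aligned base of the pair
    right_end: int    # rightmost aligned base of the pair


@dataclass(frozen=True)
class SplitRead:
    # Hypothetical record for a read aligned in two segments
    first_end: int     # end of the first aligned segment
    second_start: int  # start of the second aligned segment


Alignment = Union[PairedEnd, SplitRead]


def event_size(aln: Alignment) -> int:
    """Insert size for a read pair, gap size for a split read."""
    if isinstance(aln, PairedEnd):
        return aln.right_end - aln.left_start
    return aln.second_start - aln.first_end


def plot_points(alns: List[Alignment], cutoff: int) -> List[Tuple[int, int, str]]:
    """Return (x, y, label) per alignment.

    x is the midpoint of the span, y is the event size, and label is
    "discordant" when the event size exceeds the assumed cutoff.
    """
    points = []
    for aln in alns:
        size = event_size(aln)
        if isinstance(aln, PairedEnd):
            mid = (aln.left_start + aln.right_end) // 2
        else:
            mid = (aln.first_end + aln.second_start) // 2
        label = "discordant" if size > cutoff else "normal"
        points.append((mid, size, label))
    return points


# Made-up coordinates: two ordinary pairs, plus one pair and one split
# read that both span a large candidate event.
reads = [
    PairedEnd(1000, 1400),
    PairedEnd(1050, 1480),
    PairedEnd(1100, 6300),
    SplitRead(1250, 6200),
]
for x, y, label in plot_points(reads, cutoff=1000):
    print(x, y, label)
```

Plotted with y as the vertical axis, the first two alignments would sit near the bottom of the panel and the last two far above them, which is the clustering effect the abstract describes.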

Source code and documentation at

Sanjida Rangwala - NCBI - NIH - CV


What Can You See at NCBI? How to Complete the Picture with NCBI’s Genome Visualization Tools


For 30 years, NCBI has been the premier repository for biological sequence data and an authoritative source of sequence and genome annotation. Users can visit NCBI’s website to access RefSeq transcript, protein, and genome sequence data, which form the foundation for NCBI’s Gene database and genome annotations. NCBI’s variation databases, including ClinVar and dbSNP, provide information about pathogenic and benign human variation mapped to the human reference assembly. NCBI also hosts a large store of experimental sequence data, including mRNA and DNA sequence data in the GenBank INSDC database, and higher throughput data (microarray, short read, ChIP-seq, RNA-seq) archived in GEO (Gene Expression Omnibus), SRA (Sequence Read Archive), and dbGaP (database of Genotypes and Phenotypes). NCBI’s genome browser integrates molecular sequence data from NCBI’s sequence and variation databases with outside annotation, and allows users to add their own data or third-party data, resulting in a rich visual data display relative to a reference genome. Here, I will show you how the Genome Data Viewer and other NCBI resources can be used to create visuals that provide insight and show connections among genes and transcripts from the Gene database, variation data from dbSNP, epigenomic data from GEO, and gene-function relationships from dbGaP. The unrelenting growth of biological data and the development of new data structures pose challenges in search and visualization that call for creative solutions in the future.

Danielle Szafir - Dept. of Information Science - CU Boulder - CV


Perceptually-Driven Approaches for Visualizing Biological Data


Visualization helps people make sense of large and complex data, leveraging the power of the human visual system to enable serendipitous insights into information. By understanding how the visual system processes visual information, we can craft systems and techniques that provide more scalable and accurate insight into biological data. In this talk, I will discuss three examples bridging perception and visualization for exploratory data analysis. The first example explores how understanding our abilities to visually aggregate data informed a scalable whole genome alignment visualization. I then quantify how visual design influences the information that a display conveys, leading to a system for analyzing machine learning data from biochemistry. Finally, I model interactions between color and design for web applications, including applications in structural biology. Through this work, we can develop generalizable knowledge of perceptual science for data science, blending human- and data-centered practices to improve the accessibility and comprehensibility of information.

Dan Larremore - BioFrontiers Institute - CU Boulder - CV


High-Impact Data Visualization in Four Steps


Strong data visualization can clarify your science, emphasize your message, build trust with an audience, and even inform their decisions. Of course, the opposite is also true: bad data visualization can muddle your message, convince your audience to tune out and refresh Twitter, or prompt a reviewer to close your paper and start writing up a negative peer review. In this talk, I'll present a set of four steps that my lab follows when making scientific figures, covering broad ideas as well as the nitty-gritty of color choice, prototyping, simulating your audience, and more. I'll explain why we follow these steps by using a large set of real (and published) figures as examples, discussing intended and unintended consequences along the way.