Genome Academy is a series of stand-alone workshops on genomics topics offered to Duke faculty, postdocs, graduate students and staff at little or no cost. Sabbatical scholars and other collaborating visitors may request registration and will be accommodated on a space-available basis. The workshops, taught by faculty and staff from various departments, range from 101-style introductions to genomic technologies, computational approaches and mass spectrometry analyses to more focused topics in molecular analysis. They are intended to introduce Duke community members to the field and build capacity in areas that further their own research.
There is no enrollment cap for online courses. Enrollment for in-person courses is capped at 20 students, and registration closes 7 days before each class. You will receive an enrollment confirmation 3 days before each class. If you are unable to attend your registered course(s), please contact Matthew Franco; many in-person courses have a waitlist, and we can offer your spot to another person.
Fall 2024 Courses
Introduction to Mass Spectrometry-Based Proteomics
Instructor: Erik Soderblom, Ph.D.
Date and Time: November 4, 12pm-3pm
Location: 2240 CIEMAS
Cost: Free
Liquid chromatography coupled with tandem mass spectrometry (LC/MS/MS) continues to be the key technology for the qualitative and quantitative analysis of peptides and proteins for both basic and clinical research projects. This Genome Academy session is designed as an introduction for researchers needing to expand their knowledge of the use of LC/MS/MS-based methods for proteomics, and thus help researchers better understand how these technologies can help inform their research goals. Background material in basic protein chemistries will be provided, with an emphasis on how to use the physicochemical characteristics of these biomolecules for sample preparation specifically for LC/MS/MS analyses. In addition, the fundamentals of liquid chromatography and mass spectrometry will be discussed to enable students to understand the nuances of the experimental designs required to address their specific project. Real-world examples will be used to illustrate sample preparation and analysis strategies, including basic identification projects, characterization of Post-Translational Modifications and differential expression analyses (including 'omic biomarker discovery and targeted biomarker verification). Finally, the use of open source software tools for interpretation of these datasets will be discussed.
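For readers new to the area, the short sketch below (an illustration only, not course material) shows the kind of calculation that underlies peptide mass spectrometry: summing approximate monoisotopic residue masses to get a peptide's neutral mass and the m/z values expected at common charge states. The peptide sequence and the subset of residue masses included are illustrative assumptions.

```python
# Illustrative only: peptide neutral monoisotopic mass and m/z at several charge states.
# Residue masses are approximate standard monoisotopic values; the peptide is made up.

MONOISOTOPIC_RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "I": 113.08406, "L": 113.08406,
    "N": 114.04293, "D": 115.02694, "K": 128.09496, "E": 129.04259,
    "M": 131.04049, "F": 147.06841, "R": 156.10111, "Y": 163.06333,
}
WATER = 18.01056    # adds the N-terminal H and C-terminal OH
PROTON = 1.00728    # mass added per positive charge

def peptide_mass(sequence: str) -> float:
    """Neutral monoisotopic mass: sum of residue masses plus one water."""
    return sum(MONOISOTOPIC_RESIDUE_MASS[aa] for aa in sequence) + WATER

def mz(sequence: str, charge: int) -> float:
    """m/z of the [M + charge*H] ion at the given charge state."""
    return (peptide_mass(sequence) + charge * PROTON) / charge

if __name__ == "__main__":
    peptide = "PEPTIDEK"  # hypothetical tryptic peptide
    for z in (1, 2, 3):
        print(f"{peptide}  [M+{z}H]{z}+  m/z = {mz(peptide, z):.4f}")
```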
Introduction to DNA Sequencing
Instructor: Devi Swain Lenz, Ph.D.
Date and Time: November 11, 1pm-2pm
Location: 2240 CIEMAS
Cost: Free
During the past two decades, a new generation of high-throughput DNA sequencers has transformed biomedical and biotechnology research. These new technologies have fostered the development of a wide range of applications in basic and clinical research, including SNP discovery, transcriptome profiling, genome sequencing, and epigenetics. The goal of this introductory course is to teach the basic principles of next-generation sequencing (NGS) technology and to present an overview of various library preparations and their applications. Advantages and limitations of various methods will be discussed and compared across technologies/platforms (Illumina, PacBio, Oxford Nanopore, and startup technologies). This course will also provide an introduction to primary data analysis and data quality assessment steps. Attendees will become familiar with NGS technology terms and fundamentals and NGS data format and quality, and will acquire a better understanding of how to choose a suitable NGS method or instrument for their study.
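To make the primary data analysis and data quality assessment step a little more concrete, here is a minimal sketch (not part of the course) that parses a FASTQ file with plain Python and reports the mean Phred quality of each read; the file name is a placeholder, and real projects typically rely on dedicated tools such as FastQC.

```python
# Minimal FASTQ quality summary: mean Phred score per read (Phred+33 encoding).
# "reads.fastq" is a placeholder path.

def read_fastq(path):
    """Yield (read_id, sequence, quality_string) records from a FASTQ file."""
    with open(path) as handle:
        while True:
            header = handle.readline().rstrip()
            if not header:
                break
            seq = handle.readline().rstrip()
            handle.readline()            # '+' separator line
            qual = handle.readline().rstrip()
            yield header[1:], seq, qual

def mean_phred(quality_string):
    """Average Phred score, assuming Sanger/Illumina Phred+33 encoding."""
    return sum(ord(c) - 33 for c in quality_string) / len(quality_string)

if __name__ == "__main__":
    for read_id, seq, qual in read_fastq("reads.fastq"):
        print(f"{read_id}\tlength={len(seq)}\tmean_Q={mean_phred(qual):.1f}")
```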
Past Course Offerings
Liquid chromatography coupled with mass spectrometry (LC/MS) is a versatile tool for the qualitative and quantitative characterization of peptides, proteins and metabolites for both basic and clinical research projects. One of the most important considerations in being able to translate LC-MS datasets into meaningful biological observations is to effectively use open source software packages and/or online resources geared toward LC-MS based datasets. This GCB Academy session is designed as a complement to the GCB Academy courses “Fundamentals of Mass Spectrometry for Proteomic and Metabolomic Analyses” (Nov. 7) and “Experimental Design: Get the most out of your proteome” (Nov. 8) and is intended for users of the Proteomics and Metabolomics Shared Resource who have generated or plan to generate LC/MS-based proteomic or metabolomic datasets with the Shared Resource. The first portion of the course will focus on the effective use of Scaffold to characterize qualitative proteomic datasets. This will include an overview of Scaffold and features such as interpretation of spectral matches at a protein or peptide level, gene ontology classification, homology matching, spectral count data, and data export. The second portion of the course will cover common proteomic and metabolomic data analysis strategies from supplemental data (typically .xlsx file formats from Rosetta Elucidator) provided as part of the Shared Resource’s quantitative proteomic workflows. This will include an overview of the typical features of a quantitative data return document, various data summarization levels, calculating peptide/protein relative fold-changes and p-values, exporting data for motif analysis (PTM-specific datasets), and performing Principal Component Analysis (PCA) and 2D clustering within JMP Pro.
This course will provide an in-depth overview of experimental design, focusing on proteomic analysis of protein post-translational modifications (PTMs) and protein expression in (but not limited to) mammalian cells, tissues and biofluids. Topics will be aimed at getting maximum biological information from your samples. We will discuss methods for enriching subproteomes and PTMs; best practices for ensuring sample integrity and avoiding common contaminants that will be carried downstream; and how to be aware of additional factors that might influence reproducibility across biological replicates. In addition, we will discuss where discovery-based or targeted proteomic analyses may be most appropriate. Feel free to bring specific questions about your favorite proteins, model systems, or biological matrices. Prerequisite: Fundamentals of Mass Spectrometry for Proteomic and Metabolomic Analyses (encouraged, but not required).
Liquid chromatography coupled with tandem mass spectrometry (LC/MS/MS) continues to be the key technology for the qualitative and quantitative analysis of peptides, proteins and metabolites for both basic and clinical research projects. This GCB Academy session is designed as an introduction for researchers needing to expand their knowledge of the use of LC/MS/MS-based methods for proteomics and metabolomics, and thus help researchers better understand how these technologies can help inform their research goals. Background material in basic protein/metabolite chemistries will be provided, with an emphasis on how to use the physicochemical characteristics of these biomolecules for sample preparation specifically for LC/MS/MS analyses. In addition, the fundamentals of liquid chromatography and mass spectrometry will be discussed to enable students to understand the nuances of the experimental designs required to address their specific project. Real-world examples will be used to illustrate sample preparation and analysis strategies, including basic identification projects, characterization of Post-Translational Modifications and differential expression analyses (including 'omic biomarker discovery and targeted biomarker verification).
This seminar will offer an introductory overview of key considerations and best practices in establishing and maintaining clinical biospecimen collections for genomic and precision medicine research. Topics covered will include: basic concepts in biobank and cohort research; role of standardization, harmonization, and quality control; maintaining unique sample identification and robust chain-of-custody tracking; need for secure information and inventory management systems for samples and data; important considerations in repository design; and an overview of biobanking resources at Duke and beyond.
This half-day tutorial will provide you with a better understanding of the data processing and analysis methods that are used in RNA-seq analysis. We will cover topics such as data quality control, normalization, and calling differentially expressed genes. We will provide hands-on experience that will allow you to go back to your lab and work with your own data.
*Pre-requisites: "Introduction to Unix" and "Introduction to Scientific Computing for Genomics" (or equivalent experience)
During the past two decades, a new generation of high-throughput DNA sequencers has transformed biomedical and biotechnology research. These new technologies have fostered the development of a wide range of applications in basic and clinical research, including SNP discovery, transcriptome profiling, genome sequencing, and epigenetics. The goal of this introductory course is to teach the basic principles of next-generation sequencing (NGS) technology and to present an overview of various library preparations and their applications. Advantages and limitations of various methods will be discussed and compared across technologies/platforms (Illumina, PacBio, Oxford Nanopore, and startup technologies). This course will also provide an introduction to primary data analysis and data quality assessment steps. Attendees will become familiar with NGS technology terms and fundamentals and NGS data format and quality, and will acquire a better understanding of how to choose a suitable NGS method or instrument for their study.
Liquid chromatography coupled with tandem mass spectrometry (LC/MS/MS) continues to be the key technology for the qualitative and quantitative analysis of peptides and proteins for both basic and clinical research projects. This GCB Academy session is designed as an introduction for researchers needing to expand their knowledge of the use of LC/MS/MS-based methods for proteomics, and thus help researchers better understand how these technologies can help inform their research goals. Background material in basic protein chemistries will be provided, with an emphasis on how to use the physicochemical characteristics of these biomolecules for sample preparation specifically for LC/MS/MS analyses. In addition, the fundamentals of liquid chromatography and mass spectrometry will be discussed to enable students to understand the nuances of the experimental designs required to address their specific project. Real-world examples will be used to illustrate sample preparation and analysis strategies, including basic identification projects, characterization of Post-Translational Modifications and differential expression analyses (including 'omic biomarker discovery and targeted biomarker verification). Finally, the use of open source software tools for interpretation of these datasets will be discussed.
This 4-hour tutorial will first spend time discussing important considerations for the design of your study and the collection of your samples. It will also introduce you to the data processing and analysis methods that are used in 16S microbiome analysis. We will cover topics such as data quality control, diversity indices, and calling differentially abundant microflora. We will provide hands-on experience that will allow you to go back to your lab and work with your own data.
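As a toy illustration of the diversity indices mentioned above (not drawn from the tutorial materials), the sketch below computes Shannon diversity for a few hypothetical samples from a made-up OTU count table.

```python
import math

# Hypothetical OTU/ASV counts per sample (each value is the count for one taxon).
counts = {
    "sample_A": [120, 30, 5, 0, 45],
    "sample_B": [50, 50, 50, 50, 0],
    "sample_C": [200, 1, 1, 1, 1],
}

def shannon(taxon_counts):
    """Shannon diversity H = -sum(p_i * ln p_i) over taxa with nonzero counts."""
    total = sum(taxon_counts)
    props = (c / total for c in taxon_counts if c > 0)
    return -sum(p * math.log(p) for p in props)

for name, taxon_counts in counts.items():
    print(f"{name}: Shannon H = {shannon(taxon_counts):.3f}")
```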
This 4-hour tutorial will provide you with a better understanding of the data processing and analysis methods that are used in RNA-seq analysis. We will cover topics such as data quality control, normalization, and calling differentially expressed genes. We will provide hands-on experience that will allow you to go back to your lab and work with your own data.
*Pre-requisites: "Introduction to Unix" and "Introduction to Scientific Computing for Genomics" (or equivalent experience).
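To give a flavor of the normalization and differential-expression steps these RNA-Seq tutorials cover, here is a deliberately simplified sketch on simulated counts, using counts-per-million normalization and a per-gene t-test; production analyses generally use dedicated packages such as DESeq2, edgeR, or limma, so treat this only as a schematic of the idea.

```python
import numpy as np
from scipy import stats

# Toy count matrix: rows = genes, columns = samples (3 control, 3 treated).
rng = np.random.default_rng(0)
counts = rng.poisson(lam=50, size=(1000, 6))
counts[0, 3:] *= 4          # make the first gene clearly "differentially expressed"

# Library-size normalization to counts per million (CPM), then log2 transform.
libsize = counts.sum(axis=0)
log_cpm = np.log2(counts / libsize * 1e6 + 1)

# Per-gene two-sample t-test between the groups (a simplification of what
# dedicated differential-expression tools actually do).
control, treated = log_cpm[:, :3], log_cpm[:, 3:]
t_stat, p_values = stats.ttest_ind(treated, control, axis=1)
log2_fold_change = treated.mean(axis=1) - control.mean(axis=1)

top = np.argsort(p_values)[:5]
for gene in top:
    print(f"gene_{gene}: log2FC={log2_fold_change[gene]:+.2f}  p={p_values[gene]:.2e}")
```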
Computing has become an integral and indispensable part of genomic biology. This course teaches basic skills in scientific computing, with a focus on applications for genomic science, aimed at making you more productive, your computational work more reliable, and your research easier to reproduce and extend, including by your future self. The course includes introductions to (1) using Unix shell commands to efficiently find, organize, and stage data for analysis; (2) basic data types, control flow, functions, and third-party packages for the Python programming language commonly encountered in scientific computing; (3) using version control to confidently manage the many directions research code takes from inception to publication; and (4) effectively using a high-performance computing cluster to run computational analyses. The format of the course is inspired by the acclaimed Software Carpentry-style bootcamps. This is a fully hands-on workshop, and students are expected to bring a laptop.
*Prerequisites: “Introduction to Unix” (or equivalent experience)
Course Website: https://duke-gcb.github.io/SciComp-Nov-2019/
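As a small taste of the Python-for-genomics material (an illustrative sketch, not taken from the course itself), the function below uses only the standard library to count records and total sequence length in a FASTA file; the file name is a placeholder.

```python
# Count records and total bases in a FASTA file using only the standard library.
# "sequences.fasta" is a placeholder; any FASTA-formatted file will do.

def fasta_summary(path):
    """Return (number_of_records, total_sequence_length) for a FASTA file."""
    n_records = 0
    total_length = 0
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                n_records += 1
            else:
                total_length += len(line)
    return n_records, total_length

if __name__ == "__main__":
    records, bases = fasta_summary("sequences.fasta")
    print(f"{records} records, {bases} bases total")
```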
This 4-hour hands-on tutorial will provide you with experience working with data from a single-cell RNA-Seq experiment. We will cover quality control, filtering, normalization, clustering, differential expression, and marker identification analysis.
*Pre-requisites: Must have previously taken the GCB Academy “RNA-Seq Analysis” course.
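For a rough sense of the single-cell workflow (filtering, normalization, dimensionality reduction, clustering), the sketch below runs PCA and k-means on a simulated cell-by-gene matrix with scikit-learn; real single-cell analyses usually rely on dedicated toolkits such as Seurat or Scanpy, so this is only a schematic stand-in for what the tutorial covers.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

# Simulated cell-by-gene count matrix: 300 cells x 500 genes, two crude "cell types".
rng = np.random.default_rng(1)
counts = rng.poisson(lam=1.0, size=(300, 500)).astype(float)
counts[:150, :50] += rng.poisson(lam=5.0, size=(150, 50))  # type-specific genes

# Filter low-depth cells, normalize per cell, log-transform.
depth = counts.sum(axis=1)
counts = counts[depth > np.median(depth) * 0.5]
norm = np.log1p(counts / counts.sum(axis=1, keepdims=True) * 1e4)

# Reduce dimensionality with PCA, then cluster in PC space.
pcs = PCA(n_components=10).fit_transform(norm)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(pcs)

print("cells per cluster:", np.bincount(labels))
```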
Liquid chromatography coupled with tandem mass spectrometry (LC-MS/MS) is a versatile tool for the qualitative and quantitative characterization of peptides, proteins and metabolites for both basic and clinical research projects. One of the most important considerations in being able to translate LC-MS datasets into meaningful biological observations is to effectively use open source software packages and other online resources to perform proper data analysis, interpretation, and forward-looking experimental design. This GCB Academy session is designed as a complement to “Introduction to Mass Spec Technologies for Proteomics and Metabolomics” and is intended for users of the Proteomics and Metabolomics Shared Resource who have generated or plan to generate LC/MS-based proteomic or metabolomic datasets with the Shared Resource. The first section of the course will focus on the effective use of Scaffold to characterize qualitative proteomic datasets. This will include an overview of Scaffold and features such as interpretation of spectral matches at a protein or peptide level, gene ontology classification, homology matching, spectral count data, and data export. The second section will cover the interpretation and meta-analysis of data (typically .xlsx file formats from Rosetta Elucidator) provided as part of the Shared Resource’s quantitative proteomics and metabolomics workflows. This will include an overview of the typical features of a quantitative data return document, various data summarization levels (e.g. peptide versus protein), calculating peptide/protein relative fold-changes and p-values, exporting data for motif analysis (PTM-specific datasets), and performing Principal Component Analysis (PCA) and 2D clustering within JMP Pro (SAS Institute, Cary, NC). Finally, we will cover the use of Skyline as a tool for targeted quantitative proteomics workflows. This will include utilizing Skyline for verification following LC-MS based discovery experiments, as well as a brief introduction to using Skyline to design and interpret targeted proteomics and metabolomics analyses. This portion will utilize hands-on analysis of raw data collected in the Shared Resource. For advanced training in Skyline, the tutorials on the Skyline software web site are highly recommended (https://skyline.gs.washington.edu/labkey/wiki/home/software/Skyline/page.view?name=tutorials).
Prerequisites:
1) GCB Academy course “Introduction to Mass Spec Technologies for Proteomics and Metabolomics” (optional, but preferred)
2) Personal laptop with Scaffold (http://www.proteomesoftware.com/products/scaffold/download/), Skyline (https://skyline.gs.washington.edu/labkey/project/home/software/Skyline/…?), and JMP Pro (downloaded from Duke OIT, https://software.oit.duke.edu/comp-print/software/index.php) pre-installed.
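Although the course performs these analyses in Scaffold, Skyline, and JMP Pro, the underlying fold-change, p-value, and PCA calculations can be sketched in a few lines of Python; the example below uses a made-up log2 intensity table and is only a schematic stand-in for the tools taught.

```python
import numpy as np
from scipy import stats
from sklearn.decomposition import PCA

# Made-up log2 protein intensities: rows = proteins, columns = 4 control + 4 treated samples.
rng = np.random.default_rng(2)
log2_intensity = rng.normal(loc=20, scale=1, size=(500, 8))
log2_intensity[:20, 4:] += 1.5      # first 20 proteins up-regulated in "treated"

control, treated = log2_intensity[:, :4], log2_intensity[:, 4:]

# Per-protein log2 fold change and two-sample t-test p-value.
log2_fc = treated.mean(axis=1) - control.mean(axis=1)
_, p_values = stats.ttest_ind(treated, control, axis=1)

# PCA on samples (transpose so each sample is one observation).
sample_scores = PCA(n_components=2).fit_transform(log2_intensity.T)
for i, (pc1, pc2) in enumerate(sample_scores):
    group = "control" if i < 4 else "treated"
    print(f"sample_{i} ({group}): PC1={pc1:+.2f}  PC2={pc2:+.2f}")

print("proteins with |log2FC| > 1 and p < 0.05:",
      int(np.sum((np.abs(log2_fc) > 1) & (p_values < 0.05))))
```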
Metabolomics has emerged as a powerful approach for the characterization of molecular systems and for the development of biomarkers for disease progression or diagnosis. Broadly, metabolomics is the characterization of small molecules by mass spectrometry and can include both "unbiased" (non-targeted) techniques and "targeted" methods. The measurement of metabolites by mass spectrometry is also directly translatable to the clinic; many common clinical assays, such as those for amino acids, acylcarnitines, vitamin D epimers, steroid hormones, and drugs of abuse, are mass spectrometry assays. Whether developing a novel assay or using a validated metabolite assay, the most important aspect of a successful metabolomics study is deciding which technique to use and understanding the data each approach will likely be able to provide. In this course, we will discuss sample types that are amenable to metabolomics, and use case studies to examine the critical differences between targeted and non-targeted metabolomics and why an investigator might choose one over the other. We will use example datasets to demonstrate techniques for analysis of high-dimensional metabolomic data. We will also cover the methods needed for accurate quantification, how to enable longitudinal translation of metabolomics assays, and how a targeted mass spec assay may differ in utilization from a clinical ELISA.
An introductory discussion of what is involved in designing and analyzing a microbiome study. We will cover study design, sample collection/storage/preparation/sequencing of 16S rDNA, and work through a 3-4 hour analysis using basic QIIME data analysis.
This 3-hour workshop will be an introduction to microbiome research. We will cover the basics of study design, sample collection, preparation, sequencing, and analysis of a microbiome high-throughput amplicon sequencing (e.g. 16S rRNA) experiment.
PCR, quantitative PCR and droplet-digital PCR technologies will be discussed, along with examples of which technology would best fit your research.
This course has two objectives. First, it seeks to develop an understanding of risk prediction and classification in the Omics setting. Second, for researchers who plan to develop risk models, this course seeks to provide concrete steps for study design, analysis, and interpretation. To accomplish these goals, we will discuss how different aspects of a statistical model can provide measures of association or measures of predictive accuracy. This distinction is important in understanding how developing a model for association/etiology/causal inference is conceptually different from using the model to predict. We will then discuss risk models in the conventional setting: larger sample sizes with a smaller number of predictors. We will cover study design, statistical models, and performance metrics. The course seeks to develop an appreciation of challenging considerations in the field, but also seeks to provide clear steps on how to proceed. After establishing foundations, we will move into the Omics realm, which is characterized by smaller sample sizes and thousands of predictors. Prediction models in Omics often use machine-learning techniques, so we will cover some common machine-learning techniques and what makes them different from more conventional models. We will review current best practices with an emphasis on estimating performance. Finally, we will review areas of active research and the direction in which the field is moving. This course will not include any hands-on coding because of time limitations, but this will be the topic of a future course. The course focuses on understanding the most important aspects of risk prediction and classification.
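The course itself includes no hands-on coding, but for readers who want a concrete picture of the "small n, large p" setting and of honest performance estimation, here is a minimal sketch using simulated data, a penalized logistic regression, and cross-validated ROC AUC in scikit-learn; none of this is drawn from the course materials.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

# Simulated "omics" data: 100 samples, 2,000 features, only 10 truly informative.
X, y = make_classification(n_samples=100, n_features=2000, n_informative=10,
                           n_redundant=0, random_state=0)

# L2-penalized logistic regression; scaling matters when features differ in range.
model = make_pipeline(StandardScaler(),
                      LogisticRegression(penalty="l2", C=0.1, max_iter=5000))

# Cross-validation estimates out-of-sample discrimination (ROC AUC); resubstitution
# accuracy would be wildly optimistic with this many predictors and so few samples.
auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
print(f"cross-validated AUC: {auc.mean():.2f} +/- {auc.std():.2f}")
```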
Liquid chromatography coupled with mass spectrometry (LC/MS) is a versatile tool for the qualitative and quantitative characterization of peptides, proteins and metabolites for both basic and clinical research projects. One of the most important considerations in being able to translate LC-MS datasets into meaningful biological observations is to effectively use open source software packages and/or online resources geared toward LC-MS based datasets. This GCB Academy session is designed as a complement to the GCB Academy courses “Fundamentals of Mass Spectrometry for Proteomic and Metabolomic Analyses” (Nov. 7) and “Experimental Design: Get the most out of your proteome” (Nov. 8) and is intended for users of the Proteomics and Metabolomics Shared Resource who have generated or plan to generate LC/MS-based proteomic datasets with the Shared Resource. The first portion of the course will focus on the effective use of Scaffold to characterize qualitative proteomic datasets. This will include an overview of Scaffold and features such as interpretation of spectral matches at a protein or peptide level, gene ontology classification, homology matching, spectral count data, and data export. The second portion of the course will cover common proteomic data analysis strategies from supplemental data (typically .xlsx file formats from Rosetta Elucidator) provided as part of the Shared Resource’s quantitative proteomic workflows. This will include an overview of the typical features of a quantitative data return document, various data summarization levels, calculating peptide/protein relative fold-changes and p-values, exporting data for motif analysis (PTM-specific datasets), and performing Principal Component Analysis (PCA) and 2D clustering within JMP Pro.
Critical review of a proteomics data analysis presents unique challenges because of the complex workflows involved in going from raw mass spectrometry data to results interpretation. Using tools discussed in the “Bioinformatics Tools” course, this class will work to ‘deconstruct’ a proteomics experiment that has had flaws in its analysis and interpretation. By finding the errors in data analysis and interpretation, the goal of this case study is to make you more aware of common pitfalls in proteomics data analysis and to enhance your skills in reviewing proteomics datasets, which are becoming much more common in the peer-reviewed literature. The material will be guided, but hands-on participation is expected. Laptops required. Prerequisites: Attendance at the “Fundamentals” and “Experimental Design” classes is recommended but not required; attendance at “Bioinformatics Tools” is highly recommended.
This 90-minute course will provide attendees with an overview of general principles of genetics, genomics and molecular biology, and clinical applications and technologies currently used in clinical practice. In particular, the course will provide an overview of genomics, genome-wide association studies and other large initiatives and a range of testing technologies for diagnosis and treatment. Introduction of new technologies such as liquid biopsies will also be briefly discussed.
In this 3-day workshop, participants will prepare stranded RNA-Seq libraries and will have the opportunity to generate and analyze expression data. This hands-on workshop consists of two parts: 1) sample preparation and data generation (wet lab) and 2) data analysis. In the first part, participants will be trained to estimate RNA sample quality, generate stranded directional RNA-Seq libraries, and assess RNA-Seq library quality. In the second part, participants will learn how to perform basic bioinformatics analyses on the RNA-Seq data, including data QC, mapping reads, and differential expression analysis. For more in-depth analyses, the GCB Academy course on RNA-Seq analysis is recommended.
Pre-requisites: Attendees should have basic laboratory skills and knowledge, such as lab safety principles, best practices for handling RNA, pipetting, and dilutions.
Single cell expression profiling can be facilitated through the automation of the Fluidigm C1 System. The system allows one to capture single cells and explore gene expression profiling through the use of qPCR or RNA sequencing analysis. The technology will be discussed, as well as how it can be applied in your research. PCR, quantitative PCR and droplet-digital PCR technologies will also be discussed, along with examples of which technology would best fit your research. Pre-requisites: Basic understanding of molecular biology.
Single cell expression profiling can be facilitated through the automation of the Fluidigm C1 System. The system allows one to capture single cells and explore gene expression profiling through the use of qPCR or RNA sequencing analysis. The technology will be discussed as well as how it can be applied in your research. Pre-requisites: Basic understanding of molecular biology.
This hands-on tutorial will introduce the data processing steps for calling variants from whole-exome sequencing data. We will go step-by-step through the Best Practices guide from the Genome Analysis Toolkit (GATK). After completing this tutorial, you should feel comfortable calling variants from data generated in your own labs.
Pre-requisites: "Introduction to Unix" and "Introduction to Scientific Computing for Genomics" (or equivalent experience).
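For orientation, here is a rough sketch of the main GATK Best Practices steps for germline variant calling, driven from Python as shell commands. All file names are placeholders, it assumes bwa, samtools, and GATK4 are installed and the reference has already been indexed, and the exact steps and options covered in the tutorial may differ.

```python
import subprocess

# Placeholder inputs -- substitute your own files. Assumes the reference has been
# prepared with `bwa index`, `samtools faidx`, and a GATK sequence dictionary.
REF = "reference.fasta"
R1, R2 = "sample_R1.fastq.gz", "sample_R2.fastq.gz"
KNOWN_SITES = "dbsnp.vcf.gz"

commands = [
    # Align reads and coordinate-sort the output.
    f"bwa mem {REF} {R1} {R2} | samtools sort -o sample.sorted.bam -",
    "samtools index sample.sorted.bam",
    # Mark PCR/optical duplicates.
    "gatk MarkDuplicates -I sample.sorted.bam -O sample.dedup.bam -M dup_metrics.txt",
    "samtools index sample.dedup.bam",
    # Base quality score recalibration.
    f"gatk BaseRecalibrator -I sample.dedup.bam -R {REF} --known-sites {KNOWN_SITES} -O recal.table",
    f"gatk ApplyBQSR -I sample.dedup.bam -R {REF} --bqsr-recal-file recal.table -O sample.recal.bam",
    # Call variants.
    f"gatk HaplotypeCaller -R {REF} -I sample.recal.bam -O sample.vcf.gz",
]

for cmd in commands:
    print(f"+ {cmd}")
    subprocess.run(cmd, shell=True, check=True)
```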