INTRODUCTION TO NEXT-GENERATION SEQUENCING DATA AND ANALYSIS COURSE

08 May 2015 - June 23-26, 2015 | CIBIO-InBIO, Vairão, Portugal

This course introduces the basics of next-generation sequencing (NGS) data and analysis, and includes hands-on exercises throughout.

Topics covered include:

basics of the Linux command line;
DNA/RNA preparation and sequencing technologies, including reduced-representation sequencing;
what to do with newly-delivered sequencing data;
assembling sequences of varying sizes and complexities;
variant discovery;
annotation of small to large datasets;
ancient and variable-coverage DNA;
transcriptomes and gene expression analysis; and
walkthroughs of example analyses.

Emphasis throughout is on understanding fundamentals, and on developing skills for design of practical sequencing projects and analysis of sequencing data in light of research questions and biological and practical limitations.

Class time is limited, so the course materials will contain more information than the lectures and exercises can cover. This additional material is considered essential for understanding NGS data analysis, but it will not be covered directly in class, leaving sufficient time for exercises. Online sources of information will be emphasised.

PROGRAMME

Tuesday

9.30-12.30 | Introduction to the Linux command line

Entering a command, redirecting output, and pipes
Using your command history
Downloading a program and building it
Why write scripts
BioPerl, Biopython, R/Bioconductor
How can I learn more?
Introduction to SEQanswers, Biostars, Stack Overflow, blogs

14.00-17.00 | Introduction to Next-Generation Sequencing

What is a genome, and what is a read, in light of NGS technologies?
Library construction
Sequencing technologies
Reduced-representation sequencing and OTUs
Trimming and cleaning new sequencing data
** Hands-on examination and cleaning of NGS data
What questions can we answer just with reads?

Wednesday

9.30-12.30 | Introduction to Assembly

Small, medium and large assembly tasks
Overlap-consensus, de Bruijn and hybrid approaches
** Hands-on assembly
Reference-guided assembly
Challenges facing all assembly tasks
Assembly validation
** Hands-on assembly validation

14.00-17.00 | Introduction to Variant Discovery

Types of variants
Read mapping and direct examination of mappings
** Hands-on read mapping and examination of BAM files
NGS challenges to variant discovery: inference required
VCF files
Strength of variant calls and variant filtering
Online variant catalogs
Typical variant pipeline with GATK
** Hands-on variant calling and filtering
Typical reduced-representation variant pipeline

Thursday

9.30-12.30 | Introduction to Annotation

What questions do you want to answer?
Quick similarity searches with MUMmer and BLAST
GFF and GTF files
Annotation pipelines
** Hands-on examination of assembly with annotation
Homology searches for gene function
** Domain-based searches
What do these variants affect?
** Examine assembly, annotation and VCF files
** Summaries with bedtools and SnpEff
Annotation of transcriptomes and reduced-representation assemblies
Annotation of repetitive content and assembly masking
Uncertainty in annotation

14.00-17.00 | Introduction to Transcriptomics and Gene Expression

What questions do you want to answer?
Biological considerations and limitations
Transcriptome assembly
** Identifying isoforms
Basics of RNA read mapping
** Examine RNA-seq read mapping
Basics of statistical analysis and sources of noise
Sampling design, replication, and sequencing design
** Example of analysis with and without correct error structure

Friday

9.30-12.30 | Working with Low- and Variable-Coverage DNA

In what condition is the DNA (or RNA), and how accessible is it?
What do we now mean by "coverage"?
Amplification and capture
Ancient DNA and directly analysing reads
The DNA is fine: What is gained and lost with low-coverage sequencing
Working with single cells or single samples
Pooling: What is gained and what is lost

14.00-17.00 | Putting It All Together: Example Projects and Workflows

Working out 3-4 examples:

- Population structure, divergence and diversity for a non-model species;
- Genomic signatures of selection;
- Gene expression differences between natural treatments;
- Ancient DNA.

LECTURERS
Douglas Scofield
John Archer
Antonio Muñoz

DATES & LOGISTICS
The course will take place at CIBIO-InBIO, Vairão Campus (Room 2), on June 23-26, 2015, from 9.30-12.30 and 14.00-17.00 each day (24 hours in total).
The lectures will be in English.

REQUIREMENTS
The course is aimed at postgraduate researchers with a particular interest in the analysis of NGS, genomics and transcriptomics data (preferably those already working on such projects).
Depending on the number of applications received, PhD students directly involved in NGS and genomic studies may also be accepted.
All participants must bring their own personal laptop.

REGISTRATION
Deadline for registration is May 25, 2015.
Participation is free of charge.
The course accepts a minimum of 15 and a maximum of 20 attendees, selected according to their track record and the relevance of the course to their research and/or work.
Preference will be given to CIBIO-InBIO researchers, but other applications may be accepted.

To register, please send an email to newgen.course@cibio.up.pt with a one-paragraph explanation of how the course will benefit your research and/or work (preferably including a link to your web profile or short CV).

COURSE ORGANIZERS
Catarina Ginja
Fredrik Oxelfelt
John Archer
Antonio Muñoz