Whole Genome Sequencing
Advances in massively parallel sequencing technologies have dramatically reduced the cost of whole genome sequencing of bacteria.
PulseNet participants are currently running a number of pilot projects to implement whole genome sequencing as a routine tool for foodborne outbreak investigations and surveillance. We are looking to bring together the best tools and approaches from around the world, and implement them within the PulseNet network.
The following protocols have been developed by PulseNet USA and are made available to the international community.
- [PDF, 636 KB], updated January 2016
- [PDF, 260 KB], updated May 2015
- [PDF, 37 KB], updated May 2015
Key collaborators or tools we are exploring include:
- Center for Genomic Epidemiology (Denmark)
- Integrated Rapid Infectious Disease Analysis – IRIDA (Canada)
- Whole genome multilocus sequence typing (wgMLST) (Applied Maths, Belgium)
What is Whole Genome Sequencing (WGS)?
- WGS is both the process of generating the full DNA sequence of the genome of a microorganism and the resulting output. For foodborne bacteria, the genome includes the chromosome and any extrachromosomal genomic material such as plasmids. The actual process is also called next generation sequencing (NGS). It is performed by sequencing the DNA many times over (10x to more than 100x) in small random fragments (‘reads’) that typically range in size from fewer than 100 to several thousand DNA base pairs (bp) (‘massively parallel sequencing’). The average number of times the genome is sequenced is called the coverage. Before the data can be analyzed, they must be cleaned, assessed for quality and, often, assembled into as few contiguous pieces (contigs) as possible. A completely assembled genome has the chromosome in one contig and each extrachromosomal element in one contig, but most often a genome will be assembled into 5 to 200 contigs. If a genome is not fully assembled, we do not know the actual sequence of the whole genome, but rather 97 to 99% of it. Assembling genomes is a computationally intensive process that can be done by aligning the raw reads against a well-assembled sequence of a closely related strain (reference-based assembly) or by aligning overlapping sequences from different reads without the need for a reference genome (de novo assembly). However, some comparisons of genomes may be performed with little or no assembly (‘assembly free’) and minimal processing. For example, if you want to check whether a specific gene (e.g., rpoB for species identification) or a specific set of genes (e.g., those used for multilocus sequence typing, MLST) with known sequence(s) is present, the raw reads of the strain in question may be queried for this gene or these genes without assembly.
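Two of the ideas above, coverage and querying raw reads for a known gene without assembly, can be illustrated with a short Python sketch. It is not a PulseNet protocol or tool. The function names, the toy sequence, and the default k-mer length of 21 are all assumptions made for illustration. Reads are treated as plain strings, and matching is by exact k-mers (short subsequences of length k), which is a simplification: real reads contain sequencing errors that production tools tolerate.

```python
from typing import Iterable, Set


def mean_coverage(read_lengths: Iterable[int], genome_size: int) -> float:
    """Average number of times each genome position is sequenced:
    total sequenced bases divided by the (estimated) genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return sum(read_lengths) / genome_size


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence over the alphabet A, C, G, T, N."""
    return seq.upper().translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def kmers(seq: str, k: int) -> Set[str]:
    """All distinct substrings of length k in seq (empty set if seq is shorter than k)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def gene_kmer_fraction(reads: Iterable[str], gene: str, k: int = 21) -> float:
    """Fraction of the gene's distinct k-mers found in the raw reads.

    Reads may come from either DNA strand, so each read is checked in both
    orientations. A value near 1.0 suggests the gene is present; a value
    near 0.0 suggests it is absent. Any decision threshold is left to the
    caller (hypothetical helper, exact matching only).
    """
    target = kmers(gene, k)
    if not target:
        raise ValueError("gene is shorter than k")
    seen: Set[str] = set()
    for read in reads:
        for strand in (read, reverse_complement(read)):
            seen |= kmers(strand, k) & target
    return len(seen) / len(target)


if __name__ == "__main__":
    # Toy sequence for illustration only; this is NOT a real rpoB sequence.
    toy_gene = "ATGGCTAGCTAGGATCCGATCGATTACG"
    # Two overlapping toy reads, the second taken from the opposite strand.
    toy_reads = [toy_gene[0:16], reverse_complement(toy_gene[10:28])]

    print("coverage:", mean_coverage([len(r) for r in toy_reads], len(toy_gene)))
    print("gene k-mer fraction:", gene_kmer_fraction(toy_reads, toy_gene, k=8))
```

In practice the reads would be parsed from FASTQ files and the genome size taken from a reference or an estimate. The k-mer approach is used here because it needs neither assembly nor alignment, which matches the ‘assembly free’ comparison described above.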