
Targeted Exon Sequencing by In-Solution Hybrid Selection

UNIT 18.4

Brendan Blumenstiel,1 Kristian Cibulskis,1 Sheila Fisher,1 Matthew DeFelice,1 Andrew Barry,1 Tim Fennell,1 Justin Abreu,1 Brian Minie,1 Maura Costello,1 Geneva Young,1 Jared Maquire,1 Andrew Kernytsky,1 Alexandre Melnikov,1 Peter Rogov,1 Andreas Gnirke,1 and Stacey Gabriel1

1 Broad Institute, Cambridge, Massachusetts

ABSTRACT

This unit describes a protocol for the targeted enrichment of exons from randomly sheared genomic DNA libraries using an in-solution hybrid selection approach for sequencing on an Illumina Genome Analyzer II. The steps for designing and ordering a hybrid selection oligo pool are reviewed, as are critical steps for performing the preparation and hybrid selection of an Illumina paired-end library. Critical parameters, performance metrics, and analysis workflow are discussed. Curr. Protoc. Hum. Genet. 66:18.4.1-18.4.24. © 2010 by John Wiley & Sons, Inc.

Keywords: exon sequencing • hybrid selection • mutation discovery • DNA sequencing • targeting

INTRODUCTION

The ability to identify rare polymorphisms in the human genome is crucial for discovering genetic associations and causative mutations related to human disease. With the completion of the Human Genome Project (Lander et al., 2001; http://www.genome.gov/10001772), the framework was set for establishing a deep understanding of genomic variation, its structure, and its role in human disease. While genomic sequencing is the most powerful tool for identifying a variety of genetic variants, whole-genome sequencing of thousands of samples remains prohibitively expensive, thus requiring targeted approaches to sequencing genomic regions of interest (Ng et al., 2009). Traditionally, targeted sequencing has been performed using single-plex PCR-based amplification followed by Sanger sequencing (Sjöblom et al., 2006). For a multitude of reasons including cost and logistic workflow, PCR-based targeting is no longer a cost-effective match for many of the new next-generation sequencing technologies emerging on the market.

In recent years, several methods of exon targeting by hybrid selection have been developed by leveraging the massively parallel synthesis of long oligonucleotides on programmable arrays (Li et al., 2008; Gnirke et al., 2009). This relatively inexpensive method for simultaneous synthesis of tens of thousands of unique oligos has led to highly multiplexed methods for exon sequencing. By moving to array-based oligonucleotides ranging from 60 to 170 bp in length, precise targeting of relatively short exons across many genes is possible. Programmable oligonucleotide microarrays can be used to capture and enrich exons by either solid-phase or solution-based hybrid selection. In the solution-based method developed and implemented at the Broad Institute, PCR-amplified DNA probes are transcribed into biotinylated RNA, which is hybridized in solution with a randomly sheared genomic DNA library. Hybridized DNA-RNA duplexes are pulled down using streptavidin-coated magnetic beads. Immobilized beads are then washed, removing non-hybridized DNA.

Current Protocols in Human Genetics 18.4.1-18.4.24, July 2010. Published online July 2010 in Wiley Interscience (www.interscience.wiley.com). DOI: 10.1002/0471142905.hg1804s66. Copyright © 2010 John Wiley & Sons, Inc.

HighThroughput Sequencing

18.4.1 Supplement 66

The remaining captured DNA is subsequently denatured from the immobilized RNA, enriched by PCR, and sequenced on Illumina’s Genome Analyzer II (GAII) sequencing system (Gnirke et al., 2009). Outlined in this unit are the steps for performing solution-based hybrid selection of exons and preparing enriched libraries for paired-end sequencing on the Illumina GAII. Steps include genomic DNA shearing (Basic Protocol 1); Illumina paired-end library construction (end repair, A-base addition, paired-end adapter ligation, PCR enrichment, and clean-up; Basic Protocol 2); hybrid selection (Basic Protocol 3); and library quantification for optimized cluster density using qPCR (Basic Protocol 4). In the Support Protocol, we describe several recommendations for performing read alignment, calculating meaningful hybrid selection metrics, and visualizing and assessing sequence data for overall protocol performance. Specific challenges and points of sensitivity are further discussed, along with specific performance metrics that can be expected by following the published protocols.

STRATEGIC PLANNING

Choosing Targets and Baits

The process for choosing targets is fairly straightforward, and several standard capture panels are commercially available, such as the Agilent SureSelect Human All Exon Kit. In choosing custom targets for hybrid capture, there are two major areas of consideration: target uniqueness and target size.

The genome-wide uniqueness of the capture targets must be considered. If a region is not sufficiently unique in the genome, short reads from it may not align uniquely. Therefore, although the DNA fragments may be physically captured and sequenced, it is not straightforward to analyze the data. A more critical, related problem is targeting regions of high copy number in the genome, such as mitochondrial genes and ALU repeats. Targeting these regions is detrimental, not only because the results are difficult to interpret, but because the high representation of these regions in the DNA causes them to be oversampled. As an example, in one recent capture experiment, 7% of reads mapped to targeted mitochondrial genes, even though those genes represented only 0.1% of the target set (unpub. observ.).

The total size of the targets to be captured has an effect on the efficiency of the hybrid selection, with smaller target sets causing a smaller fraction of reads to align to the target. With large target sets, such as whole-exome capture, over 80% of the reads typically align to the desired target. However, with smaller sets of a few hundred genes, often only 50% to 70% of reads may align to the target.

Once a set of targets is chosen, baits are typically tiled across the target with a small overlap between baits, as seen in Figure 18.4.1. The figure also illustrates the nomenclature commonly used to refer to regions surrounding the targets and baits.
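As a rough illustration of tiling, the sketch below lays fixed-length baits across a single target interval with a small overlap between neighbors. The 120-base bait length matches the baits drawn in Figure 18.4.1; the 20-base overlap, the function name `tile_baits`, and the choice to center the tiling so that both end baits overhang the target are assumptions made for illustration, not the design rules used by the authors or by Agilent.

```python
from math import ceil

def tile_baits(target_start, target_end, bait_len=120, overlap=20):
    """Return half-open (start, end) bait intervals covering one target.

    Hypothetical design rules: fixed bait length, fixed overlap between
    neighboring baits, and a tiling centered on the target.
    """
    if not 0 <= overlap < bait_len:
        raise ValueError("overlap must be in [0, bait_len)")
    target_len = target_end - target_start
    if target_len <= 0:
        raise ValueError("target must have positive length")
    step = bait_len - overlap
    # Smallest number of baits whose tiled span covers the whole target.
    n_baits = max(1, ceil((target_len - overlap) / step))
    span = n_baits * step + overlap
    # Center the tiling so the overhang is split between both ends.
    first = target_start - (span - target_len) // 2
    return [(first + i * step, first + i * step + bait_len)
            for i in range(n_baits)]

# Example: a 160-bp exon at coordinates 1000-1160 on one chromosome.
baits = tile_baits(1000, 1160)
for start, end in baits:
    print(start, end)
```

Because the baits extend past the target on both sides, some captured bases will be off target but on bait, which is the situation the figure nomenclature describes.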


Biotinylated RNA Baits

Solution-based hybrid selection involves the hybridization of a prepared paired-end Illumina library (“pond”) with a pool of biotinylated RNA (“baits”). These RNA baits are generated from unique oligonucleotides synthesized on an Agilent programmable DNA microarray. Up to 55,000 unique oligos can be synthesized simultaneously; they are 150-200 bp in length and include 15-bp universal PCR primer sites at the extreme ends. Following synthesis, the oligos are stripped from the array substrate and are universally PCR amplified into double-stranded DNA. A second round of PCR incorporates a T7 promoter site into the amplicon, which is used to transcribe the DNA into single-stranded, biotinylated RNAs. This process has recently been commercialized by Agilent Technologies and is currently being marketed as the SureSelect Target Enrichment System. Agilent has developed a streamlined web interface for uploading custom probe sequences that can be synthesized and manufactured into a ready-to-use biotinylated RNA pool (https://earray.chem.agilent.com/earray).

[Figure 18.4.1 schematic: a target tiled with 120-mer baits; regions are labeled on target, off target, on bait, and near bait (+/- 250 b).]

Figure 18.4.1 Targets, baits, and nomenclature. Sequencing reads can fall into several categories depending on where they align along a targeted region of the genome. Bases aligning to the exact targeted sequence are considered “on target.” Because RNA bait sequences can hang off the ends of the actual target, aligned bases can be “off target” but “on bait.” Additionally, because randomly sheared fragments vary in size, it is realistic to expect a proportion of aligned bases to be “near bait,” which is considered ±250 bp of the bait sequence. Metrics calculating the percentage of bases falling into these categories are helpful in understanding the performance of a hybrid selection experiment. For the color version of the figure, go to http://www.currentprotocols.com/protocol/hg1804.
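The categories defined in the caption of Figure 18.4.1 can be turned into simple percentages. The following is a minimal sketch, assuming all coordinates lie on a single chromosome and intervals are half-open (start, end) pairs; the function names and the "off bait" label for bases farther than 250 bp from any bait are assumptions for illustration and are not taken from the analysis pipeline described in this unit.

```python
from collections import Counter

NEAR_BAIT_FLANK = 250  # bp on either side of a bait, per Figure 18.4.1

def in_any(pos, intervals, flank=0):
    """True if pos falls in any half-open (start, end) interval padded by flank."""
    return any(start - flank <= pos < end + flank for start, end in intervals)

def classify_base(pos, targets, baits):
    """Assign one aligned base position to exactly one category."""
    if in_any(pos, targets):
        return "on target"
    if in_any(pos, baits):
        return "on bait"    # off target, but still covered by a bait
    if in_any(pos, baits, NEAR_BAIT_FLANK):
        return "near bait"
    return "off bait"       # hypothetical label for everything else

def category_percentages(positions, targets, baits):
    """Percentage of aligned bases in each category."""
    counts = Counter(classify_base(p, targets, baits) for p in positions)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {cat: 100.0 * n / total for cat, n in counts.items()}

# Toy example: one 160-bp target covered by two overlapping 120-mer baits.
targets = [(1000, 1160)]
baits = [(970, 1090), (1070, 1190)]
percentages = category_percentages([1050, 980, 1300, 5000], targets, baits)
print(percentages)
```

A production implementation would work from per-base coverage in an alignment file rather than a list of positions, and would need an interval index to scale to whole-exome bait sets.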

DNA Quality and Quantity

DNA quality and quantity must be considered when compiling a cohort for hybrid selection sequencing. If available, DNA samples with more than 3 μg of high-quality DNA should be used. Although whole-genome DNA extracted from cell lines and blood is preferred, whole-genome amplified DNA can be used as long as the starting DNA is not highly degraded. Before beginning a hybrid selection study, all samples should be quantified and an aliquot from each sample should be assessed for quality by gel electrophoresis or bioanalyzer.

BASIC PROTOCOL 1

DNA FRAGMENTATION

Genomic DNA must be fragmented in order to capture and sequence exons. First, because the goal is to sequence only exons and as little background genome as possible, DNA must be fragmented to a size that allows maximum sequence coverage of targeted exons with minimal sequencing of neighboring intronic regions. Because exons average ∼160 bp in length, shearing DNA to a mean length of ∼150 bp enables the efficient capture and sequencing of these small target regions. Second, for optimal clonal cluster amplification on the flow cell, DNA fragments should range from 200 to 500 bp in length. With a tight fragment size distribution, uniformly sized clusters are more easily differentiated from one another on the GAII, ultimately increasing sequence yields.

Several methods for randomly shearing DNA are in use today, including nebulization using compressed air, sonication, and hydro-shearing. These methods typically produce a wide size distribution and often require the use of a preparative gel and size selection to obtain the tight size distribution preferred for exon hybrid selection. To eliminate material loss and the time-consuming process of gel-based size selection, we routinely use a more recently developed DNA shearing technology called Adaptive Focused Acoustics (AFA). The Covaris S-series Sample Prep Station is highly adjustable and allows genomic DNA to be sheared into a tight band averaging 150 bp with a distribution of ∼100 to 400 bp (Fig. 18.4.2). The Covaris instrument uses adjustable acoustic energy that is focused into a glass vial containing the diluted DNA samples. The focused energy creates tiny bubbles that constantly collapse in a process called cavitation, which shears the DNA. By adjusting the energy level and the exposure time, genomic DNA can be sheared to many size distributions.

[Figure 18.4.2 gel image: lanes for genomic DNA and sheared DNA; band-size markers at 2,000, 1,500, 1,000, 700, 500, 300, 150, and 50 bp.]

Figure 18.4.2 Sheared genomic DNA size distribution. High-quality genomic DNA was sheared using the Covaris instrument. Unsheared gDNA (100 ng) and sheared DNA (200 ng) were run in parallel on a 2% agarose gel. After shearing, the bulk of the fragments should run between ∼100 and 400 bp.

Materials

DNA sample (e.g., see APPENDIX 3B)
Nuclease-free water
70% (v/v) ethanol

NanoDrop ND-1000 spectrophotometer
Covaris S-2 Sample Preparation System
VWR circulating chiller
Covaris shearing vial (6 × 16-mm AFA fiber vial; cat. no. 520045)
1.5-ml microcentrifuge tube
Agencourt AMPure XP kit (Beckman Coulter, cat. no. A63881)
Magnetic separator (DynaMag Spin Magnet, Invitrogen, cat. no. 123-20D)
Additional reagents and equipment for DNA quantitation (APPENDIX 3D) and agarose gel electrophoresis (UNIT 2.7)

Dilute DNA sample

1. Prepare a dilution of DNA sample at a concentration of 3 μg in 100 μl nuclease-free water (∼30 ng/μl).
2. Confirm DNA concentration by absorption at 260 nm on a NanoDrop ND-1000 spectrophotometer (APPENDIX 3D), using nuclease-free water to blank the instrument.

Shear DNA

3. Fill a Covaris water bath to the fill line, adjust circulating chiller bath to 4°C, and begin degassing. Allow system to chill and degas for 20 min or more.
4. Pipet 100 μl DNA sample through the split septum cap of a shearing vial, insert vial into holder, and place holder into position.
5. Adjust shearing parameters and run program as follows:

   Duty cycle      10%
   Intensity       5%
   Cycles/burst    200
   Mode            frequency sweeping
   # Cycles        3.

6. Pipet the sheared sample from the vial to a clean 1.5-ml microcentrifuge tube.

Clean DNA using AMPure XP beads

7. Allow AMPure beads to equilibrate to room temperature ∼20 min.
   For additional information about using AMPure beads, see manufacturer’s instructions.
8. Gently shake the bottle to resuspend any beads that may have settled and ensure that mixture is homogeneous.
9. Slowly add 1.8× volume (180 μl) of beads to the sheared DNA.
10. Vortex bead/reaction mixture for 10 sec or until the mixture is homogeneous. Incubate 5 min at room temperature.
11. Place the tube on a magnetic separator and allow the beads to separate out of solution for 2 min until the solution appears clear.
12. With the tube still on the magnet, slowly pipet off and discard the supernatant.
13. Gently pipet 500 μl of 70% ethanol into the tube, being careful not to disturb beads. Let stand 30 sec and then remove and discard the ethanol wash.
14. Repeat wash, being sure to remove all ethanol after the second wash.
15. With tube still on the magnet, allow beads to air dry for 2 min.
    Do not allow beads to overdry and appear cracked, as this will greatly reduce DNA recovery.

Elute DNA

16. Remove tube from magnet and add 32 μl nuclease-free water to elute DNA.
17. Briefly vortex to ensure all beads come in contact with eluant.
18. Place tube on magnetic separator and allow beads to separate for 1 min until liquid is clear.
19. Carefully pipet the eluate to a new labeled tube. Store at −20°C until end repair step.

Check DNA fragment size

20. Run 2 μl of eluate on a 2% agarose gel to ensure correct fragment distribution.
    The smear should be from ∼100 to 400 bp with a peak around 150 to 200 bp.
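The volume arithmetic behind step 1 (3 μg of DNA in 100 μl) and step 9 (1.8× bead volume) can be written out as a small helper. This is only a convenience sketch: the function names are hypothetical, and the 250 ng/μl stock concentration in the example is an invented value, not a measurement from this unit.

```python
def dilution_volumes(stock_ng_per_ul, mass_ng=3000.0, final_ul=100.0):
    """Return (stock DNA μl, water μl) that give mass_ng of DNA in final_ul."""
    if stock_ng_per_ul <= 0:
        raise ValueError("stock concentration must be positive")
    stock_ul = mass_ng / stock_ng_per_ul
    if stock_ul > final_ul:
        raise ValueError("stock is too dilute to reach the target mass in this volume")
    return stock_ul, final_ul - stock_ul

def ampure_bead_volume(sample_ul, ratio=1.8):
    """Bead volume (μl) for a clean-up at the given bead-to-sample ratio."""
    return sample_ul * ratio

# Example with an invented stock concentration of 250 ng/μl.
dna_ul, water_ul = dilution_volumes(250.0)
beads_ul = ampure_bead_volume(100.0)
print(dna_ul, water_ul, beads_ul)
```

The same `ampure_bead_volume` call applies to the later clean-ups in this unit, which also use a 1.8× ratio on different reaction volumes.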


BASIC PROTOCOL 2

PAIRED-END LIBRARY PREPARATION

Library preparation follows a slightly modified Illumina paired-end sample preparation protocol by which randomly sheared genomic DNA fragments are modified so that they can be effectively hybridized to a flow cell, cluster amplified, and subsequently sequenced on the Genome Analyzer II. Briefly, randomly sheared DNA fragments are end-repaired to produce blunt ends. Blunt-ended fragments are extended with a single dATP to produce a single A-base overhang to which specific adapters with a single dTTP overhang can be ligated. A universal PCR amplification is used to enrich for successfully adapter-ligated fragments, increase library concentration, and add an additional utility sequence used to hybridize fragments to a flow cell for cluster amplification and sequencing.

Materials

Illumina Paired End Sample Prep Kit (cat. no. PE-102-1001), containing:
   10× T4 DNA ligase buffer w/10 mM ATP
   T4 polynucleotide kinase
   T4 DNA polymerase
   Klenow fragment (3′→5′ exo−) and Klenow buffer
   10 mM dNTP mix
   1 mM dATP
   DNA ligase and 2× buffer
Nuclease-free water
Sheared, cleaned DNA sample (see Basic Protocol 1)
Paired-end oligo mix (Illumina)
2× Phusion high-fidelity PCR master mix (Finnzymes, cat. no. F-531S)
PCR primers, 100 μM each:
   PE1.0: AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT
   PE2.0: CAAGCAGAAGACGGCATACGAGATCGGTCTCGGCATTCCTGCTGAACCGCTCTTCCGATCT

96-well PCR plate
Thermocycler
Additional reagents and equipment for cleaning DNA with AMPure beads (see Basic Protocol 1), agarose gel electrophoresis (UNIT 2.7), and DNA quantitation (APPENDIX 3D)

Perform end repair

1. Prepare end repair master mix on ice as follows (20 μl/reaction):

   5.0 μl 10× T4 DNA ligase buffer with 10 mM ATP
   2.5 μl T4 polynucleotide kinase
   2.5 μl T4 DNA polymerase
   0.5 μl Klenow fragment
   2.0 μl 10 mM dNTP mix
   7.5 μl nuclease-free water.

2. Add 20 μl end repair mix to 30 μl sheared, cleaned DNA in a 96-well PCR plate for a total reaction volume of 50 μl.
3. Vortex, spin down, seal, and incubate on a thermocycler as follows:

   25°C    30 min
   4°C     hold.


4. Clean reaction mixture using AMPure XP beads (see Basic Protocol 1, steps 7 to 19). Use a 1.8× bead concentration and elute with 32 μl nuclease-free water.

Carry out A-base addition

5. Prepare A-base addition mix on ice as follows (18 μl/reaction):

   5.0 μl Klenow buffer
   10.0 μl 1 mM dATP
   3.0 μl Klenow fragment (exo−).

6. Add 18 μl A-base addition mix to 32 μl cleaned, end-repaired DNA in a 96-well PCR plate for a total reaction volume of 50 μl.
7. Vortex, spin down, seal, and incubate on a thermocycler as follows:

   37°C    30 min
   4°C     hold.

8. Clean reaction mixture using 1.8× volume AMPure XP beads and elute with 24 μl nuclease-free water.

Ligate adapter

9. Prepare adapter ligation mix on ice as follows (36 μl/reaction):

   30 μl 2× DNA ligase buffer
   3.9 μl paired-end oligo mix
   2.7 μl DNA ligase.

10. Add 36 μl adapter ligation mix to 24 μl cleaned, A-tailed DNA in a 96-well PCR plate for a total reaction volume of 60 μl.
11. Vortex, spin down, seal, and incubate on a thermocycler as follows:

   25°C    30 min
   4°C     hold.

12. Clean reaction mixture using 1.8× volume AMPure XP beads and elute with 40 μl nuclease-free water.

Enrich by PCR

13. Prepare PCR master mix on ice as follows (60 μl/reaction):

   50 μl 2× Phusion master mix
   1.0 μl Primer PE1.0
   1.0 μl Primer PE2.0
   8.0 μl nuclease-free water.

14. Add 60 μl PCR master mix to 40 μl cleaned, adapter-ligated DNA in a 96-well PCR plate for a total reaction volume of 100 μl.
15. Vortex, spin down, seal, and incubate on a thermocycler as follows:

   1 cycle:     30 sec    98°C
   6 cycles:    10 sec    98°C
                30 sec    65°C
                30 sec    72°C
   1 cycle:     5 min     72°C.

16. Clean reaction mixture using 1.8× volume AMPure XP beads and elute with 35 μl nuclease-free water.


17. Run 3 μl of cleaned PCR product on a 2% agarose gel to confirm amplification.
    Ligation of paired-end adapters and amplification using PE1.0/PE2.0 primers adds an additional 120 bp of Illumina utility sequence, increasing sheared fragment size by 120 bp. The smear should now run from ∼250 to 600 bp, with a peak around 350 bp.
18. Quantify DNA using the NanoDrop ND-1000 (APPENDIX 3D).

BASIC PROTOCOL 3

HYBRID SELECTION

In-solution hybrid selection works on the same principle as any typical DNA microarray. In this case, specific capture of targeted exons is accomplished by mixing single-stranded biotinylated RNA baits with a denatured Illumina paired-end library under high stringency conditions. A 24-hour incubation at 65°C drives the specific hybridization of DNA and RNA based on sequence complementarity. To wash away non-hybridized DNA fragments, biotinylated DNA-RNA duplexes are immobilized using streptavidin-coated paramagnetic beads and are pulled out of solution using a magnetic separator. Repeated washing of beads at high stringency removes non-specifically hybridized DNA fragments. Captured DNA fragments are then chemically denatured from the immobilized RNA with sodium hydroxide. The released DNA fragments are cleaned and PCR-enriched to produce a highly enriched targeted exon library ready for sequencing on the Illumina Genome Analyzer.

Materials

Adapter-ligated DNA (see Basic Protocol 2)
50× Denhardt’s solution
20× SSPE
Nuclease-free water
10% SDS
0.5 M EDTA
1.0 mg/ml human Cot-1 DNA (Invitrogen, cat. no. 15279-101)
10.0 mg/ml salmon sperm DNA (Invitrogen, cat. no. 15632-011)
Blocking oligos (200 μM each, custom oligos from IDT):
   Oligo 1.0: AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT
   Oligo 2.0: CAAGCAGAAGACGGCATACGAGATCGGTCTCGGCATTCCTGCTGAACCGCTCTTCCGATCT
100 ng/μl Biotinylated RNA Oligo Library (Agilent Technologies SureSelect)
20 U/μl Superase-In RNAse Inhibitor (Applied Biosystems, cat. no. AM2694)
Dynabeads M-280 Streptavidin Beads (Invitrogen, cat. no. 112-05D)
5 M NaCl
1 M Tris-Cl
20× SSC
0.1 N NaOH
2× Phusion high-fidelity PCR master mix (Finnzymes, cat. no. F-531S)
PCR primers, 100 μM each:
   PE1.0: AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT
   PE2.0: CAAGCAGAAGACGGCATACGAGATCGGTCTCGGCATTCCTGCTGAACCGCTCTTCCGATCT

NanoDrop ND-1000 spectrophotometer
Speedvac evaporator
65°C heating block with 1.5-ml tube holder
96-well PCR plates
1.5-ml microcentrifuge tubes
Adhesive plate seal
96-well thermocycler with heated lid
50-ml conical tube
Magnetic separator (DynaMag Spin Magnet, Invitrogen, cat. no. 123-20D)
Additional reagents and equipment for DNA quantitation (APPENDIX 3D) and cleaning DNA with AMPure beads (see Basic Protocol 1)

Hybridize DNA to RNA

1. Ensure that pond DNA is at a concentration of ≥100 ng/μl by checking on a NanoDrop ND-1000 spectrophotometer (APPENDIX 3D). If concentration is too low, concentrate sample in a Speedvac evaporator.
2. Prepare hybridization buffer in a 1.5-ml tube as follows:

   500 μl 20× SSPE
   240 μl nuclease-free water
   200 μl 50× Denhardt’s solution
   20 μl 10% SDS
   20 μl 0.5 M EDTA.

3. Vortex hybridization buffer and place in a 65°C heating block. Occasionally re-vortex mixture to ensure SDS precipitate is fully dissolved and buffer appears clear.
4. In a 96-well plate labeled “DNA,” combine the following in appropriate wells:

   500 ng enriched pond (5.0 μl at 100 ng/μl)
   2.5 μl 1.0 mg/ml human Cot-1 DNA
   2.5 μl 10.0 mg/ml salmon sperm DNA
   1.5 μl 200 μM blocking oligo 1.0
   1.5 μl 200 μM blocking oligo 2.0.

5. In a 1.5-ml tube labeled “bait” combine:

   5.0 μl 100 ng/μl biotinylated oligo library (bait)
   1.0 μl 20 U/μl Superase-In RNAse inhibitor
   1.0 μl nuclease-free water.

6. Seal the DNA plate with adhesive plate seal, vortex, centrifuge briefly, and place on a thermocycler. Close lid and start the hybrid selection program as follows:

   95°C    5 min
   65°C    hold.

7. Allow DNA to denature for 5 min and then equilibrate to 65°C for 5 min.
8. As soon as the DNA plate has stabilized at 65°C for 2.5 min, place the tube containing RNA bait into a 65°C heating block and set timer for 2.5 min.
9. Once RNA has incubated for 2.5 min at 65°C, pause program, open lid of thermocycler, and remove adhesive seal.
   It is critical to perform the following addition steps quickly to minimize volume loss to evaporation while the plate is open and heated at 65°C.
10. With the DNA plate still in the thermocycler, remove hybridization buffer from the heating block, briefly spin down, and pipet 13 μl hybridization buffer to each sample.
11. Quickly remove the RNA bait tube from the 65°C heating block, spin down, and pipet 6 μl to each sample.


12. Mix 10 times with a pipettor set at 10 μl, reseal plate with new adhesive seal, close the thermocycler lid, and continue the program.
13. Incubate hybridization reaction at 65°C for 24 hr.

Prepare M-280 streptavidin beads

14. In a 50-ml conical tube, prepare bead wash buffer as follows:

   19.7 ml nuclease-free water
   5 ml 5 M NaCl
   250 μl 1 M Tris-Cl
   50 μl 0.5 M EDTA.

15. In a 1.5-ml microcentrifuge tube, combine 50 μl streptavidin beads with 200 μl bead wash buffer per sample (e.g., for 5 samples, combine 250 μl beads with 1 ml buffer). Vortex for 30 sec.
16. Place tube on magnetic separator for 2 min to allow beads to settle out of the mixture.
17. With the tube still on the magnet, use a pipet to remove and discard the supernatant.
18. Remove the tube from the magnet and add 165 μl bead wash buffer per reaction (e.g., for 5 samples, add 825 μl buffer). Resuspend beads by vortexing 30 sec.
19. Repeat steps 17 and 18 for a total of three washes.
20. After the final wash, resuspend beads in 165 μl bead wash buffer per reaction.
    These wash steps remove the storage buffer and are necessary for effective streptavidin-biotin binding.

Capture hybridized DNA/RNA

21. When hybridization is complete (step 13), remove DNA plate from the thermocycler and transfer each reaction to a labeled 1.5-ml microcentrifuge tube.
22. Add 165 μl washed beads to each tube. Vortex 10 sec and incubate mixture at room temperature for 30 min, vortexing occasionally to keep beads suspended.
23. Place tubes on magnetic separator and allow to separate for 2 min. Remove and discard supernatant. Remove the tube from the magnet.
24. Prepare low-stringency buffer as follows:

   23.5 ml nuclease-free water
   1.25 ml 20× SSC
   250 μl 10% SDS.

25. Add 165 μl low-stringency buffer to each sample and incubate at room temperature for 15 min.
26. Place tubes on magnet and allow beads to separate for 2 min. Remove and discard supernatant and remove tube from magnet.
27. Prepare high-stringency buffer as follows:

   24.6 ml nuclease-free water
   125 μl 20× SSC
   250 μl 10% SDS.

   Aliquot into 1.5-ml tubes and warm to 65°C in a heating block.
28. Add 165 μl prewarmed high-stringency buffer to each sample, vortex to resuspend beads, and incubate in heating block at 65°C for 10 min.


29. Place tubes on the magnet and let beads separate for 2 min. Remove and discard the supernatant.
30. Repeat steps 28 and 29 for a total of three washes.

Denature DNA/RNA hybrid

31. Denature DNA from bead-bound RNA by adding 50 μl of 0.1 N NaOH to each sample. Vortex to resuspend beads and incubate at room temperature for 10 min.
32. Transfer tube to the magnet and let beads separate for 2 min.
33. Remove the supernatant (containing the target-selected DNA) and transfer to a fresh tube.
34. Add 50 μl of 1 M Tris-Cl to neutralize the NaOH.
35. Clean reaction using 1.8× volume AMPure XP beads (see Basic Protocol 1, steps 7 to 19). Elute “catch” DNA using 40 μl nuclease-free water.

Enrich captured DNA

36. Prepare PCR master mix on ice as follows (60 μl/reaction):

   50 μl 2× Phusion master mix
   1.0 μl Primer PE1.0
   1.0 μl Primer PE2.0
   8.0 μl nuclease-free water.

37. Add 60 μl PCR master mix to 40 μl cleaned “catch” DNA in a 96-well PCR plate for a total reaction volume of 100 μl.
38. Vortex, spin down, seal, and incubate on a thermocycler as follows:

   1 cycle:      30 sec    98°C
   12 cycles:    10 sec    98°C
                 30 sec    65°C
                 30 sec    72°C
   1 cycle:      5 min     72°C.

39. Clean reaction mixture using 1.8× volume AMPure XP beads and elute with 35 μl nuclease-free water. Store at −20°C until sequencing.
40. Quantify DNA using the NanoDrop ND-1000.
    The size of the PCR-amplified DNA can be verified by agarose gel electrophoresis; however, with only twelve cycles of PCR, the product may not be detectable on a gel. The subsequent qPCR step will reveal whether there is enough product for sequencing.
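The master mixes in Basic Protocols 2 and 3 are given per reaction. When several libraries are processed together, the volumes can be scaled as in the sketch below, which uses the PCR enrichment recipe as its example. The 10% pipetting overage and the function name `scale_master_mix` are assumptions for illustration; the unit itself does not specify an overage.

```python
# Per-reaction volumes (μl) from the PCR enrichment steps of this unit.
PCR_MIX_UL = {
    "2x Phusion master mix": 50.0,
    "Primer PE1.0": 1.0,
    "Primer PE2.0": 1.0,
    "nuclease-free water": 8.0,
}

def scale_master_mix(per_reaction_ul, n_reactions, overage=0.10):
    """Scale a per-reaction recipe to n reactions plus a fractional overage.

    The overage is a hypothetical allowance for pipetting loss.
    """
    if n_reactions < 1:
        raise ValueError("need at least one reaction")
    factor = n_reactions * (1.0 + overage)
    return {name: round(vol * factor, 2) for name, vol in per_reaction_ul.items()}

# The components sum to the 60 μl that is added to 40 μl of cleaned DNA.
per_reaction_total = sum(PCR_MIX_UL.values())

batch = scale_master_mix(PCR_MIX_UL, n_reactions=8)
for name, vol in batch.items():
    print(f"{vol:8.2f} μl  {name}")
```

Summing the per-reaction components is also a quick sanity check on any recipe before it is scaled.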

LIBRARY QUANTIFICATION BY qPCR

Accurate quantification of a library is critical for efficient sequencing on the Genome Analyzer. Loading a sample at too high a concentration can saturate the surface chemistry, hindering the ability of the software to differentiate one cluster from another and ultimately reducing the yield of quality reads. Alternatively, loading a sample at too low a concentration fails to fully utilize flow cell real estate, generating low sequencing yields and limiting the coverage of targets of interest. Because only DNA fragments containing the correct sequences on either end will hybridize and produce clusters on the flow cell, simple quantification by OD is often insufficient for accurately calculating the optimal loading concentration of a given sample. An effective solution is to run a quantitative real-time PCR assay using a previously sequenced paired-end library as a standard and primers complementary to the P5 and P7 sequences used in hybridization and cluster amplification on the flow cell.
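The P5 and P7 qPCR primers listed in the Materials of this protocol match the 5′ ends of the PE1.0 and PE2.0 enrichment primers used in Basic Protocols 2 and 3, which is consistent with the assay detecting only fragments that carry both complete adapter ends. A small check of that relationship, using only sequences given in this unit with the line-break spaces removed, might look like the following; the variable and function names are illustrative.

```python
# Primer sequences as listed in this unit (5' to 3').
PE1_0 = "AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT"
PE2_0 = "CAAGCAGAAGACGGCATACGAGATCGGTCTCGGCATTCCTGCTGAACCGCTCTTCCGATCT"
P5 = "AATGATACGGCGACCACCGA"
P7 = "CAAGCAGAAGACGGCATACGA"

def is_five_prime_prefix(short_seq, long_seq):
    """True if short_seq is identical to the 5' end of long_seq."""
    return long_seq.upper().startswith(short_seq.upper())

checks = {
    "P5 is the 5' end of PE1.0": is_five_prime_prefix(P5, PE1_0),
    "P7 is the 5' end of PE2.0": is_five_prime_prefix(P7, PE2_0),
}
for description, ok in checks.items():
    print(description, ok)
```

The same kind of check is useful when ordering custom oligos, since a single transposed base in a primer would go unnoticed until the qPCR fails.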

BASIC PROTOCOL 4


Any well-characterized library can be used for a standard curve, but in many labs the PhiX control library provided by Illumina is often well-calibrated for optimal cluster densities. Briefly, 1 μl of each library of unknown concentration is diluted 100-fold, and 1 μl is used as template in a PCR reaction containing SYBR Green stain and P5 and P7 primers. Only DNA fragments with both a P5 and P7 sequence on either end will be amplified and generate fluorescent signal by the intercalation of the SYBR stain into double-stranded DNA. The qPCR reaction is performed on a real-time PCR machine, which records the fluorescence intensity for each cycle of amplification for each well. Upon completion of the cycling program, the software determines an intensity threshold (Rn) within the exponential log phase of PCR amplification. For each well, the Ct (or cycle-threshold) value is calculated by determining the exact cycle at which the fluorescence intensity crosses the set intensity threshold. The Ct values for the standard curve are plotted, a best-fit line is calculated, and concentrations for unknown samples are reported (Fig. 18.4.3).
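The standard-curve calculation described above can be sketched as an ordinary least-squares fit of Ct against log10 of the standard concentration, followed by inversion of the line for each unknown. This is not the ABI SDS software's implementation; the function names are hypothetical, and the Ct values below are synthetic, generated from an idealized line (slope −3.32, intercept 15) purely to exercise the code.

```python
import math

def fit_standard_curve(concs_nm, cts):
    """Fit Ct = slope * log10(conc) + intercept; return (slope, intercept, r2)."""
    xs = [math.log10(c) for c in concs_nm]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(cts) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in cts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cts))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r2 = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r2

def conc_from_ct(ct, slope, intercept):
    """Invert the fitted line to estimate concentration from a Ct value."""
    return 10 ** ((ct - intercept) / slope)

# Seven-point, two-fold dilution series starting at a labeled 20 nM.
standards_nm = [20.0 / 2 ** i for i in range(7)]
# Synthetic Ct values from an idealized curve (demonstration only).
synthetic_cts = [15.0 - 3.32 * math.log10(c) for c in standards_nm]

slope, intercept, r2 = fit_standard_curve(standards_nm, synthetic_cts)

# Average the estimates from triplicate wells of one unknown (invented Cts).
triplicate_cts = [13.0, 13.1, 12.9]
estimates = [conc_from_ct(ct, slope, intercept) for ct in triplicate_cts]
mean_conc_nm = sum(estimates) / len(estimates)
print(slope, intercept, r2, mean_conc_nm)
```

With real data the fitted R² will be below 1, and it is worth inspecting before trusting the reported concentrations.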

Materials

10 nM PhiX Control Library (Illumina, cat. no. 1006471)
Nuclease-free water
Target-selected DNA library (see Basic Protocol 3)
2× Brilliant SYBR Green QPCR Master Mix (Stratagene, cat. no. 600548)
1 mM ROX Reference Dye
1.25 μM P5 PCR primer (AATGATACGGCGACCACCGA)
1.25 μM P7 PCR primer (CAAGCAGAAGACGGCATACGA)

384-well MicroAmp Optical Reaction Plate (Applied Biosystems, cat. no. 4326270)
MicroAmp Optical Adhesive Film (Applied Biosystems, cat. no. 4311976)
ABI 7900HT Real-Time PCR System with SDS V2.3 software (Applied Biosystems)

Create standard curve

1. Add 2 μl of 10 nM PhiX Control Library to 98 μl nuclease-free water. Vortex well, spin down, and label this dilution “PhiX Control 20 nM.”
   Although the true concentration is now 0.2 nM, the label (20 nM) is scaled up 100-fold as a means of accounting for the 1/100 dilution of the library. Entering 100× standard curve points in the ABI analysis software allows the read-out to reflect the true concentration of the undiluted library.

2. Add 50 μl of “PhiX Control 20 nM” to 50 μl nuclease-free water, vortex well, and spin down. Continue dilutions to create a seven-step, two-fold serial dilution (20, 10, 5, 2.5, 1.25, 0.625, and 0.313 nM). Add 50 μl nuclease-free water to another tube and label as “NTC” (No Template Control). These labeled concentrations cover a 1/100 dilution of sample template and are at a working concentration for the qPCR assay.

Carry out qPCR quantification

3. Prepare a 1/100 dilution of each library to be quantified by carefully pipetting 1 μl of library into 99 μl nuclease-free water. Vortex 30 sec and spin down.
4. Prepare qPCR master mix on ice as follows:

   12.5 μl 2× Brilliant SYBR Green Master Mix
   0.375 μl 2 μM ROX Reference Dye
   1.0 μl 1.25 μM P5 primer
   1.0 μl 1.25 μM P7 primer
   9.125 μl nuclease-free water.

   The 1 mM reference dye is diluted 1:500 (2 μM) before 0.375 μl is added to the master mix.

Figure 18.4.3 qPCR library quantification. Real-time SYBR Green qPCR is used for accurate quantification of libraries prior to sequencing. An accurate quantitation is essential for calculating the amount of library to be loaded onto a flow cell for optimal cluster density and high sequence yields. Shown in this figure are the amplification plots for a two-fold serial dilution standard curve as well as four libraries, all run in triplicate. The standard curve is plotted and used to calculate the concentration of each library. For the color version of the figure, go to http://www.currentprotocols.com/protocol/hg1804.

5. Array 24 μl master mix into all appropriate wells of a 384-well MicroAmp Optical Plate. For best accuracy and to account for pipetting error, the standard curve and all samples should be run in triplicate.

6. For wells designated for the standard curve, carefully pipet 1 μl of each standard dilution into the corresponding wells containing master mix.

7. For wells designated for sample libraries, carefully pipet 1 μl of the 1/100 dilution into the corresponding wells containing master mix.

8. Seal the plate with MicroAmp Optical Adhesive Film, lightly vortex the plate, and spin down.


9. Open the SDS software for the ABI 7900HT real-time PCR instrument and open a new template file for Standard Curve/Absolute Quantification.

10. Load plate and program the cycling conditions as follows:

1 cycle:      30 min    50°C
              10 min    95°C
40 cycles:    10 sec    95°C
              1 min     60°C.

11. On the plate set-up tab, apply the SYBR Green detector to all well positions containing master mix and assign a sample type to each well (standard, no template control, or unknown). For standard curve wells, also enter the appropriate concentration (20, 10, 5, 2.5, 1.25, 0.625, 0.313 nM, and NTC). 12. Once the template is set up and the plate is properly loaded, run the cycling program. Run time is ∼2.5 hr.

13. Once the program is complete, run the automated analysis by clicking the Analyze button presented as a green arrow at the top of the window.

The software will automatically calculate a best-fit line for the standard curve and plot all unknown samples. Concentrations for unknown libraries are calculated and can be exported as a text document that can be reopened in Excel.
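The calculation behind steps 13 to 15 can be sketched in a few lines of Python. This is a minimal illustration that assumes the usual absolute-quantification model, a straight-line fit of threshold cycle (Ct) against log10 of the labeled standard concentration; the exact procedure used by the SDS software is not described in this unit. The function names and all Ct values below are invented for illustration.

```python
import math
from statistics import mean


def fit_standard_curve(concs_nM, cts):
    """Least-squares fit of Ct against log10(concentration); returns slope, intercept, R^2."""
    x = [math.log10(c) for c in concs_nM]
    x_bar, y_bar = mean(x), mean(cts)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, cts))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, cts))
    ss_tot = sum((yi - y_bar) ** 2 for yi in cts)
    return slope, intercept, 1.0 - ss_res / ss_tot


def library_conc_nM(ct_replicates, slope, intercept):
    """Interpolate each replicate Ct on the curve, then average the concentrations."""
    return mean(10 ** ((ct - intercept) / slope) for ct in ct_replicates)


# Labeled standards from step 2, paired with made-up Ct values (one cycle per two-fold step).
standards_nM = [20, 10, 5, 2.5, 1.25, 0.625, 0.313]
standard_cts = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0]

slope, intercept, r2 = fit_standard_curve(standards_nM, standard_cts)
conc = library_conc_nM([11.9, 12.0, 12.1], slope, intercept)  # hypothetical triplicate

print(f"slope={slope:.3f}  R^2={r2:.4f}  library={conc:.2f} nM")
print("standard curve acceptable" if r2 > 0.95 else "repeat the standard curve")
print("library can be sequenced" if conc > 2 else "library concentration too low")
```

Because the standards carry the 100× labels, the interpolated value is already the concentration of the undiluted library and needs no further correction for the 1/100 dilution.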

14. Confirm the quality of the standard curve by ensuring that the R² value is >0.95.

15. Average the reported concentrations for the triplicate wells of each sample, and use this concentration to calculate the volume of library that should be denatured for hybridization and cluster generation.

Concentrations should be >2 nM for a library to be sequenced.

The hybrid selection library is now ready for cluster generation and paired-end sequencing on the GAII. Refer to Illumina Cluster Station and Genome Analyzer II user guides for complete instructions and protocols (see http://www.illumina.com/support/documentation.ilmn).

SUPPORT PROTOCOL

READ ALIGNMENT AND EVALUATION OF SEQUENCE DATA

Once the Illumina pipeline has processed the raw data, the reads must be aligned to a reference genome. Many algorithms exist for this purpose, including the ELAND aligner that is part of the Illumina pipeline. Other commonly used aligners include MAQ (Li et al., 2008) and BWA (Li and Durbin, 2009). A new data standard format, SAM/BAM (Li et al., 2009), has been created as part of 1000 Genomes (http://www.1000genomes.org/) to represent next-generation sequencing and alignment data. One component of the SAMTools package is a converter to produce SAM/BAM from ELAND. MAQ also provides a converter, and BWA outputs SAM/BAM format natively.

Two popular toolsets for processing NGS data are Picard and the Genome Analysis Toolkit (GATK). Instructions for downloading, installing, and running can be found at:


Picard: http://picard.sourceforge.net
GATK: http://www.broadinstitute.org/gatk

These two software packages are required for calculating a variety of hybrid selection performance metrics discussed further below.


Library Complexity

Sequencing a library with low molecular diversity leads to the same molecules being observed many times. Without recognizing and correcting for this effect, multiple signals from a single template molecule may be interpreted as many different molecules. An example of where this causes a problem is in variant detection, because a single molecule with a PCR error that is observed ten times is given the same weight as observing ten independent molecules with the variant, which leads to false positives in downstream analysis.

In order to address this problem, MarkDuplicates, a component of the Picard tool suite (http://picard.sourceforge.net), was developed to measure the amount of molecular duplication in sequencing data and to mark those reads as molecular duplicates. The former function can be used to monitor and improve the laboratory process that generates the reads, while the latter allows the data to be used regardless of the duplication rate by discarding reads from duplicate molecules. This, of course, lowers the effective yield and coverage of the lane based on the amount of duplication, but does allow the data to be analyzed.

The MarkDuplicates algorithm produces a file with many statistics, all of which are described in the Picard documentation. However, there are two key metrics:

PERCENT_DUPLICATION: the percentage of molecular duplicates observed in the reads. This number will increase with sequencing depth as the molecules are sampled more deeply.

ESTIMATED_LIBRARY_SIZE: a statistical estimate of the number of unique molecules in the library, with larger numbers indicating higher diversity. Since this number is an estimate of the underlying diversity, it does not change with sequencing depth. Although larger numbers indicate higher diversity, the expected amount of diversity scales with the size of the target. For high-quality, whole-exome captures, typical values are over 1.5 × 10⁸.
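The different behavior of the two metrics can be illustrated with a simple sampling model. The sketch below assumes a Lander-Waterman-style relationship, in which N reads drawn from a library of X unique molecules are expected to reveal X(1 − e^(−N/X)) distinct molecules; this model is an assumption of the sketch, and the code is not Picard's implementation. All function names are illustrative.

```python
import math


def expected_unique(library_size, n_reads):
    """Expected distinct molecules seen when n_reads are sampled from library_size molecules."""
    return library_size * (1.0 - math.exp(-n_reads / library_size))


def duplication_fraction(library_size, n_reads):
    """Fraction of reads that are repeat observations of an already-seen molecule."""
    return 1.0 - expected_unique(library_size, n_reads) / n_reads


def estimate_library_size(n_reads, n_unique):
    """Solve expected_unique(X, n_reads) == n_unique for X by bisection."""
    if not 0 < n_unique < n_reads:
        raise ValueError("need 0 < n_unique < n_reads")
    lo = hi = float(n_unique)          # expected_unique(lo, n_reads) is below n_unique
    while expected_unique(hi, n_reads) < n_unique:
        hi *= 2.0                      # grow the bracket until it contains the root
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if expected_unique(mid, n_reads) < n_unique:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


library = 1.5e8  # library size quoted above as typical for a whole-exome capture
for n_reads in (2e7, 5e7, 1e8):
    unique = expected_unique(library, n_reads)
    print(f"{n_reads:.0e} reads: duplication {duplication_fraction(library, n_reads):.1%}, "
          f"estimated library size {estimate_library_size(n_reads, unique):.3g}")
```

Under this model the duplication fraction climbs as more reads are drawn, while the library size recovered from each depth stays the same.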

Necessary resources

Aligned Sequence BAM
MarkDuplicates.jar (Picard Tools)

Using the required files and software

Execute

    java -Xmx2g -jar MarkDuplicates.jar \
        INPUT=input.bam \
        OUTPUT=output.bam \
        METRICS_FILE=dupe_metrics.out

Where
input.bam is the input BAM file of aligned sequence reads
output.bam is the output BAM to be written with duplicates marked
dupe_metrics.out is the output file containing duplication metrics.

Selection Specificity

Once a BAM file has been created and duplicate molecules identified, another Picard tool can be used to assess the efficiency of the selection event itself. The Picard documentation contains detailed explanations of every metric produced by this software, but the most important metrics for measuring the efficiency of the selection process are:

PCT_SELECTED_BASES: a measure of target enrichment, defined as the percentage of aligned bases that are either on or near the bait (±250 bp from bait).


ZERO_CVG_TARGETS_PCT: a measure of target capture capability, defined as the percentage of targets that have fewer than two overlapping reads.

PCT_TARGET_BASES_20X: a measure of project completion, defined as the percentage of target bases with coverage over 20×. This is typically used as a project completion metric (for example, 80% of targets over 20× coverage). This metric is also produced for 2×, 10×, and 30× coverage.

FOLD_80_BASE_PENALTY: a measure of non-uniformity of sequencing depth, defined as the additional lanes of sequencing required to bring 80% of the target bases to the mean coverage for this original lane. Typical values range from 3 to 5, with lower numbers indicating more uniform coverage.

HS_PENALTY_10X: an aggregate measure of all capture inefficiencies, including duplication, percent selected bases, and non-uniformity of coverage, given desired target coverage of 10×. This can be interpreted as a multiplier of aligned input sequence required to raise a target base by 1× coverage. For example, with an HS_PENALTY_10X of 8.63, sequencing 30 Mb of target to 10× coverage would require 8.63 × 30 Mb × 10× = 2.59 Gb of aligned input sequence. If this metric has no value, it indicates that it is not possible to sequence this library to the desired coverage, usually due to insufficient library complexity (i.e., high molecular duplication). This metric is also calculated for desired coverages of 20× and 30×.
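The HS_PENALTY_10X example above is a single multiplication, reproduced here as a small Python helper for planning purposes. The function name is illustrative; the handling of a missing penalty follows the description above, where no value means the desired coverage cannot be reached.

```python
def required_aligned_bases(hs_penalty, target_bases, desired_coverage):
    """Aligned input sequence (in bases) needed to reach desired_coverage on the target.

    Returns None when the penalty is missing, i.e. the library cannot reach that coverage.
    """
    if hs_penalty is None:
        return None
    return hs_penalty * target_bases * desired_coverage


# Worked example from the text: penalty 8.63, 30 Mb target, 10x coverage.
needed = required_aligned_bases(8.63, 30e6, 10)
print(f"{needed / 1e9:.2f} Gb of aligned sequence")  # prints "2.59 Gb of aligned sequence"
```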

Necessary resources

Duplicate-marked BAM
Target information file
Bait information file
CalculateHsMetrics.jar (Picard Tools)

Using the required files and software

Execute

    java -Xmx2g -jar CalculateHsMetrics.jar \
        BAIT_INTERVALS=baits.interval_list \
        TARGET_INTERVALS=targets.interval_list \
        INPUT=input_dupe_marked.bam \
        OUTPUT=hs_metrics.out

Where
baits.interval_list: bait locations as described in the Picard documentation
targets.interval_list: target locations as described in the Picard documentation
input_dupe_marked.bam: sequencing data BAM with duplicates marked
hs_metrics.out: output file containing hybrid selection metrics
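Once the metrics file has been written, the key values can be pulled out programmatically. The Python sketch below assumes that the file is tab-delimited text in which comment lines begin with “#”, followed by one header row and one row of values, and that the PCT_ columns hold fractions between 0 and 1; check these assumptions against the actual output of the Picard version in use. The function name and the example values are invented for illustration.

```python
import csv
import io


def read_picard_metrics(handle):
    """Return the first metrics row of a Picard-style metrics file as a dict of strings."""
    table = []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith("#") or not line.strip():
            if table:          # a comment or blank line after the table ends it
                break
            continue           # skip leading comments and blank lines
        table.append(line)
    rows = list(csv.DictReader(table, delimiter="\t"))
    if not rows:
        raise ValueError("no metrics table found")
    return rows[0]


# Made-up example in the assumed layout; a real file would be opened with open("hs_metrics.out").
example = (
    "## METRICS CLASS\tHsMetrics\n"
    "PCT_SELECTED_BASES\tPCT_TARGET_BASES_20X\tFOLD_80_BASE_PENALTY\n"
    "0.82\t0.85\t3.4\n"
    "\n"
    "## HISTOGRAM\n"
)
metrics = read_picard_metrics(io.StringIO(example))
project_complete = float(metrics["PCT_TARGET_BASES_20X"]) >= 0.80  # example threshold from the text
print(metrics["PCT_SELECTED_BASES"], metrics["FOLD_80_BASE_PENALTY"], project_complete)
```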


Sequence Visualization

While various calculated metrics provide a good overall snapshot of performance for a hybrid selection run, visualizing actual sequence coverage of a targeted region can sometimes aid in troubleshooting and further protocol optimization. A very useful visualization tool has been developed at the Broad Institute. The Integrative Genomics Viewer (http://www.broadinstitute.org/igv/) allows for interactive navigation/visualization of sequence reads (Fig. 18.4.4). An aligned sequence file (BAM) is uploaded along with a reference genome build. By entering genomic intervals of interest into the navigator, one can zoom in on a particular gene or exon of interest and visually observe how sequence reads pile up over targeted regions. In addition, multiple BAM files can be loaded so that different samples or experimental conditions can be visually compared.


Figure 18.4.4 Hybrid selection visualization using the Integrative Genomics Viewer (IGV). After analysis, sequence BAM files are loaded into the IGV. (A) Exons of varying lengths on the BRCA1 gene can be seen in the lower RefSeq Gene track represented by thick blue bars. In the upper sequencing read track, aligned reads can be seen piling up over the targeted regions showing deep coverage of the exonic regions and minimal off-target sequencing. (B) Zooming in to a higher basepair resolution allows actual mutations to be observed in comparison to a reference sequence. For the color version of the figure, go to http://www.currentprotocols.com/protocol/hg1804.

COMMENTARY

Background Information

Until recently, directed sequencing has been accomplished using locus-specific PCR amplification of target regions followed by Sanger sequencing (Sanger et al., 1977; Raymond et al., 2009). While some success has been achieved in characterizing candidate regions in clinical cohorts by this approach, process costs and capacity limitations have restricted the scope of studies to a relatively small number of candidate genes (Mardis, 2008; Shendure and Ji, 2008). Recent development of “next-generation” DNA sequencing technologies and the rapid reduction in sequencing costs have now enabled the research community to survey common and rare genetic variation in well-characterized clinical samples on a larger scale (typically less than 25). One approach to fully utilizing these new technologies is to perform whole-genome sequencing of many individuals. Unfortunately, despite the dramatic cost reductions realized by moving away from traditional capillary-based Sanger sequencing towards next-generation technologies such as Illumina, Roche/454, and ABI SOLiD, the cost of whole-genome sequencing remains high enough to prohibit


high-powered studies involving thousands of samples. While sequencing costs continue to plummet, an efficient interim approach to discovering rare, medically significant polymorphisms is direct genome targeting coupled with deep resequencing of only the protein coding regions using Illumina’s sequencing-by-synthesis technology (Bentley et al., 2008; Li et al., 2008; Quail et al., 2008; Ng et al., 2009).

Direct PCR-based targeting approaches have been evaluated using next-generation sequencing technologies that enable very deep sampling of a large number of PCR amplicons (Thomas et al., 2006; Harismendy et al., 2009). With the massively parallel nature of these emerging sequencing technologies, taking full advantage of increased sequence yields using a PCR approach requires the generation of tens of thousands of unique PCR amplicons. Primer design, characterization, amplification, and pooling become significant bottlenecks to overall throughput. For example, a single lane on an Illumina GAII is currently capable of generating up to 6 Gb of high-quality aligned bases, which could provide >50-fold coverage of many thousands of short PCR-generated amplicons. However, fully utilizing these yields would require the individual design and synthesis of >120,000 unique well-characterized primer pairs, a monumental and prohibitively expensive task.

Logistic and cost challenges aside, PCR-based resequencing is further limited by the rigidity and specificity of primer placement. For example, if a PCR primer site contains an unknown mutation and fails to produce an amplicon, any novel mutation in that targeted region (whether an SNP, indel, or CNV) will go unobserved in the final sequencing.

One solution to the laborious process of locus-specific PCR-based resequencing is targeting individual exons by hybrid selection.
Methods for direct selection were described in the 1990s as methods for selecting and enriching cDNAs and larger genomic contigs using large genomic DNA clones immobilized to filters or paramagnetic beads (Bashiardes et al., 2005). Briefly, sequence-specific DNA fragments (referred to here as “baits”) are hybridized to a complex library of fragments (referred to as the “pond”). In this case, the pond is a digested complete genome. Hybridized duplexes are then immobilized and the remaining non-hybridized DNA is washed away, leaving a selected subset of the initial fragment population (referred to as the “catch”). This enriched population of fragments is then recovered by denaturation from the immobilized baits and is subsequently sequenced.

Over the past several years, various groups have taken advantage of advanced maskless, programmable DNA microarrays (Agilent/Nimblegen) to custom-synthesize short, unique target sequences (Li et al., 2008; Gnirke et al., 2009). The massively parallel nature of array-based oligo synthesis allows for the inexpensive targeting of tens of thousands of unique, noncontiguous genomic regions of interest from complex genomes. By moving away from large-insert YAC/BAC methods (which target long contiguous regions) towards array-based oligos ranging from 60 to 170 bp in length, it is possible to precisely target relatively short exons across many genes.

Programmable oligonucleotide microarrays can be used to capture and enrich exons in two basic ways. (1) Solid-phase hybrid selection involves hybridizing sheared genomic DNA directly to the array-bound oligos (Ng et al., 2009). Samples are hybridized and the array is washed, leaving only the selected genomic targets of interest, which are then denatured off, enriched, and sequenced. (2) Solution-based hybrid selection utilizes the same array-based oligo synthesis, but rather than direct hybridization of DNA samples to individual arrays, the oligos are stripped from the array and PCR-amplified using universal PCR priming sites synthesized onto the extreme ends of each sequence. In the method developed and implemented at the Broad Institute, PCR-amplified DNA probes are then transcribed into biotinylated RNA, which is hybridized in-solution with a randomly sheared genomic DNA library. Hybridized DNA-RNA duplexes are pulled down using streptavidin-coated magnetic beads. Immobilized beads are then washed to remove non-hybridized DNA, and the captured DNA is subsequently denatured from the immobilized RNA, enriched by PCR, and sequenced on Illumina’s GAII (Gnirke et al., 2009).

Fully leveraging the ever-increasing efficiency of next-generation sequencing technologies to discover rare mutations associated with disease has required the development of novel targeting methods for resequencing. For studies targeting thousands of exons in many samples, simple locus-specific PCR approaches are not economically


or logistically feasible. Over the past 2 years, several methods have been published that take advantage of programmable microarray technology for massively parallel synthesis of tens of thousands of unique probes targeting genomic regions of interest. These probes can be used to capture genomic targets while remaining bound to the array surface in the case of solid-phase hybrid selection methods, or they can be cleaved from the microarray substrate and amplified into a capture pool for in-solution hybrid selection.

One advantage of solid-phase hybrid selection is that probes are synthesized in relatively even quantities across the array. Relatively large amounts of sheared DNA are loaded onto the array in high molar excess compared to the array-bound probes. This produces relatively even sequencing coverage across targets, but at the expense of requiring up to 20 μg of genomic DNA for hybridization.

The benefits of in-solution hybrid selection implemented at the Broad Institute are twofold. First, a single array and bait preparation can yield large quantities of biotinylated RNA bait, which can be used across many samples and studies. Large lots of RNA bait can be QC’d and released into production with a pre-established performance baseline. In contrast, the single array/single sample approach of solid-phase hybrid selection can be susceptible to array-to-array and lot-to-lot manufacturing variability. Second, because hybrid selection is performed in solution, less input DNA is required, making this method accessible to a larger number of archived samples.

A high degree of study design flexibility is afforded by the in-solution hybrid selection approach. Study designs can range from targeting several hundred genes of interest to the entire human exome.
With each Agilent array capable of synthesizing 55,000 unique probe sequences, small target designs of 2,000 to 3,000 exons can be prepared from a single array with a high number of synthesis replicates per probe sequence, while higher numbers of exons can be targeted simply by synthesizing additional arrays with added content and blending the final RNA baits together for hybrid selection in a single reaction.
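As a rough planning aid, the relationship between design size, array count, and synthesis replicates can be sketched as follows. The array capacity of 55,000 probe sequences comes from the text above; the number of probes per exon is a hypothetical input chosen only for illustration, and the function name is not part of any design software.

```python
import math

ARRAY_CAPACITY = 55_000  # unique probe sequences per array, as stated in the text


def plan_arrays(n_exons, probes_per_exon):
    """Return (arrays needed, whole synthesis replicates available per probe)."""
    n_probes = n_exons * probes_per_exon
    arrays = math.ceil(n_probes / ARRAY_CAPACITY)
    replicates = (arrays * ARRAY_CAPACITY) // n_probes
    return arrays, replicates


# Hypothetical designs: a 3,000-exon panel and a 100,000-exon set, two probes per exon.
print(plan_arrays(3_000, 2))    # one array, several replicates of each probe
print(plan_arrays(100_000, 2))  # several arrays, a single replicate of each probe
```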

Critical Parameters

Library complexity

Sequencing depth is critical for accurately identifying rare mutations using hybrid selection exon sequencing. While Illumina’s Genome Analyzer II is capable of generating very deep coverage of tens of thousands of exons for a single DNA sample in a single lane, coverage can ultimately be limited by the overall complexity of the hybrid selection library. For a mutation to be called confidently, the allele must be observed multiple times in unique sequence fragments or in fragments with different first and last bases.

A constant challenge for any form of hybrid selection sequencing is maintaining the diversity of DNA fragments throughout the many steps of the protocol. The two largest factors affecting library complexity are (1) enzymatic efficiency and (2) loss of material during clean-up steps. Throughout the library preparation process, various enzymes are used to add Illumina utility sequences to the ends of randomly sheared DNA fragments, ultimately enabling them to be hybridized to a flow cell and sequenced. If enzymatic efficiency is reduced at any step, the percentage of molecules viable for sequencing is lowered. Additionally, each enzymatic step is followed by a reaction clean-up step that, if performed inefficiently, can further reduce the number of unique sequenceable molecules in subsequent reactions.

Enzymatic efficiency and reaction clean-up steps can create a form of “complexity bottleneck” that progressively restricts the diversity of the final library. A severe loss of complexity at each step coupled with universal PCR enrichment can generate a library made up primarily of a reduced population of highly duplicated molecules. Sequencing such a library generates low coverage of targets as a result of discarding duplicated fragments from the mutation calling analysis. A fragment is considered duplicated and discarded if it shares the exact same start and stop position in the genome as another fragment.

In order to maximize library complexity, attention should be paid to several factors. (1) As much DNA material as possible should be carried from step to step throughout the process. Never discard material by carrying forward a fraction of the eluate from any previous clean-up step. (2) PCR enrichment should be performed using as few cycles as possible to yield just enough material for the hybrid selection reaction. Additional cycles can contribute to increased duplication and introduce PCR biases. (3) Clean-up steps using AMPure beads should be performed as consistently as possible to maximize DNA recovery and avoid loss of material.
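The duplicate rule stated above (same start and stop position in the genome) can be expressed in a few lines of Python. This is a deliberately simplified sketch: it keeps the first fragment seen at each position and flags the rest, whereas production tools such as MarkDuplicates use additional information when choosing which copy to keep. The fragment names and coordinates are invented.

```python
def mark_duplicates(fragments):
    """Split fragments into unique and duplicate names.

    fragments: iterable of (name, chrom, start, end) tuples.
    A fragment is a duplicate if an earlier fragment has the same chrom, start, and end.
    """
    seen = set()
    unique, duplicates = [], []
    for name, chrom, start, end in fragments:
        key = (chrom, start, end)
        if key in seen:
            duplicates.append(name)
        else:
            seen.add(key)
            unique.append(name)
    return unique, duplicates


# Hypothetical fragments for illustration only.
reads = [
    ("frag1", "chr17", 41_196_312, 41_196_500),
    ("frag2", "chr17", 41_196_312, 41_196_500),  # same start and stop as frag1
    ("frag3", "chr17", 41_196_320, 41_196_500),  # different start, so kept
]
unique, duplicates = mark_duplicates(reads)
print("unique:", unique)
print("duplicates:", duplicates)
```

Only the unique fragments would contribute to coverage in mutation calling, which is why a heavily duplicated library yields low effective coverage.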


Selection specificity

Another factor that can greatly affect overall sequencing coverage of targeted exons is selection specificity, or the proportion of sequencing reads that align to targeted regions. The ultimate goal of hybrid selection is to effectively capture exonic sequence while washing away non-targeted background genomic DNA. If this step is not performed optimally, background DNA is carried through and sequenced, wasting valuable sequencing reads and reducing coverage of exons. Selection specificity can be affected by (1) hybridization temperature, (2) wash temperature and stringency, and (3) utility sequence blocking.

During hybridization set-up, particular attention should be paid to ensuring that the denatured pond, hybridization buffer, and RNA bait have all equilibrated at 65°C before being combined. This ensures that DNA remains single-stranded and available for hybridization to RNA baits as soon as they are combined. If a bench-top heating block is used to heat RNA and hybridization buffer, confirm that the actual solution temperature reaches 65°C by using a microcentrifuge tube of water and a thermocouple device or thermometer. The denatured DNA-RNA mixture must hybridize for 24 hr to allow specific annealing of highly homologous DNA-RNA sequences.

Following overnight hybridization and immobilization to streptavidin-coated beads, the majority of non-hybridized DNA is washed away with a non-stringent wash at room temperature. A high-stringency wash buffer, prewarmed to 65°C, is then used to wash away nonspecifically hybridized DNA fragments. Too high a wash temperature can result in the washing away of targeted exon fragments, causing loss of library complexity. Too low a wash temperature can allow slightly homologous, nontargeted DNA fragments to remain hybridized, ultimately contributing to a higher percentage of off-target sequencing.
Finally, off-target sequencing is greatly reduced by using a high molar concentration of universal sequence blockers in the hybridization mixture. During the paired-end library preparation process, ∼60 bp of universal utility sequence are added to each end of every DNA fragment. Experiments have shown that left unblocked, these universal ends can become “sticky” and link targeted fragments with non-targeted fragments, creating a chain-like molecule during hybridization. These off-target fragments can remain hybridized during bead capture and subsequent washes and will then be sequenced, contributing to a high percentage of off-target sequencing and low selection specificity.

Evenness of coverage

Economically, it is important that targeted exons be covered as uniformly as possible. Ideally, every targeted base is covered by the same number of sequencing reads, allowing mutation calling to be performed with the same level of confidence across all targeted exons. With an even library, if additional depth is required, deeper coverage can be added simply by running additional lanes of sequencing. However, if subpopulations of the targeted exon set are grossly over- or underrepresented, large sequencing penalties must be paid in order to bring coverage of underrepresented targets up to a level at which mutations can be called effectively, greatly increasing sequencing costs.

Several factors are believed to affect evenness of coverage and should be considered when performing hybrid selection. Excessive PCR amplification of both the pond and the catch can preferentially amplify certain target populations over others. By performing the absolute minimum number of PCR cycles to obtain enough material to proceed, the risk of over-amplifying some exons over others is minimized. Underrepresentation of RNA baits has also been implicated in low coverage of certain targets. This method for in-solution hybrid selection has been designed so that each RNA bait is theoretically represented in high molar excess compared to targeted DNA fragments in the hybridization reaction. This should ensure that any unevenness in the bait population has a limited effect on evenness of final sequencing coverage. This approach becomes compromised if a probe fails to be synthesized on the microarray or if it falls out during the bait manufacturing process. This potential loss of certain baits can be minimized by ensuring that all exons are targeted by more than one probe sequence.
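The sequencing penalty paid for uneven coverage can be made concrete with a small calculation. The Python sketch below assumes that the penalty can be approximated as the mean coverage divided by the depth that 80% of target bases reach or exceed, in the spirit of the FOLD_80_BASE_PENALTY metric described in the Support Protocol; Picard's own calculation may differ in detail. The function name and the per-base depths are invented.

```python
def fold_80_penalty(depths):
    """Approximate fold-80 penalty from a list of per-base coverage depths.

    Returns None if the 80%-of-bases depth is zero (no amount of extra sequencing helps).
    """
    ordered = sorted(depths)
    mean_cov = sum(ordered) / len(ordered)
    # Depth met or exceeded by at least 80% of bases (the 20th percentile from the bottom).
    depth_80 = ordered[len(ordered) // 5]
    if depth_80 == 0:
        return None
    return mean_cov / depth_80


even = [30] * 10                                   # perfectly uniform coverage
uneven = [2, 5, 10, 20, 30, 40, 50, 60, 80, 103]   # mean 40, but many shallow bases
print(fold_80_penalty(even), fold_80_penalty(uneven))
```

In the uneven example, roughly four times as much sequencing would be needed to bring 80% of bases up to the original mean coverage, whereas the uniform example needs none.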

Troubleshooting

Table 18.4.1 lists several problems that may affect in-solution hybrid selection along with possible causes and solutions.


Table 18.4.1 Troubleshooting Guide for In-Solution Hybrid Selection

Problem: DNA not sheared to correct size
Possible cause: Covaris shearing protocol not set correctly
Solution: Ensure that correct program is selected and all parameters are set according to protocol. Shear several test samples under different conditions to fine-tune fragment distribution.

Problem: No pond amplification after six enrichment cycles
Possible cause: Poor ligation of paired-end adapters
Solution: Make sure to use fresh enzymes for end repair, A-base addition, and adapter ligation steps
Possible cause: Loss of material during clean-up steps
Solution: Quantify DNA with NanoDrop after each bead cleanup to confirm that DNA was retained through process
Possible cause: Failed PCR
Solution: Increase number of PCR cycles (not more than twelve)

Problem: Less than 500 ng enriched pond for hybridization reaction
Possible cause: Loss of material through process; inefficient AMPure cleanup. (Even with poor ligation efficiency or PCR failure, >500 ng of DNA should be present.)
Solution: Make sure that ALL material is carried from reaction to reaction. Follow AMPure clean-up protocol strictly. Do not allow beads to over-dry after ethanol wash. Be sure to elute with nuclease-free water.

Problem: Less than 2 nM enriched catch by qPCR (too little material to sequence)
Possible cause: Hybrid selection failed due to many possible reasons including: RNA bait not added, hybridization conditions incorrect, DNA not denatured, bead capture failed, wash temperature too high, wash too stringent, 0.1 N NaOH not used for elution
Solution: Recheck all buffers for proper concentrations; ensure thermocycler is functioning correctly; check heat block temperature with thermocouple or thermometer; use 0.1 N NaOH for elution

Problem: High duplication rate in final sequencing
Possible cause: Loss of material throughout process
Solution: Ensure that all material is carried forward step to step; follow AMPure cleanup protocol strictly; do not allow beads to over-dry after ethanol wash

Poor ligation efficiency of paired-end adapters (if six cycles of pond PCR yields 85% Zero coverage targets: