Differential haplotype amplification leads to


Differential haplotype amplification leads to misgenotyping of heterozygote as homozygote when using single nucleotide mismatch primer
Article in Electrophoresis, December 2012. DOI: 10.1002/elps.201200363


Navonil De Sarkar, Mousumi Majumder∗, Bidyut Roy
Human Genetics Unit, Biological Sciences Division, Indian Statistical Institute, Kolkata, India

Received July 9, 2012 Revised August 18, 2012 Accepted August 23, 2012

Electrophoresis 2012, 33, 3564–3573

Research Article

Differential haplotype amplification leads to misgenotyping of heterozygote as homozygote when using single nucleotide mismatch primer

Mismatches at the 3′ end of or within a primer are reported to affect the efficiency of PCR and cause allele drop. Here, we report preferential amplification of one haplotype, and consequent misgenotyping, when double heterozygotes at NAT1 (rs1057126 and rs15561) were genotyped by sequencing and PCR-RFLP methods using mismatch reverse primers located next to the target SNP. A detailed study revealed the highest (100%) and lowest (0%) misgenotyping when the mismatch was at the 3rd and 15th nucleotide positions from the 3′ end of the primer, respectively. However, the same primers without any mismatch genotyped heterozygotes correctly. Homozygotes were always detected correctly irrespective of the mismatch position in the primer. Similar results were observed for two SNPs (rs12947788 and rs12951053) at TP53. Using mismatch NAT1 reverse primers located three nucleotides away from the target SNP, the TaqMan and sequencing methods showed preferential synthesis of one haplotype strand and misgenotyping in heterozygotes, respectively. Therefore, a mismatch primer located next to the target SNP should be avoided when genotyping heterozygotes, since PCR- and sequencing-based genotyping methods may lead investigators to report faulty allelic and genotypic frequencies. This study mimics a situation in which an unknown variation is present in the primer-binding sites of both chromosomes.

Keywords: Differential haplotype synthesis / Misgenotyping / Mismatch primer / PCR / Sequence / TaqMan
DOI 10.1002/elps.201200363

Correspondence: Professor Bidyut Roy, Human Genetics Unit, Indian Statistical Institute, 203 B. T. Road, Kolkata 700108, India E-mail: [email protected] Fax: (033) 2577-3049

Abbreviations: Ct, cycles at threshold; LD, linkage disequilibrium; np, nucleotide position; nt, nucleotide

© 2012 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim

1 Introduction
PCR efficiency depends on dNTP/MgCl2 concentrations, secondary structure in primers and templates, binding affinity between forward and reverse primers, yield of nonspecific products, etc. [1, 2]. Rational primer design is one of the key issues for successful PCR. Several publicly available primer-designing algorithms, such as Primer3 (http://frodo.wi.mit.edu/), Primer Premier (http://www.premierbiosoft.com/primerdesign/index.html), and PrimerSelect DNASTAR (http://www.dnastar.com), take care of several such issues, including secondary structure, primer–dimer formation, and nonspecific annealing, when designing efficient primers. A publicly available web tool (https://ngrl.manchester.ac.uk/SNPcheckV2) can check for the presence of SNP(s) in DNA sequences from Caucasian, Japanese, Chinese, and African populations. However, this web tool may not be applicable to other populations since it does not use their SNP data as reference. So, when primers are designed for another population, there is a chance that unknown SNPs or other variations are present in the template at the primer-binding site. Reports have revealed that a mismatch at the 3′ end of or within a primer might affect the efficiency of PCR and may lead to allele drop [3–6]. Only a few methods, such as melting curve analysis and the single-base-extension-based “GenoSNIP”, have been reported to overcome allele drop even when there is a mismatch within the primer [7, 8]. We were conducting an association study between NAT1 polymorphisms and risk of oral cancer. The SNP genotypes (rs1057126 and rs15561) were determined by TaqMan assay



Current address: Department of Anatomy & Cell Biology, The University of Western Ontario Medical Sciences Building, M436, London, Ontario, N6A 5C1, Canada

Colour Online: See the article online to view Fig. 3 in colour.

www.electrophoresis-journal.com

Nucleic acids


Figure 1. Positions of forward, reverse (match and mismatch) primers, and SNPs. Mismatch nucleotide positions are shown by larger fonts in both NAT1 and TP53 reverse primers: (A) Location of NATC1 and NATC2 primers and probes used in TaqMan assay to determine the genotypes at two SNPs (rs1057126 and rs15561) at NAT1 loci. (B) Two SNPs at NAT1 (rs1057126 and rs15561) are located six nucleotides apart. Positions of NAT3212, NATC2, and NAT1_NOS and corresponding mismatch primers are shown. (C) Two SNPs at TP53 (rs12947788 and rs12951053) are located 19 nucleotides apart. Positions of P53F, P53ex7rev, and P53_NOS and corresponding mismatch primers are shown. (D) At NAT1, reverse primer NATC2 and forward primer (NATWD_F), located 404 nt upstream from NAT3212 primer, were used for sequencing and to find out presence of any SNP within NAT3212 and NAT1_NOS primer positions in this population.

using a single set of probes, covering both SNPs (Fig. 1A). These probes and primers were designed and supplied by Applied Biosystems (Foster City, USA). The genotypes were cross-validated by the PCR-RFLP method using a mismatch reverse primer located next to the target SNP, rs15561. This mismatch reverse primer was present in our stock for an unrelated experiment, and thus its use was unintentional. However, heterozygotes were genotyped ambiguously. So, sequencing of the PCR products was performed to cross-validate the TaqMan results using the same mismatch reverse primer as used in PCR-RFLP. Surprisingly, all heterozygotes were read as homozygotes in the sequencing chromatograms. In a different association study, genotypes of two closely located SNPs (rs12947788 and rs12951053) at TP53 were also being typed by the sequencing method using a matched (perfectly complementary) reverse primer located three nucleotides (nts) away from SNP rs12951053. It was then planned to check whether genotypes can also be cross-validated by sequencing when a mismatch reverse primer located next to the target SNP rs12951053 is used. Interestingly, similar to the results for NAT1, heterozygotes at rs12947788 and rs12951053 at TP53 were wrongly genotyped as homozygotes in most of the cases. Therefore, there is an urgent need to explore the extent of misgenotyping when a mismatch primer is used in genotyping methods. Here, we report that a single-nt mismatch within a primer located next to the target SNP preferentially amplifies one of the two haplotype strands; thus, the allele on that strand in a heterozygote dominates in the product. The sequencing chromatogram from this PCR product may be read as homozygote and thus may show misgenotyping. Effects of mismatches at different nucleotide positions (nps) in the primer were also addressed when genotyping was attempted at two linked loci from each of the TP53 and NAT1 genes.

2 Materials and methods

2.1 DNA sample preparation
DNA was isolated using the salting-out method from blood samples collected from the Oral and Maxillofacial Pathology Department, Dr R. Ahmed Dental College and Hospital, Kolkata, India, and was being used in a case-control association study [9]. Blood samples were collected with written


N. De Sarkar et al.


consent from all donors and the study was approved by the Institutional Committee for Protection of Research Risk to Humans.

2.2 Genotyping at SNPs of NAT1 by TaqMan method
NAT1 SNPs were genotyped by the TaqMan method at rs1057126 (T/A, dbSNP Build 129) and rs15561 (C/A, dbSNP Build 129), which are located six nts apart from each other, using NATC1 and NATC2 primers and fluorescence-labeled TaqMan probes (Fig. 1A) (Applied Biosystems) following a published procedure [10].

2.3 Resequencing to validate the NAT1 genotype data
To validate the genotypes determined by the TaqMan method, resequencing was done following amplification with the forward primer (NAT3212) and different reverse primers (Table 1). Except for the NATC2 reverse primer, which is 16 nt away from SNP rs15561, the 3′ ends of NAT1_NOS and the mismatch reverse primer series start just after the same SNP (Fig. 1B). Each mismatch primer had a single-nt replacement at one of the 3rd, 6th, 9th, 12th, and 15th np within the primer NAT1_NOS. The PCR reaction mixture comprised 1 U FS Taq DNA polymerase (Roche Diagnostics, Germany), 10 ng of each primer, and 50 ng genomic DNA in a total volume of 10 μL with MgCl2 (2.5 mM) and dNTP (50 μM). After an initial hold for 10 min at 95°C, the PCR protocol was 42 cycles of 94°C for 30 s, 46°C for 22 s, and 66°C for 30 s for all sets of primers except the set in which the reverse primer was MISMCH9C. In that case, 1.5 mM MgCl2 and 75 μM dNTP were used in the PCR mixture, but amplification was again done for 42 cycles. After the initial PCR with forward (NAT3212) and reverse primers, only the forward primer was used for sequencing the PCR products in an ABI 3100 Genetic Analyzer. To study the effect of mismatch primers on PCR amplification at NAT1, we chose 41 DNA samples that are heterozygous at both loci, four DNA samples that are minor or major allele homozygous at both loci, and three DNA samples that are homozygous at rs1057126 but heterozygous at rs15561. The experiment with each sample was done twice, and misgenotyping frequencies for the two experiments are tabulated as Ex-1 and Ex-2 (Table 1). Sequencing PCR was performed using only the forward primer, since the SNP loci of interest are located too close to the reverse primer. To investigate the effect of distant mismatch reverse primers, resequencing of heterozygous samples was also done using the NAT3212 forward primer. For these sequencing reactions, the initial PCR was done using NAT3212 and match/mismatch reverse primers starting at the 3rd and 6th nts downstream of rs15561 (Supporting Information Fig. 1a).

2.4 TaqMan assay to compare Ct values between PCR product yields from two haplotype strands (i.e. T–C and A–A) of NAT1 double heterozygous and T/T–C/C and A/A–A/A double homozygous samples
In this quantification experiment, the NATC1 forward primer (5′-GAAACATAACCACAAACCTTTTCAAA-3′) and different reverse primers (Tables 2 and 3 and Supporting Information Fig. 1b) were used. We opted for this new forward primer because it generates a small PCR amplicon (∼96 bp), a precondition for a successful TaqMan assay. This assay was performed with an initial hold of 10 min at 95°C followed by 40 cycles of 95°C for 15 s, 50°C for 30 s, and 60°C for 30 s using universal master mix (Applied

Table 1. NAT1 primers with or without mismatches and misgenotyping profile

The last four columns give % misgenotyping at the SNP sites.

| Primer | ID | Sequence (5′ to 3′) | rs1057126 (T/A) Ex-1 | rs1057126 (T/A) Ex-2 | rs15561 (C/A) Ex-1 | rs15561 (C/A) Ex-2 |
|---|---|---|---|---|---|---|
| Forward | NAT3212 | TAAAACAATCTTGTCTATTTGT | – | – | – | – |
| Forward | NATWD_F | CGATGTTGGGAGGGTATGT | – | – | – | – |
| Reverse | MISMCH15C | ACCGGCCATCTTTAAAA | 0 | 0 | 0 | 0 |
| Reverse | MISMCH12T | ACAGGTCATCTTTAAAA | 8 | 48 | 28 | 56 |
| Reverse | MISMCH9C | ACAGGCCACCTTTAAAA | 24 | 49 | 68 | 78 |
| Reverse | MISMCH6C | ACAGGCCATCTCTAAAA | 87 | 51 | 87 | 93 |
| Reverse | MISMCH3G | ACAGGCCATCTTTAGAA | 100 | 100 | 100 | 100 |
| Reverse | NAT1_NOS | ACAGGCCATCTTTAAAA | 0 | 0 | 0 | 0 |
| Reverse | NATC2 | AAATCACCAATTTCCAAGATAACCA | 0 | 0 | 0 | 0 |

Ex-1 and Ex-2: results from duplicate experiments with 41 double heterozygotes. Regression-based trend analysis was found to be highly significant (p = 5.47 × 10⁻⁶ for rs1057126 and p = 4.62 × 10⁻⁵ for rs15561). Each MISMCH primer differs from NAT1_NOS by a single mismatch nucleotide at the np given in its ID (marked in bold and underlined in the typeset table). NAT1_NOS and MISMCH primers are located between the same nucleotide positions (np) (i.e. 18080652–18080668 np of chromosome 8; NCBI, Entrez gene, Build 37.1) and were without and with mismatches, respectively. NATC2 (located between 18080667–18067693 np) is a perfectly matched reverse primer, located 16 nucleotides away from SNP rs15561, and was initially used to determine the genotypes at these loci in the TaqMan experiments.
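The mismatch position encoded in each primer ID can be checked directly against the sequences in Table 1. The following is a minimal sketch, not part of the original methods: the function name `mismatch_positions_from_3prime` is hypothetical, and the only inputs are the NAT1_NOS and MISMCH sequences listed above.

```python
# Perfectly matched reference primer and the mismatch series, copied from Table 1.
NAT1_NOS = "ACAGGCCATCTTTAAAA"
MISMATCH_PRIMERS = {
    "MISMCH15C": "ACCGGCCATCTTTAAAA",
    "MISMCH12T": "ACAGGTCATCTTTAAAA",
    "MISMCH9C": "ACAGGCCACCTTTAAAA",
    "MISMCH6C": "ACAGGCCATCTCTAAAA",
    "MISMCH3G": "ACAGGCCATCTTTAGAA",
}


def mismatch_positions_from_3prime(reference, primer):
    """Hypothetical helper: list (np from 3' end, reference base, primer base)
    for every position at which the primer differs from the reference."""
    if len(reference) != len(primer):
        raise ValueError("primers must have the same length")
    n = len(reference)
    # Sequences are written 5' to 3', so 0-based index i is position n - i from the 3' end.
    return [
        (n - i, ref_base, base)
        for i, (ref_base, base) in enumerate(zip(reference, primer))
        if ref_base != base
    ]


for name, seq in MISMATCH_PRIMERS.items():
    print(name, mismatch_positions_from_3prime(NAT1_NOS, seq))
```

Each primer in the series yields exactly one difference, at the 15th, 12th, 9th, 6th, or 3rd np from the 3′ end, in agreement with the primer names.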


Table 2. Comparison of Ct values for amplification of T–C and A–A haplotypes from NAT1 double heterozygote samples using forward NATC1 and the reverse primers mentioned below

| Reverse primer | Ct value of T–C haplotype (± SD) | Ct value of A–A haplotype (± SD) | p-Value | Remarks |
|---|---|---|---|---|
| 3apNAT_NOS | 25.25 ± 1.13 | 26.12 ± 1.35 | 0.007554 | NS |
| 3apNAT_MISMCH3 | 31.10 ± 0.88 | 32.61 ± 1.08 | 0.00008 a) | S |
| 3apNAT_MISMCH6 | 26.59 ± 1.09 | 28.72 ± 0.92 | 0.00011 | S |
| 3apNAT_MISMCH9 | 28.14 ± 1.59 | 30.95 ± 3.02 | 0.230111 b) | NS |
| 6apNAT_NOS | 22.96 ± 1.24 | 24.22 ± 1.05 | 0.003456 | NS |
| 6apNAT_MISMCH3 | 25.97 ± 1.20 | 28.15 ± 2.04 | 0.171934 | NS |
| 6apNAT_MISMCH6 | 23.45 ± 0.93 | 24.90 ± 0.83 | 0.013906 | NS |
| 6apNAT_MISMCH9 | 24.05 ± 0.71 | 25.25 ± 1.43 | 0.015815 | NS |

NAT1 double heterozygote carries T/A and C/A genotypes at the two SNPs, respectively. 3apNAT_NOS and 6apNAT_NOS: the primer's 3′ end starts three and six nucleotides away from SNP rs15561, respectively, and the primers do not have a mismatch (Supporting Information Fig. 1a). 3apNAT_MISMCH3 and 6apNAT_MISMCH3: mismatch at the 3rd np in the 3apNAT_NOS and 6apNAT_NOS primers, respectively. NS: non-significant; S: significant.
a) Ct values were compared by paired t-test since the two haplotype data were obtained from each heterozygote sample (n = 16). The Bonferroni-corrected significant p-value at the 1% level is p ≤ 0.00125.
b) Due to high SD, the differences in Ct values are nonsignificant.

Table 3. Comparison of Ct values for amplification of major (T–C) and minor (A–A) allele haplotypes from double homozygote samples using forward NATC1 and the reverse primers mentioned below

| Reverse primer | Ct from T/T–C/C homozygote (± SD) | Ct from A/A–A/A homozygote (± SD) | p-Value | Remarks |
|---|---|---|---|---|
| 3apNAT_NOS | 26.48 ± 0.93 | 26.02 ± 1.07 | 0.043879 a) | NS |
| 3apNAT_MISMCH3 | 30.12 ± 1.51 | 32.08 ± 1.71 | 0.000126 | S |

NAT1 major and minor allele double homozygotes carry T/T and C/C, and A/A and A/A genotypes at the two SNPs, respectively. 3apNAT_NOS: the primer's 3′ end starts three nucleotides away from SNP rs15561 and it does not have a mismatch (Supporting Information Fig. 1a). 3apNAT_MISMCH3: mismatch at the 3rd np in 3apNAT_NOS. NS: non-significant; S: significant.
a) Ct values from homozygote samples (n = 4 for each homozygote) were compared by Fisher's two-sample t-test since each haplotype data set (T–C or A–A) was obtained from two different samples. The Bonferroni-corrected significant p-value at the 1% level is p ≤ 0.005.
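The Bonferroni thresholds quoted in the footnotes of Tables 2 and 3 can be reproduced with simple arithmetic. The sketch below assumes that the number of tests equals the number of rows in each table (eight and two); this is an inference from the reported thresholds, not something stated in the text, and the function name `bonferroni_threshold` is hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


# p-values copied from Table 2 (paired t-tests, heterozygote samples).
TABLE2_P = {
    "3apNAT_NOS": 0.007554,
    "3apNAT_MISMCH3": 0.00008,
    "3apNAT_MISMCH6": 0.00011,
    "3apNAT_MISMCH9": 0.230111,
    "6apNAT_NOS": 0.003456,
    "6apNAT_MISMCH3": 0.171934,
    "6apNAT_MISMCH6": 0.013906,
    "6apNAT_MISMCH9": 0.015815,
}
# p-values copied from Table 3 (two-sample t-tests, homozygote samples).
TABLE3_P = {"3apNAT_NOS": 0.043879, "3apNAT_MISMCH3": 0.000126}


def classify(p_values, alpha=0.01):
    """Label each comparison S (significant) or NS at the corrected threshold.
    Assumption: one test per table row."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return {name: ("S" if p <= threshold else "NS") for name, p in p_values.items()}


print(bonferroni_threshold(0.01, len(TABLE2_P)), classify(TABLE2_P))
print(bonferroni_threshold(0.01, len(TABLE3_P)), classify(TABLE3_P))
```

Under this assumption the labels match the Remarks columns: only 3apNAT_MISMCH3 and 3apNAT_MISMCH6 are significant in Table 2, and only 3apNAT_MISMCH3 in Table 3.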

Biosystems). The experiment was repeated twice for each of 16 samples to check the reproducibility.

2.5 Genotyping at TP53
Initially, genotypes at rs12947788 (C/T) (dbSNP Build 129) and rs12951053 (G/T) (dbSNP Build 129) were determined by resequencing using the forward primer (P53F) and the reverse primer P53ex7rev (Fig. 1C). The reverse primer P53ex7rev is three bases away from SNP rs12951053. To find out the effect of a mismatch in the primer, TP53 SNPs were genotyped through sequencing using the P53F forward primer and different sets of reverse primers (i.e. the P53_NOS series) with or without mismatch (Table 4). Here, the two SNPs are separated by 19 nts and the reverse primer, P53_NOS, starts just after SNP rs12951053 (Fig. 1C). Eleven DNA samples heterozygous at both loci and five DNA samples that are minor or major allele homozygous at both loci were chosen for these experiments. The reaction mixture for amplification comprised 1 U high-fidelity Fast Start Taq (FS-Taq) DNA polymerase (Roche Diagnostics), 10 ng of each primer, and 50 ng genomic DNA in a total volume of 10 μL. The PCR protocol used 38 cycles, each consisting of 94°C for 30 s, 54°C for 45 s, and 72°C for 60 s. Initial PCR was performed using the P53F forward primer and different reverse primers in different reaction tubes. The amplified product was sequenced in an ABI 3100 Genetic Analyzer (Applied Biosystems) using only the forward primer. Unlike NAT1, each TP53 mismatch position was tested with two different replacements at the same np. Each sample was genotyped twice, and misgenotyping frequencies (%) for the two experiments are tabulated as Ex-1 and Ex-2 (Table 4). For the same reason as mentioned earlier for NAT1, sequencing PCR to determine the sequence of the initial PCR products was done with only the forward primer.

2.6 Sequence alignment
For all chromatograms, initial observations with the PhredPhrap-Consed alignment software (http://www.phrap.org/phredphrapconsed.html) were reconfirmed using Seqman (DNASTAR software package; http://www.dnastar.com). These two algorithms check the peak heights of the two alleles at an SNP position in a heterozygote. If the peak height of one allele is 70% or more of that of the other allele, the position is read as heterozygote. Since misgenotyping of heterozygote samples was being studied, a slightly more conservative approach was taken. In some cases, the peak heights of the two alleles in the electropherograms were checked by manual curation. If the smaller peak height was 60% or more of the larger one, the position was considered heterozygote; otherwise it was considered homozygote.
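The manual curation rule can be written as a short function. This is a minimal sketch under stated assumptions: the function name `call_genotype` and the peak heights in the example are hypothetical, and only the 60% criterion comes from the text (the 70% criterion of the alignment software can be applied by changing `min_ratio`).

```python
def call_genotype(peak_a, peak_b, min_ratio=0.60):
    """Call a position 'heterozygote' if the smaller allele peak is at least
    min_ratio of the larger peak; otherwise call it 'homozygote'."""
    larger = max(peak_a, peak_b)
    smaller = min(peak_a, peak_b)
    if larger <= 0:
        raise ValueError("at least one peak height must be positive")
    return "heterozygote" if smaller / larger >= min_ratio else "homozygote"


# Hypothetical peak heights (arbitrary fluorescence units), for illustration only.
print(call_genotype(1000, 650))                  # passes the 60% manual criterion
print(call_genotype(1000, 650, min_ratio=0.70))  # fails the stricter 70% software criterion
print(call_genotype(1000, 150))                  # one allele dominates
```

A sample in which the mismatch primer amplifies one haplotype preferentially would fall below the ratio and be called homozygote, which is the misgenotyping event counted in Tables 1 and 4.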


Table 4. TP53 primers with or without mismatches and misgenotyping profile

The last four columns give % misgenotyping at the SNP sites.

| Primer | ID | Sequence (5′ to 3′) | rs12947788 (C/T) Ex-1 | rs12947788 (C/T) Ex-2 | rs12951053 (T/G) Ex-1 | rs12951053 (T/G) Ex-2 |
|---|---|---|---|---|---|---|
| Forward | P53F | ATCTTGGGCCTGTGTTATCT | – | – | – | – |
| Reverse | P53_MISMCH15.1G | GTGGAGGGGTAGTAGTATGG a) | 0 | 0 | 0 | 0 |
| Reverse | P53_MISMCH15.2A | GTGGAAGGGTAGTAGTATGG a) | 0 | 0 | 0 | 0 |
| Reverse | P53_MISMCH12.1A | GTGGATGGATAGTAGTATGG a) | 36 | 36 | 18 | 18 |
| Reverse | P53_MISMCH12.2C | GTGGATGGCTAGTAGTATGG b) | 9 | 36 | 9 | 18 |
| Reverse | P53_MISMCH9.1T | GTGGATGGGTATTAGTATGG b) | 18 | 73 | 36 | 9 |
| Reverse | P53_MISMCH9.2A | GTGGATGGGTAATAGTATGG a) | 45 | 64 | 18 | 9 |
| Reverse | P53_MISMCH6.1A | GTGGATGGGTAGTAATATGG a) | 18 | 73 | 18 | 45 |
| Reverse | P53_MISMCH6.2C | GTGGATGGGTAGTACTATGG b) | 91 | 54 | 91 | 45 |
| Reverse | P53_MISMCH3.1A | GTGGATGGGTAGTAGTAAGG a) | 91 | 100 | 82 | 100 |
| Reverse | P53_MISMCH3.2G | GTGGATGGGTAGTAGTAGGG a) | 100 | 100 | 100 | 100 |
| Reverse | P53_NOS | GTGGATGGGTAGTAGTATGG | 0 | 0 | 0 | 0 |
| Reverse | P53ex7rev | ATGAGAGGTGGATGGGTAGTAGTA | 0 | 0 | 0 | 0 |

Ex-1 and Ex-2: results from duplicate experiments with 11 double heterozygotes. Regression-based trend analysis was found to be highly significant (p = 5.61 × 10⁻⁸ for rs12947788 and p = 6.82 × 10⁻⁸ for rs12951053). Each P53_MISMCH primer differs from P53_NOS by a single mismatch nucleotide at the np (nucleotide position) given in its ID (marked in bold and underlined in the typeset table). The P53_MISMCH series and P53_NOS primers were located between the same np (i.e. 7577428–7577447 np of chromosome 17; NCBI, Entrez gene, Build 37.1) and were with or without mismatches, respectively. P53ex7rev is also a reverse primer without any mismatches. For each mismatch nucleotide position, two different bases were inserted at the same np; for example, the P53_MISMCH15.1G and P53_MISMCH15.2A primers have two different bases (G and A) at the same np.
a) Replaced by a purine.
b) Replaced by a pyrimidine.

2.7 Calculation of thermodynamic parameters using Mfold
Using the Mfold Two-State Hybridization server (http://mfold.rna.albany.edu/?q=dinamelt), thermodynamic parameters such as ΔG and Tm of template-primer hybrids were predicted. These predictions were made before and after extension of the reverse primers up to the first and second SNPs, which were calculated separately. The estimations were made at 46°C, 50 mM Na⁺, 2.5 mM Mg²⁺ for NAT1 and at 54°C, 50 mM Na⁺, 1 mM Mg²⁺ for TP53. The ΔG and Tm were also estimated for expected DNA secondary structures using the Mfold web server (http://mfold.rna.albany.edu/?q=mfold) at the same salt concentrations and temperatures.

2.8 Statistical analysis
Regression-based trend tests were performed to check whether shifting the mismatch position from the 3′ end toward the 5′ end of the primer diminishes the proportion of misgenotyping. Since each experiment was performed in duplicate, the effect of replication on the proportion of misgenotyping was checked by regression-based analysis. No significant effect was observed, so the data sets from the two replications were combined for analysis. The entire analysis was performed after transforming the proportion of misgenotyping with the arcsine square-root transformation so as to satisfy the assumptions required for a regression-based analysis. Moreover, this transformation gave a stronger conclusion about the change in misgenotyping with the shifting of mismatch position. For TP53, misgenotyping frequencies for different nt replacements at any particular np (e.g. P53_MISMCH6.1A and P53_MISMCH6.2C) were also compared with each other to check the effect of nt replacement on misgenotyping frequency. Since no significant effect was observed, the data sets for different nt replacements and experiment replications were combined in the regression analysis. For heterozygous samples (Table 2), paired t-tests were performed to compare Ct values of the two strands retrieved from the same sample. Ct values from homozygote samples (Table 3) were compared by Fisher's two-sample t-test since the compared samples are independent. For homozygote samples, to compare the quantitative PCR efficiency (i.e. Ct values) of match/mismatch primers, t-tests for equality of means were performed (Supporting Information Table 2).
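The transformation and trend fit can be illustrated with the rs15561 data from Table 1. This is a minimal sketch, not the authors' analysis: it assumes an ordinary least-squares line of transformed misgenotyping on mismatch position with the two replicates pooled, it does not compute the p-value, and the function names are hypothetical.

```python
import math


def arcsine_sqrt(percent):
    """Arcsine square-root transform of a percentage (0-100), in radians."""
    return math.asin(math.sqrt(percent / 100.0))


def least_squares(xs, ys):
    """Slope and intercept of the ordinary least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Mismatch position (np from the 3' end) and % misgenotyping at rs15561 (Table 1).
positions = [3, 6, 9, 12, 15]
rs15561_ex1 = [100, 87, 68, 28, 0]
rs15561_ex2 = [100, 93, 78, 56, 0]

# Pool the two replicates, as described for the combined analysis (assumed pooling scheme).
xs = positions * 2
ys = [arcsine_sqrt(p) for p in rs15561_ex1 + rs15561_ex2]
slope, intercept = least_squares(xs, ys)
print(f"slope = {slope:.4f} rad per nucleotide position, intercept = {intercept:.4f} rad")
```

A negative slope corresponds to the decreasing trend in misgenotyping as the mismatch moves away from the 3′ end; the significance test itself would require the t-distribution and is omitted here.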

3 Results
At TP53, correct genotypes at both SNPs (rs12947788 and rs12951053) of heterozygotes were obtained when the initial PCR in the sequencing method was done using forward primer P53F and a reverse primer without mismatch (P53ex7rev or P53_NOS) or with a mismatch at the 15th np (P53_MISMCH15.1G or P53_MISMCH15.2A) (Table 4). However, some or all of the double heterozygotes (n = 11) were sequenced as homozygotes when other reverse primers,


having mismatches at the 3rd, 6th, 9th, or 12th np from the 3′ end of P53_NOS, were used in the initial PCR of the sequencing method (Figs. 1C and 2). Misgenotyping was highest (∼100%) and lowest (0%) when the mismatches were at the 3rd and 15th np, respectively (Table 4). Regression-model-based analysis revealed a significant decreasing trend in misgenotyping frequencies as the mismatch position shifted from the 3′ end toward the 5′ end of the primer (p = 5.61 × 10⁻⁸ for rs12947788 and p = 6.82 × 10⁻⁸ for rs12951053). Genotyping was also performed with mismatch primers having two different nt mismatches at the same np (e.g. P53_MISMCH3.1A and P53_MISMCH3.2G). Transition and transversion mismatches at the same np both yielded misgenotyping at both loci of TP53, but without any significant difference in misgenotyping frequencies after Bonferroni correction for multiple tests (data not shown). Similarly, at NAT1, all (n = 41) double heterozygotes (at rs1057126 and rs15561), as determined by the TaqMan method using the reverse primer NATC2, were correctly sequenced as heterozygotes when the forward primer (NAT3212) and any of the three reverse primers NAT1_NOS, MISMCH15C, and NATC2 were used in the initial PCR of the sequencing procedure (Table 1, Fig. 1B, and Supporting Information Fig. 2). However, some or all of these double heterozygote samples were misgenotyped as homozygotes when other reverse primers, having mismatches at the 3rd, 6th, 9th, or 12th np in NAT1_NOS, were used in the initial PCR of the sequencing method. As with TP53, the frequency of misgenotyping at NAT1 decreased from 100 to 0% with a significant trend as the position of the mismatch shifted from the 3rd to the 15th np from the 3′ end of the primer (p = 4.62 × 10⁻⁵ for rs15561). A significant trend of misgenotyping was also observed at rs1057126 in heterozygotes (p = 5.47 × 10⁻⁶), although the frequencies of misgenotyping were not the same as those at rs15561.

Three samples, which were homozygous at rs1057126 but heterozygous at rs15561, were also misgenotyped at the heterozygous locus (data not shown). However, SNPs at both NAT1 (data not shown) and TP53 (Supporting Information Figs. 3 and 4) were genotyped correctly in double homozygote samples (n = 4 and n = 5, respectively) when the respective forward primer and any of the respective mismatch reverse primers were used for the initial PCR. Although the presence of an SNP at the forward primer NAT3212 binding site is mentioned in dbSNP build 131 (http://www.ncbi.nlm.nih.gov/snp/), no SNP was detected in the DNA sequence covered by the distant primer set (i.e. NATWD_F and NATC2, Fig. 1D) for these 48 samples, confirming that no SNP is present at the NAT3212, NAT1_NOS, or other MISMCH primer-binding sites (Fig. 1B and data not shown) in the DNA samples. So, the presence of DNA sequence variation within the primer-binding sites of the genotyping reactions (i.e. NAT3212 and NAT1_NOS primers) has been ruled out. It is also desirable to select distant primers located in a different LD block from the primers of the genotyping reactions to minimize such risk. It should be mentioned here that no SNP has been reported in the binding sites of any of the primers used for the TP53 experiments as per dbSNP build 131 (http://www.ncbi.nlm.nih.gov/snp/). Misgenotyping at NAT1 was also detected by sequencing


when another set of reverse mismatch primers at the positions of 3apNAT_NOS and 6apNAT_NOS (starting three and six nts away from the SNP, rs15561) was used for PCR (Supporting Information Table 1, Supporting Information Figs. 1a and 5, and data not shown). In contrast to the observation with the NAT1_NOS series of mismatch primers (Table 1), there was no increasing or decreasing pattern in misgenotyping frequency except with the 3apNAT_MISMCH primer series for rs15561 (Supporting Information Table 1). However, all matched primers at these positions (i.e. 3apNAT_NOS and 6apNAT_NOS) always yielded the correct genotype by sequencing. The NATC1 forward primer and the 3apNAT_NOS and 6apNAT_NOS sets of reverse primers (Supporting Information Fig. 1b) were also used in the TaqMan assay to check whether the presence of a mismatch in these reverse primers could modify quantitative yields in PCRs (Fig. 3A–D). Primers without mismatch (i.e. 3apNAT_NOS) did not show any significant difference in the Ct values for the synthesis of the two haplotype strands (i.e. A–A and T–C) from NAT1 double heterozygotes. Some mismatch primers, depending on their distance from the SNP (i.e. rs15561) and the position of the mismatch within them, amplified the two haplotype strands with significantly different Ct values (Table 2). When the mismatch base in the primer (say, 3apNAT_MISMCH3) was closer to the SNP (rs15561), one of the two haplotype strands was synthesized preferentially (approximately tenfold more, Fig. 3B), as observed in the qualitative experiments (i.e. only one allele being observed in the sequencing chromatogram, which is termed misgenotyping; Supporting Information Fig. 5). It was also observed that the closer the mismatch was to the 3′ end of the primer, the greater was the quantitative difference (i.e. in Ct values) in the production of the two chromosome/haplotype strands (Fig. 3A–D). In contrast, when the mismatch reverse primers started six nts away from SNP rs15561 (i.e. the 6apNAT_MISMCH series), they did not show any significant difference in the Ct values for amplification of the two haplotype strands from the heterozygote samples (Table 2). The 3apNAT_NOS and NATC2 set of reverse primers were also used in the TaqMan assay to check how the quantitative yield (i.e. Ct values) of homozygote samples varies with different sets of primers (Supporting Information Table 2). Significantly less PCR product (i.e. a higher Ct value) was obtained when the mismatch primer 3apNAT_MISMCH3 was used instead of the perfectly matched primer 3apNAT_NOS in the TaqMan assay.
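A difference in Ct can be converted into an approximate fold difference in product. The sketch below assumes ideal doubling of product in every cycle (an amplification factor of 2), which is not stated in the text, and uses the mean Ct values for homozygote samples reported in Table 3; the function name `fold_difference` is hypothetical.

```python
def fold_difference(ct_reference, ct_test, amplification_factor=2.0):
    """Approximate fold reduction in product implied by a later Ct,
    assuming a constant per-cycle amplification factor (ideal value: 2)."""
    return amplification_factor ** (ct_test - ct_reference)


# Mean Ct values from Table 3: matched (3apNAT_NOS) vs mismatch (3apNAT_MISMCH3) primer.
major = fold_difference(26.48, 30.12)  # T/T-C/C homozygotes
minor = fold_difference(26.02, 32.08)  # A/A-A/A homozygotes
print(f"major-allele homozygotes: about {major:.1f}-fold less product")
print(f"minor-allele homozygotes: about {minor:.1f}-fold less product")
```

Under the ideal-doubling assumption these shifts correspond to roughly 12-fold and 67-fold less product, respectively, and a tenfold difference corresponds to a shift of about 3.3 cycles. Real amplification efficiencies are usually below 2 per cycle, so these figures are upper estimates.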

4 Discussions
In both NAT1 and TP53, the closer the mismatch was to the 3′ end of the primer, the greater was the frequency of misgenotyping in heterozygote samples (Tables 1 and 4). The presence of a mismatch at the 3rd, 6th, 9th, or 12th np may lead to preferential amplification of one of the two haplotype strands from heterozygote samples. As a result, many of these samples were detected as homozygotes by sequencing. Probably, amplification of only one haplotype


Figure 2. Sequencing chromatogram of TP53 PCR products from a heterozygote sample using forward P53F and reverse primers (P53_NOS, P53_MISMCH15.1G, etc.). Positions of SNPs, rs12947788 (C/T) and rs12951053 (T/G), are shown by arrows. Sample’s heterozygote alleles are read as “N” when the P53_MISMCH15.1G, P53_MISMCH15.2A, and P53_NOS are used as reverse primers but, in the same sample, alleles are read as T at rs12947788 and G at rs12951053 (i.e. minor alleles) when other mismatch reverse primers were used. Reverse primers relevant to chromatograms are shown on the right side of the figure.

strand may have occurred, and thus 100% misgenotyping of heterozygotes as homozygotes was observed when the mismatch was at the 3rd np in the reverse primer located next to the target SNP. However, the frequency of misgenotyping decreased significantly as the mismatch position in the primer was moved gradually from the 6th to the 15th np (Tables 1 and 4). To nullify alignment-algorithm-dependent errors, variant


alleles at SNP positions were read-out by two different alignment algorithms (PhredPhrap and Seqman). In most cases, these two programs call the variants correctly, but in few cases manual curation was done to check the presence or absence of variant alleles with the criterion that the peak height of one allele should at least be 60% of the other to be read as a heterozygote. There was an overlap in the binding site of the TaqMan probe for NAT1 SNPs and the NAT1_NOS series of primers (Fig. 1A). Thus, qualitative observation of misgenotyping at NAT1 could not be corroborated by quantitative TaqMan assay using NAT1_NOS and its mismatch primer series as reverse primers. So, different sets of reverse primers, located three and six nts away from the target SNP (Supporting Information Fig. 1a and 1b), were used for genotyping and quantitative TaqMan assay. TaqMan method showed changes in Ct values (Fig. 3B and 3C) for the synthesis of two template strands from heterozygote samples. The same mismatch primers were also used for sequencing and it showed misgenotyping with 3apNAT_MISMCH3 and 3apNAT_MISMCH6 primers (Supporting Information Fig. 5). Changes in Ct values indicate that one of the two haplotype strands (i.e. T—C haplotype) in heterozygote is preferentially amplified compared to the other homologous haplotype strand. As a result, it is observed as a homozygote in the sequencing chromatograms. As a negative control for this experiment, TaqMan assay was also performed using 3apNAT_MISMCH3 mismatch primer and double homozygote samples. This mismatch primer could amplify both T—C and A—A haplotypes consistently from major and minor alleles double homozygotes, respectively. Like heterozygote sample, 3apNAT_MISMCH3 primer could also amplify T—C haplotype more efficiently than A—A haplotype from double homozygote samples (p = 0.000126) even though they were amplified separately (Table 3). 
Although the mechanism is not fully understood, preferential amplification of one haplotype from the heterozygote (i.e. the T-C haplotype over the A-A haplotype) may be affected by the polymorphic allele, which resides next to the primer-binding site (NAT1_NOS and mismatch primer series, Fig. 1B) and interrupts the hybridization between the mismatch primer and its binding site. This preferential amplification is probably not due to infidelity of the Taq polymerase used, since according to the technical report (https://www.roche-appliedscience.com/PROD_INF/index.jsp?id=iforu&iforu_page=search&catalogNumber=04738314001) by Roche Applied Sciences, the FS Taq polymerase used works with high fidelity and equal efficiency over an annealing temperature range of 45–65 °C. Moreover, there was no preferential amplification of any haplotype strand when a perfect reverse primer (say, 3apNAT_NOS) was used in PCR of double heterozygote and homozygote samples (Tables 2 and 3, respectively). Again, the misgenotyping frequency decreased and/or became irregular (compare Table 1 and Supporting Information Table 1) when the location of the mismatch primer was shifted from zero to three, and from three to six, nts away from the target SNP (rs15561). This might have happened due to a reduced effect of the polymorphic allele on the interaction between the mismatch primer and its binding site.

Figure 3. TaqMan quantification curves for PCR products having A-A and T-C haplotype strands from a NAT1 heterozygote sample using the forward primer NATC1 and reverse primers (A) 3apNAT_NOS, (B) 3apNAT_MISMCH3, (C) 3NAT_MISMCH6, and (D) 3NAT_MISMCH9 (Supporting Information Fig. 1). 3apNAT_NOS: matched reverse primer; 3apNAT_MISMCH3: reverse primer with a mismatch at the 3rd np of the primer 3apNAT_NOS; similarly, 3NAT_MISMCH6 and 3NAT_MISMCH9 are reverse primers with mismatches at the 6th and 9th np, respectively.

As in the previously published report [11], significantly less PCR product was obtained in the TaqMan assay when a mismatch primer (e.g., 3apNAT_MISMCH3) was used instead of the perfectly matched primer (i.e. 3apNAT_NOS) as the reverse primer. The mean Ct values were 26.48 and 26.02 cycles for PCR products from major-allele (T/T and C/C) and minor-allele (A/A and A/A) double homozygotes, respectively, when the NATC1 and 3apNAT_NOS (i.e. perfectly matched) primer pair was used (Supporting Information Table 2). But the mean Ct values changed to 30.12 and 32.08 cycles for PCR products from major-allele (T/T and C/C) and minor-allele (A/A and A/A) double homozygotes, respectively, when the 3apNAT_MISMCH3 mismatch reverse primer was used. One explanation could be that the polymerase makes contact with the primer-template duplex up to seven bases downstream from the primer's 3′ terminus and with two more upstream bases of the template located immediately next to the 3′ end of the primer [12]. So, a single mismatch up to the 7th nt from the primer's 3′ end may destabilize the structure of the ternary complex. This may result in dissociation of DNA polymerase from the complex and a decrease in the synthesis of PCR product (Supporting Information Fig. 6). Apart from low efficiency of PCR amplification, it is also reported that mismatches at two consecutive nps (i.e. the 2nd and 3rd nts from the 3′ end of the primer) might cause false PCR amplification of the "O" allele of the ABO blood group gene, depending on the nature of the nts present at these two positions [11]. Absence of primer extension was reported when a single-strand DNA template and a primer with a single-nt mismatch within five nts of the 3′ end were used. Primer extension efficiency increased from 1.8 to 69.3% when the position of the mismatch shifted from the 7th to the 13th np from the 3′ end of the primer [6]. Other studies also reported the lowest amplification when the mismatch is at or close to the 3′ end [5, 13].

Earlier reports observed that some mismatch base pairings between primer and template, such as G·T and G·A, had better primer extension efficiency than other mismatch base pairings [6, 14, 15]. In a contradictory report, Yaku et al. (2008) hypothesized that pyrimidine/pyrimidine and purine/pyrimidine mismatches at the 2nd np support better PCR amplification than purine/purine mismatches if both the 2nd and 3rd np from the 3′ end of the primer had mismatches [11]. A few other reports revealed that mismatch base pairings such as G·G, T·G, and A·G are thermodynamically more stable than T·T, A·A, and C·C [16, 17]. So there would be differential thermal stability for different mismatch base pairs, but a stable mismatch base pair usually results in better extension. In this study, primer P53_MISMCH6.2C, having a C·C mismatch with the template at the 6th np, misgenotyped 72.5 and 68% [average per SNP, i.e. (91 + 54) × 1/2 and (91 + 45) × 1/2, respectively] of heterozygotes as homozygotes, compared to 45.5 and 31.5% [average per SNP, i.e. (18 + 73) × 1/2 and (18 + 45) × 1/2, respectively] when the mismatch with the template was C·A at the same np of P53_MISMCH6.1A (Table 4). Misgenotyping may also be explained by the stability of the duplex formed between the mismatch primer and the template strand during PCR, in terms of Tm and ΔG [4, 18]. In comparison to the matched primer-template duplex, formation of the mismatch primer-template duplex results in a greater ΔG and a lower Tm value. This implies that the mismatch duplex is thermodynamically less stable for amplification at the experimental annealing temperature. It is noted that the changes in Tm and ΔG are different for the two alleles in a heterozygote during duplex formation.
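The mean Ct values reported earlier in this discussion for the matched (3apNAT_NOS) and mismatch (3apNAT_MISMCH3) reverse primers can be put on a rough fold-change scale. The sketch below computes the Ct shift for each double homozygote and converts it with the textbook approximation that product doubles in every cycle. That assumption is ours and is unlikely to hold exactly with a mismatch primer, so the fold values are an order-of-magnitude illustration rather than a measured quantity; the function names are hypothetical.

```python
def delta_ct(ct_mismatch, ct_matched):
    """Ct shift (in cycles) caused by replacing the matched primer with the mismatch primer."""
    return ct_mismatch - ct_matched


def fold_difference(d_ct, efficiency=1.0):
    """Fold difference implied by a Ct shift.

    efficiency=1.0 means perfect doubling per cycle (ASSUMPTION, not a value
    reported in this study); lower efficiencies give smaller fold values.
    """
    return (1.0 + efficiency) ** d_ct


# Mean Ct values quoted in the text for double homozygote samples.
MEAN_CT = {
    "major alleles (T-C haplotype)": {"matched": 26.48, "mismatch": 30.12},
    "minor alleles (A-A haplotype)": {"matched": 26.02, "mismatch": 32.08},
}

for sample, ct in MEAN_CT.items():
    shift = delta_ct(ct["mismatch"], ct["matched"])
    print(f"{sample}: Ct shift = {shift:.2f} cycles, "
          f"about {fold_difference(shift):.0f}-fold under perfect doubling")
```

The shifts are 3.64 and 6.06 cycles, so the penalty of the mismatch is larger for the A-A haplotype than for the T-C haplotype, consistent with the preferential amplification of the T-C strand described above.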
At NAT1, the major/minor alleles at the two SNPs (rs1057126 and rs15561) are T/A and C/A, respectively. The mismatch primer binds to both template strands of the heterozygote with a lower Tm than the perfectly matched primer (Supporting Information Table 3a). It then extends up to the SNPs, rs15561 and rs1057126, and the difference in sequence of the extended primers makes the major-allele (T-C haplotype) primer-template duplex more stable than the minor-allele (A-A haplotype) primer-template duplex in terms of ΔG and Tm (Supporting Information Table 3b and c) [19]. As a result, the major-allele template might have been amplified more at NAT1, and the PCR products from heterozygotes were mostly major-allele DNA (i.e. the T-C haplotype strand with the MISMCH3G primer, Supporting Information Fig. 2). Again, a mismatch at the 3rd np of the primer leads to greater hindrance to Taq polymerase binding [3, 5, 6] than mismatches at the 6th, 9th, or 12th np. Together, these effects result in differential lowering of primer- and Taq polymerase-binding efficiency. As a result, all heterozygotes were misgenotyped as homozygotes when MISMCH3G (Table 1) was used as the reverse primer. Possibly, binding of Taq polymerase to the template-primer duplex was not hindered when the mismatch was at the 15th np of the primer. So, all heterozygotes were amplified and sequenced successfully as heterozygotes without any misgenotyping (Table 1).

In TP53, the frequency of misgenotyping of heterozygote samples followed a pattern similar to that at NAT1. However, in contrast to NAT1, the minor-allele template was preferentially amplified and over-represented in the sequencing chromatogram (i.e. the T-G haplotype strand with the P53_MISMCH3.1A and P53_MISMCH3.2G primers, Fig. 2), since the minor-allele template-primer duplex was thermodynamically more stable in terms of ΔG and Tm (Supporting Information Table 4a–c). In TP53, two different replacement nts were used for every mismatch position. Frequencies of misgenotyping were similar for the two replacements at any np except for the mismatch primers P53_MISMCH6.1A and P53_MISMCH6.2C (although the difference was not significant by a test for proportions with the Bonferroni-corrected p-value cut-off) (Table 4). This might be due to the maximum difference in Tm (55.2 °C and 51.7 °C) for these two primer-template duplexes (Supporting Information Table 4a).

In this study, the mismatch primer binds to both template strands but, in a real situation, an unknown variation may be present in one or both template strands. This study mimics only the latter kind of situation. So, an experiment needs to be designed in which the unknown variation is present in only one of the two template strands and the target SNP lies within nine nts of that variation site. One more limitation concerns the TaqMan assay used to determine the absolute quantity of the two haplotype strands. The probes applied here could not be used with the NAT1_NOS series of reverse primers, since the TaqMan probes, spanning both SNPs (rs1057126 and rs15561), also partially cover the reverse primer-binding site (Fig. 1A and B). For this reason, the 3apNAT_NOS and 6apNAT_NOS primer sets, which lie three and six nts away from the target SNP rs15561, respectively, were used in the TaqMan assays (Supporting Information Fig. 1b, Table 2).
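The comparison of misgenotyping frequencies between P53_MISMCH6.1A and P53_MISMCH6.2C mentioned above relies on a test for proportions judged against a Bonferroni-corrected cut-off. A minimal sketch of one common form of such a test, a pooled two-sided z-test, is given below. The text does not state which variant of the test was used, so this choice is an assumption. The counts are HYPOTHETICAL placeholders chosen only to resemble the reported percentages, and the number of comparisons (eight) is assumed so that a 1% level gives the cut-off of 0.00125.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided pooled z-test for the difference between two proportions.

    x1, x2: number of misgenotyped heterozygotes with each primer.
    n1, n2: number of heterozygotes genotyped with each primer.
    Returns (z, p_value).
    """
    if min(n1, n2) <= 0:
        raise ValueError("sample sizes must be positive")
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    if se == 0.0:
        raise ValueError("test undefined when both proportions are 0 or 1")
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


def bonferroni_cutoff(alpha, n_tests):
    """Per-test significance cut-off after Bonferroni correction."""
    return alpha / n_tests


# HYPOTHETICAL counts (not from Table 4): 8 of 11 vs. 5 of 11 heterozygotes misgenotyped.
z, p = two_proportion_z_test(8, 11, 5, 11)
cutoff = bonferroni_cutoff(alpha=0.01, n_tests=8)  # n_tests=8 is an assumption
print(f"z = {z:.2f}, p = {p:.3f}, cut-off = {cutoff:.5f}, significant: {p <= cutoff}")
```

With samples this small the normal approximation is crude, and an exact test may be preferable. The sketch is only meant to show why a visible difference in percentages can still fail a corrected cut-off.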
Even though the 3apNAT_MISMCH series of mismatch reverse primers resulted in misgenotyping with the sequencing method, the TaqMan assay with the same series of primers showed only changes in threshold cycles and not an allele drop (Supporting Information Fig. 5, Fig. 3B and C, and Supporting Information Table 1). So, TaqMan experiments need to be designed to use a probe for the NAT1 SNP rs1057126 only, so that the quantitative PCR efficiency of the NAT1_NOS series of primers with and without mismatches can be explored.

Mismatches may occur naturally when the primer-binding site of the template contains an unknown SNP or sequence variation. During primer design, care should be taken to avoid the presence of a SNP in a primer-binding site located next to a target SNP; otherwise, a similar kind of misgenotyping may happen. The implications of this study, along with similar previously published works, are considerable. Mismatch primers may yield errors in genotype and allele frequencies that may lead to misinterpretation of association and population genetic studies. This result also suggests that researchers should be cautious while using mismatch primers in PCR-RFLP to create a restriction site to identify alleles in heterozygote samples. PCR-based target enrichment methods such as RainDance Technologies, Fluidigm Access Array, and Ion Ampliseq [20–22] are popular platforms for present-day targeted next generation sequencing. In these methods, primers are designed to bind directly to the targeted region of the DNA. Mismatches may therefore occur in any of these primer-template hybrids due to unknown variation or somatic mutations in the primer-binding site. So, there is a high chance of biased strand synthesis, and special caution needs to be taken while using these methods for target enrichment. Thus, the present work on misgenotyping is relevant not only for Sanger sequencing technologies but also for some of the next generation sequencing technologies.

The authors are grateful to all individuals who donated blood for DNA isolation and to Prof. R. R. Paul for his constant help during sample collection. We appreciate the technical assistance of Mr. Badal Dey during resequencing, Dr. Indranil Mukhopadhyaya during statistical analysis, and Ms. Anindita Ray and Mrs. Anindita Chatterjee during preparation of the final version of the manuscript. The fellowships of NDS and MM were supported by CSIR, New Delhi, India.

The authors have declared no conflict of interest.

5 References

[1] Innis, M. A., Gelfand, D. H., Sninsky, J. J., White, T., PCR Protocols, a Guide to Methods and Applications, Academic Press Inc., San Diego, CA 1990, pp. 3–20.
[2] Saiki, R. K., Gelfand, D. H., Stoffel, S., Scharf, S. J., Higuchi, R., Horn, G. T., Mullis, K. B., Erlich, H. A., Science 1988, 239, 487–491.
[3] Huang, M., Arnheim, N., Goodman, M. F., Nucleic Acids Res. 1992, 20, 4567–4573.
[4] Kwok, S., Kellogg, D. E., McKinney, N., Spasic, D., Goda, L., Levenson, C., Sninsky, J. J., Nucleic Acids Res. 1990, 18, 999–1005.
[5] Süß, B., Flekna, G., Wagner, M., Hein, I., J. Microbiol. Methods 2009, 76, 316–319.
[6] Wu, J. H., Hong, P. Y., Liu, W. T., J. Microbiol. Methods 2009, 77, 267–275.
[7] Kirsten, H., Teupser, D., Weissfuss, J., Wolfram, G., Emmrich, F., Ahnert, P., J. Mol. Med. 2007, 85, 361–369.
[8] Teupser, D., Rupprecht, W., Lohse, P., Thiery, J., Clin. Chem. 2001, 47, 852–857.
[9] Majumder, M., Sikdar, N., Ghosh, S., Roy, B., Int. J. Cancer 2007, 120, 2148–2156.
[10] Doll, M. A., Hein, D. W., Anal. Biochem. 2002, 301, 328–332.
[11] Yaku, H., Yukimasa, T., Nakano, S., Sugimoto, N., Oka, H., Electrophoresis 2008, 29, 4130–4140.
[12] Li, Y., Korolev, S., Waksman, G., EMBO J. 1998, 17, 7514–7525.
[13] Bru, D., Martin-Laurent, F., Philippot, L., Appl. Environ. Microbiol. 2008, 74, 1660–1663.
[14] Allawi, H. T., SantaLucia, J. Jr., Nucleic Acids Res. 1998, 37, 2170–2179.
[15] Petruska, J., Goodman, M. F., Boosalis, M. S., Sowers, L. C., Cheong, C., Tinoco, I. Jr., Proc. Natl. Acad. Sci. USA 1988, 85, 6252–6256.
[16] SantaLucia, J. Jr., Hicks, D., Ann. Rev. Biophys. Biomol. Struct. 2004, 33, 415–440.
[17] Urakawa, H., Fantroussi, S. E., Smidt, H., Smoot, J. C., Tribou, E. H., Kelly, J. J., Noble, P. A., Stahl, D. A., Appl. Environ. Microbiol. 2003, 69, 2848–2856.
[18] Zuker, M., Nucleic Acids Res. 2003, 31, 3406–3415.
[19] Petruska, J., Goodman, M. F., J. Biol. Chem. 1995, 270, 746–750.
[20] Valencia, C. A., Rhodenizer, D., Bhide, S., Chin, E., Littlejohn, M. R., Keong, L. M., Rutkowski, A., Bonnemann, C., Hegde, M., J. Mol. Diag. 2012, 14, 233–246.
[21] Tewhey, R., Warner, J. B., Nakano, M., Libby, B., Medkova, M., David, P. H., Kotsopoulos, S. K., Samuels, M. L., Hutchison, J. B., Larson, J. W., Topol, E. J., Weiner, M. P., Harismendy, O., Olson, J., Link, D. R., Frazer, K. A., Nature Biotech. 2009, 27, 1025–1031.
[22] Hollants, S., Redeker, E. J. W., Matthijs, G., Am. Assoc. Clin. Chem. 2012, 54, 1–8.
