Genotyping errors

Genotyping errors: Causes, consequences and solutions Mohamed Dadamouny Antje Gärtner Institute of Botany and Landscape Ecology Universität Greifswald Germany

Genotyping Errors

M. Dadamouny & Antje Gärtner 21.05.2014

Genotyping errors • Pompanon F, Bonin A, Bellemain E, Taberlet P (2005) Genotyping errors: causes, consequences and solutions. Nature Reviews Genetics.


Genotyping errors • Definition • Non-invasive sampling and genotyping errors • Causes of genotyping errors • Quantifying genotyping errors • Consequences of genotyping errors • How to limit genotyping errors and their impact?

Definition • A genotyping error occurs when the observed genotype of an individual does not correspond to the true genotype. • Genotyping errors can have strong consequences on the biological message that can be deduced from the data.

The number of papers dealing with genotyping errors has increased in recent years. [Figure: distribution of papers on "genotyping errors" according to their publication year.]

Apparently, more attention is being paid to the subject of genotyping errors. Genotyping errors are treated as a concern in only some research fields (linkage analyses, non-invasive methods). What about the other fields that use genetic tools (population genetics/genomics)?

The three different sampling methods • Destructive sampling. • Non-destructive sampling. • Non-invasive sampling.

• Destructive sampling. • The animal is killed in order to obtain the tissues necessary for genetic analysis. • This sampling strategy has been used extensively for isozyme studies, and for mtDNA analysis before PCR was discovered. • It has been abandoned by many researchers.

• Non-destructive sampling.

• The animal is often captured, and a biopsy or a blood sample is taken invasively. • However, some invasive sampling strategies do not require catching the animal. • For example tissues can be obtained from whales and some other large mammals by using biopsy dart guns.


Non-invasive sampling • This term should be restricted to situations where the source of DNA is left behind and is collected without having to catch or disturb the animal. • In the literature, non-destructive sampling is often improperly considered as non-invasive. • Catching a mammal (or a bird) and plucking a few hairs (or feathers) should not be considered as non-invasive, but rather as non-destructive.

Non-invasive genetic sampling: only possible via PCR • Mullis KB, Faloona FA (1987) Specific synthesis of DNA in vitro via a polymerase-catalysed chain reaction. Methods in Enzymology, 155, 335-350. • Saiki RK, Gelfand DH, Stoffel S, Scharf SJ, Higuchi R, Horn GT, Mullis KB, Erlich HA (1988) Primer-directed enzymatic amplification of DNA with a thermostable DNA polymerase. Science, 239, 487-491.

Potential of non-invasive genetic sampling: two opposing points of view • Non-invasive sampling can exploit the full potential of DNA analysis. – True for mtDNA. – Dominant opinion ten years ago. • Non-invasive sampling has serious limitations. – Many technical problems. – Possibility of genotyping errors.

Non-invasive sampling has serious limitations • Gerloff U, Schlötterer C, Rassmann K, Rambold I, Hohmann G, Fruth B, Tautz D (1995) Amplification of hypervariable simple sequence repeats (microsatellites) from excremental DNA of wild living bonobos (Pan paniscus). Molecular Ecology, 4, 515-518. • Taberlet P, Griffin S, Goossens B, Questiau S, Manceau V, Escaravage N, Waits LP, Bouvet J (1996) Reliable genotyping of samples with very low DNA quantities using PCR. Nucleic Acids Research, 24, 3189-3194. • Gagneux P, Boesch C, Woodruff DS (1997) Microsatellite scoring errors associated with noninvasive genotyping based on nuclear DNA amplified from shed hair. Molecular Ecology, 6, 861-868.

Genotyping errors: main difficulties in non-invasive sampling • Contamination. • Allelic dropout. • False alleles.

Contamination: Behind the possibility of detecting a single target molecule, there is also a possibility of detecting a single contaminant molecule. Working with non-invasive genetic sampling is similar to ancient DNA studies.

Allelic dropout: For a heterozygous individual, only one allele is present in the template and/or is amplified in the PCR reaction. This error produces a false homozygote.

False alleles: Artifacts can be generated during the first cycles of the PCR reaction, and can be misinterpreted as true alleles. Very difficult to discern from sporadic contamination.

Genotyping errors: example 1 [Figure: alleles A and B detected in 50 independent genotyping experiments using the same DNA extract (from bear faeces); locus G10B.]

Error detection • Genotype errors can change inferences about gene flow. – They may introduce additional recombinants. • Likelihood sensitivity analysis. – How much impact does each genotype have on the likelihood of the overall data?

Allelic dropout: mathematical model • The model is restricted to the genotyping of an individual bearing alleles A and B at an autosomal locus. • Many assumptions have been made.

Allelic dropout: mathematical model assumptions • The DNA extract contains equal numbers of the alleles A and B. • A single target molecule can be amplified and detected. • Each single target molecule has the same probability of being amplified. • 100 PCRs can be performed using the DNA extract, and the target DNA molecules are distributed randomly among the 100 PCR tubes. • If the initial proportion between alleles A and B (A/B or B/A) in the PCR tube is greater than or equal to five, then only the most common allele will be detected.
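The assumptions of the dropout model can be turned into a small Monte Carlo sketch. This is only an illustration, not the authors' simulation code: the function names are hypothetical, and the conversion from picograms to molecules assumes that one diploid genome equivalent (about 7 pg, the per-cell figure quoted in these slides) carries exactly one copy of allele A and one copy of allele B.

```python
import random

PG_PER_DIPLOID_CELL = 7.0   # slide value: one cell contains about 7 pg of DNA
RATIO_THRESHOLD = 5         # model rule: a ratio >= 5 hides the rarer allele


def classify_tube(n_a, n_b, threshold=RATIO_THRESHOLD):
    """Return 'no_product', 'dropout' or 'correct' for one PCR tube."""
    if n_a == 0 and n_b == 0:
        return "no_product"
    if min(n_a, n_b) == 0 or max(n_a, n_b) >= threshold * min(n_a, n_b):
        return "dropout"
    return "correct"


def simulate_extract(pg_per_tube, n_tubes=100, rng=None):
    """Scatter equal numbers of A and B molecules over the tubes and tally outcomes."""
    if rng is None:
        rng = random.Random()
    # Assumption: each diploid genome equivalent carries one copy of A and one of B.
    copies_per_allele = round(n_tubes * pg_per_tube / PG_PER_DIPLOID_CELL)
    a_counts = [0] * n_tubes
    b_counts = [0] * n_tubes
    for _ in range(copies_per_allele):
        # Each molecule lands in a tube chosen uniformly at random.
        a_counts[rng.randrange(n_tubes)] += 1
        b_counts[rng.randrange(n_tubes)] += 1
    tally = {"no_product": 0, "dropout": 0, "correct": 0}
    for n_a, n_b in zip(a_counts, b_counts):
        tally[classify_tube(n_a, n_b)] += 1
    return tally


if __name__ == "__main__":
    for pg in (3.5, 14, 56):
        print(pg, simulate_extract(pg, rng=random.Random(42)))
```

Running the loop for increasing template amounts should show the share of "correct" tubes rising and the share of empty or dropout tubes falling, which is the qualitative behaviour the model is meant to capture.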

The problem of very small DNA samples: simulations
Simulations for a heterozygous individual with alleles A and B. Each entry lists the target molecules that ended up in one reaction tube; "-" marks a tube without any target molecule. On the slide, the tubes giving a correct genotype were highlighted.

3.5 pg of template DNA per reaction: tube 1: B; tube 2: B; tube 3: -; tube 4: -; tube 5: A; tube 6: A; tube 7: A; tube 8: A; tube 9: -; tube 10: -; tube 11: ABB; tube 12: AA; tube 13: -; tube 14: BA; tube 15: BABB; tube 16: BB; tube 17: (value lost); tube 18: B; tube 19: A; tube 20: A; tube 21: B; tube 22: A; tube 23: -; tube 24: -

14 pg of template DNA per reaction: tube 1: AABAB; tube 2: BB; tube 3: ABBBBB; tube 4: AABA; tube 5: BBAAABA; tube 6: BBBB; tube 7: BAAB; tube 8: BAAA; tube 9: -; tube 10: AAB; tube 11: BBB; tube 12: AABABBAB; tube 13: AAAAB; tube 14: B; tube 15: AAAA; tube 16: BBAAAB; tube 17: BABB; tube 18: BAABAA; tube 19: ABBBA; tube 20: BBABA; tube 21: BAB; tube 22: BBA; tube 23: -; tube 24: AAAA

Results of the simulations
[Figure: percentage of PCRs (y-axis, 0 to 100%) against template DNA per amplification in picograms (x-axis). Two curves: PCR product (at least one allele) and correct genotyping (both alleles). Annotation: one cell contains about 7 picograms of DNA.]
Probability of correct genotyping at a heterozygous microsatellite locus using very small DNA samples.

Quantitative PCR Refs: Morin et al., 2001. Taberlet et al. (1996) Reliable genotyping of samples with very low DNA quantities using PCR. Nucleic Acids Research, 24(16), 3189-3194.

Relationship between the initial amount of template DNA in the PCR and both the proportion of PCRs with amplification product (grey squares) and the proportion of PCRs with allelic dropout (black circles).

Causes of genotyping errors • Very diverse, complex, and sometimes cryptic origins. • Grouping errors into discrete categories according to their causes is challenging. – DNA sequence. – Low DNA quantity or quality. – Biochemical artifacts. – Human errors.


DNA molecules interactions

• Cause: DNA sequence flanking the marker – No or less efficient amplification because of a mutation in the target primer sequence (null allele) – Insertion or deletion in the amplified fragment (size homoplasy of different alleles) – In heterozygous individuals, preferential amplification of one allele when its denaturation is favoured (allelic dropout)


Sample quality • Cause 1: Low DNA quality or quantity – In heterozygous individuals, amplification of only one allele (allelic dropout) – In heterozygous individuals, preferential amplification of the shorter allele (short allele dominance)

• Cause 2: Contamination of the DNA extract – Amplification of a contaminant allele (mistaken allele)

• Cause 3: Extract quality – No or less efficient restriction/amplification due to inhibitors (allelic dropout)


Biochemical artifacts and equipment • Cause 1: Low quality reagents – Allelic dropout, mistaken alleles.

• Cause 2: Equipment precision or reliability – Allelic dropout, mistaken alleles.

• Cause 3: Taq polymerase errors – False allele.

• Cause 4: Lack of specificity – Mistaken allele.

• Cause 5: Electrophoresis artifacts – Size homoplasy of different alleles, mistaken alleles


Human factor • Cause 1: sample manipulation – Confusion between samples (e.g. mislabelling or tube mixing) (mistaken allele(s))

• Cause 2: Experimental error – Contamination with an exogenous DNA or cross-contamination between samples (mistaken allele(s)) – Use of an inappropriate protocol (reagent forgotten, wrong hybridization temperature, primers, or concentrations of reagents) (allelic dropout, mistaken allele(s))

• Cause 3: Data handling – Misreading of the profile or misidentification of the fluorescent peak (mistaken allele) – Miscopying or confusion of the genotypes in the database (mistaken allele) – Computing data: bug in the database/analysis program (mistaken allele)


Quantifying genotyping errors • Different estimates, based on replicates within a dataset, have been defined to quantify error rates. • Some metrics have been proposed for specific errors such as allelic dropouts or false alleles. • More global metrics, which take into account all types of detectable genotyping errors, are also commonly used although they have never been explicitly defined.


Quantifying genotyping errors • First, a reference genotype must be defined as the genotype that minimizes the number of errors in the comparison among replicates. • Several reference genotypes may exist. If only two replicates are performed and give contradictory genotypes, either one or the other can be considered as the reference. • The calculation of error rates is based on the number of mismatches between the reference genotype and the replicates.


Quantifying genotyping errors • n individual single-locus genotypes have been replicated t times. • For diploid individuals, 2nt alleles and nt loci are typed and can be compared to the reference. • Estimation of the error rates at the allelic, locus, multilocus, and reaction levels.


Mean allelic error rate $e_a$

$$e_a = \frac{m_a}{2nt}$$

• The mean allelic error rate $e_a$ is the ratio between $m_a$, the number of allelic mismatches, and $2nt$, the number of replicated alleles. • For microsatellite markers, the error rate per allele can also be estimated for each particular allele, to point out error-prone alleles (for example, alleles prone to dropouts).

Mean error rate per locus $e_l$

$$e_l = \frac{m_l}{nt}$$

• The mean error rate per locus $e_l$ is the ratio between $m_l$, the number of single-locus genotypes including at least one allelic mismatch, and $nt$, the number of replicated single-locus genotypes. • This metric can also be estimated for each particular locus, to help identify error-prone loci. • As it can be compared between studies and samples, it should become the standard metric.



Error rate per multilocus genotype $e_{obs}$

$$e_{obs} = \frac{m_g}{nt}$$

• The observed error rate per multilocus genotype $e_{obs}$ is the ratio between $m_g$, the number of multilocus genotypes including at least one allelic mismatch, and $nt$, the number of replicated multilocus genotypes. • This metric is particularly informative for individual identification, parentage analysis or population size estimation.

Error rate per multilocus genotype $e_{ind}$

$$e_{ind} = 1 - \prod_{i=1}^{l} (1 - e_i)$$

• If genotyping errors occur independently among $l$ loci (which is very unlikely), the error rate per multilocus genotype $e_{ind}$ is deduced from the single-locus error rate $e_i$ at each locus $i$.
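The independence formula for the multilocus error rate is easy to evaluate numerically. The sketch below is a minimal illustration; the function name is hypothetical, and it inherits the (unlikely) assumption that errors are independent among loci.

```python
from math import prod


def multilocus_error_rate(per_locus_error_rates):
    """e_ind = 1 - prod(1 - e_i), assuming errors are independent among loci."""
    return 1.0 - prod(1.0 - e for e in per_locus_error_rates)


if __name__ == "__main__":
    # Illustrative input: the same 5% per-locus error rate at 7 and at 10 loci.
    for n_loci in (7, 10):
        print(n_loci, round(multilocus_error_rate([0.05] * n_loci), 3))
```

With a 5% error rate per locus, about 30% of seven-locus genotypes and about 40% of ten-locus genotypes would contain at least one error, which shows how quickly modest single-locus rates accumulate.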

Error rate per reaction $e_r$

$$e_r = \frac{m_l}{r}$$

• The error rate per reaction $e_r$ is the ratio between $m_l$, the number of single-locus genotypes including at least one allelic mismatch, and $r$, the total number of reactions. • This metric is equivalent to the mean error rate per locus when the PCR reaction involves one locus, or to the multilocus error rate when all loci are amplified in a single multiplex reaction.

Estimation of the error rates per allele and per locus, for four replicates (t=4) of three individuals (n=3)

Genotyped individual | Replicate 1 | Replicate 2 | Replicate 3 | Replicate 4 | Reference genotype | Error rate per allele | Error rate per locus
Ind 1 | A A | A B | B C | A A | A A | 3/8 | 2/4
Ind 2 | A B | B B | B B | A B | A B or B B | 2/8 | 2/4
Ind 3 | A C | A C | A B | A C | A C | 1/8 | 1/4
Mean | | | | | | 1/4 | 5/12
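The worked example with four replicates of three individuals can be checked with a short script. This is a sketch under stated assumptions: the replicate genotypes are those read from the slide's table, the helper names are hypothetical, and the reference genotype is chosen by brute force as the genotype that minimises allelic mismatches (ties broken by the number of mismatching replicates).

```python
from fractions import Fraction
from itertools import combinations_with_replacement


def allelic_mismatches(genotype, reference):
    """Count alleles of `genotype` that cannot be paired with an allele of `reference`."""
    remaining = list(reference)
    mismatches = 0
    for allele in genotype:
        if allele in remaining:
            remaining.remove(allele)
        else:
            mismatches += 1
    return mismatches


def reference_genotype(replicates):
    """Pick the genotype that minimises mismatches against all replicates."""
    alleles = sorted({a for g in replicates for a in g})

    def cost(ref):
        per_rep = [allelic_mismatches(g, ref) for g in replicates]
        return (sum(per_rep), sum(1 for m in per_rep if m))

    return min(combinations_with_replacement(alleles, 2), key=cost)


def error_rates(replicates_by_individual):
    """Return (e_a, e_l) = (m_a / 2nt, m_l / nt) as exact fractions."""
    n = len(replicates_by_individual)
    t = len(next(iter(replicates_by_individual.values())))
    m_a = m_l = 0
    for replicates in replicates_by_individual.values():
        ref = reference_genotype(replicates)
        per_rep = [allelic_mismatches(g, ref) for g in replicates]
        m_a += sum(per_rep)                   # allelic mismatches
        m_l += sum(1 for m in per_rep if m)   # genotypes with >= 1 mismatch
    return Fraction(m_a, 2 * n * t), Fraction(m_l, n * t)


example = {
    "Ind 1": [("A", "A"), ("A", "B"), ("B", "C"), ("A", "A")],
    "Ind 2": [("A", "B"), ("B", "B"), ("B", "B"), ("A", "B")],
    "Ind 3": [("A", "C"), ("A", "C"), ("A", "B"), ("A", "C")],
}
e_a, e_l = error_rates(example)
print(e_a, e_l)  # 1/4 5/12
```

For individual 2 the genotypes A B and B B are equally good references; either choice gives the same mismatch counts, so the means are unaffected.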

Example of error rates • Bonin et al. (2004): – Bear tissues: 0.008 per locus – Bear faeces: 0.019 per locus – AFLP: 0.019 to 0.026 per locus • Hoffman and Amos (2005): – 2000 Antarctic fur seals genotyped at 9 microsatellite loci – 0.0013 to 0.0074 per locus – Human errors are the most important cause

Example of genotyping error

Consequences of genotyping errors • Linkage and association studies. • Individual identification. • Population genetic studies. • May give false implications for nature conservation. • Status of populations. • Level of heterozygosity. • Bottlenecks.

Linkage and association studies • Erroneous genotypes might markedly affect linkage and association studies by hiding the true segregation of alleles.

• The impact on the results is measured by experimental or simulation studies and can be serious even for low error rates (e.g. < 3%). • For example, in linkage studies, genotyping errors can affect the haplotype frequency and eventually lead to inflation of genetic map lengths. • Error rates as low as 3% have serious effects on linkage disequilibrium analysis, and a 1% error rate can generate a loss of 53-58% of the linkage information for a trait locus. However, modest error rates might be tolerable in situations that do not involve rare alleles, as in QTL studies.


Linkage and association studies • In association studies, because recombination is rare, errors mostly affect non-recombinant genotypes, which are then erroneously interpreted as being the result of recombination. Errors therefore decrease the power for detecting associations. • The importance of the experimental design must also be emphasised, as it can generate errors that are not randomly distributed across phenotypes (i.e., differential errors). This can be the case when controls and cases are genotyped in different assays while investigating the genetic basis of a disease. Differential and non-differential errors can have opposite consequences on the rate of false positives in statistical tests of association.

Individual identification • Genotyping errors can strongly affect individual identification studies that are based on multilocus genotypes by erroneously increasing the number of genotypes observed in a population sample. • In census studies of rare or elusive species, the population size can be estimated based on the identified genotypes from non-invasive samples collected in the field (e.g., hair or faeces). In this context, genotyping errors can lead to a serious overestimate of population size. • A 200% overestimate of population size has been found with a 5% error rate per locus when using 7 to 10 loci for genotype identification (Creel et al., 2004). Such an overestimate obviously increases with the number of loci and with the number of samples per genotype.
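How errors inflate the number of apparent individuals can be illustrated with a toy simulation. It is not the analysis of Creel et al. and is not meant to reproduce their figure; the function name and parameters are hypothetical. It assumes that every individual has a unique true multilocus genotype and, as a worst case, that every erroneous single-locus call creates a genotype never seen in any other sample.

```python
import random


def count_observed_genotypes(n_individuals, samples_per_individual,
                             n_loci, error_rate, seed=0):
    """Count distinct multilocus genotypes seen when each locus can be mistyped."""
    rng = random.Random(seed)
    observed = set()
    n_errors = 0
    for individual in range(n_individuals):
        for _ in range(samples_per_individual):
            genotype = []
            for locus in range(n_loci):
                if rng.random() < error_rate:
                    # Worst-case assumption: each error yields a novel call.
                    n_errors += 1
                    genotype.append(("error", n_errors))
                else:
                    genotype.append(("true", individual, locus))
            observed.add(tuple(genotype))
    return len(observed)


if __name__ == "__main__":
    true_n = 100
    for e in (0.0, 0.01, 0.05):
        seen = count_observed_genotypes(true_n, 5, 10, e)
        print(f"error rate {e}: {seen} genotypes for {true_n} individuals")
```

Because each additional locus or sample is another chance to create a spurious genotype, the count of "individuals" in this sketch grows with both, in line with the last point on the slide.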


Individual identification • Genotyping errors also have a huge impact in parentage analysis, generating wrong paternity or maternity exclusions. • Such information on population size and structure is required in conservation biology, and its inaccurate estimation due to genotyping errors could result in wrong decisions in population management. • In forensic DNA analyses, a false multilocus genotype can prevent the identification of a corpse or lead to erroneous identification (or exoneration) of criminal offenders.

Population genetic studies • Most of the studies that take genotyping error into account in population genetics are those that use non-invasive samples, which are error-prone because of the low quality and/or quantity of DNA. • However, it has been demonstrated that even with high quality DNA the error rate might not be negligible. • The impact of genotyping errors remains largely unknown in this field, because very few studies have dealt with this topic until now. • Genotyping errors may lead to erroneous allele identification or allele frequencies, resulting in wrong Fst estimates, false migration rates, or false detection of selection or population bottlenecks.


Population genetic studies • Analyses based on allele frequencies will be less affected by errors than those based on individual identification (e.g., parentage analysis), but will be sensitive to sampling effects. • The apparent low impact of scoring differences has been demonstrated on an AFLP data set that was scored by two different scientists. The two scorers had only 38% of the marker loci in common, but the same biological conclusions about population genetic structure were extracted from the data. In this study, the robustness of the inferred biological message was certainly due to the redundancy of the information contained in the large number of AFLP markers (more than 200 polymorphic loci screened by both scorers). • Population genomics studies looking for selected markers among several hundred markers would be very sensitive to the impact of genotyping error, especially if the errors are population-specific. There is a great need for studies on the impact of genotyping error in this newly emerging field.

How to limit genotyping errors and their impact? • The worst situation arises when a scientist realises at the end of a study that the data were not reliable due to genotyping errors, and that the dataset is not retrievable.

• Such situations are almost never reported in the literature, but their occurrence is probably not rare. • Therefore, it is important to take into account the possibility of genotyping errors when designing the experimental protocol.


How to limit genotyping errors and their impact? Figure 2 | Flow chart that shows the important steps in a genotyping process for limiting the occurrence and effect of genotyping errors. The steps that end with a superscript letter (a–e) should be qualified as follows: a | The goal is to estimate the error rate associated with the samples, the method and the protocol used. This is done by replicating a sufficient number of samples. b | Deciding on an acceptable error rate depends on the error rate, the purpose of the genetic study, the genotyping method used, the ability to detect eventual errors and the cost in terms of money and time. c | The control study aims to find the cause of errors that did not exist in the pilot study. d | The calculated error rate must be considered in the data analysis. e | The results should be published with a reliability index that is based on the error rate measured.

How to limit genotyping errors and their impact? • The strategy consists in demonstrating, via an appropriate procedure, that the data produced and the results obtained are reliable. • The diversity of case studies, error causes, and laboratory contexts makes it impossible to propose a universal and simple procedure.

• As a consequence, the possible solutions to limit the occurrence and the impact of genotyping errors are case-specific. • The optimal strategy will be determined by several factors, such as the biological question, the tolerable error rate, the sampling possibilities, the equipment and technical skills that are locally available, the financial support and time constraints.


How to limit genotyping errors and their impact? • General recommendations. • Limiting the production of errors during genotyping. • Cleaning the dataset after genotyping. • Analysing data taking into account the errors. • Towards quality processes for genotyping. • Practicals: establishing reliable experimental protocols (case studies).

General recommendations • A first step consists in checking that the genotyping experiments necessary to reach the scientific goal are realistic according to the sample quality and the technical skills available (bad sample quality and limited technical skills obviously influence the error rate). • A second step involves carrying out a pilot study designed to first evaluate the theoretical error rate compatible with the data analysis, and then to estimate the real error rate based on the analysis of a subset of the samples. • Finally, it is important to be aware of potential problems all along the experimental procedure, even after a successful pilot study, from sampling to data analysis.


General recommendations

• Quality controls should be performed in real time during each step and each batch of experiments.

• They should also be diverse enough to detect as many types of errors as possible. For example, highly reproducible errors such as null alleles cannot be detected by replicates, and require Hardy-Weinberg tests or inheritance studies. Conversely, stochastic allelic dropouts might not be detected by Hardy-Weinberg tests, but can be detected by replicating the genotyping assays. • Control procedures are costly and time consuming. Thus the effort for reducing the error rate must be adapted to the foreseeable impact of the genotyping errors. • Because genotyping errors may be generated even with high quality standards, and because they cannot all be detected, efforts must be directed towards limiting both their production and their subsequent impact.

Limiting the production of errors during genotyping • Given that human factors can be the main issue during genotype production, the most efficient approach is to concentrate first on minimizing human error. • Only well-trained bench scientists/technicians should be involved, as suggested by quality assurance standards for forensic DNA testing laboratories. • Only standardized and validated procedures should be used. • Human manipulation should be reduced as much as automation allows, from all handling and pipetting steps to allele scoring. However, for allele scoring, software packages are not yet sophisticated enough to prevent scoring errors. Semi-automated scoring followed by human visual inspection appears to be the most reliable procedure.

Limiting the production of errors during genotyping • Limiting genotyping errors during laboratory experiments requires the systematic use of an appropriate number of positive and negative controls, but also requires the implementation of replicates for real-time error detection and error rate estimation. • In every situation, even with high quality DNA, replicating 5 to 10% of the samples has been recommended, but the amount can vary according to the goal of the study and the potential impact of errors.

• As far as possible, these replicates have to be carried out blind and independently. • This involves implementing the blind process from the beginning of the experiment, by carrying out a systematic duplication of the samples during sample collection. Such a procedure will not only allow laboratory errors to be detected, but will also pick up handling errors at any stage of the analysis. Moreover, comparing blind samples and original experiments will produce a fair estimate of the error rate.
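Choosing which samples to duplicate blind can be left to a random draw, so that the person doing the genotyping does not influence the selection. The sketch below is one possible way to do it; the function name, the integer `percent` parameter and the seed are illustrative assumptions.

```python
import random


def pick_blind_replicates(sample_ids, percent=10, seed=None):
    """Randomly pick about `percent`% of the samples (at least one) for blind duplication."""
    ids = list(sample_ids)
    if not ids:
        return []
    # Integer ceiling of percent% of the samples, never fewer than one.
    k = max(1, (len(ids) * percent + 99) // 100)
    return sorted(random.Random(seed).sample(ids, k))


if __name__ == "__main__":
    samples = [f"S{i:03d}" for i in range(1, 201)]
    print(pick_blind_replicates(samples, percent=5, seed=2014))
```

Recording the seed alongside the sample list makes the draw reproducible without revealing in advance which tubes are duplicates.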

Limiting the production of errors during genotyping • When genotyping errors are highly probable, blind replicates are still necessary but not sufficient. The systematic replication of each genotyping assay (i.e., the multiple-tube approach) may be required to define the consensus genotypes. • There is a trade-off between the cost of the experiments and the reliability of the genotypes. • One role of the pilot study is to determine the optimal number of replicates required. • In some cases, errors can also be detected by replicating the genotyping process using a different technology such as sequencing, whose error rates are typically lower than those of standard genotyping technologies.
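A consensus rule for the multiple-tube approach might look like the sketch below. The thresholds are illustrative assumptions, not values given in these slides: an allele counts as confirmed once it appears in at least two replicates, and a homozygote is accepted only after at least three concordant replicates with no other allele seen. In practice the pilot study would set these numbers.

```python
from collections import Counter


def consensus_genotype(replicates, min_allele_obs=2, min_homozygote_reps=3):
    """Return a consensus genotype, or None when more replicates are needed."""
    # Count in how many replicates each allele was seen (not how many copies).
    seen_in = Counter()
    for genotype in replicates:
        seen_in.update(set(genotype))
    confirmed = sorted(a for a, c in seen_in.items() if c >= min_allele_obs)
    if len(confirmed) == 2:
        # Two confirmed alleles: heterozygote; alleles seen only once are
        # treated as possible false alleles and ignored.
        return tuple(confirmed)
    if (len(confirmed) == 1 and len(seen_in) == 1
            and seen_in[confirmed[0]] >= min_homozygote_reps):
        # One allele only, seen often enough to make dropout unlikely.
        return (confirmed[0], confirmed[0])
    return None  # ambiguous: possible dropout, false allele or contamination


if __name__ == "__main__":
    print(consensus_genotype([("A", "B"), ("A", "A"), ("A", "B")]))
    print(consensus_genotype([("A", "A"), ("A", "A")]))
```

Returning `None` rather than a best guess keeps ambiguous samples visible, so they can be re-amplified or discarded instead of entering the dataset silently.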

Cleaning the dataset after genotyping • Even if all erroneous genotypes detected during the experiments are removed, and eventually corrected after re-genotyping, some undetected errors will certainly remain in the data set. Some of them can still be detected or suspected by looking at the concordance with independent data. • The power of detecting errors by consistency with independent data can influence the strategy for limiting errors.

• It might be more efficient to retype erroneous genotypes detected by consistency checking than to run a large proportion of blind replicates.

Cleaning the dataset after genotyping • Testing Hardy-Weinberg equilibrium is common to check the quality of the data, under the assumption that a high error rate implies disequilibrium. However, many other causes can lead to disequilibrium, including selection, inbreeding and population admixture. • Moreover, just a few types of error might produce disequilibrium, such as null alleles and allelic dropouts.

• Therefore there is still a need for other controls and replicates for detecting errors that are compatible with Mendelian inheritance and Hardy-Weinberg equilibrium.
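A Hardy-Weinberg check of the kind mentioned above can be sketched for a single biallelic locus. This is a minimal illustration with a hypothetical function name: it uses the textbook chi-square goodness-of-fit statistic with one degree of freedom (critical value 3.841 at the 5% level), whereas real data sets with many alleles or small samples usually call for exact tests.

```python
def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square statistic for Hardy-Weinberg proportions at a biallelic locus."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    q = 1.0 - p
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return chi2, expected


if __name__ == "__main__":
    CRITICAL_5_PERCENT_DF1 = 3.841
    # A heterozygote deficit, as null alleles or allelic dropout would produce.
    chi2, expected = hwe_chi_square(40, 20, 40)
    print(chi2, expected, chi2 > CRITICAL_5_PERCENT_DF1)
```

A significant deficit of heterozygotes is compatible with null alleles or dropout, but also with inbreeding or admixture, so the test flags a locus for closer inspection rather than proving an error.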


Cleaning the dataset after genotyping • Several computer programs specifically designed to detect potential errors are now available.

• Most of them check for Mendelian consistency and/or Hardy-Weinberg equilibrium, and are commonly used for pedigree analyses and linkage studies. • Some others have been developed to track some kinds of errors that can be compatible with Mendelian inheritance or Hardy-Weinberg equilibrium. For example, some detect a spurious excess of recombinants in linkage studies and others focus on inconsistencies between replicates.
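The Mendelian consistency check performed by such programs can be illustrated for one locus in a mother-father-child trio. The function below is a simplified sketch with a hypothetical name, not the algorithm of any of the listed packages; it ignores mutations, null alleles and missing data.

```python
def mendelian_consistent(child, mother, father):
    """True if the child could have received one allele from each parent."""
    c1, c2 = child
    return (c1 in mother and c2 in father) or (c2 in mother and c1 in father)


if __name__ == "__main__":
    # Detected: the child carries no paternal allele.
    print(mendelian_consistent(("A", "A"), ("A", "B"), ("B", "B")))
    # Undetected: a true A/B child scored A/A after dropout of B still fits.
    print(mendelian_consistent(("A", "A"), ("A", "C"), ("A", "B")))
```

The second call shows why such checks are not sufficient on their own: an allelic dropout can produce a genotype that is still compatible with Mendelian inheritance.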


Cleaning the dataset after genotyping • Removing errors might not reduce bias, depending on the number and kind of errors detected and the bias each one creates. • For instance, when correcting Mendelian-incompatible genotypes by retyping or removing families in which they occur, the undetected errors can produce an excess of false positives for some family-based association tests. This problem has been addressed by developing an appropriate Likelihood Ratio Test based on a general genotype error model. • In general, taking into account the occurrence of errors in the analysis is crucial, especially for large or error-prone data sets.


Computer programs for detecting errors
• GEMINI: http://pbil.univ-lyon1.fr/software/Gemini/gemini.htm
• PAWE: http://linkage.rockefeller.edu/pawe/
• PREST: http://fisher.utstat.toronto.edu/sun/Software/Prest/
• Pedcheck: http://watson.hgen.pitt.edu/register/docs/pedcheck.html
• PedManager: http://www.broad.mit.edu/ftp/distribution/software/pedmanager/
• MENDEL: http://www.genetics.ucla.edu/software/
• SIMWALK: http://www.genetics.ucla.edu/software/
• Genocheck: http://softlib.rice.edu/geno.html
• R/QTL: http://www.biostat.jhsph.edu/~kbroman/qtl/
• CERVUS: http://helios.bto.ed.ac.uk/evolgen/cervus/cervus.html
• GIMLET: http://pbil.univ-lyon1.fr/software/Gimlet/gimlet.htm
• RelioType: http://www.cnr.uidaho.edu/lecg/pubs_and_software.htm
• Micro-checker: http://www.microchecker.hull.ac.uk
• DROPOUT: http://www.fs.fed.us/rm/wildlife/genetics
• PARENTE: http://www2.ujf-grenoble.fr/leca/membres/manel.html
• PAPA: http://www.bio.ulaval.ca/louisbernatchez/downloads_fr.htm
• PseudoMarker: http://www.helsinki.fi/~tsjuntun/pseudomarker/
• TDTae: ftp://linkage.rockefeller.edu/software/tdtae2/
• LRTae: ftp://linkage.rockefeller.edu/softare/lrtae/

Towards quality processes for genotyping
• In every scientific discipline, the reliability of the conclusions strongly depends on the quality of the data.
• For geneticists, genotyping errors may strongly affect the results.
• The protocol used for minimizing the occurrence of errors, the methods for error detection, and the estimated error rate should be provided for each study.
• With this information, it will be possible to assign to each genotype a quality index, allowing the scientific community to have a critical view when unexpected results are published.


Towards quality processes for genotyping
• More and more studies, often in the context of international programs, generate enormous datasets that cannot be produced in a single laboratory.
• The reproducibility of genotyping is therefore becoming more and more important.
• Even for markers known to be robust (SNPs, microsatellites, AFLPs), differences may appear among laboratories and over time within the same laboratory.


Towards quality processes for genotyping
• Expression studies using microarray experiments are known to be error-prone, and the scientific community reacted by designing strict standards: the “Minimum Information About a Microarray Experiment” (MIAME) provides a checklist to guide authors and journal editors, ensuring that data are made publicly available in a format that enables unambiguous interpretation and potential verification of the conclusions.
• It includes several steps that verify, for instance, experiment design, sample preparation, and data measurement.


Towards quality processes for genotyping
• Genotyping errors have been identified since the very beginning of molecular genetics.
• Their consequences in statistical genetics were pointed out in 1957, and null alleles in blood groups have been recognised since 1938.
• They were too often neglected in the past, and it is clear that they merit much more attention given their dramatic impact in some studies.
• Recently, many papers have dealt with genotyping errors, and it seems that the scientific community is beginning to realise their importance.


Towards quality processes for genotyping
• The fields of ancient DNA and gene expression suffered a crisis of confidence, with series of erroneous papers published in leading journals. As a consequence, these two scientific communities set up strict standards that promoted data quality and resolved the crisis.
• In population genetics, the situation is different because only a few erroneous papers have been published. This community has therefore apparently not been strongly pushed to establish strict standards. Another explanation for the delay in establishing strict standards might be the complexity of the problems.
• Given the recent awareness of the occurrence of genotyping errors and of their potential impact, it can be predicted that more and more attention will be paid to these difficulties when designing experimental protocols and publishing results.


Towards a quality index
• Goal: estimate a quality index associated with each sample.
• This quality index should allow comparisons among samples, loci, and studies.
• It is restricted to the situation where the multiple-tube approach is used.
• The estimation of the quality index (QI) is based on the analysis of the whole set of electropherograms produced when using the multiple-tube approach.
• For each locus of a given sample, a QI is estimated using the following steps:
– Step 1: estimation of the most likely consensus genotype.
– Step 2: estimation of the score for each repeat.
– Step 3: estimation of the quality index for the locus.


Towards a quality index
• Step 1: estimation of the most likely consensus genotype after simultaneous observation of the electropherograms corresponding to the different repeats of this locus. An allele is considered only if it is present at least twice among the different repeats.
• Step 2: estimation of the score for each repeat. If the electropherogram of a repeat corresponds to the consensus genotype, the score "1" is assigned; otherwise the score "0" is assigned, whatever the differences.
• Step 3: estimation of the QI for the locus. The scores assigned to each repeat are summed and divided by the number of repeats.
• Step 4: estimation of the mean QI per locus and per individual.
Additional rules
• No signal is scored as "0".
• Electropherograms with an additional allele are scored as "0".
• If the less intense allele is less than 20% of the most intense allele, a score of "0" is given.
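Steps 1 to 3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each repeat is represented as a tuple of allele labels read from its electropherogram (an empty tuple for no signal), the function names and the example data are hypothetical, and the 20% peak-intensity rule is not modelled because it needs peak heights rather than allele calls.

```python
from collections import Counter

def consensus_genotype(repeats):
    """Step 1: keep only alleles seen in at least two repeats."""
    counts = Counter(allele for rep in repeats for allele in set(rep))
    return frozenset(a for a, n in counts.items() if n >= 2)

def quality_index(repeats):
    """Steps 2 and 3: score each repeat against the consensus, then average."""
    consensus = consensus_genotype(repeats)
    if not repeats or not consensus:
        # No allele was confirmed twice, so no repeat can match a consensus.
        return consensus, [0] * len(repeats), 0.0
    # A repeat scores 1 only if it shows exactly the consensus alleles;
    # no signal, a missing allele, or an extra allele all score 0.
    scores = [1 if frozenset(rep) == consensus else 0 for rep in repeats]
    return consensus, scores, sum(scores) / len(repeats)

# Hypothetical locus typed 8 times: heterozygote 120/124 with frequent dropout.
repeats = [(120,), (120, 124), (124,), (), (120, 124), (120,), (124,), (120,)]
consensus, scores, qi = quality_index(repeats)
print(sorted(consensus), scores, qi)
```

Step 4 is then a plain average of these per-locus values across loci or across samples.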


Quality index: examples
Multiple-tube approach, 8 repeats per example. [Electropherogram figures not reproduced.]
• Example 1: all eight repeats match the consensus genotype (score 1 each), so QI = 8/8 = 1.00.
• Example 2: two repeats match the consensus genotype (score 1) and six do not (score 0), so QI = 2/8 = 0.25.

Quality indexes for loci, samples, and study

                        Samples
           1      2      3      4      5      mean
Locus 1    0.88   0.63   0.75   0.00   1.00   0.65
Locus 2    1.00   0.38   1.00   0.25   1.00   0.73
Locus 3    1.00   0.25   0.63   0.25   1.00   0.63
mean       0.96   0.42   0.79   0.17   1.00   0.67
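The means in the table of quality indexes are plain averages, which can be checked with a short Python sketch. The per-locus, per-sample values are copied from the table; the variable names are only illustrative.

```python
# QI per locus (rows) for samples 1 to 5 (columns), as given in the table.
qi = {
    "Locus 1": [0.88, 0.63, 0.75, 0.00, 1.00],
    "Locus 2": [1.00, 0.38, 1.00, 0.25, 1.00],
    "Locus 3": [1.00, 0.25, 0.63, 0.25, 1.00],
}

# Mean QI per locus (row means).
locus_means = {locus: sum(vals) / len(vals) for locus, vals in qi.items()}
# Mean QI per sample (column means).
sample_means = [sum(col) / len(col) for col in zip(*qi.values())]
# Mean QI for the whole study (all locus-by-sample cells).
study_mean = sum(sum(vals) for vals in qi.values()) / sum(len(vals) for vals in qi.values())

print({locus: round(m, 2) for locus, m in locus_means.items()})
print([round(m, 2) for m in sample_means])
print(round(study_mean, 2))
```

Sample 4 stands out with a mean QI of 0.17, and sample 5 is perfectly reproducible at every locus; this is the kind of comparison among samples and loci that the index is meant to support.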


Guidelines for genotyping very small DNA samples
• Multiple-tube approach.
• Navidi W, Arnheim N, Waterman MS (1992) A multiple-tube approach for accurate genotyping of very small DNA samples by using PCR: statistical considerations. American Journal of Human Genetics, 50, 347-359.
• Taberlet P, Griffin S, Goossens B, Questiau S, Manceau V, Escaravage N, Waits LP, Bouvet J (1996) Reliable genotyping of samples with very low DNA quantities using PCR. Nucleic Acids Research, 24, 3189-3194.

The guidelines are only valid under the following conditions:
• A single target molecule can be detected.
• The amount of template DNA is very low, in the picogram range, but is not accurately known.
• Confidence of 99%.
• Heterozygotes: an allele can be recorded only if it has been found at least twice.
• Homozygotes: an individual can be considered as homozygous only if eight independent experiments have shown the same allele.
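The two recording rules (an allele counts only if found at least twice; a homozygote is accepted only after eight experiments showing the same allele) can be written as a small decision function. The Python sketch below is one possible reading of those rules, not the procedure of the cited papers: the function name, the default thresholds as parameters, and the choice to return None when no call is possible yet are all assumptions made for illustration.

```python
from collections import Counter

def call_genotype(positive_pcrs, min_allele_count=2, homozygote_repeats=8):
    """Return a genotype tuple, or None if more repeats are needed.

    positive_pcrs: one tuple of observed allele labels per positive PCR.
    """
    counts = Counter(a for rep in positive_pcrs for a in set(rep))
    confirmed = sorted(a for a, n in counts.items() if n >= min_allele_count)
    if len(confirmed) == 2:
        # Both alleles were each seen at least twice: heterozygote.
        return tuple(confirmed)
    if len(confirmed) == 1 and len(counts) == 1:
        allele = confirmed[0]
        # Only one allele was ever observed: accept a homozygote
        # only once enough independent experiments agree.
        if counts[allele] >= homozygote_repeats:
            return (allele, allele)
    # Unconfirmed alleles, too few repeats, or more than two alleles.
    return None

print(call_genotype([(120,), (120, 124), (124,)]))  # heterozygote confirmed
print(call_genotype([(120,)] * 7))                  # not yet callable
print(call_genotype([(120,)] * 8))                  # homozygote accepted
```

The asymmetry is deliberate: a heterozygote can be confirmed after a few repeats, whereas a homozygote needs many more because an allelic dropout in a true heterozygote looks exactly like a homozygote in any single experiment.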


"+A" artifact

…CGATCGTTAATCAGAATGCATACCGCA
…GCTAGCAATTAGTCTTACGTATGGCG

Three solutions
• Enzymatic treatment of the PCR product with T4 DNA polymerase to remove the additional "A".
• Modification of the PCR parameters.
• Modification of the 5' end of the non-labeled primer.


"+A" artifact Modification of the primer: principle


"+A" artifact Modification of the primer: result


Thanks for listening