Copy Number Variants in the Kallikrein Gene Cluster

Pernilla Lindahl1,2, Torbjörn Säll3, Anders Bjartell4, Anna M. Johansson5, Hans Lilja1,6,7,8, Christer Halldén2*

1 Department of Laboratory Medicine, Division of Clinical Chemistry, Lund University, Skåne University Hospital, Malmö, Sweden
2 Biomedicine, Kristianstad University, Kristianstad, Sweden
3 Department of Biology, Lund University, Lund, Sweden
4 Department of Urology, Skåne University Hospital, Malmö, Sweden
5 Department of Animal Breeding and Genetics, Swedish University of Agricultural Sciences, Uppsala, Sweden
6 Departments of Laboratory Medicine, Surgery (Urology), and Medicine (Genitourinary Oncology), Memorial Sloan-Kettering Cancer Center, New York, New York, United States of America
7 Nuffield Department of Surgical Sciences, University of Oxford, Oxford, United Kingdom
8 Institute of Biomedical Technology, University of Tampere, Tampere, Finland

Abstract

The kallikrein gene family (KLK1-KLK15) is the largest contiguous group of protease genes within the human genome and is associated with both risk and outcome of cancer and other diseases. We searched for copy number variants in all KLK genes using quantitative PCR analysis and analysis of inheritance patterns of single nucleotide polymorphisms. Two deletions were identified: one 2235-bp deletion in KLK9 present in 1.2% of alleles, and one 3394-bp deletion in KLK15 present in 4.0% of alleles. Each deletion eliminated one complete exon and created out-of-frame coding that eliminated the catalytic triad of the resulting truncated gene product, which is therefore likely a non-functional protein. Deletion breakpoints identified by DNA sequencing located the KLK9 deletion breakpoint to a long interspersed element (LINE) repeated sequence, while the deletion in KLK15 is located in a single copy sequence. To search for an association between each deletion and risk of prostate cancer (PC), we analyzed a cohort of 667 biopsied men (266 PC cases and 401 men with no evidence of PC at biopsy) using short deletion-specific PCR assays. There was no association between evidence of PC in this cohort and the presence of either gene deletion. Haplotyping revealed a single origin of each deletion, with most recent common ancestor estimates of 3000-8000 and 6000-14 000 years for the deletions in KLK9 and KLK15, respectively. The presence of the deletions on the same haplotypes in 1000 Genomes data of both European and African populations indicates an early origin of both deletions. The old age in combination with homozygous presence of loss-of-function variants suggests that some kallikrein-related peptidases have non-essential functions.

Citation: Lindahl P, Säll T, Bjartell A, Johansson AM, Lilja H, et al. (2013) Copy Number Variants in the Kallikrein Gene Cluster. PLoS ONE 8(7): e69097.
doi:10.1371/journal.pone.0069097

Editor: Georgia Sotiropoulou, University of Patras, Greece

Received February 25, 2013; Accepted June 4, 2013; Published July 22, 2013

Copyright: © 2013 Lindahl et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: The work was supported by the Swedish Cancer Foundation (HL and AB), the Swedish Research Council (HL and AB), and European Union 6th Framework (P-Mark), Grant number LSHC-CT-2004-503011, David H. Koch, provided through the Prostate Cancer Foundation; the Sidney Kimmel Center for Prostate and Urologic Cancers; R33 CA 127768-03, R01 CA160816 grants to HL, and P50-CA92629 SPORE grant to H. I. Scher from the National Cancer Institute; the National Institute for Health Research Oxford Biomedical Research Centre based at Oxford University Hospitals National Health Service Trust and University of Oxford to HL, funding (grant no. 11-0624) to HL from the Swedish Cancer Society; and FiDiPro program award to HL from TEKES, Finland. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: Specification of patents: HL, Stenman UH. Assay of Free and Complexed Prostate-Specific Antigen (PSA). 1996; European Patent No. 540573. United States Patent No. US 5,501,983. Japanese Patent No. 2669566. HL, Lundwall Å, Lövgren J. Early Detection of Prostate Cancer (CAP) by Employing Prostate Specific Antigen (PSA) and Human Glandular Kallikrein (HGK-1). 1997; United States Patent No. US 5,614,372. HL, Lundwall Å, Lövgren J. Early Detection of Prostate Cancer (CAP) by determining a ratio involving Prostate Specific Antigen (PSA) and Human Glandular Kallikrein (HGK-1) concentrations. 1997; European Patent No. EP0811164B1. HL, Stenman UH. Prostate-Specific Antigen (PSA) - proteinase inhibitor complexes. 1999; United States Patent No. US 5,912,158. HL, Stenman UH. Assay of Free and Complexed Prostate-Specific Antigen (PSA). 1999; United States Patent No. US 5,939,533. Pettersson K, HL, Nurmikko P, Lövgren T. Novel antibody, immunoassay and method for prostate cancer detection. 2009; European Patent No. 1320756. Pettersson K, HL, Lövgren T, Niemelä P. Antibody, immunoassay and method for prostate cancer detection. 2011; United States Patent No. US 7,872,104 B2. The authors adhere to all the PLOS ONE policies on sharing data and materials, as detailed online in the guide for authors.

* E-mail: [email protected]

Introduction

Besides the extensively studied single nucleotide polymorphisms (SNPs), the human population harbors a large number of structural variants as well as short indels. Many of these appear to be neutral or nearly neutral, whereas others have been associated with disease. Since this type of variation in general is more difficult to study than SNPs, much remains to be investigated regarding structural variants. The most easily detected structural variants are copy number variants (CNVs). The discovery and characterization of CNVs have largely been made using hybridization-based microarrays and next-generation sequencing [1]. A number of databases of CNVs have been established, including the Database of Genomic Variants (DGV; http://dgvbeta.tcag.ca/dgv/app/home?ref=NCBI36/hg18), which lists close to 300 000 CNVs that cover a large part of the genome. During the past 10 years, over 50 studies contributed information to this database and subsequent reports disseminated increasingly large data sets of CNVs. Many of these CNVs have been discovered in general surveys for variation, i.e. with no prior or primary selection for a specific phenotype. For example, Conrad et al. [2] investigated the occurrence of CNVs greater than ~500 bp in a subsample of the HapMap populations using microarrays with 42 million probes. They mapped 11 700 CNVs and generated reference CNV genotypes for approximately 5000 of these. During the last few years, there has been a tremendous increase in the number of reported CNVs as well as increasing detail about their population frequencies. However, basic knowledge about their exact sizes and their breakpoint sequences is still largely missing. Conrad et al. [3] addressed this by characterizing more than 300 CNV breakpoints and identifying two major breakpoint signatures: 70% of the deletion breakpoints have 1-30 bp of microhomology, whereas 33% of deletion breakpoints contain 1-367 bp of inserted sequences. The co-occurrence of microhomology and inserted sequence was low, 10%. This suggests the presence of at least two major different mutational mechanisms. Exact definition of deletion breakpoints is critical for determining the precise functional impact of a CNV and for constructing CNV-specific PCR assays aimed at identifying disease associations. Naturally, the effects of CNVs vary with their position, and there is growing interest in understanding the relationship between specific CNVs in the human genome and variation in different phenotypes. CNVs are involved in many genomic disorders, both monogenic diseases and complex disorders [4].

Disease-associated CNVs have been discovered when candidate genes for different diseases have been investigated, but a number of studies have also used information about known CNVs to investigate their association with disease. This strategy is exemplified by Craddock et al. [5], who investigated more than 3400 CNVs for their association with disease in 16 000 cases of 8 common diseases and 3000 shared controls. Although multiple associations were identified, common CNVs were concluded to be an unlikely source of substantial contributions to the genetic basis of common disease. Other studies have, however, identified a large number of associations with both rare and common CNVs in different diseases such as attention deficit hyperactivity disorder [6], Crohn's disease [7] and schizophrenia [8].

Human kallikrein-related peptidase (KLK) genes are located on chromosome 19q13.4 in a 270-kbp region and are the largest contiguous group of protease genes within the human genome. The KLK locus consists of 15 genes (KLK1-15), with lengths ranging from 4.4 to 11.0 kbp. With the exception of the primate-specific duplication of a predecessor gene into KLK2 and KLK3, the KLK genes are transcribed from telomere to centromere. Still, all 15 KLK genes share the same exon/intron organization with 5 coding exons of equal lengths, and their proteins share a high degree of homology. They also share the same conserved translational start and stop sites, 10 conserved cysteine residues serving as template to enable correct folding, and the triad of catalytic codons [9]. KLKs are expressed in different tissues and have been implicated in a wide range of physiological processes. Multiple reports have suggested that KLK genes may be dysregulated in cancers. Several KLK genes have in previous studies shown significant associations with cancer risk and outcome, e.g. in breast cancer, ovarian cancer and prostate cancer [10,11]. Due to their early discovery, KLK1-3 have been extensively studied and prostate-specific antigen (PSA), encoded by KLK3, is currently the most widely used biomarker for prostate cancer [12].

The present study comprehensively investigates the KLK gene cluster on 19q13.4 for the presence of CNVs. Two deletions were detected and carefully characterized in study cohorts of mainly European ancestry in terms of their population frequencies, number of origins, ages, breakpoint sequences and their association with prostate cancer.

PLOS ONE | www.plosone.org

July 2013 | Volume 8 | Issue 7 | e69097


Material and Methods

Ethics statement

This study uses 3 different materials to investigate the occurrence of CNVs in the kallikrein locus. The family material was obtained in a process including written informed consent from all participants. The second material was collected from volunteers in a process including verbal informed consent. These samples are completely anonymized. The third material was obtained in a process including written informed consent from all participants. These procedures have been approved by the Ethical Committee of Lund University and the Swedish Data Inspection Board.

Study subjects

Three different study populations were investigated: the first is a family material, the second consists of anonymous unrelated individuals and the third consists of cases and controls for prostate cancer. To detect deletions based on the segregation of SNP markers in families, we used DNA isolated from whole blood from 190 individuals representing 40 three-generation families. A majority of these families were represented by two grandparents, two parents and one child (study population 1). To search for the presence of CNVs by quantitative PCR, DNA from 285 unrelated individuals from the general population was investigated. In cases where a low frequency (0.8, MAF>0.05) and to these markers a limited number of previously investigated SNPs were added. In the screening for deletions using segregation of SNP markers, the necessary and sufficient criterion for deducing the presence of a deletion is that a parent-offspring pair is scored as ‘homozygous’ for different alleles. In reality they are then both heterozygous for the deletion and carry different alleles for the SNP investigated [13]. The subsequent characterization of the confirmed deletions used a subset of these SNP markers to analyze 285 unrelated individuals (Table S2). Genotypes were determined using the Sequenom MassARRAY MALDI-TOF system as previously described [14].
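The segregation criterion stated above lends itself to a simple check. The following is a minimal sketch in Python, not the authors' pipeline: genotypes are assumed to be two-letter strings such as AA or AG, and all function names and example calls are hypothetical.

```python
from typing import Iterable, List, Tuple


def is_homozygous(genotype: str) -> bool:
    """True when both allele calls are identical, e.g. 'AA'."""
    return len(genotype) == 2 and genotype[0] == genotype[1]


def suggests_deletion(parent: str, child: str) -> bool:
    """Parent and child both look homozygous, but for different alleles.

    Such a pair shares no visible allele, which is only compatible with
    Mendelian inheritance if both carry a deletion (a null allele) on the
    other chromosome and are in reality hemizygous for the SNP.
    """
    return (
        is_homozygous(parent)
        and is_homozygous(child)
        and parent[0] != child[0]
    )


def screen(pairs: Iterable[Tuple[str, str, str]]) -> List[str]:
    """Return SNP ids with at least one inconsistent parent-offspring pair.

    Each item is (snp_id, parent_genotype, child_genotype).
    """
    flagged: List[str] = []
    for snp_id, parent, child in pairs:
        if suggests_deletion(parent, child) and snp_id not in flagged:
            flagged.append(snp_id)
    return flagged


# Hypothetical genotype calls, for illustration only.
example = [
    ("snp_a", "AA", "AG"),  # consistent: the child inherited A
    ("snp_b", "CC", "TT"),  # inconsistent: candidate deletion
    ("snp_c", "AG", "GG"),  # consistent
]
print(screen(example))
```

The same pattern can presumably also arise from genotyping error, so SNPs flagged in this way are best treated as candidates that need independent confirmation.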

Haplotyping and age estimation

Haplotypes of all deletion-carrying individuals were constructed using 68 SNP and 5 microsatellite markers distributed over the entire KLK locus. Two different methods were used to infer haplotypes in the deletion-carrying individuals. The first method involved the software PHASE v. 2.1 [16,17], using the deletions as dummy variables. The second was to manually infer the haplotypes. Since the data were obtained from single unrelated individuals, the haplotypes can be determined unambiguously only at homozygous loci. For the remaining loci, the haplotypes of the deletion-carrying individuals were constructed under the assumption that the deletion-carrying chromosomes also carried a common original haplotype that was consistently made as long as possible. Thus, only when two individuals were homozygous for different alleles were the haplotypes of the deletion-carrying chromosomes considered to be different. In every such case, the minor haplotype was considered to have experienced an exchange and the major haplotype was considered the original haplotype. This will be referred to below as the maximum principle and is explained in Text S1. The ages of the deletions were estimated using the software ESTIAGE [18]. This program uses the distribution of shared haplotype lengths and estimates the age to the most recent common ancestor (MRCA) of the sample, i.e. not to the origin of the mutation.
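One simplified reading of the maximum principle can be sketched in Python as follows. This is an illustration under stated assumptions, not the procedure of Text S1: markers are assumed to be ordered along the locus, genotypes are two-letter strings, the deletion occupies one marker slot (coded here as DD), the original allele at each marker is taken to be the one carried by most homozygous carriers, and all names and example genotypes are hypothetical.

```python
from collections import Counter
from typing import List, Optional, Tuple


def ancestral_alleles(carriers: List[List[str]]) -> List[Optional[str]]:
    """Per marker, the allele seen in most homozygous carriers.

    None means no carrier is homozygous at that marker, so every carrier
    is treated as compatible there. Ties are broken arbitrarily.
    """
    n_markers = len(carriers[0])
    result: List[Optional[str]] = []
    for m in range(n_markers):
        homozygous = Counter(g[m][0] for g in carriers if g[m][0] == g[m][1])
        result.append(homozygous.most_common(1)[0][0] if homozygous else None)
    return result


def shared_span(genotype: List[str],
                ancestral: List[Optional[str]],
                del_index: int) -> Tuple[int, int]:
    """Outermost marker indices, left and right of the deletion, up to which
    this carrier can still hold the original haplotype.

    The span stops at the first marker where the carrier is homozygous
    for a different allele, i.e. the first forced exchange.
    """
    def compatible(m: int) -> bool:
        return ancestral[m] is None or ancestral[m] in genotype[m]

    left = del_index
    while left - 1 >= 0 and compatible(left - 1):
        left -= 1
    right = del_index
    while right + 1 < len(genotype) and compatible(right + 1):
        right += 1
    return left, right


# Hypothetical carriers; marker index 2 is the deletion itself.
carriers = [
    ["AA", "CT", "DD", "GG", "AC"],
    ["AG", "CC", "DD", "GG", "CC"],
    ["GG", "CC", "DD", "AG", "CC"],
    ["AA", "CT", "DD", "GG", "AA"],
]
original = ancestral_alleles(carriers)
spans = [shared_span(g, original, del_index=2) for g in carriers]
print(original, spans)
```

Because heterozygous genotypes never break the shared stretch in this sketch, the resulting haplotypes are the longest the data allow, which is the property the text attributes to the maximum principle.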

PCR quantification

Relative copy numbers were determined using duplex TaqMan assays (Applied Biosystems, Foster City, CA, USA) and the CopyCaller Software (Applied Biosystems, v. 1.0). The duplex TaqMan PCR assays consisted of a FAM dye-labeled target assay and a VIC dye-labeled reference assay. The real-time PCR CT data were used by the CopyCaller Software to calculate sample copy number values by relative quantification using the comparative CT (∆∆CT) method. All genes were analyzed using at least one TaqMan assay. When a candidate CNV was found, a number of assays flanking the original assay were analyzed to confirm the finding and to roughly estimate the size of the CNV (Table S3). Four sample replicates were analyzed on an ABI Prism 7900HT Sequence Detection System (Applied Biosystems). The amplification was run using the following parameters: 95°C for 10 min, 40 cycles of 95°C for 15 s, and 60°C for 60 s. PCR reactions were performed in a total volume of 10 µl containing 10 ng template DNA, 5 µl TaqMan Genotyping Master Mix, 0.5 µl TaqMan Copy Number Assay, and 0.5 µl TaqMan Copy Number Reference Assay. The results were analyzed using the CopyCaller Software, where the data were analyzed without a calibrator sample. Copy number predictions with fewer than 2 replicates and low-quality data were eliminated from further study. Data of sufficient quality is defined as: 1) confidence values >95% and 2) Zscores 1%), the KLK9 deletion with an allele frequency of 1.2% and the KLK15 deletion with an allele frequency of 4.0%. In addition, both
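The comparative CT calculation named under PCR quantification can be illustrated with a short Python sketch. It is not a description of the CopyCaller Software internals. Because the data were analyzed without a calibrator sample, the sketch assumes that the median ∆CT across samples represents the most common copy number, taken here to be 2; the function names and CT values are hypothetical.

```python
from statistics import mean, median
from typing import Dict, List, Tuple

Wells = Tuple[List[float], List[float]]  # (target CTs, reference CTs)


def delta_ct(target_cts: List[float], reference_cts: List[float]) -> float:
    """Mean over replicate wells of CT(target, FAM) - CT(reference, VIC)."""
    return mean(t - r for t, r in zip(target_cts, reference_cts))


def copy_numbers(samples: Dict[str, Wells],
                 assumed_copies: int = 2) -> Dict[str, float]:
    """Estimate copy number per sample as assumed_copies * 2 ** (-ddCT).

    ddCT is the sample dCT minus the calibrator dCT; the calibrator is
    assumed to be the median dCT over all samples.
    """
    dcts = {name: delta_ct(t, r) for name, (t, r) in samples.items()}
    calibrator = median(dcts.values())
    return {
        name: assumed_copies * 2 ** -(d - calibrator)
        for name, d in dcts.items()
    }


# Hypothetical CT values from four replicate wells per sample.
samples = {
    "s1": ([26.0, 26.0, 26.0, 26.0], [25.0, 25.0, 25.0, 25.0]),
    "s2": ([27.0, 27.0, 27.0, 27.0], [26.0, 26.0, 26.0, 26.0]),
    "s3": ([26.5, 26.5, 26.5, 26.5], [25.5, 25.5, 25.5, 25.5]),
    # Target amplifies one cycle later relative to the reference:
    "carrier": ([27.0, 27.0, 27.0, 27.0], [25.0, 25.0, 25.0, 25.0]),
}
estimates = copy_numbers(samples)
print(estimates)
```

A one-cycle increase in ∆CT corresponds to a halving of target template, so a heterozygous deletion carrier is expected near a copy number of 1 and a homozygous carrier shows no target amplification at all.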


Figure 2. Location of the breakpoint sequences of the deletion in the KLK9 gene. (A) An alignment of two repeated sequences of the KLK9 gene is shown together with their chromosome positions (Build 36.3). Dots highlight positions where the two sequences are different (unfilled dot shows a polymorphic nucleotide position). The line marked ‘seq’ is the result of the DNA sequencing of deletion-carrying individuals. Arrows indicate which of the homologous sequences agrees with the sequence from the deletion-carrying individuals: an arrow pointing up shows that the upper sequence agrees and an arrow pointing down shows that the lower sequence agrees. The differences between the two homologous sequences, indicated by boxes, limit the exact breakpoint position. (B) Breakpoints of the deletion are located between nucleotides 56 200 564-56 200 671 in intron 3 and nucleotides 56 202 799-56 202 906 in intron 2 of the KLK9 gene, which results in a 2235-bp deletion eliminating exon 3. doi: 10.1371/journal.pone.0069097.g002

Figure 3. Location of the breakpoint sequences of the deletion in the KLK15 gene. The deletion in the KLK15 gene is located between nucleotide 56 022 311 in intron 2 and nucleotide 56 025 704 in intron 1 of the KLK15 gene, which results in a 3394-bp deletion eliminating exon 2. doi: 10.1371/journal.pone.0069097.g003
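The deletion sizes quoted in the two figure legends can be checked against the coordinates with a few lines of Python. The sketch assumes that the KLK15 coordinates are the first and last deleted bases (an inclusive count), and that for KLK9 the size is the offset between the two homologous breakpoint intervals; both are readings of the legends rather than statements from the authors.

```python
def inclusive_length(first_base: int, last_base: int) -> int:
    """Number of bases from first_base to last_base, both included."""
    return last_base - first_base + 1


# KLK15: single-copy sequence, breakpoints given to the nucleotide.
klk15_size = inclusive_length(56_022_311, 56_025_704)

# KLK9: each breakpoint lies somewhere inside a stretch of identical
# repeat sequence, so only an interval is known on either side.
klk9_left = (56_200_564, 56_200_671)
klk9_right = (56_202_799, 56_202_906)

# The deletion size is the offset between corresponding positions.
klk9_size_from_starts = klk9_right[0] - klk9_left[0]
klk9_size_from_ends = klk9_right[1] - klk9_left[1]

print(klk15_size, klk9_size_from_starts, klk9_size_from_ends)
```

The two KLK9 intervals have the same width, which is why the deletion size is well defined even though the exact breakpoint within the repeat is not.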

marked accumulation of mutations in KLK9 and KLK15 compared with the other KLK genes, arguing against selection also at a shorter time scale.


When testing for association with PC no effect could be demonstrated. However, the power is limited due to the low frequencies of the deletion variants; a simulation shows that in the case of the KLK15 deletion the power is 0.26 for an odds ratio of 1.4. Still, the results clearly show that if effects exist they are limited. In addition, the sample used in the association test provides an independent population sample to the initial screening population, allowing a replication of the frequency estimates of the two deletions. Given that KLK9 and KLK15 are weakly expressed in most tissues including the prostate [20], it is not surprising that no copy number association with prostate cancer was found.

The times to the MRCA were determined from two different estimates of the lengths of the common haplotypes in the chromosomes carrying the deletions. There are thus two size estimates per deletion and these estimates vary considerably. However, there are good reasons to believe that the correct size is within our estimates. Some of the haplotype configurations constructed by PHASE lead to overestimates of the times to the MRCA. On the other hand, the haplotypes constructed from the maximum principle are the longest possible for this data set and therefore most likely lead to underestimates of the ages. Using a generation time of 30 years, MRCA time estimates in the size ranges 3000-8000 years for KLK9 and 6000-14 000 years for KLK15 were obtained. This indicates that these deletions are likely to be found in other European populations.

To screen for the occurrence of the deletions in other populations, the CEU (European) and YRI (African) populations in the 1000 Genomes data set were investigated. A KLK9 deletion mapping to the same locus and of exactly the same length as in the present study was found in both populations. In the CEU population the allele frequency was 0.022 and in the YRI population it was 0.006. A KLK15 deletion of the same length and position as reported here occurred at a frequency of 0.033 in the CEU population and a frequency of 0.014 in the YRI population. Thus, as suggested by the age estimates, both deletions are spread in the European population. In addition, both deletions were also detected in the African population, compatible with an early origin in human history of these deletions. Haplotyping of the CEU and YRI deletion chromosomes revealed the presence of the same KLK9 and KLK15 deletion haplotypes also in these populations.

Given the allele frequency of a mutation, its age can also be estimated under the assumption of neutrality using the method of Kimura and Ohta [21]. This method provides a means to calculate the overall expected age of a neutral allele. These calculations require an estimate of the long-term effective population size of the studied population. When all humanity is concerned, it is common to use 10 000 individuals as the effective population size. In the present case, the sample is taken from a limited section of humanity, and we have therefore used 5000 individuals, the same value used by Reish et al. [22]. Using the Kimura and Ohta [21] expressions, the estimate of the overall expected age for the deletion in KLK9 was 1100 generations (33 000 years). For KLK15 the corresponding age estimate was 2682 generations (80 460 years). These deletion age estimates are fully compatible with the presence of the deletions both in the European and African populations. Taken together, the identical deletion haplotypes and the deletion age estimates strongly indicate that the origins of both deletions predate the out-of-Africa migration.

With the exception of KLK8, KLK10, KLK11 and KLK12, all KLK genes have been reported to contain CNVs in the DGV. The reported CNVs partly overlap and vary both in size and frequency. Many have been detected in single individuals only. Deletions found in KLK9 and KLK15 have been observed in prior studies, but have not been fully characterized. For example, in a cohort of 2026 individuals, Shaikh et al. [23] identified two deletions in the KLK15 gene, one with an estimated size of 1739 bp in 22 individuals and one 5408-bp deletion in 2 individuals. In addition, a 2307-bp deletion in the KLK15 gene was detected in 38 out of 1184 individuals [24]. All three deletions were observed in heterozygous form and are probably identical with the KLK15 deletion reported in the present study. The varying size estimates could be due to the limited resolution of SNP chip data. Conrad et al. [2] observed a 1663-bp deletion in the KLK9 gene in 4 out of 450 individuals from the HapMap project. The position given for this deletion overlaps the position of the KLK9 deletion reported in the present study, and they probably represent the same deletion.

MacArthur et al. [25] made an extensive search for loss of function (LoF) variants including CNVs in 185 subjects from the 1000 Genomes Project. After careful filtering they found that an average human carries approximately 100 LoF variants with 20-25 of these in homozygous form. These individuals are reported as phenotypically normal. They furthermore searched for common features of LoF-tolerant genes and found that members of gene families, in particular those with highly homologous family members, are more likely to tolerate LoF mutations. In addition, MacArthur and coworkers [25] found mutations in several members of the KLK gene family, including KLK15. Thus, our observations concerning the deletions in KLK9 and KLK15 are in line with their findings.

Table 3. Summary table of the deletions in KLK9 and KLK15.

                              KLK9                       KLK15
Chromosome position (a)
  Start                       56 200 564-56 200 671      56 022 311
  Stop                        56 202 799-56 202 906      56 025 704
Deletion size (bp)            2235                       3394
Allele frequency (b)          0.012 (0.006, 0.019)       0.040 (0.023, 0.056)
Homozygotes                   1/563                      2/278
Single origin                 Yes                        Yes
Age estimation (b, c)
  PHASE                       258 (170, 394)             476 (320, 711)
  Maximum principle           95 (57, 158)               190 (126, 288)

a. NCBI Build 36.3, chromosome 19. b. 95% confidence intervals within parentheses. c. Generations.
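The expected-age figures quoted in the discussion (1100 generations for KLK9 and 2682 generations for KLK15) can be reproduced with the standard Kimura and Ohta expression for the expected age of a neutral allele at frequency p, t = -4 Ne [p / (1 - p)] ln p generations. The Python sketch below assumes that this is the expression the authors applied; the function and constant names are hypothetical.

```python
import math


def expected_age_generations(p: float, n_e: float) -> float:
    """Expected age, in generations, of a neutral allele at frequency p.

    Kimura-Ohta expression: t = -4 * Ne * (p / (1 - p)) * ln(p).
    """
    if not 0.0 < p < 1.0:
        raise ValueError("allele frequency must lie strictly between 0 and 1")
    return -4.0 * n_e * (p / (1.0 - p)) * math.log(p)


GENERATION_YEARS = 30  # generation time used in the text
N_E = 5000             # effective population size used in the text

for gene, freq in (("KLK9", 0.012), ("KLK15", 0.040)):
    generations = expected_age_generations(freq, N_E)
    years = round(generations) * GENERATION_YEARS
    print(gene, round(generations), years)
```

With these inputs the sketch gives about 1074 generations for KLK9, which agrees with the quoted 1100 generations once rounded to the nearest hundred, and 2682 generations (80 460 years) for KLK15.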


Supporting Information

Text S1. Description of the maximum principle haplotyping method. (DOC)

Assays used in the first screening for CNVs. We used one assay per gene. (B) Assays used to confirm the findings from the SNP study and the first part of the CNV study. (XLS)

Table S4. Microsatellite markers analyzed in the kallikrein gene locus. (XLS)

Figure S1. Pattern of linkage disequilibrium (LD) for the HapMap CEU population. SNPs used to characterize the deletions in KLK15 and KLK9 with a frequency in the HapMap CEU population were used. LD is estimated and given as D'. Low LOD (