Human somatic cell mutagenesis creates genetically tractable sarcomas

Articles


npg

© 2014 Nature America, Inc. All rights reserved.

Sam D Molyneux1, Paul D Waterhouse1, Dawne Shelton2, Yang W Shao1, Christopher M Watling1, Qing-Lian Tang1, Isaac S Harris1, Brendan C Dickson3, Pirashaanthy Tharmapalan1, Geir K Sandve4, Xiaoyang Zhang1, Swneke D Bailey1, Hal Berman1, Jay S Wunder3, Zsuzsanna Iszvak5, Mathieu Lupien1, Tak W Mak1 & Rama Khokha1

Creating spontaneous yet genetically tractable human tumors from normal cells presents a fundamental challenge. Here we combined retroviral and transposon insertional mutagenesis to enable cancer gene discovery starting with human primary cells. We used lentiviruses to seed gain- and loss-of-function gene disruption elements, which were further deployed by Sleeping Beauty transposons throughout the genome of human bone explant mesenchymal cells. De novo tumors generated rapidly in this context were high-grade myxofibrosarcomas. Tumor insertion sites were enriched in recurrent somatic copy-number aberration regions from multiple cancer types and could be used to pinpoint new driver genes that sustain somatic alterations in patients. We identified HDLBP, which encodes the RNA-binding protein vigilin, as a candidate tumor suppressor deleted at 2q37.3 in more than one in ten tumors across multiple tissues of origin. Hybrid viral-transposon systems may accelerate the functional annotation of cancer genomes by enabling insertional mutagenesis screens in higher eukaryotes that are not amenable to germline transgenesis.

Retroviruses and transposons have been used separately as somatic cell insertional mutagens to identify cancer drivers in model organisms1,2. Slow-transforming retroviruses were first administered to newborn mice to identify cellular oncogenes activated during leukemogenesis3 and were later used in large-scale screens to decipher collaborative cancer gene networks in animals with tumor-prone genetic backgrounds1.
Transposons are another class of selfish genetic elements that have been adapted for genome engineering and global mutagenesis. Particular success has been realized using Sleeping Beauty (SB), a transposon resurrected from salmonid fish4 that functions as a two-part system requiring transposase activity for mobilization. SB transposition from chromosomal concatemers in mice was a major milestone. It showed that transposons can be employed as cancer insertional mutagens for forward genetic screens in mammals, allowing driver genes to be identified in model species2,5. Tissue-specific expression of transposase then adapted this approach to generating specific cancer subtypes and addressing cell-of-origin questions6–8. Given the influence of chromosomal and genomic architecture on the spectrum of alterations in cancer9,10, using human cells for mutagenesis could yield cancer genes that are less accessible in mice. Porting the tools of insertional mutagenesis to human tissues would offer a means to broadly corrupt the normal somatic cell genome, generate human cancers from primary cells and map the causative mutations.

RESULTS

Hybrid vectors for cancer forward genetic screens in human cells

Human primary cell mutagenesis has so far relied mostly on ionizing radiation and chemical carcinogens, which are less amenable to forward genetic screens. Insertional mutagenesis in primary human cells is hindered because these cells are refractory to transfection and transformation. To overcome this obstacle, we constructed a vector called Lentihop. It combines the transduction efficiency of lentiviruses for both dividing and nondividing primary cells11 with the high mobility of Tc1/Mariner cut-and-paste transposons12 (Online Methods). Briefly, we installed SB transposon repeats, flanking gain- or loss-of-function elements, between the long terminal repeats of an HIV-1–derived self-inactivating lentiviral vector (LV) backbone (Fig. 1a)11. We paired this with an optionally Tet-regulatable lentiviral vector that coexpresses a hyperactive SB transposase and GFP (Fig. 1d)13,14. We also developed a PCR-based assay to detect SB mobilization from Lentihop (Fig. 1b). The SB repeats did not block key processes during retroviral packaging or transduction. Doxycycline (Dox) treatment efficiently induced transposition from genomically integrated proviruses in Lentihop-transduced HEK 293T cells (Fig. 1e and Supplementary Fig. 1a,b). SB mobilization could also be detected in Lentihop-transduced human cells representing key lineages: mesenchymal (bone derived) and epithelial (ovarian and

1Department of Medical Biophysics, Ontario Cancer Institute, University of Toronto, Toronto, Ontario, Canada. 2Digital-biology Center, Bio-Rad Laboratories, Pleasanton, California, USA. 3Department of Pathology and Division of Orthopaedic Surgery, Mount Sinai Hospital, Toronto, Ontario, Canada. 4Department of Informatics, University of Oslo, Blindern, Oslo, Norway. 5Max-Delbruck-Center for Molecular Medicine, Berlin, Germany. Correspondence should be addressed to R.K. ([email protected]).

Received 23 October 2013; accepted 23 July 2014; published online 17 August 2014; doi:10.1038/ng.3065


VOLUME 46 | NUMBER 9 | SEPTEMBER 2014  Nature Genetics

[Figure 1 image. Recoverable panel content: (a) pLentihop vectors (SIN, cPPT, EF1α, IR/DR, SD, SA, SV40 pA; lentivirus and SB transposon segments); (b) footprint SB transposition assay primers and ddPCR copy-number assay primers, with 2,248- or 1,851-bp PCR products for SB in provirus and a 202-bp product for mobilized SB; (d) pLVCT-GFP/SB100X (Tet-o, CAG, GFP, i, SB100X, WPRE); (e) SB mobilization and GFP MFI over time (d) during induction and washout, and across Dox doses (ng/ml); (f) SB mobilization in bone (p3, p5), mammary, BRCA-Mut mammary and ovarian cells with regulatable (Dox) or constitutive (virus) transposase.]

Figure 1  Operational characteristics of integration-proficient hybrid mutagenesis vectors. (a) Lentihop vector constructs for GoF (G) and LoF (L) gene disruption. cPPT, central polypurine tract; SIN, LV self-inactivating long terminal repeats; IR/DR, SB inverted and direct repeats; SD, splice donor; SA, splice acceptor; SV40 pA, simian virus 40 poly(A). (b) Detection of transposon mobilization from genomically integrated LV proviruses using PCR (1,851-bp band, LoF provirus; 2,248-bp band, GoF provirus; 202-bp band after SB mobilization). The primer locations are indicated (red lines, transposition assay primers; green lines, ddPCR primers). (c) Representative noncanonical footprint of SB transposition from LV proviruses (red) obtained by Sanger sequencing of PCR-amplified junctions in HEK 293T cells. (d) Lentiviral vector for constitutive or regulatable SB transposase expression. Tet-o, tetracycline operator; CAG, cytomegalovirus/chicken β-actin promoter; i, internal ribosomal entry site; SB100X, hyperactive SB transposase; WPRE, woodchuck hepatitis virus post-transcriptional regulatory element. (e) Time- and doxycycline dose–dependent mobilization of SB from integrated Lentihop proviruses in HEK 293T cells co-transduced with SB100X and tTR-KRAB expression vectors, measured by semiquantitative genomic PCR. FACS for GFP expression (coexpressed with SB100X) was performed in triplicate. Error bars, s.d. NS, nonspecific band. MFI, mean fluorescence intensity. (f) Lentihop transposition in primary human mesenchymal cells (bone) and epithelial cells (mammary, both normal and BRCA mutant (BRCA-Mut); ovarian). Lentihop-transduced cells were engineered to express SB100X (constitutive) or were co-transduced with a tTR-KRAB expression vector (regulatable) and treated with 100 ng/ml doxycycline for 10 d to enable mobilization. p3 and p5 indicate the cell passage number after Dox induction. The PCR detects proviral resident SB (1,851-/2,248-bp, L/G) and SB-mobilized (202-bp) bands.

mammary) (Fig. 1f). DNA sequencing yielded hallmark footprints of SB transposition from within proviruses (Fig. 1c). Integrase-deficient hybrid LV-SB vectors have been developed to improve safety during gene therapy15,16. Lentihop, in contrast, leverages integrase proficiency for elevated mutagenesis.

We explored whether human somatic cell insertional mutagenesis could enable a forward genetic screen to functionally annotate genes that drive sarcomagenesis (Fig. 2a). Mesenchymal cells are the presumed cell of origin of sarcomas. Building on established blueprints for transforming human cells17,18, we used primary cells derived from bone tissue explants of patients undergoing hip replacement. In these cells, we created a genetic background reminiscent of the mutations found in sarcomas (Fig. 2a and Supplementary Fig. 2a). Briefly, we engineered the cells to suppress the p53 and Rb pathways and to overexpress both the Myc oncogene and hTERT. We then added SB100X transposase to enable SB transposition (T5RM-SB cells; Fig. 2a). To enhance the gene-disruptive capacity of the Lentihop LV and SB components, we employed both constitutively active and regulatable transposase

expression. This produced a variety of insertion types across the genome (SB resident in LV provirus, LV provirus alone or SB mobilized; Fig. 2a), with either long- or short-term transposon mobilization. Profiling of 15 microsatellite short tandem repeat loci confirmed that the transduced cells matched the original patient explant (Supplementary Table 1). Transduction with Lentihop (LH) viruses generated T5RM-SB-LH and T5RM-SB/Tet-LH(+/–Dox) cells. We briefly expanded these sensitized but tumor-naive cell populations in culture and then functionally screened them for tumorigenic growth in vivo. Strikingly, 32 of 51 injected sites developed tumors in mice within 4 weeks (Fig. 2b and Supplementary Fig. 3a). In comparison, control cells (T5R and T5RM) did not form tumors over the same period unless HRAS (p.Gly12Val) was present (T5R-Ras and T5RM-Ras cells; Supplementary Fig. 2b,c).

Lentihop mutagenesis generates clinically relevant tumors

We investigated whether the histology of tumors arising from mutagenized bone-derived cells matched the sarcoma subtypes found


Figure 2  Human myxofibrosarcomas induced by viral-transposon mutagenesis. (a) Schema for using human tissue explant cultures in a viral-transposon mutagenesis screen for sarcoma driver genes. (b) Flow chart of the experimental design for the Lentihop screen tumorigenesis assays. Parental cells harboring SB100X transposase (T5RM-SB) were transduced with Lentihop vectors with or without a Tet repressor (tTR-KRAB) construct. The resulting cell populations were injected subcutaneously at 5 × 10⁵ cells into the flanks of nude mice. The mice were evaluated for tumors at 4 weeks, and the tumors were then harvested. The numbers of mice and flanks used, as well as the tumors generated, are indicated. (c,d) Hematoxylin and eosin (H&E) staining showing nodular tumor growth with variable cellularity underlying the dermis (c) and spindle cell morphology with prominent myxoid stroma (d). (e) Immunohistochemistry of smooth muscle actin revealing patchy expression. (f,g) Electron microscopy showing cells with irregular nuclear contours and cytoplasmic microfilaments at 7,400× (f), and an atypical tripolar mitotic figure (arrow) in an H&E-stained section (g). (h,i) Immunohistochemistry confirming a lack of staining for S100 (h) and pan-cytokeratin (i). Endogenous nerve and adnexal structures served as positive controls (arrows) in h and i, respectively. (j) H&E stain of a human superficial myxofibrosarcoma resected from the proximal arm of a 54-year-old male, shown for comparison. The patient-derived myxofibrosarcoma (j) resembles the Lentihop-induced human sarcoma (d). Scale bars: 200 µm (c–e,h–j); 3 µm (f); 40 µm (g).

[Figure 2 image. (a) Screen schema: 1. Establish primary cultures; 2. Generate oncogenic background in tumor-naive cells (hTERT, p53, Rb, Myc, SB100X); 3. Transduce Lentihop vectors (+/– Dox, tTR-KRAB); 4. Tumorigenic growth in vivo; 5. Extract tumor DNA; 6. Determine absolute copy number of LV and SB; 7. Retrieve LV and SB genomic flanks; 8. Next-generation sequencing + bioinformatics. (b) Flow chart from T5RM-SB sensitized cells: T5RM-SB-LH, 31 flanks, 17 tumors; T5RM-SB/Tet-LH –Dox, 10 flanks, 6 tumors; T5RM-SB/Tet-LH +Dox, 10 flanks, 9 tumors; 51 flanks and 32 tumors in total at 4 weeks. (c–j) Micrographs.]

in patients. Histologic examination revealed mesenchymal neoplasms with a narrow spectrum of overlapping morphologies. Most tumors consisted of a proliferation of spindle cells of intermediate cellularity with a prominent myxoid stroma (Fig. 2c,d), with few distinguishing features across the tumor panel. Gross histology, nuclear and ultrastructural features and specific markers indicated (myo)fibroblastic differentiation. They also excluded epithelial, neuroectodermal or hematolymphoid differentiation (Fig. 2d–j and Supplementary Fig. 3b,c). Together, the morphology and immunohistochemical features of these tumors placed them in the myxofibrosarcoma or undifferentiated pleomorphic sarcoma spectrum of the most recent World Health Organization classification (a representative patient sample is shown in Fig. 2j)19. We verified the presence of Lentihop vectors and transposition events in tumors by PCR (Supplementary Fig. 4). Tumors developed from both +Dox and −Dox T5RM-SB/Tet-LH cells (nine of ten and six of ten injections, respectively), suggesting that Lentihop is mutagenic in both the mobilized and unmobilized states. Thus, compound LV and SB insertional mutagenesis in sensitized primary human cells can generate tumors with features of a clinically relevant cancer subtype.

Tumors derived from Lentihop-mutagenized cells are polyclonal

We analyzed the copy numbers of endogenous loci and genomically resident Lentihop elements in tumors from the screen using droplet


digital PCR (ddPCR). To calibrate this analysis, we measured the copy numbers of six endogenous loci selected for lack of copy-number variation in the 1000 Genomes Project data and high conservation across species (locus, chromosomal location: MRGPRX1, 11p15; CCL3, 17q12; Ch10p3, 10p3; Fan_Ch1, 1p34; ERBB2, 17q12; and RPP30, 10q3). We carried out these experiments in a reference human female diploid DNA pool from six individuals (Fig. 3a) and in DNA from a single female (lymphoblast cell reference DNA; Supplementary Fig. 5a). We obtained precise copy-number resolution (near-integer values) for known diploid genes (Ch10p3, Fan_Ch1 and ERBB2). Genes with elevated copy numbers in some human populations gave higher values (MRGPRX1 and CCL3; Fig. 3a and Supplementary Fig. 5a). For subsequent analyses, we selected and validated RPP30 as a copy number–stable control for normalization (Supplementary Fig. 5b,c). Sensitized cells from the Lentihop screen showed normal copy-number states before and immediately after mutagenesis. We did, however, note a potential loss-of-heterozygosity event at 16q24.1 (D16S539) during culture (Supplementary Table 1). In contrast, ddPCR analysis of the six endogenous loci in the tumors revealed the emergence of nondiploid copy-number values (Fig. 3b), indicating chromosomal instability and genomic copy-number alterations. Further, the

[Figure 3 image. (a) Copy number of endogenous loci (MRGPRX1, CCL3, Ch10p3, Fan_Ch1, ERBB2) measured against reference loci (RPP30, Ch10p3, Fan_Ch1). (b) Copy number of the endogenous loci in tumors (min and max outliers marked). (c) LoF + GoF cassette copy number versus SB transposon copy number (Pearson R = 0.998; Tn+ and Tn– controls). (d) SB transposon copy number versus LV provirus copy number (Pearson R = 0.943). (e) Copy numbers of LV, SB, LoF and GoF elements. (f) LAM-PCR nested reactions (Rxn 1–3) for SB and LV, with example antisense insertions at chr4:185,230,918-185,231,003 (intergenic) and chr12:133,425,443-133,425,527 (CHFR). (g) Genome-wide LV and SB insertion density by chromosome (1–22, X).]

Figure 3  Genome-wide insertional mutagenesis generates polyclonal, genomically unstable human sarcomas. (a) Copy numbers of endogenous reference loci in pooled normal human reference DNA measured by ddPCR and normalized against RPP30, Ch10p3 or Fan_Ch1. (b) Box plots of copy number for the selected endogenous loci in Lentihop-induced myxofibrosarcomas (n = 16, including Tn+ (T5RM-SB-LH cells) and Tn− (T5RM-SB cells) controls) showing deviation to nondiploid states, indicating genomic instability (line, median; box, interquartile range; bars extend to 1.5× the interquartile range; stars show the minimum (min) and maximum (max) outliers). (c) ddPCR measurement of Lentihop vector elements in tumors. A very strong correlation (Pearson R = 0.998) between total gene disruption element copy number (GoF + LoF) and SB transposon copy number (probed at the IR/DR) supports the accuracy of the assays. (d) Correlation between genomically integrated SB and LV provirus copy numbers in the tumors (Pearson R = 0.943). Triplicate reactions for each target assay (lentivirus (LV), transposon (SB), GoF, LoF, MRGPRX1, CCL3 and ERBB2) were run in duplex reactions with reference assays (Ch10p3, Fan_Ch1 and RPP30). (e) Box plots of absolute copy numbers of the various Lentihop vector elements in Lentihop sarcomas. The box elements are as described in b. (f) Simultaneous retrieval of SB and LV genomic insertion sites from a single tumor sample using LAM-PCR (numbers indicate the nested PCR stage (Rxn)). The fragments indicated were isolated and Sanger sequenced, and the retrieved sequences were mapped to the human genome. (g) Genome-wide SB and LV insertion density maps from induced human sarcomas generated by Illumina sequencing. Blue bars represent the LV insertion density, and red bars represent the SB density (the bar height represents positive counts in either direction).

observation of non-integer values suggested the presence of tumor subclones with variable copy-number states. We next developed ddPCR assays to measure the copy number of each genomically integrated component of the Lentihop screen, including LV proviruses, SB transposons and gain-of-function (GoF) and loss-of-function (LoF) cassettes. We observed a near-perfect correlation (R = 0.998; Fig. 3c) between total SB transposon and aggregate gene disruption cassette (GoF + LoF) copy-number values. We also saw a strong correlation (R = 0.943; Fig. 3d) between total LV provirus and SB transposon copy numbers in the tumors. These results suggest that the differences in average viral copy number between the pre-injection cells and the tumors arose mainly from Lentihop elements selected in vivo. Expansion of rare clones with higher background virus copy number, already present in the sensitized cell population, appears to have contributed less. The sensitized cells contained an average of 6.10 LV copies per cell, which increased by 1.62 copies after Lentihop transduction. After injection, the

tumor cells had a range of 4.65–17.30 LV copies per cell (8.37 ± 1.02 (mean ± s.e.m.), including controls; Fig. 3e). We observed higher copy numbers for the GoF version of Lentihop (2.51 ± 0.69 (mean ± s.e.m.)) than for the LoF version (0.41 ± 0.11), contributing to an average total SB copy number of 2.77. Non-integer copy-number values for the Lentihop components in most samples demonstrated that the tumors were polyclonal (Supplementary Table 2).

Genomic LV and SB insertion sites in Lentihop-induced tumors

We next mapped the genomic locations of LV and SB insertion sites. To extract a large number of insertions from the Lentihop-induced sarcomas, we employed a pooled strategy using linear amplification–mediated PCR (LAM-PCR)20,21. Sanger sequencing of nested PCR fragments yielded both LV and SB insertions from individual tumors (Fig. 3f). This demonstrated that both viral and transposon insertion sites can be mapped from a single sample. Illumina next-generation sequencing of the product pool identified

[Figure 4 image. (a) log2 median-centered gene expression intensity versus the number of FAIRE regions within 20 kb of each TSS (0, 1–3, 4–6, >7). (b) Enrichment of LV and SB insertions in FAIRE-Seq regions (–log10 P) with 0–10 kb flanks. (c) Enrichment or depletion (P) of LV and SB insertions in genes, intergenic regions, coding/UTR exons, introns, CpG islands, TSS flanks, regions 2–50 kb upstream and downstream of genes, and repetitive elements (LINE, LTR, SINE). (d) Global enrichment P for sarcoma, carcinoma, blood/brain/melanoma and cross-cancer SCNAs. (e) Numbers of locally enriched SCNA regions (Dels, Amps).]

Figure 4  Insertional preferences of Lentihop LV and SB elements. (a) Expression levels of genes (y axis, log2 median-centered gene expression intensity) are correlated with the number of FAIRE regions found within 20 kb upstream and downstream of the transcriptional start site (TSS) of each gene (x axis). (b) LV and SB insertion sites analyzed against FAIRE-positive regions (hyperbrowser Monte Carlo test, P < 0.05). (c) LV insertions were significantly enriched in genes, introns, short interspersed nuclear elements (SINEs) and promoter regions but were under-represented in repeat elements and intergenic regions (hyperbrowser Monte Carlo test, P < 0.05). In contrast, SB insertions showed a weaker preference for genes and significant enrichment in introns and long interspersed nuclear elements (LINEs) but were under-represented in promoter regions, exons and intergenic regions (P < 0.05). (d,e) Monte Carlo–based global (d) and local (e) enrichment analyses of tumor insertions in patient sarcoma SCNAs, the cancer types indicated and cross-cancer SCNA regions. The global analysis indicates the overall enrichment of insertions across all regions tested, whereas the local analysis identifies the number of individual enriched regions. NS, not significant; Dels, deletions; Amps, amplicons.

2,443 non-redundant insertions across the panel of samples (2,086 LV and 357 SB; Supplementary Table 3). These insertions spanned every chromosome (Fig. 3g) and mapped to a set of 1,461 genes (UCSC known genes; 1,201 LV, 290 SB and 30 both LV and SB). We retrieved a number of SB insertions from T5RM-SB/Tet-LH(–Dox) tumors, suggesting that some tumor cells expressed transposase. This may reflect persistence of a small number of tTR-KRAB (Tet-on tetracycline repressor fused to the KRAB transcriptional repression domain of human Kox1)/dsRED2–negative cells after FACS purification, or random silencing of this vector. We validated barcode retrieval and insertion sites for randomly selected, individually isolated insertion bands from across the tumor panel (Supplementary Table 4) and re-identified genes that were found by Illumina sequencing. At the transcriptional level, attempts to PCR-amplify chimeric transcripts containing SB or LV sequences from cDNA were unsuccessful, possibly because of the polyclonality of the tumors. To investigate the insertional preferences of Lentihop LV and SB elements, we performed formaldehyde-assisted isolation of regulatory elements sequencing (FAIRE-Seq). This characterized open chromatin regions throughout the genome of the pre-injection cells. We then integrated these data with gene expression profiles from the pre-injection cells and tumors (Supplementary Fig. 6). At a global level, microarray analysis did not detect correlations between gene expression levels and the positions of individual insertion sites (i.e., comparing probe levels 5′ and 3′ of SB or LV sites). With respect to chromatin, integrating the data showed that gene expression increased with the number of FAIRE regions within 20 kb of a transcriptional start site (Fig. 4a). This is consistent with the known relationship between open chromatin and transcription.
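The Fig. 4a relationship rests on a simple per-gene count: how many FAIRE regions fall within 20 kb of each TSS. The sketch below shows one way to do that counting and binning in Python. It is illustrative only, not the authors' pipeline. The function names, the use of region midpoints, the single-chromosome coordinates and the treatment of the top bin as "7 or more" are all assumptions.

```python
from bisect import bisect_left, bisect_right
from statistics import median

WINDOW_BP = 20_000  # +/- 20 kb around each TSS, as described in the text


def faire_counts_near_tss(tss_positions, faire_midpoints, window=WINDOW_BP):
    """Count FAIRE regions (represented by their midpoints) within +/- window bp of each TSS.

    Hypothetical helper: assumes all coordinates lie on one chromosome.
    """
    mids = sorted(faire_midpoints)
    counts = []
    for tss in tss_positions:
        lo = bisect_left(mids, tss - window)   # first midpoint >= tss - window
        hi = bisect_right(mids, tss + window)  # first midpoint > tss + window
        counts.append(hi - lo)
    return counts


def bin_label(n):
    """Bin FAIRE counts like the x axis of Fig. 4a (top bin taken as >= 7, an assumption)."""
    if n == 0:
        return "0"
    if n <= 3:
        return "1-3"
    if n <= 6:
        return "4-6"
    return ">=7"


def median_expression_by_bin(counts, expression):
    """Median log2 expression of the genes falling in each FAIRE-count bin."""
    groups = {}
    for n, value in zip(counts, expression):
        groups.setdefault(bin_label(n), []).append(value)
    return {label: median(values) for label, values in groups.items()}


if __name__ == "__main__":
    # Toy coordinates, not data from the study.
    tss = [100_000, 500_000, 900_000]
    faire = [95_000, 110_000, 118_000, 480_000, 2_000_000]
    counts = faire_counts_near_tss(tss, faire)
    print(counts)  # [3, 1, 0]
    print(median_expression_by_bin(counts, [2.0, 0.5, -1.0]))
```

Sorting the region midpoints once and using binary search keeps each per-gene query logarithmic, which matters when tens of thousands of TSSs are tested against a genome-wide FAIRE peak set.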
Monte Carlo–based analyses22 showed that LV insertion sites in the tumors were enriched in open chromatin regions (P < 0.05; Fig. 4b), whereas SB insertions were not. Similarly, LV sites were located significantly closer to FAIRE regions than expected by chance (LV, P < 1.0 × 10−3; SB, P = 0.059). We also interrogated LV and SB insertion sites in the tumors against an extensive collection of genomic elements comprising gene features, promoter regions and

repetitive sequences (Supplementary Table 5; their preferences are illustrated in Fig. 4c). SB and LV insertion sites were closer to each other than expected by chance (P < 1.0 × 10−3). This suggests that local hopping may occur when SB100X transposase mobilizes SB from genomically integrated LV proviruses. Together, these data demonstrate that the distinct insertional patterns of lentiviruses and transposons are maintained during hybrid mutagenesis. A single genetic screen can therefore combine low bias with gene-focused coverage.

Hybrid mutagenesis pinpoints genes relevant to human cancers

Cancer genome projects are comprehensively cataloging genomic alterations in human tumors, but functional filters are required to delineate causative changes. To investigate the relevance of the tumor insertion data to human cancers in an unbiased manner, we compared the Lentihop insertional profiles to a recently published compendium of genomic alterations found in tumors from patients (Fig. 4d,e)23,24. Lentihop insertions were globally enriched across recurrent somatic copy-number aberration (SCNA) peaks in sarcomas (P = 5.0 × 10−4; Fig. 4d), but not selectively for myxofibrosarcomas (data not shown). In contrast, insertions were globally under-represented (P = 2.5 × 10−3) in SCNA regions specific to other solid and blood cancers (i.e., those distinct from sarcoma SCNAs or those shared across cancer types). In the sarcoma SCNA set, 17 deletions and 4 amplicons were over-represented for LV and SB insertions (10% false discovery rate; Fig. 4e and Supplementary Table 6). Only one deletion and three amplicons met this threshold in the non-sarcoma set. Because most SCNAs present in individual cancer types are shared among several cancer types23, we also performed cross-cancer analyses.
This analysis yielded four deletions and four amplifications that were over-represented for LV and SB insertion sites, although we did not observe global enrichment in this set (P = 0.49). These data imply that tumorigenic Lentihop insertions in mesenchymal cells may target genes similar to those found in recurrent copy-number alterations across diverse human cancers. The insertions also showed a general preference for sarcoma-associated regions.
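Several results above compare observed insertions against randomly placed ones using Monte Carlo tests. These include enrichment in FAIRE regions or SCNA peaks, and the proximity of LV sites to FAIRE regions and of SB sites to LV sites. The sketch below illustrates the general idea in Python with a deliberately simple null model, in which insertions are placed uniformly on a single linear coordinate. The function names are hypothetical. The paper's analyses used hyperbrowser Monte Carlo tests, whose null models may differ, for example by respecting chromosome boundaries or local insertion biases.

```python
import random
from bisect import bisect_left, bisect_right


def count_hits(sites, regions):
    """Number of sites inside any half-open region [start, end).

    `regions` must be sorted by start and non-overlapping.
    """
    starts = [start for start, _ in regions]
    hits = 0
    for pos in sites:
        i = bisect_right(starts, pos) - 1  # last region starting at or before pos
        if i >= 0 and pos < regions[i][1]:
            hits += 1
    return hits


def mean_nearest_distance(sites, targets):
    """Mean distance from each site to the nearest target position (targets must be non-empty)."""
    ordered = sorted(targets)
    total = 0
    for pos in sites:
        i = bisect_left(ordered, pos)
        total += min(abs(pos - ordered[j]) for j in (i - 1, i) if 0 <= j < len(ordered))
    return total / len(sites)


def monte_carlo_test(sites, statistic, genome_length, n_perm=1000, seed=0):
    """Empirical one-sided P values for a statistic of insertion positions.

    Null model (assumption): every insertion lands uniformly at random on one linear
    coordinate of length genome_length. Returns (observed, P for 'higher than
    expected', P for 'lower than expected'), each with a +1 correction.
    """
    rng = random.Random(seed)
    observed = statistic(sites)
    n_higher = n_lower = 0
    for _ in range(n_perm):
        simulated = [rng.randrange(genome_length) for _ in sites]
        value = statistic(simulated)
        n_higher += int(value >= observed)
        n_lower += int(value <= observed)
    return observed, (n_higher + 1) / (n_perm + 1), (n_lower + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Toy example, not data from the study: ten insertions packed into one 1-kb "SCNA peak".
    peaks = [(0, 1_000)]
    insertions = [10, 200, 300, 400, 500, 600, 700, 800, 900, 999]
    obs, p_enriched, p_depleted = monte_carlo_test(
        insertions, lambda s: count_hits(s, peaks), genome_length=100_000, n_perm=500
    )
    print(obs, p_enriched, p_depleted)

    # Proximity: are SB sites closer to LV sites than expected? Use the lower tail.
    lv_sites = [50_000]
    sb_sites = [49_990, 50_010, 50_005]
    _, _, p_closer = monte_carlo_test(
        sb_sites, lambda s: mean_nearest_distance(s, lv_sites), genome_length=100_000, n_perm=500
    )
    print(p_closer)
```

For overlap enrichment, the "higher than expected" P value is the relevant one. For depletion, or for sites lying closer than expected, the "lower than expected" P value applies. The +1 correction keeps an empirical P value from being reported as exactly zero.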


Genes that accumulate multiple insertions in a non-random fashion suggest alterations under positive selection during tumorigenesis and can be exploited to identify cancer drivers2. We used a gene-centric analysis to search for common insertion genes in the Lentihop-induced sarcomas (Online Methods). This analysis yielded 80 genes (72 LV, 9 SB and 1 both LV and SB; 10% false discovery rate) distributed across 20 chromosomes (Fig. 5a and Supplementary Table 7). We repeated this analysis using Lentihop insertions from the sensitized pre-injection cells and found 29 enriched genes (24 LV, 5 SB and 0 both LV and SB; Supplementary Table 8). Twelve of these overlapped with the 80 candidates identified across all samples (Supplementary Tables 7 and 8). They may represent genes with insertional biases or genes under selection during cancer sensitization, mutagenesis and expansion in vitro. Functionally, the 80 common insertion genes were enriched for factors involved in alternative splicing and ubiquitin conjugation (Supplementary Table 9). Three of the top five hits were known tumor suppressors (CHFR, DOCK4 and CADM1; Supplementary Table 7) and displayed sense and antisense LV insertions (Fig. 5b). CHFR is silenced by methylation in multiple cancers25,26, and the Rap GTPase activator DOCK4 is mutated or deleted in prostate and ovarian cancer27. Notably, CADM1 (also called TSLC1), a tumor suppressor gene frequently inactivated through promoter hypermethylation28,29, contained both LV and SB insertions in sense and antisense orientations (Fig. 5b). NUMB, identified as a candidate tumor suppressor in an in vivo lymphoma RNA interference screen in mice30, also incurred LV insertions in multiple orientations (Fig. 5b). So did STAG2, whose deletion or mutational inactivation in human tumors was recently shown to drive aneuploidy31.
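The gene-centric common insertion analysis is described in the Online Methods and was run in hyperbrowser. Its exact statistical model is not given in this section. As a hedged illustration only, the sketch below assumes a per-gene Poisson model in which the expected insertion count scales with gene length. It then applies the Benjamini–Hochberg procedure as the 10% false discovery rate control. Both modeling choices, and all function names, are assumptions rather than the authors' method.

```python
import math


def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), summed directly over the upper tail (lam > 0)."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))  # P(X = k)
    total = 0.0
    i = k
    while term > 1e-300 and (total == 0.0 or term > total * 1e-16):
        total += term
        i += 1
        term *= lam / i  # P(X = i) from P(X = i - 1)
    return min(1.0, total)


def benjamini_hochberg(pvalues, alpha=0.10):
    """Indices of hypotheses rejected by the Benjamini-Hochberg step-up procedure."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda idx: pvalues[idx])
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= alpha * rank / m:
            cutoff = rank  # keep the largest rank that passes
    return sorted(order[:cutoff])


def common_insertion_genes(gene_hits, gene_lengths, total_insertions, genome_length, alpha=0.10):
    """Genes with more insertions than expected from their length, at FDR alpha.

    Hypothetical model: insertions fall uniformly across the genome, so the expected
    count for a gene is total_insertions * gene_length / genome_length.
    """
    genes = sorted(gene_lengths)
    pvalues = []
    for gene in genes:
        expected = total_insertions * gene_lengths[gene] / genome_length
        pvalues.append(poisson_sf(gene_hits.get(gene, 0), expected))
    return [genes[i] for i in benjamini_hochberg(pvalues, alpha)]


if __name__ == "__main__":
    # Toy counts, not data from the study.
    lengths = {"GENE_A": 100_000, "GENE_B": 100_000, "GENE_C": 100_000}
    hits = {"GENE_A": 5, "GENE_C": 1}
    print(common_insertion_genes(hits, lengths, total_insertions=2_443, genome_length=3_000_000_000))
```

Summing the Poisson upper tail directly, rather than computing one minus the cumulative probability, keeps very small P values accurate. This matters for the strongest hits, which drive the step-up cutoff.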
Integrating the 80 common insertion genes with the recurrent SCNAs identified above (Fig. 4e) pointed to putative drivers of amplicons and deletions (Supplementary Table 7), for example, GAB2 (Fig. 5b and Supplementary Fig. 7). This gene was targeted by SB insertions between exons 1 and 2 that were aligned in the same orientation as its promoter (Fig. 5b). We next examined whether the human cell mutagenesis data could act as a filter to find genes that frequently acquire point mutations in cancers. We compared the Lentihop common insertion genes to data from COSMIC32 (Catalogue of Somatic Mutations in Cancer). COSMIC curates extensive information on somatic mutations from the literature, as well as data from The Cancer Genome Project and The Cancer Genome Atlas Project33–37. This analysis revealed significant enrichment for genes that are frequently mutated in patients across multiple cancers (25/80 genes, >60 coding mutations, P = 3.0 × 10−4, hypergeometric test; Fig. 5c and Supplementary Tables 7 and 10). We also examined the relationship between the common insertion genes and validated cancer genes, as curated in the Cancer Gene Census, but did not observe significant overall enrichment (4/80 genes, P = 0.11; Fig. 5c),

[Figure 5 image. (a) Insertion density (0–40) along chromosome 1 position (Mb, 0–240), with enriched genes labeled, including SLC2A1, UBR4, VPS45, EIF2B3 and KCNT2. (b) SB and LV insertion sites in CHFR, DOCK4, NUMB, STAG2, CADM1 and GAB2. (c) Venn diagrams: common insertion genes versus COSMIC genes (>60 mutations), 55 | 25 | 3,980 (P = 0.0003); common insertion genes versus Cancer Gene Census, 76 | 4 | 455 (P = 0.11).]

Figure 5  Established cancer driver genes identified in the screen. (a) Genes enriched for LV or SB insertions superimposed on the overall insertion density map of chromosome 1, divided into 1-Mb bins. Genes were identified using a gene-centric common insertion site analysis implemented in hyperbrowser, after applying a multiple-testing correction (false discovery rate). (b) Established tumor suppressors or candidate driver genes significantly enriched for LV and/or SB insertion sites (colored arrowheads, insertion sites; boxes, exons; black arrows indicate the promoter and gene orientation). (c) Venn diagrams of common insertion genes identified in Lentihop tumors analyzed against genes carrying >60 mutations in COSMIC (hypergeometric test P = 0.0003) or the Cancer Gene Census (P = 0.11).

nor did we identify a number of genes that were recently found to be mutated in myxofibrosarcomas24. One possible reason for this result is the scale of the screen, which likely functionally evaluated only 5–10% of the genome or less based on the number of unique genes with insertions that were retrieved. Among the COSMIC cross-cancer frequently mutated set, we pinpointed UBR4, SPTAN1 and HDLBP. UBR4 (also known as ZUBR1) displays 309 mutations, >73% of which are either missense or nonsense substitutions (Fig. 6a), across its exons in multiple cancer types and encodes the retinoblastomaassociated p600 protein interactor of the papillomavirus type 16 E7 oncoprotein (Fig. 6a)38. SPTAN1, encoding α-II spectrin, displays 153 mutations in diverse tumors, >83% of which are nonsynonymous substitutions or frameshifts (Fig. 6a), and has been reported as a predictor of poor survival in breast cancer39. The mutational distribution for STPAN1 was mirrored in the Cancer Cell Line Encyclopedia data, which revealed 94 mutations in cancer cell lines, 61 of which were missense and 10 of which were truncating40. HDLBP encodes vigilin, an RNA-binding protein that has been implicated in the induction of heterochromatin and control of c-FMS expression41,42. The screen also captured ADARB2 (Supplementary Table 7), which belongs to a class of double-stranded RNA (dsRNA)specific adenosine deaminases that is linked to retention of 969

Articles LC

Zf_UBR

b

Mutation type

5183

UBR4

Max 6

Amp Del

10

Sub nonsense

Ins/Del

SPEC

efhand_Ca_insen EFh

SH3

SPTAN1

Sub missense Sub synonymous

2452

Ins inframe

Max 5

Ins frameshift

Substitution

Del inframe Ins/Del

Del frameshift KH

mRNA expression (RMA, log2)

Substitution

Pearson R = 0.431

9

8

7

6

Complex

1268

HDLBP

Other

Max 3

5

Substitution

–2.5

–2.0

–1.5

–1.0

–0.5

0

0.5

1.0

1.5

DNA copy number (log2 ratio)

100

d

T5RM shSCRAM

50

P = 0.025

0 0

20

shSCRAM

shSPTAN1 shUBR4 40

60

shHDLBP

Time after injection (d)

80

U2OS

shHDLBP

P < 0.001

* 60 40 20 0 shSCRAM

shHDLBP

Figure 6  HDLBP is a new candidate tumor suppressor that is somatically altered in diverse human cancers. (a) UBR4, SPTAN1 and HDLBP sustain diverse mutations in human cancers. Somatic mutations from COSMIC are mapped to amino acid sequences (the SMART db–predicted protein secondary structures are indicated as LC, low complexity region; Zf_UBR, UBR-type zinc finger; SPEC, spectrin repeats; SH3, Src-homology 3; EFh, EF-hand; efhand_Ca_insen, EF-hand without calcium binding ability; KH, K homology). On the mutation diagrams, vertical lines represent the locations of substitutions (Sub); red and blue arrows indicate the locations of insertions and deletions (Ins/Del), respectively. The circles to the right depict the spectrum of various mutation types within the mutation set for each gene. (b) Paired copy-number and expression data for Cancer Cell Line Encyclopedia lines. Pearson’s correlation was calculated across all samples (R = 0.431). Blue circles, deletion samples (copy number 0.25); black circles, normal copy number. Box-and-whisker plots show the copy-number or mRNA expression distributions relative to the cell line panel (line, median; box, interquartile range; bars extend to 1.5× the interquartile range). (c) Survival curves for mice injected subcutaneously with 5 × 10 5 sensitized cells (T5RM) transduced with shRNA knockdown vectors containing either control (shSCRAM), shHDLBP (P = 0.006), shSPTAN1 (P = 0.139) or shUBR4 (P = 0.009). P values were obtained using log-rank test (overall, P = 0.025). Corresponding tumors are shown in the adjacent images. For each vector, ten replicate flanks were evaluated (n = 5 mice, two flanks injected each). (d) Images (left) and quantification (right) of colony-formation assays after stable transduction of U2OS osteosarcoma cells with shSCRAM or shHDLBP retroviral vectors performed in triplicate. Student’s t test (P < 0.001). Error bars, s.d. Scale bars (d), 1 cm.

npg

© 2014 Nature America, Inc. All rights reserved.

c

Tumor-free mice (%)

Ins/Del

Number of colonies

a

inosine-containing RNAs in the nucleus43. These two candidates were connected by a predicted functional relationship. Altered adenosine-to-inosine RNA editing profiles in human cancer have been observed, where the loss of ADARB2 expression correlates with the grade of malignancy in glioblastoma multiforme44. Vigilin binds inosine-containing RNAs with high affinity and has been found in complex with ADAR proteins, suggesting their collaboration for heterochromatin formation45. For HDLBP, we identified 98 mutations in patients with cancer (>80% of which were either missense, nonsense or frameshifts; Fig. 6a) and 57 in cell lines (33 missense and 9 truncating). We also found that HDLBP was commonly lost (13.8% overall) as part of a highly significant (q = 7.77 × 10−41) cross-cancer deletion peak containing 19 genes on 2q37.3, which overlaps with deletion peaks in cross-epithelial tumor panels, breast cancer and many others (Supplementary Fig. 8a and Supplementary Table 11). An examination of paired expression and copy-number data from close to 1,000 cell lines in the Cancer Cell Line Encyclopedia revealed a subset bearing HDLBP loss, and expression of this gene was strongly correlated (R = 0.52 for copy number 90% confluence was reached at the time of transfection. Lipofectamine 2000 (Invitrogen) was used to transfect the transfer and packaging plasmids (psPAX2 (Addgene #12260), pMD2.G (Addgene #12259) or the Lenti-X Packaging System (Clontech)). The medium was changed 14 h after transfection, and lentiviral supernatants were harvested once, at 36–40 h after transfection. Supernatants were centrifuged and then filtered through a 0.45-µm low-protein binding syringe filter unit (PALL Corporation) to remove debris and any live cells. Supernatants were further concentrated 100-fold using Lenti-X Concentrator (Clontech), resuspended in fresh medium and immediately used or frozen at −70 °C. 
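The 100-fold concentration step implies some simple volume and titer bookkeeping. The sketch below is illustrative only: it assumes the commonly used functional-titer estimate (transducing units per mL = cells at transduction × fraction of marker-positive cells × dilution factor ÷ volume of virus added) and a user-supplied recovery fraction for the concentration step. Neither the formula nor any recovery value is given in the text, and all function names are hypothetical.

```python
def concentrated_volume_ml(supernatant_ml: float, fold: float = 100.0) -> float:
    """Volume (mL) in which to resuspend virus for a given fold concentration."""
    if fold <= 0:
        raise ValueError("fold must be positive")
    return supernatant_ml / fold


def functional_titer_tu_per_ml(cells_at_transduction: int,
                               fraction_transduced: float,
                               virus_volume_ml: float,
                               dilution_factor: float = 1.0) -> float:
    """Estimate transducing units (TU) per mL from a marker-positive fraction.

    Assumes the positive fraction lies in a low, roughly linear range so that
    most transduced cells carry a single integration (an assumption).
    """
    if not 0.0 <= fraction_transduced <= 1.0:
        raise ValueError("fraction_transduced must be in [0, 1]")
    if virus_volume_ml <= 0:
        raise ValueError("virus_volume_ml must be positive")
    return cells_at_transduction * fraction_transduced * dilution_factor / virus_volume_ml


def expected_concentrate_titer(raw_titer: float, fold: float = 100.0,
                               recovery: float = 1.0) -> float:
    """Titer of the concentrate; recovery=1.0 gives the theoretical upper bound."""
    if not 0.0 < recovery <= 1.0:
        raise ValueError("recovery must be in (0, 1]")
    return raw_titer * fold * recovery


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to illustrate the arithmetic.
    print(concentrated_volume_ml(10.0))
    raw = functional_titer_tu_per_ml(100_000, 0.10, 0.01)
    print(raw, expected_concentrate_titer(raw, fold=100.0, recovery=0.5))
```

In practice the measured titer of a concentrate is usually below the 100-fold upper bound, which is why recovery is left as an explicit parameter rather than assumed.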
Supernatants and concentrated supernatants were titered when required using HT1080 cells (ATCC #CCL-121) or with Lenti-X GoStix (Clontech). Retroviral supernatants were produced similarly by transient transfection of transfer and envelope plasmids (pMD2.G) into GP2-293 cells (Clontech), which stably express the gamma-retroviral Gag and Pol proteins. Additional methods. All other methods can be found in the Supplementary Note, along with all oligonucleotide sequences used in this study (Supplementary Tables 12–16).
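Two significance tests named in the Results and figure legends can be sketched in dependency-free Python. The first is a one-sided hypergeometric test for overlaps such as the 25 of 80 common insertion genes found among COSMIC genes with >60 mutations (Fig. 5c). The second is a two-group log-rank test of the kind used to compare tumor-free survival in Fig. 6c. This is a minimal illustration, not the authors' pipeline. The gene background size used for Fig. 5c is not stated in this section, so the `background` value in the usage lines is a hypothetical assumption and is not expected to reproduce the reported P = 3.0 × 10⁻⁴. The function names are also hypothetical.

```python
import math
from typing import Sequence, Tuple


def hypergeom_upper_tail(k: int, n_drawn: int, n_marked: int, background: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(background, n_marked, n_drawn).

    k: observed overlap (e.g. common insertion genes also in the reference set)
    n_drawn: size of the query list (e.g. 80 common insertion genes)
    n_marked: size of the reference set (e.g. COSMIC genes with >60 mutations)
    background: total number of genes considered (must be supplied; assumed)
    """
    if not (0 <= n_marked <= background and 0 <= n_drawn <= background):
        raise ValueError("set sizes must lie within the background")
    total = math.comb(background, n_drawn)
    tail = sum(math.comb(n_marked, i) * math.comb(background - n_marked, n_drawn - i)
               for i in range(max(k, 0), min(n_drawn, n_marked) + 1))
    return tail / total  # exact integer ratio, rounded once to float


def logrank_test(times_a: Sequence[float], events_a: Sequence[bool],
                 times_b: Sequence[float], events_b: Sequence[bool]) -> Tuple[float, float]:
    """Two-group log-rank test; returns (chi-square statistic, p-value) with 1 df.

    events_* are True when the event (e.g. a palpable tumor) was observed and
    False when the subject was censored (e.g. still tumor-free at study end).
    """
    subjects = [(t, bool(e), 0) for t, e in zip(times_a, events_a)]
    subjects += [(t, bool(e), 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in subjects if e})
    o_minus_e = 0.0  # observed minus expected events in group A
    variance = 0.0
    for t in event_times:
        at_risk_a = sum(1 for s, _, g in subjects if g == 0 and s >= t)
        at_risk_b = sum(1 for s, _, g in subjects if g == 1 and s >= t)
        events_at_t_a = sum(1 for s, e, g in subjects if g == 0 and e and s == t)
        events_at_t = sum(1 for s, e, _ in subjects if e and s == t)
        at_risk = at_risk_a + at_risk_b
        o_minus_e += events_at_t_a - events_at_t * at_risk_a / at_risk
        if at_risk > 1:  # hypergeometric variance with correction for ties
            variance += (events_at_t * (at_risk_a / at_risk) * (at_risk_b / at_risk)
                         * (at_risk - events_at_t) / (at_risk - 1))
    if variance == 0.0:
        return 0.0, 1.0
    chi2 = o_minus_e ** 2 / variance
    # Upper tail of chi-square with 1 df: P(Z**2 > x) = erfc(sqrt(x / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))


if __name__ == "__main__":
    # Set sizes from the Fig. 5c Venn diagrams (COSMIC: 3,980 + 25 = 4,005;
    # Cancer Gene Census: 455 + 4 = 459); the 20,000-gene background is hypothetical.
    print(hypergeom_upper_tail(25, 80, 4005, 20000))
    print(hypergeom_upper_tail(4, 80, 459, 20000))
```

The exact integer sum avoids underflow for small tail probabilities. For larger analyses, `scipy.stats.hypergeom.sf(k - 1, M, n, N)` and `lifelines.statistics.logrank_test` should give equivalent results for the same inputs.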


ONLINE METHODS

doi:10.1038/ng.3065

Nature Genetics