
The FASEB Journal express article 10.1096/fj.03-0568fje. Published online December 19, 2003.

Expression profiling and identification of novel genes involved in myogenic differentiation Kinga K. Tomczak,*,‡ Voichita D. Marinescu,†, ‡ Marco F. Ramoni,†, ‡ Despina Sanoudou,*, ‡ Federica Montanaro,*, ‡ Mei Han,*, ‡ Louis M. Kunkel,*, ‡ Isaac S. Kohane†, ‡, Alan H. Beggs*, ‡ *

Genomics Program and Division of Genetics; Children's Hospital Boston; †Children's Hospital Informatics Program; ‡Harvard Medical School, Boston, MA Corresponding author: Alan H. Beggs, Ph.D., Genetics Division, Children’s Hospital Boston, 300 Longwood Ave., Boston, MA 02115; E-mail: [email protected] ABSTRACT Skeletal muscle differentiation is a complex, highly coordinated process that relies on precise temporal gene expression patterns. To better understand this cascade of transcriptional events, we used expression profiling to analyze gene expression in a 12-day time course of differentiating C2C12 myoblasts. Cluster analysis specific for time-ordered microarray experiments classified 2895 genes and ESTs with variable expression levels between proliferating and differentiating cells into 22 clusters with distinct expression patterns during myogenesis. Expression patterns for several known and novel genes were independently confirmed by real-time quantitative RT-PCR and/or Western blotting and immunofluorescence. MyoD and MEF family members exhibited unique expression kinetics that were highly coordinated with cell-cycle withdrawal regulators. Among genes with peak expression levels during cell cycle withdrawal were Vcam1, Itgb3, Itga5, Vcl, as well as Ptger4, a gene not previously associated with the process of myogenesis. One interesting uncharacterized transcript that is highly induced during myogenesis encodes several immunoglobulin repeats with sequence similarity to titin, a large sarcomeric protein. These data sets identify many additional uncharacterized transcripts that may play important functions in muscle cell proliferation and differentiation and provide a baseline for comparison with C2C12 cells expressing various mutant genes involved in myopathic disorders. Key words: C2C12 myoblasts • myogenesis • microarrays • cluster analysis • Expressed Sequence Tags

Myogenesis is a highly ordered process that can be subdivided into a sequence of temporally separable events: proliferation of mononucleate myoblasts, irreversible cell-cycle withdrawal, cell fusion to form multinucleate myotubes, and subsequent maturation of myotubes into various classes of myofibers. Successful myogenic differentiation requires numerous interactions between diverse cellular processes involving growth factors, hormones, receptors, transcription factors, kinases, and histone deacetylases, to name a few (1–7). These events are regulated by a complex series of transcriptional control points in myogenic differentiation involving two major families of transcription factors: the MyoD family of muscle regulatory factors (MRFs) (8, 9) and the MEF2 family of MADS-box transcription factors (10). Furthermore, the decision on whether myoblasts should continue to proliferate or switch into the

differentiation pathway is regulated by the balance of positive and negative cell-cycle regulators (11). In order for MRFs to play their roles, the negative regulators of muscle differentiation must be inactivated. Withdrawal from the cell cycle requires expression of cyclin-dependent kinase inhibitors (12, 13), leading to fusion and maturation of multinucleate myotubes. This process is associated with expression of late differentiation markers such as myosin heavy and light chains, muscle creatine kinase, and the acetylcholine receptor, which are among the downstream targets of MRFs and MEF2 family factors. In vivo, these myotubes then mature into various classes of myofibers, a process that requires further adjustment and specialization in gene expression. Although myogenesis has been widely described, many steps and interactions between various genes in this process are not yet fully understood. Moreover, there are certainly many genes involved in muscle cell proliferation/differentiation that are not yet known. The rapid development of expression microarray technology has provided new possibilities to view myogenesis in a genome-wide perspective. In the present study, high-density oligonucleotide microarrays were used to investigate transcriptional changes occurring in C2C12 cells during myoblast proliferation and myotube formation. The advantage of using C2C12 myoblasts is that these cells provide a well-established and reproducible model of myogenesis (14, 15). Furthermore, many muscle-specific genes and proteins have been studied in these cells, allowing validation of observed mRNA expressions for some genes based on previously published data. Recent reports have described expression profiling in C2C12 cells (16–18). Moran et al. presented analysis of ~11,000 murine transcripts in duplicate analyses of four time points of C2C12 cell differentiation (16). 
The expression patterns of 629 genes that exhibited significant differential expression across the time course were hierarchically clustered using a self-organizing map-based algorithm that identified 16 distinct patterns of expression. Although none of these results were confirmed by traditional biochemical methods, the predictable expression patterns of many known muscle-specific genes demonstrated the validity of this microarray approach to studying myogenesis. Shen et al. conducted a more focused analysis of events during C2C12 myoblast cell-cycle withdrawal using cDNA microarrays containing ~10,000 nonredundant murine IMAGE clones (17). This report identified 17 differentially expressed genes that were not previously known to be involved in muscle differentiation or cell-cycle withdrawal. Finally, Delgado et al. have studied transcriptional events occurring within the first 24 h of serum withdrawal and initiation of differentiation. They have identified 12 temporally related clusters of Affymetrix MG_U74Av2 probe sets from ~1500 whose expression was altered over the one-day time course (18). In the present study, we report the analysis of 24,290 probe sets for murine genes on Affymetrix MG_U74Av2 and Cv2 microarrays over eight time points spanning 12 days of C2C12 cell myogenic differentiation assayed in triplicate. To leverage the time-dependent information in this muscle development experiment, we used CAGED 1.1, a new computational tool specifically designed for hierarchical cluster analysis of temporal microarray experiments using a Bayesian approach (19). Two thousand eight hundred ninety-five probe sets that met requirements for reproducible signal detection and fold change over the time course were clustered into 22 distinct expression patterns that composed four main groups. Group IV consisted of clusters of transcripts whose average expression was only minimally changed (–2.5 to +2.0 fold) over the time course. 
In contrast, groups I through III contained clusters of transcripts whose average expression increased by more than 2.5 fold (up to 545 fold), decreased by more than 3.5 fold, or whose kinetics exhibited a spike of expression at day 0. The transcripts in these latter three

groups included 1000 known or annotated genes while 395 probe sets represent unannotated or unnamed genes, cDNA clones, or Expressed Sequence Tags (ESTs), providing a rich catalog of novel transcripts whose expression is modulated during cell-cycle withdrawal and differentiation of C2C12 myoblasts. Real-time quantitative RT-PCR, Western blotting and indirect immunofluorescence studies confirmed the GeneChip data for several representative genes and ESTs, and identified one instance where transcription and protein levels were inversely correlated. A Microsoft Access database containing data on all 2895 probe sets is available for download (http://www.chb-genomics.org/beggslab/) allowing detailed browsing and viewing of the C2C12 myogenic transcriptome. MATERIALS AND METHODS Cell culture C2C12 mouse myoblasts (American Type Culture Collection, Manassas, VA) at passage 6 were maintained in growth medium (DMEM supplemented with 20% Fetal Bovine Serum, Certified; GIBCO, Invitrogen Corporation, Carlsbad, CA) in 8% CO2. Cells were plated at a density of 10⁴/cm² into 100-mm-diameter dishes coated with 0.15% gelatin. Twenty-four hours later (day –2), cells showed 40−50% confluency. After 72 h (day 0), cells (90−100% confluent) were switched into differentiating medium (DMEM containing 2% Horse Serum; GIBCO, Invitrogen Corporation, Carlsbad, CA) that was subsequently changed every 24 h. At each time point, the cells were harvested and processed independently from three replicate plates. Fusion indices (where greater than zero) were calculated as the mean of the percentages of nuclei in multinucleate myotubes from random fields of three plates. Between 414 and 977 nuclei were counted for each data point. RNA extraction Total RNA was extracted and purified from C2C12 cells on days –2, –1, 0, 2, 4, 6, 8, and 10 using an RNeasy kit (Qiagen, Valencia, CA) according to the manufacturer’s instructions. 
RNA concentration and purity were assessed using a Beckman Coulter spectrophotometer DV 600. All RNA samples used in these experiments had 260/280 ratios between 2.13 and 2.17. The 3′/5′ signal ratios for Affymetrix control probe sets ranged from 1.01 to 1.3 for β-actin and from 0.78 to 1.03 for GAPDH. Microarray experiments Affymetrix mouse MG_U74Av2 and MG_U74Cv2 oligonucleotide-based arrays (GeneChips) were used as described previously (20). Seven time points (days –2, –1, 0, 2, 4, 6, 10) were hybridized in triplicates on both chips, while day 8 was only hybridized on MG_U74Av2. Microarray data analyses The raw data were normalized and analyzed using GeneChip Affymetrix Microarray Suite 5.0 software. The signal values and the “Present,” “Absent,” or “Marginal” calls were computed for all 24,422 probe sets. After all genes with “Absent” calls across the entire data set were filtered out, 9454 probe sets remained. The mean values of three replicate chips from each of the seven time points were used as an input for CAGED 1.1 (http://genomethods.org/caged), a program specifically designed for cluster analysis of temporal microarray experiments (19). Day 8 data

were not included in the CAGED analysis as only MG_U74Av2 chips were hybridized to targets from this time point. The analytical task of CAGED, a program implementing a Bayesian method for model-based clustering of gene expression dynamics, was to partition the genes into clusters sharing a similar behavior throughout the seven different time points. The 9454 probe sets with at least one “Present” call were further filtered by removing all probe sets where none of the averaged signals for any time points had a ratio greater than 2.0 or less than 0.5 relative to the signal for the day –2 time point. This approach yielded 2895 probe sets with expression signal values ranging between 0.003 and 747.714, with an average expression of 4.543. The ratios of each time point to day –2 were natural logarithm transformed. A relational Microsoft Access database was created containing the expression data from our experiment, CAGED results, and Affymetrix annotation tables, including fields such as gene name, gene symbol, GenBank, Unigene, Locuslink, chromosome localization, GeneOntology, identifiers from different databases (Swissprot, InterPro), as well as information regarding Orthologs/Homologs and association with particular GenMAPP pathways (http://www.affymetrix.com/analysis/download_center.affx). This extended relational database was used for further analysis (available for download at http://www.chb-genomics.org/beggslab/). For the probe sets in each cluster, the number of occurrences of a Gene Ontology (GO) term in a given GO category (biological process, cellular component, and molecular function) was counted by a program that used the GO annotations provided by the NetAffx Analysis Center (21) as input. Since a gene can be involved in several biological processes or have multiple molecular functions, the total number of occurrences of GO terms in these GO categories per cluster may be greater than the total number of genes in a cluster. 
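The per-cluster GO term counting described above can be illustrated with a minimal Python sketch. It is not the program used in the study: the function name, the input layout (a mapping from probe set ID to its list of GO terms), and the example annotations are all hypothetical. Like the original program, it ignores parent-child relationships between terms.

```python
from collections import Counter


def count_go_terms(cluster_annotations):
    """Count occurrences of each GO term among the probe sets of one cluster.

    cluster_annotations: dict mapping a probe set ID to the list of GO terms
    annotated for it in a single GO category (hypothetical input layout).
    Parent-child relationships between GO terms are deliberately ignored.
    """
    counts = Counter()
    for terms in cluster_annotations.values():
        # Each probe set contributes once per distinct term it carries.
        counts.update(set(terms))
    return counts


# Hypothetical annotations for three probe sets in one cluster.
example = {
    "probe_a": ["cell cycle", "DNA replication"],
    "probe_b": ["cell cycle", "regulation of transcription"],
    "probe_c": [],  # unannotated transcript, contributes nothing
}
counts = count_go_terms(example)
total_occurrences = sum(counts.values())
```

In this toy input the total number of term occurrences exceeds the number of probe sets, which is the situation the text describes for genes with several annotations.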
Also, reflecting the annotation in the Affymetrix Annotation Table, the program does not take into account “parent−child” relationships between GO terms and, therefore, counts the instances of each GO term independently of the ones of its “children” or “parent(s).” Thus, in the case of the biological process category, to overcome this limitation, we manually chose only the most frequently appearing biological categories for each cluster and filtered them in the Access database. The Bayesian clustering provided a statistically robust basis to view the natural dynamic structure of the transcriptome. However, we were also interested in identifying the subset of genes that most closely matched the expression profile of a specific small set of genes implicated in muscle differentiation. For 53 genes, selected based on their known role in muscle differentiation or distinct expression pattern, and 15 uncharacterized transcripts from groups I–III, up to 50 nearest neighbors with correlation coefficients above 0.9 were calculated using the correlation coefficient as the similarity measure. The detailed list of nearest neighbors and all other supplemental tables are available as supplementary data at http://www.chb-genomics.org/beggslab/. Protein extraction and immunoblotting Cells were washed twice with ice-cold PBS and scraped into 300 µl of sample buffer. Lysates were denatured for 5 min at 95°C, sonicated with a Branson Sonifier 250 to shear DNA and centrifuged for 10 min at 16,000 rcf. Soluble proteins were quantified by Lowry’s method using the Protein Assay Kit (Sigma). Twenty-five micrograms of whole cell extracts were resolved on 10−20% SDS-polyacrylamide gels and transferred onto polyvinylidene difluoride (PVDF) membranes (Immobilon-P membrane, Millipore, Bedford, MA) using a Trans Blot Cell (Bio-Rad, Hercules, CA). Membranes were probed with the following primary antibodies: rabbit anti-caveolin-3 (Affinity Bioreagents; 1:1000 dilution), rabbit anti-aquaporin-1 (Chemicon International, Temecula, CA; 1:100), mouse anti-β-dystroglycan monoclonal antibody (1:50), rabbit affinity-purified polyclonal anti-α-actinin-3 (1:100), rabbit anti-pan-actin (Sigma-Aldrich; 1:100). After incubation with HRP-conjugated AffiniPure Donkey Anti-Rabbit IgG (1:5000) or AffiniPure Donkey anti-mouse IgG (1:10000) as appropriate, reactive proteins were visualized with Supersignal West Pico Chemiluminescent substrate (Pierce, Rockford, IL). Immunofluorescence C2C12 cells were fixed in methanol at –20°C for 10 min, washed twice with PBS (5 min each wash) then blocked for 30 min with a blocking solution (PBS with 10% fetal bovine serum; GIBCO, Invitrogen Corporation, Carlsbad, CA). Fixed cells were incubated with primary antibodies overnight at 4°C in a humidified chamber. The following primary antibodies were used: rat anti-caveolin-3 affinity-purified IgG (Affinity Bioreagents; 1:500), rabbit affinity-purified polyclonal anti-α-actinin-3 (1:100), mouse monoclonal anti-myosin heavy chain (fast) (Novocastra; 1:30), rabbit anti-aquaporin-1 (Chemicon International, Temecula, CA; 1:100), mouse anti-cranin (α-dystroglycan) monoclonal antibody (Novocastra; 1:50). After three washes in PBS, cells were incubated for 30 min with the secondary antibodies Alexa 488 anti-rabbit IgG and Alexa 594 anti-mouse IgG (Molecular Probes, Eugene, OR; 1:300) and washed three times. Nuclei were stained with 4′,6-diamidino-2-phenylindole (DAPI) (100 ng/ml). Multiplex real-time quantitative RT-PCR RNA was treated with DNase I Amplification Grade (Invitrogen). cDNA synthesis was performed using the SuperScript First-Strand synthesis system for RT-PCR (Invitrogen). 
As controls, “RT-” samples, containing appropriate quantities of all reagents with the exception of reverse transcriptase, were also included to evaluate potential genomic contamination. Real-time quantitative RT-PCR was carried out using the TaqMan probe-based chemistry (Applied Biosystems) on an ABI Prism 7700 Sequence Detector System. The 5′ reporter for caveolin-3 and the ESTs was 6-carboxyfluorescein (FAM), and that for the endogenous control (Pre-Developed Assay 18S rRNA) was VIC (Biosource International). Primers and probes were designed using Primer Express 1.5 (Applied Biosystems) so that they would overlap with the Affymetrix target sequence. All primers, probes, and the target sequences were verified by BLAST to ensure that each sequence was unique for the specific target gene. Cav3/Forward 5′-CACAAGGCTCTGATCGCCTC-3′, Cav3/Reverse 5′-TCCGTGTGCTCTTCGGTCA-3′, Cav3/Probe 5′-FAM-TGAAGGAGCCCCAGCCTCACAATG-TAMRA-3′, Darp/Forward 5′-CACACTGAGGAGTCCTGCACC-3′, Darp/Reverse 5′-AGCTGGCATGGGACCAGAG-3′, Darp/Probe 5′-FAM-TGCTTTGGGACAATGCCAACAGTATACATCC-TAMRA-3′, “sequence K”/Forward 5′-CATTATGTGCTCAACCCTGGG-3′, “sequence K”/Reverse 5′-CTCTGTCAGCCATCCGCATT-3′, “sequence K”/Probe 5′-FAM-CGCCCGTGCGCTTCCACCT-TAMRA-3′, “sequence T”/Forward 5′-AGGTGCCCTTGGACAATTTG-3′, “sequence T”/Reverse 5′-ACCTCTGGCTCCTCCCTCC-3′, “sequence T”/Probe 5′-FAM-AAGGAGCGTACCCTTATGCTGGGCA-TAMRA-3′. All of the samples were run in duplicate. The RT controls with no cDNA template, included to assess specificity, showed no appreciable amplification (CT more than 36; data not shown). The amplification reactions (50 µl) contained

50 ng cDNA or control, TaqMan universal PCR master mix (25 µl), 10 µM primers (2 µl each), 10 µM probe (1 µl), and 2.5 µl of predeveloped 18S rRNA. The following thermal cycling conditions were used: 50°C for 2 min, 95°C for 10 min followed by 95°C for 10 s, and 60°C for 1 min for 40 cycles. Data were collected and analyzed with Sequence Detector v1.7a. Relative quantification method of ∆∆Ct was used as described in the manufacturer’s manual (http://www.appliedbiosystems.com/support/tutorials/7700amp/). RESULTS To study changes in gene expression and identify new genes and transcripts involved in muscle cell proliferation and differentiation, RNA from C2C12 myoblasts was isolated at 8 time points (days –2, –1, 0, 2, 4, 6, 8, 10). The cells were first harvested 24 hours after plating (day –2) when cells were ~40−50% confluent and actively dividing. A second time point at day –1 was included to better characterize gene expression during the proliferative phase. Upon reaching 100% confluence on day 0 (fusion index = 0), the growth medium was replaced with differentiation medium. By day 2, 9 +/− 3% of nuclei were in multinucleate cells and by day 4, long myotubes were formed (fusion index = 40 +/−5%). By day 6, the fusion index was 48 +/−5%, and some myotubes were sporadically twitching. The time course was continued until day 10 (fusion index = 52 +/−3%). Cluster analysis CAGED 1.1 analysis of 2895 probe sets remaining after filtering as described in “Materials and Methods” grouped these into 22 unique clusters containing between 5 and 787 probe sets (Fig. 1). Overall, four main categories of clusters can be distinguished. The first (Group I) includes three clusters of genes that have a trend of decreasing expression over the time course (Supplemental Table 1). Genes in these clusters had high expression values at day –2 and day –1 when the cells were rapidly dividing and decreased expression by –1.45 to –335.2 fold (average = –22.7) over the 12-day time course. 
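The negative fold changes reported here follow a signed convention that is common in microarray reporting. The short Python sketch below shows that convention under the assumption that a ratio below 1 is reported as the negative reciprocal; the function name and the example signal values are hypothetical, and the paper does not state the exact formula it used.

```python
def signed_fold_change(later_signal, baseline_signal):
    """Return a signed fold change between two positive expression signals.

    Assumed convention: ratios >= 1 are reported as-is (up-regulation),
    ratios < 1 are reported as the negative reciprocal (down-regulation),
    so a halving of the signal is reported as -2.0 rather than 0.5.
    """
    if later_signal <= 0 or baseline_signal <= 0:
        raise ValueError("signals must be positive")
    ratio = later_signal / baseline_signal
    return ratio if ratio >= 1.0 else -1.0 / ratio


# Hypothetical signals: expression doubles, halves, or stays unchanged.
up = signed_fold_change(20.0, 10.0)
down = signed_fold_change(10.0, 20.0)
flat = signed_fold_change(5.0, 5.0)
```

Under this convention no value falls strictly between –1 and +1, which is why decreases are quoted as, for example, "–22.7 fold" rather than as a fractional ratio.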
Group II contains five clusters of genes for which the expression levels peaked on day 0 followed by a temporary or permanent decrease on day 2 (Supplemental Table 2). At day 0, contact inhibition and serum deprivation stimulated cell-cycle withdrawal, inducing the myoblasts to subsequently fuse. Group III includes the nine clusters of genes whose expression was very low or undetectable at the beginning of the time course and progressively increased after day 0 (Supplemental Table 3). Fold changes by day 10 ranged from +1.47 to +687.6 (average = +19.3). Between days 2 and 10, myoblasts ceased proliferating and the majority of them fused into postmitotic myotubes and began sarcomere assembly. The fourth group contains clusters of genes with relatively flat expression profiles (Supplemental Table 4). The mRNA expression levels of these genes fluctuated enough to pass the twofold increase or decrease threshold, but they did not change dramatically in any single direction over the time course. These transcripts represent primarily housekeeping genes, ubiquitous transcription factors or other genes that are required to be expressed for general cell function such as protein synthesis, energy production or transcription. In the following analysis, we focused our attention on the clusters in Groups I−III as these exhibited the most dramatic changes over time and are likely to include the majority of genes that play key roles in myogenesis.
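The twofold threshold mentioned above refers to the filter described in Materials and Methods: a probe set is kept if at least one time point has a mean signal ratio greater than 2.0 or less than 0.5 relative to day –2, and the ratios are then natural-log transformed. A minimal Python sketch of that step follows. The function names and the example signal values are hypothetical, and it assumes the day –2 mean signal is the first element of each list.

```python
import math


def passes_fold_filter(mean_signals, up=2.0, down=0.5):
    """Keep a probe set if any time point differs from baseline by the threshold.

    mean_signals: mean signal per time point, with the day -2 baseline first
    (an assumed layout). Comparisons are strict, matching "greater than 2.0
    or less than 0.5" in the text.
    """
    baseline = mean_signals[0]
    ratios = [signal / baseline for signal in mean_signals[1:]]
    return any(r > up or r < down for r in ratios)


def log_ratios(mean_signals):
    """Natural-log transform of each time point's ratio to the baseline."""
    baseline = mean_signals[0]
    return [math.log(signal / baseline) for signal in mean_signals]


# Hypothetical seven-point profiles (days -2, -1, 0, 2, 4, 6, 10).
flat_profile = [1.0, 1.2, 0.9, 1.5, 1.1, 1.3, 1.4]
induced_profile = [1.0, 1.1, 1.3, 2.5, 6.0, 9.0, 12.0]
kept = [p for p in (flat_profile, induced_profile) if passes_fold_filter(p)]
```

A profile that touches the threshold at only one time point still passes, which is consistent with the flat Group IV clusters surviving the filter.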

Clusters of genes down-regulated in differentiating cells (Group I) Clusters 1−3, containing a total of 467 probe sets, represent transcripts whose expression was relatively high in proliferating cells at day –2, but decreased over the time course to the point where the majority were scored “Absent” by Affymetrix MAS 5.0 software. These genes, expressed only in the rapidly dividing and noncontact-inhibited cells, represent an important set of transcripts for understanding cell-cycle withdrawal events and differentiation (Fig. 2). Cluster 1 contains transcripts with the most dramatic decrease in expression over time (Figs. 1 and 2). The 14 probe sets in this cluster have an average fold change day 10/day –2 of –57.0 and include growth factors (proliferins Plf, Plf2, and Mrpplf3), transcription regulators (fos-like antigen 1 Fosl1, distal-less homeobox 1 Dlx1), signal transduction genes (Akap12 and urokinase plasminogen activator receptor Plaur) and two plasma membrane genes (aquaporin 1 Aqp1 and amphiregulin Areg). Clusters 2 and 3 contain a total of 453 probe sets, including 204 with GO annotations, comprising mostly cell cycle-related genes (n=40) and regulators of transcription (n=33) or ion transport (n=23). Among them are 64 probe sets for genes involved in processes related to nucleic acid metabolism and 37 for genes related to protein metabolism. Cell-cycle-related genes in these clusters include five cell division homologue genes (Cdc2a, Cdc6, Cdc25c (two probe sets), Cdc45l and Cdc7l1) and cyclins Ccna2, Ccnd1 (three probe sets), Ccne2, Ccnf. Eleven of the cell-cycle related probe sets represent transcription factors (e.g., E2F transcription factor 1, retinoblastoma-like 1, transcription factor Dp 1). Other transcription factors include distal-less homeobox (Dlx2, Dlx3), forkhead box (Foxc2, Foxm1), activating transcription factor 3 and muscle specific myogenic factor 5 (Myf5). 
These clusters contain 24 probe sets involved in DNA replication, including genes coding for primases (Prim1, Prim2), polymerases (Pola1, Pola2, Pold1, Pold2, Pole2, Polr2a), origin recognition complex members (Orc1l, Orc6l), and ribonucleotide reductases (Rrm1, Rrm2). There are also three DNA packaging genes (Hmgb1, Hmgb2, and Hmgb3) and three genes playing a role in DNA recombination (e.g., Rad51, Rad51ap1). A total of 133 ESTs, RIKEN clones, cDNAs and hypothetical proteins (28% of the total number of probe sets in clusters 1−3) represent potentially novel proteins that may play important roles in proliferating cells (Supplemental Table 1, Fig. 2). Clusters of genes that peak on day 0 (Group II) Five clusters (numbered 4−8) exhibit unique kinetics, with expression levels rising to a temporary peak at day 0 (Fig. 2). They contain a total of 266 probe sets with a fold change between day 0 and day –2 greater than 2, out of which 23% (n=62) are unannotated transcripts such as ESTs and RIKEN clones. Genes in clusters 4−8 are primarily involved in the following biological processes: signaling pathways (n=36), transcription (n=24), cell adhesion (n=14), cell-cycle regulation (n=12), and cell-cycle arrest or apoptosis (n=6). Proapoptotic genes are turned on at this stage, most likely due to serum withdrawal, while cell adhesion transcripts are up-regulated, as many of them may be involved in myoblast fusion. Cluster 4 includes 24 probe sets with an average fold change of 2.2 (day 0/day –2) and –1.8 at the end of the time course (day 10/day –2) (Supplemental Table 2). Five of these genes are

integral membrane proteins (e.g., cell adhesion molecules such as integrin α-5), three represent extracellular matrix components (e.g., inhibin β-A). Two basic helix–loop–helix (bHLH) domain genes, both inhibitors of DNA binding (Id1, Id2), show the highest expression values on day 0 followed by a gradual decrease associated with serum withdrawal (22). Four other genes involved in signaling pathways are found in this cluster, including Dab2 (mitogen-responsive 96 kDa phosphoprotein) and prostaglandin E receptor 4 (subtype EP4) Ptger4. Seven uncharacterized transcripts (e.g., ESTs and RIKEN clones) clustered together with these genes. Some of them are likely to have biologically important function based on their conserved domain structure. For example, one of these putative novel genes (166394_at, AW124444, RIKEN cDNA 4933425C05) with a large increase on day 0 (fold change = 2.5) and rapid decline by day 2 (fold change −5.6 with “Absent” calls) codes for the mouse Hath6 (pending), basic helix–loop– helix transcription factor 6, also known as okadin (LocusLink ID 71093) that is the mouse ortholog of the human transcription factor Hath6 (LocusLink ID 84913). Hath6 contains a bHLH domain and shares significant homology with the bHLH factor Math6. Math6 was shown to promote neuronal vs. glial fate determination in the nervous system (23). Another bHLH gene Myod1, one of the major transcription factors involved in myogenesis, is scored “Present” starting at day –2 and its expression peaks on day 0. Myod1 is part of cluster 5, a relatively large cluster containing 161 probe sets. Most of the genes in cluster 5 do not show a significant increase or decrease in expression level during the 12-day time course, but are upregulated on day 0 (average fold change day 0/day –2 = 2.3 and day 10/day –2 = 0.4). Coexpressed with Myod1 are 13 other transcription factors in this cluster, such as Six1 (two probe sets), Sox4 and Junb. 
Three genes in this cluster are involved in cell-cycle arrest or apoptosis (Gas2, Gadd45b, Pmp22), while 14 others are involved in various intracellular signaling (e.g., Prkcd, Socs2), cell−cell communication (e.g., Gjb2) and adhesion (e.g., Itgb3, Vcam1). Clusters of genes up-regulated at differentiation (Group III) Nine clusters, numbered 9–17, contain 615 probe sets for genes whose expression was upregulated 1.47 to 687.6 fold over the time course. The majority of these transcripts were scored “Absent” in proliferating cells, and the expression signals increased rapidly by day 2 when myoblasts began fusion. For some of these genes the shift in expression values started earlier at day 0. Five of the nine clusters (numbered 9−13) contain 255 of the most dramatically upregulated genes at differentiation, 47% of which are known to be expressed in muscle. Cluster 9 (average fold change day 10/day –2 = 545.2) consists of only six probe sets that are very highly up-regulated in differentiated C2C12 cells (Fig. 1, Supplemental Table 3). Four of these genes are muscle-specific and have additional probe sets that are found in other clusters because they showed similar kinetics but smaller fold changes. Probe sets for Cav3 and Atp2a1 are found in both clusters 9 and 10, while clusters 9 and 12 both contain additional probe sets for Actc1 and Tnnt1. In addition to these genes, cluster 9 includes one more known muscle-specific gene, embryonic myosin (Myh3) and one uncharacterized EST (167088_r_at, AV215683). Cluster 10 (average fold change day 10/day –2 = 122.2) is remarkable for its high content of muscle-specific genes (Supplemental Table 3). Thirty-three of 36 probe sets in this cluster are known to be expressed in various muscle types, mostly in skeletal muscle. Nearly half of the transcripts encode sarcomeric proteins (n=17). Among these are six myosins (Myla, Mylf, Mylpf, Myh1, Myh4, Myh7), five different isoforms of troponins (Tncc, Tncs, Tnni2, Tnni1, Tnnt3),

myosin binding protein H and two probe sets for myomesin 2. In addition, there are two dystrophin glycoprotein complex (DGC) genes (Sgca, Sgcg), known to be expressed in myotubes but not myoblasts (24). Not surprisingly, this cluster contains another gene known to interact with the DGC, caveolin-3 (Cav3) (25). Furthermore, cluster 10 contains genes for myoglobin, insulin-like growth factor 2 and the muscle-specific transcription factors myogenin and Mef2c (2 probe sets). It is noteworthy that in this highly muscle-specific cluster there is only one EST (166748_at, AI553396) and one RIKEN clone (167436_i_at, AV242737). Cluster 11 shows a similar trend in expression kinetics to cluster 9, but it contains a much higher number of genes with a smaller average increase in expression (average fold change day 10/day –2 = 23.9). Although it contains a few sarcomeric genes (8%) (e.g., tropomyosin 1, skeletal muscle actin Acta1), this cluster is very heterogeneous and includes 15 enzymes, such as muscle specific phosphofructokinase and creatine kinase (2 probe sets), and the nonmuscle-specific proteases Klk13, Klk26 and cathepsin C. There are also six calcium ion-binding genes (e.g., calsequestrin 1, calmodulin-like 4, delta-like 1) and Ryr1, the sarcoplasmic reticulum calcium channel ryanodine receptor gene. Other cluster 11 genes known to be involved in myogenesis or cardiac morphogenesis code for follistatin (26) and Xin (27), respectively. A unique pattern of expression was observed for the genes grouped in cluster 15. Twenty-six of 31 probe sets in this cluster exhibited decreased expression at day –1 or 0, despite the overall increase of the intensity signal in differentiated cells compared with proliferating cells. The average fold change between day 0 and –2 was –1.2, while average fold change day 10/day –2 was 3.5. Affymetrix MAS 5.0 scored 11 of these probe sets “Present” throughout the time course. 
Nearly half of the transcripts in this cluster are ESTs, cDNA or RIKEN clones, but 18 genes were previously characterized. They include cell adhesion genes encoding extracellular matrix proteins such as nephronectin and procollagens (Col1a1, Col8a1) and six genes coding for membrane associated proteins (e.g., G protein-coupled receptor kinase 5 Gprk5, sodium/potassium/chloride transporter Slc12a2 and 7-dehydrocholesterol reductase Dhcr7). Moreover, there are three genes involved in transcription: TEA domain family member 4 (Tead4), B cell leukemia/lymphoma 6 (Bcl6), and myogenic factor 6 Myf6 (Herculin). It is noteworthy that Myf6 shows a –2.41-fold decrease in expression on day 0, and by day 2 all of its expression values are scored “Absent.” Beginning on day 4, however, the intensity signals increase 7.5 fold and are scored “Present” throughout the rest of the time course, which is expected as this is a late marker of myogenic differentiation (28, 29). Expressed sequence tags In this study, 927 ESTs and other unknown transcripts clustered together with known, often well-characterized genes. Three hundred sixty-eight of these exhibit large changes in gene expression over the time course and are in Groups I−III (clusters 1−17). Many of these putative novel genes have either some homology to known genes or conserved protein domains. The relational database (available at http://www.chb-genomics.org/beggslab/) is a source of available information on these transcripts and their expression over the time course. Out of the total 368 ESTs, 244 are annotated with SwissProt entries, 246 have known chromosome localization and 114 show protein similarities by BLASTP. Furthermore, 135 of the ESTs have conserved protein domain entries predicted by InterPro. This information can serve as a tool to further characterize these novel gene transcripts.

Three uncharacterized transcripts belonging to cluster 11, which were highly up-regulated and contained interesting predicted sequence domains, were chosen for confirmatory real-time quantitative RT-PCR studies. One (“sequence T”) encoded immunoglobulin domains with homology to titin (probe sets 168291_f_at, AV271877 and 167624_f_at, AV244107); another (165773_at, AI853437), now designated Darp (30), is an ankyrin repeat-containing protein expressed in skeletal and cardiac muscle and brown adipose tissue. The third probe set (133932_at, AI503993; “sequence K”) corresponds to the cDNA clone termed mouse hypothetical protein LOC216790 mRNA (GenBank BC046431). The corresponding protein contains a protein kinase domain and is a fragment of a putative ortholog of the human KIAA1639 protein, with which it shares 87% amino acid identity. Real-time quantitative RT-PCR assays for each confirmed the up-regulation seen by GeneChip analysis, although the absolute fold change values varied somewhat between the two methods, with RT-PCR appearing to have a greater dynamic range (Fig. 3).

Further analysis of the “sequence T” gene suggests that it may code for an interesting novel muscle protein. The day 10/day –2 fold changes for probe sets 168291_f_at and 167624_f_at were 31.7 and 32.3, respectively. “Sequence T” corresponds to the expressed sequence AW822216 (LocusLink ID 98733, UniGene cluster Mm.197693) that is predicted to encode a 943 amino acid protein (GenBank accession NP_849215) containing one fibronectin type III domain and seven immunoglobulin domains. This protein is a potential ortholog of the human KIAA0657 protein, with which it shares 77% identity and 84% similarity. In addition, the mouse protein shows 27% identity (46% similarity) with human obscurin (31), and 25% identity (41% similarity) with titin (32). Thus, “sequence T” is most likely a fragment of a larger gene that may represent a novel member of the immunoglobulin repeat-containing gene family.
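To find transcripts that behave like “sequence T”, time-course profiles can be ranked by their correlation with a query probe set. The sketch below illustrates the idea under stated assumptions: it uses Pearson correlation (the source reports correlation coefficients but does not name the measure), and the profile names and values are hypothetical, not data from the study.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (>= 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return sxy / math.sqrt(sxx * syy)


def nearest_neighbors(query_id, profiles, k):
    """Return the k profiles most highly correlated with the query."""
    query = profiles[query_id]
    scored = [
        (probe_id, pearson(query, profile))
        for probe_id, profile in profiles.items()
        if probe_id != query_id
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Hypothetical log fold-change profiles across six time points.
profiles = {
    "query": [0.0, 0.1, 0.5, 2.0, 3.0, 3.4],
    "induced_late": [0.0, 0.2, 0.6, 2.2, 3.1, 3.3],
    "transient_peak": [0.0, 0.8, 1.5, 0.6, 0.2, 0.1],
    "repressed": [0.0, -0.2, -0.9, -1.8, -2.2, -2.4],
}

top = nearest_neighbors("query", profiles, k=2)
for probe_id, r in top:
    print(probe_id, round(r, 3))
```

Because correlation compares the shape of a profile rather than its magnitude, a weakly induced transcript can still rank as a close neighbor of a strongly induced one if both rise at the same time points.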
Nearest-neighbor gene analysis was conducted for probe set 168291_f_at (“sequence T”) to identify other transcripts with similar expression patterns during the time course. The 50 most highly correlated probe sets across the data set all had correlation coefficients of 0.96 or greater (http://www.chb-genomics.org/beggslab/). Not surprisingly, the vast majority of the nearest neighbors (n=41) belong to Group III (clusters 9−13). As expected, the nearest neighbors of 168291_f_at included the second probe set for the same gene. Furthermore, the nearest neighbors list contains 19 transcripts with “muscle development” GO annotations. The list includes many skeletal and cardiac sarcomeric proteins such as four myosins (Myla, Mylpf, Mylft, Myh3), two actins (Acta1, Actc1), myomesin (Myom2) (3 probe sets), myosin binding protein H (Mybph), tropomyosin-1 (Tpm1), six probe sets for various troponin isoforms (Tnni1, Tnnt2, Tnnt3, Tncc, Tncs) and a probe set for the cardiac/fast calcium transporter ATPase (Atp2a1).

Microarray data validation

To examine how transcriptional patterns determined by GeneChip analysis correlate with protein levels and localization, selected genes were analyzed at the protein level (Fig. 4). Several known muscle membrane and sarcomeric genes that show changing expression patterns in differentiated cells were chosen for the analysis. One of the most highly induced genes in fused cells was caveolin-3 (Cav3), a caveolar scaffolding membrane protein mutated in limb girdle muscular dystrophy type-1C (33). Two probe sets for Cav3 are present in the data set, and they belong to clusters 9 and 10 with day 10/day –2 fold changes of 646.7 and 109.4, respectively. This gene was validated in three different ways: by RT-PCR, immunohistochemistry and Western blot (Fig. 3A, 4A−C). The relative quantity of RNA in RT-PCR correlated well with results from GeneChip. Caveolin-3 protein was undetectable by Western blotting on days –2, –1, and 0, but levels increased gradually starting at day 2. Immunohistochemistry confirmed this finding, with expression limited to differentiated myotubes later in the time course. Double staining with antibodies to the sarcomeric proteins α-actinin-3 (cluster 12) and myosin heavy chain (fast) (cluster 10) suggests that both of them are absent in proliferating cells but are highly expressed beginning on day 2 (Fig. 4A, B, D). In contrast, the dystrophin glycoprotein complex member dystroglycan-1 (cluster 11) was detected by Western blot beginning on day –1, and GeneChip analysis scored this probe set as present throughout the time course (Fig. 4A, B, E).

Aquaporin-1 (Aqp1), a down-regulated transcript from cluster 1, represents an example where transcript and protein levels were not directly correlated. Western blot analysis revealed two bands: a 28 kDa, unglycosylated form and a 45 kDa, glycosylated form (Fig. 4A, B, E). Interestingly, the unglycosylated form of aquaporin-1 protein correlated very well with mRNA expression levels on the microarray chip. However, over time, levels of the glycosylated form accumulated, causing total protein levels to be inversely related to the transcript levels. Immunohistochemistry studies confirmed that aquaporin-1 was present throughout the time course but mostly in mononucleate cells that did not fuse (Fig. 4E).

DISCUSSION

Temporal patterns of gene expression during C2C12 cell myogenic differentiation

Myogenesis is a complex cascade of discrete developmental steps that are under strict transcriptional control. The early stages of this highly ordered process were recapitulated by the differentiating C2C12 cells, which progressed through a predictable pattern of myogenic events during the 12-day time course.
The length of our experiment allowed us to observe events whose dynamics were only apparent over a long time period. The primary MRFs, Myf5 and Myod1, were scored “Present” throughout the entire time course (i.e., from day –2 onward). However, Myf5 expression (cluster 3) decreased gradually while Myod1 transcripts peaked on day 0 (cluster 5) (Fig. 5A). In contrast, the secondary MRFs, myogenin (cluster 10) and Myf6 (cluster 15), were induced only later in the time course. Myogenin was turned on at day 0 and had the highest expression values on day 2. Myf6 was scored “Absent” in proliferating cells but was expressed by day 4. Four probe sets for Mef2c were up-regulated by day 2 as well. Thus, as expected from previous studies (16–18), each transcription factor exhibited unique expression kinetics.

Despite significant differences in methodologies, a previous analysis by Moran et al. (16) of C2C12 cell gene expression over four days of differentiation produced similar results for many highly induced muscle-specific genes (Supplemental Tables). These workers used one-way nested ANOVA tests and a 2.5-fold change threshold to select 629 probe sets from 5303 transcripts scored “present” on Affymetrix Mu11K GeneChips. Data on these probe sets were grouped into 16 clusters of two to 92 transcripts each, using a self-organizing map algorithm (34) that requires the user to select the number of centroids and their configuration. In contrast, CAGED uses a constant statistical measure to identify clusters based on maximizing the probability that two trajectories belong to a given model (19), resulting in the identification of 22 distinct clusters containing from five to 787 probe sets. A notable difference between these analyses is a lack of any apparent enrichment of immune regulation genes in the CAGED-derived clusters of the present analysis. Of the 2895 filtered probe sets, only 28 were annotated as belonging to the inflammatory response pathway by GenMAPP, and these were distributed among 13 different clusters without any apparent enrichment for one or another pattern of expression. Of six inflammatory response genes identified by Moran et al. as exhibiting a transient peak of expression on day zero (cluster 1,4 of Moran et al.), two (Gro1/j04596_s_at/95348_at; J04596 and Cxcl5/Scyb5/U27267_s_at/98772_at; U27267) were grouped together in cluster 6 by CAGED and one probe set for Vcam1 (92559_at; U12884) was in cluster 5. However, three others (Il18, Saa3 and Ccl9/Scya9) were scored absent in the present analysis. Such variation between cognate probe sets from different generations of Affymetrix GeneChips has been well documented (35), making independent validation by complementary methods necessary for genes of particular interest in any given study.

At day –2, the cultures contained rapidly proliferating mononucleate myoblasts, but by day 0, the cells were confluent and becoming quiescent. Cell cycle withdrawal was essentially complete by day 2. The GeneChip analysis confirmed that this process was associated with reductions in transcript levels for a number of genes involved in protein and nucleic acid metabolism, transcription, cell-cycle regulation, cell signaling, and ion transport (Fig. 2). Cyclins, together with the cyclin-dependent kinases (cdks), are major components of the cell-cycle regulatory system (22). As predicted, the cyclins Ccna2, Ccnb2, and Ccnd1 were significantly down-regulated between day –1 and day 0, while Ccnd3 expression increased between days –1 and +2 and remained high throughout the rest of the time course (Fig. 5B). Furthermore, Ccnd3 is a member of cluster 14 together with two probe sets for the Retinoblastoma (Rb) gene.
Active Rb family members and cyclin-dependent kinase inhibitors need to be up-regulated for permanent withdrawal of myoblasts from the cell cycle (36). Conversely, the negative regulators of cell division, Cdkn1a (P21) and Cdkn1c (P57), are required for myogenic differentiation (37) and in our experiment were induced between days –1 and 2 (Fig. 5B). The switch from a proliferative to a differentiative state in this time course occurred primarily between day –1 and day 2 and was associated with a transient peak of expression for a large class of transcripts clustered into Group II (Fig. 5C). These included many genes involved in cell adhesion, cell-cycle arrest, signaling, and transcription (Fig. 2). Genes in these clusters include Vcam1, Itgb3, Itga5, and Vcl. Vcam1 is known to be involved in the process of cell fusion (38) while Itga5 (fibronectin receptor α) is an integrin important for proper adhesion (39). The cytoskeletal protein vinculin (Vcl), which plays a critical role in the assembly of microfilament/plasma membrane junctions at cell contacts, is also known to take part in the process of cell adhesion (40). Interestingly, several genes that belong to this group were not previously associated with a role in skeletal muscle differentiation. For example, the prostaglandin E receptor 4 (Ptger4), which is involved in bone formation (41), cell signaling, and the immune response (42), is up-regulated specifically on day 0 (fold change 2.1). As various prostaglandins are known to be involved in skeletal muscle fusion (43–45), Ptger4 represents a good candidate for a novel gene involved in this process. Between days 2 and 10, differentiating myotubes matured considerably, increasing in number and size as well as in complexity. Sarcomere assembly occurred throughout this period leading to the development of a functional contractile apparatus as evidenced by significant numbers of spontaneously contracting myotubes between days 6 and 10. 
Many known muscle-specific genes were significantly up-regulated at the start of this phase, with transcription remaining at high levels throughout the remainder of the time course. Markers of late differentiation, such as genes coding for proteins involved in the assembly of the sarcomeric contractile apparatus, including various isoforms of myosins, troponins, myomesins and many others, were scored “Absent” by MAS 5.0 early in the time course but “Present” after myogenic fusion began (i.e., day 2 and later). In contrast, a number of transcripts involved in excitation-contraction coupling, including 14 transcripts with GO classification as calcium ion binding proteins (such as ATPase Atp2a2), were mostly scored “Present” from the beginning of the time course (i.e., at day –2), but their expression increased dramatically on day 0 and remained up-regulated for the remainder of the time course (Fig. 5D). Moreover, clusters in this group contain the voltage-dependent calcium channels (Cacna1s, Cacna2d1, Cacnb1 and Cacng1), acetylcholinesterase, ryanodine receptor 1, the nicotinic cholinergic receptors (Chrna1, Chrnd, Chrng) and many other genes important for cellular excitability and contraction. In this group, cluster 10 is particularly enriched in structural muscle genes, enzymes, and myogenic transcripts (Supplemental Table 3). It is therefore worth noting that this cluster contains a gene not previously known to be involved in skeletal muscle development: collagenous repeat-containing sequence (Corcs, pending), a mouse homologue of a recently cloned human gene, CORS26, that has been implicated in the early stages of bone formation (46).

C2C12 cells are multipotential mesenchymal cells capable of transdifferentiating into either adipose (47, 48) or osteoblast (49, 50) lineages under specific culture conditions. However, a search of GO annotations among the 2895 probe sets representing differentially expressed transcripts revealed only one adipocyte-specific (Pparg, fold change day 10/day –2 = –19.6) and one osteoblast/clast-specific (Tob1, fold change day 10/day –2 = 2.9) gene in the filtered data set. Thus, myogenic differentiation appears to be the primary developmental pathway modeled in this system.
It is worth noting that multiple probe sets of the same gene were occasionally assigned to two distinct clusters. However, in each of these cases, the patterns of kinetics over the time course were similar and the different clusters always belonged to the same groups. For example, two probe sets of calpain 3 (GenBank ID X92523) are in cluster 12, and a third probe set (GenBank ID AF091998) is in cluster 13. This disparity may be explained by variability in the clone source of a target sequence or variation in the quality of the Affymetrix probes (35). In some instances, different probe sets may represent splice variants or may cross-hybridize to multiple members of conserved gene families or transcripts with different poly-A sites. In particular, Affymetrix probe sets designated with an “f” in the name may recognize multiple members of a sequence family (e.g., myosin probe sets 162101_f_at, 170347_f_at, and 98616_f_at from cluster 10, Supplemental Table 3). In these instances, if defining the identity of a particular isoform is critical, isoform-specific RT-PCR assays, such as those described above for “sequences T” and “K,” are necessary.

Novel genes

Of the 2895 probe sets included in the CAGED analysis, 927 (32%) represent transcripts for uncharacterized genes. These were distributed among almost all clusters, representing all four Groups of expression kinetics. Multiple probe sets for unknown genes were classified together with known genes involved in every step of muscle differentiation. For example, we identified and confirmed the expression patterns for several transcripts that appear interesting based on their predicted protein domain structure, pattern of expression or cluster localization (Fig. 3). “Sequence T,” represented by probe sets 168291_f_at and 167624_f_at (UniGene cluster Mm.197693), is one such transcript whose corresponding amino acid sequence contains immunoglobulin and fibronectin type III domains with similarities to amino-terminal sequences of titin and obscurin. However, the predicted “sequence T” transcript also encodes a unique region, not homologous with any known genes. RT-PCR with primers designed in this specific region demonstrated that there is no expression in proliferating myoblasts, whereas the transcript is highly up-regulated on day 2, when the first myotubes are formed. These findings, together with the fact that many nearest neighbors of this titin-like RIKEN clone are sarcomeric genes, make it an interesting candidate for further study.

Summary

Understanding the temporal expression of thousands of genes involved in muscle development is a challenging task. Despite the fact that microarrays detect only changes in transcript levels, mRNA expression profiling remains a powerful technique. The present data demonstrate that a majority of genes previously studied at both the mRNA and protein level in differentiating C2C12 cells correlate well with the expression patterns detected by Affymetrix GeneChips. Furthermore, with the exception of Aqp1, we saw good correlations between the GeneChip mRNA expression levels, RT-PCR results and detected protein levels for each of the transcripts analyzed. Cluster analysis using CAGED enabled us to group genes according to their expression dynamics over time and resulted in classification of transcripts with similar functions into the same clusters or clusters in the same groups. However, not all transcripts are annotated, and the potential role(s) of many in muscle differentiation remain unknown. Supplemental Tables 1−4 list many unknown genes, particularly those classified into Group III clusters, that likely represent novel genes modulated during myogenesis. Further characterization and more detailed studies of each will lead to a better understanding of the process of muscle differentiation.
Moreover, the present data are available to serve as a basis for comparison with gene manipulation studies aimed at understanding human muscle diseases by over- or underexpressing normal and abnormal neuromuscular disease genes in vitro.

ACKNOWLEDGMENTS

We thank Leslie Frieden and Drs. Michal Tomczak and Victoria Petkova for much help and advice. We also thank Christa Abraham, Daniel Newburger, and Elizabeth Asch for excellent technical assistance and help with figures and Dr. Emmanuela Gussoni for many helpful comments and critical reading of the manuscript. This work was supported by NIH grants R01 AR44345 and P01 NS40828, the Muscular Dystrophy Association of the USA, the Joshua Frase Foundation and the Lee and Penny Anderson Family Foundation. LMK is an investigator of the Howard Hughes Medical Institute.

REFERENCES

1. Weintraub, H., Davis, R., Tapscott, S., Thayer, M., Krause, M., Benezra, R., Blackwell, T. K., Turner, D., Rupp, R., Hollenberg, S., et al. (1991) The myoD gene family: nodal point during specification of the muscle cell lineage. Science 251, 761–766

2. Chambers, R. L., and McDermott, J. C. (1996) Molecular basis of skeletal muscle regeneration. Can. J. Appl. Physiol. 21, 155–184

3. Anderson, J. E. (1998) Murray L. Barr Award Lecture. Studies of the dynamics of skeletal muscle regeneration: the mouse came back! Biochem. Cell Biol. 76, 13–26

4. McKinsey, T. A., Zhang, C. L., and Olson, E. N. (2001) Control of muscle development by dueling HATs and HDACs. Curr. Opin. Genet. Dev. 11, 497–504

5. Megeney, L. A., and Rudnicki, M. A. (1995) Determination versus differentiation and the MyoD family of transcription factors. Biochem. Cell Biol. 73, 723–732

6. Perry, R. L., and Rudnicki, M. A. (2000) Molecular mechanisms regulating myogenic determination and differentiation. Front. Biosci. 5, D750–D767

7. Friday, B. B., Mitchell, P. O., Kegley, K. M., and Pavlath, G. K. (2003) Calcineurin initiates skeletal muscle differentiation by activating MEF2 and MyoD. Differentiation 71, 217–227

8. Yun, K., and Wold, B. (1996) Skeletal muscle determination and differentiation: Story of a core regulatory network and its context. Curr. Opin. Cell Biol. 8, 877–889

9. Sabourin, L. A., and Rudnicki, M. A. (2000) The molecular regulation of myogenesis. Clin. Genet. 57, 16–25

10. Gossett, L. A., Kelvin, D. J., Sternberg, E. A., and Olson, E. N. (1989) A new myocyte-specific enhancer-binding factor that recognizes a conserved element associated with multiple muscle-specific genes. Mol. Cell. Biol. 9, 5022–5033

11. Naya, F. S., and Olson, E. (1999) MEF2: a transcriptional target for signaling pathways controlling skeletal muscle growth and differentiation. Curr. Opin. Cell Biol. 11, 683–688

12. Andres, V., and Walsh, K. (1996) Myogenin expression, cell cycle withdrawal, and phenotypic differentiation are temporally separable events that precede cell fusion upon myogenesis. J. Cell Biol. 132, 657–666

13. Franklin, D. S., and Xiong, Y. (1996) Induction of p18INK4c and its predominant association with CDK4 and CDK6 during myogenic differentiation. Mol. Biol. Cell 7, 1587–1599

14. Yaffe, D., and Saxel, O. (1977) Serial passaging and differentiation of myogenic cells isolated from dystrophic mouse muscle. Nature 270, 725–727

15. Blau, H. M., Chiu, C. P., and Webster, C. (1983) Cytoplasmic activation of human nuclear genes in stable heterocaryons. Cell 32, 1171–1180

16. Moran, J. L., Li, Y., Hill, A. A., Mounts, W. M., and Miller, C. P. (2002) Gene expression changes during mouse skeletal myoblast differentiation revealed by transcriptional profiling. Physiol. Genomics 10, 103–111

17. Shen, X., Collier, J. M., Hlaing, M., Zhang, L., Delshad, E. H., Bristow, J., and Bernstein, H. S. (2003) Genome-wide examination of myoblast cell cycle withdrawal during differentiation. Dev. Dyn. 226, 128–138

18. Delgado, I., Huang, X., Jones, S., Zhang, L., Hatcher, R., Gao, B., and Zhang, P. (2003) Dynamic gene expression during the onset of myoblast differentiation in vitro. Genomics 82, 109–121

19. Ramoni, M. F., Sebastiani, P., and Kohane, I. S. (2002) Cluster analysis of gene expression dynamics. Proc. Natl. Acad. Sci. USA 99, 9121–9126

20. Sanoudou, D., Haslett, J. N., Kho, A. T., Guo, S., Gazda, H. T., Greenberg, S. A., Lidov, H. G., Kohane, I. S., Kunkel, L. M., and Beggs, A. H. (2003) Expression profiling reveals altered satellite cell numbers and glycolytic enzyme transcription in nemaline myopathy muscle. Proc. Natl. Acad. Sci. USA 100, 4666–4671

21. Liu, G., Loraine, A. E., Shigeta, R., Cline, M., Cheng, J., Valmeekam, V., Sun, S., Kulp, D., and Siani-Rose, M. A. (2003) NetAffx: Affymetrix probesets and annotations. Nucleic Acids Res. 31, 82–86

22. Walsh, K., and Perlman, H. (1997) Cell cycle exit upon myogenic differentiation. Curr. Opin. Genet. Dev. 7, 597–602

23. Inoue, C., Bae, S. K., Takatsuka, K., Inoue, T., Bessho, Y., and Kageyama, R. (2001) Math6, a bHLH gene expressed in the developing nervous system, regulates neuronal versus glial differentiation. Genes Cells 6, 977–986

24. Liu, L. A., and Engvall, E. (1999) Sarcoglycan isoforms in skeletal muscle. J. Biol. Chem. 274, 38,171–38,176

25. McNally, E. M., de Sa Moreira, E., Duggan, D. J., Bonnemann, C. G., Lisanti, M. P., Lidov, H. G., Vainzof, M., Passos-Bueno, M. R., Hoffman, E. P., Zatz, M., et al. (1998) Caveolin-3 in muscular dystrophy. Hum. Mol. Genet. 7, 871–877

26. Amthor, H., Connolly, D., Patel, K., Brand-Saberi, B., Wilkinson, D. G., Cooke, J., and Christ, B. (1996) The expression and regulation of follistatin and a follistatin-like gene during avian somite compartmentalization and myogenesis. Dev. Biol. 178, 343–362

27. Muntoni, F., Brown, S., Sewry, C., and Patel, K. (2002) Muscle development genes: Their relevance in neuromuscular disorders. Neuromuscul. Disord. 12, 438–446

28. Bober, E., Lyons, G. E., Braun, T., Cossu, G., Buckingham, M., and Arnold, H. H. (1991) The muscle regulatory gene, Myf-6, has a biphasic pattern of expression during early mouse development. J. Cell Biol. 113, 1255–1265

29. Patapoutian, A., Yoon, J. K., Miner, J. H., Wang, S., Stark, K., and Wold, B. (1995) Disruption of the mouse MRF4 gene identifies multiple waves of myogenesis in the myotome. Development 121, 3347–3358

30. Ikeda, K., Emoto, N., Matsuo, M., and Yokoyama, M. (2003) Molecular identification and characterization of a novel nuclear protein whose expression is up-regulated in insulin-resistant animals. J. Biol. Chem. 278, 3514–3520

31. Young, P., Ehler, E., and Gautel, M. (2001) Obscurin, a giant sarcomeric Rho guanine nucleotide exchange factor protein involved in sarcomere assembly. J. Cell Biol. 154, 123–136

32. Granzier, H., and Labeit, S. (2002) Cardiac titin: An adjustable multi-functional spring. J. Physiol. 541, 335–342

33. Galbiati, F., Razani, B., and Lisanti, M. P. (2001) Caveolae and caveolin-3 in muscular dystrophy. Trends Mol. Med. 7, 435–441

34. Tamayo, P., Slonim, D., Mesirov, J., Zhu, Q., Kitareewan, S., Dmitrovsky, E., Lander, E. S., and Golub, T. R. (1999) Interpreting patterns of gene expression with self-organizing maps: Methods and application to hematopoietic differentiation. Proc. Natl. Acad. Sci. USA 96, 2907–2912

35. Nimgaonkar, A., Sanoudou, D., Butte, A. J., Haslett, J. N., Kunkel, L. M., Beggs, A. H., and Kohane, I. S. (2003) Reproducibility of gene expression across generations of Affymetrix microarrays. BMC Bioinformatics 4, 27

36. Halevy, O., Novitch, B. G., Spicer, D. B., Skapek, S. X., Rhee, J., Hannon, G. J., Beach, D., and Lassar, A. B. (1995) Correlation of terminal cell cycle arrest of skeletal muscle with induction of p21 by MyoD. Science 267, 1018–1021

37. Zhang, P., Wong, C., Liu, D., Finegold, M., Harper, J. W., and Elledge, S. J. (1999) p21(CIP1) and p57(KIP2) control muscle differentiation at the myogenin step. Genes Dev. 13, 213–224

38. Rosen, G. D., Sanes, J. R., LaChance, R., Cunningham, J. M., Roman, J., and Dean, D. C. (1992) Roles for the integrin VLA-4 and its counter receptor VCAM-1 in myogenesis. Cell 69, 1107–1119

39. Taverna, D., Disatnik, M. H., Rayburn, H., Bronson, R. T., Yang, J., Rando, T. A., and Hynes, R. O. (1998) Dystrophic muscle in mice chimeric for expression of alpha5 integrin. J. Cell Biol. 143, 849–859

40. Rudiger, M. (1998) Vinculin and alpha-catenin: shared and unique functions in adherens junctions. Bioessays 20, 733–740

41. Machwate, M., Harada, S., Leu, C. T., Seedor, G., Labelle, M., Gallant, M., Hutchins, S., Lachance, N., Sawyer, N., Slipetz, D., et al. (2001) Prostaglandin receptor EP(4) mediates the bone anabolic effects of PGE(2). Mol. Pharmacol. 60, 36–41

42. Kabashima, K., Sakata, D., Nagamachi, M., Miyachi, Y., Inaba, K., and Narumiya, S. (2003) Prostaglandin E(2)-EP4 signaling initiates skin immune responses by promoting migration and maturation of Langerhans cells. Nat. Med. 9, 744–749

43. Rossi, M. J., Clark, M. A., and Steiner, S. M. (1989) Possible role of prostaglandins in the regulation of mouse myoblasts. J. Cell. Physiol. 141, 142–147

44. Zalin, R. J. (1977) Prostaglandins and myoblast fusion. Dev. Biol. 59, 241–248

45. Entwistle, A., Curtis, D. H., and Zalin, R. J. (1986) Myoblast fusion is regulated by a prostanoid of the one series independently of a rise in cyclic AMP. J. Cell Biol. 103, 857–866

46. Maeda, T., Abe, M., Kurisu, K., Jikko, A., and Furukawa, S. (2001) Molecular cloning and characterization of a novel gene, CORS26, encoding a putative secretory protein and its possible involvement in skeletal development. J. Biol. Chem. 276, 3628–3634

47. Yeow, K., Phillips, B., Dani, C., Cabane, C., Amri, E. Z., and Derijard, B. (2001) Inhibition of myogenesis enables adipogenic trans-differentiation in the C2C12 myogenic cell line. FEBS Lett. 506, 157–162

48. Grimaldi, P. A., Teboul, L., Inadera, H., Gaillard, D., and Amri, E. Z. (1997) Transdifferentiation of myoblasts to adipoblasts: Triggering effects of fatty acids and thiazolidinediones. Prostaglandins Leukot. Essent. Fatty Acids 57, 71–75

49. Ohyama, M., Suzuki, N., Yamaguchi, Y., Maeno, M., Otsuka, K., and Ito, K. (2002) Effect of enamel matrix derivative on the differentiation of C2C12 cells. J. Periodontol. 73, 543–550

50. Yamaguchi, A. (1995) Regulation of differentiation pathway of skeletal mesenchymal cells in cell lines by transforming growth factor-beta superfamily. Semin. Cell Biol. 6, 165–173

Received July 21, 2003; accepted October 21, 2003.


Figure 1. CAGED 1.1 clustering of gene expression patterns during myogenesis. (A) Gene expression profiles of 2895 probe sets distributed in 22 clusters (labeled at top left of each graph) and further grouped in four main groups (indicated at left). Plotted are the natural logarithms of the fold change (relative to day –2) vs. time in days. The day 8 time point is omitted, as data for the MG U74C chip were not available. Group I (clusters 1−3): genes with down-regulated expression signals in differentiating cells. Group II (clusters 4−8): genes with a peak of expression at day 0. Group III (clusters 9−17): genes up-regulated in fused cells. Group IV (clusters 18−22): genes with relatively “flat” expression profiles. (B) Schematic representation of the time course: At day −2 and day −1, myoblasts are proliferating; at day 0, growth medium is replaced by fusion medium. At day 2, the first multinucleate myotubes are formed. Triplicate samples were analyzed at each of the indicated time points.


Figure 2. Functional categorization of genes clustered by CAGED into Groups I−III. At left are shown the average profiles of all genes classified into the indicated clusters for Groups I−III (from top to bottom). Signal values for all members of a cluster were normalized as ratios of the first value (day –2), transformed on a natural logarithmic scale and computed as pointwise averages of all cluster members. At right are pie charts indicating total numbers of annotated genes grouped into the 11 most frequently identified functional categories by the Gene Ontology Consortium. Total numbers of probe sets and numbers of uncharacterized transcripts (“ESTs”) are also indicated for each group.
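The averaging procedure described in this legend (ratio to the day –2 value, natural-log transform, pointwise average over cluster members) can be sketched as follows. This is a minimal illustration, not the CAGED implementation; the function names and signal values are hypothetical.

```python
import math


def log_ratio_profile(signals):
    """Normalize a time course to its first value and take natural logs."""
    if any(value <= 0 for value in signals):
        raise ValueError("signal values must be positive")
    first = signals[0]
    return [math.log(value / first) for value in signals]


def cluster_average_profile(members):
    """Pointwise average of the log-ratio profiles of all cluster members."""
    profiles = [log_ratio_profile(signals) for signals in members]
    n_points = len(profiles[0])
    if any(len(profile) != n_points for profile in profiles):
        raise ValueError("all members must share the same time points")
    return [
        sum(profile[i] for profile in profiles) / len(profiles)
        for i in range(n_points)
    ]


# Two hypothetical cluster members measured at three time points.
members = [
    [100.0, 200.0, 400.0],
    [50.0, 50.0, 200.0],
]
avg = cluster_average_profile(members)
print([round(value, 3) for value in avg])
```

Because every member is first divided by its own day –2 value, each average profile starts at zero, and members with very different absolute signal intensities contribute equally to the shape of the curve.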


Figure 3. Comparison of quantitative real-time PCR (RQ-PCR) and microarray data for caveolin-3, diabetes-related ankyrin repeats (Darp) and selected ESTs. Data are normalized to values at day –2 and natural-log transformed. Relative fold changes for Cav3 (A) and 168291_f_at (“Sequence T”) (C) obtained from microarrays were comparable to those obtained from RT-PCR, while fold changes for Darp (B) and 133932_at (“Sequence K”) (D) showed differences in magnitude, but not in direction, between the two methods. Apparent fold changes for Darp and “Sequence K” are lower by GeneChip analysis than by RT-PCR, likely due to a greater dynamic range for the latter method.


Figure 4. Comparison of GeneChip data with protein levels and localization for selected genes. (A) Chart representing changes in mRNA levels detected by GeneChip analysis and normalized by CAGED 1.1 as in Figure 1 (logarithmic scale). Aqp1 = aquaporin 1 (93330_at, L02914); Dag1 = dystroglycan 1 (101109_at, U43512); Actn3 = α-actinin-3 (100879_at, AF093775); Cav3 = caveolin-3 (162297_s_at, AV023068); Myh = myosin heavy chain, polypeptide 1 (98308_at, AJ002522). (B) Immunoblot analysis demonstrates two forms of aquaporin 1: the 28 kD lower band represents unglycosylated protein, while the 45 kD and larger upper bands reflect glycosylated forms. Anti-β-dystroglycan antibody detects low levels of the protein at day –1, while α-actinin-3 and caveolin-3 are detected in fused cells only. Anti-pan actin antibody (at bottom) detects both constant levels of cytoplasmic β actin and increasing levels of sarcomeric actins across the time course. Days when protein lysates were prepared are indicated at top. (C) Representative pictures of immunohistochemistry of Cav3 (green), (D) double-label immunofluorescence of Actn3 (green) and Myh, fast (red), (E) double-label immunofluorescence of Aqp1 (green) and α-dystroglycan (red). Nuclei were stained with DAPI. Each day of the time course was evaluated; representative pictures are presented (indicated at top).


Figure 5. Transcriptional patterns of expression for selected genes involved in myogenesis. GeneChip signal values were processed by CAGED as described above. Represented are transcription patterns for myogenic transcription factors (A), cyclins and other cell cycle control genes (B), extracellular matrix-related and other Group II genes (C) and muscle-specific excitation-contraction coupling and other ion channel genes classified into Group III clusters (D).