
DEVELOPMENTAL DYNAMICS 232:877–882, 2005

REVIEWS–A PEER REVIEWED FORUM

Genomic Resources for Chicken

Parker B. Antin1,2* and Jay H. Konieczka2

The recent sequencing and draft assembly of a chicken genome has provided biologists with an invaluable research tool that complements a growing list of additional avian genomic resources. For many researchers, finding and using these resources is challenging, because information is presented through an increasing number of Web sites and browser navigation frequently requires specific knowledge and expertise. This primer provides an overview of online genomic resources for the chicken, including the Ensembl, UCSC, and NCBI annotated chicken genome browsers; expressed sequence tag and in situ hybridization databases; and sources for microarrays, cDNAs, and bacterial artificial chromosomes (BACs). Several short tutorials oriented toward the biologist with limited bioinformatics skills outline how to retrieve several types of commonly needed information and reagents. Developmental Dynamics 232:877–882, 2005. © 2005 Wiley-Liss, Inc.

Key words: avian; BAC; chicken; Ensembl; EST; gallus; genome; in situ hybridization; microarray; NCBI; UCSC genome browser

Received 6 November 2004; Revised 7 December 2004; Accepted 8 December 2004

INTRODUCTION

Since the beginning of the modern era of experimental biology more than one hundred years ago, the chicken has been the most widely used nonmammalian model organism. Research using the chicken has spanned a remarkably broad range of fields, from axis development, neurogenesis, limb and cardiovascular development, to somitogenesis, integument development, virology, immunology, cancer, and gene regulation (Lewis, 1919; Rawles, 1943; Hamburger and Hamilton, 1951; DeHaan, 1963; Cooper et al., 1966; Saunders, 1972; Schoenwolf et al., 1989; Rose, 2000; Bertocchini and Stern, 2002; Ruijtenbeek et al., 2002). The chicken is also unique among model research organisms because it is one of the most important and fastest growing human food sources (Rosegrant et al., 2001); the chicken research community consists of both biomedical and agricultural scientists. Recent completion of a draft chicken genome sequence, therefore, is impacting many areas of research.

For many researchers, identifying and using online genomics resources is challenging, because information is presented through a rapidly increasing number of Web sites and navigating their browsers frequently requires significant background knowledge and expertise. To assist avian biologists with limited bioinformatics skills and to provide a reference resource for more experienced users, this primer presents a brief overview of avian genomic resources plus short tutorials covering several common genomics-related tasks. Links to each Web site are provided in the text and are summarized in Table 1. An active links version of Table 1 is accessible at http://www3.interscience.wiley.com/cgi-bin/jabout/38417/OtherResources.html. AvianNet, the avian information network (http://www.chicken-genome.org), also contains updated links to avian genomic resources and information.

1 Department of Cell Biology and Anatomy, University of Arizona, Tucson, Arizona
2 Department of Molecular and Cellular Biology, University of Arizona, Tucson, Arizona
*Correspondence to: Dr. Parker B. Antin, Department of Cell Biology and Anatomy, PO Box 245044, University of Arizona, Tucson, AZ 85724. E-mail: pba@email.arizona.edu

DOI 10.1002/dvdy.20339
Published online 28 February 2005 in Wiley InterScience (www.interscience.wiley.com).

BRIEF HISTORY OF AVIAN GENOMICS

The history of efforts to develop avian genomic resources reflects the overlapping goals and needs of the biomedical and poultry research communities. Early efforts to map the genome were spearheaded by poultry scientists seeking to identify quantitative trait loci (QTL) in breeding flocks. Over the span of several decades, this effort resulted in increasingly detailed QTL maps and bacterial artificial chromosome (BAC) libraries that were crucial for rapid assembly of the genomic sequence (for review, see Dodgson, 2003). With a focus on experimentation, biomedical researchers sought to clone and identify cDNAs coding for proteins involved in specific cellular, developmental or physiological processes. In 1998, a white paper proposal to sequence a chicken genome was authored by members of the poultry, biomedical research, and genomics communities (http://www.chicken-genome.org/events/reports.html). In autumn 2002, the chicken genome was placed on the list of NIH funded genome projects, and in early 2003, sequencing was initiated at Washington University Genome Sequencing Center (WUGSC).

The first International Chicken Genome Workshop was held in March 2003 at Hinxton, England (http://www.chicken-genome.org/events/icgwr2003.html) to discuss the chicken genome project and ancillary genomic resources under development. The International Chicken Genome Consortium was organized at this meeting to oversee and rally support for these efforts. It was also agreed that Dave Burt would organize AvianNet, a comprehensive portal to information on the chicken genome and chicken biology (http://www.chicken-genome.org). A second chicken genomics meeting was held in April 2004 at the Stowers Institute in Kansas City, Missouri (see meeting report elsewhere in this issue). The third chicken genomics meeting will be held at Cold Spring Harbor Laboratory May 8–11, 2005, and future annual meetings are scheduled for this site.

TABLE 1. Summary of Links and Sites

Chicken information portals
  AvianNet: http://www.chicken-genome.org
  NCBI Chicken Genome Resources: http://www.ncbi.nlm.nih.gov/projects/genome/guide/chicken/
  Active Links version of this table: http://www3.interscience.wiley.com/cgi-bin/jabout/38417/OtherResources.html

Genome browsers
  Washington University Genome Sequencing Center (WUGSC): http://genome.wustl.edu/projects/chicken/
  University of California, Santa Cruz (UCSC): http://genome.ucsc.edu/cgi-bin/hgGateway
  Ensembl: http://www.ensembl.org/Gallus_gallus/
  NCBI Chicken Genome MapViewer: http://www.ncbi.nlm.nih.gov/mapview/map_search.cgi?taxid=9031

cDNAs/ESTs/SNPs
  NCBI dbEST: http://www.ncbi.nlm.nih.gov/dbEST/
  BBSRC ChickEST database: http://www.chick.umist.ac.uk
  ARK Genomics: http://www.ark-genomics.org/resources/chicken.php
  U.D. Chick EST database: http://www.chickest.udel.edu
  Bursal Transcript database: http://pheasant.gsf.de/DEPARTMENT/DT40/dt40Transcript.html
  TIGR G. gallus gene index: http://www.tigr.org/tigr-scripts/tgi/T_index.cgi?species=g_gallus
  Chicken variation database: http://chicken.genomics.org.cn/

BAC resources
  MRC geneservice: http://www.hgmp.mrc.ac.uk/geneservice/reagents/products/descriptions/chicken_BAC.shtml
  CHORI-261 BAC library: http://bacpac.chori.org/chicken261.htm

Microarrays
  ARK-Genomics: http://www.ark-genomics.org/resources/chickens.php
  Affymetrix: http://www.affymetrix.com/products/arrays/specific/chicken.affx
  Fred Hutchinson Cancer Center: http://www.fhcrc.org/shared-resources/genomics
  University of Delaware: http://udgenome.ags.udel.edu/~cogburn

In situ hybridization
  GEISHA: http://geisha.biosci.arizona.edu

White paper/meeting reports
  Chicken Genome White Paper: http://www.chicken-genome.org/events/reports.html
  International Chicken Genome Workshop, March 11–13, 2003: http://www.chicken-genome.org/events/icgwr2003.html

CHICKEN GENOME PROJECT AND GENOME BROWSERS

The chicken genome is organized into 38 pairs of autosomes plus the two sex chromosomes Z and W and has a haploid size of 1.06 × 10^9 bp. The genome of a single female inbred red jungle fowl (Gallus gallus) was sequenced and assembled by the WUGSC (International Chicken Genome Sequencing Consortium, 2004). A 6.6-fold whole genome shotgun coverage was obtained and then aligned to scaffolds of BAC end sequences using a detailed fingerprint map (Ren et al., 2003). The initial draft assembly was announced on March 1, 2004 (http://www.genome.gov/page.cfm?pageID=11510730), and updated assemblies are accessible by means of several Web portals. A physical map of the chicken genome has also been published recently (Wallis et al., 2004). The WUGSC site (http://genome.wustl.edu/projects/chicken; Table 1) contains downloadable sequences for individual chromosomes in several formats, and annotated genome browsers are accessible at Ensembl (http://www.ensembl.org/Gallus_gallus; Birney et al., 2004; Curwen et al., 2004; Stabenau et al., 2004; Stalker et al., 2004), the University of California Santa Cruz (UCSC; http://genome.ucsc.edu/cgi-bin/hgGateway; Kent et al., 2002), and the NCBI Chicken Genome Resource Page (http://www.ncbi.nlm.nih.gov/genome/guide/chicken; Wheeler et al., 2003).

The annotated browsers at Ensembl, UCSC, and NCBI provide sophisticated viewing and querying of the chicken genome. These browsers address the challenge of managing and displaying genomic sequence and the rapidly increasing amount of annotation. Each browser has particular strengths that derive from the specifics of database structure, programming languages, and associated databases that may also be housed within each browser. The UCSC browser, for example, is designed to maximize speed for rapidly displaying a wide array of data. NCBI’s MapViewer benefits greatly from being integrated into the larger family of NCBI databases, enabling users to rapidly retrieve a wide range of related data from within the NCBI framework. A particular strength of the Ensembl browser is its flexibility and the integration of sequence and annotation information for a large and growing list of genomes. The result is a complex database that, although slower than others for certain functions, provides a user-friendly suite of data displays. The entire Ensembl site can also be downloaded and modified by users.

Despite these differences in attributes and also in organization and command structure, each of the annotated browsers provides generally similar capabilities. A tremendous amount of information can be obtained from these sites, and time invested learning how to navigate them through online documentation, help features, and tutorials (Fig. 1) will be richly rewarded.

For each browser, the genome can be viewed in several ways. A graphical representation of the chromosomes (MapViewer on NCBI; the “Browse a Chromosome” section on the Ensembl Chicken Genome Browser home page) permits viewing of transcribed sequences along each chromosome, from a macroview to individual genes. Genes are shown in context along a short stretch of the chromosome so that neighboring transcribed sequences on the plus and minus strands can be identified. Individual genes can be identified by searching using keywords or DNA or protein sequences using the various BLAST algorithms. Depending on the browser used, searches can be directed toward several additional types of information, including gene families, expressed sequence tags (ESTs), or unigene sets.

A large amount of gene annotation can be displayed when a gene is viewed along a local stretch of chromosome, including related mRNAs, proteins, single nucleotide polymorphisms (SNPs), ESTs, and BAC ends. This information is derived from a large number of sources and can assist in evaluating gene structure and, in some cases, gene identity. Annotation is often directly linked to its source, providing a quick and convenient portal to gene associated information. Details about how to display and use this information are presented in the short tutorials below.
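For readers who download annotation and want to reproduce the "genes in context" view offline, the idea can be illustrated with a minimal Python sketch. It is not code from any of the browsers described here: the Gene record, the gene names, and all coordinates are hypothetical, and a simple interval-overlap test stands in for whatever indexing the browsers actually use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """A hypothetical annotated gene; coordinates are 1-based and inclusive."""
    name: str
    chrom: str
    start: int
    end: int
    strand: str  # "+" or "-"


def genes_in_window(genes, chrom, start, end):
    """Return names of genes overlapping chrom:start-end, grouped by strand."""
    hits = {"+": [], "-": []}
    for gene in sorted(genes, key=lambda g: g.start):
        overlaps = gene.start <= end and gene.end >= start
        if gene.chrom == chrom and overlaps:
            hits[gene.strand].append(gene.name)
    return hits


# Invented annotation, for illustration only.
GENES = [
    Gene("geneA", "chr1", 1_000, 5_000, "+"),
    Gene("geneB", "chr1", 6_000, 9_000, "-"),
    Gene("geneC", "chr1", 20_000, 25_000, "+"),
    Gene("geneD", "chr2", 1_500, 4_000, "-"),
]

neighbors = genes_in_window(GENES, "chr1", 4_000, 10_000)
print(neighbors)
```

Grouping the hits by strand mirrors the way the browsers separate plus-strand and minus-strand transcripts within a displayed window.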

Genomic Resources

Chicken genomic libraries. Several groups have prepared chicken BAC or fosmid genomic libraries. The BACPAC resources center at the Children’s Hospital Oakland Research Institute (CHORI; http://bacpac.chori.org) has constructed one chicken BAC library from the same female red jungle fowl (UCD001 inbred 256) used for the chicken genome sequencing project. A chicken fosmid library and a turkey BAC library are also available. Three additional BAC libraries were prepared at Texas A&M from the same UCD001 inbred 256 chicken (http://hbz.tamu.edu), and an additional BAC library was prepared from a female White Leghorn chicken. BAC end sequences have been mapped to the genome and can be viewed in the genome browsers, enabling users to identify and order BACs containing specific genomic regions (see below). BACs provide an excellent source for cloning defined genomic sequences or for use as polymerase chain reaction amplification templates.
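Because both ends of each BAC are mapped to the genome, choosing a clone for a region of interest reduces to asking which clones have end positions that bracket that region. The Python sketch below illustrates this logic under stated assumptions: the clone names and coordinates are invented, and in practice the end positions would be read from a genome browser track or a downloaded table rather than hard-coded.

```python
from typing import NamedTuple


class BacClone(NamedTuple):
    """A hypothetical BAC with both end sequences mapped to one chromosome."""
    name: str
    chrom: str
    left_end: int   # genomic position of the leftmost mapped end
    right_end: int  # genomic position of the rightmost mapped end


def bacs_covering(clones, chrom, start, end):
    """Return names of clones that fully span chrom:start-end, smallest insert first."""
    hits = [
        c for c in clones
        if c.chrom == chrom and c.left_end <= start and c.right_end >= end
    ]
    hits.sort(key=lambda c: c.right_end - c.left_end)
    return [c.name for c in hits]


# Invented clones, for illustration only.
CLONES = [
    BacClone("BAC_001", "chr5", 100_000, 290_000),
    BacClone("BAC_002", "chr5", 150_000, 310_000),
    BacClone("BAC_003", "chr5", 260_000, 430_000),
    BacClone("BAC_004", "chr7", 100_000, 300_000),
]

candidates = bacs_covering(CLONES, "chr5", 200_000, 280_000)
print(candidates)
```

Sorting by insert size is one possible preference, on the assumption that a smaller clone carrying the whole region is easier to work with; other criteria could be substituted.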

ESTs and cDNAs. ESTs are single-pass sequence reads obtained from cDNAs. Chicken EST sequences are housed at several sites. The NCBI dbEST database (http://www.ncbi.nlm.nih.gov/dbEST/index.html) can be queried using keyword or DNA sequence and returns more than 500,000 sequences when queried with “gallus.” The BBSRC ChickEST database (http://www.chick.umist.ac.uk; Boardman et al., 2002) contains 340,000 ESTs derived from 21 cDNA libraries and can be searched using keyword, gene ID, or DNA sequence. ESTs have been assembled into cDNA contigs, and cDNAs can be ordered from ARK-Genomics (http://www.ark-genomics.org/resources/chickens.php) or the MRC Geneservice (http://www.hgmp.mrc.ac.uk/geneservice/reagents/products/descriptions/chicken_BAC.shtml) after a simple registration procedure (see below). The BBSRC site provides several additional tools, including an in silico subtraction protocol in which sets of ESTs from tissue-specific cDNA libraries can be subtracted from one another. An RNAi site prediction algorithm and specific searching for noncoding RNAs are also available. The U.D. Chick EST Database at the University of Delaware (http://www.chickest.udel.edu) contains more than 40,000 ESTs searchable by various criteria, and corresponding cDNAs can also be ordered. The Institute for Genomic Research runs a Gallus gallus gene index (http://www.tigr.org/tigr-scripts/tgi/T_index.cgi?species=g_gallus) containing almost 500,000 EST sequences that can be queried and compared in various ways. The Bursal Transcript database (http://pheasant.gsf.de/DEPARTMENT/dt40.html) also contains EST sequences. As discussed above, ESTs mapping to a particular gene can be graphically visualized in the context of gene structure on the NCBI, Ensembl, and UCSC annotated genome browsers.
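The principle behind in silico subtraction can be shown with a short Python sketch. This is a simplified illustration and not the BBSRC protocol itself, whose details are not described here: it assumes each EST has already been assigned to a contig, and the library contents and contig names are hypothetical.

```python
from collections import Counter


def subtract_libraries(target_ests, driver_ests, min_count=1):
    """Return contigs with at least min_count ESTs in the target library
    and no ESTs at all in the driver library."""
    target_counts = Counter(target_ests)
    driver_contigs = set(driver_ests)
    return sorted(
        contig
        for contig, count in target_counts.items()
        if count >= min_count and contig not in driver_contigs
    )


# Each list holds the contig assignment of one EST per element (invented data).
heart_library = ["contig1", "contig2", "contig2", "contig5"]
limb_library = ["contig1", "contig3", "contig5", "contig5"]

heart_specific = subtract_libraries(heart_library, limb_library)
limb_specific = subtract_libraries(limb_library, heart_library)
print(heart_specific, limb_specific)
```

Raising min_count restricts the result to contigs supported by several ESTs, which may reduce candidates that appear tissue specific only because the libraries were sampled shallowly.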

Microarrays. Several groups have assembled microarrays containing subsets of chicken cDNA sequences. Affymetrix (http://www.affymetrix.com/products/arrays/specific/chicken.affx) has constructed a GeneChip Chicken Genome array covering almost 33,000 transcripts corresponding to more than 28,000 chicken genes. The GeneChip also contains probe sets for detecting 684 viral transcripts. A 13,000-feature cDNA chip has been generated at the Fred Hutchinson Cancer Research Center (FHCRC) through the combined efforts of researchers at FHCRC, the University of Delaware, and the Roslin Institute (http://www.fhcrc.org/shared_resources/genomics/services.html). This chip is available for purchase from FHCRC or ARK Genomics. A 1,152-cDNA chick embryo array, a 5,000-cDNA chicken immune array, and a 4,800-cDNA chicken neuroendocrine array are also available from ARK Genomics. An 8,000-feature metabolic/somatic system chip, a 7,000-feature neuroendocrine/reproductive system chip, and the 14K DelMar Chicken Integrated Systems chip are available from Larry Cogburn at the University of Delaware (http://udgenome.ags.udel.edu/~cogburn). It is also anticipated that, by mid-2005, the chicken research community will produce a long oligo array covering most expressed sequences.

In situ hybridization databases. A whole-mount in situ hybridization project called GEISHA (gallus EST in situ hybridization analysis; http://geisha.biosci.arizona.edu; Bell et al., 2004) combines high throughput in situ hybridization with curation of the literature to assemble a comprehensive database of in situ hybridization patterns for genes expressed in the chicken embryo through day 4 of incubation. In situ hybridization information will be linked to genomic sequence information and will be searchable using several parameters, including keyword, sequence, and anatomical location. A complementary project is under way to generate a high-resolution three-dimensional (3D) gene expression atlas for the chicken embryo. Modeled after the mouse atlas project at the University of Edinburgh (http://genex.hgu.mrc.ac.uk; Baldock et al., 2003), this effort will combine high-resolution 3D mapping of gene expression patterns with a detailed anatomical atlas and visualization tools.

Fig. 1. Tutorials for retrieving several types of chicken genomic information.


SNPs. The International Chicken Polymorphism Map Consortium (2004) has developed a chicken genetic variation map by sequencing at 0.25x coverage the Chinese silkie, layer, and broiler breeds. Comparing these sequences with the 6.6-fold coverage sequence of the red jungle fowl has identified 2.8 million SNPs. The Beijing Genome Institute has launched the Chicken Variation Database (ChickVD; Wang et al., 2004; http://chicken.genomics.org.cn/index.jsp) to house and display information about SNPs, QTLs, and other types of genomic sequence variations. More than 74,000 potential SNPs have been identified in the EST sequences of the BBSRC database using the Polybase program for automated identification of SNPs in cDNA sequences. SNPs can also be viewed within their respective gene sequences in any of the annotated genome browsers.
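The comparison that underlies SNP discovery, and the scale of the figures quoted above, can be illustrated with a small Python sketch. It is a toy under stated assumptions, not the consortium's pipeline or the Polybase program: the two sequences are invented and assumed to be already aligned and of equal length, and the density figure is only the naive ratio of the SNP count to the haploid genome size given in this article.

```python
def find_snps(reference, sample):
    """Return (position, ref_base, sample_base) for each mismatch between two
    pre-aligned, equal-length sequences. Positions are 1-based; columns that
    contain an ambiguous base (N) or a gap (-) are skipped."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    snps = []
    pairs = zip(reference.upper(), sample.upper())
    for position, (ref_base, sample_base) in enumerate(pairs, start=1):
        if ref_base == sample_base:
            continue
        if "N" in (ref_base, sample_base) or "-" in (ref_base, sample_base):
            continue
        snps.append((position, ref_base, sample_base))
    return snps


# Invented aligned sequences: one true mismatch and one ambiguous base.
reference_seq = "ACGTACGTAC"
sample_seq = "ACGTTCGNAC"
snps = find_snps(reference_seq, sample_seq)
print(snps)

# Naive genome-wide average from the figures quoted in the text.
TOTAL_SNPS = 2.8e6       # SNPs reported by the polymorphism map consortium
HAPLOID_SIZE_BP = 1.06e9  # haploid genome size quoted earlier in this article
snps_per_kb = round(TOTAL_SNPS / HAPLOID_SIZE_BP * 1000, 2)
print(snps_per_kb)
```

A real analysis would also weigh base quality and read depth, which matter greatly at 0.25x coverage; those steps are omitted here for brevity.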

REFERENCES

Baldock R, Bard JBL, Burger A, Burton N, Christiansen J, Feng GJ, Hill B, Houghton D, Kaufman M, Rao J, Sharpe J, Ross A, Stevenson P, Venkataraman S, Waterhouse A, Yang Y, Davidson D. 2003. EMAP and EMAGE: a framework for understanding spatially organized data. Neuroinformatics 1:309–325.
Bell GW, Yatskievych TA, Antin PB. 2004. GEISHA, a high throughput whole mount in situ hybridization screen in chick embryos. Dev Dyn 229:677–687.
Bertocchini F, Stern CD. 2002. The hypoblast of the chick embryo positions the primitive streak by antagonizing nodal signaling. Dev Cell 3:735–744.
Birney E, Andrews TD, Bevan P, et al. 2004. An overview of Ensembl. Genome Res 14:925–928.
Boardman PE, Sanz-Ezquerro J, Overton IM, Burt DW, Bosch E, Fong WT, Tickle C, Brown WRA, Wilson SA, Hubbard SJ. 2002. A comprehensive collection of chicken cDNAs. Curr Biol 12:1965–1969.
Cooper MD, Peterson RD, South MA, Good RA. 1966. Functions of thymus system and bursa system in chicken. J Exp Med 123:75–102.
Curwen V, Eyras E, Andrews TD, Clarke L, Mongin E, Searle SMJ, Clamp M. 2004. The Ensembl automatic gene annotation system. Genome Res 14:942–950.
DeHaan RL. 1963. Organization of the cardiogenic plate in the early chick embryo. Acta Embryologiae et Morphologiae Experimentalis 6:26–38.
Dodgson JB. 2003. Chicken genome sequence: a centennial gift to poultry genetics. Cytogenet Genome Res 102:291–296.
Hamburger V, Hamilton HL. 1951. A series of normal stages in the development of the chick embryo. J Morphol 88:49–92.
International Chicken Genome Sequencing Consortium. 2004. Sequence and comparative analysis of the chicken genome provide unique perspectives on vertebrate evolution. Nature 432:695–716.
Kent WJ, Sugnet CW, Furey TS, Roskin KM, Pringle TH, Zahler AM, Haussler D. 2002. The human genome browser at UCSC. Genome Res 12:996–1006.
Lewis MR. 1919. The development of cross-striation in the heart muscle of the chick embryo. Bull Johns Hopkins Hosp 30:176–188.
Rawles ME. 1943. The heart-forming regions of the early chick blastoderm. Physiol Zool 16:22–42.
Ren C, Lee MK, Yan B, Ding K, Cox B, Romanov MN, Price JA, Dodgson JB, Zhang HB. 2003. A BAC-based physical map of the chicken genome. Genome Res 13:2754–2758.
Rose SP. 2000. God’s organism? The chick as a model system for memory studies. Learn Mem 7:1–17.
Rosegrant MR, Paisner MS, Meijer S, Witcover J. 2001. 2020 global food outlook: trends, alternatives, and choices. Washington, DC: International Food Policy Research Institute.
Ruijtenbeek K, De Mey JG, Blanco CE. 2002. The chicken embryo in developmental physiology of the cardiovascular system: a traditional model with new possibilities. Am J Physiol Regul Integr Comp Physiol 283:R549–R550.
Saunders JW Jr. 1972. Developmental control of three-dimensional polarity in the avian limb. Ann NY Acad Sci 193:29–42.
Schoenwolf GC, Bortier H, Vakaet L. 1989. Fate mapping the avian neural plate with quail/chick chimeras: origin of the prospective median wedge cells. J Exp Zool 249:271–278.
Stabenau A, McVicker G, Melsopp C, Proctor G, Clamp M, Birney E. 2004. The Ensembl core software libraries. Genome Res 14:929–933.
Stalker J, Gibbins B, Meidl P, Smith JC, Spooner W, Hotz H-R, Cox AV. 2004. The Ensembl Web site: mechanics of a genome browser. Genome Res 14:951–955.
Wallis JW, Aerts J, Groenen MAM, Crooijmans RPMA, Layman D, Graves TA, Scheer DE, Kremitzki C, Fedele MJ, Mudd NK, Cardenas M, Higginbotham J, Carter J, McGrane R, Gaige T, Mead K, Walker J, Albracht D, Davito J, Yang SP, Leong S, Chinwalla A, Sekhon M, Wylie K, Dodgson J, Romanov MN, Cheng H, de Jong PJ, Osoegawa K, Nefedov M, Zhang H, McPherson JD, Krzywinski M, Schein J, Hillier L, Mardis ER, Wilson RK, Warren WC. 2004. A physical map of the chicken genome. Nature 432:761–764.
Wang J, He X, Ruan J, Dai M, Chen J, Zhang Y, Hu Y, Ye C, Li S, Cong L, Fang L, Liu B, Li S, Wang J, Burt DW, Wong GK, Yu J, Yang H, Wang J. ChickVD: a sequence variation database for the chicken genome. Nucleic Acids Res 33:D438–D441.
Wheeler DL, Church DM, Federhen S, Lash AE, Madden TL, Pontius JU, Schuler GD, Schriml LM, Sequeira E, Tatusova TA, Wagner L. 2003. Database resources of the National Center for Biotechnology. Nucleic Acids Res 31:28–33.