BAOJ Bioinformatics


Amit Das and Simanti Bhattacharya, BAOJ Bioinfo 2017, Volume 1, Issue 1: 003

Review

Gene Enrichment Analysis – Exploring Functional Relevance of Genes Obtained from High-throughput Technologies
Amit Das* and Simanti Bhattacharya*
Dept. of Biochemistry and Biophysics, University of Kalyani, Kalyani, Nadia, WB – 741235, India


Abstract

Biological pathways and interaction networks are immensely complex and comprise a large number of genes. Deregulation of any component of such a pathway or network alters the expression of many genes. Quantitative expression analysis techniques like microarray analysis, ChIP-Seq, qPCR etc. report lists of genes and their respective changes in expression level. The challenge, however, lies in understanding the underlying biology associated with these genes and relating it to the observed biological change. For example, microarray data can provide a list of genes with altered expression levels in patients resistant to a particular drug, but that gene list by itself provides no insight into the biological phenomena responsible for the drug resistance. Gene enrichment analysis (GEA) is a technique that provides functional annotations for a set of genes. By incorporating statistical techniques into its analytical pipeline, GEA has proved to be a reliable and useful method of gene annotation and functional bioinformatics.

Keywords: GSEA; Gene Enrichment; Computational Systems Biology; Microarray; Gene Ontology

Key Terms and Definitions

Enrichment Portal
Enrichment portals are online resources, such as websites, that facilitate gene set enrichment analysis using a set of user-provided genes. Input can be either gene symbols or gene IDs, and enrichment is typically performed for GO terms, diseases, pathways, drugs etc.

Gene
A gene is a portion of DNA that codes either for a protein or for noncoding RNAs such as rRNA, tRNA etc. Alterations of gene expression or mutations in gene sequences are often associated with different biological phenomena or diseases.

Gene Ontology
Gene Ontology (GO) refers to the global bioinformatics initiative to unify the description of the characteristics of genes and gene products. GO has three main attributes, i.e. GO Biological Process, GO Molecular Function and GO Cellular Component.

GWAS
GWAS stands for Genome Wide Association Studies, also known as Whole Genome Association Studies (WGAS). By examining genetic variations and associated phenotypes across different populations, GWAS link mutations or single nucleotide polymorphisms (SNPs) with diseases or other clinically important phenotypes.

High-Throughput Technology
In the biological sciences, high-throughput technologies automate classical biological techniques to facilitate large-scale experimentation, often handling thousands of samples or recording several hundred parameters without significant manual intervention. High-throughput screening and microarray technologies are two such examples.

Microarray Technology
Microarray technology is a high-throughput technology that enables simultaneous analysis of expression variations across thousands of genes. It uses a DNA chip containing thousands of precisely defined spots of cDNA probes.

Phenotype
A phenotype is the observable outcome or set of characteristics of an individual. It results from all the interactions between genetic properties and the environment.

Systems Biology
Systems biology is a modern scientific approach that studies biology at the level of whole systems. Rather than isolating and understanding the impact of each individual unit of a biological entity, systems biology takes a holistic approach that takes into account all relevant units and their interplay.

*Corresponding author: Amit Das, Dept. of Biochemistry and Biophysics, University of Kalyani, Kalyani, Nadia, WB – 741235, India, Tel: +919874387535; Simanti Bhattacharya, Dept. of Biochemistry and Biophysics, University of Kalyani, Kalyani, Nadia, WB – 741235, India, Tel: +91-9163774882
Sub Date: May 14, 2017, Acc Date: May 27, 2017, Pub Date: May 29, 2017.
Citation: Amit Das and Simanti Bhattacharya (2017) Gene Enrichment Analysis – Exploring Functional Relevance of Genes Obtained from High-throughput Technologies. BAOJ Bioinfo 1: 003.
Copyright: © 2017 Amit Das and Simanti Bhattacharya. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
BAOJ Bioinfo, an open access journal. Volume 1; Issue 1; 003

Introduction
Genes are the fundamental building blocks of our genetic material. Genes, which are stretches of DNA, are the underlying codes for the production of proteins and RNAs (mRNA, rRNA, tRNA etc.). The majority of human DNA, however, is noncoding DNA, which does not code for proteins and is often known to have regulatory roles. The coding portion of the human genome is currently estimated to contain roughly 20,000 protein-coding genes. Several genes associated with metabolic or genetic diseases are often found to be part of complex regulatory networks. Recent advances in gene expression analysis techniques like microarray, exome sequencing, qPCR etc. have generated an enormous amount of data on altered gene expression in different contexts (for example, metabolic or genetic diseases). Paired with complementary advances in sequencing technologies, researchers now have access to extensive catalogues of genetic variation (at least for the well-studied model organisms), down to the level of single nucleotide polymorphisms (SNPs). With this exponential increase in genetic and expression data, the major challenge is to extract biological meaning from these data. Gene enrichment analysis (GEA) is a technique that helps researchers find the biological relevance of a set of genes. GEA is a computational technique that determines the association of a set of genes with standard, well-established biological terms known as ontologies. By determining a statistical confidence value for each association, GEA brings weighted correlation into the analysis pipeline. Through the identification of associated pathways or processes, GEA helps to uncover the meaning of large-scale genetic data coming out of an experimental study. The advantages of GEA include an increased signal-to-noise ratio, consideration of all genes within the set, and the ability to identify the effect of even a modest change in gene expression.
To improve the statistical power of GEA, several methods like GAGE, sub-GSE and SAM-GS have been developed. Combined with its inherent advantages, these statistical modifications have given researchers the technical flexibility to choose a method as per their requirements and have made GEA one of the most popular methods for analyzing large-scale gene expression data. Considering the wide variety of the intended audience, the authors take a gradual approach to introducing gene enrichment analysis. First, the complexities of biological gene networks are briefly discussed to familiarize readers with the challenges of analyzing a set of genes to find the underlying biology. With this context, the main section begins: GEA is introduced, and its details, challenges, advantages and limitations are discussed. The authors also discuss two closely associated topics, namely Gene Ontology (GO) and Genome Wide Association Studies (GWAS).

Background
Recent advances in high-throughput technologies like Next Generation Sequencing (NGS), ChIP-Seq and microarray technologies have generated huge amounts of context-dependent genetic information. Lists of genes whose expression has been altered by a constitutive mutation in a promoter, or of genes showing altered expression during drug resistance, can easily be identified with these technologies. The challenge that has emerged in recent years, however, is to find the biological meaning of the observed changes in gene expression. Traditional techniques that work with a small, manageable subset of either highly over-expressed or highly under-expressed genes have several drawbacks [1]. On top of these intrinsic limitations, discrete analyses of a few specific genes never provide a systemic view of the entire scenario. Biological changes are the combined result of all the signaling pathways operating within a cell. Alterations in a particular cellular behavior (for example, resistance to a particular drug) seldom result from a single genetic change. Rather, cascades of changes triggered by a single genetic event (for example, a loss-of-function mutation in the p53 gene) are found to be the main causative agent [2]. Detailed analyses relating an observed biological change to its hypothesized genetic cause highlight the importance of the underlying biological pathways, many of which become deregulated once the genetic change (i.e. a loss-of-function mutation or deletion) occurs. Since a biological pathway or signaling cascade is a dynamic system of reactions, changes (qualitative or quantitative) in any of its reactants alter the flux of other reactants or products and can drive the whole cascade out of equilibrium. This in turn alters the expression of a number of genes other than the gene carrying the initial mutation. This set of altered genes may or may not be under the direct influence of the initial causative gene. In the end, experiments conducted to understand an observed biological change report altered expression of a set of genes rather than of a single one.


Enrichment analysis of pathways, biological processes, molecular functions, drugs or diseases using a set of genes helps researchers find hidden correlations between that set of genes and the observed biological change. Depending on the nature of the change (i.e. transcriptomic or proteomic), observed alterations can be detected at either the RNA or the protein level. Changes can also be confined to a particular cellular location, e.g. the nucleus, the cell surface or the lysosome. The advantage of enrichment analysis is that, as long as the experiments report a set of genes, the enrichment algorithms are not affected by these cellular complexities. To illustrate the level of cellular complexity, consider the interaction network of the master regulator protein p53. This protein is known to interact with hundreds of proteins or small molecules, many of which regulate each other’s activation state, localization etc. (Figure 1A). The carbohydrate metabolism pathway (Figure 1B) consists of hundreds of components, and several other molecules regulate each step of this pathway. All these pathways and interaction networks (Figure 1A-B) are so tightly regulated that a change in any of their components can have a large effect on the entire system. While systems biology approaches (including GEA) try to address such issues, the representation and analysis of the data add another level of complexity. The standard heat map representation of microarray data (Figure 1C) is well suited to giving an overview of the data, but it provides little biological insight, especially when the number of genes exceeds a few hundred. As discussed previously, GEA helps extract biological relevance from such complex datasets. The next section discusses the theoretical basics of enrichment analysis, followed by the different types of enrichment analysis and the difference between traditional GSEA and other gene enrichment analyses.

Figure 1. Complexity of biological networks and challenges of interpreting biological data. (A) A representative interaction network of p53 protein. (B) Pathway network of carbohydrate metabolism (Source: WikiPathways: WP1848_86536). (C) Representative heatmap of microarray data.


Gene Enrichment Analysis – Technical Details
This section provides technical details of GEA. Beginning with a technical description of GO and GWAS, it then describes the methodology of GEA.

GO and GWAS
Two terms are often associated with GEA, namely Gene Ontology (GO) and Genome Wide Association Studies (GWAS). Since both relate to GEA at different levels, it is important to understand what each term means and how it is associated with GEA.
Gene Ontology (GO): Imagine a researcher trying to find genes associated with RNA synthesis. While a human reader understands that ‘RNA synthesis’ and ‘transcription’ refer to the same cellular phenomenon, a computer program or a database query algorithm treats the two phrases as different search terms. As a result, different sets of output are generated, and each output set is only a subset of the ideal output set, which should contain the results for both search terms (RNA synthesis and transcription). The GO consortium addresses this issue by using controlled vocabularies (ontologies) to describe a particular biological entity (or phenomenon) and assigning those ontology terms to its associated genes [3–5]. GO attributes are currently available in three different aspects, i.e. GO Biological Process, GO Molecular Function and GO Cellular Component. GO terms have a parent–child relationship (‘is a’ or ‘part of’), in which upper-level terms are more general (Figure 2). Through this parent–child association, any gene associated with a lower-level term is by default also associated with its upper-level terms. Because every GO term is linked to an explicit set of annotated genes (a gene may carry many terms, and a term may annotate many genes), GO terms are often used as gene sets for GEA.
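The parent–child rule described above can be sketched in a few lines of Python. The toy hierarchy below is a simplified, hypothetical version of Figure 2 (real GO releases are far larger and contain further relation types), and GENE_A and GENE_B are placeholder genes. The sketch simply propagates every direct annotation to all ancestor terms along 'is a' and 'part of' edges, which is what turns GO terms into usable gene sets.

```python
from collections import defaultdict

# Hypothetical, simplified hierarchy modelled on Figure 2.
# Each term maps to its direct parents via 'is a' or 'part of' edges.
PARENTS = {
    "mitochondrial membrane": {"membrane", "mitochondrion"},  # is a / part of
    "ER membrane": {"membrane"},                               # is a
    "mitochondrion": {"cell"},                                 # part of
    "membrane": {"cell"},                                      # part of
    "cell": set(),
}


def ancestors(term, parents):
    """Return every term reachable from `term` by following parent edges."""
    seen = set()
    stack = list(parents.get(term, ()))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(parents.get(parent, ()))
    return seen


def propagate(direct_annotations, parents):
    """Build term -> gene-set mappings, applying the parent-child rule.

    `direct_annotations` maps each gene to the terms it is annotated with
    directly; each gene is also added to all ancestors of those terms.
    """
    term_genes = defaultdict(set)
    for gene, terms in direct_annotations.items():
        for term in terms:
            term_genes[term].add(gene)
            for ancestor in ancestors(term, parents):
                term_genes[ancestor].add(gene)
    return dict(term_genes)


if __name__ == "__main__":
    gene_sets = propagate(
        {"GENE_A": {"mitochondrial membrane"}, "GENE_B": {"ER membrane"}},
        PARENTS,
    )
    for term in sorted(gene_sets):
        print(term, sorted(gene_sets[term]))
```

In this toy example, annotating GENE_A only to 'mitochondrial membrane' also places it in the 'membrane', 'mitochondrion' and 'cell' gene sets.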


Genome Wide Association Studies (GWAS): It is a well-established fact that disease progression and drug response vary from person to person. When researchers tried to find the reason behind such variation, no single genetic variation (deletion, duplication, inversion, single nucleotide polymorphism (SNP) etc.) could be identified for most, if not all, cases. Instead, researchers came up with a list of genetic differences (mostly SNPs) associated with each disease indication. Studies that find and catalogue such disease-associated genetic/genomic variants are termed GWAS. This relatively new field has led to new thinking about strategies for disease treatment, particularly in the direction of personalized medicine. For GEA, GWAS can be a source of genes or SNPs, depending on the research context and the aim of the study [6]. GWAS data are available from a range of online databases [7–9].
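When GWAS results serve as the input for GEA, SNP-level hits usually have to be collapsed into a gene list first. The sketch below assumes that the SNP-to-gene mapping has already been done upstream and uses hypothetical (SNP, gene, p-value) rows; the 5 × 10^-8 cut-off is the conventional genome-wide significance threshold, applied here as an assumption rather than a rule taken from the databases cited above.

```python
# Conventional genome-wide significance threshold (an assumption here).
GENOME_WIDE_SIGNIFICANCE = 5e-8


def genes_from_gwas(rows, threshold=GENOME_WIDE_SIGNIFICANCE):
    """Collapse SNP-level GWAS hits into a ranked gene list for GEA.

    `rows` is an iterable of (snp_id, mapped_gene, p_value) tuples; rows
    without a mapped gene or above the threshold are skipped. For each gene
    the smallest p-value is kept, and genes are returned from strongest to
    weakest association.
    """
    best = {}
    for _snp_id, gene, p_value in rows:
        if gene and p_value < threshold:
            best[gene] = min(p_value, best.get(gene, 1.0))
    return sorted(best, key=best.get)


if __name__ == "__main__":
    hypothetical_rows = [
        ("snp_1", "GENE_X", 3e-9),
        ("snp_2", "GENE_X", 1e-12),
        ("snp_3", "GENE_Y", 2e-8),
        ("snp_4", "GENE_Z", 0.01),   # not genome-wide significant
        ("snp_5", None, 1e-10),      # no mapped gene
    ]
    print(genes_from_gwas(hypothetical_rows))  # ['GENE_X', 'GENE_Y']
```

Keeping only the strongest p-value per gene is just one possible gene-level summary; gene set methods designed for SNP data [6] deal more carefully with issues such as gene length and linkage disequilibrium.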

Figure 2. Hierarchy of Gene Ontology terms. Mitochondrial membrane and ER membrane are each a membrane, while mitochondrial membrane is also part of the mitochondria. Mitochondria and membrane are both part of a cell.

Methodologies of GEA
Biological processes typically involve a number of genes. The basic premise of GEA is that, for an abnormal biological process that is the focus of a study, its associated genes should have a higher probability of being picked up by a high-throughput technology [10]. The gene set database that lies behind any GEA platform contains a list of genes associated with each pathway, disease, drug, other biological phenomenon or chemical compound. The researcher (the user of GEA) provides a list of genes as input, and the algorithm finds the number of matches (for example, the number of overlapping genes) between each entry of the database and the input list. Statistical techniques such as the hypergeometric distribution, Fisher’s exact test, the Chi-square test or the Z-score, or a combination of such tests, are then applied to rank the outputs by the probability of association between each database entry and the input list (left panel of Figure 3). GSEA, on the other hand, requires the expression profile of a gene set as input (right panel of Figure 3). The basic differences between GEA and GSEA are discussed in a later section. Based on their underlying statistical methods for filtering the output and their basic background principles, GEA algorithms mostly fall into three groups:

Singular Enrichment Analysis (SEA)
SEA is the simplest type of GEA. It determines the outcome by testing each annotation term one by one, in a linear manner. However, the outcome of this algorithm depends heavily on how the user selects the input list.
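A minimal SEA loop can be sketched as below, using the hypergeometric upper-tail probability as the per-term test (the one-sided Fisher's exact test gives the same p-value for the 2×2 overlap table). The gene set library, the background size and the Benjamini–Hochberg adjustment are assumptions added for illustration: the text above does not specify a multiple-testing correction, but one is commonly applied because every term is tested separately. The sketch also assumes that the input genes and all term genes belong to the stated background.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N genes, K in the term, n drawn).

    Uses exact integer arithmetic; math.comb returns 0 when the lower
    argument exceeds the upper one, so impossible configurations vanish.
    """
    upper = min(K, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def singular_enrichment(input_genes, library, background_size):
    """Test each term of `library` independently and rank by p-value.

    `library` maps term name -> set of genes annotated to that term;
    `background_size` is the number of genes that could have been picked.
    Returns (term, overlap, p_value, adjusted_p) tuples, most significant first.
    """
    query = set(input_genes)
    results = []
    for term, term_genes in library.items():
        overlap = len(query & term_genes)
        p = hypergeom_sf(overlap, background_size, len(term_genes), len(query))
        results.append((term, overlap, p))
    adjusted = benjamini_hochberg([p for _, _, p in results])
    ranked = [(t, k, p, q) for (t, k, p), q in zip(results, adjusted)]
    return sorted(ranked, key=lambda row: row[2])
```

Where SciPy is available, `scipy.stats.hypergeom.sf(k - 1, N, K, n)` should give the same upper-tail probability as `hypergeom_sf(k, N, K, n)`; the pure-Python version is used here only to keep the sketch dependency-free.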


Figure 3. Differences between the steps of GEA and GSEA. The major difference between GEA and GSEA lies in the form of the input file. While GEA requires only a list of input genes, GSEA asks for the expression profile of a set of genes as its input file.

Gene Set Enrichment Analysis (GSEA)
The unique property of GSEA is its no-cut-off policy, i.e. GSEA considers all genes of a microarray experiment without applying any p-value-based cut-off. Thus all genes and their expression changes are considered, and no arbitrary factor enters gene selection.

Modular Enrichment Analysis (MEA)
The analytical core of MEA is very similar to that of SEA, except that MEA incorporates network-discovery algorithms by taking term-to-term relationships into account.

Considering the scope of this review and its intended broad audience, the discussion of statistical details has been limited to this extent. The next section discusses the different types of GEA.
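The no-cut-off idea behind GSEA can be illustrated with the weighted running-sum enrichment score described by Subramanian et al. [1]: walking down the full ranked list, the sum increases at members of the gene set (in proportion to |score|^p) and decreases at non-members, and the enrichment score is the largest deviation from zero. This is a simplified sketch; the official tool's permutation-based significance, score normalization and MSigDB gene sets are not reproduced, and gene names used with it here are hypothetical.

```python
def enrichment_score(ranked_genes, scores, gene_set, p=1.0):
    """Weighted running-sum enrichment score in the style of GSEA.

    ranked_genes: genes sorted from most positively to most negatively
        associated with the phenotype (the full list, no cut-off).
    scores: the ranking metric for each gene, in the same order.
    gene_set: the set S being tested.
    p: weight exponent; p = 1 weights hits by |score|, p = 0 ignores it.
    Returns (es, running_sum) where es is the signed maximum deviation from zero.
    """
    hits = [gene in gene_set for gene in ranked_genes]
    n_total, n_hits = len(ranked_genes), sum(hits)
    if n_hits == 0 or n_hits == n_total:
        raise ValueError("gene set must be a non-empty proper subset of the list")
    weights = [abs(s) ** p for s in scores]
    hit_norm = sum(w for w, hit in zip(weights, hits) if hit)
    if hit_norm == 0:
        raise ValueError("all gene-set members have a zero score")
    miss_step = 1.0 / (n_total - n_hits)

    running, path = 0.0, []
    for w, hit in zip(weights, hits):
        # Step up at a gene-set member, step down at a non-member.
        running += w / hit_norm if hit else -miss_step
        path.append(running)
    es = max(path, key=abs)  # signed value furthest from zero
    return es, path
```

With p = 0 every member contributes equally, and the statistic reduces to an unweighted, Kolmogorov–Smirnov-like running sum.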

Types of GEA
Depending on the research problem in focus, different gene enrichment strategies can be implemented. The most popular types of GEA are as follows:
GO Enrichment
Enrichment performed for GO Biological Process, GO Molecular Function or GO Cellular Component terms [5].
Pathway Enrichment
Gene enrichment for biological pathways. Pathways may come from popular databases like WikiPathways, Reactome, KEGG Pathways etc.
Disease Enrichment
GEA to find disease associations for the input gene set. Disease genes are either manually curated, taken from public databases, or a combination of both.
Drug Enrichment
Similar to disease enrichment, but for drugs, with drug-associated genes coming from manual curation or public databases.
Phenotype Enrichment
GEA for species-specific phenotypes (mostly human or mouse) for the input gene list.
Beyond the types mentioned above, a number of other GEA can be performed, for example domain enrichment, literature enrichment, interaction enrichment, and enrichment for transcription factor binding, mRNA etc. The tools listed in Table 1 allow users to perform the GEA of their choice.

Difference between GSEA and GEA
Overall, GSEA and GEA are broadly similar, but GSEA, the term coined by Subramanian et al. [1], has some technical differences from GEA (Figure 3). For ease of understanding, the differences are only briefly mentioned here. GEA simply means finding the associations of a list of input genes by counting their overlaps with an existing dataset containing lists of genes associated with drugs/diseases/pathways/GO terms etc., followed by ranking of the output based on hypergeometric tests or other statistical methods. A number of tools (Table 1) can perform GEA, and researchers can also write their own code. GSEA, on the other hand, is a specific type of GEA that is available from a particular source, has its own background dataset [1] in the form of the Molecular Signatures Database (MSigDB), and takes a gene expression profile as input. The GSEA algorithm depends on how the members of a gene set cluster within the ranked list of input genes. For GSEA, only the official tool available from its official source (Table 1) is recommended. For a better understanding of, and hands-on experience with, GSEA and GEA, the authors recommend the online GSEA tutorial (http://software.broadinstitute.org/gsea/doc/desktop_tutorial.jsp) and the ToppFun portal [18], respectively.
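Because GSEA takes an expression profile rather than a gene list, the profile must first be turned into a ranked list. A commonly used ranking metric, and the default in the GSEA desktop tool, is the signal-to-noise ratio (μA − μB)/(σA + σB) between two phenotype classes. The sketch below assumes population standard deviations and omits the safeguards for very small standard deviations that the official tool applies, so its scores should not be expected to match the tool's output exactly; the profile layout is a hypothetical choice for illustration.

```python
from statistics import mean, pstdev


def signal_to_noise(profile, labels, class_a, class_b):
    """Rank genes by (mean_A - mean_B) / (sd_A + sd_B).

    profile: dict mapping gene -> expression values, one per sample.
    labels: phenotype label of each sample, aligned with those values.
    Population standard deviations are used (an assumption); genes whose
    denominator is zero get a score of 0.
    Returns (ranked_genes, scores), highest score first.
    """
    idx_a = [i for i, label in enumerate(labels) if label == class_a]
    idx_b = [i for i, label in enumerate(labels) if label == class_b]
    if not idx_a or not idx_b:
        raise ValueError("both classes need at least one sample")
    scores = {}
    for gene, values in profile.items():
        a = [values[i] for i in idx_a]
        b = [values[i] for i in idx_b]
        denom = pstdev(a) + pstdev(b)
        scores[gene] = (mean(a) - mean(b)) / denom if denom > 0 else 0.0
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked, [scores[g] for g in ranked]
```

The ranked genes and scores returned here are exactly the kind of input a running-sum enrichment score consumes, which is why GSEA needs the full profile while list-based GEA does not.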

Present Challenges and Limitations of GSEA
GEA has improved in all aspects over the last decade. As discussed previously, several different types of enrichment can now be performed for a set of genes. However, a basic challenge has remained since the beginning of GEA: whether to treat each gene of a set as an independent entity or as linked to the others. While most GEA algorithms treat each gene as independent, GSEA [1] does not. Both approaches have inherent advantages and limitations [1,11,12]. It is fairly easy to see that if a number of genes from the gene set of interest are part of a single pathway, even moderate changes in the expression of each gene will lead to a greater overall change. On the other hand, certain genes that act on a large number of downstream molecules are not themselves influenced by the expression levels of those downstream genes, yet a single large change in their own expression can substantially change the expression of the downstream genes. Unfortunately, there is no predefined solution to this inherent limitation of GEA, and it remains up to the researcher to decide which method to employ based on the context of the experiment and its underlying research problem.
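One way to respect gene–gene dependence, and the idea behind the phenotype permutation used in GSEA [1], is to build the null distribution by shuffling sample labels rather than genes: each sample keeps its full expression vector, so correlations between genes survive in every permutation. The sketch below applies this idea to a deliberately simple set statistic (the mean signal-to-noise of the set's genes), which is an assumption chosen for brevity and not the GSEA statistic itself; the profile and gene names are hypothetical.

```python
import random
from statistics import mean, pstdev


def snr(values, labels, class_a, class_b):
    """Signal-to-noise ratio of one gene between two sample classes."""
    a = [v for v, label in zip(values, labels) if label == class_a]
    b = [v for v, label in zip(values, labels) if label == class_b]
    denom = pstdev(a) + pstdev(b)
    return (mean(a) - mean(b)) / denom if denom > 0 else 0.0


def set_statistic(profile, gene_set, labels, class_a, class_b):
    """Mean signal-to-noise of the set's genes (a simple stand-in statistic)."""
    return mean(snr(profile[g], labels, class_a, class_b) for g in gene_set)


def label_permutation_test(profile, gene_set, labels, class_a, class_b,
                           n_perm=1000, seed=0):
    """One-sided permutation p-value obtained by shuffling sample labels.

    Each permutation reassigns whole samples to classes, so every sample
    keeps its complete expression vector and gene-gene correlations are
    preserved in the null distribution.
    """
    rng = random.Random(seed)
    observed = set_statistic(profile, gene_set, labels, class_a, class_b)
    permuted = list(labels)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(permuted)
        null_value = set_statistic(profile, gene_set, permuted, class_a, class_b)
        if null_value >= observed:
            at_least_as_extreme += 1
    # Add-one correction keeps the estimate away from an impossible p = 0.
    return observed, (at_least_as_extreme + 1) / (n_perm + 1)
```

Permuting gene labels instead (drawing random gene sets of the same size) would implicitly treat genes as independent, which is precisely the assumption at the centre of the debate described above.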

Future Research Directions
Traditional bioinformatics focused on the generation, storage and retrieval of data related to biological systems. Thousands of biological databases created over the last 20 years are the outcome of such efforts. However, the main issue with this ever-increasing volume of data was the question ‘What to do with such biological data?’ To answer this question, a subset of bioinformaticians focused their efforts on biological interpretation by combining multi-dimensional biological data (i.e. genomic, transcriptomic and proteomic data). Such efforts have led to a relatively new discipline called functional bioinformatics, which aims to annotate genes or proteins with biological aspects. GEA is one such functional bioinformatics approach and has gained increasing relevance for interpreting high-throughput biological data over the last decade. GEA has been addressed by different groups and has made significant progress over the years. However, it continues to evolve in many aspects, especially in terms of its applications. Previous efforts to integrate GEA into data analysis, especially gene expression data analysis, have been very effective, and even today GEA is closely associated with all kinds of high-throughput expression analysis. What has been largely missing, however, is the extension of GEA into drug-discovery pipelines, i.e. the application of GEA in translational research. A few studies have already highlighted the benefit of incorporating GEA in drug–disease–target analysis, and a few tools have been developed specifically to aid such research [13-17]. Not only novel drug discovery but also the repositioning of existing or shelved drugs could benefit significantly (both financially and in time) from incorporating GEA into pre-clinical research. Along with drug discovery and repositioning, GEA can also be employed during different phases of clinical trials. As mentioned by the US-FDA in its ‘Guidance for Industry’ (https://www.fda.gov/downloads/drugs/guidancecomplianceregulatoryinformation/guidances/ucm332181.pdf), several strategies can be employed by drug developers or their sponsors to find the subset of the population in which the effectiveness of a drug can be most readily demonstrated. With the advent of cheap, fast and reliable sequencing technologies, enrichment strategies (using the most significant genes associated with the disease profile) may help predict the safety, tolerability and efficacy of drugs (or biologics) in the patient population. This in turn enables data-driven patient stratification that increases the chance of achieving better efficacy (with acceptable or reduced toxicity) for the desired drug. Beyond the pharmacological benefit, enrichment-based patient stratification can reduce the financial burden and time penalty for clinical trial sponsors.

Table 1. List of websites or standalone software facilitating enrichment analysis of gene sets.
Program / Website name with address | Remark | Reference
ToppGene (https://toppgene.cchmc.org/enrichment.jsp) | Online portal: associated features like candidate gene prioritization and relative weightage calculation of genes in networks are also present. | [23]
Enrichr (http://amp.pharm.mssm.edu/Enrichr/) | Online portal: GEO2Enrichr, a Google Chrome or Mozilla Firefox add-on, is available to export data for analysis directly from the GEO database. | [24-26]
GSEA | Standalone software: the original GSEA tool. | [1]
WebGestalt (http://bioinfo.vanderbilt.edu/webgestalt/) | Online portal: registration and login are required to access this tool. | [27]
EnrichNet (http://www.enrichnet.org/) | Online portal: network-based approach. | [28]
FunRich (http://www.funrich.org/) | Standalone software: features different visualization modes for output data. | [29]
GeneTrail (http://genetrail.bioinf.uni-sb.de/index.php) | Online portal: provides species-specific enrichment outcomes. | [30]
GORILLA (http://cbl-gorilla.cs.technion.ac.il/) | Online portal: species-specific GO term enrichment (although only a few species options are available). | [31]
DAVID Functional Annotation Tool (https://david.ncifcrf.gov/summary.jsp) | Online portal: well-diversified functional annotation tool including enrichment for drugs, diseases, pathways etc. | [10, 32]
GOEAST (http://omicslab.genetics.ac.cn/GOEAST/index.php) | Online portal: unbiased GO enrichment. | [33]
Citing several examples, Mandrekar and Sargent [18] have argued well for employing enrichment in the design of clinical trials, and several other studies have already used enrichment in clinical trial design [19-20]. Another area for extending the functional impact of GEA is structural and molecular bioinformatics, which studies the molecular motions of proteins under simulated (nature-mimicking) conditions via molecular dynamics (MD) simulations [21-22]. Starting from a set of genes obtained by analyzing data from high-throughput technologies like microarray analysis or next generation sequencing, it would be nearly impossible to prioritize a particular gene, biological process, pathway or molecular function without further analysis. Just as wet-lab (experimental) studies now use several bioinformatic tools to narrow their focus (and to reduce the number of experiments required and the associated time and cost), computational biologists and bioinformaticians can employ GEA on the outcome of high-throughput experiments to further streamline their research efforts. For example, if GEA of a set of genes associated with an auto-immune disease highlights the importance of Fc glycosylation, MD simulations of the Fc region of an antibody can provide atomic and structural-level insight into the condition. The use of GEA in such directions has not yet been well explored but looks promising.

List of Web Portals or Software for Gene Enrichment Analysis
While the GSEA tool developed by Subramanian et al. [1] is traditionally the only tool that performs GSEA, researchers across the globe have developed a number of tools for functional enrichment analysis of genes. A few of these, which have proved reliable and are regularly updated, are listed in Table 1. While most of them take a list of genes (either gene symbols or gene IDs) as input, some have extra features, which are mentioned in the respective remarks in Table 1.

Conclusion
Over the years, GEA has proved to be a very reliable tool for characterizing gene sets. It not only provides insight into the biological context but also helps to understand that context from a systemic perspective. Without GEA, analyses of genes would have remained limited to discrete approaches. Recent improvements in the statistical methods used in GEA have strengthened the outcomes of such analyses. The typical GSEA approach helps to remove bias by considering all the genes in the set and does not require any predefined cut-off. Other GEA techniques, like pathway enrichment or GO enrichment, give researchers greater flexibility through a variety of enrichment tools that can be applied as required and do not require a pre-existing quantitative expression dataset.

References
1. Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, et al. (2005) Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A 102(43): 15545–15550. doi: 10.1073/pnas.0506580102.
2. Sturm I, Bosanquet AG, Hermann S, Güner D, Dörken B, et al. (2003) Mutation of p53 and consecutive selective drug resistance in B-CLL occurs as a consequence of prior DNA-damaging chemotherapy. Cell Death Differ 10(4): 477–484. doi: 10.1038/sj.cdd.4401194.
3. The Gene Ontology Consortium (2001) Creating the gene ontology resource: design and implementation. Genome Res 11: 1425–1433. doi: 10.1101/gr.180801.
4. Huntley RP, Sawford T, Martin MJ, O’Donovan C (2014) Understanding how and why the Gene Ontology and its annotations evolve: the GO within UniProt. Gigascience 3(1): 4. doi: 10.1186/2047-217X-3-4.
5. Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, et al. (2000) Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet 25(1): 25–29. doi: 10.1038/75556.
6. Fridley BL, Biernacka JM (2011) Gene set analysis of SNP data: benefits, challenges, and future directions. Eur J Hum Genet 19(8): 837–843. doi: 10.1038/ejhg.2011.57.
7. Welter D, MacArthur J, Morales J, Burdett T, Hall P, et al. (2014) The NHGRI GWAS Catalog, a curated resource of SNP-trait associations. Nucleic Acids Res 42: D1001–6. doi: 10.1093/nar/gkt1229.
8. Li MJ, Liu Z, Wang P, Nelson MR, Kocher JP, et al. (2016) GWASdb v2: an update database for human genetic variants identified by genome-wide association studies. Nucleic Acids Res 44(D1): D869–876. doi: 10.1093/nar/gkv1317.
9. Beck T, Hastings RK, Gollapudi S, Free RC, Brookes AJ, et al. (2014) GWAS Central: a comprehensive resource for the comparison and interrogation of genome-wide association studies. Eur J Hum Genet 22(7): 949–952. doi: 10.1038/ejhg.2013.274.
10. Huang DW, Sherman BT, Lempicki RA (2009) Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists. Nucleic Acids Res 37(1): 1–13. doi: 10.1093/nar/gkn923.
11. Tamayo P, Steinhardt G, Liberzon A, Mesirov JP (2016) The limitations of simple gene set enrichment analysis assuming gene independence. Stat Methods Med Res 25(1): 472–487. doi: 10.1177/0962280212460441.
12. Irizarry RA, Wang C, Zhou Y, Speed TP (2009) Gene set enrichment analysis made simple. Stat Methods Med Res 18(6): 565–575. doi: 10.1177/0962280209351908.
13. Chen L, Chu C, Lu J, Kong X, Huang T, et al. (2015) Gene Ontology and KEGG Pathway Enrichment Analysis of a Drug Target-Based Classification System. PLoS One 10: e0126492. doi: 10.1371/journal.pone.0126492.
14. Jia Z, Liu Y, Guan N, Bo X, Luo Z, et al. (2016) Cogena, a novel tool for co-expressed gene-set enrichment analysis, applied to drug repositioning and drug mode of action discovery. BMC Genomics 17: 414. doi: 10.1186/s12864-016-2737-8.
15. Chen Y, Li L, Zhang G-Q, Xu R (2015) Phenome-driven disease genetics prediction toward drug discovery. Bioinformatics 31(12): i276–i283. doi: 10.1093/bioinformatics/btv245.
16. Napolitano F, Sirci F, Carrella D, di Bernardo D (2015) Drug-set enrichment analysis: a novel tool to investigate drug mode of action. Bioinformatics btv536. doi: 10.1093/bioinformatics/btv536.
17. Ellinghaus D, Jostins L, Spain SL, Cortes A, Bethune J, et al. (2016) Analysis of five chronic inflammatory diseases identifies 27 new associations and highlights disease-specific patterns at shared loci. Nat Genet 48(5): 510–518. doi: 10.1038/ng.3528.
18.
20. Hu Y, Li L, Ehm MG, Bing N, Song K, et al. (2013) The Benefits of Using Genetic Information to Design Prevention Trials. Am J Hum Genet 92(4): 547–557. doi: 10.1016/j.ajhg.2013.03.003.
21. Zhang Y (2015) Understanding the impact of Fc glycosylation on its conformational changes by molecular dynamics simulations and bioinformatics. Mol Biosyst 11: 3415–3424. doi: 10.1039/c5mb00602c.
22. Zhang Y, Ding Y (2016) Molecular dynamics simulation and bioinformatics study on chloroplast stromal ridge complex from rice (Oryza sativa L.). BMC Bioinformatics 17: 28. doi: 10.1186/s12859-016-0877-0.
Mandrekar SJ, Sargent DJ (2009) Clinical Trial Designs for Predictive Biomarker Validation: One Size Does Not Fit All. J Biopharm Stat. 19(3): 530–542. doi:10.1080/10543400902802458. 19. Mattssona N, Carrillob MC, Deanc RA, Devous MD Sr4, Nikolcheva T, et al. (2015) Revolutionizing Alzheimer’s disease and clinical trials through biomarkers. Alzheimers Dement (Amst). 1(4): 412-419. doi: 10.1016/j.dadm.2015.09.001.

BAOJ Bioinfo, an open access journal

23. Chen J, Bardes EE, Aronow BJ, Jegga AG (2009) ToppGene Suite for gene list enrichment analysis and candidate gene prioritization. Nucleic Acids Res 37: W305–11. doi: 10.1093/nar/gkp427. 24. Kuleshov M V, Jones MR, Rouillard AD, Fernandez NF1, Duan Q, et al. (2016) Enrichr: a comprehensive gene set enrichment analysis web server 2016 update. Nucleic Acids Res. doi: 10.1093/nar/gkw377. 25. Chen EY, Tan CM, Kou Y, Qiaonan Duan, Zichen Wang ,et al. (2013) Enrichr: interactive and collaborative HTML5 gene list enrichment analysis tool. BMC Bioinformatics 14: 128. doi: 10.1186/1471-210514-128. 26. Gundersen GW, Jones MR, Rouillard AD, Kou Y1, Monteiro CD, et al. (2015) GEO2Enrichr: browser extension and server app to extract gene sets from GEO and analyze them for biological functions: Fig. 1. Bioinformatics 31(18): 3060–3062. doi: 10.1093/bioinformatics/ btv297. 27. Wang J, Duncan D, Shi Z, Zhang B (2013) WEB-based GEne SeT AnaLysis Toolkit (WebGestalt): update 2013. Nucleic Acids Res 41: W77–83. doi: 10.1093/nar/gkt439. 28. Glaab E, Baudot A, Krasnogor N, Schneider R, Valencia A, et al. (2012) EnrichNet: network-based gene set enrichment analysis. Bioinformatics 28(18): i451–i457. doi: 10.1093/bioinformatics/bts389. 29. Pathan M, Keerthikumar S, Ang C-S, Gangoda L2, Quek CY, et al. (2015) FunRich: An open access standalone functional enrichment and interaction network analysis tool. Proteomics 15(15): 2597– 2601. doi: 10.1002/pmic.201400515. 30. Backes C, Keller A, Kuentzer J, Kneissl B, Comtesse N, et al. (2007) GeneTrail--advanced gene set enrichment analysis. Nucleic Acids Res 35: W186–92. doi: 10.1093/nar/gkm323. 31. Eden E, Navon R, Steinfeld I, Lipson D, Yakhini Z ,et al . (2009) GOrilla: a tool for discovery and visualization of enriched GO terms in ranked gene lists. BMC Bioinformatics 10:48. doi: 10.1186/1471-2105-10-48. 32. Huang DW, Sherman BT, Lempicki RA . 
(2009) Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat Protoc 4(1): 44–57. doi: 10.1038/nprot.2008.211. 33. Zheng Q, Wang X-J. (2008) GOEAST: a web-based software toolkit for Gene Ontology enrichment analysis. Nucleic Acids Res 36: W358– 363. doi: 10.1093/nar/gkn276.

Volume 1; Issue 1; 003