Bundeswehr Institute of Microbiology
Knowledge Digging in Coxiella burnetii: Bioinformatics tools for 21st-century science
Mathias C. Walter, Irmtraud Dunger, Benedikt Wachinger and Dimitrios Frangoulidis
What is Knowledge?
Knowledge is Power
[Francis Bacon]
What is Knowledge?
Knowledge is knowing that a tomato is a fruit; and wisdom is knowing not to put it in a fruit salad.
[Miles Kington]
Flow of Understanding: Hierarchy of “Data” vs. Allocation of mental space
[Figure: Hierarchy of understanding. Axes: Connectedness, Understanding. Data → Information (understanding relations) → Knowledge (understanding patterns) → Wisdom (understanding principles).]
[Figure: layered hierarchy, bottom to top: ENVIRONMENT, DATA (Index), INFORMATION (Rules), KNOWLEDGE (Model), WISDOM (Goal), VISION (Values). Further labels: Observations, Data-handling technology system.]
• Data-handling technology allows human cognitive energy to shift upward.
• Without technology, cognitive allocation mires in low-layer processes.
[adapted from Carpenter and Cannady, 2004]
[Ackoff, 1989; figure adapted from Scott A. Carpenter]
The gap between information and knowledge widens
[Figure: Amount over Time. Curves: Data, Information, Knowledge; the gaps between them grow over time. Further label: Domain-specific utility.]
[adapted from Heumann et al., Biomax]
Growth of biological data and tools
[Figure: Year (1995 to 2009) against Number of PubMed Articles and Number of Genomes/Databases; one axis is labeled in millions. Series: PubMed articles, Sequenced genomes, Biological databases. Annotation: Where is the knowledge?]
Classic research workflow
[Diagram. Aim of research: Pathogen infects Host (labels: Mechanism, Interaction, Involved Genes?). Methods: Literature Search, Hypothesis, Experiment(s) / Algorithm(s), Analysis, Result(s).]
Data and Text Mining workflow
[Diagram. Inputs: Field of Interest; Full-text articles (free & licensed) + Data from experiments. Process: Data and Text Mining. Outputs: Facts; Knowledge; Thesis (previously hidden); Hypothesis A, Hypothesis B, ..., Hypothesis n.]
Data Store
• Data is structured
• Access is easy and fast
• Examples
– genomes
– phenotypes
– environmental properties
– functional annotation
– experimental data
  • expression data
  • interaction data
[Figure labels: Francisella, Brucella, Rickettsia, Coxiella, Legionella, Yersinia, Burkholderia]
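A minimal sketch of what "structured" and "easy and fast access" could look like in practice, using an in-memory SQLite database. The genus names and data categories are taken from the slide; the table layout, the column names, the functions `build_store` and `categories_for`, and the placeholder catalog rows are assumptions for illustration and do not describe the actual data store.

```python
import sqlite3

# Genera and data categories come from the slide. The schema and the
# placeholder catalog rows below are assumptions for illustration only.
GENERA = ["Francisella", "Brucella", "Rickettsia", "Coxiella",
          "Legionella", "Yersinia", "Burkholderia"]
CATEGORIES = ["genomes", "phenotypes", "environmental properties",
              "functional annotation", "experimental data"]


def build_store() -> sqlite3.Connection:
    """Create an in-memory catalog with one row per (genus, category)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE dataset ("
        " id INTEGER PRIMARY KEY,"
        " genus TEXT NOT NULL,"
        " category TEXT NOT NULL,"
        " UNIQUE (genus, category))"
    )
    # An index on the lookup column keeps access fast as the store grows.
    conn.execute("CREATE INDEX idx_dataset_genus ON dataset (genus)")
    conn.executemany(
        "INSERT INTO dataset (genus, category) VALUES (?, ?)",
        [(g, c) for g in GENERA for c in CATEGORIES],
    )
    conn.commit()
    return conn


def categories_for(conn: sqlite3.Connection, genus: str) -> list:
    """Return the data categories catalogued for one genus, sorted by name."""
    rows = conn.execute(
        "SELECT category FROM dataset WHERE genus = ? ORDER BY category",
        (genus,),
    )
    return [row[0] for row in rows]


store = build_store()
print(categories_for(store, "Coxiella"))
```

A fixed schema is what makes the data "structured", and the parameterized query shows the kind of direct lookup that a pile of spreadsheets or free text does not offer.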
Data Mining
Discovery of new patterns and rules in large data sets.
• Clustering [illustration: Puy lentils, Red split lentils, Laird lentils]
• Classification
• Association rule learning
[Figure: Numbers of markers of three typing systems that best discriminate the Coxiella isolates. Typing systems: MLVA (14), MST (10), IS (32). Values shown in the figure: 6, 3, 10.]
Marker                                  Phenotype
ms26, ms34, IS34                        Host
ms03, ms28, COX56, IS31, IS37, IS43     Plasmid type
ms23, ms33, COX2, IS5                   Province
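The idea behind association rule learning can be sketched with support and confidence for single-marker rules of the form "marker allele → phenotype". This is a simplified illustration, not a full Apriori implementation and not the method used for the Coxiella analysis. The marker names ms26 and ms34 appear on the slide; every allele value, every host label, the thresholds, and the function `single_marker_rules` are invented for the example.

```python
from collections import Counter

# Hypothetical isolates: marker names come from the slide, but every allele
# value and host label below is invented for illustration.
ISOLATES = [
    {"ms26": "4", "ms34": "7", "host": "sheep"},
    {"ms26": "4", "ms34": "7", "host": "sheep"},
    {"ms26": "4", "ms34": "5", "host": "sheep"},
    {"ms26": "3", "ms34": "5", "host": "cattle"},
    {"ms26": "3", "ms34": "5", "host": "cattle"},
    {"ms26": "3", "ms34": "7", "host": "sheep"},
]


def single_marker_rules(rows, target, min_support=0.3, min_confidence=0.9):
    """Find rules (marker, allele) -> target value above both thresholds.

    support    = share of all rows containing allele and target value
    confidence = share of rows with the allele that also have the value
    """
    n = len(rows)
    antecedent = Counter()  # how often each (marker, allele) occurs
    joint = Counter()       # how often it co-occurs with a target value
    for row in rows:
        for marker, allele in row.items():
            if marker == target:
                continue
            antecedent[(marker, allele)] += 1
            joint[(marker, allele, row[target])] += 1

    rules = []
    for (marker, allele, value), count in joint.items():
        support = count / n
        confidence = count / antecedent[(marker, allele)]
        if support >= min_support and confidence >= min_confidence:
            rules.append((marker, allele, value, support, confidence))
    return sorted(rules)


rules = single_marker_rules(ISOLATES, target="host")
for marker, allele, value, support, confidence in rules:
    print(f"{marker}={allele} -> host={value} "
          f"(support {support:.2f}, confidence {confidence:.2f})")
```

Rules that fall below either threshold are dropped, which is how such a procedure reduces a large marker panel to the few markers that discriminate a phenotype.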
Text Mining
• Deriving high-quality information from text.
• Returns a network of semantic relationships.
1. SST (sentence splitting and tokenization)
2. POS (part-of-speech tagging)
3. SRL (semantic role labeling)
4. PAS (predicate-argument structure)
Example: I eat tomato salad with a fork.
POS tags: I = PRP (personal pronoun); eat = VERB; tomato = NOUN; salad = NOUN; with = PREPOSITION; a = DT (determiner); fork = NOUN
PAS1: A0 (subject) = I; V = eat; A1 (object, noun phrase) = tomato salad; MNR (manner, "how?") = with a fork
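The first and last pipeline steps can be sketched with the standard library alone. The splitter and tokenizer below are naive regular-expression versions, and the predicate-argument structure is annotated by hand to mirror the slide; a real pipeline would derive it from POS tagging and semantic role labeling. The names `split_sentences`, `tokenize`, and `PredicateArgumentStructure` are assumptions introduced for this illustration.

```python
import re
from dataclasses import dataclass, field


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence):
    """Runs of word characters and single punctuation marks become tokens."""
    return re.findall(r"\w+|[^\w\s]", sentence)


@dataclass
class PredicateArgumentStructure:
    """One predicate with its labeled arguments (A0, A1, MNR, ...)."""
    predicate: str
    arguments: dict = field(default_factory=dict)


sentence = split_sentences("I eat tomato salad with a fork.")[0]
tokens = tokenize(sentence)

# Hand-annotated to match the slide, not computed from the tokens.
pas1 = PredicateArgumentStructure(
    predicate="eat",
    arguments={"A0": "I", "A1": "tomato salad", "MNR": "with a fork"},
)

print(tokens)
print(pas1)
```

A splitter this simple would break a name such as "C. burnetii" after the abbreviation, which is one reason biomedical text usually needs a domain-aware sentence splitter.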
Text Mining - Example
Treatment of chronic infection
Eradication of C. burnetii in cases of chronic infection is hampered by the lack of bactericidal activity of numerous antibiotics. Chronic Q fever requires prolonged therapy for at least 18 months with a combination of doxycycline (100 mg twice daily) and hydroxychloroquine (200 mg three times daily). Hydroxychloroquine increases the pH within the acidic phagolysosomes, restoring the bactericidal activity of doxycycline. The bactericidal effect of this combination confers lower relapse rates.
A. Gikas et al., Q fever: clinical manifestations and treatment. Expert Rev. Anti Infect. Ther. 8(5), 529–539 (2010)
5. NER (named entity recognition)
6. RFEE (relationship, fact and event extraction, based on the identified PAS)
[Figure: extracted network. Nodes: Coxiella burnetii; Q fever, chronic; Doxycycline; Hydroxychloroquine; pH; phagolysosome; 100 mg twice daily; 200 mg three times daily; 18 months.]
Legend
• Node colors: Organism, Temporal, Disease, Dose rate, Cellular compartments, Drug/chem. compound
• Arrow types: increases, decreases
• Line colors: combination, is localized in, activity
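The extracted facts can be held as subject-relation-object triples and queried as a small network. The sketch below encodes triples read off the quoted passage by hand; it does not perform the extraction itself. The relation names "increases", "combination" and "is localized in" appear in the slide legend, while "dose rate" and "therapy duration" as relation names, and the functions `build_network` and `reachable`, are assumptions made for this illustration.

```python
from collections import defaultdict

# Triples read off the quoted passage. "increases", "combination" and
# "is localized in" appear in the slide legend; "dose rate" and
# "therapy duration" are relation names assumed here for illustration.
TRIPLES = [
    ("Hydroxychloroquine", "increases", "pH"),
    ("pH", "is localized in", "phagolysosome"),
    ("Doxycycline", "combination", "Hydroxychloroquine"),
    ("Doxycycline", "dose rate", "100 mg twice daily"),
    ("Hydroxychloroquine", "dose rate", "200 mg three times daily"),
    ("Q fever, chronic", "therapy duration", "18 months"),
]


def build_network(triples):
    """Map each subject to its outgoing (relation, object) edges."""
    graph = defaultdict(list)
    for subject, relation, obj in triples:
        graph[subject].append((relation, obj))
    return graph


def reachable(graph, start):
    """Depth-first walk returning every node reachable from start."""
    seen, stack = [], [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.append(node)
        # .get avoids creating empty entries for leaf nodes
        stack.extend(obj for _, obj in graph.get(node, []))
    return seen[1:]


network = build_network(TRIPLES)
print(network["Hydroxychloroquine"])
print(reachable(network, "Doxycycline"))
```

Following edges from Doxycycline leads through Hydroxychloroquine to pH and the phagolysosome, which is the kind of indirect connection a knowledge network makes visible.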
Text Mining Application
[Screenshots of the application; visible labels: Search, drug]
Revisit
• Shift to knowledge, wisdom and vision
• Focus on novel aspects and hypotheses
[Figure: layered hierarchy, bottom to top: ENVIRONMENT, DATA (Index), INFORMATION (Rules), KNOWLEDGE (Model), WISDOM (Goal), VISION (Values). Marked in the figure: D, I and K layers of a text and data mining system.]
• With automated mining systems, cognitive allocation shifts to Wisdom and Vision.
• Without technology, cognitive allocation mires in low-layer processes.
[Ackoff, 1989; figure adapted from Scott A. Carpenter]
Summary
• Combination of data and text mining
• Enormous time savings
• Reveals hidden knowledge
• Shift to novel aspects
• Coxiella knowledge network
• Transferable to other organisms/sciences
Thank you for your Attention!
Knowone