Knowledge Digging in Coxiella burnetii

Bundeswehr Institute of Microbiology

Knowledge Digging in Coxiella burnetii
Bioinformatics tools for 21st-century science

Mathias C. Walter, Irmtraud Dunger, Benedikt Wachinger and Dimitrios Frangoulidis

What is Knowledge?

Knowledge is Power

[Francis Bacon]


What is Knowledge?

Knowledge is knowing that a tomato is a fruit; and wisdom is knowing not to put it in a fruit salad.

[Miles Kington]


Flow of Understanding

[Figure: Hierarchy of understanding. Axes: connectedness and understanding. The steps between levels are labelled "understanding relations", "understanding patterns", and "understanding principles".] [adapted from Carpenter and Cannady, 2004]

[Figure: Hierarchy of “Data” vs. allocation of mental space. Layers: Environment, Data, Information, Knowledge, Wisdom, Vision. Further labels: Observations, Index, Rules, Model, Goal, Values, Data-handling technology system.] [Ackoff, 1989; figure adapted from Scott A. Carpenter]

• Without technology, cognitive allocation mires in low-layer processes.
• Data-handling technology allows human cognitive energy to shift upward.

The gap between information and knowledge widens

[Figure: amount of data, information, and knowledge (domain-specific utility) plotted over time; the gaps between the curves widen.] [adapted from Heumann et al., Biomax]

Growth of biological data and tools

[Figure: PubMed articles, sequenced genomes, and biological databases by year, 1995 to 2009. Axes: number of PubMed articles, number of genomes/databases (one axis scaled in millions), year. Annotation: "Where is the knowledge?"]

Classic research workflow

[Figure: workflow diagram. Aim of research: how a pathogen infects its host (mechanism, interaction, involved genes?). Elements shown: literature search, hypothesis, methods (experiments, algorithms), analysis, result(s).]

Data and Text Mining workflow

[Figure: workflow diagram. Inputs for a field of interest: full-text articles (free and licensed) plus data from experiments. Elements shown: data and text mining, facts, knowledge, hypotheses A, B, ..., n, and a thesis that was previously hidden.]

Data Store

• Data is structured
• Access is easy and fast
• Examples
  – genomes
  – phenotypes
  – environmental properties
  – functional annotation
  – experimental data
    • expression data
    • interaction data

[Figure labels: Francisella, Brucella, Rickettsia, Coxiella, Legionella, Yersinia, Burkholderia]
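A structured store with fast access could be sketched as follows, using Python's built-in sqlite3 module. This is only an illustration: the table name `record`, its columns, the helper names `build_store` and `lookup`, and the placeholder rows are all assumptions and do not describe the system behind the presentation.

```python
import sqlite3


def build_store(rows):
    """Create an in-memory store and load (organism, data_type, payload) rows."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE record ("
        " organism TEXT NOT NULL,"
        " data_type TEXT NOT NULL,"
        " payload TEXT NOT NULL)"
    )
    # The index keeps lookups by organism and data type fast as the table grows.
    con.execute("CREATE INDEX idx_record ON record (organism, data_type)")
    con.executemany("INSERT INTO record VALUES (?, ?, ?)", rows)
    con.commit()
    return con


def lookup(con, organism, data_type):
    """Return all payloads stored for one organism and one data type."""
    cur = con.execute(
        "SELECT payload FROM record"
        " WHERE organism = ? AND data_type = ? ORDER BY payload",
        (organism, data_type),
    )
    return [row[0] for row in cur]


# Placeholder rows: the payloads are invented for illustration only.
ROWS = [
    ("Coxiella", "genome", "placeholder genome record"),
    ("Coxiella", "phenotype", "placeholder phenotype record"),
    ("Francisella", "genome", "placeholder genome record"),
]

store = build_store(ROWS)
print(lookup(store, "Coxiella", "phenotype"))
```

A relational layout is one option among several; the point is only that typed, indexed records make the later mining steps simple queries.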

Data Mining

Discovery of new patterns and rules in large data sets.

• Clustering
• Classification
• Association rule learning

[Figure: photographs of Puy lentils, red split lentils, and Laird lentils.]

[Figure: diagram of three typing systems, MLVA (14), MST (10), and IS (32), with the values 6, 3, and 10 shown in the diagram.]
Numbers of markers of three typing systems which best discriminate the Coxiella isolates.

Marker                                   Phenotype
ms26, ms34, IS34                         Host
ms03, ms28, COX56, IS31, IS37, IS43      Plasmid type
ms23, ms33, COX2, IS5                    Province
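Association rule learning between a marker and a phenotype can be illustrated with the two standard measures, support and confidence. The sketch below is a toy example: the marker names ms26 and ms34 come from the table, but the allele values, the host labels, and the four isolates are invented and are not data from the study. The function name `rule_stats` is likewise hypothetical.

```python
from collections import Counter

# Invented toy isolates: allele values and host labels are NOT real data.
ISOLATES = [
    {"ms26": "a", "ms34": "x", "host": "sheep"},
    {"ms26": "a", "ms34": "x", "host": "sheep"},
    {"ms26": "a", "ms34": "y", "host": "goat"},
    {"ms26": "b", "ms34": "y", "host": "goat"},
]


def rule_stats(isolates, marker, phenotype):
    """Return {(allele, label): (support, confidence)} for rules allele -> label.

    support    = share of all isolates that show both the allele and the label
    confidence = share of isolates with the allele that also show the label
    """
    n = len(isolates)
    allele_counts = Counter(iso[marker] for iso in isolates)
    pair_counts = Counter((iso[marker], iso[phenotype]) for iso in isolates)
    return {
        pair: (count / n, count / allele_counts[pair[0]])
        for pair, count in pair_counts.items()
    }


stats = rule_stats(ISOLATES, "ms34", "host")
for (allele, label), (support, confidence) in sorted(stats.items()):
    print(f"ms34={allele} -> host={label}: "
          f"support={support:.2f}, confidence={confidence:.2f}")
```

In this toy set ms34 predicts the host perfectly, while ms26 does not; a marker that yields high-confidence rules for a phenotype is a candidate for the kind of marker-phenotype association listed in the table.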

Text Mining

• Deriving high-quality information from text.
• Returns a network of semantic relationships.

1. SST (sentence splitting and tokenization)
2. POS (part-of-speech tagging)
3. SRL (semantic role labeling)
4. PAS (predicate-argument structure)

Example: "I eat tomato salad with a fork."

Token    POS tag                   Role in PAS1
I        PRP (personal pronoun)    A0 (subject)
eat      VERB                      V
tomato   NOUN                      A1 (object, noun phrase "tomato salad")
salad    NOUN                      A1
with     PREPOSITION               MNR (manner: how?)
a        DT (determiner)           MNR
fork     NOUN                      MNR
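The first pipeline step and the resulting predicate-argument structure can be sketched in a few lines. The splitter and tokenizer below are deliberately naive regular expressions, and the POS tags and the PAS are written out by hand from the slide's example rather than predicted by a tagger or parser. The names `split_sentences`, `tokenize`, and `PAS` are assumptions made for this illustration.

```python
import re
from dataclasses import dataclass, field


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' when whitespace follows."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence):
    """Naive tokenizer: runs of word characters, or single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", sentence)


@dataclass
class PAS:
    """A predicate with its labelled arguments (A0, A1, MNR, ...)."""
    predicate: str
    arguments: dict = field(default_factory=dict)


sentence = "I eat tomato salad with a fork."
tokens = tokenize(sentence)

# Hand-written tags copied from the example; a real system would predict them.
pos_tags = ["PRP", "VERB", "NOUN", "NOUN", "PREPOSITION", "DT", "NOUN"]
tagged = list(zip(tokens, pos_tags))  # zip stops before the final "."

# Hand-written predicate-argument structure for the same sentence.
pas1 = PAS(
    predicate="eat",
    arguments={"A0": "I", "A1": "tomato salad", "MNR": "with a fork"},
)

print(tokens)
print(tagged)
print(pas1)
```

Storing each sentence as a predicate plus labelled arguments is what makes the later relationship, fact, and event extraction a matter of reading off argument slots.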

Text Mining - Example

Treatment of chronic infection

Eradication of C. burnetii in cases of chronic infection is hampered by the lack of bactericidal activity of numerous antibiotics. Chronic Q fever requires prolonged therapy for at least 18 months with a combination of doxycycline (100 mg twice daily) and hydroxychloroquine (200 mg three times daily). Hydroxychloroquine increases the pH within the acidic phagolysosomes, restoring the bactericidal activity of doxycycline. The bactericidal effect of this combination confers lower relapse rates.

5. NER (named entity recognition)
6. RFEE (based on identified PAS)
   – Relationship,
   – Fact and
   – Event Extraction

[Figure: network extracted from the paragraph on the treatment of chronic infection. Nodes: Coxiella burnetii; Q fever, chronic; Doxycycline (100 mg twice daily); Hydroxychloroquine (200 mg three times daily); 18 months; pH; phagolysosome.]

Legend
• Node colors: organism, temporal, disease, dose rate, cellular compartments, drug or chemical compound
• Arrow types: increases, decreases
• Line colors: combination, is localized in, activity

A. Gikas et al., Q fever: clinical manifestations and treatment. Expert Rev. Anti Infect. Ther. 8(5), 529–539 (2010)
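To make the last two steps concrete, here is a rough sketch that finds entities in sentences of the example paragraph and links them through the trigger verbs "increases" and "decreases". It swaps in a much simpler technique than the one on the slide: a hand-written dictionary lookup and a nearest-neighbour rule instead of statistical NER and PAS-based extraction. The lexicon, the type name "Measured quantity", and the function names are assumptions for illustration.

```python
import re

# Hand-written lexicon; the entity types follow the legend where possible.
LEXICON = {
    "C. burnetii": "Organism",
    "Q fever": "Disease",
    "doxycycline": "Drug",
    "hydroxychloroquine": "Drug",
    "phagolysosomes": "Cellular compartment",
    "pH": "Measured quantity",  # assumed type name, not in the legend
}
DOSE = re.compile(r"\b\d+ mg (?:twice|three times) daily\b")
TRIGGERS = re.compile(r"\b(increases|decreases)\b")


def find_entities(text):
    """Return sorted (start, end, name, type) tuples for all matches."""
    hits = []
    for name, etype in LEXICON.items():
        pattern = r"\b" + re.escape(name) + r"\b"
        for m in re.finditer(pattern, text, re.IGNORECASE):
            hits.append((m.start(), m.end(), name, etype))
    for m in DOSE.finditer(text):
        hits.append((m.start(), m.end(), m.group(0), "Dose rate"))
    return sorted(hits)


def find_relations(sentence):
    """Link the nearest entity before a trigger verb to the nearest after it."""
    entities = find_entities(sentence)
    relations = []
    for m in TRIGGERS.finditer(sentence):
        before = [e for e in entities if e[1] <= m.start()]
        after = [e for e in entities if e[0] >= m.end()]
        if before and after:
            relations.append((before[-1][2], m.group(1), after[0][2]))
    return relations


s1 = ("Hydroxychloroquine increases the pH within the acidic "
      "phagolysosomes, restoring the bactericidal activity of doxycycline.")
s2 = ("a combination of doxycycline (100 mg twice daily) and "
      "hydroxychloroquine (200 mg three times daily)")

print(find_relations(s1))
print([(name, etype) for _, _, name, etype in find_entities(s2)])
```

Each extracted triple corresponds to one edge of a network like the one in the figure; a nearest-neighbour rule of this kind will mislink entities in more complex sentences, which is why the slide bases extraction on the predicate-argument structure instead.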

Text Mining Application

[Screenshots: the text mining application and its search function, including a search for a drug.]

Revisit

• Shift to knowledge, wisdom and vision
• Focus on novel aspects and hypotheses

[Figure: layers Environment, Data, Information, Knowledge, Wisdom, Vision, with the labels Index, Rules, Model, Goal, Values. Caption: D, I and K layers of a text and data mining system.] [Ackoff, 1989; figure adapted from Scott A. Carpenter]

• Without technology, cognitive allocation mires in low-layer processes.
• With automated mining systems, cognitive allocation shifts to Wisdom and Vision.

Summary

• Combination of data and text mining
• Enormous time savings
• Reveals hidden knowledge
• Shift to novel aspects
• Coxiella knowledge network
• Transferable to other organisms/sciences

Thank you for your Attention!

Knowone 
