A Bayesian hierarchical model for identifying epitopes ...

Introduction

Data description

Modelling issues

Results

Model Comparison

Conclusions

A Bayesian hierarchical model for identifying epitopes in peptide microarray data

Serena Arima (1), Valentina Pecora and Luca Tardella

Greco Italian meeting on Statistics, September 2010, Porto San Paolo, Sardegna

(1) Dip. di metodi e modelli per l’economia, il territorio e la finanza, Sapienza Università di Roma, Italy

Post-genomic Era: Proteomics

From genomic microarray to ... protein! (2)

In the post-genome era, proteomics has attracted more and more attention due to its potential in understanding biological functions and structures at the protein level.

(2) Parmigiani, G., Garrett, E.S., Irizarry, R.A. and Zeger, S.L. (2003) The Analysis of Gene Expression Data. Springer.

Peptide microarray

Similar to a genomic microarray, but ... with peptides and epitopes in place of genes.

- peptide: a chain of amino acids;
- epitope: the part of a protein antigen that is recognized by the immune system; it is formed by localized chains of consecutive peptides.

Peptide   Sequence
1         370-LKNPQEETLQAFDSH-384
2         373-PQEETLQAFDSHYDY-387
3         376-ETLQAFDSHYDYTIC-390
4         379-QAFDSHYDYTICGDS-393

⇒ Epitope: 379-QAFDSH-384

Peptide microarrays allow investigators to understand how the immune system functions by looking specifically at those epitopes that are recognized in the presence of a pathological status.
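The epitope in the example is the stretch of positions shared by all four overlapping peptides. A minimal Python sketch of that reading follows; the function name `shared_region` and the tuple layout are illustrative assumptions, and in the model presented later epitopes are inferred statistically from the signal rather than read off the sequences.

```python
# Each peptide: (start position, sequence), taken from the example table.
peptides = [
    (370, "LKNPQEETLQAFDSH"),
    (373, "PQEETLQAFDSHYDY"),
    (376, "ETLQAFDSHYDYTIC"),
    (379, "QAFDSHYDYTICGDS"),
]

def shared_region(peptides):
    """Return (start, end, sequence) of the positions covered by every peptide,
    or None if the peptides have no position in common."""
    start = max(s for s, _ in peptides)
    end = min(s + len(seq) - 1 for s, seq in peptides)
    if start > end:
        return None
    first_start, first_seq = peptides[0]
    region = first_seq[start - first_start : end - first_start + 1]
    # Sanity check: all peptides must agree on the shared residues.
    for s, seq in peptides:
        assert seq[start - s : end - s + 1] == region
    return start, end, region

start, end, region = shared_region(peptides)
print(f"{start}-{region}-{end}")  # prints 379-QAFDSH-384
```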

Peptide microarray in allergology: the EGG PROJECT

Objectives: study allergenic epitopes of ovalbumin (OVA) in patients with egg allergy before and after an experimental desensitization therapy.

1. identifying which epitopes are recognized by allergic patients;
2. comparing the epitopes identified before and after the treatment for those patients who successfully concluded the desensitization therapy.

Data description

- 16 patients: 5 negative controls and 11 treated patients;
- IgE and IgG4 microarrays for each patient, before and after the treatment;
- 125 peptides: 15 amino acids each, overlapping by 12 (6 replicates).
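With 15-residue peptides overlapping by 12, consecutive peptides are shifted by 3 positions, so a peptide shares residues with its four nearest neighbours on each side. A small sketch of this arithmetic follows; the helper name `shared_residues` is hypothetical and a perfectly regular tiling is assumed.

```python
PEPTIDE_LENGTH = 15   # amino acids per peptide (from the slide)
OVERLAP = 12          # residues shared by two consecutive peptides (from the slide)

STEP = PEPTIDE_LENGTH - OVERLAP  # shift between consecutive peptides

def shared_residues(lag):
    """Number of residues shared by peptides p and p + lag in a regular tiling."""
    return max(PEPTIDE_LENGTH - STEP * abs(lag), 0)

for lag in range(1, 6):
    print(lag, shared_residues(lag))
```

Under this assumption, signals up to four peptides apart can come from shared amino acids, which is one reason to expect dependence between neighbouring spots.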

[Figure: SNR for IgE before and after treatment (Successes), plotted over peptides ova 1 to ova 125; legend: Before treatment, After treatment.]

Problem formalization

1. identifying epitopes: peaks in the SNR profile over peptides ova 1 to ova 125;
2. comparing the epitopes identified before and after the desensitization therapy.

[Figures: SNR profiles over peptides ova 1 to ova 125, one panel for each of the two tasks.]

Modelling issues

Modelling interior spot dependence

[Figure: autocorrelation function (ACF) against lag (0 to 15), three panels: Negative Controls; Successes: Before treatment; Successes: After treatment.]

Peptide   Sequence (3)
1         370-LKNPQEETLQAFDSH-384
2         373-PQEETLQAFDSHYDY-387
3         376-ETLQAFDSHYDYTIC-390
4         379-QAFDSHYDYTICGDS-393

(3) I. Cerecedo et al. (2008) and A. Flinterman et al. (2008)
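The autocorrelation plots summarise how strongly the signals of peptides a given number of positions apart move together. A minimal sketch of the sample ACF along the peptide index, using NumPy, is given here; the function `sample_acf` and the toy AR(1) series are illustrative assumptions, not the authors' code or data.

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelation of a 1-D series for lags 0..max_lag."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    denom = np.sum(centered ** 2)
    return np.array([
        np.sum(centered[lag:] * centered[: len(x) - lag]) / denom
        for lag in range(max_lag + 1)
    ])

# Toy series standing in for the SNR profile of one array (125 peptides).
rng = np.random.default_rng(0)
noise = rng.normal(size=125)
profile = np.empty(125)
profile[0] = noise[0]
for p in range(1, 125):
    profile[p] = 0.7 * profile[p - 1] + noise[p]  # hypothetical AR(1) dependence

acf = sample_acf(profile, max_lag=15)
print(np.round(acf[:4], 2))
```

A slowly decaying ACF over the first few lags is the kind of pattern that suggests dependence between neighbouring peptides.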

Notation

- $y_{prc}$: SNR for peptide $p$ ($p = 1, \dots, 125$) and replicate $r$ ($r = 1, \dots, 6$) for patient $c$ ($c = 1, \dots, 11$);
- $y_{prc}$ and $y_{p+1,rc}$ are signals of two consecutive peptides in the primary structure.

Some similarities with Gottardo et al. (2008) for ChIP-Chip data (4).

(4) Gottardo et al. (2008) A Flexible and Powerful Bayesian Hierarchical Model for ChIP-Chip Experiments, Biometrics, 64, 468-478.

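A Bayesian hierarchical model

$$y_{prc} \sim N(\mu_p + \beta_{pc},\ \sigma^2_{pc}) \qquad (1)$$

$$\mu_p = \alpha_0 + \sum_{k=1}^{K} \alpha_k (\mu_{p-k} - \alpha_0) + \gamma_p \delta_p + \varepsilon \qquad (2)$$

$$\beta_{pc} = a_{0c} + a_{1c}\,\mu_p + a_{2c}\,\mu_p^2 \qquad (3)$$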

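A Bayesian hierarchical model (cont)

$$\mu_p = \alpha_0 + \sum_{k=1}^{K} \alpha_k (\mu_{p-k} - \alpha_0) + \gamma_p \delta_p + \varepsilon \qquad (2)$$

- $\sum_{k=1}^{K} \alpha_k (\mu_{p-k} - \alpha_0)$: latent AR process;
- $\alpha_0 + \gamma_p \delta_p$: Markov switching component, with

$$\delta_p \in \{0, 1\}, \qquad \delta_p \sim \mathrm{Dirichlet}(\Pi[\delta_{p-1} + 1,\ ]), \qquad \Pi = \begin{pmatrix} \pi_{00} & \pi_{01} \\ \pi_{10} & \pi_{11} \end{pmatrix}$$

$$\gamma_p \sim w\,\mathrm{Exp}(\lambda_1) + (1 - w)\,\mathrm{Exp}(\lambda_2)$$

Here $\Pi[\delta_{p-1} + 1,\ ]$ denotes the row of $\Pi$ selected by the previous state $\delta_{p-1}$.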
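To make the roles of the two components in equation (2) concrete, the following sketch simulates the latent means forward in the peptide index. It rests on several assumptions that are not stated in the slides: the state $\delta_p$ is drawn as a two-state Markov chain from the row of $\Pi$ selected by $\delta_{p-1}$, the chain starts from state 0, $\lambda_1$ and $\lambda_2$ are rate parameters, $\varepsilon$ is Gaussian, autoregressive terms that would fall before the first peptide are dropped, and every numeric value is hypothetical.

```python
import numpy as np

def simulate_mu(n_peptides, alpha0, alphas, Pi, w, lam1, lam2, eps_sd, rng):
    """Simulate (mu, delta) following equation (2) under the stated assumptions."""
    K = len(alphas)
    delta = np.zeros(n_peptides, dtype=int)
    gamma = np.zeros(n_peptides)
    mu = np.zeros(n_peptides)
    for p in range(n_peptides):
        # Markov switching: transition row chosen by the previous state
        # (assumed to be 0 before the first peptide).
        prev = delta[p - 1] if p > 0 else 0
        delta[p] = rng.choice(2, p=Pi[prev])
        # Peak height: two-component mixture of exponentials (rates lam1, lam2).
        rate = lam1 if rng.random() < w else lam2
        gamma[p] = rng.exponential(1.0 / rate)
        # Latent AR part; terms with p - k < 0 are dropped (boundary assumption).
        ar = sum(alphas[k - 1] * (mu[p - k] - alpha0)
                 for k in range(1, K + 1) if p - k >= 0)
        mu[p] = alpha0 + ar + gamma[p] * delta[p] + rng.normal(0.0, eps_sd)
    return mu, delta

rng = np.random.default_rng(1)
Pi = np.array([[0.9, 0.1],   # from state 0: stay out of / enter an epitope region
               [0.3, 0.7]])  # from state 1: leave / stay in an epitope region
mu, delta = simulate_mu(125, alpha0=1.0, alphas=[0.4, 0.2], Pi=Pi,
                        w=0.5, lam1=1.0, lam2=0.2, eps_sd=0.1, rng=rng)
print(delta.sum(), np.round(mu[:5], 2))
```

Runs of consecutive ones in `delta` play the role of epitope regions, while the autoregressive term spreads each peak over neighbouring peptides.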

A Bayesian hierarchical model (cont)

$$\beta_{pc} = a_{0c} + a_{1c}\,\mu_p + a_{2c}\,\mu_p^2 \qquad (3)$$

$\beta_{pc}$: array/patient effect, with the constraint $\sum_{c=1}^{C} \beta_{pc} = 0$.

Remark. Innovative components and new dependences:

- latent AR component: structural dependence among peptides;
- Markov switching component: dependence between peptides belonging to the same epitope region.

Inferential issues

1. Identifying a peptide as belonging to an epitope:

$$pp_\delta = P(\delta_p = 1 \mid \text{data}) > \pi'_{cut}$$

2. Comparison of two experimental conditions:

$$\Pr(\mu_{1p} - \mu_{2p} > \epsilon_2) > \pi''_{cut}$$

where $\epsilon_1$ and $\epsilon_2$ are calibrated through negative controls;

- $\pi'_{cut}$ and $\pi''_{cut}$ are chosen to control the FDR, as in Newton (2004) (5).

(5) M.A. Newton (2004), Detecting differential gene expression with a semiparametric hierarchical mixture method. Biostatistics, 5, 155-176.
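One common way to pick such a cut-off from posterior probabilities, in the spirit of the approach cited for FDR control, is to enlarge the list of discoveries for as long as the average posterior probability of a false discovery stays below a target level. The sketch below assumes this rule; the function name `fdr_cutoff`, the toy probabilities and the target level are illustrative and not taken from the slides.

```python
import numpy as np

def fdr_cutoff(post_prob, target_fdr):
    """Return (cutoff, selected indices) such that the expected FDR among the
    selected items, estimated by the mean of 1 - post_prob, is <= target_fdr."""
    post_prob = np.asarray(post_prob, dtype=float)
    order = np.argsort(-post_prob)                 # most probable first
    sorted_pp = post_prob[order]
    running_fdr = np.cumsum(1.0 - sorted_pp) / np.arange(1, len(sorted_pp) + 1)
    ok = np.nonzero(running_fdr <= target_fdr)[0]
    if ok.size == 0:
        return 1.0, np.array([], dtype=int)        # nothing can be selected
    k = ok[-1] + 1                                 # size of the largest valid list
    return sorted_pp[k - 1], order[:k]

pp = [0.99, 0.95, 0.90, 0.60, 0.20]                # hypothetical P(delta_p = 1 | data)
cutoff, selected = fdr_cutoff(pp, target_fdr=0.10)
print(cutoff, sorted(selected.tolist()))
```

With these toy values the first three peptides are selected: adding the fourth would raise the estimated FDR to 0.14, above the 0.10 target.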

Egg Project: Results

Model comparison

We compare the proposed model, labeled $M^*$, with two alternative models $M_1$ and $M_2$:

- $M_1$: $\mu_p \sim N(\nu_0 + \delta_p \gamma_p, \tau)$, with no latent autoregressive component, while $\delta_p$ is a Markov process as in the proposed model;
- $M_2$: $\mu_p \sim N(\nu_0 + \delta_p \gamma_p, \tau)$, with no latent autoregressive component and $\delta_p \sim \mathrm{Bernoulli}(\theta)$.

Diagnostic tools:

- global fit indicators: AIC, BIC and DIC;
- observed and predicted ACF.

Model comparison (cont)

Model   DIC        ∆AIC       ∆BIC
M∗      -5086.71   –          –
M1      -3758.52   1347.709   1279.562
M2      -2788.70   2332.650   2236.434

[Figure: densities of the predicted ACF at lag 1 and at lag 2, one pair of panels for each of the proposed model, M1 and M2.]
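The table reports DIC on its original scale and AIC and BIC as differences from the proposed model. To read the three criteria on the same footing, the DIC values can also be expressed as differences from M∗ (lower DIC is better). The short sketch below does this arithmetic on the reported values; the dictionary layout is only an illustration.

```python
# DIC values as reported in the model comparison table.
dic = {"M*": -5086.71, "M1": -3758.52, "M2": -2788.70}

# Difference with respect to the proposed model; positive values favour M*.
delta_dic = {name: round(value - dic["M*"], 2) for name, value in dic.items()}
print(delta_dic)
```

The resulting differences (about 1328 for M1 and 2298 for M2) are of the same order as the reported ∆AIC and ∆BIC.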

Simulation study

Models $M^*$, $M_1$ and $M_2$ are compared in terms of average peak detection performance over 100 data sets simulated from the proposed model $M^*$ and from $M_2$.

True model   Fitted model   FDR     FNR     TDR
M∗           M∗             0.068   0.014   0.986
M∗           M1             0.165   0.033   0.890
M∗           M2             0.104   0.071   0.681
M2           M∗             0.088   0.009   0.962
M2           M1             0.090   0.010   0.960
M2           M2             0.048   0.005   0.977
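The slides do not spell out how FDR, FNR and TDR are computed. As a hedged illustration, the sketch below uses one common convention, which is an assumption here: FDR = FP / (FP + TP), FNR = FN / (FN + TN) (false non-discovery rate) and TDR = TP / (TP + FN). The function name and the toy indicator vectors are hypothetical.

```python
import numpy as np

def detection_rates(truth, called):
    """Error rates for one simulated data set under the assumed definitions.
    truth, called: 0/1 indicators of 'peptide belongs to an epitope'."""
    truth = np.asarray(truth, dtype=bool)
    called = np.asarray(called, dtype=bool)
    tp = np.sum(truth & called)
    fp = np.sum(~truth & called)
    fn = np.sum(truth & ~called)
    tn = np.sum(~truth & ~called)

    def ratio(num, den):
        # Convention: a rate with an empty denominator is reported as 0.
        return float(num) / den if den > 0 else 0.0

    return {
        "FDR": ratio(fp, fp + tp),
        "FNR": ratio(fn, fn + tn),
        "TDR": ratio(tp, tp + fn),
    }

truth  = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
called = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
rates = detection_rates(truth, called)
print(rates)
```

Averaging such per-data-set rates over the simulated data sets would give figures comparable in kind to those in the table, provided the definitions match the ones the authors used.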

Conclusion and further developments

Peptide data with overlapping amino acids:

- a first model-based solution to tackle the new problem;
- a new peculiar feature: the autoregressive structure.

Further research:

- joint modelling of IgE and IgG4;
- embedding the 3D neighbourhood structure.