Introduction
Data description
Modelling issues
Results
Model Comparison
Conclusions
A Bayesian hierarchical model for identifying epitopes in peptide microarray data
Serena Arima (1), Valentina Pecora and Luca Tardella
Greco Italian meeting on Statistics, September 2010, Porto San Paolo, Sardegna
(1) Dip. di metodi e modelli per l’economia, il territorio e la finanza, Sapienza Università di Roma, Italy
Post-genomic Era: Proteomics
From genomic microarray to ... protein! (2)
In the post-genome era, proteomics has attracted increasing attention because of its potential for understanding biological functions and structures at the protein level.
(2) Parmigiani, G., Garrett, E.S., Irizarry, R.A. and Zeger, S.L. (2003) The Analysis of Gene Expression Data. Springer.
Peptide microarray
Similar to genomic microarrays, but with peptides and epitopes in place of genes.
peptide: a chain of amino acids;
epitope: the part of a protein antigen that is recognized by the immune system; it is formed by localized chains of consecutive peptides.

Peptide   Sequence
1         370-LKNPQEETLQAFDSH-384
2         373-PQEETLQAFDSHYDY-387
3         376-ETLQAFDSHYDYTIC-390
4         379-QAFDSHYDYTICGDS-393
⇒ Epitope: 379-QAFDSH-384

Peptide microarrays let investigators study how the immune system works by looking specifically at the epitopes that are recognized in the presence of a pathological condition.
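The epitope in the table above is the stretch of residues shared by all four overlapping peptides. A minimal Python sketch of that intersection follows; the function name `shared_region` and the (start, sequence) input layout are illustrative assumptions, and the sketch assumes that every listed peptide is recognized.

```python
def shared_region(peptides):
    """Return (start, end, sequence) of the residues common to all peptides.

    `peptides` is a list of (start_position, sequence) pairs; positions are
    1-based and inclusive along the parent protein (assumed layout).
    """
    start = max(s for s, _ in peptides)                 # latest first residue
    end = min(s + len(seq) - 1 for s, seq in peptides)  # earliest last residue
    if start > end:
        return None                                     # no common residue
    first_start, first_seq = peptides[0]
    return start, end, first_seq[start - first_start:end - first_start + 1]


peptides = [
    (370, "LKNPQEETLQAFDSH"),
    (373, "PQEETLQAFDSHYDY"),
    (376, "ETLQAFDSHYDYTIC"),
    (379, "QAFDSHYDYTICGDS"),
]
print(shared_region(peptides))  # (379, 384, 'QAFDSH')
```

With peptides of 15 residues shifted by 3, four consecutive reactive peptides share exactly 6 residues, which matches the epitope 379-QAFDSH-384 above.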
Peptide microarray in allergology: the EGG PROJECT
Objectives: study allergenic epitopes of ovalbumin (OVA) in patients with egg allergy before and after an experimental desensitization therapy.
1. identifying which epitopes are recognized by allergic patients;
2. comparing the epitopes identified before and after the treatment for those patients who successfully concluded the desensitization therapy.
Data description
16 patients: 5 negative controls and 11 treated patients;
IgE and IgG4 microarray for each patient before and after the treatment;
125 peptides: 15 amino acids overlapping by 12 (6 replicates).
[Figure: SNR IgE before and after treatment (Successes). x-axis: peptides ova 1 to ova 125; y-axis: SNR; series: Before treatment, After treatment.]

Problem formalization
1. identifying epitopes: peaks;
2. comparing the epitopes identified before and after the desensitization therapy.
[Figure: SNR profiles over peptides ova 1 to ova 125, one panel per task.]
Modelling issues
Modelling interior spot dependence
[Figure: ACF against lag (0 to 15), three panels: Negative Controls; Successes: Before treatment; Successes: After treatment.]

Peptide   Sequence
1         370-LKNPQEETLQAFDSH-384
2         373-PQEETLQAFDSHYDY-387
3         376-ETLQAFDSHYDYTIC-390
4         379-QAFDSHYDYTICGDS-393

I. Cerecedo et al. (2008) and A. Flinterman et al. (2008)
Notation
$y_{prc}$: SNR for peptide $p$ ($p = 1, \dots, 125$), replicate $r$ ($r = 1, \dots, 6$) and patient $c$ ($c = 1, \dots, 11$);
$y_{prc}$ and $y_{p+1,rc}$ are the signals of two consecutive peptides in the primary structure;
Some similarities with Gottardo et al. (2008) for ChIP-Chip data (4).
(4) Gottardo et al. (2008) A Flexible and Powerful Bayesian Hierarchical Model for ChIP-Chip Experiments, Biometrics, 64, 468-478.
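A Bayesian hierarchical model
$$y_{prc} \sim N(\mu_p + \beta_{pc}, \sigma^2_{pc}) \qquad (1)$$
$$\mu_p = \alpha_0 + \sum_{k=1}^{K} \alpha_k (\mu_{p-k} - \alpha_0) + \gamma_p \delta_p + \varepsilon \qquad (2)$$
$$\beta_{pc} = a_{0c} + a_{1c} \mu_p + a_{2c} \mu_p^2 \qquad (3)$$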
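A Bayesian hierarchical model (cont)
$$\mu_p = \alpha_0 + \sum_{k=1}^{K} \alpha_k (\mu_{p-k} - \alpha_0) + \gamma_p \delta_p + \varepsilon \qquad (2)$$
$\sum_{k=1}^{K} \alpha_k (\mu_{p-k} - \alpha_0)$: latent AR process;
$\alpha_0 + \gamma_p \delta_p$: Markov switching component, with
$$\delta_p \in \{0, 1\}, \qquad \delta_p \sim \text{Dirichlet}(\Pi[\delta_{p-1} + 1, ]), \qquad \Pi = \begin{pmatrix} \pi_{00} & \pi_{01} \\ \pi_{10} & \pi_{11} \end{pmatrix}$$
$$\gamma_p \sim w \, \text{Exp}(\lambda_1) + (1 - w) \, \text{Exp}(\lambda_2)$$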
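To make the two components of equation (2) concrete, the following Python sketch simulates a latent profile along the peptide sequence. It is only an illustration under stated assumptions, not the authors' sampler: the function name `simulate_mu` and every default value are hypothetical, $\varepsilon$ is taken to be Gaussian noise, the chain is started in state 0, values of $\mu$ before the first peptide are dropped from the AR sum, $\text{Exp}(\lambda)$ is read as having rate $\lambda$, and row $\delta_{p-1}$ of $\Pi$ is used directly as the transition probabilities.

```python
import numpy as np


def simulate_mu(P=125, alpha0=1.0, alpha=(0.5,),
                Pi=((0.95, 0.05), (0.30, 0.70)),
                w=0.5, lam1=1.0, lam2=0.2, sd_eps=0.1, seed=0):
    """Simulate mu_p = alpha0 + AR part + gamma_p * delta_p + eps (all values hypothetical)."""
    rng = np.random.default_rng(seed)
    Pi = np.asarray(Pi, dtype=float)
    K = len(alpha)
    delta = np.zeros(P, dtype=int)
    gamma = np.zeros(P)
    mu = np.full(P, alpha0, dtype=float)
    for p in range(P):
        # Markov switching: next state drawn from the row of Pi indexed by the previous state.
        prev = delta[p - 1] if p > 0 else 0  # assumed initial state 0
        delta[p] = rng.choice(2, p=Pi[prev])
        # Peak height: two-component mixture of exponentials (rates lam1, lam2).
        rate = lam1 if rng.random() < w else lam2
        gamma[p] = rng.exponential(1.0 / rate)
        # Latent AR part on deviations from alpha0; terms before the first peptide are skipped.
        ar = sum(alpha[k] * (mu[p - 1 - k] - alpha0)
                 for k in range(K) if p - 1 - k >= 0)
        mu[p] = alpha0 + ar + gamma[p] * delta[p] + rng.normal(0.0, sd_eps)
    return mu, delta, gamma


mu, delta, gamma = simulate_mu()
print(mu[:5], delta[:5])
```

Runs of consecutive ones in `delta` play the role of epitope regions, while the AR term spreads the signal over neighbouring peptides, which is the structural dependence induced by the overlapping sequences.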
A Bayesian hierarchical model (cont)
$$\beta_{pc} = a_{0c} + a_{1c} \mu_p + a_{2c} \mu_p^2 \qquad (3)$$
$\beta_{pc}$: array/patient effect ($\sum_{c=1}^{C} \beta_{pc} = 0$).
Remark. Innovative components and new dependences:
latent AR component: structural dependence among peptides;
Markov switching component: dependence between peptides belonging to the same epitope region.
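One simple way to satisfy the constraint $\sum_{c} \beta_{pc} = 0$ for every peptide is to centre each coefficient across patients, since the constraint then holds for any value of $\mu_p$. The sketch below illustrates this; the function name `patient_effects` and the coefficient layout are assumptions, and the slides do not say how the constraint is actually enforced.

```python
import numpy as np


def patient_effects(mu, a):
    """Quadratic patient effects beta[p, c] = a0c + a1c * mu_p + a2c * mu_p**2.

    `a` has shape (C, 3) with columns (a0c, a1c, a2c). Coefficients are centred
    across patients (an assumed device) so that each row of beta sums to zero.
    """
    mu = np.asarray(mu, dtype=float)
    a = np.asarray(a, dtype=float)
    a_centred = a - a.mean(axis=0)                      # sum over c of each coefficient = 0
    design = np.column_stack([np.ones_like(mu), mu, mu ** 2])  # shape (P, 3)
    return design @ a_centred.T                         # shape (P, C)


rng = np.random.default_rng(0)
mu = np.linspace(0.0, 5.0, 125)                         # hypothetical latent profile
beta = patient_effects(mu, rng.normal(size=(11, 3)))
print(np.abs(beta.sum(axis=1)).max())                   # numerically zero
```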
Inferential issues
1. Identifying a peptide as belonging to an epitope: $pp_\delta = P(\delta_p = 1 \mid \text{data}) > \pi'_{cut}$;
2. Comparison of two experimental conditions: $\Pr(\mu_{1p} - \mu_{2p} > \epsilon_2) > \pi''_{cut}$, where $\epsilon_1$ and $\epsilon_2$ are calibrated through negative controls;
$\pi'_{cut}$ and $\pi''_{cut}$ are chosen to control the FDR as in Newton et al. (2004) (5).
(5) M.A. Newton et al. (2004), Detecting differential gene expression with a semiparametric hierarchical mixture method. Biostatistics, 5, 155-176.
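A common way to turn posterior probabilities into a cutoff with a target FDR, in the spirit of the approach cited above, is to rank the peptides by posterior probability and keep the longest list whose average posterior error $1 - P(\delta_p = 1 \mid \text{data})$ stays below the target. The Python sketch below illustrates that rule; the name `fdr_cutoff`, the default target of 0.05 and the use of "greater than or equal to the cutoff" for selection are assumptions, not details taken from the slides.

```python
import numpy as np


def fdr_cutoff(post_prob, target_fdr=0.05):
    """Return (cutoff, selected) for a posterior-expected FDR at most target_fdr."""
    post_prob = np.asarray(post_prob, dtype=float)
    order = np.argsort(-post_prob)                    # most probable peptides first
    local_err = 1.0 - post_prob[order]                # posterior probability of a false call
    expected_fdr = np.cumsum(local_err) / np.arange(1, post_prob.size + 1)
    ok = np.nonzero(expected_fdr <= target_fdr)[0]
    selected = np.zeros(post_prob.size, dtype=bool)
    if ok.size == 0:
        return 1.0, selected                          # nothing can be called at this level
    k = ok[-1]                                        # longest admissible list
    selected[order[:k + 1]] = True
    return post_prob[order[k]], selected


cutoff, selected = fdr_cutoff([0.99, 0.98, 0.90, 0.50, 0.10])
print(cutoff, selected)
```

In this toy input the first three peptides are selected: their average posterior error is about 0.043, while adding the fourth would raise it to about 0.158.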
Egg Project: Results
Model comparison
We compare the proposed model, labeled $M^*$, with two alternative models $M_1$ and $M_2$:
$M_1$: $\mu_p \sim N(\nu_0 + \delta_p \gamma_p, \tau)$ with no latent autoregressive component, while $\delta_p$ is a Markov process as in the proposed model;
$M_2$: $\mu_p \sim N(\nu_0 + \delta_p \gamma_p, \tau)$ with no latent autoregressive component and $\delta_p \sim \text{Bernoulli}(\theta)$.
Diagnostic tools:
Global fit indicators: AIC, BIC and DIC;
Observed and predicted ACF.
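For the ACF diagnostic, the same sample autocorrelation can be computed on the observed profile and on profiles generated from each fitted model, and the two compared lag by lag. The sketch below uses the standard sample ACF estimator; the function name `acf` and the default of 15 lags are assumptions, and the slides do not specify how the predicted ACF was obtained.

```python
import numpy as np


def acf(x, max_lag=15):
    """Sample autocorrelation of x at lags 0..max_lag (standard biased estimator)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[:x.size - k], x[k:]) / denom
                     for k in range(max_lag + 1)])


# Toy check on a strictly alternating profile: lag 1 is strongly negative.
profile = np.tile([1.0, -1.0], 50)
print(acf(profile, max_lag=2))
```

A model without the latent AR component, such as $M_1$ or $M_2$, would be expected to reproduce the low-lag autocorrelation of overlapping peptides less well, which is what this diagnostic is meant to reveal.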
Model comparison (cont)

Model   DIC        ∆AIC       ∆BIC
M∗      -5086.71   –          –
M1      -3758.52   1347.709   1279.562
M2      -2788.70   2332.650   2236.434
[Figure: densities of the predicted ACF at lag 1 and lag 2, six panels: Proposed model (lag 1, lag 2); M1 (lag 1, lag 2); M2 (lag 1, lag 2). x-axis: Lag 1 or Lag 2; y-axis: Density.]
Simulation study
Models $M^*$, $M_1$ and $M_2$ are compared in terms of average peak detection performance over 100 data sets simulated from the proposed model $M^*$ and from $M_2$.

True model   Fitted model   FDR     FNR     TDR
M∗           M∗             0.068   0.014   0.986
M∗           M1             0.165   0.033   0.890
M∗           M2             0.104   0.071   0.681
M2           M∗             0.088   0.009   0.962
M2           M1             0.090   0.010   0.960
M2           M2             0.048   0.005   0.977
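The slides do not define the three rates, so the following sketch states one plausible set of definitions explicitly as an assumption: FDR is the share of called peptides that are not true peaks, FNR is the share of uncalled peptides that are true peaks, and TDR is the share of true peaks that are called. The function name `detection_rates` is hypothetical; averaging its output over simulated data sets would give a table of the kind shown above.

```python
import numpy as np


def detection_rates(truth, called):
    """Peak detection rates under the assumed definitions described in the lead-in."""
    truth = np.asarray(truth, dtype=bool)
    called = np.asarray(called, dtype=bool)
    tp = int(np.sum(truth & called))       # true peaks that were called
    fp = int(np.sum(~truth & called))      # called peptides that are not peaks
    fn = int(np.sum(truth & ~called))      # true peaks that were missed
    n_called = tp + fp
    n_not_called = truth.size - n_called
    n_true = tp + fn
    return {
        "FDR": fp / n_called if n_called else 0.0,
        "FNR": fn / n_not_called if n_not_called else 0.0,
        "TDR": tp / n_true if n_true else 0.0,
    }


truth = [1, 1, 1, 0, 0, 0, 0, 0]
called = [1, 1, 0, 1, 0, 0, 0, 0]
print(detection_rates(truth, called))
```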
Conclusion and further developments
Peptide data with overlapping amino acids:
a first model-based solution to tackle the new problem;
a new peculiar feature: autoregressive structure.
Further research:
joint modelling of IgE and IgG4;
embedding 3D neighbourhood structure.