Characterization of Pharmacophore Multiplet Fingerprints as Molecular Descriptors
Robert D. Clark
Tripos, Inc.
[email protected]
©2004 Tripos, Inc.
Outline
• Background
  o history
  o mechanics
• Finding appropriate binning ranges
  o biased conformer generation
• Similarity measures
  o stochastic similarity
• Hypothesis generation
  o asymmetric similarity
• Conclusions
History of Pharmacophore Multiplets
• A.C. Good and I.D. Kuntz; J. Comput.-Aided Mol. Design 1995, 9, 373-379.
• X. Chen, A. Rusinko, and S.S. Young; J. Chem. Inf. Comput. Sci. 1998, 38, 1054-1062.
• J.S. Mason, I. Morize, P.R. Menard, D.L. Cheney, C. Hulme and R.F. Labaudiniere; J. Med. Chem. 1999, 42, 3251-3264.
• M.J. McGregor and S.M. Muskal; J. Chem. Inf. Comput. Sci. 1999, 39, 569-574.
• H. Matter and T. Pötter; J. Chem. Inf. Comput. Sci. 1999, 39, 1211-1225.
• J.S. Mason and B.R. Beno; J. Mol. Graphics Mod. 2000, 18, 438-451.
• E. Abrahamian, P.C. Fox, L. Nærum, I.T. Christensen, H. Thøgersen and R.D. Clark; J. Chem. Inf. Comput. Sci. 2003, 43, 458-468.
Novo Nordisk / Tripos Tuplets Collaboration
• 2-year collaboration to develop and extend existing SYBYL triplet (PDT) technology
• Incorporate pair, triplet and quartet (‘Tuplet) technology
• Augmented ‘Tuplets and support for privileged substructures
• Conformers generated on-the-fly or retrieved
• Bitmaps created, stored and manipulated in compressed format
  o four 1.8 × 10^9 bit bitmaps stored as ~80kb file
  o 0.01-0.5 seconds/molecule
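To put the compression figures in perspective, here is a rough back-of-the-envelope calculation. It assumes that "~80kb" means about 80,000 bytes, which the slide does not state explicitly; the variable names are illustrative.

```python
# Rough storage arithmetic for the figures quoted on the slide.
# Assumption: "~80kb" is read as roughly 80,000 bytes.
BITS_PER_BITMAP = 1.8e9
N_BITMAPS = 4
COMPRESSED_BYTES = 80_000  # assumed reading of "~80kb"

# Size of the four bitmaps if every bit were stored explicitly.
uncompressed_bytes = N_BITMAPS * BITS_PER_BITMAP / 8
ratio = uncompressed_bytes / COMPRESSED_BYTES

print(f"uncompressed: {uncompressed_bytes / 1e6:.0f} MB")
print(f"compression ratio: about {ratio:,.0f} to 1")
```

Under that assumption the bitmaps would occupy about 900 MB uncompressed, which suggests why sparse, compressed storage matters for quartet fingerprints.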
Type III antiarrhythmic: UK 66914
[Figure: structure of UK 66914 annotated with pharmacophoric features: donor atom, positive nitrogen, acceptor atoms, hydrophobic centers, donor/acceptor atoms]
Multiplet Fingerprints
… 000010001010000000100100001110100001110000111000000000011001...
Indexing Triplets
[Figure: triangle with vertices D, A and H and edge-length bins 2, 3 and 5; the vertex joining the longest and shortest edges is marked]
Bin: 5, 3, 2
Triplet: H-A-D
Indexing Tetrahedra
Problems:
• Need a unique mapping
• Must deal with chirality
• Literally dozens of possible permutations
• Mapping must be based on bins and features
[Figure: example tetrahedra with vertices A, B, C and D and edge-length bins 2, 3 and 4; panels: "Plane of symmetry implies no chirality" and "Chiral tetrahedra"]
Mapping Quartet Bits
Mapping for 7 bins and 3 features (D, A, H)
Bins:      000000  000001  ...  542333*  ...  666665  666666
Features:  DDDD  DDDA  DDDH  ...  HHHH
Bitmap Size = 7^6 × 3^4 = 9,529,569 bits
*542333 specifies the + enantiomer; 245333 specifies the - enantiomer
[Figure: the - and + enantiomers of a chiral quartet]
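The bitmap size follows from counting one distance bin per edge and one feature type per vertex. A minimal sketch of that arithmetic, assuming every bin/feature combination gets its own bit (the function names are illustrative, not part of SYBYL):

```python
from math import comb


def n_edges(order: int) -> int:
    """Number of pairwise distances among `order` pharmacophore points."""
    return comb(order, 2)


def bitmap_size(order: int, n_bins: int = 7, n_features: int = 3) -> int:
    """Bit count with one bin per edge and one feature type per vertex."""
    return n_bins ** n_edges(order) * n_features ** order


for name, order in [("triplet", 3), ("quartet", 4)]:
    print(name, bitmap_size(order))
```

A quartet has six edges and four vertices, hence 7^6 × 3^4; a triplet has three of each, hence 7^3 × 3^3.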
Distribution of Distances Between Features
[Figure: histograms of frequency vs. edge length (Å), 0-15 Å; panels: beta blockers, K+ channel openers, Type I antiarrhythmics]
Cumulative Distributions across Classes
[Figure: stacked frequency vs. edge length (Å), 0-15 Å; panels: "1 Conformer By Class" and "100 Conformer By Class"; classes: Estrogen Antagonists, Type III Antiarrhythmics, Benzamides, Phenothiazines, Beta Blockers, Type I Antiarrhythmics, K Channel Openers]
Effect of Biased Conformer Generation
[Figure: stacked frequency vs. edge length (Å), 0-15 Å; panels: "100 Systematic Search Conformers By Class" and "100 Confort Conformer By Class"; classes: Estrogen Antagonists, Type III Antiarrhythmics, Benzamides, Phenothiazines, Beta Blockers, Type I Antiarrhythmics, K Channel Openers]
Hypothesis Fingerprint Creation

Features                       DDD  DDD  DDA  DAA  DDH  DAH  DHH  HHH
Bins                           000  001  200  210  210  331  333  433
Binary Compound Fingerprints     0    1    0    0    0    1    1    0
                                 0    1    0    1    0    0    1    0
                                 1    0    0    1    0    1    1    1
                                 0    1    1    1    0    0    1    0
Vector Sum Fingerprint           1    3    1    3    0    2    4    1
Hypothesis Fingerprint Creation

Features                       DDD  DDD  DDA  DAA  DDH  DAH  DHH  HHH
Bins                           111  211  311  321  321  442  444  544
Binary Compound Fingerprints     0    1    0    0    0    1    1    0
                                 0    1    0    1    0    0    1    0
                                 1    0    0    1    0    1    1    1
                                 0    1    1    1    0    0    1    0
Vector Sum Fingerprint           1    3    1    3    0    2    4    1
Feature Weights                  3    3    3    3    4    4    5    6
Bin Weights                      3    4    5    6    6   10   12   13
Bit Score                        9   36   15   54   24   80  240   78
Weighting Bits for Hypothesis Generation
\[ S_b = f_b \times \sum_{i=1}^{n_f} fw_i \times \sum_{j=1}^{n_d} dw_j \]
$S_b$ is the score for the bit
$f_b$ is the frequency of the bit
$fw_i$ is the weight of the feature type
$dw_j$ is the weight of the distance bin
⇒ Construct a hypothesis from the highest scoring bits.
[Figure: triangle with features f1, f2, f3 and distances d1, d2, d3]
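The scoring can be reproduced from the example fingerprints on the Hypothesis Fingerprint Creation slide. The sketch below assumes per-feature weights of D = 1, A = 1 and H = 2 and a bin weight equal to the bin number; both are inferred from the weight rows on that slide rather than stated there, and `top_bits` is a hypothetical helper.

```python
# Assumed weights, inferred from the slide's "Feature Weights" row.
FEATURE_WEIGHT = {"D": 1, "A": 1, "H": 2}

features = ["DDD", "DDD", "DDA", "DAA", "DDH", "DAH", "DHH", "HHH"]
bins = ["111", "211", "311", "321", "321", "442", "444", "544"]
fingerprints = [
    [0, 1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 1, 0, 0, 1, 0],
    [1, 0, 0, 1, 0, 1, 1, 1],
    [0, 1, 1, 1, 0, 0, 1, 0],
]

# f_b: how many compounds set each bit (the vector sum fingerprint).
vector_sum = [sum(column) for column in zip(*fingerprints)]
# Sum of feature weights over the three vertices of each triplet.
feature_w = [sum(FEATURE_WEIGHT[c] for c in f) for f in features]
# Sum of bin weights over the three edges (assumed: weight = bin number).
bin_w = [sum(int(d) for d in b) for b in bins]
# S_b = f_b * sum(fw_i) * sum(dw_j)
scores = [f * fw * dw for f, fw, dw in zip(vector_sum, feature_w, bin_w)]


def top_bits(bit_scores, k):
    """Indices of the k highest-scoring bits, best first."""
    order = sorted(range(len(bit_scores)), key=lambda i: bit_scores[i], reverse=True)
    return order[:k]


print(scores)
print(top_bits(scores, 3))
```

Note that the slide lists a score of 24 for the fifth bit (DDH, 321), which is the product of its feature and bin weights; applying the formula as written, a bit with frequency 0 scores 0, which is what this sketch computes.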
Sanity Checker
\[ S = \frac{N_{tn}}{N_t} \]
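Similarity Measures
• Tanimoto coefficient
\[ t(A,B) = \frac{|\mathrm{pdt}(A) \cap \mathrm{pdt}(B)|}{|\mathrm{pdt}(A) \cup \mathrm{pdt}(B)|} \]
• Cosine coefficient
\[ C_c(a,b) = \frac{|\mathrm{pdt}(a) \cap \mathrm{pdt}(b)|}{\sqrt{|\mathrm{pdt}(a)| \times |\mathrm{pdt}(b)|}} \]
• Stochastic cosine coefficient
\[ s(A,B) = \frac{E\left[|\mathrm{pdt}(A) \cap \mathrm{pdt}^*(B)|\right]}{\sqrt{E\left[|\mathrm{pdt}(A) \cap \mathrm{pdt}^*(A)|\right] \times E\left[|\mathrm{pdt}(B) \cap \mathrm{pdt}^*(B)|\right]}} \]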
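A minimal sketch of the three coefficients, treating each fingerprint as a Python set of bit indices. It assumes that pdt* denotes a fingerprint built from an independent conformer sampling of the same molecule, and that the expectations are estimated by averaging over a few such runs; the function names and toy bit sets are illustrative only.

```python
from itertools import combinations, product
from math import sqrt
from statistics import mean


def tanimoto(a: set, b: set) -> float:
    """Shared bits divided by bits set in either fingerprint."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def cosine(a: set, b: set) -> float:
    """Shared bits divided by the geometric mean of the two bit counts."""
    denom = sqrt(len(a) * len(b))
    return len(a & b) / denom if denom else 0.0


def stochastic_cosine(runs_a: list, runs_b: list) -> float:
    """Cosine with expectations estimated over independent sampling runs.

    Each argument is a list of at least two fingerprints of one molecule,
    each built from a separate (assumed independent) conformer sample.
    """
    cross = mean(len(x & y) for x, y in product(runs_a, runs_b))
    self_a = mean(len(x & y) for x, y in combinations(runs_a, 2))
    self_b = mean(len(x & y) for x, y in combinations(runs_b, 2))
    denom = sqrt(self_a * self_b)
    return cross / denom if denom else 0.0


a, b = {1, 2, 3, 4}, {3, 4, 5, 6}
runs_a = [{1, 2, 3}, {1, 2, 4}]
runs_b = [{2, 3, 5}, {2, 4, 5}]
print(tanimoto(a, b), cosine(a, b), stochastic_cosine(runs_a, runs_b))
```

Normalizing by the self-overlap between runs, rather than by the raw bit counts, means that bits which appear only sporadically across conformer samples contribute less to the denominator.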
Effect of Conformer Count on Stochastic Cosine Similarity
[Figure: similarity (0-0.6) vs. conformer count (max), 0-1000; series: class and non-class similarity for estrogen antagonists, K channel openers and benzamides]
Effect of Conformer Count on Stochastic Cosine Discrimination
[Figure: discrimination ratio (0-14) vs. conformer count (max), 1-1000; series: Type I antiarrhythmics, Type III antiarrhythmics, phenothiazines, beta blockers, benzamides, K channel openers, estrogen antagonists]
Discrimination and Similarity Measure
[Figure: discrimination ratio vs. conformer count (max), 1-1000; panels: simple cosine (0-14) and Tanimoto (0-20); series: Type I antiarrhythmics, Type III antiarrhythmics, phenothiazines, beta blockers, benzamides, K channel openers, estrogen antagonists]
Discrimination and Conformer Bias
[Figure: discrimination ratio (0-14) vs. conformer count (max), 1-1000; panels: CONFORT and systematic search; series: Type I antiarrhythmics, Type III antiarrhythmics, phenothiazines, beta blockers, benzamides, K channel openers, estrogen antagonists]
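Symmetric Similarity Measures
• Symmetric stochastic cosine
\[ s(A,B) = \frac{E\left[|\mathrm{pdt}(A) \cap \mathrm{pdt}^*(B)|\right]}{\sqrt{E\left[|\mathrm{pdt}(A) \cap \mathrm{pdt}^*(A)|\right] \times E\left[|\mathrm{pdt}(B) \cap \mathrm{pdt}^*(B)|\right]}} \]
• Asymmetric stochastic cosine
\[ s^*(h,t) = \frac{E\left[|\mathrm{pdt}(h) \cap \mathrm{pdt}(t)|\right]}{E\left[|\mathrm{pdt}(h) \cap \mathrm{pdt}^*(h)|\right]} \]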
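The asymmetric form normalizes only by the self-overlap of the hypothesis h, so swapping hypothesis and target changes the result. A small sketch under the same reading as the symmetric measure (fingerprints as sets of bit indices, expectations estimated by averaging over assumed-independent sampling runs); the function name and toy data are illustrative.

```python
from itertools import combinations, product
from statistics import mean


def asymmetric_stochastic_cosine(runs_h: list, runs_t: list) -> float:
    """Expected hypothesis/target overlap, scaled by hypothesis self-overlap.

    runs_h needs at least two fingerprints so that a self-overlap
    between independent runs can be estimated.
    """
    cross = mean(len(h & t) for h, t in product(runs_h, runs_t))
    self_h = mean(len(x & y) for x, y in combinations(runs_h, 2))
    return cross / self_h if self_h else 0.0


# A small hypothesis fully contained in a larger target fingerprint.
hypothesis_runs = [{1, 2}, {1, 2}]
target_runs = [{1, 2, 3, 4, 5, 6}, {1, 2, 3, 4, 7, 8}]

print(asymmetric_stochastic_cosine(hypothesis_runs, target_runs))
print(asymmetric_stochastic_cosine(target_runs, hypothesis_runs))
```

In this toy case the target matches every hypothesis bit and scores 1.0, while the reverse comparison scores 0.5: extra bits in the target are not penalized, which is the behavior one would want when screening against a compact hypothesis.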
Effect of Hypothesis Size (Type III antiarrhythmics)
[Figure: average similarity (0-0.6) vs. bits in hypothesis (0-1000); panels: symmetric cosine and asymmetric stochastic cosine; series: CONFORT within class, 100 Conformers without class, systematic search within class, 1000 Conformers without class]
Conclusions
• Compression is cool
• Natural binning does make sense
  o 1.75 3 4 5 6 7 8 8.75 9.75 10.75 11.75 13 15 >15 Å
  o at least for triplets
• Systematic bias increases discrimination
  o rule-based conformational bias can be useful
  o caveat: it may limit lead-hopping
• More is not necessarily better
  o true in terms of conformation count
  o true in terms of multiplet hypothesis size
• A little asymmetry can be a good thing
• Compression is still cool
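As an illustration of how the binning values listed above might be applied, the sketch below treats them as upper bin boundaries in Å and looks up the bin for a given inter-feature distance. That reading is an assumption (the slide lists the values without saying how bin membership is decided), and `distance_bin` is a hypothetical helper.

```python
from bisect import bisect_left

# Boundary values from the Conclusions slide, in angstroms.
BIN_EDGES = [1.75, 3, 4, 5, 6, 7, 8, 8.75, 9.75, 10.75, 11.75, 13, 15]


def distance_bin(distance: float) -> int:
    """Index of the bin holding `distance`.

    Assumed convention: bin i covers (BIN_EDGES[i-1], BIN_EDGES[i]],
    bin 0 holds everything up to 1.75 and the last index holds > 15.
    """
    return bisect_left(BIN_EDGES, distance)


for d in (1.0, 3.5, 15.0, 15.1):
    print(d, distance_bin(d))
```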
Acknowledgements
Novo Nordisk A/S (Denmark)
  Lars Nærum*
  Henning Thøgersen*
Tripos, Inc.
  Edmond Abrahamian
  Peter Fox
  Trevor Heritage
www.tripos.com
May the multiplets be with you...
What a Protein “Sees”
(electrostatic field at 0.5 Å resolution, 80 and 30% contours)
What the Chemist Sees
[Figure: chemical structures of a tetrahydrophthalimide (American Cyanamid) and a trifluorotoluidide pyrazole ether (Monsanto)]
Pharmacophoric Features
[Figure: chemical structures annotated with hydrogen bond acceptors, hydrophobic centers and a hydrogen bond donor]
Conformational Sampling*
*diverse conformers obtained using CONFORT
Mapping Multiplets
Mapping for 7 bins and 3 features (D, A, H)*
Bins:      000  001  ...  532  ...  665  666
Features:  DDD  DDA  DDH  ...  HHH
(each bin/feature combination maps to 1 bit)
Bitmap Size = 7^3 × 3^3 = 9261 bits
* Features are handled in the order supplied by the application.
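One way to realize such a mapping is a mixed-radix index: the three bin digits are read as a base-7 number and the three feature letters as a base-3 number. The sketch below is only an assumption about the layout (bin code major, feature code minor, features ordered D, A, H); it does not canonicalize the triplet, and the actual SYBYL ordering may differ.

```python
FEATURES = "DAH"  # assumed feature order, as supplied by the application
N_BINS = 7


def triplet_bit_index(bins: tuple, feats: str) -> int:
    """Map three bin numbers (0-6) and three feature letters to one bit index."""
    bin_code = 0
    for b in bins:
        bin_code = bin_code * N_BINS + b  # base-7 digits
    feat_code = 0
    for f in feats:
        feat_code = feat_code * len(FEATURES) + FEATURES.index(f)  # base-3 digits
    return bin_code * len(FEATURES) ** len(feats) + feat_code


print(triplet_bit_index((0, 0, 0), "DDD"))
print(triplet_bit_index((5, 3, 2), "HAD"))
print(triplet_bit_index((6, 6, 6), "HHH"))
```

With this layout the first combination lands on bit 0 and the last on bit 9260, so every combination gets a distinct position within the 9261-bit map.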
Hypothesis Generation
Multiple methods implemented for hypothesis generation
  o From a collection of known actives
  o From a user defined UNITY® query
  o From a single molecule pharmacophore map
    a) Single or multiple generated conformers
  o From user specified residues in receptor cavity
Privileged Substructures: Augmented Triplets
[Figure: augmented triplet with vertices DS, HY, AA and HY]
@_AUGMENTED
# name       mnemonic  xref  weight  min_dist  max_dist
DONOR_SITE   DS        AA    3.0     2.5       3.5
. =NULL
.
Effect of Conformer Count on Cosine Coefficient Similarity
[Figure: two overlaid plots vs. conformer count (max), up to 1000. Similarity (0-0.6): class and non-class similarity for estrogen antagonists, K channel openers and benzamides. Discrimination ratio (0-14): Type I antiarrhythmics, Type III antiarrhythmics, phenothiazines, beta blockers, benzamides, K channel openers, estrogen antagonists]