
Characterization of Pharmacophore Multiplet Fingerprints as Molecular Descriptors

Robert D. Clark
Tripos, Inc.

©2004 Tripos, Inc.

Outline

• Background
o history
o mechanics
• Finding appropriate binning ranges
o biased conformer generation
• Similarity measures
o stochastic similarity
• Hypothesis generation
o asymmetric similarity
• Conclusions

History of Pharmacophore Multiplets

A.C. Good and I.D. Kuntz; J. Comput.-Aided Mol. Design 1995, 9, 373-379.
X. Chen, A. Rusinko, and S.S. Young; J. Chem. Inf. Comput. Sci. 1998, 38, 1054-1062.
J.S. Mason, I. Morize, P.R. Menard, D.L. Cheney, C. Hulme & R.F. Labaudiniere; J. Med. Chem. 1999, 42, 3251-3264.
M.J. McGregor & S.M. Muskal; J. Chem. Inf. Comput. Sci. 1999, 39, 569-574.
H. Matter and T. Pötter; J. Chem. Inf. Comput. Sci. 1999, 39, 1211-1225.
J.S. Mason and B.R. Beno; J. Mol. Graphics Mod. 2000, 18, 438-451.
E. Abrahamian, P.C. Fox, L. Nærum, I.T. Christensen, H. Thøgersen & R.D. Clark; J. Chem. Inf. Comput. Sci. 2003, 43, 458-468.

Novo Nordisk / Tripos Tuplets Collaboration

• 2 year collaboration to develop and extend existing SYBYL triplet (PDT) technology
• Incorporate pair, triplet and quartet (‘Tuplet) technology
• Augmented ‘Tuplets and support for privileged substructures
• Conformers generated on-the-fly or retrieved
• Bitmaps created, stored and manipulated in compressed format
o four 1.8 × 10^9 bit bitmaps stored as ~80kb file
o 0.01-0.5 seconds/molecule

Type III antiarrhythmic: UK 66914

[Figure: structure annotated with its pharmacophoric features: donor atom, positive nitrogen, acceptor atoms, hydrophobic centers, donor/acceptor atoms.]

Multiplet Fingerprints

… 000010001010000000100100001110100001110000111000000000011001...

Indexing Triplets

[Figure: triangle with vertices D, A and H and edges in bins 2, 3 and 5; indexing starts at the vertex joining the longest and shortest edges.]

Bin: 5, 3, 2
Triplet: H-A-D
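One way to read the indexing rule above is sketched below. It is only an illustration under stated assumptions: the function name `canonical_triplet` and the input layout are hypothetical, and the traversal (start at the vertex joining the longest and shortest edges, then follow the longest edge) is inferred from the H-A-D / 5, 3, 2 example rather than taken from the SYBYL implementation. Ties between edge bins are not handled.

```python
def canonical_triplet(edge_bins):
    """Order a triplet starting at the vertex joining the longest and shortest edges.

    edge_bins maps a pair of feature labels, e.g. ("H", "A"), to the distance
    bin of the edge between them. This sketch assumes three distinct labels
    and three distinct bins (tie-breaking is deliberately left out).
    """
    edges = {frozenset(pair): b for pair, b in edge_bins.items()}
    longest = max(edges, key=edges.get)
    shortest = min(edges, key=edges.get)
    (start,) = longest & shortest      # vertex shared by longest and shortest edge
    (second,) = longest - {start}      # other end of the longest edge
    (third,) = shortest - {start}      # other end of the shortest edge
    middle = edges[frozenset((second, third))]
    return (start, second, third), (edges[longest], middle, edges[shortest])


features, bins = canonical_triplet({("H", "A"): 5, ("A", "D"): 3, ("D", "H"): 2})
print("-".join(features), bins)
```

For the example triangle this yields the feature order H, A, D with bins 5, 3, 2, matching the slide.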

Indexing Tetrahedra

Problems:
• Need a unique mapping
• Must deal with chirality
• Literally dozens of possible permutations
• Mapping must be based on bins and features

[Figure: example tetrahedra with vertices A, B, C and D and binned edge lengths. A plane of symmetry implies no chirality; chiral tetrahedra are shown for comparison.]

Mapping Quartet Bits

Mapping for 7 bins and 3 features (D, A, H)

Bins:      000000  000001  ...  542333*  ...  666665  666666
Features:  DDDD  DDDA  DDDH  ...  HHHH

Bitmap Size = 7^6 × 3^4 = 9,529,569 bits

*542333 specifies the + enantiomer; 245333 specifies the - enantiomer
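The bitmap size follows from counting one distance bin per edge and one feature type per vertex. The short sketch below reproduces that arithmetic; `bitmap_size` is a hypothetical helper name, and it counts every bin/feature combination without removing any that a canonical ordering would never produce.

```python
def bitmap_size(n_bins, n_features, k):
    """Bits needed for k-point multiplets: one bin per edge, one feature per vertex."""
    n_edges = k * (k - 1) // 2          # 1 edge for pairs, 3 for triplets, 6 for quartets
    return n_bins ** n_edges * n_features ** k


print(bitmap_size(7, 3, 4))  # quartets: 7**6 * 3**4
print(bitmap_size(7, 3, 3))  # triplets: 7**3 * 3**3
```

With 7 bins and 3 features this gives 9,529,569 bits for quartets and 9,261 bits for triplets.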

Distribution of Distances Between Features

[Figure: histograms of frequency versus edge length (Å), 0 to 15 Å, for beta blockers, K+ channel openers and Type I antiarrhythmics.]

Cumulative Distributions across Classes

[Figure: two panels, "1 Conformer By Class" and "100 Conformer By Class", plotting frequency against edge length (Å), 0 to 15 Å, for Estrogen Antagonists, Type III Antiarrhythmics, Benzamides, Phenothiazines, Beta Blockers, Type I Antiarrhythmics and K Channel Openers.]

Effect of Biased Conformer Generation

[Figure: two panels, "100 Systematic Search Conformers By Class" and "100 Confort Conformer By Class", plotting frequency against edge length (Å), 0 to 15 Å, for Estrogen Antagonists, Type III Antiarrhythmics, Benzamides, Phenothiazines, Beta Blockers, Type I Antiarrhythmics and K Channel Openers.]


Hypothesis Fingerprint Creation

Triplet                  DDD  DDD  DDA  DAA  DDH  DAH  DHH  HHH
Bins                     000  001  200  210  210  331  333  433

Binary Compound          0    1    0    0    0    1    1    0
Fingerprints             0    1    0    1    0    0    1    0
                         1    0    0    1    0    1    1    1
                         0    1    1    1    0    0    1    0

Vector Sum Fingerprint   1    3    1    3    0    2    4    1
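The vector sum fingerprint is the column-wise sum of the binary compound fingerprints, so each entry counts how many compounds set that bit. A minimal sketch using the four fingerprints from the slide (the name `vector_sum` is hypothetical):

```python
# Binary compound fingerprints from the slide, one row per compound.
fingerprints = [
    [0, 1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 1, 0, 0, 1, 0],
    [1, 0, 0, 1, 0, 1, 1, 1],
    [0, 1, 1, 1, 0, 0, 1, 0],
]


def vector_sum(rows):
    """Count, for each bit position, how many fingerprints have that bit set."""
    return [sum(column) for column in zip(*rows)]


print(vector_sum(fingerprints))  # [1, 3, 1, 3, 0, 2, 4, 1]
```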

Hypothesis Fingerprint Creation

Triplet                  DDD  DDD  DDA  DAA  DDH  DAH  DHH  HHH
Bins                     111  211  311  321  321  442  444  544

Binary Compound          0    1    0    0    0    1    1    0
Fingerprints             0    1    0    1    0    0    1    0
                         1    0    0    1    0    1    1    1
                         0    1    1    1    0    0    1    0

Vector Sum Fingerprint   1    3    1    3    0    2    4    1
Feature Weights          3    3    3    3    4    4    5    6
Bin Weights              3    4    5    6    6    10   12   13
Bit Score                9    36   15   54   24   80   240  78

Weighting Bits for Hypothesis Generation

$$S_b = f_b \times \sum_{i=1}^{n_f} fw_i \times \sum_{j=1}^{n_d} dw_j$$

S_b is the score for the bit
f_b is the frequency of the bit
fw_i is the weight of the feature type
dw_j is the weight of the distance bin

⇒Construct an hypothesis from the highest scoring bits.

[Figure: triangle with features f1, f2 and f3 at the vertices and distances d1, d2 and d3 along the edges.]
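The bit score can be sketched as follows. The per-feature weights (D = 1, A = 1, H = 2) and the use of the bin number as the bin weight are assumptions inferred from the Feature Weights and Bin Weights rows of the worked example, not documented settings, and the cut-off of three bits is arbitrary. Note that the slide lists a Bit Score of 24 for the fifth column, whose frequency is 0; the product as written gives 0 there.

```python
# Assumed per-feature weights; they reproduce the slide's Feature Weights row.
FEATURE_WEIGHT = {"D": 1, "A": 1, "H": 2}


def bit_score(frequency, features, bins):
    """S_b = f_b * (sum of feature weights) * (sum of bin weights)."""
    feature_weight = sum(FEATURE_WEIGHT[f] for f in features)
    bin_weight = sum(int(digit) for digit in bins)  # assumed: weight = bin number
    return frequency * feature_weight * bin_weight


triplets = ["DDD", "DDD", "DDA", "DAA", "DDH", "DAH", "DHH", "HHH"]
bins = ["111", "211", "311", "321", "321", "442", "444", "544"]
frequencies = [1, 3, 1, 3, 0, 2, 4, 1]  # vector sum fingerprint

scores = [bit_score(f, t, b) for f, t, b in zip(frequencies, triplets, bins)]

# Build a hypothesis from the highest scoring bits (top 3 here, chosen arbitrarily).
ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
hypothesis_bits = ranked[:3]
print(scores, hypothesis_bits)
```

Under these assumptions the DHH/444 bit scores highest (240), followed by DAH/442 (80) and HHH/544 (78).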


Sanity Checker

$$S = \frac{N_{tn}}{N_t}$$

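Similarity Measures

• Tanimoto coefficient

$$t(A,B) = \frac{\left|pdt(A) \cap pdt(B)\right|}{\left|pdt(A) \cup pdt(B)\right|}$$

• Cosine coefficient

$$C_c(a,b) = \frac{\left|pdt(a) \cap pdt(b)\right|}{\sqrt{\left|pdt(a)\right| \times \left|pdt(b)\right|}}$$

• Stochastic cosine coefficient

$$s(A,B) = \frac{E\left[\left|pdt(A) \cap pdt^*(B)\right|\right]}{\sqrt{E\left[\left|pdt(A) \cap pdt^*(A)\right|\right] \times E\left[\left|pdt(B) \cap pdt^*(B)\right|\right]}}$$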
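The three measures can be sketched on fingerprints held as Python sets of bit indices. This is an illustration only: it assumes that pdt and pdt* denote fingerprints from independent conformer-generation runs on the same molecule, and it estimates each expectation as a plain average over the runs supplied. The function names are hypothetical, and each molecule needs at least two non-empty runs.

```python
from itertools import combinations, product
from math import sqrt


def tanimoto(a, b):
    """Shared bits divided by bits set in either fingerprint."""
    return len(a & b) / len(a | b)


def cosine(a, b):
    """Shared bits divided by the geometric mean of the two bit counts."""
    return len(a & b) / sqrt(len(a) * len(b))


def mean_overlap(pairs):
    """Average number of shared bits over a collection of fingerprint pairs."""
    pairs = list(pairs)
    return sum(len(x & y) for x, y in pairs) / len(pairs)


def stochastic_cosine(runs_a, runs_b):
    """Estimate s(A, B) from independent fingerprint runs for each molecule."""
    cross = mean_overlap(product(runs_a, runs_b))        # E|pdt(A) & pdt*(B)|
    self_a = mean_overlap(combinations(runs_a, 2))       # E|pdt(A) & pdt*(A)|
    self_b = mean_overlap(combinations(runs_b, 2))       # E|pdt(B) & pdt*(B)|
    return cross / sqrt(self_a * self_b)


runs_a = [{1, 2, 3}, {2, 3, 4}]
runs_b = [{2, 3, 5}, {3, 4, 5}]
print(tanimoto(runs_a[0], runs_a[1]), cosine(runs_a[0], runs_a[1]))
print(stochastic_cosine(runs_a, runs_b))
```

The self-overlap terms in the denominator measure how reproducible each molecule's fingerprint is from run to run, which is what distinguishes the stochastic form from the simple cosine.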

Effect of Conformer Count on Stochastic Cosine Similarity

[Figure: similarity (0 to 0.6) versus maximum conformer count (0 to 1000), showing class and non-class similarity for Estrogen_Antagonist, K_openers and benzamides.]

Effect of Conformer Count on Stochastic Cosine Discrimination

[Figure: discrimination ratio (0 to 14) versus maximum conformer count (1 to 1000) for Type I antiarrhythmics, Type III antiarrhythmics, phenothiazines, beta blockers, benzamides, K channel openers and estrogen antagonists.]

Discrimination and Similarity Measure

[Figure: two panels, "simple cosine" and "Tanimoto", plotting discrimination ratio against maximum conformer count (1 to 1000) for Type I antiarrhythmics, Type III antiarrhythmics, phenothiazines, beta blockers, benzamides, K channel openers and estrogen antagonists.]

Discrimination and Conformer Bias

[Figure: two panels, "CONFORT" and "systematic search", plotting discrimination ratio (0 to 14) against maximum conformer count (1 to 1000) for Type I antiarrhythmics, Type III antiarrhythmics, phenothiazines, beta blockers, benzamides, K channel openers and estrogen antagonists.]

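Symmetric Similarity Measures

• Symmetric stochastic cosine

$$s(A,B) = \frac{E\left[\left|pdt(A) \cap pdt^*(B)\right|\right]}{\sqrt{E\left[\left|pdt(A) \cap pdt^*(A)\right|\right] \times E\left[\left|pdt(B) \cap pdt^*(B)\right|\right]}}$$

• Asymmetric stochastic cosine

$$s^*(h,t) = \frac{E\left[\left|pdt(h) \cap pdt(t)\right|\right]}{E\left[\left|pdt(h) \cap pdt^*(h)\right|\right]}$$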
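The asymmetric form normalizes only by the self-overlap of the hypothesis h, so swapping hypothesis and target generally changes the value. A minimal sketch, assuming fingerprints are sets of bit indices, that pdt*(h) is an independent second run for the hypothesis, and that expectations are estimated by averaging over the runs supplied (function names are hypothetical):

```python
from itertools import combinations, product


def mean_overlap(pairs):
    """Average number of shared bits over a collection of fingerprint pairs."""
    pairs = list(pairs)
    return sum(len(x & y) for x, y in pairs) / len(pairs)


def asymmetric_stochastic_cosine(runs_h, runs_t):
    """Estimate s*(h, t): overlap with the target, scaled by the hypothesis self-overlap."""
    cross = mean_overlap(product(runs_h, runs_t))      # E|pdt(h) & pdt(t)|
    self_h = mean_overlap(combinations(runs_h, 2))     # E|pdt(h) & pdt*(h)|
    return cross / self_h


hypothesis_runs = [{1, 2, 3, 4}, {1, 2, 3, 5}]
target_runs = [{1, 2, 6, 7, 8, 9}, {1, 2, 3, 6, 7, 8}]

s_ht = asymmetric_stochastic_cosine(hypothesis_runs, target_runs)
s_th = asymmetric_stochastic_cosine(target_runs, hypothesis_runs)
print(s_ht, s_th)
```

In this toy case the small hypothesis scores about 0.83 against the larger target, while the reverse comparison scores 0.5, because the target's extra bits only penalize the score when the target is the one being normalized.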

Effect of Hypothesis Size (Type III antiarrhythmics)

[Figure: two panels, "symmetric cosine" and "asymmetric stochastic cosine", plotting average similarity (0 to 0.6) against bits in hypothesis (0 to 1000). Legend entries: CONFORT within class, 100 Conformers without class, systematic search within class, 1000 Conformers without class.]

Conclusions

• Compression is cool
• Natural binning does make sense
o 1.75 3 4 5 6 7 8 8.75 9.75 10.75 11.75 13 15 >15Å
o at least for triplets
• Systematic bias increases discrimination
o rule-based conformational bias can be useful
o caveat: it may limit lead-hopping
• More is not necessarily better
o true in terms of conformation count
o true in terms of multiplet hypothesis size
• A little asymmetry can be a good thing
• Compression is still cool
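The bin boundaries listed under natural binning can be applied with a sorted-list lookup. The sketch below assumes the listed values are upper bin boundaries in Å, that a distance equal to a boundary falls in the lower bin, and that everything beyond 15 Å shares one final bin; `distance_bin` is a hypothetical helper name.

```python
from bisect import bisect_left

# Upper bin boundaries in Å, as listed in the conclusions; the last bin is > 15 Å.
BIN_EDGES = [1.75, 3, 4, 5, 6, 7, 8, 8.75, 9.75, 10.75, 11.75, 13, 15]


def distance_bin(distance):
    """Return the 0-based bin index for an inter-feature distance in Å."""
    return bisect_left(BIN_EDGES, distance)


for d in (1.0, 2.5, 8.5, 15.0, 16.0):
    print(d, distance_bin(d))
```

With 13 boundaries this gives 14 bins, indexed 0 to 13.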

Acknowledgements

Novo Nordisk A/S (Denmark)
Lars Nærum*
Henning Thøgersen*

Tripos, Inc.
Edmond Abrahamian
Peter Fox
Trevor Heritage

www.tripos.com

May the multiplets be with you...

What a Protein “Sees”

(electrostatic field at 0.5 Å resolution, 80 and 30% contours)

What the Chemist Sees

[Figure: 2D structures of a tetrahydrophthalimide (American Cyanamid) and a trifluorotoluidide pyrazole ether (Monsanto).]

Pharmacophoric Features

[Figure: 2D structures annotated with hydrogen bond acceptors, hydrophobic centers and a hydrogen bond donor.]

Conformational Sampling*

*diverse conformers obtained using CONFORT

Mapping Multiplets

Mapping for 7 bins and 3 features (D, A, H)*

Bins:      000  001  ...  532  ...  665  666
Features:  DDD  DDA  DDH  ...  HHH

Each combination of bins and features maps to 1 bit.

Bitmap Size = 7^3 × 3^3 = 9261 bits

* Features are handled in the order supplied by the application.
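A mapping of this kind can be sketched as a mixed-radix index. The layout below is an assumption made for illustration: features are ordered D, A, H, the feature code is treated as the more significant part of the index, and the names `triplet_bit_index`, `FEATURES` and `N_BINS` are hypothetical. The actual bit layout used by the application is not given on the slide.

```python
FEATURES = "DAH"  # assumed feature order; the slide says the application supplies it
N_BINS = 7


def triplet_bit_index(features, bins):
    """Map a feature triplet (e.g. "DDA") and three bin numbers (0-6) to a bit index."""
    feature_index = 0
    for f in features:
        feature_index = feature_index * len(FEATURES) + FEATURES.index(f)
    bin_index = 0
    for b in bins:
        bin_index = bin_index * N_BINS + b
    return feature_index * N_BINS ** 3 + bin_index


print(triplet_bit_index("DDD", (0, 0, 0)))
print(triplet_bit_index("DDA", (0, 0, 1)))
print(triplet_bit_index("HHH", (6, 6, 6)))
```

Under this layout DDD/000 is bit 0 and HHH/666 is bit 9260, so the indices exactly fill the 9261-bit map.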

Hypothesis Generation

Multiple methods implemented for hypothesis generation
o From a collection of known actives
o From a user defined UNITY® query
o From a single molecule pharmacophore map
a) Single or multiple generated conformers
o From user specified residues in receptor cavity

Privileged Substructures: Augmented Triplets

[Figure: example with vertices labelled DS, HY, AA and HY.]

@_AUGMENTED
# name       . =NULL  mnemonic  xref  weight  min_dist  max_dist
DONOR_SITE   .        DS        AA    3.0     2.5       3.5

Effect of Conformer Count on Cosine Coefficient Similarity

[Figure: similarity (0 to 0.6) and discrimination ratio (0 to 14) versus maximum conformer count, with class and non-class similarity for Estrogen_Antagonist, K_openers and benzamides, and discrimination ratios for the seven compound classes.]