ISSN 0012-5008, Doklady Chemistry, 2011, Vol. 436, Part 1, pp. 5–10. © Pleiades Publishing, Ltd., 2011. Original Russian Text © E.A. Smolenskii, A.N. Ryzhov, M.I. Milina, A.L. Lapidus, 2011, published in Doklady Akademii Nauk, 2011, Vol. 436, No. 1, pp. 58–63.
CHEMISTRY
Modeling the Octane Numbers of Alkenes
E. A. Smolenskii, A. N. Ryzhov, M. I. Milina, and Corresponding Member of the RAS A. L. Lapidus
Received June 29, 2010
DOI: 10.1134/S0012500811010034
The octane number (ON) of organic compounds, a measure of their antiknock ability, depends in a rather complex way on the molecular structure. Until recently, none of the models constructed in the framework of the structure–property problem [1] for ONs of hydrocarbons has described this dependence with a squared correlation coefficient exceeding 0.92, which follows from numerous studies [2–6] (exceptions are specific models constructed by us for alkanes and cyclanes [7, 8]).

As a rule, in modeling structure–property relationships, topological indices (TIs), which are invariants of molecular graphs, are used [1]. The choice and construction of TIs are mainly heuristic and largely depend on the intuition of the researcher. Therefore, successful use of some TI for describing the property under consideration in most cases cannot be justified and is of random character. About 1500 TIs are currently known, which are used for different classes of chemical compounds. Inasmuch as each physicochemical property has been experimentally studied for no more than a few tens or, in rare instances, hundreds of compounds, the number of TIs is certainly redundant for solving similar problems since many of them are related by linear equations [9].

Successful attempts to replace such a heuristic approach by a rigorous mathematical procedure have been described [9, 10]. Instead of searching for an appropriate TI to solve a certain structure–property relationship problem, it was suggested to use the solution of the set of equations

$$I_i = \sum_{j=1}^{N} a_j [x_j]_i \qquad (1)$$

(here, $I_i$ is the TI value for the ith compound, $N$ is the number of subgraphs used in the calculation, $j$ is the subgraph number, $[x_j]_i$ is the number of occurrences of the jth subgraph in the molecular graph of the ith compound, and $a_j$ are varied coefficients). It is evident that $N$ must be smaller than the number of compounds in the sample, $N_{max}$. A property is assumed to be a linear function of $I$; in the simplest case, the procedure involves selection of subgraphs with maximal $a_j$ values in a trivial index ($I$ at $N = N_{max}$). It was shown in [9] that the use of the TIs thus obtained from Eq. (1) (optimal topological indices, OTIs), in contrast to the other TIs, gives models that predict with maximum reliability property values for high-molecular-weight compounds on the basis of training sets of compounds with considerably lower molecular weights.

As shown in [8], this is possible only if, beginning with some $n \geq n_0$, a property of n-alkanes $C_nH_{2n+2}$ is a linear function of $n$. If this is not the case, the following procedure, referred to as the method of inverse functions, is suggested. We find an auxiliary nonlinear function $f^{-1}$ reflecting the dependence of the OTI on the property (in fact, the inverse of the function $f$, which, on the contrary, reflects the dependence of the property on the OTI). The argument of this function is exclusively a physicochemical property, and the value of the function is some quantity meeting the above condition and referred to as the topological equivalent (TE) of the property [7]. It is precisely this quantity that is modeled by the method reported in [10], and then the transition is made from this quantity to the values of the property by applying the function $f$. This method was tested in modeling the ONs of octanes and cyclanes, as well as their cetane numbers [11]. In so doing, the initial sample was partitioned into several subsets containing compounds with definite specific features of molecular structure.
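The arithmetic of Eq. (1) can be illustrated with a minimal sketch. The coefficients and subgraph counts below are hypothetical placeholders chosen only to show the sum; they are not taken from the paper, and the function name `topological_index` is an assumption of this example.

```python
def topological_index(coeffs, counts):
    """Return I_i = sum_j a_j * [x_j]_i for a single compound (Eq. 1)."""
    if len(coeffs) != len(counts):
        raise ValueError("one coefficient is needed per subgraph")
    return sum(a * x for a, x in zip(coeffs, counts))


# Hypothetical coefficients a_j for N = 3 subgraphs (not taken from Table 1).
coeffs = [-70.0, -0.02, 0.05]

# Hypothetical occurrence counts [x_j]_i of each subgraph in four compounds,
# so that N = 3 stays below the sample size, as Eq. (1) requires.
counts_by_compound = {
    "compound A": [1, 0, 2],
    "compound B": [1, 3, 1],
    "compound C": [1, 5, 0],
    "compound D": [1, 2, 2],
}

indices = {
    name: topological_index(coeffs, counts)
    for name, counts in counts_by_compound.items()
}

for name, value in indices.items():
    print(f"{name}: I = {value:.2f}")
```

In the method itself the coefficients are fitted rather than chosen, and subgraphs are discarded until N is small enough; the sketch covers only the evaluation step.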
Zelinskii Institute of Organic Chemistry, Russian Academy of Sciences, Leninskii pr. 47, Moscow, 119991 Russia
In this paper, we report on the results of modeling the ONs of alkenes with the aim of predicting the values of this property with high precision and maximum reliability. First, we recalculated the available ONs of alkenes ($P_{exp}$) [2, 3, 12, 13] into TEs ($n_{P,exp}$). Inasmuch as the $P(n)$ dependence for n-alkanes $C_nH_{2n+2}$ is close to a hyperbolic one [7] (Fig. 1), the TE values were calculated by the following equations:

$$n_{P,exp} = f^{-1}(P_{exp}) = -BP_{exp} - \sqrt{(BP_{exp} + D)^2 - 4A(CP_{exp}^2 + EP_{exp} + F)},$$

$$A = M\left(\frac{\cos^2\alpha}{a^2} - \frac{\sin^2\alpha}{b^2}\right), \quad B = -2M\left(\frac{1}{a^2} + \frac{1}{b^2}\right)\sin\alpha\cos\alpha, \quad C = M\left(\frac{\sin^2\alpha}{a^2} - \frac{\cos^2\alpha}{b^2}\right),$$

$$D = -(BP_0 + 2An_0), \quad E = -(Bn_0 + 2CP_0), \quad F = An_0^2 + Bn_0P_0 + CP_0^2 - M,$$

where $\alpha$, $a$, $b$, $P_0$, and $n_0$ are empirical constants determining a hyperbola (taken from [7]), and $M$ is an arbitrary nonzero factor.

Fig. 1. Octane numbers of alkanes $C_nH_{2n+2}$ vs. n (ON axis from 0 to 120, n from 1 to 6). Symbols correspond to experimental data. The solid line shows the approximating curve.

Then, the sample of 72 alkenes was partitioned into seven subclasses S1–S7 according to the number of substituents at sp2-hybridized atoms and the presence of branching in molecules, provided that the subclass size allowed for this (Table 1). The $n_{P,exp}$ quantities themselves were modeled using OTIs by Eq. (1) only for the S1 set. In the other cases for the $S_i$ sets, the difference between $n_{P,exp}$ and $n_{P,calc}$ was studied. The value of $n_{P,calc}$ was calculated from the $S_{i-1}$ set, using the model for alkenes. The resulting OTIs were denoted as $I_i$. As for the criterion for discarding terms of the sum on the right-hand side of Eq. (1), we showed in [7, 8] that the limitations based exclusively on the accuracy of the experimental determination of the property in [10] should be in this case replaced by new requirements. It is evident that it hardly makes sense to use the accuracy of calculations expressed through $\max(|n_{P,calc} - n_{P,exp}|)$ and considerably exceeding the analogous defining index $\Delta_{def}$ for the sample of n-alkanes used for determining $n_{P,exp}$ in [7].

Let us introduce the quantity

$$R_{i,opt}^2 = 1 - \frac{\Delta_{def}^2}{D(n_{P,exp})_{S_i}}$$

for the set S1 and

$$R_{i,opt}^2 = 1 - \frac{\Delta_{def}^2}{D\left(n_{P,exp} - n_{P,calc}^{S_{i-1}}\right)_{S_i}}$$

for the S2–S7 sets, where $D$ is dispersion. Now, if the maximum of the squared correlation coefficient for the TI set of type (1) at fixed $N = k$ is lower than $R_{i,opt}^2$, these models must be studied at $N = k + 1$. If this maximum is as large as or higher than $R_{i,opt}^2$, the TI from this set corresponding to the maximum $R^2$ value is the OTI for the modeled quantity. Here, we use $R^2$ defined for the modeled quantity $X$ as

$$R^2 = 1 - \frac{D(X_{exp} - X_{calc})}{D(X_{exp})}.$$

The TEs of octane numbers constructed on the basis of the OTIs for the S1–S7 sets were combined into a common model. To do this, we defined additional indices L2–L7 corresponding to the jth compound as

$$L_{ij} = \begin{cases} 0, & \text{if } [g_i]_j = 0 \\ 1, & \text{if } [g_i]_j \neq 0 \end{cases} \quad \text{for } i \in [2; 4],$$

$$L_{ij} = \begin{cases} 0, & \text{if } [g_i]_j = 0 \text{ or } L_{4j} = 0 \\ 1, & \text{if } [g_i]_j \neq 0 \text{ and } L_{4j} \neq 0 \end{cases} \quad \text{for } i \in [5; 6],$$

and $L_{7j} = L_{3j}L_{4j}$, where $L_{ij}$ are the values of the $L_i$ index for the jth compound, and $g_i$ are the subgraphs corresponding to isobutene (i = 2, 5), 2-methyl-2-butene (i = 3), isobutane (i = 4), and 2-butene (i = 6).
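The rule for choosing the number of subgraphs N (compare the best R2 at fixed N with R2 opt and move to N + 1 if it falls short) can be sketched as follows. This is only an illustration under stated assumptions: the dispersion D is taken to be the population variance, and the TE values, the value of Δdef, and the per-N best R2 values are all hypothetical numbers, not data from the paper.

```python
from statistics import pvariance


def r2_opt(delta_def, modeled_values):
    """R2_opt = 1 - delta_def^2 / D(modeled_values).

    D is assumed here to be the population variance of the modeled quantity
    over the set S_i (the text only calls it "dispersion").
    """
    return 1.0 - delta_def ** 2 / pvariance(modeled_values)


def choose_n(best_r2_by_n, threshold):
    """Return the smallest N whose best R2 reaches the threshold, else None."""
    for n in sorted(best_r2_by_n):
        if best_r2_by_n[n] >= threshold:
            return n
    return None


# Hypothetical TE values for one set and a hypothetical defining index.
te_values = [-70.20, -70.15, -70.10, -70.05]
delta_def = 0.005

# Hypothetical best R2 found among all candidate indices at each N.
best_r2_by_n = {1: 0.95, 2: 0.991, 3: 0.9995}

threshold = r2_opt(delta_def, te_values)
chosen_n = choose_n(best_r2_by_n, threshold)
print(f"R2_opt = {threshold:.4f}, chosen N = {chosen_n}")
```

For the S2–S7 sets the same function would be applied to the differences between the experimental TE and the TE calculated from the preceding set, rather than to the TE values themselves.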
Table 1. Modeling parameters of ONs of alkenes (R1, R2, and R3 are alkyl radicals, and R4 is H or an alkyl radical)

Signs of belonging to the Si set

Ii    Formula                  Presence of branching at sp3-hybridized atoms
I1    CHR1=CHR2, CHR1=CH2      Absent
I2    CH2=CR1R2                Absent
I3    CR1R2=CR3R4              Absent
I4    CH2=CHR1                 Present
I5    CH2=CR1R2                Present
I6    CHR1=CHR2                Present
I7    CR1R2=CR3R4              Present

Molecule corresponding to the subgraph    Coefficient of Eq. (1)
Ethylene                                  –70.16521
Propene                                   –0.01583
1-Butene                                  –0.01609
2-Pentene                                 0.04606
1-Hexene                                  0.01693
2-Hexene                                  –0.04022
3-Hexene                                  –0.07395
3-Heptene                                 0.05085
2-Octene                                  –0.04598
2-Octene                                  –0.04843
Propane                                   0.02801
Heptane                                   0.03223
Isobutene                                 –0.29518
2-Methyl-1-hexene                         –0.02368
Methane                                   0.05669
Pentane                                   –0.06998
2-Methyl-2-butene                         0.09860
2-Methyl-2-pentene                        –0.04977
2-Methyl-2-hexene                         –0.06754
3-Methyl-3-hexene                         0.07750
Propane                                   –0.06745
1-Butene                                  –0.04496
1-Hexene                                  0.10651
2-Ethyl-1-butene                          0.07695
2-Methyl-1-butene                         –0.01848
4-Methyl-1-pentene                        –0.02755
3,4-Dimethyl-1-pentene                    0.01497
4,4-Dimethyl-1-pentene                    –0.02872
Propane                                   –0.02516
Ethylene                                  0.02064
1-Heptene                                 0.05796
2,3-Dimethyl-1-butene                     0.02145
2-Ethyl-3-methyl-1-butene                 0.03816
Isobutane                                 –0.02125
Pentane                                   0.03999
2-Methylpentane                           0.01684
1-Butene                                  –0.02289
5-Methyl-2-hexene                         0.00661
4,4-Dimethyl-2-pentene                    –0.01828
1-Butene                                  0.02875
2-Pentene                                 –0.06895
2-Hexene                                  0.04190
3-Hexene                                  0.07936
3-Heptene                                 –0.07124
3-Methyl-2-pentene                        0.10601
Ethane                                    0.03231
Butane                                    –0.18531
Isobutane                                 0.12364

Hyperbola parameters
a       1.68444
b       1.83242
n0      4.62115
P0      107.94588
α       –0.77465
Table 2. Modeling results for the ONs of alkenes

Training set

Alkene                          Pexp     Pcalc    |Pexp – Pcalc|
Ethylene                        97.3     97.7     0.4
Propene                         101.8    101.8    0.0
1-Butene                        98.8     98.8     0.0
Isobutene                       106.3    106.3    0.0
2-Pentene                       87.8     87.7     0.1
2-Hexene                        92.7     92.7     0.0
3-Hexene                        94.0     94.1     0.1
4-Methyl-2-pentene              98.9     98.9     0.0
2-Ethyl-1-butene                99.3     99.1     0.2
2,3-Dimethyl-1-butene           101.3    101.3    0.0
3,3-Dimethyl-1-butene           105.4    105.2    0.2
2-Heptene                       73.4     73.5     0.1
3-Heptene                       90.0     89.9     0.1
2-Methyl-1-hexene               90.7     90.6     0.1
4-Methyl-1-hexene               86.4     86.4     0.0
2-Methyl-2-hexene               90.4     90.4     0.0
3-Methyl-2-hexene               92.0     92.0     0.0
5-Methyl-2-hexene               94.3     94.3     0.0
2-Methyl-3-hexene               97.9     97.9     0.0
3-Methyl-3-hexene               96.2     96.2     0.0
3-Ethyl-1-pentene               95.6     95.8     0.2
3-Ethyl-2-pentene               93.7     93.7     0.0
2,3-Dimethyl-1-pentene          99.3     99.3     0.0
2,4-Dimethyl-1-pentene          99.2     99.3     0.1
3,3-Dimethyl-1-pentene          103.5    103.5    0.0
3,4-Dimethyl-1-pentene          98.9     98.9     0.0
2,3-Dimethyl-2-pentene          97.5     97.5     0.0
2,4-Dimethyl-2-pentene          100.0    100.0    0.0
4,4-Dimethyl-2-pentene          105.3    105.3    0.0
2-Ethyl-3-methyl-1-butene       97.0     97.0     0.0
1-Octene                        28.7     28.6     0.1
2-Octene                        56.3     56.3     0.0
3-Octene                        72.5     72.5     0.0
4-Octene                        73.3     73.3     0.0
2-Methyl-1-heptene              70.2     70.3     0.1
6-Methyl-1-heptene              63.8     63.8     0.0
2-Methyl-2-heptene              79.8     79.8     0.0
2-Methyl-3-heptene              94.6     94.6     0.0
6-Methyl-3-heptene              91.3     91.3     0.0
2,3-Dimethyl-1-hexene           96.3     96.3     0.0
2,3-Dimethyl-2-hexene           93.1     93.1     0.0
2,2-Dimethyl-3-hexene           106.0    106.1    0.1
3-Ethyl-2-methyl-1-pentene      99.5     99.4     0.1
3-Ethyl-2-methyl-2-pentene      95.6     95.6     0.0
2,3,4-Trimethyl-2-pentene       96.6     96.6     0.0
2,4,4-Trimethyl-2-pentene       103.5    103.5    0.0
3,4,4-Trimethyl-2-pentene       103.0    103.0    0.0
4,4-Diethyl-1-heptene           79.8     79.8     0.0

R2 = 0.99997, s = 0.08, |Δmax| = 0.35

Test set

Alkene                          Pexp     Pcalc    |Pexp – Pcalc|
2-Butene                        101.6    101.8    0.2
1-Pentene                       87.9     88.7     0.8
3-Methyl-1-butene               97.5     98.7     1.2
2-Methyl-1-butene               98.3     97.7     0.6
2-Methyl-2-butene               97.3     97.7     0.4
1-Hexene                        76.4     76.1     0.3
2-Methyl-1-pentene              94.2     93.7     0.5
3-Methyl-1-pentene              96.0     96.7     0.7
4-Methyl-1-pentene              95.7     95.3     0.4
2-Methyl-2-pentene              97.8     98.1     0.3
3-Methyl-2-pentene              97.2     96.9     0.3
2,3-Dimethyl-2-butene           97.4     97.0     0.4
1-Heptene                       54.5     56.3     1.8
3-Methyl-1-hexene               82.2     82.5     0.3
5-Methyl-1-hexene               75.5     74.4     1.1
4-Methyl-2-hexene               97.6     97.5     0.1
4,4-Dimethyl-1-pentene          104.4    105.0    0.6
3,4-Dimethyl-2-pentene          96.0     97.8     1.8
2,3,3-Trimethyl-1-butene        105.3    105.3    0.0
6-Methyl-2-heptene              71.3     70.5     0.8
2,5-Dimethyl-2-hexene           95.2     96.8     1.6
2,5-Dimethyl-3-hexene           101.9    102.3    0.4
2,3,3-Trimethyl-1-pentene       106.0    106.1    0.1
2,4,4-Trimethyl-1-pentene       106.0    106.2    0.2

R2 = 0.99572, s = 0.80, |Δmax| = 1.82
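The test-set statistics of Table 2 can be recomputed from the tabulated values. The sketch below assumes one particular reading of the definitions: R2 is taken as one minus the ratio of the summed squared residuals to the summed squared deviations of Pexp from its mean, and s as the root-mean-square residual. The text does not define s, so that part in particular is an assumption of this example.

```python
import math

# (P_exp, P_calc) pairs for the 24 test-set alkenes, copied from Table 2.
test_set = [
    (101.6, 101.8), (87.9, 88.7), (97.5, 98.7), (98.3, 97.7),
    (97.3, 97.7), (76.4, 76.1), (94.2, 93.7), (96.0, 96.7),
    (95.7, 95.3), (97.8, 98.1), (97.2, 96.9), (97.4, 97.0),
    (54.5, 56.3), (82.2, 82.5), (75.5, 74.4), (97.6, 97.5),
    (104.4, 105.0), (96.0, 97.8), (105.3, 105.3), (71.3, 70.5),
    (95.2, 96.8), (101.9, 102.3), (106.0, 106.1), (106.0, 106.2),
]


def r_squared(pairs):
    """R2 = 1 - sum(residual^2) / sum((P_exp - mean P_exp)^2) (assumed reading)."""
    exp_values = [p_exp for p_exp, _ in pairs]
    mean_exp = sum(exp_values) / len(exp_values)
    ss_res = sum((p_exp - p_calc) ** 2 for p_exp, p_calc in pairs)
    ss_tot = sum((p_exp - mean_exp) ** 2 for p_exp in exp_values)
    return 1.0 - ss_res / ss_tot


def rms_deviation(pairs):
    """Root-mean-square residual, used here as a stand-in for s."""
    ss_res = sum((p_exp - p_calc) ** 2 for p_exp, p_calc in pairs)
    return math.sqrt(ss_res / len(pairs))


def max_abs_deviation(pairs):
    """Largest |P_exp - P_calc| in the set."""
    return max(abs(p_exp - p_calc) for p_exp, p_calc in pairs)


r2 = r_squared(test_set)
s = rms_deviation(test_set)
delta_max = max_abs_deviation(test_set)
print(f"R2 = {r2:.5f}, s = {s:.2f}, max |deviation| = {delta_max:.1f}")
```

Because the table entries are rounded to one decimal place, the maximum deviation obtained this way is 1.8 rather than the reported 1.82.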
Fig. 2. Modeling of topological equivalents (TEs) of alkene ONs. Axes: TEcalc vs. TEexp, both from –70.25 to –70.00.
Then, the common model for all sets can be represented by the formula

$$n_{P,calc,j} = I_{1j} + \sum_{i=2}^{7} L_{ij} I_{ij},$$

where $I_{ij}$ is the value of the $I_i$ index for the jth compound. The results of modeling are shown in Fig. 2.
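The indicator definitions for L2–L7 and the common-model sum can be sketched together. The subgraph counts and index values below are hypothetical, and the function names are assumptions of this example; only the logic of the indicators and of the sum follows the text.

```python
def indicators(g):
    """Compute L_2..L_7 for one compound.

    g maps i (2..6) to the occurrence count [g_i]_j of subgraph g_i
    in the compound's molecular graph.
    """
    flags = {i: int(g[i] != 0) for i in (2, 3, 4)}
    for i in (5, 6):
        # L_5 and L_6 also require the isobutane subgraph (L_4) to be present.
        flags[i] = int(g[i] != 0 and flags[4] != 0)
    flags[7] = flags[3] * flags[4]
    return flags


def combined_te(index_values, flags):
    """n_P,calc = I_1 + sum over i = 2..7 of L_i * I_i for one compound."""
    return index_values[1] + sum(flags[i] * index_values[i] for i in range(2, 8))


# Hypothetical counts: g_2 and g_5 are both isobutene, so they share a count.
subgraph_counts = {2: 1, 3: 0, 4: 1, 5: 1, 6: 0}

# Hypothetical index values I_1..I_7 for the same compound.
index_values = {1: -70.10, 2: 0.02, 3: 0.50, 4: -0.01, 5: 0.03, 6: 0.40, 7: 0.20}

flags = indicators(subgraph_counts)
te_calc = combined_te(index_values, flags)
print(flags, f"TE = {te_calc:.2f}")
```

The indicators act as switches, so each correction term contributes only for compounds that belong to the corresponding structural subclass.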
After that, using the equation

$$P_{calc} = f(n_{P,calc}) = \frac{-B\,\dfrac{n_{P,calc} - D}{2A} - E - \sqrt{\left(B\,\dfrac{n_{P,calc} - D}{2A} + E\right)^2 - C\left(\dfrac{n_{P,calc}^2 - D^2}{A} + 4F\right)}}{2C} \qquad (2)$$

we calculated the ON values for alkenes. The direct substitution of the calculated $n_{P,calc}$ values for alkenes into Eq. (2) gives R2 = 0.99926, s = 0.37, and |Δmax| = 1.27 (model no. 1). These statistical characteristics are on the whole good, except for |Δmax|, which exceeds unity. Therefore, the parameters of index expansions (1) were additionally optimized. This led to model no. 2 with R2 = 0.99952, s = 0.30, and |Δmax| = 0.89. Attempts to improve this model by optimizing the hyperbola parameters gave only insignificant changes in the statistical characteristics, whereas leaving some of the expansion parameters unoptimized considerably worsened them. The statistical characteristics of expansions are given above. The modeling results are shown in Fig. 3.
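The conversion between an octane number and its topological equivalent (the function f of Eq. (2) and its inverse) can be sketched numerically. The hyperbola constants are the five values listed with Table 1; treating α as an angle in radians and setting M = 1 are assumptions of this sketch, as are the function names.

```python
import math


def hyperbola_coefficients(a, b, n0, p0, alpha, m=1.0):
    """Coefficients A..F of the conic A n^2 + B n P + C P^2 + D n + E P + F = 0."""
    s, c = math.sin(alpha), math.cos(alpha)
    A = m * (c * c / a ** 2 - s * s / b ** 2)
    B = -2.0 * m * (1.0 / a ** 2 + 1.0 / b ** 2) * s * c
    C = m * (s * s / a ** 2 - c * c / b ** 2)
    D = -(B * p0 + 2.0 * A * n0)
    E = -(B * n0 + 2.0 * C * p0)
    F = A * n0 ** 2 + B * n0 * p0 + C * p0 ** 2 - m
    return A, B, C, D, E, F


def te_from_on(p, coef):
    """Inverse function: octane number P -> topological equivalent n_P."""
    A, B, C, D, E, F = coef
    disc = (B * p + D) ** 2 - 4.0 * A * (C * p * p + E * p + F)
    return -B * p - math.sqrt(disc)


def on_from_te(te, coef):
    """Eq. (2): topological equivalent n_P -> octane number P."""
    A, B, C, D, E, F = coef
    n = (te - D) / (2.0 * A)
    disc = (B * n + E) ** 2 - C * ((te * te - D * D) / A + 4.0 * F)
    return (-B * n - E - math.sqrt(disc)) / (2.0 * C)


# Constants a, b, n0, P0, alpha as listed with Table 1; M = 1 is assumed.
coef = hyperbola_coefficients(1.68444, 1.83242, 4.62115, 107.94588, -0.77465)

for p_exp in (97.3, 54.5):  # experimental ONs of ethylene and 1-heptene
    te = te_from_on(p_exp, coef)
    print(f"P = {p_exp}: TE = {te:.4f}, back-converted P = {on_from_te(te, coef):.4f}")
```

With these constants the TE values fall in the range plotted in Fig. 2 (about –70.25 to –70.00), which is consistent with the reading of the equations used here.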
Finally, we constructed a model using the minimal necessary number of initial data (the training set of 48 alkenes) and predicted the ONs of the remaining 24 compounds of the test set. The sets were optimized as follows. In the test set, the compound was selected for which the deviation of the calculated value from the experimental one is maximal in magnitude. Then, we considered models obtained by exchanging this compound with any of the appropriate compounds in the training set. The model that has the lowest maximum (in magnitude) deviation of the calculated value from the experimental one was studied further if this quantity decreased as compared with the previous model; if this was not the case, the previous model was accepted as the final one. Such a model, which can be referred to as an extrapolation model, was actually found. The results are presented in Tables 1 and 2 and Fig. 4.

Thus, on the basis of the method of optimal topological indices, we suggested a new model for calculating the octane numbers of alkenes, which has optimal statistical characteristics for prediction. The high efficiency of the suggested method of constructing calculation models for solving structure–property relationship problems permits its use for prediction and construction of models of other physicochemical properties not only for hydrocarbons but also for a wide range of organic compounds.
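The procedure for optimizing the training and test sets can be sketched as a greedy exchange loop. In this illustration a least-squares straight line stands in for the OTI model, the data points are hypothetical, and every training compound is treated as an "appropriate" exchange candidate; all three are assumptions of the sketch rather than details from the paper.

```python
def fit_line(points):
    """Least-squares line y = slope * x + intercept (toy stand-in for the OTI model)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def worst_test_point(train, test):
    """Fit on the training set; return the test point with the largest |deviation|."""
    slope, intercept = fit_line(train)
    deviations = [(abs(y - (slope * x + intercept)), (x, y)) for x, y in test]
    dev, point = max(deviations)
    return point, dev


def optimize_split(train, test):
    """Exchange the worst test point with a training point while that helps."""
    train, test = list(train), list(test)
    worst, best_dev = worst_test_point(train, test)
    while True:
        best_candidate = None
        for swap in train:
            new_train = [p for p in train if p != swap] + [worst]
            new_test = [p for p in test if p != worst] + [swap]
            new_worst, dev = worst_test_point(new_train, new_test)
            if best_candidate is None or dev < best_candidate[0]:
                best_candidate = (dev, new_worst, new_train, new_test)
        # Stop when no exchange lowers the maximum test-set deviation.
        if best_candidate is None or best_candidate[0] >= best_dev:
            return train, test, best_dev
        best_dev, worst, train, test = best_candidate


# Hypothetical (x, y) data; the first four points start as the training set.
points = [(1, 1.0), (2, 2.1), (3, 2.9), (4, 4.2), (5, 5.0), (6, 7.5)]
initial_train, initial_test = points[:4], points[4:]

_, initial_dev = worst_test_point(initial_train, initial_test)
train, test, final_dev = optimize_split(initial_train, initial_test)
print(f"max test deviation: {initial_dev:.3f} -> {final_dev:.3f}")
```

The loop terminates because the maximum test-set deviation must strictly decrease at every accepted exchange, and the set sizes stay fixed throughout.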
Fig. 3. Results of modeling alkene ONs: (○) model no. 1 and ( ) model no. 2. Axes: ONexp vs. ONcalc, with ticks from 20 to 120.

Fig. 4. Extrapolation model of alkene ONs: (○) training set and ( ) test set. Axes: ONexp vs. ONcalc, with ticks from 20 to 120.

REFERENCES
1. Stankevich, M.I., Stankevich, I.V., and Zefirov, N.S., Usp. Khim., 1988, vol. 67, no. 3, pp. 337–366.
2. Sidorova, A.V., Baskin, I.I., Petelin, D.E., et al., Dokl. Chem., 1996, vol. 350, nos. 4–6, pp. 254–258 [Dokl. Akad. Nauk, 1996, vol. 350, no. 5, pp. 642–645].
3. Smolenskii, E.A., Vlasova, G.V., and Lapidus, A.L., Dokl. Phys. Chem., 2004, vol. 397, part 1, pp. 145–149 [Dokl. Akad. Nauk, 2004, vol. 397, no. 2, pp. 219–223].
4. Randic, M., J. Chem. Inf. Comput. Sci., 1997, vol. 37, no. 4, pp. 672–685.
5. Balaban, A.T., Kier, L.B., and Josh, N., MATCH, 1992, no. 28, pp. 13–27.
6. Ghosh, P., Hickey, K.J., and Jaffe, S.B., Ind. Eng. Chem. Res., 2006, vol. 45, no. 1, pp. 337–345.
7. Smolenskii, E.A., Ryzhov, A.N., Bavykin, V.M., et al., Izv. Akad. Nauk, Ser. Khim., 2007, no. 9, pp. 1619–1632.
8. Smolenskii, E.A., Ryzhov, A.N., Bavykin, V.M., et al., Dokl. Chem., 2007, vol. 417, part 1, pp. 267–272 [Dokl. Akad. Nauk, 2007, vol. 417, no. 3, pp. 347–352].
9. Smolenskii, E.A., Izv. Akad. Nauk, Ser. Khim., 2006, no. 9, pp. 1447–1453.
10. Smolenskii, E.A., Vlasova, G.V., Platunov, D.Yu., and Ryzhov, A.N., Izv. Akad. Nauk, Ser. Khim., 2006, no. 9, p. 1454.
11. Lapidus, A.L., Smolenskii, E.A., Bavykin, V.M., et al., Neftekhimiya, 2008, vol. 48, no. 4, pp. 277–286.
12. Obolentsev, R.D., Fizicheskie konstanty uglevodorodov zhidkikh topliv i masel (Physical Constants of Hydrocarbons of Liquid Fuels and Oils), Moscow: Gostoptekhizdat, 1953.
13. Fiziko-khimicheskie svoistva individual’nykh uglevodorodov (Physicochemical Properties of Individual Hydrocarbons), Tatevskii, V.M., Ed., Moscow: Gostoptekhizdat, 1960.