Bioengineering for polycyclic aromatic hydrocarbon


Bioengineering for polycyclic aromatic hydrocarbon degradation by Mycobacterium litorale: Statistical and artificial... Article in Chemometrics and Intelligent Laboratory Systems · December 2016 DOI: 10.1016/j.chemolab.2016.10.018



Chemometrics and Intelligent Laboratory Systems 159 (2016) 155–163


Bioengineering for polycyclic aromatic hydrocarbon degradation by Mycobacterium litorale: Statistical and artificial neural network (ANN) approach


Dushyant R. Dudhagara, Rahul K. Rajpara, Jwalant K. Bhatt, Haren B. Gosai, Bharti P. Dave



Department of Life Sciences, Maharaja Krishnakumarsinhji Bhavnagar University, Bhavnagar 364001, Gujarat, India

ARTICLE INFO

Keywords: Polycyclic aromatic hydrocarbons; Biodegradation; Bioremediation; Central composite design; Artificial neural network

ABSTRACT

This study models the enhancement of fluoranthene biodegradation by Mycobacterium litorale using two approaches: response surface methodology (RSM), a conventional process-centric approach, and artificial neural network (ANN), a comparatively newer data-centric approach. A stepwise optimization protocol was followed, beginning with the screening of medium components. The variables of interest, CaCl2, KH2PO4 and NH4NO3, were screened using the Plackett-Burman design. The second step involved a central composite design (CCD) matrix, which resulted in 51.21% degradation on the 3rd day with an R2 value of 0.9882. The non-linear multivariate ANN model predicted 51.28% degradation with an R2 value of 0.9987. The root mean square error and mean absolute percentage error were 0.3234 and 0.5715, respectively. Overall, the approach resulted in 51.28% degradation on the 3rd day, compared with 26.37% on the 7th day under unoptimized conditions. The values obtained with the ANN were more precise, reliable and reproducible than those of the conventional RSM model, indicating the superiority of the ANN model over the RSM model. The study thus widens the current understanding of how to build and forecast precisely simulated biological processes for green technology.

1. Introduction

Polycyclic aromatic hydrocarbons (PAHs) constitute a noteworthy group of recalcitrant and mutagenic pollutants produced by an array of anthropogenic activities. Although they also have natural origins, these priority pollutants are often associated with extensive urbanization and industrialization. It is the latter source that is considered a major cause of environmental pollution and hence a focus of many bioremediation regimes [1,2]. Point sources of PAH contamination are of significant environmental concern. They arise from petroleum spills, coal liquefaction and gasification, urban runoff, etc. [3]. The persistence of PAHs depends on an array of physiological and chemical characteristics of the surroundings, such as structure and type of soil, pH, temperature, oxygen availability, nutrients and water adequacy. Factors such as molecular mass, hydrophobicity, angularity, and sequestration also contribute to the degradability and persistence of such pollutants in the environment. Owing to the extreme hydrophobicity, mutagenicity, toxicity, and elevated persistence of PAHs, soils and sediments are considered the ultimate repositories for these pollutants. PAHs affect the biota through various toxic actions, which are considered to interfere with the normal functions of cellular membranes and the enzyme systems embedded within them. They are considered immunosuppressants and can cause severe and chronic health hazards such as kidney and liver damage and DNA adduct formation, and can have genotoxic, teratogenic and carcinogenic effects in humans [4,5].

The present study is a novel attempt at the remediation of a historically contaminated site by expanding the current knowledge and application of model development, prediction and successful implementation using an artificial neural network (ANN) rather than traditional response surface modeling. The previous work of the authors developed an understanding of the source identification of PAHs and their risk assessment using various multivariate models such as principal component analysis (PCA), hierarchical cluster analysis (HCA), total equivalent factor (TEF), etc. [5]. The present study extends that work [5], in which models such as PCA, HCA and TEF were designed and used for the quantification and risk assessment of ∑PAHs. The other objective is the enhancement of fluoranthene degradation (current study), for which both response surface methodology (RSM) and ANN models were built and implemented. A comparative case study thus elaborates the current understanding behind empirical model building, forecasting and successful application of conventional and non-linear models for the simulation of a microbial process such as biodegradation; to the best of the authors' knowledge, this is one of the first such reports.

Many remediation strategies demand careful execution and modeling, selection of parameters, and accurate predictions that corroborate the actual data. The identification of such forecast models is practically very useful for the scientific community and researchers at large, since they can be used to predict outcomes accurately [6,7]. Classical process-centric models can provide good insights into degradation experiments but are too general to be applied without data calibration. They often rely on approximated processes and parameters, which may overlook some of the vital factors affecting the process. A process-based model requires a lot of input data and parameters that are often unknown. Data-centric models, on the other hand, are computationally rapid and require fewer parameters than process-centric models, and have emerged as a popular method of choice among the scientific and engineering communities [6]. RSM and ANN have long been applied successfully for the accurate design and modeling of many biophysical, chemical, enzymatic and other environmental processes. RSM is a statistical method for building models and analyzing interactions of independent variables. Its main advantage is that it reduces the number of experimental runs or trials compared with traditional optimization approaches [8,9]. It is often presumed that ANN requires many more experimental trials than RSM for modeling; in fact, ANN can also work well with relatively scarce data. Thus, the experimental data of RSM should be sufficient to build an effective ANN model using the same variables [10].

ANN is, however, still not used in the field of hydrocarbon biodegradation. ANN is able to model complicated non-linear input-output relationships accurately. ANN, like its physics-based numerical modeling counterpart, requires training and calibration. Each application is an evaluation of a simple algebraic expression with known coefficients and is practical to execute. The technique offers the flexibility to accommodate any additional constraint that may arise during its implementation [6,11]. Moreover, ANN simulates human intuition in making decisions and drawing conclusions when supplied with information. No specific experimental design is necessary for developing an ANN model [9]. ANN recognizes patterns through a learning regime based on a series of inputs and outputs, without assuming or recognizing their nature and interrelations. This property makes ANN superior to other modeling approaches such as polynomial and regression-based empirical modeling [12]. The robustness of the ANN model and its ability to learn from data have made ANN an attractive tool for non-linear multivariate modeling. Moreover, it can approximate almost all kinds of non-linear functions, including quadratic ones, whereas RSM is useful only for quadratic approximations [10,13]. RSM and ANN are mainly compared from a modeling perspective, but the present work also sheds light on sensitivity analysis and on the usefulness of ANN in process optimization through validation of the approach. ANN has a few disadvantages compared with RSM; for example, it is less useful for analyzing the interaction between two components. Moreover, RSM offers smooth and easy optimization based on traditional gradient-based protocols. This makes the process continuous and differentiable, which is not possible with ANN as it is an exclusively data-based approach [10].

However, the specific use of RSM and ANN to develop predictive models for the degradation of fluoranthene by Mycobacterium litorale and, importantly, the comparison of both modeling methods for the degradation of high molecular weight (HMW) PAHs have yet to be investigated. A meticulous search of the open literature indicates that, to the best of the authors' knowledge, the present work is one of the initial attempts. Thus, the main objectives of this study on enhanced fluoranthene degradation using M. litorale were to: (i) screen various medium components using the Plackett-Burman design and optimize them using the CCD approach; (ii) illustrate general frameworks for ANN model design, such as selection of network type, determination of appropriate input variables and number of hidden neurons, and specification of network training parameters; and (iii) compare RSM and ANN models for fluoranthene biodegradation by establishing a neural network that provides a reliable alternative to the methods currently in use.

⁎ Corresponding author. E-mail address: [email protected] (B.P. Dave).

http://dx.doi.org/10.1016/j.chemolab.2016.10.018 Received 13 September 2016; Received in revised form 27 October 2016; Accepted 28 October 2016. 0169-7439/© 2016 Elsevier B.V. All rights reserved.

2. Materials and methods

2.1. Chemicals, reagents and cultivation media

Standard PAHs (acenaphthalene, phenanthrene, anthracene, pyrene, and fluoranthene) were purchased from Sigma Aldrich, USA. The chemicals, reagents, and solvents used were purchased from Fisher Scientific, India. The organism (M. litorale) was routinely grown on Bushnell-Haas (BH) medium (Hi-Media, India) with fluoranthene as the sole carbon and energy source. Amber glassware was used throughout the study to prevent photo-oxidation of volatile PAHs.

2.2. Organism and its PAHs degradation ability

Mycobacterium litorale (GenBank accession number KP725065), used in the current study, was isolated from the sediments of contaminated sites at the Alang-Sosiya ship breaking and recycling yard (ASSBRY), Gujarat, one of the most polluted coastal areas of the globe, with tremendous anthropogenic input [5]. The organism was examined for its ability to degrade an array of LMW and HMW PAHs such as acenaphthalene, phenanthrene, anthracene, pyrene and fluoranthene, and was found to degrade multiple PAHs (data not shown). The organism was routinely grown on BH medium amended with 100 ppm fluoranthene as the sole carbon and energy source [14–16]. The organism was preserved at 4 °C until further use.

2.3. Growth linked fluoranthene degradation blueprint

M. litorale was examined for its growth-linked degradation ability in BH medium supplemented with 100 ppm fluoranthene. The flasks were kept on a rotary shaker (New Brunswick, USA) at 30 °C. Growth was measured at 600 nm using a UV–vis spectrophotometer (Shimadzu-1800, Japan), and residual fluoranthene was extracted every 24 h for seven days [16,17].

2.4. Estimation of residual fluoranthene

For extraction of residual fluoranthene, 15 mL dichloromethane (DCM) was added to the flasks, mixed vigorously and sonicated for 5 min with 1 min of rest. The solvent phase was collected and the same procedure was repeated thrice. Residual aqueous phase was removed with sodium sulphate and the collected solvent phase was evaporated under a gentle stream of N2 gas. Solid crystals of residual fluoranthene were dissolved in DCM and the extract was subjected to GC–MS (Shimadzu QP2010+, Japan) analysis [18] using a modified method [19,20].

2.5. Screening of BH medium components using Plackett-Burman design

As a preliminary optimization experiment, the BH medium components MgSO4, CaCl2, KH2PO4, K2HPO4, NH4NO3 and FeCl3 were evaluated using the Plackett-Burman (PB) design. Each factor was examined at two levels: −1 (lower level) and +1 (higher level). These six variables were screened in twelve experimental trials (Table 1).


Table 1 PB design matrix of six variables in terms of actual and coded values (coded level in parentheses).

Run order  MgSO4 (g/L)  CaCl2 (g/L)  KH2PO4 (g/L)  K2HPO4 (g/L)  NH4NO3 (g/L)  FeCl3 (g/L)  Predicted D (%)  Actual D (%)
1          0.3 (1)      0.03 (1)     0.5 (−1)      1.5 (1)       0.5 (−1)      0.025 (−1)   33.22            30.86
2          0.1 (−1)     0.01 (−1)    0.5 (−1)      1.5 (1)       1.5 (1)       0.075 (1)    19.78            18.87
3          0.1 (−1)     0.01 (−1)    0.5 (−1)      0.5 (−1)      0.5 (−1)      0.025 (−1)   25.98            27.07
4          0.3 (1)      0.01 (−1)    0.5 (−1)      0.5 (−1)      1.5 (1)       0.075 (1)    19.08            17.86
5          0.3 (1)      0.01 (−1)    1.5 (1)       0.5 (−1)      0.5 (−1)      0.025 (−1)   14.48            14.26
6          0.3 (1)      0.03 (1)     1.5 (1)       0.5 (−1)      1.5 (1)       0.075 (1)    16.60            17.23
7          0.1 (−1)     0.03 (1)     1.5 (1)       0.5 (−1)      1.5 (1)       0.025 (−1)   20.33            19.02
8          0.1 (−1)     0.01 (−1)    1.5 (1)       1.5 (1)       1.5 (1)       0.025 (−1)   12.02            12.47
9          0.1 (−1)     0.03 (1)     0.5 (−1)      0.5 (−1)      0.5 (−1)      0.075 (1)    31.26            32.31
10         0.3 (1)      0.01 (−1)    1.5 (1)       1.5 (1)       0.5 (−1)      0.075 (1)    11.45            12.27
11         0.1 (−1)     0.03 (1)     1.5 (1)       1.5 (1)       0.5 (−1)      0.075 (1)    20.47            20.11
12         0.3 (1)      0.03 (1)     0.5 (−1)      1.5 (1)       1.5 (1)       0.025 (−1)   30.05            32.41
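As a quick sanity check on Table 1, the coded design can be tested for the two properties expected of a two-level screening design: each factor appears equally often at its high and low level, and every pair of factor columns is orthogonal. The sketch below transcribes the coded levels from Table 1; the names `FACTORS`, `DESIGN`, `is_balanced` and `is_orthogonal` are illustrative and are not part of the original study.

```python
from itertools import combinations

# Coded levels (+1 / -1) transcribed from Table 1, one row per run.
# Column order follows the table headers.
FACTORS = ["MgSO4", "CaCl2", "KH2PO4", "K2HPO4", "NH4NO3", "FeCl3"]
DESIGN = [
    [ 1,  1, -1,  1, -1, -1],
    [-1, -1, -1,  1,  1,  1],
    [-1, -1, -1, -1, -1, -1],
    [ 1, -1, -1, -1,  1,  1],
    [ 1, -1,  1, -1, -1, -1],
    [ 1,  1,  1, -1,  1,  1],
    [-1,  1,  1, -1,  1, -1],
    [-1, -1,  1,  1,  1, -1],
    [-1,  1, -1, -1, -1,  1],
    [ 1, -1,  1,  1, -1,  1],
    [-1,  1,  1,  1, -1,  1],
    [ 1,  1, -1,  1,  1, -1],
]


def column(j):
    """Return the coded levels of factor j across all runs."""
    return [row[j] for row in DESIGN]


def is_balanced(j):
    """A balanced column has as many +1 as -1 entries, so it sums to zero."""
    return sum(column(j)) == 0


def is_orthogonal(j, k):
    """Two columns are orthogonal when their dot product is zero."""
    return sum(a * b for a, b in zip(column(j), column(k))) == 0


balanced = all(is_balanced(j) for j in range(len(FACTORS)))
orthogonal = all(is_orthogonal(j, k)
                 for j, k in combinations(range(len(FACTORS)), 2))
print("balanced:", balanced, "orthogonal:", orthogonal)
```

Orthogonality is what allows each main effect to be estimated independently of the others from only twelve runs.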

2.6. Experimental design and modeling

2.6.1. Fluoranthene degradation with screened medium components by RSM using CCD

RSM is a sequential, investigative approach for examining the relationship between more than one variable and a statistically significant response. RSM was used to evaluate the effect of the screened medium components CaCl2, KH2PO4 and NH4NO3 on fluoranthene degradation, as resolved by the PB design (data not shown). CaCl2, KH2PO4 and NH4NO3 were studied at five different levels in a set of twenty experimental runs (Table 4). Regression analysis was performed on the data obtained from the experiments. The variables were coded according to the following equation:

$$x_i = \frac{X_i - X_0}{\delta X_i} \tag{1}$$

where x_i is the coded value of the i-th variable (i = 1, 2, 3, 4), X_i is its experimental value, X_0 is the midpoint of X_i and δX_i is the step change in X_i. Fluoranthene degradation was analyzed using a second-order polynomial equation, and the data were fitted to the equation by multiple regression. The equation is as given below:

$$Y = \beta_0 + \beta_1 A + \beta_2 B + \beta_3 C + \beta_{12} AB + \beta_{13} AC + \beta_{23} BC + \beta_{11} A^2 + \beta_{22} B^2 + \beta_{33} C^2 \tag{2}$$

where Y is the response (dependent) variable, β0 is the intercept (constant), β1, β2 and β3 are linear coefficients, β12, β13 and β23 are interaction coefficients, β11, β22 and β33 are squared coefficients, and A, B and C are the levels of the independent variables. Statistical significance was expressed by Fisher's F-test values, and the proportion of variance explained by the model by the R2 value.

2.6.2. Artificial neural network modelling

A feed forward ANN was selected for the optimization of medium components to maximize PAHs degradation efficiency using MATLAB R2015a (Mathworks Inc., USA). The selection of an appropriate learning method is of utmost importance in developing an ANN model, since the training algorithm steadily improves the network by reducing the error function [21,22]. Back propagation (BP) is the most widely used algorithm and minimizes the error at each iteration [23]. Thus, a multilayered feed forward ANN was trained by the BP algorithm in the present study (Fig. 1).

Fig. 1. Schematic diagram of multilayer feed forward neural network.

2.6.3. Optimization of network design

The multilayer perceptron (MLP) has been designed to function optimally in non-linear phenomena. A feed forward MLP consists of an input layer and an output layer, along with one or more hidden layers. Each layer has a certain number of artificial neurons. A basic two-step methodology can be followed to design this topology: (i) determine the numbers of input, hidden and output layers; and (ii) specify the training algorithm, learning rate, number of iterations and retrains, and training stopping criteria. The numbers of input and output nodes are usually fixed by the input and output variables [7,24], whereas the number of hidden nodes is always determined by trial and error during modeling [6,7,25]. There is no formula for the selection, but there are some rules of thumb. The number of hidden nodes (Nh) can lie between I and 2I + 1 and should not be less than the maximum of I/3 and O, where I and O represent the numbers of input and output nodes, respectively. Fewer nodes in the hidden layer are preferable, since such networks generalize better and are less prone to overfitting. In the present study, I is 3 and O is 1. Therefore, the number of hidden nodes was calculated to be ≥ 3/3 = 1 neuron and ≤ 3*4 = 12 neurons, i.e. 1 ≤ Nh ≤ 12. Within this range, Nh was determined by trial and error, and the neurons were optimized using MATLAB R2015a [6,7]. The neurons in the MLP neural network were activated by a non-linear logistic sigmoid function, which is the function most commonly used with MLP neural networks. Moreover, extreme input values do not have extreme effects on the output, because the sigmoid function damps their influence. This capability makes ANN attractive when the raw data contain outliers [7,26]. Consequently, in this study, a linear transfer function was used in the input layer and the logistic sigmoid function in the hidden and output layers. ANN was employed to reproduce the same dataset used for RSM, except for the simulated data (Table 4). These replicates do not contribute to improving the prediction ability of the ANN [27]. The datasets were divided into three subsets, i.e. training (10), validation (2), and testing (2), comprising 70%, 15%, and 15% of the data, respectively.

The multilayered feed forward architecture of ANN was used to build a predictive model with three medium components, CaCl2 (g/L), KH2PO4 (g/L), and NH4NO3 (g/L), as inputs and degradation (%) as the output. The input layer passes scaled input data to the hidden layer through weights. These are small, random, non-zero values ranging from −1 to +1, trained by BP. The weighted sum of the inputs is transformed in each hidden neuron by the logistic sigmoid activation function and then undergoes another weighted-sum transformation to give the outputs. In this study, prediction of fluoranthene degradation was achieved with a multilayered feed forward ANN trained by BP [7]. The number of neurons in the hidden layer is therefore the most important criterion that may influence the fidelity of the network. The hidden layer sums the weighted inputs along with biases, as represented by the following equation:

$$\mathrm{Sum} = \sum_{i=1}^{n} x_i w_i + \theta \tag{3}$$

where w_i (i = 1, …, n) represents the weights of the connections between neurons of the input and hidden layers, θ is the bias and x_i is the input parameter. The weighted sum is then passed through an activation function, which maps it to a non-linear domain. The logistic function applied in the present study is:

$$f(\mathrm{sum}) = \frac{1}{1 + \exp(-\mathrm{sum})} \tag{4}$$

The output thus produced by the hidden layer becomes the input to the output layer, whose neurons in turn produce the network output. The calculated and actual experimental outputs are compared through an error function. Training an ANN is an iterative process in which this pre-specified error function is minimized by adjusting the weights. The commonly employed error functions, root-mean-square error (RMSE) and mean absolute percentage error (MAPE), were used in this study and are defined in Eqs. (5) and (6), respectively:

$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{N} \sum_{n=1}^{M} \left( y_n^i - \hat{y}_n^i \right)^2}{NM}} \tag{5}$$

where N is the number of patterns used in training, M is the number of output nodes, i is the index of the input pattern (vector), and y_n^i and ŷ_n^i are the actual and predicted outputs, respectively.

$$\mathrm{MAPE} = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{y_i - y_{di}}{y_{di}} \right| \times 100 \tag{6}$$

where n is the number of points, y_i is the predicted value and y_di is the measured value.

2.6.4. Learning rule

The network learning criteria, such as the training algorithm, learning rate, number of retrains and iterations, and stopping rule, need to be specified. BP is the most widely used learning rule for neural networks. The network weights are adjusted by the gradient descent algorithm, which reduces the error along a descent direction. Two parameters, the learning rate (LR) and the momentum factor (MF), are used to control weight adjustment and to dampen oscillations. In BP learning, the network outputs are compared with the actual data and error signals are derived. The signals are then propagated backward, layer by layer, to update the synaptic weights in all lower layers [6,7]. Chiefly, two criteria are used for stopping the network: (i) stop after a specified number of runs through the training data; and (ii) stop when the total sum squared error reaches a low level. The value of the target error is problem dependent; owing to computational time considerations, however, the training process can usually be stopped whenever the proportion of good patterns (i.e. data with normalized errors below the target error) exceeds 98%. In addition, a maximum number of iterations is generally specified.

3. Results and discussion

3.1. Growth linked fluoranthene degradation pattern

Under unoptimized conditions, growth-linked fluoranthene degradation by M. litorale reached a maximum of 26.37 ± 0.071% on the 7th day. Degradation was measured relative to an uninoculated flask (control). Fig. 2 indicates that both growth and degradation of fluoranthene increased gradually, reaching their maxima on the 7th day.

Fig. 2. Growth linked fluoranthene degradation pattern.

3.2. Screening of BH medium components using PB design

The BH medium components were screened in twelve experimental trials. The maximum fluoranthene degradation achieved was 32.41% on the 3rd day (Table 1). Screening was carried out on the basis of the main effect, standard error and p value of each variable in the PB model. Of the six variables examined, CaCl2 (p < 0.001), KH2PO4 (p < 0.001) and NH4NO3 (p = 0.037) showed significant main effects on fluoranthene degradation at the 95% confidence level (Table 2). The ANOVA results (Table 3) revealed that the main effects of the factors in the model were highly significant (p = 0.001), indicating that the PB model is highly significant. Hence, based on the PB design, CaCl2, KH2PO4 and NH4NO3 were selected for further optimization by CCD using RSM.

Table 2 Estimated effects and coefficients of degradation (%) of fluoranthene analyzed by PB design.

Term       Main effect   Coefficient   SE Coefficient   T        p
Constant                 21.228        0.5613           37.82    < 0.0001
MgSO4      −0.827        −0.413        0.5613           −0.74    0.495
CaCl2      8.190         4.095         0.5613           7.30     < 0.001
KH2PO4     −10.670       −5.335        0.5613           −9.51    < 0.001
K2HPO4     −0.127        −0.063        0.5613           −0.11    0.915
NH4NO3     −3.170        −1.585        0.5613           −2.82    0.037
FeCl3      −2.907        −1.453        0.5613           −2.59    0.049

R2 = 96.95%; R2 (Adj.) = 93.29%
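The main effects in Table 2 can be reproduced directly from Table 1: for a two-level design, the main effect of a factor is the mean response at its high level minus the mean response at its low level, and the regression coefficient is half of that. The following is a minimal sketch under that standard definition, using the actual D (%) column of Table 1; the helper names (`RUNS`, `main_effect`) are illustrative and the original analysis software is not reproduced here.

```python
# Each run: coded levels as a sign string (column order below) and actual D (%),
# both transcribed from Table 1.
FACTORS = ["MgSO4", "CaCl2", "KH2PO4", "K2HPO4", "NH4NO3", "FeCl3"]
RUNS = [
    ("++-+--", 30.86), ("---+++", 18.87), ("------", 27.07), ("+---++", 17.86),
    ("+-+---", 14.26), ("+++-++", 17.23), ("-++-+-", 19.02), ("--+++-", 12.47),
    ("-+---+", 32.31), ("+-++-+", 12.27), ("-+++-+", 20.11), ("++-++-", 32.41),
]


def mean(values):
    return sum(values) / len(values)


def main_effect(factor):
    """Mean response at the high level minus mean response at the low level."""
    j = FACTORS.index(factor)
    high = [d for signs, d in RUNS if signs[j] == "+"]
    low = [d for signs, d in RUNS if signs[j] == "-"]
    return mean(high) - mean(low)


effects = {name: main_effect(name) for name in FACTORS}
coefficients = {name: effect / 2 for name, effect in effects.items()}
intercept = mean([d for _, d in RUNS])

for name in FACTORS:
    print(f"{name:7s} effect={effects[name]:8.3f} coef={coefficients[name]:7.3f}")
print(f"constant={intercept:.3f}")
```

Ranking the factors by the absolute size of their effects gives KH2PO4, CaCl2 and NH4NO3 as the three largest, which matches the factors carried forward to the CCD step.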


Table 3 ANOVA for PB Design.

Source           DF   Seq SS   Adj SS   Adj MS   F       p
Main effects     6    600.36   600.36   100.06   26.47   0.001
Residual error   5    18.90    18.90    3.78
Total            11   619.26

Table 5 ANOVA for CCD design of medium components.

Source           DF   Seq SS   Adj SS   Adj MS   F        p
Regression       9    739.61   739.61   82.17    92.95    < 0.0001
Linear           3    437.99   437.99   145.99   165.14   < 0.0001
Square           3    89.57    89.57    29.86    33.77    < 0.0001
Interaction      3    212.03   212.03   70.67    79.94    < 0.0001
Residual error   10   8.84     8.84     0.88
Lack of fit      5    7.48     7.48     1.49     5.53     0.042
Pure error       5    1.35     1.35     0.27
Total            19   748.45
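The F values and R2 statistics quoted for both designs follow from the sums of squares and degrees of freedom in Tables 3 and 5. The sketch below recomputes them with the usual ANOVA definitions (mean square = SS/DF, F = MS of the model over MS of the residual, adjusted R2 from the residual and total mean squares); the function name `anova_summary` is illustrative.

```python
def anova_summary(ss_model, df_model, ss_error, df_error):
    """Return F, R2 and adjusted R2 from model and residual sums of squares."""
    ms_model = ss_model / df_model
    ms_error = ss_error / df_error
    ss_total = ss_model + ss_error
    df_total = df_model + df_error
    return {
        "F": ms_model / ms_error,
        "R2": ss_model / ss_total,
        "R2_adj": 1.0 - ms_error / (ss_total / df_total),
    }


# Table 3: PB design (main effects vs. residual error).
pb = anova_summary(ss_model=600.36, df_model=6, ss_error=18.90, df_error=5)

# Table 5: CCD (regression vs. residual error).
ccd = anova_summary(ss_model=739.61, df_model=9, ss_error=8.84, df_error=10)

for label, result in (("PB", pb), ("CCD", ccd)):
    print(label, {key: round(value, 4) for key, value in result.items()})
```

With these inputs the residual mean square of the CCD model is 8.84/10 ≈ 0.88, which is the value consistent with the reported regression F of 92.95.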

3.3. RSM modeling

Optimization of the screened medium components by CCD using RSM gave a model that predicted maximum fluoranthene degradation (%) on the 3rd day. As shown in Table 4, the predicted values of fluoranthene degradation closely matched the experimentally determined values. The ANOVA for fluoranthene degradation by CCD is shown in Table 5. The Fisher's F values due to regression were found to be high. A p value < 0.05 indicates a statistically significant model. The linear and square terms for fluoranthene degradation on the 3rd day were also significant (p ≤ 0.001), as were the interactions between variables. Thus, the overall model was found to be accurate, significant, and reproducible. Table 6 shows that the regression coefficients for the linear terms CaCl2, KH2PO4 and NH4NO3 were highly significant (p ≤ 0.001). The fitted second-order response surface model for the degradation of fluoranthene is given by Eq. (7).

Table 6 Regression coefficients of fluoranthene degradation by CCD.

Term              Coefficient   SE Coefficient   t         p
Constant          43.04         0.38             112.25    < 0.0001
CaCl2             3.57          0.25             14.06     < 0.0001
KH2PO4            −2.45         0.25             −9.64     < 0.0001
NH4NO3            −3.63         0.25             −14.30    < 0.0001
CaCl2*CaCl2       −1.83         0.24             −7.41     < 0.0001
KH2PO4*KH2PO4     −0.54         0.24             −2.19     0.053
NH4NO3*NH4NO3     −1.84         0.24             −7.45     < 0.0001
CaCl2*KH2PO4      −3.89         0.33             −11.71    < 0.0001
CaCl2*NH4NO3      3.20          0.33             9.64      < 0.0001
KH2PO4*NH4NO3     1.03          0.33             3.10      0.011

R2 = 0.9882; R2 (Adj.) = 0.9776; R2 (Pred.) = 0.9160

$$Y = 43.04 + 3.57X_1 - 2.45X_2 - 3.63X_3 - 1.83X_1^2 - 0.54X_2^2 - 1.84X_3^2 - 3.89X_1X_2 + 3.20X_1X_3 + 1.03X_2X_3 \tag{7}$$

where X1, X2 and X3 are the coded levels of CaCl2, KH2PO4 and NH4NO3, respectively (coefficients as in Table 6).
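To make Eqs. (1) and (7) concrete, the sketch below codes actual concentrations and evaluates the fitted quadratic model. The midpoints and step sizes are not stated in the text; they are inferred here from Table 4 (coded 0 and coded 1 correspond to 0.06 and 0.09 g/L for CaCl2, and to 0.55 and 0.80 g/L for KH2PO4 and NH4NO3), and the names `CENTER`, `STEP`, `code` and `predict` are illustrative.

```python
# Midpoint X0 and step change dX per factor, inferred from Table 4 (assumption).
CENTER = {"CaCl2": 0.06, "KH2PO4": 0.55, "NH4NO3": 0.55}
STEP = {"CaCl2": 0.03, "KH2PO4": 0.25, "NH4NO3": 0.25}


def code(factor, actual):
    """Eq. (1): coded value = (actual value - midpoint) / step change."""
    return (actual - CENTER[factor]) / STEP[factor]


def predict(x1, x2, x3):
    """Eq. (7): fitted second-order model in coded units (Table 6 coefficients)."""
    return (43.04 + 3.57 * x1 - 2.45 * x2 - 3.63 * x3
            - 1.83 * x1 ** 2 - 0.54 * x2 ** 2 - 1.84 * x3 ** 2
            - 3.89 * x1 * x2 + 3.20 * x1 * x3 + 1.03 * x2 * x3)


# Run 1 of Table 4: CaCl2 0.09 g/L, KH2PO4 0.30 g/L, NH4NO3 0.30 g/L.
x1 = code("CaCl2", 0.09)
x2 = code("KH2PO4", 0.30)
x3 = code("NH4NO3", 0.30)
run1_prediction = predict(x1, x2, x3)
print(f"coded levels: {x1:.2f}, {x2:.2f}, {x3:.2f}")
print(f"predicted D (%): {run1_prediction:.2f}")
```

Because the coefficients in Eq. (7) are rounded to two decimals, the result agrees with the RSM predicted value in Table 4 only to about the second decimal place.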

As perceived earlier in the ANOVA, the interaction between CaCl2 and NH4NO3 had a significant effect on fluoranthene degradation, whereas KH2PO4 had a less significant effect than the other variables. The regression coefficient (R2 = 0.9882), predicted R2 (0.9160) and adjusted R2 (0.9776) indicate that the equation is highly reliable and would hold if the same experiment were repeated. Fig. 3 shows contour plots for the degradation of fluoranthene by M. litorale on the 3rd day, as observed after a number of repeated experiments. Each contour plot shows a combination of two variables, with the remaining variable set at its zero level. Maximum degradation (%) of fluoranthene is indicated by the smallest (circular) curve of the contour plot. Careful examination of the plots revealed the optimum values of the process conditions as CaCl2 0.06 g/L, KH2PO4 0.55 g/L and NH4NO3 0.55 g/L. The divalent ion balance, especially calcium, is important for several membrane-related processes. The activation and stabilization of a number of extracellular enzymes by 0.1–1.0 mM Ca2+ and the influence of calcium on the modulation of certain periplasmic and cytoplasmic enzymes are well known; in the present study, a CaCl2 concentration of 0.8 mM supported these findings [16,28]. KH2PO4, as an inorganic P source, could act as a buffering component during degradation. Moreover, P may play a significant role in maintaining pH for achieving maximum degradation. Addition of nitrogen could improve the growth of bacteria. The results showed that there was sufficient nitrogen in the medium, which was beneficial for the degradation of high molecular weight PAHs. Higher fluoranthene degradation in the culture with 0.30 g L−1 nitrogen supplement suggested a positive impact on fluoranthene metabolism on the 3rd day.

Table 4 Experimental matrix for CCD using RSM and ANN for BH medium components with actual and predicted D (%).

Run No.   CaCl2 (g/L)        KH2PO4 (g/L)       NH4NO3 (g/L)       Actual D (%)   RSM predicted D (%)   ANN predicted D (%)
          Coded    Actual    Coded    Actual    Coded    Actual
1b        1        0.09      −1       0.30      −1       0.30      51.53          50.2140               51.2828
2a        0        0.06      0        0.55      0        0.55      43.72          43.0496               42.7835
3b        0        0.06      1.68     0.97      0        0.55      37.21          37.3893               37.2100
4a        0        0.06      0        0.55      0        0.55      42.45          43.0496               42.7835
5b        1        0.09      1        0.80      −1       0.30      35.27          35.4572               35.2933
6b        0        0.06      0        0.55      −1.68    0.129     43.46          43.9468               43.5500
7a        0        0.06      0        0.55      0        0.55      43.33          43.0496               42.7835
8b        1        0.09      −1       0.30      1        0.80      47.53          47.2844               47.3383
9b        1.68     0.11      0        0.55      0        0.55      42.83          43.8734               42.7569
10b       −1.68    0.009     0        0.55      0        0.55      31.70          31.8375               31.6747
11b       −1       0.03      −1       0.30      −1       0.30      41.46          41.6824               41.5878
12b       −1       0.03      1        0.80      1        0.80      30.39          30.8710               30.3757
13a       0        0.06      0        0.55      0        0.55      43.39          43.0496               42.7835
14b       0        0.06      −1.68    0.129     0        0.55      44.64          45.6415               44.6164
15c       −1       0.03      −1       0.30      1        0.80      26.95          25.9278               27.5418
16c       0        0.06      0        0.55      1.68     0.97      31.01          31.7040               31.0044
17d       1        0.09      1        0.80      1        0.80      37.71          36.6526               37.7263
18d       −1       0.03      1        0.80      −1       0.30      43.09          42.5006               43.0211
19a       0        0.06      0        0.55      0        0.55      42.46          43.0496               42.7835
20a       0        0.06      0        0.55      0        0.55      43.15          43.0496               42.7835

a = the average value of the six replicates of the central point of the three variables was considered in the training period.
b = training dataset.
c = testing dataset.
d = validation dataset.

Fig. 3. Contour plots showing the response surface effect of interaction between CaCl2, KH2PO4 and NH4NO3 on fluoranthene degradation on 3rd day.

3.4. Predicted ANN modeling

The developed ANN architecture was used to optimize fluoranthene degradation using input neurons network topology. The number of neurons in hidden layer was recognized by training of several ANN topologies and selecting the optimal one, based on minimization of root mean square error (RMSE) and mean absolute percentage error (MAPE) to improve generalization ability of the ANN topology. The

160

Chemometrics and Intelligent Laboratory Systems 159 (2016) 155–163

D.R. Dudhagara et al.

equation with R2 value 0.9987 which was remarkably close to 0.9882 (R2 value of RSM data set). Thus, the developed ANN model was able to accurately simulate fluoranthene degradation (target) and reproduce experimental results with greater precision. Fluoranthene degradation (target) has been precisely achieved by the incorporation of multilayered feed forward ANN, trained by BP algorithm, with significant R2 values.

Table 7
Optimum ANN training model parameters.

Parameters                      Value
Number of input nodes           3
Number of hidden neurons        8
Number of output nodes          1
Maximum number of epochs        1000
Learning rate                   0.01
Learning rule                   Back-propagation
Error goal                      0.00
Maximum fail                    6

3.6.1. External validation of the RSM and ANN dataset

To reduce the sensitivity of ANN modeling and improve its predictive capability, overfitting can be avoided by keeping the number of neurons in the range of 8 to 12, as suggested by Choi and Park (2001) [30]. The forecasting ability of the model, however, was observed only intermittently, as few variables were selected. The ANN modeling has therefore not been linked to PCA or partial least squares (PLS) [30]. Instead, external validation was performed between the actual output and the data sets obtained by RSM and ANN, as described by Heidari et al. (2016) [31]. Fig. 6 shows the external validation points, which indicate that the ANN model shows no overfitting and is statistically more suitable than RSM.

The BP algorithm was used to train the ANN predictive model. The algorithm uses second-order derivatives of the RMSE between the desired output and the actual output so that better transformation behavior can be obtained [12,29]. In this study, the optimal topology of the ANN model has three input variables, one hidden layer with eight neurons (Fig. 1) and one output layer (3-8-1) (Table 7). The developed ANN model gave better predicted values of fluoranthene degradation (%) (Table 4). The value of R2 was 0.9930 and the RMSE was 0.5270 using the multilayered BP neural network, which gave the lowest RMSE and was therefore taken as the best network for predicting the optimal conditions for fluoranthene degradation.
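To make the 3-8-1 topology of Table 7 concrete, the forward pass of such a network can be sketched in plain Python. This is only an illustration under stated assumptions: the tanh hidden activation, the linear output, the weight range and the seed are not given in the text, and the names `init_network`, `forward` and `count_parameters` are hypothetical. Training by back-propagation is not shown.

```python
import math
import random


def init_network(n_in=3, n_hidden=8, n_out=1, seed=0):
    """Create random weights and zero biases for an n_in-n_hidden-n_out network."""
    rng = random.Random(seed)  # seed and weight range are arbitrary assumptions
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_out)]
    b2 = [0.0] * n_out
    return w1, b1, w2, b2


def forward(x, params):
    """Feed-forward pass: tanh hidden layer (assumed), linear output (assumed)."""
    w1, b1, w2, b2 = params
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w1, b1)]
    return [sum(w * h for w, h in zip(row, hidden)) + b
            for row, b in zip(w2, b2)]


def count_parameters(params):
    """Total number of adjustable weights and biases."""
    w1, b1, w2, b2 = params
    return sum(len(r) for r in w1) + len(b1) + sum(len(r) for r in w2) + len(b2)


params = init_network()
# Inputs stand for coded levels of CaCl2, KH2PO4 and NH4NO3 (illustrative values).
prediction = forward([0.0, 0.0, 0.0], params)
n_params = count_parameters(params)
```

Under these assumptions a 3-8-1 network has 3×8 + 8 + 8×1 + 1 = 41 adjustable parameters, which is large relative to a small experimental design and is one reason the external validation in Section 3.6.1 matters.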

3.7. Comparison of RSM and ANN model

3.7.1. Predictive potentiality

The predictive capabilities of the RSM and ANN models were compared on the basis of R2, RMSE and MAPE using Minitab V16 (Table 8). The experimental and predicted values of fluoranthene degradation by RSM and ANN are shown in Table 4. The R2 values for the RSM and ANN models were 0.9882 and 0.9987, the RMSE values were 0.940 and 0.3234, and the MAPE values were 1.432 and 0.5715, respectively (Table 8). On all three measures, the ANN model proved superior to RSM for degradation of fluoranthene, a four-ring HMW PAH. The higher predictive capability of the ANN model is attributed to its ability to capture non-linear behavior of the system, whereas RSM generalizes data on the basis of quadratic equations only. The predictive superiority of ANN over RSM as a bioprocess modeling tool has also been reported [10,32], whereas implementation of an ANN model in the study of biodegradation of PAHs has not been reported so far.
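The three measures compared in Table 8 can be computed as in the following minimal Python sketch. It is not the Minitab procedure used in the study: the function names and the three sample values are hypothetical, and R2 is taken here as the coefficient of determination, which is an assumption about how it was calculated.

```python
import math


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def rmse(actual, predicted):
    """Root mean square error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    return 100.0 / len(actual) * sum(abs((a - p) / a) for a, p in zip(actual, predicted))


# Hypothetical degradation values (%), for illustration only.
actual = [50.0, 40.0, 30.0]
predicted = [49.0, 41.0, 30.0]
print(r_squared(actual, predicted), rmse(actual, predicted), mape(actual, predicted))
```

A higher R2 together with lower RMSE and MAPE indicates a closer match between predicted and experimental values, which is the pattern reported for ANN relative to RSM.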

3.5. Optimization of the number of hidden neurons

The number of neurons in the hidden layer significantly affected the accuracy of the model and the prediction of the optimal conditions. If the network topology is too simple, the trained network cannot learn properly. The number of neurons was therefore optimized to obtain the best predictive capability and accuracy of the model, judged by the R2, RMSE and MAPE values. The optimal structure of the feed-forward neural network model is represented in Fig. 1. Eight neurons gave the best predictive capability and the highest accuracy of the model for fluoranthene biodegradation (Fig. 4). This topology gave a higher R2 (0.9987) together with lower RMSE (0.3234) and MAPE (0.5715) values, which shows that the developed ANN model is significant and can be used to identify the optimal number of neurons, which yielded maximum fluoranthene degradation.
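The selection of the hidden-layer size by minimum error can be sketched as a simple sweep. In this Python illustration the names `select_hidden_neurons` and `placeholder_rmse` are hypothetical, and `placeholder_rmse` is a synthetic stand-in for "train a 3-n-1 network and return its RMSE"; its values are not results from the study.

```python
def select_hidden_neurons(candidates, evaluate):
    """Return the hidden-layer size with the lowest error, plus all scores."""
    scores = {n: evaluate(n) for n in candidates}
    best = min(scores, key=scores.get)
    return best, scores


def placeholder_rmse(n_hidden):
    """Synthetic error curve used only to exercise the sweep (not real data)."""
    return 1.0 + abs(n_hidden - 8)


# Candidate range of 2 to 20 neurons is an assumption for illustration.
best, scores = select_hidden_neurons(range(2, 21), placeholder_rmse)
print(best, scores[best])
```

In practice `evaluate` would train each candidate network and return its validation RMSE or MAPE, so that the selected size reflects generalization rather than fit to the training runs.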

3.7.2. Sensitivity analysis

The effect of individual medium components and their interactions can be studied more effectively using ANN than RSM. The coefficients of RSM give a direct measure of the contribution of the various medium components to the system. In this study, a sensitivity analysis was performed with the constructed ANN model using the ‘Perturb’ method to determine the effectiveness of each parameter [10,33,34]. The performance of the three variables was thus examined by the optimal ANN model with twenty neurons in the hidden layer. As shown in Table 6, of the three variables, CaCl2 had the highest coefficient value (3.57), indicating that CaCl2 is the most influential factor for fluoranthene degradation. CaCl2*NH4NO3 and KH2PO4*NH4NO3 had the highest coefficient values among the interactions, suggesting a significant effect on the enhancement of fluoranthene degradation. ANN does not give insight into the interactions directly, but this can be obtained from the inherent nature of the ANN. Fig. 7 shows the sensitivity analysis of the ANN, indicating the rate of response and its change with respect to a change in the input variable. The influence of a variable can be estimated from the slope and range of the change in response: the higher the slope and range, the greater the influence of the variable on the output. Fig. 7 gives the slopes of all three parameters, viz. CaCl2 (3.878), KH2PO4 (-2.317) and NH4NO3 (-3.550). This clearly indicates the greater influence of CaCl2 compared with the other variables in enhancing fluoranthene biodegradation.

3.6. Training, validation and testing of the model

The input data were divided into three subsets for development of the model: training (70%), validation (15%) and testing (15%). Fig. 5(a–c) shows the ANN model fits, with R2 values for training (0.9983), validation (0.9999) and testing (0.9996). Fig. 5(d) shows the fit for the overall dataset.

Fig. 4. Optimization of neural topology.


Fig. 5. Comparison between ANN derived and experimentally measured values of fluoranthene degradation for a) training b) validation c) testing and d) overall datasets.

Table 8
Comparison of predictive capability between RSM and ANN.

Parameters                                RSM      ANN
R square                                  0.9882   0.9987
Mean absolute percentage error (MAPE)     1.432    0.5715
Root mean square error (RMSE)             0.940    0.3234

In addition, the slope of CaCl2 is quite comparable with the quadratic RSM coefficient, which indicates the higher efficiency of ANN compared with RSM in the sensitivity analysis.
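The idea of estimating influence from the slope of the response can be sketched as follows. The text does not detail the ‘Perturb’ procedure, so this Python example shows one common reading: each input is perturbed around a baseline while the others are held fixed, and the slope is taken as a central difference. The names `perturbation_slopes` and `toy_model`, the step size and the toy coefficients are hypothetical and are not the fitted ANN or its slopes.

```python
def perturbation_slopes(model, baseline, delta=0.1):
    """Central-difference slope of the response with respect to each input."""
    slopes = []
    for i in range(len(baseline)):
        up = list(baseline)
        down = list(baseline)
        up[i] += delta
        down[i] -= delta
        slopes.append((model(up) - model(down)) / (2 * delta))
    return slopes


def toy_model(x):
    """Stand-in response function with arbitrary coefficients (not the ANN)."""
    cacl2, kh2po4, nh4no3 = x
    return 40.0 + 3.0 * cacl2 - 2.0 * kh2po4 - 1.0 * nh4no3


slopes = perturbation_slopes(toy_model, [0.0, 0.0, 0.0])
# Rank inputs by absolute slope: larger magnitude means greater influence.
ranking = sorted(range(len(slopes)), key=lambda i: abs(slopes[i]), reverse=True)
print(slopes, ranking)
```

For a trained network, `model` would be the network's prediction function, and the range of the response over the perturbation interval could be recorded alongside the slope.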

Fig. 6. External validation of RSM and ANN obtained dataset.

Fig. 7. Sensitivity analysis of fluoranthene degradation using ANN model.

4. Conclusion

The present study focused on careful optimization of fluoranthene biodegradation by M. litorale using two models, RSM and ANN. The fabricated model successfully enhanced biodegradation of fluoranthene (51.28% D) compared with the unoptimized experiment (26.37% D) with a significant reduction in time, thus proving to be an accurate predictive tool and yielding a more significant model. The study should encourage the scientific community to exploit the salient features of ANN over traditional RSM analyses. It thus opens new avenues for the development of such models for effective remediation strategies for PAH-impacted habitats, which otherwise can have adverse effects on marine biota and human health.

Acknowledgements

The authors are thankful to Gujarat State Biotechnology Mission (GSBTM), Gandhinagar, Gujarat and Earth System Sciences Organization (ESSO), Ministry of Earth Sciences, Government of India, New Delhi, for financial assistance to carry out this research.

References

[1] S.M. Bamforth, I. Singleton, Bioremediation of polycyclic aromatic hydrocarbons: current knowledge and future directions, J. Chem. Technol. Biotechnol. 80 (2005) 723–736.
[2] R. Pietzsch, S.R. Patchineelam, J.P.M. Torres, Polycyclic aromatic hydrocarbons in recent sediments from a subtropical estuary in Brazil, Mar. Chem. 118 (2010) 56–66.
[3] H. Parastar, J.R. Radovic, M.J. Heravi, S. Diez, J.M. Bayona, R. Tauler, Resolution and quantification of complex mixtures of polycyclic aromatic hydrocarbons in heavy fuel oil sample by means of GC × GC-TOFMS combined to multivariate curve resolution, Anal. Chem. 83 (2011) 9289–9297.
[4] T. Rengarajan, P. Rajendran, N. Nandakumar, B. Lokeshkumar, P. Rajendran, I. Nishigaki, Exposure to polycyclic aromatic hydrocarbons with special focus on cancer, Asian Pac. Trop. Biomed. 5 (2015) 182–189.
[5] D.R. Dudhagara, R.K. Rajpara, J.K. Bhatt, H.B. Gosai, B.K. Sachaniya, B.P. Dave, Distribution, sources and ecological risk assessment of PAHs in historically contaminated surface sediments at Bhavnagar coast, Gujarat, India, Environ. Pollut. 213 (2016) 338–346.
[6] S. Palani, S. Liong, P. Tkalich, An ANN application for water quality forecasting, Mar. Pollut. Bull. 56 (2008) 1586–1597.
[7] N.M. Gazzaz, M.K. Yusoff, A.Z. Aris, H. Juahir, M.F. Ramli, Artificial neural network modeling of the water quality index for Kinta River (Malaysia) using water quality variables as predictors, Mar. Pollut. Bull. 64 (2012) 2409–2420.
[8] D. Bingöl, M. Hercan, S. Elevli, E. Kılıç, Comparison of the results of response surface methodology and artificial neural network for the biosorption of lead using black cumin, Bioresour. Technol. 112 (2012) 111–115.
[9] J. Ye, P. Zhang, E. Hoffmann, G. Zeng, Y. Tang, J. Dresely, Y. Liu, Comparison of response surface methodology and artificial neural network in optimization and prediction of acid activation of Bauxsol for phosphorus adsorption, Water Air Soil Pollut. 225 (2014) 2225.
[10] K.M. Desai, S.A. Survase, P.S. Saudagar, S.S. Lele, R.S. Singhal, Comparison of artificial neural network (ANN) and response surface methodology (RSM) in fermentation media optimization: case study of fermentative production of scleroglucan, Biochem. Eng. J. 41 (2008) 266–273.
[11] A. Çelekli, F. Geyik, Artificial neural networks (ANN) approach for modeling of removal of Lanaset Red G on Chara contraria, Bioresour. Technol. 102 (2011) 5634–5638.
[12] Y. Yasin, F.B.H. Ahmad, M. Ghaffari-Moghaddam, M. Khajeh, Application of a hybrid artificial neural network–genetic algorithm approach to optimize the lead ions removal from aqueous solutions using intercalated tartrate-Mg–Al layered double hydroxides, Environ. Nanotechnol., Monit. Manag. 1–2 (2014) 2–7.
[13] M. Buyukada, Co-combustion of peanut hull and coal blends: artificial neural networks modeling, particle swarm optimization and Monte Carlo simulation, Bioresour. Technol. 216 (2016) 280–286.
[14] R.M. Atlas, Handbook of Media for Environmental Microbiology, 2nd ed., CRC Press, Taylor & Francis Group, USA, 2005.
[15] H. Kiyohara, K. Nagao, K. Yana, Rapid screen for bacteria degrading water insoluble solid hydrocarbons on agar plates, Appl. Environ. Microbiol. 43 (1982) 454–457.
[16] C.M. Ghevariya, J.K. Bhatt, B.P. Dave, Enhanced chrysene degradation by halotolerant Achromobacter xylosoxidans using response surface methodology, Bioresour. Technol. 102 (2011) 9668–9674.
[17] J.C. Willison, Isolation and characterization of novel sphingomonad capable of growth with chrysene as sole source of carbon and energy, FEMS Microbiol. Lett. 241 (2004) 143–150.
[18] J.K. Bhatt, C.M. Ghevariya, D.R. Dudhagara, R.K. Rajpara, B.P. Dave, Application of response surface methodology for rapid chrysene biodegradation by newly isolated marine-derived fungus Cochliobolus lunatus strain CHR4D, J. Microbiol. 52 (2014) 908–917.
[19] L. Mohajeri, H.A. Aziz, M.H. Isa, M.A. Zahed, A statistical experiment design approach for optimizing biodegradation of weathered crude oil in coastal sediments, Bioresour. Technol. 101 (2010) 893–900.
[20] Y. Xu, M. Lu, Bioremediation of crude oil-contaminated soil: comparison of different biostimulation and bioaugmentation treatments, J. Hazard. Mater. 183 (2010) 395–401.
[21] M.H. Beale, M.T. Hagan, H.B. Demuth, Neural Network Toolbox 7, User’s Guide, MathWorks, 2010.
[22] A. Witek-Krowiak, K. Chojnacka, D. Podstawczyk, A. Dawiec, K. Pokomeda, Application of response surface methodology and artificial neural network methods in modeling and optimization of biosorption process, Bioresour. Technol. 160 (2014) 150–160.
[23] N.G. Turan, B. Mesci, O. Ozgonenel, The use of artificial neural networks (ANN) for modeling of adsorption of Cu(II) from industrial leachate by pumice, Chem. Eng. J. 171 (2011) 1091–1097.
[24] L. Bruzzone, R. Cossu, G. Vernazza, Detection of land-cover transitions by combining multidate classifiers, Pattern Recognit. Lett. 25 (2004) 1491–1500.
[25] K.P. Singh, A. Basant, A. Malik, G. Jain, Artificial neural network modeling of the river water quality: a case study, Ecol. Model. 220 (2009) 888–895.
[26] T. Hill, L. Marquez, M. O’Connor, W. Remus, Artificial neural network models for forecasting and decision making, Int. J. Forecast. 10 (1994) 5–15.
[27] P. Singh, S.S. Shera, J. Banik, R.M. Banik, Optimization of cultural conditions using response surface methodology versus artificial neural network and modeling of L-glutaminase production by Bacillus cereus MTCC 1305, Bioresour. Technol. 137 (2013) 261–269.
[28] P.A. Willumsen, U. Karlson, Effect of calcium on the surfactant tolerance of a fluoranthene degrading bacterium, Biodegradation 9 (1998) 369–379.
[29] A. Gulbag, F. Temurtas, A study on quantitative classification of binary gas mixture using neural networks and adaptive neuro-fuzzy inference systems, Sens. Actuators B Chem. 115 (2006) 252–262.
[30] D.J. Choi, H. Park, A hybrid artificial neural network as a software sensor for optimal control of a wastewater treatment process, Water Res. 35 (2001) 3959–3967.
[31] E. Heidari, M.A. Sobati, S. Movahedirad, Accurate prediction of nanofluid viscosity using a multilayer perceptron artificial neural network (MLP-ANN), Chemom. Intell. Lab. 155 (2016) 73–85.
[32] S. Das, A. Bhattacharya, S. Haldar, A. Ganguly, S. Gu, Y.P. Ting, P.K. Chatterjee, Optimization of enzymatic saccharification of water hyacinth biomass for bioethanol: comparison between artificial neural network and response surface methodology, Sustain. Mater. Technol. 3 (2015) 17–28.
[33] K. Yetilmezsoy, S. Demirel, Artificial neural network (ANN) approach for modeling of Pb (II) adsorption from aqueous solution by Antep pistachio (Pistacia Vera L.) shells, J. Hazard. Mater. 153 (2008) 1288–1300.
[34] L. Jing, B. Chen, B. Zhang, Modeling of UV-induced photodegradation of naphthalene in marine oily wastewater by artificial neural networks, Water Air Soil Pollut. 225 (2014) 1–14.