Comparative Analyses of Response Surface Methodology and Artificial Neural Network on Medium Optimization for Tetraselmis sp. FTC209 Grown under Mixotrophic Condition

Mixotrophic metabolism was evaluated as an option to augment the growth and lipid production of the marine microalga Tetraselmis sp. FTC 209. In this study, a five-level, three-factor central composite design (CCD) was implemented in order to enrich the W-30 algal growth medium. Response surface methodology (RSM) was employed to model the effect of three medium variables, that is, glucose (organic C source), NaNO3 (primary N source), and yeast extract (supplementary N, amino acids, and vitamins), on biomass concentration, X max, and lipid yield, P max/X max. RSM capability was also weighed against an artificial neural network (ANN) approach for predicting a composition that would result in maximum lipid productivity, Pr lipid. A quadratic regression from RSM and a Levenberg-Marquardt trained ANN composed of 10 hidden neurons eventually produced comparable results, although the ANN formulation was observed to yield higher values of the response outputs. The finalized glucose (24.05 g/L), NaNO3 (4.70 g/L), and yeast extract (0.93 g/L) concentrations effected an increase of X max to 12.38 g/L and a lipid accumulation of 195.77 mg/g dcw. This contributed to a lipid productivity of 173.11 mg/L per day over the course of a two-week cultivation.


Introduction
The microalga Tetraselmis has been used extensively in aquaculture, especially for rearing larval stages of mollusks and crustaceans [1]. In light of favorable outlooks [2], an emerging application of Tetraselmis or other oleaginous microalgae is currently centered on carbon biofixation in tandem with bioconversion towards renewable fuels. The whole process is considered to be efficient and encouragingly leaves a small pollution footprint [3]. Several species of Tetraselmis are acknowledged to possess metabolic plasticity, whereby in response to the culture conditions, these species can manifest in alternative phenotypes resulting in altered formation of algal bioproducts [4].
Heterotrophy of exogenous nutrients by microalgae is now being regarded as the practical means to increase the volumetric productivity of algal biomass [5]. Nonetheless, the amount of neutral lipid, a principal component in biodiesel synthesis, was significantly diminished for Tetraselmis cultured with organic carbon substrates without illumination. For an unspecified species of Tetraselmis, Day and Tsavalos [6] found that cultivation with glucose yielded only 0.64% w/w cellular lipid obtained under complete darkness, as opposed to 3.71% w/w for culture exposed to light. A study by Azma et al. [1] on the heterotrophy of T. suecica has reported a respectable 28.8 g/L dried cell weight. Lipid productivity was claimed to increase by about 2 times. However, comparison was made solely against photoautotrophic culture, by which the typical biomass concentration is in the range of 0.1 to 1.0 g/L, owing to the effect of mutual shading [7]. Alternatively, researchers are looking into the potential of mixotrophic (photoheterotrophic) mode upon realization of 2 The Scientific World Journal enhanced growth and unrepressed light-dependent bioproducts formation from the combined effects of photosynthesis and the cells' own ability to ingest either prey or dissolved organic materials [8].
Mixotrophy in commercial-scale open ponds of Chlorella and Spirulina has been practiced for some time through continuous addition of acetate in small quantities during daytime to support greater growth [9]. Zhao et al. [10] found that the mode was conducive to both biomass and lipid accumulation in Scenedesmus quadricauda, wherein the maximum biomass, X max, and lipid, P max, were registered at 3.36 g/L and 0.79 g/L, respectively, when the culture was fed with starch wastewater. In addition, Isleten-Hosoglu et al. [11] proposed mixotrophically grown Ettlia texensis as a good biofuel producer candidate, where imposing optimized mixing and medium in a stirred tank environment would lead to an X max of 10.1 g/L, with the microalga retaining 35% of lipid bodies during its 11 days of cultivation, corresponding to biomass and lipid productivities of 0.92 g/L per day and 322 mg/L per day, respectively.
Previous observations on the native isolate Tetraselmis sp. FTC 209 in our laboratory revealed the strain's preference for mixotrophic over other forms of metabolism. The initial attempt at medium design was based on the concept of elemental balance of aqueous nutrients to match the stoichiometry of the particular algal species [12]. The technique comprises a straightforward increase of the nutrients found lacking in order to promote a higher cell-density culture, but it neglects the combined interactions between the medium components involved. As a result, it is not guaranteed to pinpoint the exact optimal condition, which possibly leads to slightly inaccurate conclusions [13]. Statistical methods have been applied for developing reliable culture systems. Of late, response surface methodology (RSM) coupled with central composite design (CCD) has been a popular tool to model the probable curvature of the measured responses in algal medium formulation [1,11,13,14]. However, a major obstacle in reaching model accuracy and generalization lies in the nonlinearity and time-varying nature of bioprocesses [15]. Artificial neural networks (ANN) have been progressively applied in a number of optimization works [16]. An ANN is a highly interconnected network of processing elements (neurons) capable of massively parallel computations, representing a data-centric modeling approach inspired by the biological nervous system. Contrary to the conventional model, for which the order needs to be stated (i.e., second, third, or fourth order), ANN is more flexible and does not impose any restriction on the type of relationship governing the dependence of the output parameters on the various running conditions [17]. ANN essentially transforms inputs passed through a network of neurons with weighted interconnections into outputs predicted to the best of its ability. It is adaptive, or trainable, with a given dataset via adjusting many network factors (number of layers, number of neurons in hidden layers, types of transfer functions, or learning algorithms). The process continues until a defined accuracy has been reached [18].
Linear regression modeling via RSM has been constructed in the past for T. suecica [1]. However, no study to date has made use of ANN capacity for simulating the structure and functional aspects of neural networks to precisely develop an optimal growth medium for Tetraselmis. The aim of this study was to analyze the potential improvement of predictive microbiology afforded by RSM and ANN-based models, in this case, by assessing the contribution of major organic carbon and nitrogen sources such as glucose, NaNO 3 , and yeast extract towards enhancing the lipid productivity.

Microalga Strain, Maintenance, and Inoculum Preparation.
The microalga Tetraselmis sp. FTC 209, previously isolated from the coastal waters of Port Dickson, Negeri Sembilan (Malaysia), was obtained from the collection of the Fermentation Technology Unit (FTU-GMP@BIOTECH) of University Putra Malaysia. Prior characterization of the microalga via morphological examination, partial 16S rDNA gene sequencing, and phylogenetic analysis had disclosed the taxonomic position of the strain as closely related to Tetraselmis striata. The axenicity of the Tetraselmis isolate was maintained by culturing the cells on Walne's medium agar treated with an antibiotic cocktail comprising 100/25 mg/L of ampicillin/streptomycin and fortified with 5 g/L glucose.
Four-week-old colonies grown on agar plates were collected with disposable loops and cultivated in liquid medium formulated beforehand according to the principle of elemental balance. In brief, the technique requires increasing the concentrations of any bioelements found deficient in the standard Walne's medium by matching them to the alga's cellular elemental composition. Nonetheless, a drop in Tetraselmis sp. FTC 209 growth rate and cell density was observed when such increments, expressed in "percent biomass capacity", were raised higher than 30% for all macronutrients. The modified basal medium, designated as W-30 and used throughout this study, was formulated using sterile double-filtered seawater and composed of (g/L): ( [...] of 30 ± 3.3 μmol/m²⋅s for a constant diurnal cycle of 12 h (light) : 12 h (dark). All runs were carried out in triplicate for a total cultivation duration of 2 weeks.

Experimental Design.
Based on prior mixotrophic cultivations with W-30 medium, X max was recorded at 8.08 g/L, with lipid making up about 20-22% of the Tetraselmis cell after 500 h of cultivation, in the presence of glucose in the range of 20-30 g/L and NaNO3 supplied at 3.50-5.23 g/L. Addition of organic complex nutrients, for example, peptone, yeast extract, malt extract, or beef extract, to further boost biomass propagation, and therefore indirectly the lipid yield, has been suggested in reports pertaining to mixotrophy or heterotrophy of several Tetraselmis species [1,4,19]. The foremost among the proposed supplements is yeast extract, by and large sourced from the autolysate of spent Saccharomyces cells, an underutilized waste by-product of the brewing industry [20]. It has a competitively lower price compared to other organic N sources [11] and is generally the most preferred for production-scale bioreactors [5].
A five-level, full-factorial central composite design (CCD) with three independent variables, that is, the concentrations of glucose (Merck Co.), yeast extract (Merck Co.), and NaNO 3 (Kollins Chemicals) was applied in this study (Table 1), requiring 19 sets of experimental runs consisting of 8 factorial (cubic points), 6 axial (star points), and 5 replicates of center points. The effects of these medium constituents towards ultimately achieving the maximum lipid productivity, Pr lipid (mg/L per day), were identified. Subsequent experimental values acquired from the runs using predicted optimal conditions were then used as validating set and were compared with the computed optimal values.
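The run count quoted above can be illustrated by enumerating a three-factor CCD in coded units. The following is a minimal sketch, not the authors' Design Expert setup: the function name `central_composite_design` is hypothetical, and the axial distance is assumed to be the rotatable value α = (2³)^(1/4) ≈ 1.682, which the text does not state.

```python
from itertools import product


def central_composite_design(n_factors=3, n_center=5, alpha=None):
    """Return the CCD runs in coded units as a list of tuples."""
    if alpha is None:
        # Assumption: rotatable design, alpha = (number of factorial points) ** 0.25
        alpha = (2 ** n_factors) ** 0.25
    # Factorial (cube) points: every combination of -1 and +1
    factorial = list(product((-1.0, 1.0), repeat=n_factors))
    # Axial (star) points: one factor at +/- alpha, the others at the center
    axial = []
    for i in range(n_factors):
        for sign in (-1.0, 1.0):
            point = [0.0] * n_factors
            point[i] = sign * alpha
            axial.append(tuple(point))
    # Replicated center points
    center = [tuple([0.0] * n_factors)] * n_center
    return factorial + axial + center


runs = central_composite_design()
levels = sorted({round(value, 3) for run in runs for value in run})
print(len(runs))  # 8 factorial + 6 axial + 5 center runs
print(levels)     # the distinct coded levels taken by each factor
```

Under these assumptions each factor takes five coded levels (−α, −1, 0, +1, +α), which is what makes the design "five-level"; converting to g/L would additionally require the center values and step sizes of Table 1.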

Response Surface Methodology Modeling.
RSM was employed to optimize the cultivation and to investigate the relative and interactive effects of the three medium constituents. To comprehend the algal growth and lipid secretion behavior, three responses comprising biomass concentration (g/L), lipid yield (mg lipid/g dcw), and lipid productivity (mg/L per day) were initially measured. Design Expert (version 7.1.6, Stat-Ease Inc., Minneapolis, MN, USA) was used for regression modeling and data interpretation. The observed responses from the CCD were then fitted to the following polynomial equation:

$$Y = \beta_0 + \sum_{i=1}^{3} \beta_i x_i + \sum_{i=1}^{3} \beta_{ii} x_i^2 + \sum_{i=1}^{2} \sum_{j=i+1}^{3} \beta_{ij} x_i x_j + \varepsilon, \quad (1)$$

where $Y$ is the predicted response; $i$ and $j$ are the index numbers for the pattern; $\beta_0$ is the offset term; $\beta_i$, $\beta_{ii}$, and $\beta_{ij}$ are the coefficients for the linear, quadratic, and interaction effects, respectively; $x_i$ and $x_j$ are the coded variables; and $\varepsilon$ is the error. The regression equation was optimized by an iterative method to achieve the optimum values.
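To make the fitting step concrete, the sketch below estimates the ten coefficients of a three-factor second-order polynomial by ordinary least squares, solving the normal equations with plain Gaussian elimination. It is an illustration only: the study used Design Expert, the helper names (`quadratic_terms`, `solve`, `fit_quadratic`) are hypothetical, the axial distance of 8^(1/4) is an assumed rotatable value, and the response values are made up and noise-free so that the fit can be checked against known coefficients.

```python
from itertools import combinations, product


def quadratic_terms(x):
    """Expand coded factors into [1, x_i, x_i^2, x_i*x_j] model terms."""
    linear = list(x)
    squares = [v * v for v in x]
    cross = [x[i] * x[j] for i, j in combinations(range(len(x)), 2)]
    return [1.0] + linear + squares + cross


def solve(a, b):
    """Solve a*z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (m[r][n] - tail) / m[r][r]
    return z


def fit_quadratic(points, responses):
    """Least-squares coefficients from the normal equations (X^T X) b = X^T y."""
    rows = [quadratic_terms(p) for p in points]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, responses)) for i in range(k)]
    return solve(xtx, xty)


# Illustrative data only: a 19-run CCD in coded units and a made-up response.
alpha = 8 ** 0.25  # assumed rotatable axial distance
points = list(product((-1.0, 1.0), repeat=3))
for i in range(3):
    for s in (-alpha, alpha):
        points.append(tuple(s if k == i else 0.0 for k in range(3)))
points += [(0.0, 0.0, 0.0)] * 5

# Hypothetical coefficients, ordered b0, b1..b3, b11..b33, b12, b13, b23.
true_beta = [150.0, 5.0, -3.0, 2.0, -20.0, -15.0, -25.0, 4.0, -6.0, 1.0]
responses = [sum(b * t for b, t in zip(true_beta, quadratic_terms(p))) for p in points]

beta = fit_quadratic(points, responses)
print([round(b, 3) for b in beta])
```

With real, noisy responses the recovered coefficients would of course differ from any "true" values, and significance testing of each term would still be needed before interpreting them.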

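Artificial Neural Network Modeling.
NeuralPower (version 2.5, CPC-X Software, USA), an ANN package for nonlinear regression forecasting, was chosen to conduct pattern recognition on the same dataset subjected to RSM analysis. Data were divided into two sets: a training set (15 data points) and a testing set (4 data points), the latter randomly picked from Table 2 (bold numbers). Every network possessed three input variables and one output response, and each underwent training for computation of the network parameters. Network performance was simultaneously checked against the testing set during training to avoid the network becoming "over trained", thereby improving the prediction (i.e., generalization) for any data excluded from the training set [21]. During supervised training, the designed networks were trained to the point of exhibiting a root mean square error (RMSE), as shown by (2), as close as possible to 0.01, whereas the networks' correlation coefficient ($R$) and determination coefficient (DC), as defined by (3) and (4), respectively, were closest or equal to 1:

$$\text{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( \hat{y}_i - y_{\text{obs},i} \right)^2}, \quad (2)$$

$$R = \frac{\sum_{i=1}^{n} \left( \hat{y}_i - \bar{\hat{y}} \right) \left( y_{\text{obs},i} - \bar{y}_{\text{obs}} \right)}{\sqrt{\sum_{i=1}^{n} \left( \hat{y}_i - \bar{\hat{y}} \right)^2 \sum_{i=1}^{n} \left( y_{\text{obs},i} - \bar{y}_{\text{obs}} \right)^2}}, \quad (3)$$

$$\text{DC} = 1 - \frac{\sum_{i=1}^{n} \left( \hat{y}_i - y_{\text{obs},i} \right)^2}{\sum_{i=1}^{n} \left( y_{\text{obs},i} - \bar{y}_{\text{obs}} \right)^2}, \quad (4)$$

where $n$ is the number of data points, $y_{\text{obs}}$ is the observed value, $\hat{y}$ is the predicted value obtained from the ANN model, $\bar{y}_{\text{obs}}$ is the average of the actual values, and $\bar{\hat{y}}$ is the average of the predicted values.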
A full feed-forward network structure was selected for modeling of lipid productivity. In this case, the network comprises three input neurons, one output (response) neuron, and a single hidden layer, which is highly recommended for most practical feed-forward network designs [18]. Networks were consecutively trained via different learning algorithms (the "standard" back propagation package; genetic algorithm, GA; and Levenberg-Marquardt, LM). Adjustment of network parameters encompassed the number of neurons in the hidden layer and the types of transfer functions for both the hidden and output layers. A trial-and-error approach was sufficient for choosing the optimal number of neurons that translates to the best network topology [22]. In NeuralPower, the numbers were tested from 5 to 30, with an increment of one neuron at a time. The common transfer functions used for nonlinear regression are sigmoid, hyperbolic tangent, and Gaussian [23]. Linearity of the network was also tested using linear, bipolar linear, and threshold linear types of transfer functions. The search for the optimal network topology proceeded by iteratively developing several networks. Each would be trained to meet the acceptable residual error terms as stipulated by (2) to (4). Other parameters such as the learning rate and momentum coefficient were kept at the default values of the software.

Verification of Predicted Data.
The estimation capabilities of both RSM and ANN models were evaluated by comparing the responses computed from both methods to the observed data. The calculated coefficients of determination, R² or DC (4), were used for the purpose of comparison, first to determine the most accurate ANN model amongst the various generated topologies and then to compare that best model against the RSM results. R² represents the proportion of the total sample variability explained by the given regression. Nonetheless, it is not a sole measurement of model accuracy. The use of RMSE (2) or an absolute relative error test is more appropriate to describe the deviations. Apart from R², RMSE and mean absolute error (MAE), as defined by (5), were chosen as ancillary statistical indicators to measure the model performance:

$$\text{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| \hat{y}_i - y_{\text{obs},i} \right|, \quad (5)$$

where $n$ is the number of data points and $y_{\text{obs},i}$ and $\hat{y}_i$ are the observed and predicted values, respectively. A model is considered accurate when R² is closest to 1.0, while RMSE and MAE between predicted and observed data must be as small as possible. Acceptable values of R², RMSE, and MAE mean that the model equation is able to describe the true behaviour of the system, and it can be applied for interpolation in the experimental domain [16].
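The error measures used for model comparison can be sketched in a few lines. The functions below assume the standard textbook definitions of RMSE, MAE, the determination coefficient, and the Pearson correlation coefficient; whether NeuralPower and Design Expert compute them in exactly this form is an assumption. The function names and the four data points are hypothetical and are not taken from Table 5.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(observed, predicted)) / n)


def mae(observed, predicted):
    """Mean absolute error between observed and predicted values."""
    return sum(abs(p - o) for o, p in zip(observed, predicted)) / len(observed)


def determination_coefficient(observed, predicted):
    """1 - (residual sum of squares) / (total sum of squares of the observations)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def correlation_coefficient(observed, predicted):
    """Pearson correlation between observed and predicted values."""
    mean_obs = sum(observed) / len(observed)
    mean_pred = sum(predicted) / len(predicted)
    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(observed, predicted))
    ss_obs = sum((o - mean_obs) ** 2 for o in observed)
    ss_pred = sum((p - mean_pred) ** 2 for p in predicted)
    return cov / math.sqrt(ss_obs * ss_pred)


# Made-up lipid productivities (mg/L per day) for illustration only.
observed = [100.0, 120.0, 140.0, 160.0]
predicted = [102.0, 118.0, 143.0, 157.0]

print(rmse(observed, predicted))
print(mae(observed, predicted))
print(determination_coefficient(observed, predicted))
print(correlation_coefficient(observed, predicted))
```

Note that the correlation coefficient can stay close to 1 even when predictions are systematically offset, which is one reason RMSE and MAE are reported alongside it.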

Analytical Methods.
Tetraselmis biomass concentration (g/L) was determined gravimetrically after the cells were lyophilized overnight in preweighed sample vials. Culture samples of known volume were washed beforehand with 0.5 M ammonium formate to remove excess sea salts, followed by at least two washes with distilled water. The resuspended cell pellets, following recentrifugation at 2465 rcf for 5 min (5810 R, Eppendorf, Germany), were later subjected to freeze-drying. The procedure consists of prefreezing (−30°C) at 1 bar for 4 h, sample preparation for 15 min at 1 bar, main drying (30°C) for 20 h at 0.001 mbar, and final drying at 0.0001 mbar for 15 min (Epsilon 1-8D, Martin Christ, Germany).
Lipid was extracted from 300 to 400 mg of dried cells according to a method modified from Folch et al. [24]. Cells were first pulverized to fine granules with a pestle and mortar, followed by the addition of 4 mL methanol containing 500 ppm butylated hydroxytoluene (BHT), 2 mL chloroform, and 0.4 mL water. The mixture was homogenized and disrupted in an ultrasonic bath (Thermo-10D, Thermoline, Australia) for 15 min. Additional chloroform (2 mL) was added, and the mixture was left standing for one day. Water (2 mL) was later added, and the mixture was vortexed for 60 s. After centrifugation and siphoning off the upper phase, the lower chloroform phase containing lipid was collected in a pre-weighed vial. The organic solvent was heated to 62°C and purged with a passing nitrogen stream. The total lipid was also determined gravimetrically.
Qualitative inspection of intracellular lipid was also conducted using fluorescence microscopy. Nile red (9-(diethylamino)-5H-benzo[a]phenoxazin-5-one), a photostable, selective fluorescent dye, was used for in situ staining of neutral lipid. 30 μL of algal cell suspension sampled at the end of cultivation was mixed with 10 μL of 0.1 mg/mL Nile red solution (Sigma) (dissolved in acetone). 1960 μL of a freshly prepared 25% (v/v) dimethylsulfoxide (DMSO) solvent was then added as stain carrier, since the thick, rigid cell walls of Tetraselmis sp. could inhibit the permeation of the fluorescent dye [25]. The mixture was vortexed for about 60 s and incubated in the dark at 40°C for 10 min. Microalga cells were photographed using a light microscope (Leica DMLB, Wetzlar GmbH, Germany) with an eye-piece digital camera (Dino-Eye AM4023X, ANMO Electronics, Taiwan). Epifluorescent images of Nile red stained cells were captured using a D filter cube (broad-range UV + violet excitation) or an N2.1 filter cube (green excitation) at 1000x magnification with oil immersion (Leica Microsystems).

RSM Modeling.
Table 2 displays the CCD design matrix of the medium constituents chosen, together with the actual responses, that is, biomass concentration, lipid yield, and productivity. Sequential comparison of the sums of squares of all the potential RSM models by Design Expert software demonstrated that the quadratic type is the highest order polynomial regression suitable to explain the relationship between the input variables and responses. The corresponding uncoded second-order polynomial response equations derived accordingly for the algal biomass (6), lipid yield (7), and lipid productivity (8)
The goodness of fit of each equation is denoted by the adjusted R² (R² adj). Model assessment based on R² adj in place of R² was more accurate, given that the presence of extraneous factorial terms in a derived model equation will result in some reduction in the error sum of squares. R² adj compensates for the added explanatory variables, since the R² value naturally increases with the addition of new variable terms. The R² adj values in this case are 0.868, 0.914, and 0.970 for (6) to (8), respectively, indicating good agreement between the observed and predicted values for all the output responses.
Statistical testing for the significance of the proposed models is presented by the analysis of variance (ANOVA) in Table 3. According to the results, the individual R² values of 0.934, 0.957, and 0.985 show that the three derived models could explain more than 93% of the variability. The F-test values of 14.20 for biomass concentration, 22.14 for lipid yield, and 64.82 for lipid productivity, plus the model probability values (Prob > F) of less than 0.05, indicate that each of these models was significant. Besides, the relative variability of the experimental results was confirmed to be acceptable based on the individual coefficients of variation (CV) for biomass (10.30%), lipid yield (12.72%), and productivity (10.51%). Another cue for the goodness of fit is represented by the models' lack of fit (LOF) terms, which were proven to be insignificant, whereby the corresponding Prob > F values were determined at 0.1907, 0.5391, and 0.4416, respectively. The optimal conditions and interactions between the medium constituents are shown in the three-dimensional response surface plots (Figure 1). The biomass concentration varied from 4.95 to 13.05 g/L. On the other hand, the intracellular lipid yield ranged from 58.87 to 204.30 mg lipid/g dcw, while its related productivity varied from 31.72 to 177.20 mg/L per day.
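The relationship between the reported R² and adjusted R² values can be checked with the usual adjustment formula, R² adj = 1 − (1 − R²)(n − 1)/(n − p − 1). The sketch below assumes n = 19 CCD runs and p = 9 model terms (three linear, three squared, and three interaction terms, i.e., the full quadratic with no terms removed); the function name `adjusted_r2` is hypothetical.

```python
def adjusted_r2(r2, n_runs, n_terms):
    """Adjusted R^2 for a model with n_terms regressors (intercept excluded)."""
    return 1.0 - (1.0 - r2) * (n_runs - 1) / (n_runs - n_terms - 1)


# R^2 values as reported in the ANOVA discussion.
reported_r2 = {"biomass": 0.934, "lipid yield": 0.957, "lipid productivity": 0.985}
for name, r2 in reported_r2.items():
    print(name, round(adjusted_r2(r2, n_runs=19, n_terms=9), 3))
```

Under these assumptions the formula gives 0.868, 0.914, and 0.970, in agreement with the adjusted values quoted for the three models, which suggests that all nine terms were retained in each regression.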
As per the ANOVA, all three independent input variables directly contributed to the first-order effect on the cell growth model. However, the quadratic effect of NaNO3 (x₃²) was more prominent (P < 0.0001) compared to the other inputs. By maintaining the NaNO3 concentration at its center point, the cell density was observed to increase in an almost linear fashion with the increase in both glucose and yeast extract (Figure 1(a)). From the examination of contour plots, the highest biomass concentration was obtained with glucose ranging from 26.0 to 30 g/L and yeast extract from 0.83 to 1.80 g/L, provided that the NaNO3 concentration was kept below 6.0 g/L (Figures 1(b) and 1(c)).
Compared to the results for biomass concentration, the responses associated with lipid production were more tightly bounded by the range of the input variables selected. Glucose at about 25 g/L was required to attain the maximum lipid yield and productivity at given yeast extract (Figures 1(d) and 1(g)) and NaNO3 (Figures 1(e) and 1(h)) concentrations. Here, keeping the glucose at the center point would correspond to an optimal range of NaNO3 at 4.5 to 5.5 g/L, while yeast extract would be confined to a narrower range of 1.0 to 1.48 g/L (Figures 1(f) and 1(i)). Response surfaces with regard to lipid productivity depict an excellent circular contour, suggesting that the interaction between the input variables plays very little role in predicting the response [1]. Moreover, the quadratic terms were recognized to impart more influence on the regression modeling. Nonetheless, contour plots exhibiting a defined elliptical shape would otherwise indicate perfect interactions between the medium constituents used in formulation [14]. Such topology was visually evident in Figures 1(d) and 1(e). Cross-referencing to the ANOVA table confirms that the interaction between glucose and sodium nitrate (x₁x₃) was statistically significant for directly promoting lipid yield in the Tetraselmis cell (Prob > F = 0.0391).

ANN Modeling.
Overall lipid productivity was given more emphasis in algal cultivation. Biomass concentration, on the other hand, usually affects the downstream costs [11]. Thus, maximizing the main response of interest became the focal point of the ANN optimization exercise. In the network training/testing process, a total of 330 neural network architectures were tested for the prediction of Pr lipid, each having a different configuration of hidden neurons, learning algorithm, and transfer functions of the output and hidden layers. Nonetheless, it was necessary to ultimately choose only one of them, the one that provides the best compromise between bias and variance and also generalizes well. Table 4 summarizes the top five ANN models. Network training entails selecting a particular model that minimizes the error or cost criterion. Judging from Table 4, models with the least residual error were trained using either the Levenberg-Marquardt (LM) or the genetic algorithm (GA). LM is often regarded as the most efficient in terms of speed and accuracy in finding the optimal point compared to others [22]. Networks designed throughout this study were considered suitable to be trained by LM because they abide by the algorithm's restrictions. Namely, LM is only effective for a small network (containing a few hundred weights), as its memory requirements are proportional to the square of the number of weights in the network, and the algorithm can only be used for networks with a single output response. Additionally, LM is specifically used to minimize the sum of squares error and cannot be applied to other types of network errors. GA, on the other hand, is a stochastic method mostly associated with the simulation of biological heredity and evolutionary processes. Each possible solution to a set of problems is taken as an "individual" among a population, and each individual is coded as a character string.
GA applies its unique selection, crossing, and mutagenesis operators on a random population in order to compute a new one, eventually introducing some diversity to the algorithm [17]. An interesting trait of GA is that the algorithm is able to avoid a one-point optimal search usually associated with gradient descent or LM back propagation. Instead, GA is capable of global optimum exploration of the design space [23].
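The selection, crossing, and mutation operators described above can be sketched as a small search loop. This is a generic illustration, not the GA implemented in NeuralPower: individuals are encoded as lists of three real numbers rather than character strings, the objective `toy_response` is a made-up dome-shaped surface in coded units, and all names and settings (`genetic_search`, population size, mutation width, search bounds) are hypothetical.

```python
import random


def toy_response(x):
    """Hypothetical dome-shaped surface with its peak at (0.2, -0.1, 0.3)."""
    peak = (0.2, -0.1, 0.3)
    return 170.0 - sum(20.0 * (xi - pi) ** 2 for xi, pi in zip(x, peak))


def genetic_search(fitness, bounds, pop_size=20, generations=60, seed=0):
    """Maximize fitness over three real-valued genes within the given bounds."""
    rng = random.Random(seed)
    lo, hi = bounds
    population = [[rng.uniform(lo, hi) for _ in range(3)] for _ in range(pop_size)]
    history = []  # best fitness at the start of each generation
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        parents = population[: pop_size // 2]  # selection: keep the fitter half
        children = [population[0][:]]          # elitism: best individual survives
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randint(1, 2)            # crossover: one-point splice
            child = a[:cut] + b[cut:]
            gene = rng.randrange(3)            # mutation: perturb one gene
            child[gene] = min(hi, max(lo, child[gene] + rng.gauss(0.0, 0.1)))
            children.append(child)
        population = children
    best = max(population, key=fitness)
    return best, fitness(best), history


best, best_fit, history = genetic_search(toy_response, (-1.682, 1.682))
print([round(g, 2) for g in best], round(best_fit, 2))
```

Because the best individual is always copied into the next generation, the best fitness never decreases, while crossover and mutation keep sampling other regions of the design space instead of following a single gradient path.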

The choice of transfer function also directly affects the ANN's learning rate and is deemed instrumental to its performance. In this study, most of the statistically accepted models were produced with a linear function for the output layer. Linear was frequently chosen for the output layer for simulating functions without discontinuities. Gaussian, hyperbolic tangent, and sigmoid were all found to be suitable for the hidden layer. As is evident from the tabulated results, the network using linear and sigmoid functions for the output and hidden layers, respectively, produced the lowest RMSE (6.517) and a very high R (0.999) and DC (0.986). It has become a rule of thumb to choose sigmoid as the activation function for an excellent nonlinear model, but at the expense of slower learning [26]. Its hyperbolic tangent (Tanh) counterpart has the same response shape as that of sigmoid; thus, their computational costs are not significantly different, and both functions can create a very smooth model. However, it was noted that the convergence of the error functions was faster when the Tanh function was employed for the hidden layer. The calculated RMSE was slightly higher at 8.074, with comparable R (0.995) and DC (0.978). Unlike the sigmoid or Tanh function, which acts as a gate (open or closed) for a neuron's output response when given a set of inputs, Gaussian behaves like a probabilistic output controller, producing output that can be described as a type of partial response. This transfer function tends to map patterns quicker than sigmoid; nevertheless, its prediction can be prone to memorization. The optimal number of neurons is an idiosyncrasy of the system in question. Increasing the number of neurons would rationally improve the learning performance, as too few would consequently lead to erratic learning or nonconvergence, as observed in networks trained with either GA or LM.
A network with too many neurons, however, may allow too much freedom for the weights to adjust and, hence, invariably learn the noise present in the training dataset [27]. To evaluate the fidelity of the ANN architecture, parity plots of a testing set with 10 altogether different data points (Figure 2) were constructed for the top two networks in Table 4. The R² and MAE of the plots were then determined. Based on the fitting criteria, the LM-trained network of 3-10-1 architecture with the Tanh function for the hidden layer (Figure 3) was better at predicting the lipid productivity (R² of 0.953 and MAE of 6.048). Hence, the designed network could properly correlate the input and response. In most cases, good generalization could be obtained with ANN incorporating between 4 and 15 neurons [21]. Figure 4 depicts the response surface topologies describing the interaction effect of the three medium constituents on lipid productivity as predicted by the optimal network. Every plot has a dome-shaped surface much like those in Figures 1(g) to 1(i). Nevertheless, these plots project a distinctive undulating curvature, representing a graphical refinement in terms of nonlinearity in the output response compared with those generated by RSM. Figure 4(a) shows that when NaNO3 is fixed at the middle level (5.0 g/L), lipid productivity increases when yeast extract and glucose are ramped up to a certain level before decreasing thereafter with further addition of these components. A similar trend persists in the interaction between NaNO3 and glucose (Figure 4(b)) and in the interaction between yeast extract and NaNO3 (Figure 4(c)).
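For orientation, the forward pass of a 3-10-1 feed-forward network with a Tanh hidden layer and a linear output neuron can be written out directly. The sketch below is not the trained NeuralPower model: the weights are random placeholders, the inputs are assumed to be already scaled (for example, to coded units), and the names `init_network`, `predict`, and `count_parameters` are hypothetical.

```python
import math
import random


def init_network(n_in=3, n_hidden=10, seed=0):
    """Random placeholder weights; a trained model would supply fitted values."""
    rng = random.Random(seed)
    return {
        "w_hidden": [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)],
        "b_hidden": [rng.uniform(-1, 1) for _ in range(n_hidden)],
        "w_out": [rng.uniform(-1, 1) for _ in range(n_hidden)],
        "b_out": rng.uniform(-1, 1),
    }


def predict(net, x):
    """Tanh hidden layer followed by a single linear output neuron."""
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(net["w_hidden"], net["b_hidden"])
    ]
    return sum(w * h for w, h in zip(net["w_out"], hidden)) + net["b_out"]


def count_parameters(net):
    """Total number of adjustable weights and biases in the network."""
    return (
        sum(len(row) for row in net["w_hidden"])
        + len(net["b_hidden"])
        + len(net["w_out"])
        + 1
    )


net = init_network()
print(count_parameters(net))          # 3*10 weights + 10 biases + 10 weights + 1 bias
print(predict(net, [0.0, 0.0, 0.0]))  # output at the (assumed) coded center point
```

A network of this size has only 51 adjustable parameters, comfortably within the "few hundred weights" regime in which Levenberg-Marquardt training is considered practical, although it still exceeds the number of CCD runs, which is why monitoring a separate testing set matters.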

Comparison of Predictive Capacity between RSM and ANN Models.
The predicted responses computed via RSM and ANN are presented in Table 5. Evaluation based on the models' coefficients of determination shows a satisfactory agreement between the predicted and actual lipid productivity values. Thus, both models can be considered to perform well in data fitting and offered stable responses. Yet, the R² of ANN is closer to 1.0, indicating a higher predictive ability and accuracy as compared to RSM. Furthermore, RSM produced about 36.95% deviation in RMSE and about twice the difference in MAE than the error functions calculated

Optimization Employing Best Predicted Points of RSM and ANN Models.
In RSM, the finalized medium composition was searched using the Design Expert optimization module with the goal of achieving the maximum lipid productivity. A single simulated solution was proposed, comprising (g/L): glucose, 28.11; NaNO3, 5.09; and yeast extract, 1.20. The highest lipid productivity was estimated at 169.20 mg/L per day with a desirability of 0.862. Alternatively, ANN calculates the optimum composition by way of "Rotation Inherit Optimization" (RIO), an evolutionary algorithm much in line with the genetic algorithm (GA) or particle swarm algorithm (PSA), albeit with faster convergence, and it dispenses with customized parameters set by the experimenter, with the sole exception of population size. RIO was utilized to improve the best point searches of the studied system. Population size was set to 10. After 18000 iterations, the maximum lipid productivity was predicted at 174.84 mg/L per day. The resulting theoretical composition consisted of (g/L): glucose, 24.05; NaNO3, 4.70; and yeast extract, 0.93. Validation runs were carried out, and the results are compiled in Table 6. Formulations using RSM (160.17 mg/L per day) and ANN (173.11 mg/L per day) saw increases of 1.76-fold and 1.90-fold in lipid productivity, respectively, compared to the previous nonstatistically optimized run, in which W-30 medium was supplemented with 30 g/L glucose. The final results were observed to be insignificantly different for the media formulated using the two statistical approaches. Nevertheless, it may still come as a surprise to see that a culture grown in medium with lower concentrations of glucose, NaNO3, and yeast extract could provide higher biomass and lipid productivity. This could be explained from the standpoint that, unlike any synthetic medium prepared with distilled water, the workable concentration range of medium constituents adopted in this study was actually limited by the inherent physicochemical properties of full-strength (undiluted) seawater itself.
The major obstacle would be to fully dissolve the nutrient components while simultaneously maintaining a circumneutral pH tolerable for Tetraselmis survival. It was found that too high a concentration of yeast extract tends to lower the pH, and an attempt at readjusting with alkaline buffer (0.5 M NaOH) may tamper with the existing solute equilibrium and promote the constituents to form complexes with seawater, resulting in precipitation. Elevated alkalinity increases the supersaturation level of calcium ions in seawater and, hence, leads to the formation of amorphous calcium carbonate. This phenomenon would be enhanced either by a lack of magnesium ions in seawater or by an increase in iron (Fe3+), initiating the precipitation of calcite or potentially forming colloidal iron hydroxide [28]. Fe3+ is an abundant and naturally occurring element in yeast extract (∼2% w/w). In addition, a very high concentration of organic complex nutrient would obviously darken

The Importance of Medium Components.
Figure 5 shows the degrees of importance (expressed in terms of percentage of contribution) of the three medium constituents in influencing lipid productivity, as determined by NeuralPower. NaNO3 is the most important factor at 37.10%, followed by glucose at 34.06% and yeast extract at 28.84%. In general, nitrate is a major N source that strongly impacts the metabolism and growth of plant systems. To assimilate nitrate, microalgal cells need to transport the ion across the membrane and subsequently reduce it to ammonia. The process is said to consume large amounts of energy, carbon, and protons [9]. Glucose is nonetheless a good choice as both carbon and energy source for microalgae, since it can be easily stored as starch without prior conversion to glyceraldehyde phosphate (GAP) in the Calvin cycle. Plus, some part of it is readily oxidized through the glycolytic pathway [3]. The cheap and easily available yeast extract, on the other hand, was specifically chosen as it represents an economically sound and sustainable alternative source of amino acids and vitamins. Nile red staining (Figure 6) revealed the presence of substantial lipid globules inside Tetraselmis cells fed with glucose, in contrast with microalga grown under strict photoautotrophy after two-week cultivation. Introduction of an organic carbon source and medium optimization had therefore yielded cultures with oil content consistent with the upper range reported for Tetraselmis species [2], in conjunction with rapid growth and higher biomass concentration. In the microalgal cell, starch and lipid biosynthesis are two competing pathways for the reduced carbon storage sink, whereby starch usually dominates over lipid under normal conditions. A recent metabolic study has suggested that a high production rate of triacylglyceride (neutral lipid) would take place whenever the carbon supply exceeds the cell's capacity for starch synthesis in an algal system. Using Chlamydomonas reinhardtii as a model microalga, Fan et al. [29] reported that feeding acetate at several-fold the concentration of the standard growth medium, a strategy termed "mega-dosing," would max out the cellular starch production capacity to the point that any additional carbon would be channeled into high-gear oil production. However, it should be noted that the substrate should not exceed the growth inhibitory level of the particular algal species. In essence, carbon precursor availability, as well as the notion of N starvation, is now accepted as a key metabolic factor controlling the partitioning of carbon into lipid in mixotrophic cultivation.
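As a cross-check on the validation figures reported for the ANN-formulated medium, the volumetric lipid productivity can be recomputed from the final biomass concentration and lipid yield. The sketch assumes that productivity is simply X max multiplied by the lipid yield and divided by a 14-day cultivation period; this definition and the function name `lipid_productivity` are assumptions, although the result is consistent with the reported value.

```python
def lipid_productivity(biomass_g_per_l, lipid_mg_per_g, days):
    """Volumetric lipid productivity in mg/L per day."""
    return biomass_g_per_l * lipid_mg_per_g / days


# Reported ANN validation values: 12.38 g/L biomass, 195.77 mg lipid/g dcw, two weeks.
ann_validated = lipid_productivity(12.38, 195.77, 14)
print(round(ann_validated, 2))

# Baseline productivities implied by the reported 1.90-fold (ANN) and 1.76-fold (RSM) increases.
baseline_from_ann = 173.11 / 1.90
baseline_from_rsm = 160.17 / 1.76
print(round(baseline_from_ann, 1), round(baseline_from_rsm, 1))
```

The recomputed productivity lands within rounding of the reported 173.11 mg/L per day, and both fold increases point back to a common baseline of roughly 91 mg/L per day for the nonstatistically optimized run.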

Conclusions
This study has shown that statistical techniques such as RSM and ANN could predict the Tetraselmis' biomass and intracellular lipid productivity. Though ANN may appear to be superior in terms of accuracy over RSM, it is opined here that both methodologies complemented each other in interpreting the results, whether in pointing out synergistic interactions among the input variables via ANOVA, or in classifying the importance of each component. ANN is unrestricted to the order of the model, and therefore, the approach is more dynamic in simulating the true behavior of nonlinear dataset. However, the typical downside of ANN requiring large amounts of training data for pattern recognition was circumvented through CCD initially devised for RSM. CCD is known to be an efficient design-of-experiment method with a hypercube geometry region, which is the best for minimizing the number of runs while upholding statistical significance. Addition of yeast extract into W-30 medium composition significantly enhanced the algal growth to as much as 12.38 g/L, but did not increase the proportion of lipid bodies (195.77 mg/g dcw) higher than the maximum reported in the literature.