Pharmaceutical Applications of Chemometric Techniques

Chemometrics involves the application of statistical methods to draw vital information from manufacturing-related processes. Multiway chemometric models such as parallel factor analysis (PARAFAC), Tucker-3, and N-partial least squares (N-PLS), and bilinear models such as principal component regression (PCR) and partial least squares (PLS), are discussed in this paper. Chemometric approaches can be used to analyze data obtained from various instruments, including near infrared (NIR), attenuated total reflectance Fourier transform infrared (ATR-FTIR), high-performance liquid chromatography (HPLC), and terahertz pulse spectroscopy. These techniques have been used in the quality assurance and quality control of pharmaceutical solid dosage forms. The application of chemometric methods to the evaluation of the properties of pharmaceutical powders and to tablet parametric tests is also discussed in the review. Chemometrics has been suggested as a useful method for real-time in-process testing and is a valuable process analytical tool.


Introduction
Chemometrics is a branch of science that applies mathematical and statistical methods to extract useful information from the physical and chemical phenomena involved in a manufacturing process. Chemometrics is used for multivariate data collection and analysis protocols, calibration, process modelling, pattern recognition and classification, signal correction and compression, and statistical process control. Both predictive and descriptive problems in the life sciences can be addressed by chemometrics. In predictive problems, numerous system properties are combined in an elaborate model with the intent of predicting target properties, desired features, or behaviour of interest. In descriptive problems, properties of the investigated systems are modelled in order to learn the underlying relationships and the system structure, which leads to model identification, composition, and understanding. The latest automated laboratory instruments in biological and medical research generate a vast volume of measurement data that is difficult to absorb and interpret. Chemometrics helps with the challenging task of digesting these data and revealing the useful information. Some applications of chemometrics in pharmacy and medical sciences are depicted in Figure 1.
Chemometrics and its methods are versatile and operate at a high level of abstraction, since they characterise scientific disciplines through the extensive application of statistical and mathematical methods, mainly multivariate ones. There are various algorithms and analogous ways of processing and evaluating data, and they can be applied in various fields, namely, medicine, pharmacy, food control, and environmental monitoring [1,2]. Different chemometric models for the analysis of data are shown in Figure 2.
1.1. Bilinear Models. In bilinear models, the data are arranged in matrices so that each column holds a variable and each row holds a sample [1]. Bilinear chemometric techniques are further categorized as principal component analysis (PCA) and partial least squares (PLS). PCA is a simple, nonparametric method for extracting relevant information from datasets, identifying patterns in data, and expressing the data in a way that highlights their similarities and differences. PCA is applied for dimensionality reduction, multivariate data compression, and data exploration within different fields of science. It is one of the most widely used multivariate methods because of its broad applicability to multivariate problems. During the monitoring of a process, PCA can be used to find the correlation structure of the variables and to examine changes in variable correlations, and hence it is used to reduce the number of variables in a process. If a number of variables are measured for a series of sites, objects, or persons, then each variable will have a variance, and usually the variables will be associated with each other; that is, there will be covariance between pairs of variables. In PCA, the data are transformed to a new set of axes that together describe the same total amount of variability. The first axis captures the maximum possible variance; the second axis captures the maximum of the remaining variance while being uncorrelated with the first; and the third axis captures the maximum of the variance remaining after the first two axes, again without correlating with either of them. The new axes, or dimensions, are uncorrelated with each other and are weighted according to the amount of the total variance that they describe [3,4]. A special case of PCA is mentioned in Table 1.
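The axis construction described above can be illustrated with a minimal sketch. It assumes NumPy and uses the singular value decomposition of the mean-centered data matrix, which is one common way to compute PCA; the function name `pca` and the simulated data are hypothetical and are not taken from the cited works.

```python
import numpy as np

def pca(X, n_components):
    """PCA of a samples-by-variables matrix via SVD of the mean-centered data."""
    Xc = X - X.mean(axis=0)                          # center each variable (column)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]  # sample coordinates on the new axes
    loadings = Vt[:n_components].T                   # variable weights defining each axis
    explained = s**2 / np.sum(s**2)                  # fraction of total variance per axis
    return scores, loadings, explained[:n_components]

# Hypothetical data: 50 samples, 6 correlated variables driven by 2 hidden factors
rng = np.random.default_rng(0)
latent = rng.normal(size=(50, 2))
X = latent @ rng.normal(size=(2, 6)) + 0.05 * rng.normal(size=(50, 6))

scores, loadings, explained = pca(X, n_components=3)
print(explained)  # variance fractions, in decreasing order
```

In this simulated case the first two axes should carry nearly all of the variance, which is the sense in which PCA reduces six measured variables to two.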

Partial Least Squares (PLS).
Partial least squares (PLS) is a widely accepted class of methods for modelling the relations between sets of observed variables by means of latent variables. Projection of the observed data onto its latent structure by means of PLS was developed by Wold and coworkers [5–7]. The basic assumption of the PLS method is that the relations between sets of observed variables can be modelled by a small number of latent variables (not directly observed or measured); the method combines regression, dimension reduction techniques, and modelling tools. Generally, these latent vectors maximise the covariance between the different sets of variables. PLS is similar to canonical correlation analysis (CCA) and can be applied as a discrimination tool and as a dimension reduction method similar to principal component analysis (PCA) [8,9]. It can also be related to other regression methods such as principal component regression (PCR), ridge regression (RR), and multiple linear regression (MLR); all these methods can be cast under a unifying approach called continuum regression (CR) [10–13]. The PLS method has gained great acceptance in chemometrics, as a wide spectrum of chemical data problems are processed by this algorithm [14–18]. Figure 3 lists the applications of PLS in various fields. However, in the presence of substantial nonlinearity, PLS tends to give large prediction errors, which makes nonlinear calibration techniques such as nonlinear partial least squares (NPLS), locally weighted regression (LWR), alternating conditional expectations (ACE), and artificial neural networks (ANN) preferable in such cases.

Multiway Models.
Multiway models are used when the multivariate data are arranged in arrays of more than two dimensions. Multiway modelling is preferred for such data because bilinear treatment methods do not provide sufficient results. These models are widely applicable for extracting chemical information from spectra because they can determine the compound composition of a mixture, which is often a demanding task owing to overlapping signals and other problems typically present in spectral data; they also enhance chemical understanding and allow the relative concentrations of compounds in a sample to be evaluated. Besides this, multiway models have found application in process control and in regression analyses. Methods such as multiway principal component analysis (MPCA) and multiway partial least squares (MPLS) are recognised as tools for monitoring batch data, as they improve process understanding and summarise process behaviour in a batch-wise manner, which is quite advantageous. However, if the original data have higher dimensions, the computed models become difficult to interpret, and therefore multiway methods that work directly with three-way or higher arrays, such as parallel factor analysis (PARAFAC and PARAFAC-2), Tucker-3, and N-partial least squares (N-PLS), are the methods of choice [24–28]. Usually these methods build factor models that preserve the common variation of the original data in every dimension. It should be noted that the use of multiway models in problem solving has increased in recent years, most probably because of greater awareness of the potential advantages of these methods [1].

Parallel Factor Analysis (PARAFAC).
Parallel factor analysis (PARAFAC) is a decomposition method for modelling three-way or higher data, mainly intended for data having congruent variable profiles within each batch. A brief history helps in understanding PARAFAC. Cattell (1944) reviewed seven principles for the choice of rotation in component analysis and advocated the principle of "parallel proportional profiles" as the most fundamental one. Specifically, this principle means that two data matrices with the same variables should contain the same components. Using this principle as a constraint, Harshman (1970) proposed a new method to analyze two or more data matrices that contain scores for the same persons on the same variables and termed the method PARAFAC [26,28–30].
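As an illustration only, the trilinear PARAFAC model can be written as x_ijk ≈ Σ_f a_if b_jf c_kf and fitted by alternating least squares (ALS). The sketch below assumes NumPy, a three-way array without missing values, and a random start; the names `khatri_rao` and `parafac_als` are hypothetical, and this is not presented as the algorithm of the cited references.

```python
import numpy as np

def khatri_rao(B, C):
    """Column-wise Kronecker product: (J x F) and (K x F) give (J*K x F)."""
    return np.einsum('jf,kf->jkf', B, C).reshape(-1, B.shape[1])

def parafac_als(X, n_factors, n_iter=500, seed=0):
    """Fit x_ijk ~ sum_f a_if * b_jf * c_kf by alternating least squares."""
    I, J, K = X.shape
    rng = np.random.default_rng(seed)
    A = rng.normal(size=(I, n_factors))
    B = rng.normal(size=(J, n_factors))
    C = rng.normal(size=(K, n_factors))
    X1 = X.reshape(I, J * K)                     # rows i, columns (j, k)
    X2 = X.transpose(1, 0, 2).reshape(J, I * K)  # rows j, columns (i, k)
    X3 = X.transpose(2, 0, 1).reshape(K, I * J)  # rows k, columns (i, j)
    for _ in range(n_iter):
        # Update one loading matrix at a time with the other two held fixed
        A = X1 @ np.linalg.pinv(khatri_rao(B, C)).T
        B = X2 @ np.linalg.pinv(khatri_rao(A, C)).T
        C = X3 @ np.linalg.pinv(khatri_rao(A, B)).T
    return A, B, C

# Hypothetical noise-free array built from two known components
rng = np.random.default_rng(1)
A0, B0, C0 = rng.normal(size=(8, 2)), rng.normal(size=(6, 2)), rng.normal(size=(5, 2))
X = np.einsum('if,jf,kf->ijk', A0, B0, C0)

A, B, C = parafac_als(X, n_factors=2)
X_hat = np.einsum('if,jf,kf->ijk', A, B, C)
rel_error = np.linalg.norm(X - X_hat) / np.linalg.norm(X)
print(rel_error)
```

A fixed iteration count keeps the sketch short; a practical implementation would normally stop on a convergence criterion and try several starting points.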

Parallel Factor Analysis-2 (PARAFAC-2).
PARAFAC-2 can handle data with different temporal durations and variable profiles that are shifted or are in a different phase. PARAFAC-2 does not require trilinearity to be fulfilled in one mode, whereas in PARAFAC trilinearity is a fundamental condition. It should be noted, however, that PARAFAC itself can accommodate some nonlinearity in one mode, but only in cases where the deviations from linearity are regular. PARAFAC and PARAFAC-2 have mainly been applied to chemical data from experiments that form a three-way or higher data structure, for example, chromatographic data, fluorescence spectroscopy measurements, time-varying spectroscopic data with overlapping spectral profiles, and process data [26,29,31,32].

Tucker-3.
The Tucker-3 method can be used for the compression and exploration of N-way arrays; it consists of loading matrices in each of the n modes. The Tucker-3 model takes its name from the psychometrician Ledyard R. Tucker, who proposed the model in 1966. He also proposed a way to calculate the parameters of the model, and since then many improvements to the algorithmic solution have been suggested. The model itself has remained a strong tool for the analysis of three-way (and higher-way) data arrays.
The generality of the Tucker-3 model, and the fact that it covers the PARAFAC model as a special case, has made it an often used model for decomposition, compression, and interpretation in many applications [26,29,33].
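To make the role of the mode-wise loading matrices concrete, the sketch below compresses a three-way array with a truncated higher-order SVD, a simple non-iterative way to obtain a Tucker-3 style decomposition. It assumes NumPy; the helper names are hypothetical, the core array linking the loadings is part of the standard Tucker-3 formulation rather than something described in the text, and the approach is a stand-in for, not a reproduction of, Tucker's original estimation procedure.

```python
import numpy as np

def unfold(X, mode):
    """Matricize a three-way array so that `mode` indexes the rows."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def tucker3_hosvd(X, ranks):
    """Loading matrix per mode from the leading left singular vectors, plus a core array."""
    factors = [np.linalg.svd(unfold(X, m), full_matrices=False)[0][:, :r]
               for m, r in enumerate(ranks)]
    A, B, C = factors
    core = np.einsum('ijk,ip,jq,kr->pqr', X, A, B, C)  # project X onto the three bases
    return core, factors

def reconstruct(core, factors):
    A, B, C = factors
    return np.einsum('pqr,ip,jq,kr->ijk', core, A, B, C)

# Hypothetical 10 x 8 x 6 array with 2, 3, and 2 underlying components per mode
rng = np.random.default_rng(5)
G = rng.normal(size=(2, 3, 2))
X = np.einsum('pqr,ip,jq,kr->ijk', G,
              rng.normal(size=(10, 2)), rng.normal(size=(8, 3)), rng.normal(size=(6, 2)))

core, factors = tucker3_hosvd(X, ranks=(2, 3, 2))
rel_error = np.linalg.norm(X - reconstruct(core, factors)) / np.linalg.norm(X)
print(core.shape, rel_error)
```

When the core is restricted to a superdiagonal array with the same number of components in every mode, the model reduces to PARAFAC, which is the special-case relationship mentioned above.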
1.2.4. N-Partial Least Squares (N-PLS). To handle multiway data, an extension of the PLS method, namely N-PLS, was introduced; it uses the dependent and independent variables to find latent variables that describe maximal covariance. N-PLS decomposition starts by constructing a distinct PARAFAC-like model for the dependent response variables and maximizing the covariance between the two matrices [24,29,34].

Techniques and Pharmaceutical Application
Nowadays various spectroscopic techniques are being combined with chemometric techniques such as multivariate analysis methods, PLS, classical least squares (CLS), and PCR for the evaluation of different pharmaceutical properties of tablets, powders, and granules. The most popular technique is NIR spectroscopy.
2.1. Near Infrared Spectroscopy. Near infrared (NIR) spectroscopy is a widely accepted analytical technique used to record the spectra of solid and liquid samples. The NIR range lies between 780 and 2500 nm. NIR spectroscopy is a fast, nondestructive, noninvasive technique that requires only a small quantity of sample and, combined with chemometrics, is suitable for the analysis of solid, liquid, and biotechnological pharmaceutical forms. It can also be used for the determination of nonchemical properties (density, viscosity) of the sample [35,36]. Different fundamental vibrations can be described by using a harmonic oscillator model with different energy and space levels. In NIR spectroscopy, various radiation sources are selected, and the spectra of the unknown sample are recorded at particular wavelengths by different detectors. The spectrophotometers used in NIR spectroscopy are of two types depending on the wavelength selection, that is, discrete wavelength and whole spectrum. Discrete-wavelength instruments use filters or light sources such as LEDs to obtain narrow bands, while whole-spectrum instruments use a diffraction grating. The analytical information contained in NIR spectra can be extracted by using multivariate analysis techniques. Multivariate analysis is also used for qualitative analysis [37]. NIR spectroscopy can also be used with calibration models such as PLS and PCR.
Tomuta et al. implemented an NIR chemometric method for the assay of meloxicam in powder blends for tableting. NIR spectra of the different meloxicam powder blends were recorded and analysed by partial least squares (PLS) regression and principal component regression (PCR) [38].
Awaa et al. compared two pharmaceutical tablets of pentoxifylline (PTX) and palmitic acid by NIR spectroscopy with chemometric self-modelling curve resolution (SMCR) analysis, which provided both qualitative and quantitative information. The concentration profiles obtained by SMCR showed that PTX became well distributed in the waxy matrix of the tablet as the grinding time increased. It was concluded that grinding time, distribution, and change in crystal shape of PTX are interrelated. The study indicated that the distribution, uniformity, and change in molecular structure of the ingredients could be assessed by NIR imaging with SMCR analysis [39].
Cen et al. measured the soluble solid content (SSC) and pH of orange juice by using visible and near infrared spectroscopy (Vis/NIRS) and chemometrics. Spectra of 104 orange juice samples were recorded and preprocessed using a wavelet packet method. PLS regression was used to process the spectral data, and the SSC and pH of the orange juice were then evaluated, showing that the combination of NIRS and chemometrics improves the evaluation of such data [40].
Near infrared chemical imaging is a reliable and robust technique for the development of pharmaceutical products and for assuring the quality of the final product. One of the main advantages of this technique is its capability of recording a great amount of spectral information in a short time. Classical least squares and multivariate curve resolution models can be used to analyse the qualitative and spatial information about the ingredients used in pharmaceutical formulations [41]. Furukawa et al. evaluated the homogeneity of blends of poly((R)-3-hydroxybutyrate) (PHB) and poly(L-lactic acid) (PLLA) by near infrared chemical imaging. The spatially averaged concentrations of the blend components predicted by PLS regression were similar to the actual contents of the blends [42].
Rahman et al. investigated the enhancement of the dissolution rate of a solid dispersion formulation of cyclosporine using polyethylene glycol (PEG-6000). Characterisation was done by near infrared spectroscopy and near infrared chemical imaging, DSC, FTIR, PXRD, and SEM. Near infrared chemical imaging (NIR-CI) provides information about the homogeneity of the solid dispersion matrix, and the chemometric application of this nondestructive method gives a valuable means of characterizing and estimating the drug and carrier [43].

Attenuated Total Reflectance Fourier Transform Infrared (ATR-FTIR) Spectroscopy.
ATR-FTIR spectroscopy is based mainly on the bending of light beams passing through different media and also depends on molecular vibration. To produce the ATR spectrum, radiation such as UV, visible, or IR is used. The radiation is passed through an optical crystal in contact with the sample to determine how much of the incident radiation is attenuated by the sample. ATR spectrometry is used for various purposes, such as laboratory testing, medical diagnostics, and clinical assays. ATR gives reliable spectra for semisolid, murky, turbid, and optically dense solutions. ATR in combination with IR spectroscopy can be used for the characterization of solid states. The ATR crystal is coated with zinc selenide (ZnSe), which allows the IR radiation to pass through aqueous solution [44]. The main advantages of ATR-FTIR spectroscopy are that it is an excellent technique for surface analysis and that it requires only a small amount of sample. To obtain reproducible, good-quality spectra, powder samples have to be compressed onto the ATR crystal. In ATR spectroscopy, when incident light falls on the crystal, only an evanescent wave passes into the sample [45].
Szakonyi and Zelkó studied the water content of superdisintegrant pharmaceutical excipients by ATR-FTIR spectroscopy using simple linear regression. The water content was determined for three common superdisintegrants (crospovidone, croscarmellose sodium, and sodium starch glycolate). Three different samples of crospovidone from two different manufacturers were examined to check the effect of different grades. Water absorption was observed between 3700 and 2800 cm⁻¹, and other spectral changes, caused by compaction of the samples on the ATR crystal under a small applied pressure, were observed between 1510 and 1050 cm⁻¹; calibration curves were constructed over these ranges. Particle size had no effect on the calibration if the samples of a given disintegrant came from the same manufacturer. Baseline correction was applied to maintain the linearity of the calibration curve. Chemometric methods such as simple regression can therefore be employed to determine the water content of powdered hygroscopic materials.

HPLC (High-Performance Liquid Chromatography).
HPLC is currently used for the analysis of multicomponent pharmaceutical formulations. In HPLC, various kinds of injection and selective sample treatment are required. Conditions such as the choice of column, the composition of the mobile phase, the column temperature, and the selection of one specific detection wavelength have to be optimized to obtain accurate results. HPLC is a very sensitive analytical technique, and factors such as errors in the linear regression, errors in the chromatographic area, and fluctuation of the single-wavelength detector response can affect the outcome.
In the chemometric approach, HPLC with a photodiode array (PDA) detector is used for binary mixture analysis and is combined with calibration techniques such as PLS, PCR, and CLS; the combinations are referred to as HPLC-CLS, HPLC-PCR, and HPLC-PLS. These three chemometric approaches were applied to the determination of naproxen sodium (NAP) and pseudoephedrine hydrochloride (PSE) in synthetic mixtures and in tablets. Statistical comparison involves tests such as the t-test, ANOVA, and the F test. A chromatogram of the analyzed drug is plotted and stored on a computer, and the detector response is evaluated as a function of the peak area. Dinç et al. developed chemometric calibrations for HPLC with PDA detection and applied them to the simultaneous determination of naproxen sodium and pseudoephedrine hydrochloride in tablets. The experimental results obtained from the HPLC-chemometric calibrations were compared with those obtained by a classic HPLC method [46]. Abdelkawy et al. carried out the simultaneous determination of a mixture of ambroxol and guaifenesin in a cough-cold formulation by HPLC and multivariate calibration methods [47]. The combinations of HPLC with a chemometric technique are as follows.

HPLC-CLS Approach.
This approach is based mainly on the application of multilinear regression (MLR) to the peak-area ratios of the individual drugs. In this approach, a matrix equation is applied.
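The matrix equation itself is not reproduced in the text. As a generic illustration of classical least squares, and not necessarily the exact form used in the cited work, the sketch below assumes the linear model R = C × K, where R holds the detector responses (for example, peak-area ratios at several wavelengths), C the concentrations, and K a sensitivity matrix estimated from calibration mixtures. It assumes NumPy; all names and numbers are hypothetical.

```python
import numpy as np

def cls_fit(C, R):
    """Estimate the sensitivity matrix K in R = C @ K by least squares."""
    K, *_ = np.linalg.lstsq(C, R, rcond=None)
    return K

def cls_predict(K, R_new):
    """Solve R_new = C_new @ K for the unknown concentrations C_new."""
    C_new_T, *_ = np.linalg.lstsq(K.T, R_new.T, rcond=None)
    return C_new_T.T

# Hypothetical calibration set: 8 mixtures of 2 drugs measured on 3 channels
rng = np.random.default_rng(6)
K_true = np.array([[1.0, 0.2, 0.5],
                   [0.1, 0.9, 0.4]])
C_cal = rng.uniform(1, 10, size=(8, 2))
R_cal = C_cal @ K_true                    # noise-free responses for illustration

K = cls_fit(C_cal, R_cal)
C_new = np.array([[3.0, 7.0]])
C_pred = cls_predict(K, C_new @ K_true)
print(C_pred)
```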

HPLC-PCR Approach.
In this approach, the peak-area ratios and the drug concentrations of the individual drugs were preprocessed by mean centering to give $R_o$ and $C_o$. The covariance dispersion matrix of the centered matrix $R_o$ was then investigated. Normalized eigenvalues and eigenvectors can be extracted from this square covariance matrix. The largest eigenvalues are used to select the optimal number of principal components (eigenvectors $P$); the other eigenvalues and eigenvectors are ignored. The coefficient $b$ is determined by $b = P \times q$, where $P$ is the matrix of eigenvectors and $q$ is the $C$-loading given by $q = D \times T^{T} \times R_o$. Here $T^{T}$ represents the transpose of the score matrix $T$, and $D$ is a diagonal matrix whose components are the inverses of the selected eigenvalues. The drug content was calculated by $C_{\text{prediction}} = b \times R_{\text{sample}}$. For data treatment, PLS Toolbox 3.5 in MATLAB 7.0 software could be used.
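The PCR steps can be sketched as follows, assuming NumPy. The symbol names (P for the retained eigenvectors, T for the scores, D for the diagonal scaling, q for the loadings, b for the regression coefficients) are assumptions chosen for this sketch. It also makes one explicit modelling assumption: the loadings are obtained by regressing the centered concentrations on the scores, which is the standard PCR step, and an intercept is added so that predictions are on the original concentration scale.

```python
import numpy as np

def pcr_fit(R, C, n_components):
    """Principal component regression of concentrations C on responses R."""
    R_mean, C_mean = R.mean(axis=0), C.mean(axis=0)
    Ro, Co = R - R_mean, C - C_mean              # mean centering
    cov = Ro.T @ Ro / (R.shape[0] - 1)           # covariance matrix of the responses
    eigvals, eigvecs = np.linalg.eigh(cov)       # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]
    P = eigvecs[:, order]                        # keep eigenvectors with the largest eigenvalues
    T = Ro @ P                                   # scores
    D = np.diag(1.0 / np.sum(T**2, axis=0))      # inverse of T'T (diagonal: scores are orthogonal)
    q = D @ T.T @ Co                             # assumption: regress centered concentrations on scores
    b = P @ q
    a = C_mean - R_mean @ b                      # intercept restoring the original scale
    return a, b

def pcr_predict(a, b, R_sample):
    return a + R_sample @ b

# Hypothetical noise-free calibration data: 2 drugs, 5 response channels
rng = np.random.default_rng(4)
K = rng.uniform(0.1, 1.0, size=(2, 5))
C_cal = rng.uniform(1, 10, size=(10, 2))
R_cal = C_cal @ K

a, b = pcr_fit(R_cal, C_cal, n_components=2)
C_new = np.array([[2.5, 8.0]])
C_pred = pcr_predict(a, b, C_new @ K)
print(C_pred)
```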

HPLC-PLS Approach.
The orthogonalized PLS algorithm involves the independent and dependent variables simultaneously in the data compression and decomposition operations for PLS calibration. To decompose both the concentration matrix and the peak-area ratio matrix into latent variables, the HPLC-PLS calibration method used $R = T \times P^{T} + E$ and $C = U \times Q^{T} + F$. The linear regression equation $C_{\text{prediction}} = b \times R_{\text{sample}}$ was used for the estimation of the drug in the samples. The vector $b$ was given as $b = W \times (P^{T} \times W)^{-1} \times q$, where $W$ represents a weight matrix.
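A minimal sketch of how such a regression vector can be obtained is given below for a single analyte, using the NIPALS form of PLS1 and the expression b = W (PᵀW)⁻¹ q. It assumes NumPy; the function name, the intercept term, and the simulated data are assumptions for illustration, and the sketch is not a reproduction of the toolbox routine used in the cited work.

```python
import numpy as np

def pls1_fit(R, c, n_components):
    """PLS1 by NIPALS: weights W, loadings P, inner coefficients q, then b = W (P'W)^-1 q."""
    R_mean, c_mean = R.mean(axis=0), c.mean()
    E, f = R - R_mean, c - c_mean                # centered responses and concentrations
    W, P, q = [], [], []
    for _ in range(n_components):
        w = E.T @ f
        w = w / np.linalg.norm(w)                # weight vector: direction of maximal covariance
        t = E @ w                                # score vector
        tt = t @ t
        p = E.T @ t / tt                         # response loading
        qa = f @ t / tt                          # concentration loading
        E = E - np.outer(t, p)                   # deflate responses
        f = f - qa * t                           # deflate concentrations
        W.append(w); P.append(p); q.append(qa)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    b = W @ np.linalg.solve(P.T @ W, q)
    a = c_mean - R_mean @ b                      # intercept
    return a, b

# Hypothetical noise-free data: 2 drugs, 6 response channels, predict drug 1
rng = np.random.default_rng(3)
K = rng.uniform(0.1, 1.0, size=(2, 6))
C_cal = rng.uniform(1, 10, size=(12, 2))
R_cal = C_cal @ K

a, b = pls1_fit(R_cal, C_cal[:, 0], n_components=2)
C_new = np.array([[4.0, 6.0]])
c_pred = a + (C_new @ K) @ b
print(c_pred)
```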
This method was applied using PLS Toolbox 3.5 in MATLAB 7.0 software. A validation sample of meloxicam was checked by reference HPLC analysis. The average mass of the sample tablets was determined, after which the tablets were pulverized and the powder was accurately weighed. The powders were extracted with 5 mL of methanol in an ultrasonic bath for 10 min, and the suspension was clarified by centrifugation for 5 min at 5000 rpm. Aliquots of the clear supernatants were diluted with the mobile phase in a 5 mL volumetric flask, and the solutions were analyzed with UV detection. Data recording and processing were done using different software. Separation was carried out on a Nucleosil 100-5 C18 analytical column with a mobile phase containing a different buffer at 300 °C. Detection was performed at 366 nm. The content of the active pharmaceutical ingredient in the tablets was determined by linear regression [48].

Terahertz Pulse Spectroscopy.
Terahertz pulse spectroscopy covers the range between 10 and 330 cm⁻¹, which corresponds to frequencies between 300 GHz and 10 THz. It is a nondestructive method for the analysis of polymorphic forms and the crystalline state of active pharmaceutical ingredients. Terahertz radiation can pass deep through the packaging material as well as the container. A terahertz pulse spectrometer uses coherent generation and detection of femtosecond THz pulses, based on the principle of the Auston switch [49–52].
Chemometrics has found application in solving both descriptive and predictive problems in experimental life sciences. In descriptive applications, properties of chemical systems are modelled for understanding the relationships and structure of the system. In predictive applications, properties of chemical systems are modelled with the intent of predicting new properties or behaviour of the system. In both cases, the datasets can be small but are often very large and highly complex, involving hundreds to thousands of variables and hundreds to thousands of cases or observations. Chemometrics has been used for the determination of different pharmaceutical properties of powders, granules, and tablets, as it provides an ideal method of extracting quantitative information from samples. Different applications of chemometric techniques in pharmaceutical sciences are summarized in Table 2.
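The two ways of quoting the terahertz range are related by ν = c·ν̃, with the speed of light c expressed in cm/s. The short check below is a sketch; the function name is hypothetical.

```python
SPEED_OF_LIGHT_CM_PER_S = 2.99792458e10  # speed of light in cm/s

def wavenumber_to_thz(wavenumber_cm1):
    """Convert a wavenumber in cm^-1 to a frequency in THz."""
    return wavenumber_cm1 * SPEED_OF_LIGHT_CM_PER_S / 1e12

low_thz = wavenumber_to_thz(10)    # about 0.30 THz, i.e. 300 GHz
high_thz = wavenumber_to_thz(330)  # about 9.9 THz, i.e. roughly 10 THz
print(low_thz, high_thz)
```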

Powder Flow Properties.
Powder flow behavior is an important factor affecting a number of pharmaceutical processes such as blending, compression, filling, transportation, and scale-up operations. In tablet compression and capsule filling, optimal powder flow is required to produce final products with acceptable content uniformity, weight variation, and physical consistency. Factors affecting powder flowability include the physical properties of the powder, such as particle size and shape, the loading experienced by the particles (gravity, interaction with air flow and container, etc.), the current state of the powder (i.e., tapped, free flowing, etc.), and the processing environment (e.g., humidity). Powder flow properties are assessed by the angle of repose, the compressibility index (Carr's index), and the Hausner ratio.
Sarraguça et al. determined the flow properties of pharmaceutical powders using near infrared spectroscopy. The experimental results were correlated with the NIR spectra by partial least squares (PLS), with the number of latent variables optimized by cross-validation. NIR spectroscopy proved to be the more advantageous method in terms of time per analysis and cost, since a spectrum is collected in 1 min and only one method is needed for determining a series of parameters. Moreover, the use of NIR spectroscopy is not limited to the determination of physical properties; it is also useful for determining chemical properties, for example, concentration [53].
Kim et al. determined the density of polyethylene pellets by transmission Raman spectroscopy. Transmission Raman spectra were collected for 25 different grades of polyethylene pellets, and the partial least squares method was employed to determine the sample density. The correct sample representation of internally inhomogeneous polyethylene pellets by the transmission Raman measurement improved the accuracy of the density determination, and sample-to-sample two-dimensional correlation analysis was used to further examine the origin of the improved accuracy [54].
Otsuka et al. aimed to develop a quick and accurate way to determine the pharmaceutical properties of granules and tablets in pharmaceutical formulation by applying chemoinfometric NIR spectroscopy. To predict pharmaceutical properties such as mean particle size, angle of repose, tablet porosity, and tablet hardness, NIR spectra of the antipyrine granules were measured and analyzed by principal component regression analysis. The mean particle size of the granules was found to increase from 81 μm to 650 μm with an increase in the amount of water, and it was possible to make larger spherical granules with a narrow particle size distribution using a high-speed mixer [55].
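The compressibility index and the Hausner ratio are both computed from the bulk and tapped densities of the powder: Carr's index = 100 × (ρ_tapped − ρ_bulk)/ρ_tapped and Hausner ratio = ρ_tapped/ρ_bulk. The sketch below uses hypothetical density values purely as a worked example.

```python
def carr_index(bulk_density, tapped_density):
    """Compressibility (Carr's) index in percent."""
    return 100.0 * (tapped_density - bulk_density) / tapped_density

def hausner_ratio(bulk_density, tapped_density):
    """Ratio of tapped to bulk density."""
    return tapped_density / bulk_density

# Hypothetical powder: bulk density 0.50 g/mL, tapped density 0.625 g/mL
ci = carr_index(0.50, 0.625)     # 20.0 percent
hr = hausner_ratio(0.50, 0.625)  # 1.25
print(ci, hr)
```

Lower values of both indices indicate a more freely flowing powder.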

Water Content of Pharmaceutical Excipients.
The water content of hygroscopic pharmaceutical excipients can strongly affect manufacturing processes and the performance of the final product. Szakonyi et al. studied the water content of three commonly used tablet superdisintegrants (crospovidone, croscarmellose sodium, and sodium starch glycolate). Water content determinations were based on the strong absorption of water between 3700 and 2800 cm⁻¹; other spectral changes, associated with the different compaction of samples on the ATR crystal under the same pressure, were followed in the infrared region between 1510 and 1050 cm⁻¹. The described method enables the water content determination of powdered hygroscopic materials containing homogeneously distributed water [56].

Dissolution Studies. Ambrus et al. investigated the preparation parameters needed to improve the dissolution of the poorly water-soluble drug meloxicam. To improve the dissolution rate, the drug was formulated as a nanosuspension using different methods such as emulsion diffusion, high-pressure homogenization, and sonication. Use of an SMCR method on the XRPD patterns of the nanosuspensions revealed the crystalline form of the drug and the strong interaction between meloxicam and the stabilizer. The rate of dissolution of the dried meloxicam nanosuspension was enhanced (90% in 5 min) relative to that of raw meloxicam (15% in 5 min), mainly owing to the formation of nanosized particles [57].
Freitas et al. reported a comparison between dissolution profiles obtained by using a dissolution apparatus (conventional method) and the NIR diffuse reflectance spectra of a series of clonazepam-containing tablets. The percentages of dissolution of each sample were correlated with the NIR spectra of three tablets of each batch through a multivariate analysis using the PLS regression algorithm. The squared correlation coefficients for the plots of percentages of dissolution from the laboratory equipment (dissolution apparatus and HPLC determination) versus the predicted values, in leave-one-out cross-validation, varied from 0.80 to 0.92, indicating that NIR diffuse reflectance spectroscopy is an alternative, nondestructive tool for the measurement of drug dissolution from tablets [63]. Donoso and Ghaly demonstrated NIR diffuse reflectance spectroscopy as an alternative, nondestructive method for the measurement of drug dissolution from tablets. They employed NIR reflectance spectroscopy to measure the percentage of drug dissolution from a series of tablets compacted at different compressional forces, calibrated the NIR data against laboratory equipment data, developed a model equation, validated the model, and tested its predictive ability. A series of model equations, depending on the mathematical technique used for regression, was developed from the calibration of the percentage of drug dissolution measured with laboratory equipment against the NIR diffuse reflectance for each formulation [64].
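The leave-one-out cross-validation used to report those squared correlation coefficients can be sketched as follows, assuming NumPy. For brevity the sketch uses ordinary least squares as a stand-in model (the cited study used PLS), and the data are simulated; all function names are hypothetical.

```python
import numpy as np

def loo_predictions(X, y, fit, predict):
    """Predict each sample from a model trained on all the other samples."""
    y_hat = np.empty(len(y), dtype=float)
    for i in range(len(y)):
        keep = np.arange(len(y)) != i            # leave sample i out
        model = fit(X[keep], y[keep])
        y_hat[i] = predict(model, X[i:i + 1])[0]
    return y_hat

def fit_linear(X, y):
    # Stand-in model: ordinary least squares with an intercept
    Xa = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(Xa, y, rcond=None)
    return coef

def predict_linear(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef

# Simulated data: 20 tablets, 3 spectral features, percentage dissolved as the response
rng = np.random.default_rng(2)
X = rng.normal(size=(20, 3))
y = 50 + X @ np.array([8.0, -3.0, 1.5]) + rng.normal(scale=0.5, size=20)

y_cv = loo_predictions(X, y, fit_linear, predict_linear)
r2_cv = np.corrcoef(y, y_cv)[0, 1] ** 2          # squared correlation, reference vs predicted
print(r2_cv)
```

Because every prediction comes from a model that never saw the sample in question, this figure is a more honest estimate of predictive ability than the fit to the calibration set.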
2.8.Tablet Parametric Tests.Donoso and Ghaly studied the application of NIR reflectance spectroscopy for the determination of disintegration time of theophylline tablets.Laboratory disintegration time was compared to near infrared diffuse reflectance data.Linear regression, quadratic, cubic, and partial least square techniques were used to determine the relationship between disintegration time and near infrared spectra.The results demonstrated that an increase in disintegration time produced an increase in near infrared absorbance and it was concluded that diffuse reflectance spectroscopy method is an alternative nondestructive method for measurement of disintegration time of tablets [65].
Tanabe et al. employed NIR spectroscopic methods for predicting hardness of the tablet formulations.The reflectance NIR spectra of various compressed tablets were used as a calibration set to establish a calibration model to predict tablet hardness by principal component regression (PCR) analysis.The relationship between the predicted and the actual hardness values exhibited a straight line, an  2 of 0.925 [58].
Otsuka and Yamane developed a method of prediction based on near infrared (NIR) spectra of raw mixed powders before compression by using chemometric means.The effect of difference in scale-up using a pilot-scale mixing machine and a continuous tableting machine was studied.The hardness and coefficient of variation were evaluated based on the NIR spectra of the raw powdered materials by principal component regression (PCR) in both lab-scale and pilot-scale experiments [59].
Morisseau and Rhodes worked on near infrared reflectance spectroscopy (NIRS) to evaluate and quantify the effect of compression force on the NIR spectra of tablets.An increase in tablet hardness produced an upward shift (increase in absorbance) in the NIRS spectra.A series of equations was developed by calibrating tablet hardness data against NIR reflectance response for each formulation.A NIRS method is presented which has the potential as an alternative to conventional hardness testing of tablets [66].
Otsuka et al. applied near infrared (NIR) spectroscopy with chemometrics to predict the change of pharmaceutical properties of antipyrine granules during granulation by regulation of the amount of water added.The granules were characterized by mean particle size, angle of repose, compressibility, tablet porosity, and tablet hardness as parameters of pharmaceutical properties.To predict the pharmaceutical properties, NIR spectra of the granules were measured and analyzed by principal component regression (PCR) analysis.The NIR chemometric method is expected to provide a rapid quantitative analysis of pharmaceutical properties, as characterized by the simple, nondestructive, and highly sensitive nature of the method [55].Donoso et al. researched the use of the near infrared diffuse reflectance method to evaluate and quantify the effects of hardness and porosity on the near infrared spectra of tablets.The results demonstrated that an increase in tablet hardness and a decrease in tablets porosity produced an increase in near infrared absorbance.Series of model equations depending on the mathematical technique used for regression were developed from the calibration of hardness and porosity data using laboratory equipment versus the near infrared diffuse reflectance for each formulation [67].
Kirsch and Drennen proposed a new algorithm for testing tablet hardness using NIR spectroscopy. Tablets containing 1-20% w/w cimetidine were used. Both tablet hardness and sample positioning were found to affect the spectral baseline. After the spectra were recorded from the unscored face of the tablets, a destructive diametral crushing test was carried out on each tablet using a calibrated hardness tester. The study was carried out using excipients showing plastic deformation (sodium chloride and microcrystalline cellulose) and brittle fracture (dibasic calcium phosphate dihydrate and lactose). After compression of the blends, one punch was removed, a fiber-optic probe was inserted into the die, and the NIR spectra were recorded. Principal component analysis/principal component regression and the spectral best-fit method were compared using two different approaches [68].
Otsuka and Yamane elucidated the effect of lubricant mixing on tablet hardness by near infrared (NIR) chemometrics as a basic study of process analytical technology. NIR spectra of raw mixed powders of F-L and F-C were taken using a reflection-type Fourier transform NIR spectrometer, and chemometric analysis was performed using principal component regression (PCR). This approach to predicting tablet hardness prior to compression could be used as a routine test to indicate the quality of the final product without spending time and energy producing samples of questionable quality [69].
Ebube et al. evaluated NIR spectroscopy as a nondestructive technique to differentiate three microcrystalline cellulose forms in powdered form and in compressed tablets. Avicel grades PH-101, PH-102, and PH-200 were evaluated in their study. The developed technique was able to identify the three Avicel grades both in powdered form and in compressed tablets. The result was not affected by the presence of a lubricant, magnesium stearate. They also successfully developed a method for the determination of magnesium stearate by multiple linear regression [70].
Chen et al. used artificial neural network and partial least squares models to predict the drug content of theophylline tablets. The partial least squares model predicted drug content better than the artificial neural network model at theophylline contents of 5% w/w or more, whereas the artificial neural network model performed better at theophylline contents of 2% w/w or less [71].
Luciana et al. applied chemometric tools, such as principal component analysis (PCA), consensus PCA (CPCA), and partial least squares regression (PLS), to a set of forty natural compounds acting as NADH-oxidase inhibitors. The calculations were performed using the VolSurf+ program. The formalisms employed generated good exploratory and predictive results. The independent variables or descriptors having a hydrophobic profile were strongly correlated with the biological data. The significant results from the CPCA, PCA prediction, and PLS discriminant models can be helpful for designing new antichagasic agents acting as NADH-oxidase inhibitors. The VolSurf descriptors showed that the presence and imbalance of the hydrophilic profile relative to the total molecular surface, together with a hydrophobic profile, are strongly correlated with the biological data [72].
Sanni et al. analyzed ATR-FTIR spectra with two multiway modelling techniques, parallel factor analysis (PARAFAC) and multilinear partial least squares (N-PLS), to determine the distribution of drug and excipient in a tablet. The N-PLS calibration method was more robust for accurate quantification of the amount of each component in the sample, whereas the PARAFAC model provided approximate relative amounts of the components [60].
Cogdill et al. investigated efficient procedures for generating multivariate prediction vectors for quantitative chemical analysis of solid dosage forms using terahertz pulse imaging reflection spectroscopy. A set of calibration development and validation tablet samples was created following a ternary mixture of anhydrous theophylline, lactose monohydrate, and microcrystalline cellulose (MCC). Spectral images of one side of each tablet were acquired over the range of 8 cm⁻¹ to 60 cm⁻¹. Calibration models were generated by partial least squares (PLS) type II regression of the spectra and by generating a pure-component projection (PCP) basis set using net analyte signal (NAS) processing. Sensitivity was observed for the PLS calibration over the range of all constituents for both the calibration and the validation datasets; however, some of the calibration statistics indicate that PLS overfits the spectra [61].
Kong et al. investigated the concentration of triglyceride in humans by using fluorescence spectroscopy over the region 220-900 nm. Nonlinear partial least squares regression with a nonlinear transformation based on cubic B-spline functions was employed as the chemometric method. Wavelengths within the regions 300-367 nm and 386-392 nm in the first derivative of the original fluorescence spectrum formed the optimized wavelength combination for the prediction model [62].
Cordeiro et al. conducted multivariate spectroscopic determination of lamivudine-zidovudine associations by partial least squares regression (PLS). The multivariate methodology was validated according to the International Conference on Harmonization (ICH) criteria, demonstrating precision, accuracy, and robustness within legal requirements [73].
Tanabe et al. predicted tablet hardness using NIR spectroscopy, which is widely used to check variation in tablet hardness caused by different compression forces in different formulations. In this study, tablets (200 mg, 8 mm diameter) consisting of berberine chloride, lactose, and potato starch were made at different compression pressures (59, 78, 98, 127, and 195 MPa). Parameters such as micropore distribution and hardness were measured. To establish the calibration model, the reflectance NIR spectra of the compressed tablets were used as the calibration set for predicting tablet hardness by principal component regression (PCR) analysis. With increasing compression pressure, pore volume decreased and tablet hardness increased. With an appropriate chemometric technique, the change in tablet hardness with compression pressure could be predicted. The relationship between predicted and actual hardness was a straight line, with an R² of 0.925. The standard error of cross-validation (SEV) values, the loading vectors of each PC, and the regression vector were examined to interpret the calibration models. The results show that the regression vector involves both chemical and physical factors. Porosity could also be determined: the relationship between predicted and actual total pore volume was likewise a straight line, with R² = 0.801 [63].
Kemper et al. studied quantitative analysis of an active ingredient in a translucent gel formulation using Fourier transform near infrared (FT-NIR) spectroscopy. Gels were prepared from Carbopol 980 with 0%, 1%, 2%, 4%, 6%, and 8% ketoprofen and then analyzed with an FT-NIR spectrophotometer operated in transmission mode. The correlation coefficient of the calibration was 0.9996, and the root mean squared error of calibration was 0.0775%. The percent relative standard deviation for multiple measurements was 0.10% [74].
Meza et al. performed nondestructive determination of drug content in tablets containing less than 1% active ingredient by weight of formulation (m/m) using transmission near infrared (NIR) spectroscopy. Tablets were manufactured at drug concentrations of approximately 0.5%, 0.7%, and 1.0% (m/m), with drug content ranging from 0.71 to 2.51 mg per tablet. NIR spectra were obtained for 110 tablets, which constituted the training set for the calibration model developed with partial least squares regression. The final calibration model included the spectral range from 11216 to 8662 cm⁻¹ and used standard normal variate and first derivative spectral pretreatments. This model was used to predict an independent set of 48 tablets with a root mean square error of prediction (RMSEP) of 0.14 mg and a bias of only −0.05 mg per tablet. The study showed that transmission NIR spectroscopy is a viable alternative for nondestructive testing of low drug content tablets [75].
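The pretreatments and validation statistics named in this study can be illustrated with a short sketch. It assumes the usual definitions: standard normal variate (SNV) scales each spectrum to zero mean and unit standard deviation, RMSEP is the root mean square of the prediction residuals, and bias is their mean. The first derivative is taken as a plain adjacent-point difference, which is an assumption; the cited work does not state the derivative algorithm here. Function names and the example values are hypothetical.

```python
import math
import statistics

def snv(spectrum):
    """Standard normal variate: center and scale one spectrum by its own mean and SD."""
    mean = statistics.mean(spectrum)
    sd = statistics.stdev(spectrum)  # sample standard deviation (n - 1)
    return [(x - mean) / sd for x in spectrum]

def first_derivative(spectrum):
    """Adjacent-point difference (assumed; smoothing derivatives are also common)."""
    return [b - a for a, b in zip(spectrum, spectrum[1:])]

def rmsep(predicted, reference):
    """Root mean square error of prediction over an independent test set."""
    squared = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared) / len(squared))

def bias(predicted, reference):
    """Mean signed difference between predicted and reference values."""
    return sum(p - r for p, r in zip(predicted, reference)) / len(predicted)

# Hypothetical drug contents in mg per tablet (not data from the study).
reference = [0.71, 1.20, 1.85, 2.51]
predicted = [0.80, 1.10, 1.90, 2.40]
print(rmsep(predicted, reference), bias(predicted, reference))

# Pretreat a made-up spectrum: SNV first, then the first derivative.
pretreated = first_derivative(snv([0.52, 0.55, 0.61, 0.58, 0.50]))
```

A negative bias, as reported in the study, means the model slightly underpredicts on average.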
Otsuka and Kinoshita carried out a study to predict the hydrate content of powder materials consisting of theophylline anhydrate (THA) and theophylline monohydrate (THM) using various X-ray powder diffraction (XRPD) analytical methods. A profile was measured five times for each of 11 standard samples containing THA and THM. The THM content of the standard samples was evaluated from the XRPD profiles by the diffraction peak height and peak area methods, Wakelin's method, and principal component regression (PCR). For the validation XRPD datasets, the order of the mean bias and the mean accuracy was peak height > peak area > Wakelin's > PCR, indicating that PCR was the best method to correct sample crystal orientation [76].
Maggio et al. monitored the dissolution of a pharmaceutical preparation containing two active ingredients, as a tool for the simultaneous determination of the dissolution curves and dissolution profiles of the latter. In accordance with ICH guidelines, the suitability of the calibration procedure for quantitating the dissolved drugs was assessed with regard to linearity in the working range, specificity, accuracy, and precision. To demonstrate the usefulness of the proposed system, the dissolution profiles of different lots of HCT-BIS tablets were acquired, and three of them were compared at a 31 data point level using the f1 and f2 ("difference" and "similarity") indexes. The use of multiple data points for comparison ensured reliability of the results [77].
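For readers unfamiliar with the f1 and f2 indexes, the sketch below computes them from two dissolution profiles sampled at the same time points, using the standard definitions: f1 = 100 × Σ|R − T| / ΣR and f2 = 50 × log10(100 / sqrt(1 + mean((R − T)²))), where R and T are the percentages dissolved for the reference and test lots. The function names and the example profiles are hypothetical and are not data from the cited study.

```python
import math

def f1_difference(reference, test):
    """Difference factor f1: relative cumulative absolute difference, in percent."""
    total_abs_diff = sum(abs(r - t) for r, t in zip(reference, test))
    return 100.0 * total_abs_diff / sum(reference)

def f2_similarity(reference, test):
    """Similarity factor f2: log-transformed mean squared difference."""
    n = len(reference)
    mean_sq_diff = sum((r - t) ** 2 for r, t in zip(reference, test)) / n
    return 50.0 * math.log10(100.0 / math.sqrt(1.0 + mean_sq_diff))

# Hypothetical percent-dissolved profiles at matched time points.
reference_lot = [22.0, 41.0, 63.0, 80.0, 92.0]
test_lot = [20.0, 38.0, 60.0, 79.0, 91.0]

print(f1_difference(reference_lot, test_lot))
print(f2_similarity(reference_lot, test_lot))
```

Identical profiles give f1 = 0 and f2 = 100; regulatory guidance commonly treats f1 up to 15 and f2 of at least 50 as indicating similar profiles.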
Markopoulou et al. determined perphenazine (PER), amitriptyline hydrochloride (AMI), and imipramine hydrochloride (IMI) using the information in the absorption spectra of appropriate solutions. The concentration of each component was determined from its calibration graph, established by measuring the ratio derivative analytical signal at a specific wavelength. In this method, the linear determination ranges were 3.65-18.24 μg/mL for PER, 4.32-21.60 μg/mL for AMI, and 4.83-24.19 μg/mL for IMI. The results were compared with those obtained by the partial least squares (PLS) multivariate calibration method applied to zero-order derivative spectra pretreated with a wavelet compression-orthogonal signal correction (W-OSC) filter. The calibration model was evaluated by internal validation (cross-validation) and by external validation over synthetic mixtures, content uniformity, and dissolution tests. According to the dissolution profile test, more than 95% of the three substances dissolved within 10 min. The results from the two techniques were statistically compared with each other, and both can be satisfactorily used for quantitative analysis and dissolution tests of multicomponent tablets [78].
Otsuka et al. performed quantitative determination of the crystal content of indomethacin (IMC) polymorphs based on Fourier transform near infrared (FT-NIR) spectroscopy. The conventional powder X-ray diffraction method was used to provide data for direct comparison. Powder X-ray diffraction profiles and NIR spectra were recorded for six kinds of standard material with various contents of the γ form of IMC. Principal component regression (PCR) analyses were performed on the normalized NIR spectral sets of standard samples with known contents of the γ form of IMC. A calibration equation was determined to minimize the root mean square error of prediction. The results indicated that NIR spectroscopy provides an accurate quantitative analysis of crystallinity in polymorphs compared with the results obtained by conventional powder X-ray diffractometry [79].
Tatavarti et al. studied NIR spectroscopy for the determination of content uniformity, tablet crushing strength (tablet hardness), and dissolution rate in sulfamethazine veterinary bolus dosage forms. Sulfamethazine, corn starch, and magnesium stearate were used in the formulation. The blend was wet granulated with 10% (w/v) starch paste in a high shear granulator and dried at 60 °C in a convection tray dryer. Principal component analysis (PCA) of the NIR spectra of the tablets and the neat raw materials indicated that the scores of the first 2 principal components were highly correlated with the chemical and physical attributes. Based on the PCA model, the significant wavelengths are 1514, 1660-1694, 2000, 2050, 2150, 2175, 2225, and 2275 nm for sulfamethazine; 1974, 2100, and 2325 nm for corn starch; and 2325 and 2375 nm for magnesium stearate. The PLS validation set had an R² of 0.9662 and a standard error of 0.0354. PLS calibration models, based on tablet absorbance data, could successfully predict tablet crushing strength and dissolution in spite of varying active pharmaceutical ingredient (API) levels. Prediction plots based on these PLS models yielded correlation coefficients of 0.84 and 0.92 on independent validation sets for crushing strength and Q120 (percentage dissolved in 120 minutes), respectively [80].
Doucet et al. predicted the formulation excipients and active pharmaceutical ingredient (API) in a complex pharmaceutical formulation. Molecular LIBS (laser-induced breakdown spectroscopy) was demonstrated to be the first successful approach using atomic spectroscopy to evaluate a complex organic matrix. The API and a formulation lubricant, magnesium stearate, were predicted with less than 4% relative bias. The other formulation excipients, such as Avicel and lactose, were predicted with less than 15% relative bias. Molecular LIBS and chemometrics provided a novel approach for the quantitative analysis of several molecules, which was not technically possible with the traditional atomic LIBS procedure that required sensitive elements to be present in both the API and the formulation excipients [81].

Conclusion
Various chemometric models have been applied to data from manufacturing processes, quality control tests, and instrumental outputs with the aim of achieving maximum accuracy, precision, and robustness. Chemometric methods are expected to provide rapid quantitative analysis of the pharmaceutical properties of intermediate and finished dosage forms, owing to their simple, nondestructive, and highly sensitive nature. In the pharmaceutical industry, applications of chemometric techniques could range from setting quality control specifications for raw materials, powders, and dosage forms to the control of various manufacturing processes and steps. Implementing chemometric techniques to ensure overall production process control requires analytical techniques capable of providing accurate results in a simple and rapid manner.

Figure 1: Applications of chemometrics in pharmacy and medical sciences.

Figure 2: Different techniques to apply chemometrics on the basis of clustering, regression, and explorative methods.

Table 1: Special cases of principal component analysis.

Table 2: Different applications of chemometric techniques in pharmaceutical sciences.