Methods based on conventional water quality indexes and UV-visible spectroscopy have been widely applied to water quality anomaly detection. This paper presents a qualitative approach for detecting water contamination events involving unknown pollutants. Fluorescence spectra are used as the monitoring tool, and a detection method based on alternating trilinear decomposition (ATLD) is proposed to analyze the excitation and emission spectra of the samples. Delaunay triangulation interpolation was used to pretreat the three-dimensional fluorescence data in order to remove the effect of Rayleigh and Raman scattering. An ATLD model of normal water samples was then established, and the residual matrix was obtained as the difference between the measured matrix and the model matrix. Finally, the residual sum of squares computed from the residual matrix was compared with a threshold to discriminate test samples qualitatively, separating drinking water samples from organic pollutant samples. The results indicate that ATLD modeling of three-dimensional fluorescence spectra can serve as a tool for the qualitative detection of unknown organic pollutants in water. The fluorescence-based method can complement methods based on conventional indexes and UV-visible spectroscopy.

Water pollution is attracting increasing attention, and water quality monitoring is important for avoiding health risks to residents. Because of the deterioration of water resources, indiscriminate wastewater discharge, chemical leakage, and similar events, a technique for rapid water quality analysis in the presence of unknown contaminants is needed.

Currently, water pollution anomaly detection relies mainly on conventional water quality parameters. For example, Conde [

Three-dimensional fluorescence spectra have a lower detection limit for organic matter than UV-visible spectra [

The aim of this paper is the qualitative detection of unknown pollutants, as an early warning when a sudden pollution event occurs in a water supply system. Three-dimensional fluorescence spectra are used to compensate for the shortcomings of conventional and UV-visible methods in qualitative water pollution detection. The fluorescence data were analyzed with a matrix feature extraction method based on alternating trilinear decomposition (ATLD) and the residual sum of squares, combined with a threshold, to classify unknown samples and detect water samples containing organic contaminants.

The basic idea of the paper is to establish a model of normal water samples; a test sample that does not conform to the model is classified as anomalous. The data are pretreated before the model is built, and the ATLD algorithm is then used to obtain the model. The model is applied to each test sample to obtain its model matrix, which is compared with the measured matrix to give the residual matrix, the basis for judging whether the water sample is abnormal. The main process of the method is shown in Figure

Qualitative discrimination based on ATLD.

Three-dimensional fluorescence spectra contain not only the fluorescence of the substance under test but also Rayleigh and Raman scattering. Because the scattering does not satisfy the trilinear model assumed by the decomposition algorithm, it cannot be analyzed properly with the decomposition model and must be removed from the spectra. Some studies removed the scattering by subtracting a distilled-water background, which may still leave some residual scattering [
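The pretreatment step can be illustrated with a minimal sketch. It assumes that the scatter bands are masked with a fixed half-width around the first- and second-order Rayleigh lines and the water Raman line (taken here as a shift of about 3400 cm^-1, which places the Raman peak near 397 nm for 350 nm excitation), and that the masked points are filled by `scipy.interpolate.griddata` with `method="linear"`, which interpolates over a Delaunay triangulation of the retained points. The function name `remove_scatter`, the band width, and the zero fill outside the convex hull are illustrative assumptions, not the settings used in the paper.

```python
import numpy as np
from scipy.interpolate import griddata


def remove_scatter(eem, ex, em, width=15.0):
    """Replace Rayleigh/Raman scatter bands of an EEM by interpolated values.

    eem   : 2-D array, shape (len(ex), len(em)), fluorescence intensities
    ex/em : 1-D arrays of excitation / emission wavelengths in nm
    width : half-width in nm of each masked band (illustrative value)
    """
    eem = np.array(eem, dtype=float)
    ex_grid, em_grid = np.meshgrid(ex, em, indexing="ij")

    # First- and second-order Rayleigh scatter lie on em = ex and em = 2 * ex.
    mask = np.abs(em_grid - ex_grid) <= width
    mask |= np.abs(em_grid - 2.0 * ex_grid) <= width

    # Water Raman scatter: emission wavenumber = excitation wavenumber - 3400 cm^-1
    # (1 nm^-1 = 1e7 cm^-1, so the shift is 3400e-7 nm^-1).
    raman_em = 1.0 / (1.0 / ex_grid - 3400.0e-7)
    mask |= np.abs(em_grid - raman_em) <= width

    # Linear griddata triangulates the retained points (Delaunay) and
    # interpolates linearly inside each triangle.
    known = np.column_stack([ex_grid[~mask], em_grid[~mask]])
    targets = np.column_stack([ex_grid[mask], em_grid[mask]])
    filled = griddata(known, eem[~mask], targets, method="linear")

    cleaned = eem.copy()
    cleaned[mask] = filled
    # Masked points outside the convex hull of the retained data are NaN;
    # they are set to zero here as a simple convention.
    return np.nan_to_num(cleaned, nan=0.0), mask
```

The function returns the cleaned matrix together with the mask, so the masked region can be inspected before the data are passed on to the decomposition.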

Measurements of the Raman spectrum of ultrapure water at an excitation wavelength of 350 nm and an emission wavelength of 397 nm, taken before each experiment started, differed from one another. To eliminate this difference, Raman normalization was applied [
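One simple form of Raman normalization is sketched below, under the assumption that each spectrum is divided by the Raman peak intensity of the ultrapure-water blank measured in the same session at 350 nm excitation and 397 nm emission. The exact procedure used in the paper is not reproduced here, and the function name `raman_normalize` and its arguments are hypothetical.

```python
import numpy as np


def raman_normalize(eem, ex, em, blank, ex_ref=350.0, em_ref=397.0):
    """Divide an EEM by the Raman peak intensity of an ultrapure-water blank.

    eem, blank : 2-D arrays, shape (len(ex), len(em)), measured in one session
    ex_ref/em_ref : wavelengths in nm where the water Raman peak is read
    """
    # Locate the grid point closest to the reference wavelength pair.
    i = int(np.argmin(np.abs(np.asarray(ex, dtype=float) - ex_ref)))
    j = int(np.argmin(np.abs(np.asarray(em, dtype=float) - em_ref)))
    raman_peak = float(np.asarray(blank, dtype=float)[i, j])
    if raman_peak <= 0.0:
        raise ValueError("Raman peak intensity of the blank must be positive")
    # Intensities are now expressed relative to the blank's Raman peak.
    return np.asarray(eem, dtype=float) / raman_peak
```

With this scaling, two measurements of the same sample taken at different instrument gains give the same normalized matrix, provided the blank is measured at the same gain as the sample.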

Alternating trilinear decomposition (ATLD) algorithm, put forward by Wu et al. [

Below is the trilinear model of a three-dimensional data matrix

Graphical representation of trilinear model of three-way data array
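For reference, the trilinear model as it is commonly written in the ATLD literature is given below. The symbols are assumptions chosen for this illustration: $x_{ijk}$ is the intensity at excitation wavelength $i$, emission wavelength $j$, and sample $k$; $a_{in}$, $b_{jn}$, and $c_{kn}$ are elements of the relative excitation, relative emission, and relative concentration matrices; $N$ is the number of factors; and $e_{ijk}$ is the residual.

```latex
x_{ijk} = \sum_{n=1}^{N} a_{in}\, b_{jn}\, c_{kn} + e_{ijk},
\qquad i = 1,\dots,I;\quad j = 1,\dots,J;\quad k = 1,\dots,K
```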

First, ATLD gives matrices

Matrices

The residual sum of squares (SSR) is the loss function defined by ATLD, which is
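For reference, the residual sum of squares of a trilinear model is usually written as follows. The notation is an assumption for this illustration: $x_{ijk}$ is an element of the measured three-way array of size $I \times J \times K$, and $a_{in}$, $b_{jn}$, $c_{kn}$ are elements of the three loading matrices of an $N$-factor model.

```latex
\mathrm{SSR} = \sum_{i=1}^{I} \sum_{j=1}^{J} \sum_{k=1}^{K}
\Bigl( x_{ijk} - \sum_{n=1}^{N} a_{in}\, b_{jn}\, c_{kn} \Bigr)^{2}
```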

As a result, ATLD can decompose the three-dimensional fluorescence spectrum matrix
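A compact NumPy sketch of the alternating updates is given below. It follows the update rules as they are commonly stated for ATLD: each loading matrix is refreshed from the diagonals of pseudo-inverse products with the corresponding slices of the data, and the columns of the excitation and emission loadings are scaled to unit length. The function name `atld`, the random initialization, the optional `init` argument, and the stopping rule are assumptions for illustration, not the implementation used in the paper.

```python
import numpy as np


def atld(x, n_factors, init=None, n_iter=200, tol=1e-10, seed=0):
    """Sketch of alternating trilinear decomposition for x of shape (I, J, K).

    Returns A (I, N), B (J, N), C (K, N) and the final residual sum of squares.
    `init` may hold starting values (A0, B0); otherwise they are random.
    """
    x = np.asarray(x, dtype=float)
    if init is None:
        rng = np.random.default_rng(seed)
        a = rng.random((x.shape[0], n_factors))
        b = rng.random((x.shape[1], n_factors))
    else:
        a, b = (np.array(m, dtype=float) for m in init)

    def update_c(a, b):
        # c_k = diag(A^+ X_..k (B^T)^+) for every sample slice k
        return np.einsum("ni,ijk,nj->kn", np.linalg.pinv(a), x, np.linalg.pinv(b))

    c = update_c(a, b)
    ssr, prev_ssr = np.inf, np.inf
    for _ in range(n_iter):
        # a_i = diag(B^+ X_i.. (C^T)^+), then scale columns to unit length
        a = np.einsum("nj,ijk,nk->in", np.linalg.pinv(b), x, np.linalg.pinv(c))
        a /= np.linalg.norm(a, axis=0)
        # b_j = diag(C^+ X_.j. (A^T)^+), then scale columns to unit length
        b = np.einsum("nk,ijk,ni->jn", np.linalg.pinv(c), x, np.linalg.pinv(a))
        b /= np.linalg.norm(b, axis=0)
        # C absorbs the scale removed from A and B
        c = update_c(a, b)

        model = np.einsum("in,jn,kn->ijk", a, b, c)
        ssr = float(np.sum((x - model) ** 2))
        if abs(prev_ssr - ssr) <= tol * (1.0 + ssr):
            break
        prev_ssr = ssr
    return a, b, c, ssr
```

For the application described here, `x` would stack the pretreated spectra of the background drinking water samples along the third axis, and `n_factors` would be the chosen number of components.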

Model parameters are obtained with ATLD algorithm, which is known as relative excitation matrix

Known by formula (

The residual sum of squares is
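The calculation for a single test sample can be sketched as follows. The sketch assumes that the relative excitation and emission matrices from the normal-water model are held fixed, that the factor scores of the test sample are estimated in the ATLD manner from the diagonal of a pseudo-inverse product, and that the residual is the measured matrix minus the model matrix. The function name `residual_ssr` is hypothetical.

```python
import numpy as np


def residual_ssr(x, a, b):
    """Residual sum of squares of one EEM against fixed model loadings.

    x : measured matrix, shape (I, J)
    a : relative excitation matrix, shape (I, N)
    b : relative emission matrix, shape (J, N)
    """
    x = np.asarray(x, dtype=float)
    # Factor scores of this sample: c = diag(A^+ X (B^T)^+)
    scores = np.diag(np.linalg.pinv(a) @ x @ np.linalg.pinv(b).T)
    # Model matrix A diag(c) B^T
    model = (a * scores) @ b.T
    residual = x - model
    return float(np.sum(residual ** 2)), residual
```

A sample whose spectrum is fully described by the normal-water factors gives a residual sum of squares close to zero, while a fluorescent contaminant outside those factors leaves structure in the residual matrix and raises the value.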

Threshold-based methods are often used for image segmentation because they are simple to compute. The gray levels of an image are divided into several ranges by one or more thresholds, and pixels falling in the same range are treated as the same object. In the same way, this paper separates test samples into two classes: normal drinking water samples and organic pollutant samples.

The aim is to set a reasonable threshold on the discrimination sequence. In other words, a new unknown sample is identified qualitatively by applying a threshold to its residual sum of squares. Choosing the threshold is critical: if it is too large, polluted water samples go undetected; if it is too small, drinking water may be mistaken for polluted water.

In statistical analysis, the mean and standard deviation are commonly used to characterize a data set.

Three times the standard deviation of the discrimination sequence is commonly used as the threshold for qualitative detection of test samples. Byer and Carlson [

In this paper, the criterion for setting the threshold is that the false-positive rate for drinking water samples is under 10%, so two times the standard deviation of the residual sums of squares of 10 background drinking water samples is used as the threshold for qualitative discrimination. Specifically, the residual sums of squares of 10 drinking water samples
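The threshold rule can be written as a short sketch. It assumes the sample standard deviation (divisor n - 1), which the text does not specify, and a one-sided test on the difference between a test sample's residual sum of squares and the background mean. The function names and any input values supplied by a caller are hypothetical.

```python
from statistics import mean, stdev


def fit_threshold(background_ssr, k=2.0):
    """Return the background mean and a threshold of k standard deviations.

    background_ssr : residual sums of squares of background drinking water
    k              : multiple of the standard deviation (2 in this paper)
    """
    mu = mean(background_ssr)
    # stdev uses the n - 1 divisor; this choice is an assumption.
    return mu, k * stdev(background_ssr)


def is_polluted(ssr, mu, threshold):
    """Flag a test sample whose SSR exceeds the background mean by more
    than the threshold."""
    return (ssr - mu) > threshold
```

A caller would fit `mu` and `threshold` once on the 10 background samples of a group and then apply `is_polluted` to each test sample of that group.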

One group of samples was the three kinds of organic solution with fluorescent characteristics in the concentrations of 2

Three-dimensional fluorescence spectra measured by the spectrometer contain Rayleigh and Raman scattering, as shown in Figure _{ex} = 350 nm, λ_{em} = 397 nm. The charts before and after the pretreatment are shown in Figure

Three-dimensional fluorescence spectra of background drinking water samples. (a) is the spectrum before pretreatment. (b) is the spectrum after pretreatment.

ATLD was applied to background drinking water samples to establish the model. Because of the complexity of the composition of water samples, the number of factors in the ATLD model is hard to identify. However, there are two peaks in the spectra of drinking water after pretreatment, as in Figure

Relative excitation matrix and relative emission matrix of drinking water samples after ATLD. (a) is relative excitation matrix. (b) is relative emission matrix.

Relative excitation matrix

Fluorescence characteristics of background drinking water and rhodamine B solution samples. (a), (b), and (c) are, respectively, modeling value, measured value, and residual value of drinking water. (d), (e), and (f) are, respectively, modeling value, measured value, and residual value of 30 μg/L rhodamine B solution.

Subtracting the modeled spectral matrix from the measured one gives the residual matrix (Figure

For organic matter samples, three-dimensional fluorescence spectra can be considered as superposition of drinking water spectra and organic matter spectra. Combined with Figure

From the residual matrices, the residual sums of squares of the background drinking water samples and of the test samples are calculated and used as the target sequence for qualitative discrimination. The residual sums of squares for one group of experimental data are shown in Figure

The residual sums of squares. “

The mean and standard deviation of the residual sums of squares of the drinking water samples were calculated, and the mean of the background water samples was then subtracted from the residual sum of squares of each unknown test sample. If the difference is larger than the threshold, the sample is judged to be an organic pollutant sample; otherwise, it is judged to be a drinking water sample. The result of threshold-based discrimination depends largely on the threshold selected. As mentioned in the second part of this paper, the criterion for selecting the threshold is that the false-positive rate for background drinking water samples is less than 10%, so two times the standard deviation of the drinking water samples is selected as the threshold. The experimental results are shown in Table

The detecting result of the samples.

Group | Mean SSR of drinking water | Standard deviation of SSR of drinking water | Total number of OM samples | Number of OM samples detected | Detection rate |
---|---|---|---|---|---|
1 | 1844.39 | 301.99 | 24 | 22 | 91.67% |
2 | 2017.03 | 310.34 | 24 | 20 | 83.33% |
3 | 1510.16 | 223.26 | 24 | 20 | 83.33% |
4 | 1617.39 | 303.89 | 27 | 20 | 77.78% |
5 | 1679.49 | 325.21 | 27 | 25 | 92.59% |

The results of qualitative discrimination show that the detection rate of organic matter samples is at least 77.78% in every group. In total, 107 of the 126 organic matter samples were detected, so the overall detection rate across the five groups of experiments is 84.92%. Some contaminants were not detected because their spectral peaks overlap with those of drinking water. The detection rate in group 4 is the lowest. A possible reason is that the scattering was not removed completely in many water samples of group 4; because scattering cannot be explained by the trilinear model, the model established for group 4 was not accurate enough.
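The overall detection rate follows directly from the per-group counts in the table, summing the detected samples and the total organic matter samples over the five groups:

```latex
\frac{22 + 20 + 20 + 20 + 25}{24 + 24 + 24 + 27 + 27} = \frac{107}{126} \approx 84.92\%
```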

This paper addressed the qualitative detection of unknown contaminants in water rather than the detection of known contaminants. A method based on ATLD and a threshold was applied to three-dimensional fluorescence spectra in order to detect unknown organic pollutants with fluorescent characteristics in water. The ATLD algorithm was used to establish a model of normal water samples, and the residual matrix was obtained as the difference between the model matrix and the measured matrix. The residual sum of squares was calculated from the residual matrix and compared with the threshold to judge whether a test sample was an organic pollutant sample or a normal water sample. To verify the approach, experiments were carried out on the spectra of water samples and organic contaminant samples.

The results show that features extracted with the ATLD model can be used to discriminate qualitatively whether test samples are polluted, provided the pollutants have fluorescent characteristics. However, the detection rate of the method fluctuated. This may be because feature extraction based on ATLD and the residual sum of squares depends on the background samples; that is, the result is related to the quality and quantity of those samples.

In general, the qualitative discrimination method based on ATLD and the residual sum of squares, combined with a threshold, can be used to detect unknown organic pollutants with fluorescent characteristics in drinking water, but the detection rate is still relatively low, which merits further study. In addition, the proposed method does not handle overlapping peaks well, and this will also be the subject of further research.

The authors declare that they have no conflicts of interest.

This work was funded by the National Natural Science Foundation of China (no. 61573313) “Online water-quality anomaly detection, classification, and identification based on multi-source information fusion” and (no. U1509208) “Research on big data analysis and cloud service of urban drinking water network safety,” and the Key Technology Research and Development Program of Zhejiang Province (no. 2015C03G2010034) “Research on intelligent management and long-effective mechanism for river regulation and maintenance.”