The Bridge between Screening and Assessment: Establishment and Application of Online Screening Platform for Food Risk Substances


Introduction
With the development of the market economy and the growth of the country's overall strength, food quality in China, the largest food producer and consumer since 2010, has gradually improved. However, because of the large volume of food consumption and the long food industry chain, China has witnessed numerous food safety incidents, which have aroused widespread public concern. The Chinese government has strengthened the monitoring of food risks through a series of policies and measures and has established a food safety risk management and control mechanism based on source control, process control, and end-product monitoring. Within this mechanism, a sampling inspection and risk-screening system has been established at the technical level. This has greatly improved food safety management and control and has significantly reduced food safety problems [1].
As the basic supporting technology of food testing, instrumental analysis has developed rapidly in recent years. Liquid chromatography (LC) and gas chromatography (GC) perform excellently in the separation of compounds. Because mass spectrometry (MS) offers high selectivity and high sensitivity in the qualitative and quantitative analysis of trace substances, many countries rely on GC-MS, LC-MS [2][3][4], and other analytical techniques for the detection and screening of food risk substances. LC-MS has a wide analytical range and can detect almost all compounds, thus solving the problem that GC cannot analyze thermally unstable compounds. It has a strong separation capability: even if the analyzed mixture is not completely separated, qualitative and quantitative analysis can still be performed through characteristic ion mass chromatograms to obtain the structural information and molecular weight of each component. The detection sensitivity is high, and sample detection at the microgram level is possible. The analysis is also fast, with the detection time of a single sample generally less than 15 minutes [5][6][7][8][9]. When LC-MS is used to detect and screen food risk substances, the work relies not only on the detection equipment but also on professional screening software that includes standard MS databases of compounds and comparison algorithms [10][11][12]. At present, most inspectors in various countries are limited to the professional screening software provided by instrument manufacturers when screening and comparing food risk substances. The standard MS databases contained in this software are not only expensive but also unable to cover all risk substances.
Screening procedures for risk substances are cumbersome, and they involve problems such as the high cost of manpower and material resources [13,14]. Given the wide variety of food safety risk substances and the lack of professional, network-shared databases, the establishment of universal, cross-brand shared screening software based on high-performance liquid chromatography/high-resolution mass spectrometry for quickly screening risk substances in food has become a major research subject for the technical support institutions of food safety regulators [15][16][17].
In view of the technical bottlenecks encountered by food inspection agencies in the screening of food risk substances, the relevant team of the National Institutes for Food and Drug Control conducted extensive investigation and research and used integrated technologies such as high-performance liquid chromatography/high-resolution mass spectrometry, the Internet, and big data [18,19]. It finally established a food risk substance screening platform for food inspectors across the country, which has been officially launched.
The platform refers to the European Union's analytical method guidelines [20] and aims to qualitatively analyze unknown compounds in mass spectrometry files from different instrument manufacturers. When screening for food risk substances, inspectors preprocess the relevant food samples according to the screening preprocessing technical standards researched and formulated by the National Institutes for Food and Drug Control. High-performance liquid chromatography/high-resolution mass spectrometry is then used for detection. After testing, the generated data files are uploaded over the Internet to the online comparison module of the screening platform. The online comparison module calls the screening model for real-time analysis and comparison and then returns the screening results to the inspectors. The inspectors combine the screening results with other information to make a comprehensive judgment and complete the preliminary screening of risk substances. The platform can automatically identify the original mass spectrometry files of instruments from various manufacturers and convert them to a unified data format; hence, there is no restriction on the brand or version of the instrument. The standard library of the platform can be jointly built and shared by cooperating laboratories, which can effectively enrich the types of food risk substances in the database and gives the platform good scalability. The screening model of the platform is based on the SS combination algorithm, which has been optimized and improved through a large number of screening comparison experiments to support the accuracy and scientific soundness of the screening results given by the platform. The platform adopts an online, Internet-based screening method, which is more efficient than the traditional risk-screening work mode and can greatly facilitate the risk investigation and control work of food safety inspection agencies.

Materials and Methods
The platform consists of three parts: a standard spectrum library, a screening model, and an online result comparison module. The standard spectrum library serves as the underlying database for risk screening. The screening model is used for screening and comparing risk substances. The online result comparison module allows users to upload spectrometry files and obtain screening results in real time. The platform's pages are developed in Java, using mainstream technologies such as SpringBoot (https://spring.io/projects/spring-boot) and jQuery (https://jquery.com/). The underlying model is developed in Python, mainly using third-party libraries such as pymzML and Pandas [21].

Standard Spectrum Library.
The platform builds a standard spectrum library based on high-resolution MS data for 527 banned and restricted compounds found in food matrices [22,23]. At present, the spectrum library mainly integrates standard spectral data from Agilent instruments. Each entry covers the mass-to-charge ratio of the parent ion, the mass-to-charge ratios of the first 15 second-order fragment ions and their relative peak intensities, the retention time, and some basic information on the compound. The content of the high-resolution spectrum library, with methomyl as an example, is shown in Table 1.
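As a minimal sketch of how one library entry could be represented in code, the record below holds the fields listed above. The class name `LibraryEntry`, its field names, and the reading of "first 15" as the 15 most intense fragments are assumptions made for illustration. The methomyl formula and CAS number are standard values and the precursor m/z is the [M+H]+ value computed from the formula; the fragment and retention-time values are placeholders, not the contents of Table 1.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

MAX_FRAGMENTS = 15  # the library stores up to 15 second-order fragment ions


@dataclass
class LibraryEntry:
    """One compound record in a (hypothetical) standard spectrum library."""
    name: str
    formula: str
    cas_number: str
    precursor_mz: float            # m/z of the parent ion
    retention_time_min: float      # retention time in minutes
    # (m/z, relative intensity in percent) pairs for the fragment ions
    fragments: List[Tuple[float, float]] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Assumption: keep the 15 most intense fragments, then order by m/z.
        strongest = sorted(self.fragments, key=lambda p: p[1], reverse=True)
        self.fragments = sorted(strongest[:MAX_FRAGMENTS])


# Illustrative entry; fragment and retention-time values are placeholders.
methomyl = LibraryEntry(
    name="methomyl",
    formula="C5H10N2O2S",
    cas_number="16752-77-5",
    precursor_mz=163.0536,
    retention_time_min=3.2,
    fragments=[(106.0321, 40.0), (88.0215, 100.0)],
)
```

Trimming in `__post_init__` keeps every entry within the 15-fragment limit regardless of how many peaks the source spectrum contains.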

Screening Model.
The screening model is the core of the whole platform, and the screening comparison algorithm is the core of the screening model. The algorithm is obtained by improving an existing spectral library search algorithm, specifically the SS combination algorithm. The SS combination algorithm, proposed by Stein and Scott, combines the cosine similarity algorithm [24] (also called the weighted dot-product algorithm), represented here as $S_C(U_w, V_w)$, and the peak ratio algorithm, represented here as $S_D(U_w, V_w)$ [25,26]. In the cosine similarity algorithm, V represents the compound in the library, U represents the unknown compound, ω carries the mass-to-charge ratio and peak intensity information, and U and V are the matrix forms of ω. ω is obtained by raising the mass-to-charge ratio and the relative peak intensity of the compound to the powers of their respective weighting factors and multiplying the results, $\omega = \alpha^{x}\beta^{y}$, where x = 1.3 and y = 0.53 are the weighting factors and α and β are the mass-to-charge ratio and the relative peak intensity, respectively. In the peak ratio algorithm, $u_i$ and $v_i$ are nonzero peaks with the same mass-to-charge ratio. When the peak value of the former is smaller than the latter, n = 1; otherwise, n = −1. Finally, $S_C$ and $S_D$ are multiplied by their corresponding weights and combined to give the final similarity. Compared with the SS combination algorithm proposed by Stein and Scott, the improved combination algorithm shifts the emphasis between the two measures according to the degree of similarity. When the similarity of the mass spectra is low, peaks with the same mass-to-charge ratio differ more in intensity between the spectra, and the peak ratio calculation is preferred. When the similarity is high, the number of peaks with the same mass-to-charge ratio increases and the gap between their corresponding intensities decreases; in this case, the cosine similarity calculation is preferred to further improve the similarity calculation.
The premise of the similarity calculation is to determine whether the parent ion of the unknown is the same as that of a compound in the standard spectral library. If the mass error of the parent ion is within 2 mDa, the two are considered the same; the fragment ions are then compared, the similarity is calculated, and the result is combined with the relative retention time difference to select the best match, that is, the one with higher similarity and lower relative retention time difference. If the parent ions are considered different, the mass spectrum is ruled out directly and no subsequent calculation is performed.
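The composite measure and the parent-ion check can be sketched as follows. This follows the form in which the Stein and Scott composite is commonly quoted, $(N_U S_C + N_{U \wedge V} S_D)/(N_U + N_{U \wedge V})$, with a plain cosine term; it is not the platform's improved weighting, which is not specified here. The function names, the rounding of m/z to two decimals to align peaks, and the example spectra are assumptions made for illustration.

```python
import math
from typing import Dict, Iterable, Tuple

Peak = Tuple[float, float]      # (m/z, relative intensity)

X_MZ = 1.3                      # weighting factor x for m/z (from the text)
Y_INTENSITY = 0.53              # weighting factor y for intensity (from the text)
PRECURSOR_TOL_DA = 0.002        # 2 mDa parent-ion window (from the text)


def weighted(peaks: Iterable[Peak], decimals: int = 2) -> Dict[float, float]:
    """Map rounded m/z -> omega = mz**x * intensity**y, skipping zero peaks.

    Rounding m/z to align peaks is a simplification chosen for this sketch.
    """
    return {
        round(mz, decimals): (mz ** X_MZ) * (inten ** Y_INTENSITY)
        for mz, inten in peaks
        if inten > 0
    }


def cosine_term(u: Dict[float, float], v: Dict[float, float]) -> float:
    """Cosine of the angle between the two weighted spectra."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(
        sum(w * w for w in v.values())
    )
    return dot / norm if norm else 0.0


def ratio_term(u: Dict[float, float], v: Dict[float, float]) -> float:
    """Average agreement of adjacent-peak ratios over the shared peaks."""
    shared = sorted(u.keys() & v.keys())
    if len(shared) < 2:
        return 0.0
    total = 0.0
    for prev, cur in zip(shared, shared[1:]):
        ratio = (u[cur] / u[prev]) * (v[prev] / v[cur])
        # Exponent n is +1 or -1, chosen so that each term is at most 1.
        total += ratio if ratio <= 1.0 else 1.0 / ratio
    return total / len(shared)


def composite_similarity(unknown: Iterable[Peak], library: Iterable[Peak]) -> float:
    """Combine both terms, weighted by peak counts (Stein-Scott style)."""
    u, v = weighted(unknown), weighted(library)
    n_u, n_shared = len(u), len(u.keys() & v.keys())
    if n_u + n_shared == 0:
        return 0.0
    return (n_u * cosine_term(u, v) + n_shared * ratio_term(u, v)) / (n_u + n_shared)


def same_precursor(mz_unknown: float, mz_library: float,
                   tol_da: float = PRECURSOR_TOL_DA) -> bool:
    """Gate applied before any similarity is computed."""
    return abs(mz_unknown - mz_library) <= tol_da
```

In this form the ratio term averages N−1 adjacent-peak ratios over N shared peaks, so two identical three-peak spectra score 5/6 rather than 1. A production matcher would align peaks within a mass tolerance instead of rounding.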

Online Result Comparison Module.
The online result comparison module is built with web technology. The front end uses components including jQuery, ECharts, LayUI, and JSmol, and the back end uses frameworks [27,28] including SpringBoot, SpringMVC, SpringSecurity, and Mybatis (http://blog.mybatis.org/). The module includes pages for file uploading (shown in Figure 1), a summary of screening results (Figures 2 and 3), a detailed comparison of screening results (shown in Figures 4-6), and a basic information display of compounds (Figure 7). Its main function is to upload the mass spectrometry file to be screened, call the background screening model for comparison, and return the screening comparison results through the web page in real time. After the inspectors upload a file, the platform calls the data standardization software to convert the uploaded MS file into the mzML format. The data standardization software, ProteoWizard [29], supports data standardization for mass spectrometry files generated by mainstream mass spectrometer manufacturers [14]. Thus, the construction and application of the platform are not limited to instruments of a specific brand. After the file conversion is completed, the system calls Python's pymzML library to parse the mzML file and reads the information on the parent ions and their corresponding fragment ions, such as peak intensity, retention time, and high-resolution accurate mass-to-charge ratio. It then calls the screening model to compare the unknown spectrum with the standard spectrum library. It should be noted that, when preparing the data, the inspectors should preprocess the sample according to the specified standard procedures and confirm that the high-resolution LC/MS instrument used has been calibrated and performs well. They should also follow the recommended instrument method to collect data. The online result comparison module realizes the interaction between the user and the server through a file stream and a data stream.
The user uploads the test files through the file stream. Since most of the uploaded test files are large, the platform adopts the Conris Ultra-High-Speed Transfer Protocol [30] instead of the traditional FTP transfer protocol, which greatly improves the speed of file upload. After a series of operations such as file conversion, data analysis, result sorting, and result display, the screening platform renders the screening results as various graphics, delivered through the data stream, on the basic information display page for users to read. On the basis of the screening results, the user can determine whether the test files contain risk substances and accordingly make a preliminary determination of whether the tested food is qualified.
Text shown on the platform's file upload page: Library name: Pesticide and Veterinary Drug Residue Database. Tips: (1) To make the matching results more scientific and more accurate, perform experiments under the same conditions as those used to acquire the data in the library, and then upload the acquired data for analysis. (2) If the experimental conditions of the uploaded data are inconsistent with those given, the accuracy of the results will be affected to some extent.
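The parsing step that follows file conversion can be sketched as below. This is a minimal illustration that assumes the pymzML 2.x interface (`pymzml.run.Reader`, `ms_level`, `selected_precursors`, `scan_time_in_minutes()`, and `peaks("centroided")`); the function names `read_ms2` and `top_relative_peaks`, the dictionary keys, and the choice to keep the 15 most intense fragments are assumptions, not the platform's actual code.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def top_relative_peaks(peaks: Iterable, n: int = 15) -> List[Peak]:
    """Keep the n most intense peaks, scaled so the base peak is 100."""
    pairs = [(float(mz), float(i)) for mz, i in peaks if float(i) > 0]
    if not pairs:
        return []
    base = max(i for _, i in pairs)
    strongest = sorted(pairs, key=lambda p: p[1], reverse=True)[:n]
    return sorted((mz, 100.0 * i / base) for mz, i in strongest)


def read_ms2(mzml_path: str) -> Iterator[Dict]:
    """Yield parent-ion m/z, retention time, and fragments for MS2 scans."""
    # Third-party dependency, imported here so the helper above can be
    # used and tested without pymzML installed.
    import pymzml

    run = pymzml.run.Reader(mzml_path)
    for spec in run:
        # Skip MS1 scans and anything that is not a spectrum.
        if getattr(spec, "ms_level", None) != 2:
            continue
        precursors = spec.selected_precursors
        if not precursors:
            continue
        yield {
            "precursor_mz": precursors[0]["mz"],
            "rt_min": spec.scan_time_in_minutes(),
            "fragments": top_relative_peaks(spec.peaks("centroided")),
        }
```

Each yielded record has the same shape as a library entry (parent ion, retention time, fragment list), which keeps the comparison step simple.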
Each record displayed on the screening results summary page includes the precursor ion, molecular formula, CAS number, and retention time of the unknown compound and of the matched compound in the standard library. In the screening results, an unknown compound may match multiple compounds in the standard library. The inspector can preliminarily judge the most likely compound based on the matching score and the retention time difference between the unknown and the matched compounds. The detailed comparison page of the screening results displays 2D and 3D bar charts comparing the unknown and the matched compounds, so the inspector can visually observe the similarities and differences between the two. In the 2D bar chart, the abscissa represents m/z and the ordinate the relative peak intensity; the upper half of the chart shows the unknown substance and the lower half the matched substance. By viewing diagrams of the 2D and 3D molecular geometry and the basic compound information (including the relevant physical and chemical properties of the matched compound and information such as inspection standards and methods), the inspector can gain an intuitive and detailed understanding of the matched compound. According to the information displayed on the platform, the inspector can preliminarily judge whether the tested sample contains risk substances, which can guide subsequent experiments to obtain scientific judgment results more quickly.
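When one unknown matches several library compounds, the candidates can be ordered by the two criteria mentioned above. The sketch below sorts by higher matching score first and breaks ties by smaller retention time difference; this ordering rule, the `Candidate` type, and the compound names and values are assumptions for illustration, since the text does not say how the two criteria are combined.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    """One library compound matched to an unknown (hypothetical structure)."""
    name: str
    score: float            # matching score, higher is better
    rt_unknown_min: float   # retention time of the unknown, minutes
    rt_library_min: float   # retention time in the library, minutes

    @property
    def rt_difference(self) -> float:
        return abs(self.rt_unknown_min - self.rt_library_min)


def rank_candidates(candidates: List[Candidate]) -> List[Candidate]:
    """Order by descending score, then by ascending retention time difference."""
    return sorted(candidates, key=lambda c: (-c.score, c.rt_difference))


# Placeholder candidates, not real screening results.
matches = [
    Candidate("compound B", 0.91, 5.40, 5.00),
    Candidate("compound C", 0.75, 5.01, 5.00),
    Candidate("compound A", 0.91, 5.05, 5.00),
]
best = rank_candidates(matches)[0]
```

Here `best` is "compound A": it ties with "compound B" on score but elutes closer to the library retention time.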

Model Validation.
The platform uses a series of comparison tests to evaluate the screening model and then optimizes and adjusts the model based on the evaluation results. In each comparison, the test files are screened both by the platform and by the professional screening software of the corresponding manufacturer; the two sets of screening results are compared, and the accuracy of the screening model is calculated. After the first model was constructed, eight high-resolution test files were uploaded to the platform for comparison.
The screening results revealed two problems: first, there were false negatives, that is, compounds contained in the test files were missing from the screening results; second, isomers were not completely distinguished.
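One plausible reading of the accuracy calculation is the percentage of compounds reported by the manufacturer's software that the platform also reports. The sketch below implements that reading; the definition itself, the function names, and the compound labels are assumptions for illustration.

```python
from typing import Iterable, Set


def screening_accuracy(platform_hits: Iterable[str],
                       reference_hits: Iterable[str]) -> float:
    """Percentage of reference compounds that the platform also reported."""
    reference = set(reference_hits)
    if not reference:
        raise ValueError("reference result set is empty")
    found = reference & set(platform_hits)
    return 100.0 * len(found) / len(reference)


def false_negatives(platform_hits: Iterable[str],
                    reference_hits: Iterable[str]) -> Set[str]:
    """Compounds in the reference results that the platform missed."""
    return set(reference_hits) - set(platform_hits)


# Placeholder compound labels, not real screening results.
vendor = ["compound A", "compound B", "compound C", "compound D"]
platform = ["compound A", "compound B", "compound C", "compound X"]
accuracy = screening_accuracy(platform, vendor)
missed = false_negatives(platform, vendor)
```

With these placeholder lists the platform recovers three of the four reference compounds, and `missed` contains the one false negative.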

Model Optimization.
To solve these problems, the research team optimized the model in three technical directions. First, the number of selected spectra was reduced. Because each test file contained thousands of spectra, the more spectra were selected initially, the more screening results were obtained later, and the more difficult it was to select the best match. The efficiency of the screening model would also be greatly reduced if all the spectra were analyzed. Therefore, the number of spectra selected from the test files for each parent ion was reduced. Specifically, the spectra were sorted by total energy, and those with higher energy were selected for analysis. Before the optimization, at most 30 spectra could be selected for one parent ion; now at most 20 are selected. Second, both the similarity and the retention time difference (the difference between the retention time of the mass spectrum and that of the compared compound in the standard spectrum library) were taken into consideration, to avoid the bias of relying on a single factor.
Third, we increased the number of secondary fragment ions that must match. According to the EU analytical method guidelines, if two compounds have the same precursor ion and at least one identical secondary fragment ion, they are most likely the same compound. However, requiring so few shared fragment ions can affect the accuracy of model screening, and some isomers produce the same fragment ions [31,32]. Isomers can be distinguished effectively by requiring at least two identical secondary fragment ions in addition to the same parent ion.
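The first and third optimizations can be sketched as follows. The sketch assumes that "total energy" means the summed fragment intensity of a spectrum, groups spectra by parent-ion m/z rounded to three decimals, and uses a 5 mDa fragment tolerance; these choices, the function names, and the dictionary keys are assumptions for illustration rather than the platform's settings.

```python
from collections import defaultdict
from typing import Dict, List, Sequence

MAX_SPECTRA_PER_PRECURSOR = 20   # reduced from 30 during optimization
MIN_SHARED_FRAGMENTS = 2         # raised from 1 to separate isomers


def select_strongest_spectra(spectra: Sequence[Dict],
                             max_per_precursor: int = MAX_SPECTRA_PER_PRECURSOR,
                             decimals: int = 3) -> List[Dict]:
    """Keep only the most intense spectra for each parent ion."""
    groups = defaultdict(list)
    for spec in spectra:
        groups[round(spec["precursor_mz"], decimals)].append(spec)
    selected = []
    for group in groups.values():
        # Assumption: "total energy" is the summed fragment intensity.
        group.sort(key=lambda s: sum(i for _, i in s["fragments"]), reverse=True)
        selected.extend(group[:max_per_precursor])
    return selected


def shared_fragment_count(unknown_mz: Sequence[float],
                          library_mz: Sequence[float],
                          tol_da: float = 0.005) -> int:
    """Count library fragments matched by some unknown fragment within tol_da."""
    return sum(
        any(abs(u - lib) <= tol_da for u in unknown_mz) for lib in library_mz
    )


def passes_fragment_rule(unknown_mz: Sequence[float],
                         library_mz: Sequence[float],
                         min_shared: int = MIN_SHARED_FRAGMENTS,
                         tol_da: float = 0.005) -> bool:
    """Require at least min_shared common fragments (parent ion already matched)."""
    return shared_fragment_count(unknown_mz, library_mz, tol_da) >= min_shared
```

Capping the spectra per parent ion bounds the number of similarity calculations per file, and the two-fragment rule rejects candidates that share only a single fragment with the unknown.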

Model Revalidation.
After optimizing the model, the team verified the screening model again by uploading the same eight high-resolution test files for screening comparison. The screening results show that the proportion of compounds successfully identified by the model increased to 97.29%. The comparison of the two sets of screening results is shown in Table 2.

Conclusion
The platform established in this paper has become stable after several rounds of model optimization and testing. Currently, the first phase of platform construction has been basically completed, and the platform has entered small-scale trials. The trials so far show that, by using this database, more than 300 banned and restricted compounds have been discovered in actual food samples from daily monitoring and inspection. The platform has shown a higher screening and identification capability for unknown compounds. Work will continue to add compound data to the standard spectrum library; to further expand the scope of screening; and to promote co-construction, sharing, and verification through cooperating laboratories. The construction of a food risk substance screening platform based on high-performance liquid chromatography/high-resolution mass spectrometry, the Internet, big data, and other technologies provides a new technical means for food safety risk management and control. It also builds a bridge between the screening and the risk assessment of risk substances and illegally added substances. It facilitates full-chain online risk screening across food production and circulation, and it provides solid technical support for the intelligent supervision and inspection of food safety. It is reasonable to expect that this technology platform has wider application prospects. Combining computer technology and spectrogram technology to create an online, real-time spectrogram screening and comparison platform that is not limited by instrument brand is a new exploration. The approach can be applied not only in the food industry but also in industries such as cosmetics, chemicals, and the environment, to establish online spectrogram screening and comparison systems that serve risk management and control in each of these industries.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.