Machine Learning Integrated Multivariate Water Quality Control Framework for Prawn Harvesting from Fresh Water Ponds

Water contamination, temperature imbalance, feed, space, and cost are key issues that traditional fish farming encounters. The aquaculture business still confronts obstacles such as the development of improved monitoring systems, the early detection of outbreaks, enormous mortality


Introduction
Water quality monitoring is considered crucial for fish farming. Several studies have found that measuring dissolved oxygen is crucial to avoid extreme water quality values, which may result in serious harm to fish such as anoxia, hyperoxia, and hypoxia [1]. The term "water quality monitoring" refers to the process of collecting samples of water and analysing them. To assess whether efforts to clean up waterways are succeeding, it is crucial to monitor the quality of the water. Monitoring indicates the condition and make-up of streams, rivers, and lakes both in the present and over the course of days, weeks, and years. The five parameters of dissolved oxygen, pH, temperature, salinity, and nutrients are used to gauge the quality of water. In aquaculture, more intelligent approaches to 24 × 7 monitoring of water quality and precise feeding are unavoidable. However, the bacterial balance in the aquaculture environment may be disrupted as a result of 24 × 7 monitoring of water quality, thereby reducing the disease-resistant capabilities of fish [2]. Traditional aquaculture farming relied on experienced aquafarmers' observation and empirical judgement to identify and forecast farm health risks. One element of this is monitoring changes in water quality factors, namely dissolved oxygen, pH, temperature, and salinity, as well as others that are known to have a negative impact on the aquaculture environment. For optimum fish farming, lakes and fisheries must maintain ideal amounts of dissolved oxygen. Fish producers should concentrate on steps to guarantee that sufficient amounts of dissolved oxygen are maintained, in addition to feeds and fertiliser. Such steps include keeping the pond free of undesirable matter, sprinkling clean water from the upper reaches, and taking other precautions; withholding feed and fertilisers until the condition improves; and using oxygen-boosting medications as directed by fishing specialists.
The temperature and pH of the pond water must be monitored to ensure the balance between hazardous and nontoxic nitrogen molecules such as ammonia and ammonium [3]. In a low-lying nation, natural disasters such as floods and cyclones have a substantial impact on aquaculture in both ponds and marine areas. Even small fluctuations in water quality parameter values above or below the typical, ideal range can cause physiological stress in aquatic life, affecting eating, breeding, and disease susceptibility [4]. Aquaculture and fishing are two of the most popular activities in coastal areas around the world. Additionally, given their vulnerability to climatic factors that endanger the economic stability of fishing communities that depend on fish for food security and income, these activities take on greater importance in the context of climate change. Recent research has shown evidence of the harmful consequences of climate change on corals, including coral bleaching and changes to organism variety and composition, as well as on fish populations and aquaculture production [5][6][7].
In recent times, the use of AI in the domains of health, manufacturing, agriculture, and academia has grown manyfold [8][9][10]. Blockchain, IoT, and WSAN are well known and popular among researchers because of the advantages they offer at low cost [11][12][13]. Industry 4.0 and 5G/6G telecommunication have revolutionized applications in the fields of health, manufacturing, agriculture, and academia. The domain of aquaculture has not remained untouched by the effects of the Industry 4.0/5.0 and 5G/6G revolutions [14]. Industry 4.0 is transforming how businesses produce, enhance, and sell their goods. The Internet of things (IoT), cloud computing, statistics, AI, and machine learning are among the cutting-edge technologies that companies are incorporating into their manufacturing processes. People will experience the effects of the 5G revolution as it spreads. 5G, which is planned to offer faster speeds, more capacity, and lower latency, is anticipated to be the driving force behind future development. Increased speeds, in particular, can provide new opportunities for commerce and community security. In smart fish farming, integrated AI, especially machine learning (ML) and deep learning (DL), presents both new potential and obstacles for information and data processing [14]. Using the IoT, big data for data storage, cloud computing for remote processing, and artificial intelligence, as well as other current information technologies, aquaculture has been able to make better use of resources and improve long-term sustainability [15]. Traditional freshwater fish farming practices still use vast ponds with no water movement, no drainage, and no bottom silt treatment, which frequently create circumstances that encourage disease. The close quarters of millions of fish in their enclosed environment are the root of many worries regarding fish farming.
Solid wastes, such as feces, kitchen waste, and jellyfish, are dumped (often unprocessed) into the nearby waters, where they contribute to the pollution of the water supply. It is now possible to collect real-time data, make quantitative decisions, use intelligent controls, make exact investments, and provide tailored service. For the development of water quality parameter prediction models, several types of ML approaches and methodologies have been investigated. In recent years, reliable ML models for estimating variables like nitrites and ammonia, as well as forecasting variables like dissolved oxygen, pond temperature, and pH, have been developed. The issues faced by different sectors, including agriculture, such as crop harvesting, irrigation, soil composition sensitivity, crop scouting, weeding, and foundation, are managed by AI-based technology, which also helps to increase productivity across all sectors. In the field, AI technology aids in the diagnosis of pests, illnesses, and malnutrition. AI sensors can also detect and identify weeds. AI-based methodology is also used to classify diseases, segment the affected areas, and diagnose ailments.
The current study aims to determine the impact of five quality of water (QOW) factors in distinguishing high- and low-performance ponds (in terms of harvesting performance), as well as how fluctuations in QOW variables observed throughout the growing season influence final harvesting factors such as growth and yield. Neural networks (NN), support vector machines (SVM), k-nearest neighbours (kNN), logistic regression (LR), Gaussian Naive Bayes (NB), decision trees (DT), random forests (RF), and AdaBoost are among the machine learning techniques used to categorise ponds [16]. By taking into account both linear and nonlinear correlations between QOW components and the result of prawn production, the QOW variables over the course of the prawn-growing season and their value for prawn production are examined. The five QOW variables are temperature, DO, salinity, pH, and turbidity. Mutual information feature selection, correlation-based attribute selection, and ReliefF have all been utilised to discover factors impacting water quality for better animal development and productivity in ponds. Various applications, advantages, and drawbacks of machine learning and deep learning are shown in Figure 1.
The body of the paper is organised as follows: Section 2 provides a quick review of the literature on machine learning applications in agricultural systems. The third section gives an overview of the dataset that has been used. The ML techniques we used are described in Section 4. The experimental framework has been presented and discussed in Section 5. Finally, Section 6 summarises the key findings and concludes the work with recommendations for further research.

Related Work
The toxicity levels of pond water are linked to nitrogen compounds, electric conductivity, and alkalinity. The occurrence of hazardous ions also affects the pH of the pond's water. Many studies have focused on predicting dissolved oxygen, one of the most essential factors in ensuring the minimal levels of QOW necessary in fish farming practices. For aquaculture forecasting of dissolved oxygen (DO), Huan et al. [17] recommend combining GBDT and LSTM. The computation time of the whole method is decreased by selecting features that are highly correlated with dissolved oxygen as input data. When compared to DL-based prediction models such as BP, GBDT-LSTM, ELM, and PSO-LSSVM, as well as single LSTM prediction models, the suggested model has demonstrated better prediction performance and accuracy.
Shi et al. [18] propose a new Clustering-based Softplus Extreme Learning Machine approach (CSELM) for forecasting dissolved oxygen variation from time series data with high accuracy and efficiency. CSELM tolerates some data loss and unclear outliers in sensor time series, and it outperforms the PLS-ELM and ELM models in both accuracy and efficiency when predicting real-world dissolved oxygen content. Using the clustering technique, CSELM can endure sensor data quality issues and still obtain good accuracy and effectiveness. Another benefit of CSELM is that the Softplus ELM has better-optimised network performance, which increases predictive performance. For aquaculture, which demands sophisticated supervision and operation, reliable and efficient dissolved oxygen prediction from time series data is essential. Current prediction techniques are, nevertheless, challenged by nonlinear, continually generated data streams of dissolved oxygen [18]. In another comparable study, Csábrági et al. [19] created a nonlinear ANN for forecasting dissolved oxygen concentration in the Hungarian part of the Danube. It was also found that, when evaluating dissolved oxygen levels, pH is the most critical element.
A precise forecast of dissolved oxygen can aid farmers in taking the required actions to sustain dissolved oxygen levels suitable for healthy prawn growth. Rahman et al. [20] present a novel strategy in which a set of predictors is created, each of which forecasts a certain time stamp in the future. On the other hand, Liu et al. [21] investigated the efficiency of attention-based recurrent neural networks (RNN) in predicting dissolved oxygen in the short and long term. The authors also propose two attention-based RNN architectures, for capturing temporal correlations independently and for learning spatiotemporal relationships concurrently, that outperform state-of-the-art approaches. The findings reveal that attention-based RNNs can predict dissolved oxygen more accurately in both short- and long-term predictions. In recirculating aquaculture, the level of dissolved oxygen is a vital control indicator; its content and dynamic fluctuations have been found to have a significant influence on the healthy growth of aquatic livestock. It is vital to forecast dissolved oxygen concentration in advance to ensure the safety of aquaculture operations. Ren et al. [22] suggested a forecasting model based on deep belief networks to predict dissolved oxygen content. To analyse the original data space, a variational mode decomposition (VMD) data processing approach was used. The suggested model can predict DO concentration in time series rapidly and reliably, and its forecasting performance is equivalent to that of existing frameworks like AdaBoost, decision trees, CNN, and other similar models.
In fish farming, Zambrano et al. [23] introduced an ML model for predicting manually observed water quality. In cases where the number of measurements was restricted, the authors used RF, MLR, and ANN to assess data from water quality indicators that are regularly recorded in fish growth and farming. The suggested model achieves the goals of predicting and estimating unseen factors based on observable data. When the pond water variables are examined only two times per day, the model employs random forests to anticipate DO, pond temperature, pH, ammonia, and ammonium. In contrast to earlier studies in the literature, we use machine learning to detect the primary driving elements (for the measurement of QOW) that impact aquatic livestock development and productivity in commercial freshwater ponds over the grow-out period (in this study, the grow-out period was 190-210 days). The desire to reduce the cost of fish farming grows as the price of fish meal and inorganic fertilisers rises. This can be partially resolved by implementing a comprehensive farming system. To improve plant nutrient uptake, promote native fish development, and eventually boost fish production, fertilisers are added to fish ponds. The availability of natural food in pond water lowers the demand for synthetic feeds among fish, which in turn lowers productivity [24]. We use a series of filtering and attribute extraction methods to examine the effect of QOW variables throughout the growing season of prawns, as well as their value for prawn production, by taking into account both linear and nonlinear correlations between QOW factors and the outcome of prawn production.

Data Collection and ML Framework
The data for this study was gathered during a grow-out season at a well-known prawn farm in Australia. The amount of time spent in culture (DoC) varied between 190 and 200 days. The water quality data was taken from various ponds, each with a constant area of 10,000 square metres. DO, salinity, pH, turbidity, and temperature are the five QOW variables that were measured twice a day. For 135 days, turbidity and salinity were monitored once each day. Each QOW variable's weekly averages are considered over the last 135 days. For the different ponds, growth (average prawn mass) and yield at harvesting time were measured to categorise them into low-, medium-, and high-producing ponds. The classification process entails separating the ponds' performances based on all measured QOW characteristics. The ML technique used to solve this problem is depicted in Figure 2.
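The weekly averaging step can be illustrated with a minimal Python sketch. It assumes readings arrive as a chronological list with a fixed number of readings per day; the function name `weekly_averages` and the sample temperatures are hypothetical and are not taken from the study's dataset.

```python
from statistics import mean


def weekly_averages(readings, readings_per_day=2, days_per_week=7):
    """Collapse a chronological list of sensor readings into weekly means.

    A trailing partial week is averaged over the readings it contains.
    """
    per_week = readings_per_day * days_per_week
    return [
        mean(readings[start:start + per_week])
        for start in range(0, len(readings), per_week)
    ]


# Hypothetical twice-daily temperature readings covering two weeks (28 values).
temperature = [26.0, 28.0] * 7 + [27.0, 29.0] * 7
print(weekly_averages(temperature))
```

For a variable sampled once a day, the same function could be called with `readings_per_day=1`, so that every QOW variable ends up on a common weekly grid before it is passed to a classifier.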
The pond's weekly averages of QOW variables have been taken as input, while the pond's performance, considering the pond's class, growth, and yield, has been used as output (target). The input and output data were used by the classifier models, a series of complementary ML models that exhibit distinct learning behaviour. Different models have been employed to increase variety in learning the attributes linking QOW data and pond performance, with a focus on reducing over-fitting. Using 10-fold cross-validation, the classification performance of the various prediction models has been assessed. Figure 2 depicts the attribute extraction and selection approach that has been used to assess the value of each QOW characteristic individually and to produce a relative rating for pond performance differentiation. Correlation-based Feature Selection (CFS), Mutual Information (MI), and ReliefF (RLF) filter feature selection techniques were applied to the time series data of every QOW variable of all ponds independently. Based on data fluctuations, these algorithms calculate the relevance of every QOW variable. The capacity of every QOW variable to predict and differentiate between high- and low-performing ponds is expressed as a merit score. All QOW factors have been ranked according to their merit score. Overall harvest performance is linked to QOW factors at various points during the growing season. The dataset has been used to assess the impact of each QOW variable during each week of the prawn growth season. Each QOW variable's time series data has been fed into the MI, CFS, and RLF models. A 10-fold cross-validation approach was used to calculate the weekly influence. The aggregate merit score for each week aids in distinguishing between ponds that perform poorly and those that perform well. Characteristic features for QOW variables were defined as features with merit scores above the 95th percentile. This procedure was carried out independently for each QOW variable.
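The weekly merit-score procedure can be sketched in Python as follows. This is only an illustration under stated assumptions: it uses mutual information alone (CFS and ReliefF are omitted), it discretises each week's values by a median split, it applies min-max normalisation, and it omits the cross-validation step. None of these choices is confirmed by the text, and all function names and data are hypothetical.

```python
import math
from collections import Counter
from statistics import median


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def binarise(values):
    """Assumed discretisation: 1 if a value lies above the median, else 0."""
    m = median(values)
    return [int(v > m) for v in values]


def min_max(scores):
    """Rescale scores to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]


def percentile(scores, q):
    """q-th percentile using linear interpolation between order statistics."""
    ordered = sorted(scores)
    pos = (len(ordered) - 1) * q / 100
    lower = math.floor(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)


def characteristic_weeks(weekly_values, labels, q=95):
    """Return (1-based weeks whose merit exceeds the q-th percentile, merits).

    weekly_values[w][p] is the weekly mean of one QOW variable for pond p in
    week w; labels[p] is the pond's performance class.
    """
    raw = [mutual_information(binarise(week), labels) for week in weekly_values]
    merits = min_max(raw)
    cutoff = percentile(merits, q)
    return [w for w, m in enumerate(merits, start=1) if m > cutoff], merits


# Hypothetical weekly mean temperatures: three weeks, four ponds.
labels = [1, 1, 0, 0]  # 1 = high-performing pond, 0 = low-performing pond
weekly_temperature = [
    [30.0, 31.0, 25.0, 24.0],  # week 1 separates the two classes
    [27.0, 25.0, 26.0, 28.0],  # week 2 does not
    [26.0, 27.0, 27.0, 26.0],  # week 3 does not
]
weeks, merits = characteristic_weeks(weekly_temperature, labels)
print(weeks, merits)
```

In this toy example only week 1 is flagged, because it is the only week in which the temperature split lines up with the pond classes. Running the function once per QOW variable mirrors the per-variable treatment described in the text.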

Results
The experimental findings of identifying ponds as high- or low-performing as a function of the observed QOW factors are shown in Figure 3. A total of 5 runs of the tests have been completed for each model independently. F scores were used along with accuracy to assess the performance of the models, with the main weighting given to the F score. When yield is utilised as a performance indicator, NNs, SVMs, and NBs deliver the most accurate forecasts. Using growth as a performance indicator, high- and low-performing ponds have been identified. For growth metrics, a larger number of training data sets were employed, resulting in improved classification results. It may be inferred that all of the QOW factors have a significant impact on prawn production and that these QOW variables can be used to discriminate between ponds with high and poor production outcomes.
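The k-fold cross-validation used to evaluate the classifiers can be sketched as below. This is a minimal illustration, not the study's pipeline: a one-nearest-neighbour classifier stands in for the models listed in the paper, folds are contiguous and unshuffled, and the pond features and labels are hypothetical.

```python
import math


def k_fold_indices(n_samples, k):
    """Split indices 0..n_samples-1 into k contiguous, near-equal folds."""
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def nearest_neighbour_predict(train_x, train_y, query):
    """Label of the training point closest to `query` (Euclidean distance)."""
    best = min(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    return train_y[best]


def cross_validated_accuracy(features, labels, k=10):
    """Fraction of samples classified correctly when each fold is held out once."""
    correct = 0
    for fold in k_fold_indices(len(features), k):
        held_out = set(fold)
        train_x = [f for i, f in enumerate(features) if i not in held_out]
        train_y = [y for i, y in enumerate(labels) if i not in held_out]
        for i in fold:
            prediction = nearest_neighbour_predict(train_x, train_y, features[i])
            correct += int(prediction == labels[i])
    return correct / len(features)


# Hypothetical ponds described by (mean temperature, mean salinity).
features = [
    (29.0 + 0.1 * i, 32.0 + 0.2 * i) if i % 2 == 0 else (25.0 + 0.1 * i, 20.0 + 0.2 * i)
    for i in range(10)
]
labels = ["high" if i % 2 == 0 else "low" for i in range(10)]
print(cross_validated_accuracy(features, labels, k=10))
```

With ten samples and `k=10` this reduces to leave-one-out evaluation. On real data the samples would normally be shuffled, and ideally stratified by class, before the folds are formed.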
When growth is used as a harvest parameter, the average F score and accuracy were found to be 0.86 and 0.84, respectively. When yield is used as a harvest parameter, however, the average F score and accuracy were found to be 0.85 and 0.78, respectively, as shown in Figure 4. DT, NNs, and SVM outperform the other algorithms, producing the most accurate forecasts independently of the harvest metric. The time series data for each QOW variable has been supplied independently to the attribute selection algorithms (MI, CFS, and RLF) in order to evaluate its significance for the harvest result in terms of both prawn growth and yield. Figure 5 and Figure 6 depict the merit score of all QOW factors as well as their ranking based on both harvest measures. When growth is used as a harvest measure, the two most relevant QOW factors have been found to be temperature and salinity.
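For reference, accuracy and the F score can be computed from predicted and true pond classes as in the following sketch. The text does not state which F score variant was used; the sketch assumes the F1 score (the harmonic mean of precision and recall) with the high-performing class as the positive class. The labels are hypothetical.

```python
def accuracy_and_f1(y_true, y_pred, positive="high"):
    """Return (accuracy, F1) for a binary task with `positive` as the target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return accuracy, 0.0
    return accuracy, 2 * precision * recall / (precision + recall)


# Hypothetical true and predicted classes for five ponds.
y_true = ["high", "high", "high", "low", "low"]
y_pred = ["high", "high", "low", "high", "low"]
accuracy, f1 = accuracy_and_f1(y_true, y_pred)
print(accuracy, f1)
```

Here two of the three high-performing ponds are recovered and one low-performing pond is mislabelled, so precision and recall are both 2/3, which gives an F1 of 2/3 against an accuracy of 3/5.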
Temperature, dissolved oxygen, and salinity have much higher average merit scores than the other QOW factors, making them the most impactful QOW variables. When yield is used as the harvest parameter instead of growth, the most significant QOW factors are found to be temperature and salinity. The three QOW factors revealed have a strong predictive capacity to distinguish between poor- and high-performing ponds, which aids in keeping them within industry standard levels and so enhancing yield output. The influence of organism exposure on QOW factors has not been investigated in this study. Even though organisms' growth and survival may be affected by exposure, the focus is on the influence of QOW factors on growth and yield under ideal conditions. The influence of each QOW variable at every point of the time series data during the prawn growth season, from stocking to harvesting time, has been evaluated individually and is indicated in Figure 7. As observed, when growth is considered as the performance metric, the merit score is greater than 0.90 for temperature in the 2nd and 7th weeks and for salinity in the 19th and 20th weeks. On the other hand, when yield is the performance parameter, DO has a higher merit score in the 17th week, along with temperature in the 4th week. Figure 8 depicts the merit score for every week of the grow-out season, reflecting the relevance of the top-ranked QOW factors. The salinity difference between high- and low-performing ponds became increasingly noticeable towards the end of the season, thereby allowing high- and low-performing prawn-cultivating ponds to be distinguished. Temperature is related to metabolism in general, and higher levels indicate more development, as shown in Figure 8.
The temperature variations are greatest during the 2nd, 4th, and 7th weeks; hence, these weeks have been chosen by the feature selection algorithms as contributing the most to distinguishing between high- and low-performing prawn-cultivating ponds. With yield as the harvest metric, the maximum temperature has been observed during the 4th and 13th weeks and the minimum temperature in the 10th and 15th weeks. With growth as the harvest metric, the maximum temperature has been identified in the 7th week and the minimum temperature during the 15th week.

Discussion
Different QOW factors have a big impact on the development and survival of aquatic livestock. Earlier studies in the literature had not focused on defining the significant QOW factors or exploring how their fluctuations impact the development and survival of various aquatic livestock species. The machine learning algorithms presented here were tested on prawn development and yield, but they may readily be applied to other comparable biosystems. Other reported work describes changes in QOW characteristics and their impact on the development and survival of catfish, tilapia, and other livestock; the present study, with its greenhouse ponds, longer culture duration, and different species of prawn, is distinguishable from that work and novel. Forecasting QOW is also a critical responsibility for aquaculture farm managers. The findings of the presented study may be combined with QOW predictions from other studies to help establish an early warning system that aids farm managers in making better decisions.
Ponds that perform well and those that do not have been identified using growth as a performance measure. A conclusion that can be drawn is that all QOW variables have a significant impact on prawn output and can be utilised to distinguish between ponds with high and low production results. Temperature and salinity have been determined to be the two most important QOW parameters when growth is considered as a harvest indicator. The three QOW parameters can effectively discriminate between ponds that perform poorly and those that perform well, helping to keep them within acceptable ranges for the industry and so increasing yield production. The salinity difference between ponds with good and poor performances became more apparent as the season progressed.

Conclusion
We introduced a series of machine learning (ML) algorithms in this research to study how QOW factors influence the harvesting season outcome of aquatic livestock (prawn) in freshwater ponds. The proposed model outperforms other similar existing models in terms of pond classification accuracy and the ranking of QOW variables by their effect on the yield and growth of prawns in ponds. A data set obtained from a prawn harvesting farm has been used to achieve experimental results. Using the provided data-driven ML technique, it is feasible to properly distinguish high- and low-performing ponds. Of all the QOW factors, DO and salinity, along with temperature, are determined to have the most impact on performance during the harvesting season. Because optimum growth in the first few months might have a substantial impact on the final harvesting outcome, the temperature effect is expected to directly affect the harvesting season. The most crucial component in differentiating high- and low-performance ponds is determined to be the difference in DO and salinity in the final third of the grow-out season. In conclusion, machine learning techniques showed great promise for producing decision support for aquaculture producers in order to stimulate scenarios that lead to higher-performing ponds and avoid the circumstances leading to low harvest results for the prawn cultivating sector. Depending on the natural conditions, individual farms have distinct growth seasons, QOW requirements, customer requirements, and market concerns that influence management standards. The methods given in this work are data-oriented, and they can be used to run experiments and create findings using farm-specific data.
For predicting changes in dissolved oxygen content from time series data, we will propose the prediction model CSELM, which combines two novel techniques: k-medoids clustering based on DTW, which improves efficiency and precision by sensibly grouping input data and utilizing their common trends, and the Softplus input vector based on PLS, which enhances the ELM.

Data Availability
Data will be made available upon request to the corresponding author.

Conflicts of Interest
The authors declare that there are no conflicts of interest.