5G-Oriented IoT Big Data Analysis Method System

The application scope of 5G Internet of Things (IoT) technology and big data analysis technology keeps widening, bringing opportunities for the development of traditional enterprises and providing support for technological innovation in new enterprises. Based on 5G IoT technology and big data technology, this paper designs and studies an intelligent agricultural monitoring platform. Through this platform we collect crop growth data and monitor crop growth status in order to study a 5G-oriented IoT big data analysis method system. This paper studies the data collection and storage issues that arise in a large agricultural IoT data environment. It analyzes the specific sources of agricultural big data, the specific methods of data collection, and the available database storage technologies. Combining wireless sensor network technology, massive data processing technology, and distributed data storage technology, it proposes a method to solve the problem of rural IoT data collection and storage in the big data environment. This paper proposes a spatiotemporal block processing method, TSBPS, for storing raw sensing data. The method uses spatiotemporal preblocking, data compression, and caching to significantly improve the write speed of near real-time storage of micro-sensing data. In the experimental part of this article, the key parts of the IOT-HSQM system model that may limit storage or query performance are tested. The experimental results show that, compared with direct writing, TSBPS increases the maximum write speed by 79% and the average write speed by 42%. The IOT-HSQM system model can meet the requirements for write and query performance and for statistical analysis.


Introduction
From the emergence of 5G-oriented Internet of Things technology to today, when IoT and big data analysis technologies have entered almost every field of society, 5G-oriented IoT technology has developed rapidly and will become increasingly important in the future. IoT technology has also affected the agricultural sector, leading to major changes in production methods in many sectors. Research on a 5G-oriented IoT big data analysis system is therefore of great significance for strengthening China's application of 5G-oriented IoT technology [1,2].
Outside China, many scholars have studied big data analysis methods and achieved good results. For example, Gill et al. proposed a big data analysis system for the electric power domain. Its architecture combines the strengths of big data and emerging cloud computing technology and includes three key technologies, namely, a multidimensional index based on distributed grid files, an automatic SQL-to-HiveQL conversion tool, and a hybrid data storage model that supports data update operations [3,4]. Saxena et al. proposed a new entity discovery model based on statistical big data analysis in the field of science and technology, which analyzes various function sets and calculates the integrated function values of related technologies and agents; in addition, emerging models are discovered using three criteria (vision, executability, and activity) [5]. Based on big data technology, Gill et al. analyzed advanced operation management, maintenance scheduling, network planning, and asset management of the distribution network, studied an auxiliary decision-making platform for the distribution network, and described its application in situation analysis and load forecasting [3].
In China, scholars have also carried out research on big data analysis. Yang et al. proposed a general statistical database cluster mechanism (IOT-StatisticDB) for big data analysis in the Internet of Things. In IOT-StatisticDB, statistical functions are performed by statistical operators inside the DBMS kernel, so complex statistical queries can be expressed in standard SQL. In addition, statistical analysis is performed on multiple servers in a distributed and parallel manner, which greatly improves performance, as has been confirmed [6]. Lu et al. disclosed a microblog forwarding visual analysis system based on big data analysis technology, which draws visual graphics according to forwarding relationships and associates independent microblogs to form a topological relationship of texts [7]. Zhang et al. studied the application of data analysis such as sentiment analysis to combined IoT and SN data stored in SQL databases, and further studied algorithms and configurations that minimize the delay in data set processing and result retrieval [8]; however, this approach has high configuration and equipment requirements and needs precise data, so its cost is relatively high.
This article explores how big data affects the construction of an open IoT ecosystem, focusing on the importance of big data to the Internet of Things, on an open IoT ecosystem framework with advanced communication and analysis technologies, and on the integration of big data and the Internet of Things, and it examines the big data technology framework for the new applications that result. The development and application of 5G-oriented IoT technology in the agricultural field has gradually begun to spread, and the concept of the 5G-oriented agricultural Internet of Things has been introduced. Based on the ZigBee protocol, sensor data is collected on a computer, where it can be stored in a database and displayed on the interface in real time [9,10]. The data transmitted from the serial port to the computer is received by a program and stored in the HBase database, which can store this data in the table format we designed. Existing relational data can be migrated into HBase in two ways: one is table conversion, and the other is using application tools such as Sqoop and Kettle. Sqoop supports migration from a variety of relational databases. There are several methods for converting traditional relational tables into HBase tables, such as basic conversion, segment conversion, and embedded conversion [11,12]. Basic conversion maps the fields of a traditional table to columns in the column families of the HBase table. Before migrating data, the table structure model must be designed in advance. Here, we mainly use the Sqoop tool for data migration.
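To make the basic conversion concrete, the sketch below turns one relational row into an HBase row key plus column-family cells. It is a minimal illustration under stated assumptions, not the paper's migration code: the column names, the "|"-joined composite row key, and the single column family "env" are hypothetical. The result is a dict keyed by b"family:qualifier" with bytes values, which is the shape HBase Python clients such as happybase accept in Table.put.

```python
from typing import Dict, List, Tuple

def to_hbase_cells(row: Dict[str, object], key_cols: List[str],
                   family: str) -> Tuple[bytes, Dict[bytes, bytes]]:
    """Basic conversion of one relational row into an HBase row key and cells.

    The key columns are joined into a composite row key; every other column
    becomes a qualifier inside a single column family.
    """
    row_key = "|".join(str(row[c]) for c in key_cols).encode("utf-8")
    cells = {
        f"{family}:{col}".encode("utf-8"): str(val).encode("utf-8")
        for col, val in row.items()
        if col not in key_cols
    }
    return row_key, cells

# Hypothetical sensor reading row from a relational table
key, cells = to_hbase_cells(
    {"sensor_id": "S01", "ts": "2021-06-01T08:00", "temperature": 25.1, "humidity": 70},
    key_cols=["sensor_id", "ts"],
    family="env",
)
```

Putting the sensor identifier first in the row key keeps all readings of one sensor adjacent, which suits HBase's row-key range scans.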

Agricultural Internet of Things System.
Agricultural Internet of Things technology uses various sensors and other detection equipment to obtain field information in agriculture, forestry, and animal husbandry settings such as cultivated land, horticultural parks, and aquaculture, and transmits this data stably and continuously through a reliable transmission channel. The server receives a large amount of information and performs calculation, analysis, and integration, and terminal devices in the field finally display the data visually. Operators can make corresponding adjustments through the IoT agricultural system to maximize agricultural production profits [13,14]. The agricultural Internet of Things is thus an IoT application system that covers all aspects of agriculture, such as seedling selection, planting, fertilization, harvesting, production, processing, and circulation. The platform's business-level technologies are organized in a multilayer structure from low to high, with the following five levels: (1) Transmission layer: it consists of wireless communication (long-distance and short-distance) and wired communication (industrial Ethernet ring network). The backbone network mainly uses wired communication, and the industrial Ethernet ring network is mainly used in the industrial field to ensure reliable network transmission. Equipment-level communication technologies include ZigBee, RF433, and RS232, which provide reliable, convenient, and fast data transmission and exchange.
(2) Business layer: it holds all types of information used in the growth factor monitoring and quality traceability system, including basic environmental data, business management information, geographic information, and video information. The quality inspection and quality traceability database of IoT applications needs to be extended and improved by providing basic services such as business intelligence (BI), collaboration technology, and workflow [15,16]. (3) Application layer: it is divided into multiple entrances according to role, namely, regulators, producers, suppliers, channel providers, and consumers. (4) User layer: it supports queries from multiple terminals on Android, PC, iOS, and other operating systems, and it applies information security methods covering data storage, access, and transmission security; these security systems and technologies ensure safe access to information under malicious attacks and protect data security and integrity. The main key technologies are information security technology and system security technology in network integration and heterogeneous system environments.

HBase Sensor Network Data Storage System Design
Copy the downloaded HBase package to one of the nodes and unpack it with the tar -zxvf hbase-bin.tar.gz command. Then add the Java environment variable to the hbase-env.sh file and disable HBase's bundled ZooKeeper with export HBASE_MANAGES_ZK=false. Modify the configuration by editing hbase-site.xml, and edit the regionservers file in the configuration directory to add node information [17,18]. Copy HBase to the other nodes to complete the installation. Among them, slaver1 is the HMaster and the other two nodes are HRegionServers.
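The configuration steps above can be sketched as a small Python helper that renders the files involved. The host names other than slaver1, the HDFS address, and the port are hypothetical; hbase.rootdir, hbase.cluster.distributed, and hbase.zookeeper.quorum are standard HBase properties, and HBASE_MANAGES_ZK=false is the hbase-env.sh setting that stops HBase from managing its own ZooKeeper.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# hbase-env.sh line that makes HBase use an external ZooKeeper ensemble
HBASE_ENV_LINE = "export HBASE_MANAGES_ZK=false"

def render_hbase_site(props: Dict[str, str]) -> str:
    """Render hbase-site.xml content: one <property> element per name/value pair."""
    conf = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(conf, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(conf, encoding="unicode")

def render_regionservers(hosts: List[str]) -> str:
    """The regionservers file lists one RegionServer host per line."""
    return "\n".join(hosts) + "\n"

# Hypothetical cluster: slaver1 runs HMaster, slaver2/slaver3 run HRegionServers
site_xml = render_hbase_site({
    "hbase.rootdir": "hdfs://slaver1:9000/hbase",
    "hbase.cluster.distributed": "true",
    "hbase.zookeeper.quorum": "slaver1,slaver2,slaver3",
})
regionservers = render_regionservers(["slaver2", "slaver3"])
```

Generating the files from one dictionary makes it easier to copy an identical configuration to every node, which the installation step requires.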

Storm Cluster Construction.
Unpack Storm and place the installation files in the installation directory, then edit storm.yaml: add the ZooKeeper node information to storm.zookeeper.servers, add the Nimbus node name to nimbus.seeds, set the local storage path in storm.local.dir [19], and add the names of the nodes running the DRPC service to drpc.servers. Then copy the Storm installation directory to the remaining nodes to complete the installation.
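The storm.yaml edits can likewise be sketched as a helper that writes the four keys named above as YAML block sequences. The host names and the local directory are hypothetical; the keys themselves are standard Storm configuration entries. Only the standard library is used, so the YAML is built by hand rather than with a YAML package.

```python
from typing import List

def _block_list(key: str, items: List[str]) -> List[str]:
    """YAML block sequence: the key on one line, one quoted item per following line."""
    return [f"{key}:"] + [f'  - "{item}"' for item in items]

def render_storm_yaml(zk_servers: List[str], nimbus_seeds: List[str],
                      local_dir: str, drpc_servers: List[str]) -> str:
    lines = _block_list("storm.zookeeper.servers", zk_servers)
    lines += _block_list("nimbus.seeds", nimbus_seeds)
    lines.append(f'storm.local.dir: "{local_dir}"')
    lines += _block_list("drpc.servers", drpc_servers)
    return "\n".join(lines) + "\n"

storm_yaml = render_storm_yaml(
    zk_servers=["slaver1", "slaver2", "slaver3"],  # hypothetical ZooKeeper hosts
    nimbus_seeds=["slaver1"],                       # hypothetical Nimbus host
    local_dir="/data/storm",                        # hypothetical path
    drpc_servers=["slaver1"],                       # hypothetical DRPC host
)
```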

Designing Intermediate Data Query.
The fast statistical analysis model ensures effective statistical analysis of the loaded data, but before statistical analysis, querying and loading the data depend on the query speed of the storage system. Therefore, it is particularly important to increase the speed at which sensing data is retrieved from the storage system.

Establishing an Index for Intermediate Data Stored on HBase.
Druid's built-in index supports fast lookup in many cases and will not become an obstacle to fast statistical analysis. Since HBase only supports fast scanning by row key, for intermediate sensing data stored in HBase, a mapping between the non-row-key fields used in common queries and the row keys must be established [20,21]. As with the HBase index handling at the micro-sensing data level, the index is stored on an index server, which uses HBase and Redis to store index data: HBase stores all index data, and Redis caches it temporarily. Because Redis is an in-memory cache with fast access, frequently accessed index data is kept in Redis. When a data query runs, the index is first looked up in Redis, which enables fast lookup and saves query time. If the corresponding index data is not in the Redis cache, HBase is searched and the index cache on the index server is updated asynchronously.
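The two-tier index lookup described above (cache first, full index on a miss, asynchronous cache update) can be sketched as follows. This is a minimal sketch under stated assumptions: plain dicts stand in for Redis and for the HBase index table, a background thread stands in for the asynchronous refresh, and the class name and key formats are hypothetical.

```python
import threading
from typing import Dict, List, Optional

class IndexServer:
    """Two-tier index: a full index (HBase stand-in) plus a hot cache (Redis stand-in)."""

    def __init__(self, full_index: Dict[str, str]) -> None:
        self.full_index = full_index                 # stand-in for the HBase index table
        self.cache: Dict[str, str] = {}              # stand-in for Redis
        self.pending: List[threading.Thread] = []    # outstanding async cache refreshes
        self._lock = threading.Lock()

    def _refresh(self, field_value: str, row_key: str) -> None:
        with self._lock:
            self.cache[field_value] = row_key

    def lookup(self, field_value: str) -> Optional[str]:
        """Map a non-row-key field value to the HBase row key of the data."""
        with self._lock:
            hit = self.cache.get(field_value)
        if hit is not None:
            return hit                               # fast path: served from the cache
        row_key = self.full_index.get(field_value)   # cache miss: read the full index
        if row_key is not None:
            t = threading.Thread(target=self._refresh, args=(field_value, row_key))
            t.start()                                # update the cache asynchronously
            self.pending.append(t)
        return row_key
```

The caller gets its answer from the full index immediately on a miss; only the cache update is deferred, so a later query for the same value takes the fast path.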

Query Design Priority.
When designing a query, follow these principles: (1) Query the cache first; only if the cache does not hold the result, query the intermediate sensing data from the data storage system. (2) Query data in the intermediate sensing data storage system: for the same query requirement, if both Druid and HBase can serve it, query Druid first.
(3) If a statistical analysis result can be obtained from Druid through the fast statistical analysis model, obtain it from Druid first. (4) When both coarse-grained and fine-grained intermediate sensing data can answer a query, query the coarse-grained data first. (5) When searching for data or performing analysis and calculations on the intermediate sensing data system, execute mutually independent tasks in parallel.
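The query principles above can be read as a source-ordering rule plus parallel execution of independent subtasks. The sketch below is one hypothetical encoding of them: the QueryInfo fields and the source labels are illustrative names, not part of the described system, and a thread pool stands in for whatever parallel execution the system uses.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class QueryInfo:
    """Hypothetical description of a query; the field names are illustrative."""
    druid_can_answer: bool   # can Druid / the fast statistical model serve it?
    coarse_available: bool   # is coarse-grained intermediate data available for it?

def plan_sources(q: QueryInfo) -> List[str]:
    """Order in which data sources are tried, following principles (1)-(4)."""
    plan = ["cache"]                     # (1) cache first
    if q.druid_can_answer:
        plan.append("druid")             # (2)/(3) Druid before HBase
    if q.coarse_available:
        plan.append("hbase:coarse")      # (4) coarse-grained before fine-grained
    plan.append("hbase:fine")
    return plan

def run_independent(tasks: List[Callable[[], object]]) -> List[object]:
    """(5) Run mutually independent subtasks in parallel; results keep task order."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda task: task(), tasks))
```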

Building and Extracting Big Data
The rapid development of ICT and computer technology has made data resources an increasingly important factor of production. The Internet of Everything has led to explosive growth in data volume and to changes in the driving forces of society [22,23]. Data is the basis of information and knowledge, and its rapid growth has also led to explosive growth of information and knowledge in human society. Data mining is the basic technology of knowledge discovery: it is the process of discovering new relationships, trends, and patterns hidden in data by processing and analyzing large amounts of it, extracting information and knowledge that were not known in advance. Therefore, the information obtained through data mining has three characteristics: it is previously unknown, of high practical value, and highly efficient. Data mining extends the use of data from basic indexing and querying to advanced applications such as decision-making knowledge, analysis, and prediction, which reflects the essence of mining data in depth.

Data Preprocessing.
Data preprocessing is the first and most important step of the entire data mining process, because high-quality data is an important prerequisite for successful data mining. First, the goal of data mining and the business direction of its application must be clearly defined; in other words, determine what important knowledge and information can be discovered from a large amount of data [24,25]. Some applications require close cooperation with domain experts or end users to clarify the requirements of the actual application scenarios. After the mining target is determined, the corresponding database and data warehouse for the target must be collected. Once the target data is determined, it must be processed (for example, by noise removal and data conversion) into high-quality data for mining. The subsequent data mining algorithms then operate on this database.
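The two preprocessing operations named above, noise removal and data conversion, can be sketched for a single sensor channel. This is an illustration under stated assumptions, not the paper's procedure: noise removal is taken to mean dropping missing values and readings outside a plausible physical range (the range is a hypothetical parameter), and data conversion is taken to be min-max normalization to [0, 1].

```python
from typing import List, Optional, Tuple

def preprocess(readings: List[Optional[float]],
               valid_range: Tuple[float, float]) -> List[float]:
    """Drop missing/out-of-range readings, then min-max normalize the rest to [0, 1]."""
    lo, hi = valid_range
    # Noise removal: discard missing values and physically implausible readings
    clean = [r for r in readings if r is not None and lo <= r <= hi]
    if not clean:
        return []
    mn, mx = min(clean), max(clean)
    if mx == mn:
        return [0.0 for _ in clean]  # constant channel: avoid division by zero
    # Data conversion: min-max normalization
    return [(r - mn) / (mx - mn) for r in clean]

# Hypothetical air temperatures in degrees Celsius; 999.0 is a sensor glitch
normalized = preprocess([20.0, None, 999.0, 30.0, 25.0], valid_range=(-40.0, 60.0))
```

Normalizing each channel keeps variables with large numeric ranges, such as light intensity in lux, from dominating distance-based algorithms like K-means.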

Data Mining.
Depending on the characteristics of the data and the requirements of the user or the actual operating system, specific algorithms are selected for data mining, such as prediction, classification, clustering, sequential pattern discovery, association rule discovery, or anomaly analysis. This article focuses on analyzing specific data mining algorithms.

Data Analysis and Screening.
If the data mining model is redundant or incomplete, or does not meet the user's needs, it should be discarded and the previous steps of the process repeated. If the model meets the verification criteria and user needs, the results should be analyzed and expressed in natural language. Models or relationships obtained from data mining may be redundant or irrelevant to the mining goal, in which case they should be eliminated. If the results do not meet the needs of users or the requirements of the mining task [26,27], it is necessary to reselect data sources and algorithms and set new parameter values for re-mining. The result of mining must ultimately be useful knowledge for users, so it must be visualized, and the final mining results are obtained after completion, induction, and evaluation.

Data Collection and Protocols.
In the Internet of Things, most data collection is done by edge devices, so called because they are usually located at the edge of the network, where people or things interact with the network. These devices receive the raw data and can retransmit it to the network. The Internet of Things is driven by economic value: it can significantly improve social and economic benefits and reduce social production costs.

Data Collection Protocol.
Although there is no unified protocol for creating an appropriate channel to receive data from edge devices, similar mechanisms have emerged in specific areas. In addition to a protocol that lets devices communicate with each other, a cloud capable of centralized analysis is needed. Some companies follow their own proprietary path, while others build on open-source software.

Data Integration and Management.
Data retention strategies are also important, because storing such huge volumes of high-resolution data is neither feasible nor necessary in many cases. Strategies include aggregating and sampling data and storing the source data accordingly, especially when the data has a fairly high level of redundancy. These data structures often provide various analysis tools and libraries.
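One simple sampling strategy for highly redundant sensor streams is a deadband filter: a reading is stored only when it differs from the last stored value by at least a threshold. The sketch below illustrates that idea; the deadband technique and the threshold value are assumptions chosen for illustration, not a strategy specified in the source.

```python
from typing import List, Optional, Tuple

Sample = Tuple[int, float]  # (timestamp, value)

def deadband_filter(samples: List[Sample], threshold: float) -> List[Sample]:
    """Keep a sample only if it differs from the last kept value by at least `threshold`."""
    kept: List[Sample] = []
    last: Optional[float] = None
    for ts, value in samples:
        if last is None or abs(value - last) >= threshold:
            kept.append((ts, value))
            last = value
    return kept

# Hypothetical greenhouse temperature trace, one reading per minute
trace = [(0, 20.0), (1, 20.1), (2, 20.2), (3, 21.0), (4, 21.05)]
retained = deadband_filter(trace, threshold=0.5)
```

Here only the first reading and the reading after the 1-degree jump are kept, so slowly varying channels shrink sharply while real changes are still recorded.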

Big Data Technology Framework.
Analyses with low-latency requirements, including real-time model development and iterative or interactive analysis, expose one of the main limitations of Hadoop in distributed settings. In these cases, especially when the data must be passed over many times (as in many machine learning algorithms), the Hadoop framework can be very expensive because of the underlying HDFS communication. The Spark framework, for example, was developed to solve these Hadoop problems and has since evolved into its own ecosystem.

Machine Learning and Data Mining.
At the organizational or application level, an IoT stack is needed. Its elements should include M2M-level data collection, data processing functions, Internet data sharing, insight and device delivery functions (potential functions), and decision-makers (automatic or manual). Many groups and companies define the nature and structure of this IoT stack, which can provide the core framework, platform, related services, and solution development for front-end applications as well as computing and back-end support.

System Hardware Path Test.
The intelligent agricultural system based on the Internet of Things contains several functional modules. To ensure that the units of the system connect properly, a communication program is designed to test whether each unit can be connected. The communication program mainly covers the following test points: gateway management, controller management, sensor management, and connection status display. In the system settings, configure the IP address and port number of the client and server so that communication can be established.
In the system manager, add the name and address of the system to be tested. In gateway management, enter the number of each gateway node in the system. In sensor management, enter the numbers of the sensors to be tested. Finally, switch to the login view: each device is shown in its channel, and the modules connect normally. Repeated tests confirmed that all parts of the system are connected and operate normally.
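The connection status check described above amounts to trying to open a connection to each configured IP address and port. The sketch below assumes the units accept TCP connections; that assumption, the unit names, and the timeout are hypothetical, and the actual communication program may use a different transport or handshake.

```python
import socket
from typing import Dict, Tuple

def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False

def check_units(units: Dict[str, Tuple[str, int]]) -> Dict[str, bool]:
    """Connection status for each named unit (gateway, controller, sensor node, ...)."""
    return {name: can_connect(host, port) for name, (host, port) in units.items()}
```

A call such as check_units({"gateway-1": ("192.168.1.10", 5000)}), with hypothetical addresses, returns one True/False entry per unit, which maps directly onto a connection status display.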

Comparative Experiment of TSBPS Writing Method.
To test the data write throughput of the micro-sensing data layer, different amounts of data are extracted from the historical sensor data and written experimentally. When writing, HDFS uses the IOT-HSQM directory structure and file naming method. We compare the average and maximum write speeds of writing data directly to HDFS with those of the TSBPS method, which caches the data and then writes it grouped into spatiotemporal blocks.
To ensure the accuracy of the test results, most of the original sensing data is temporarily held in memory, and before writing, the timestamp of each record is changed to the current time. The experiment measures the maximum write speed and the maximum average write speed of the two writing methods. The maximum write speed is the highest instantaneous write speed reached during writing; the maximum average write speed is the highest speed that can be sustained stably, which here means the highest speed maintained without exceeding the Redis cache threshold.
In the experiment, 12 simulated writing clients were deployed on 6 computers. A SocketServer is used to monitor the actual status of Redis and to uniformly control the upload speed and data volume of each simulated data terminal. For convenience of statistics, the number of transmitted records is used directly as the unit of written data volume.
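The core idea of the TSBPS write path (pre-blocking by space and time, buffering, compressing, then writing whole blocks) can be sketched in a single process. This is a hypothetical sketch, not the paper's implementation: an in-memory dict stands in for the Redis cache, a callback stands in for the HDFS file write, and the region identifiers, time-window length, flush threshold, and "region/window.json.z" file naming are all assumptions rather than the IOT-HSQM layout.

```python
import json
import zlib
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

BlockKey = Tuple[str, int]  # (region id, time-window index)

class SpatioTemporalBuffer:
    """Pre-block records by (region, time window), buffer them, compress, flush whole blocks."""

    def __init__(self, window_s: int, flush_records: int,
                 sink: Callable[[str, bytes], None]) -> None:
        self.window_s = window_s            # length of one time block in seconds
        self.flush_records = flush_records  # block size that triggers a write
        self.sink = sink                    # stand-in for writing one HDFS file
        self.blocks: Dict[BlockKey, List[dict]] = defaultdict(list)  # Redis stand-in

    def add(self, region: str, ts: int, record: dict) -> None:
        key = (region, ts // self.window_s)  # spatiotemporal pre-blocking
        self.blocks[key].append(record)
        if len(self.blocks[key]) >= self.flush_records:
            self._flush(key)

    def _flush(self, key: BlockKey) -> None:
        records = self.blocks.pop(key)
        payload = zlib.compress(json.dumps(records).encode("utf-8"))  # compress the block
        region, window = key
        self.sink(f"{region}/{window}.json.z", payload)  # one write per block

    def flush_all(self) -> None:
        for key in list(self.blocks):
            self._flush(key)

written: Dict[str, bytes] = {}
buf = SpatioTemporalBuffer(window_s=60, flush_records=2,
                           sink=lambda path, data: written.__setitem__(path, data))
buf.add("gh1", 0, {"t": 25.0})
buf.add("gh1", 30, {"t": 25.5})   # second record in window 0 triggers a flush
buf.add("gh1", 61, {"t": 26.0})   # starts window 1
buf.flush_all()
```

Compared with one write per record, this trades a short buffering delay for fewer, larger, compressed writes, which is the effect the experiment measures.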

Storm System Performance Experiment and Data Collection.
In the data source system, continuous data generation may produce too much data. In that case, the computing power of the Storm system will decline, and in severe cases the entire system may crash. To test whether the Kafka message queue in the system can temporarily cache the data when the data volume increases, multithreading is used to test its performance. Each thread produces the same amount of data, and the result is verified by comparing the number of threads with the computing performance.
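The multithreaded load pattern described above (several producer threads, each emitting the same amount of data into a buffer that a consumer drains) can be sketched with the standard library. This is only an in-process illustration: an unbounded queue.Queue stands in for the Kafka topic and a single consumer thread stands in for the Storm topology, so it shows the shape of the test rather than Kafka or Storm behavior.

```python
import queue
import threading
import time
from typing import List, Tuple

def run_buffer_test(num_producers: int, records_per_producer: int) -> Tuple[int, float]:
    """Producers push equal amounts of data while one consumer drains the buffer.

    Returns (records consumed, elapsed seconds).
    """
    buf: queue.Queue = queue.Queue()   # unbounded stand-in for a Kafka topic
    total = num_producers * records_per_producer
    consumed: List[int] = [0]

    def producer(pid: int) -> None:
        for i in range(records_per_producer):
            buf.put((pid, i))

    def consumer() -> None:
        for _ in range(total):
            buf.get()                  # blocks until data is available
            consumed[0] += 1

    start = time.perf_counter()
    reader = threading.Thread(target=consumer)
    workers = [threading.Thread(target=producer, args=(p,)) for p in range(num_producers)]
    reader.start()
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    reader.join()
    return consumed[0], time.perf_counter() - start
```

Calling run_buffer_test(n, 10000) for n = 1, 2, 4, 8 and plotting elapsed time against total records gives the kind of thread-count comparison the experiment uses; a roughly linear curve indicates that the buffer absorbs the bursts.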

Application Analysis of K-means Clustering Algorithm in Plant Growth Condition Clustering.
We ran the experimental data through 15 iterations, wrote the result of each iteration to the out folder, and checked the files to obtain the final cluster centers and clustering results. For each of the five sets of test data, about 15 iterations were needed to reach the final clusters, and the samples within each cluster were similar and compact. When the K-means MapReduce program runs, each iteration generates new cluster centers, which are used in the next iteration; finally, all the data is divided into 4 clusters and the results are written to the out folder. Table 1 shows the centers of the 4 clusters. As shown in Table 1, the optimal cluster indicates that the suitable daytime temperature for tomatoes is 24-26 °C. When the temperature and humidity are high, the light intensity is between 30,000 and 40,000 lux. The relative humidity is 65%-85%, the relative air humidity is 45%-65%, the pH is 6-7, and the concentration of carbon dioxide in the air is 300 microliters/liter. Samples close to this center are clustered into this category, and plants grow well under these conditions. We can therefore adjust the conditions of different greenhouses toward these approximate ranges to improve plant growth. Under other conditions, we try to bring the environment into these ranges by heating, increasing the carbon dioxide concentration, and increasing light intensity.
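The iteration described above can be sketched in plain Python with the map step (assign each point to its nearest center) and the reduce step (average the points of each cluster) written out explicitly. This is not the paper's MapReduce job: it runs in one process, the initial centers are drawn at random with a fixed seed, and the example points are hypothetical. In a Hadoop job the two steps would be the mapper and the reducer, with one job per iteration.

```python
import math
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]

def nearest(p: Point, centers: Sequence[Point]) -> int:
    """Index of the center closest to p (Euclidean distance)."""
    return min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))

def kmeans_mapreduce(points: List[Point], k: int, iterations: int = 15,
                     seed: int = 0) -> List[Point]:
    rng = random.Random(seed)
    centers: List[Point] = rng.sample(points, k)
    for _ in range(iterations):
        # Map: emit (cluster id, point) for every point
        groups: Dict[int, List[Point]] = defaultdict(list)
        for p in points:
            groups[nearest(p, centers)].append(p)
        # Reduce: new center = mean of its points; keep the old center if the cluster is empty
        centers = [
            tuple(sum(col) / len(groups[i]) for col in zip(*groups[i])) if groups[i] else centers[i]
            for i in range(k)
        ]
    return centers

# Hypothetical 2-D points forming two well-separated groups
pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers = sorted(kmeans_mapreduce(pts, k=2))
```

For greenhouse data, each point would be a vector such as (temperature, humidity, light, pH, CO2), preferably normalized first, with k = 4 as in the experiment.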
This article compares the computation time of the cluster with that of a single machine. The speed-up of the cluster over a single machine becomes more pronounced as the number of computing nodes increases. However, when the amount of data is small, the time needed to initialize the MapReduce computation cannot be ignored, and the cluster takes longer than a single machine. This extra time comes from the overhead of the Hadoop library: before starting any map or reduce task, the library performs many checks, but once started, Hadoop's mappers and reducers run at full speed. A single machine is therefore more suitable for small data volumes, as shown in Figure 1.
The results are shown in Figure 1. For the first data set, the cluster takes more time than the single machine, and its time changes little as the number of nodes increases. For the second data set, the cluster still takes more time than the single machine, and relatively more than for the first data set, but the cluster time decreases as the number of nodes increases. For the third data set, both the cluster time and the single-machine time increase, but the cluster time is less than the single-machine time, and it drops faster as the number of nodes increases.
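The crossover seen in Figure 1 can be stated with the usual speed-up ratio. As a simplified model (an assumption, not a fit to the paper's measurements), let the cluster time on p nodes be a fixed initialization overhead plus the computation divided across the nodes, and let the single-machine time be the computation alone:

```latex
S_p = \frac{T_1}{T_p}, \qquad
T_p \approx T_{\mathrm{init}} + \frac{T_{\mathrm{comp}}}{p}, \qquad
T_1 \approx T_{\mathrm{comp}}
\quad\Longrightarrow\quad
S_p < 1 \iff T_{\mathrm{init}} > T_{\mathrm{comp}}\left(1 - \frac{1}{p}\right)
```

Under this model the cluster is slower whenever the startup overhead exceeds the computation it saves, which happens for small data volumes; as the data grows, the computation term dominates and the speed-up approaches p.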

Storm System Performance Test and Analysis.
During continuous data generation by the data source system, the data volume may become excessive. The computing power of the Storm system then declines, and in severe cases the entire system may crash. To test whether the Kafka message queue in the system can temporarily cache data when the data volume surges, multithreading is used to test its performance. Each thread produces the same amount of data, and the result is verified by comparing the number of threads with the computational efficiency. Part of the data is shown in Figure 2. Data is sent repeatedly to reach the required data volume.
The Storm system accumulates the abnormal values in the sent data, which is used to test Storm's performance.
As Figure 2 shows, as the amount of data increases, Storm's computation time curve does not grow geometrically but is approximately a straight line. Therefore, the increase in the amount of data does not weaken Storm's computing power, which shows that, with the Kafka component in place, data is effectively cached and the Storm system runs smoothly and efficiently.
For the read performance test, when 10,000, 100,000, and 1 million records have been written in the data table, test the query of 10, 100, and 1,000 records from the data table, respectively; the required system response time is shown in Figure 3.

Mobile Information Systems
As shown in Figure 3, the response time of data queries generally stays at the millisecond level. For a data table with a fixed number of stored records, the response time increases as the number of queried records increases. When the number of queried records is the same but the number of stored records differs, the response time increases as the total number of stored records increases.

Experimental Analysis of the Writing Method of the Microsensing Data Layer.
The micro-sensing data layer writes data with the TSBPS method. Its maximum instantaneous speed depends on data compression and the service capacity of the Redis cache. The maximum average speed is the speed at which data can be received while the amount of data cached in Redis stays basically stable; it is limited by data compression and by the speed at which Redis writes to HDFS. In theory, the speed of writing from Redis to HDFS can approach the maximum write speed of HDFS, and since the written data is compressed, the effective write speed relative to the original data is further increased by the compression ratio. The experimental results are shown in Figure 4.
As Figure 4 shows, compared with direct writing, the TSBPS method increases the maximum write speed by 79% and the average write speed by 42%. The write speed of the TSBPS method is therefore much higher than that of direct writing, and caching data and then writing it in blocks meets the high-throughput requirement better than writing directly to HDFS.
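For reference, the reported gains are relative increases over direct writing, and the compression argument can be written as a simple upper bound. In the notation below (introduced here for illustration), v denotes write speed in raw records per second, S denotes data size, and the bound on the effective raw-data speed is an idealized reading of the statement that writing speed improves according to the compression ratio, not a measured relation:

```latex
\Delta_{\max} = \frac{v^{\max}_{\mathrm{TSBPS}} - v^{\max}_{\mathrm{direct}}}{v^{\max}_{\mathrm{direct}}} \times 100\% = 79\%,
\qquad
\Delta_{\mathrm{avg}} = \frac{v^{\mathrm{avg}}_{\mathrm{TSBPS}} - v^{\mathrm{avg}}_{\mathrm{direct}}}{v^{\mathrm{avg}}_{\mathrm{direct}}} \times 100\% = 42\%

v_{\mathrm{raw}} \lesssim r \cdot v_{\mathrm{Redis \to HDFS}} \le r \cdot v_{\mathrm{HDFS}},
\qquad r = \frac{S_{\mathrm{raw}}}{S_{\mathrm{compressed}}}
```

So a compression ratio of r lets the layer absorb up to roughly r times as much raw data as HDFS alone could accept, provided Redis can keep pace.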

Conclusions
This paper designs an agricultural data processing model for data in the agricultural field, represented by sensor data. First, the characteristics and types of agricultural information are determined; then the system is divided into submodules that implement the functions of the platform, covering the collection, transmission, storage, and processing of agricultural field data. The data interface unit and data loading unit are based on wireless sensor networks. The data storage unit is based on the distributed file system and on table structures for storing sensor data, together with a file management module; web-based queries are served by the data query, user access, and user management modules. Finally, the data in the database is analyzed in a distributed manner.
Based on research and analysis of the agricultural Internet of Things, this paper studies the design of data collection and storage for the Internet of Things in a big data environment. To collect data from the sensor network in the agricultural Internet of Things, the Storm real-time big data processing system and HBase data storage were analyzed and studied. With the continuous development of industrial technology, the volume of data and the size of system clusters will continue to grow.
New technical problems will then arise. In developing sensor networks, more reliable, stable, and efficient structures can be explored, and data fusion at network nodes can be used to reduce the energy consumption of data transmission. In the big data processing system, further research is needed on load balancing and processing efficiency, and the application of the system to data collection and data mining needs further improvement. Offline data processing also requires further study.
This paper proposes a spatiotemporal block processing method, TSBPS, for storing raw sensing data. The method uses spatiotemporal preblocking, data compression, and caching to significantly improve the write speed of near real-time storage of micro-sensing data. The sensing data layer creates indexes for the original sensing data files in HDFS and stores the cleaned effective data in HBase, with indexes kept on the index server, so that both the original sensing data and the cleaned effective data can be queried efficiently.
Data Availability
The data that support the findings of this study are available from the corresponding author upon reasonable request.

Conflicts of Interest
The authors declare no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.