5G Edge Computing Enabled Directional Data Collection for Medical Community Electronic Health Records

It is important to promote the development and application of hospital information systems, community health service systems, and related platforms. However, intercommunication between these information systems is difficult to realize because in-depth management of health information is still insufficient. To address these issues, we design a 5G edge computing-assisted architecture for the medical community. Then, we formulate the directional data collection (DDC) problem, which gathers EMR/EHR data from the medical community so as to minimize the service error under the data collection deadline constraint. Moreover, we design the data direction prediction algorithm (DDPA) to predict the data collection direction and propose the data collection planning algorithm (DCPA) to minimize the data collection time cost. Through numerical simulation experiments, we demonstrate that our proposed algorithms can decrease the total time cost by 62.48% and improve the data quality by 36.47% through the designed system.


Introduction
Recently, the core of smart medicine has been the construction of hospital information platforms based on electronic medical records and regional health information platforms based on residents' electronic health records [1]. Electronic health records [2] can reduce information asymmetry between doctors and patients and satisfy the demand-oriented development of medical service reform. The construction of electronic health records is the focus of future smart medicine [3], which can not only satisfy the diverse requirements of medical and health reform but also accelerate and strengthen the development of information technology in medical and health institutions. The vigorous development of information technology in China's medical and health industry has driven the development and application of the hospital information system (HIS), the community health service system (CHSS), and other information systems. However, intercommunication between these information systems is difficult to realize because in-depth management of health information is still insufficient.
The National Health Commission of the People's Republic of China [4] is committed to promoting the 5G-assisted medical action project. On the basis of medical and health information, the project can promote not only the implementation of hierarchical diagnosis and treatment but also the reform of public medical institutions and the public health management mode. 5G-assisted medical care aims to improve the efficiency of medical institutions and integrate high-quality medical resources such as electronic health records (EHRs) [5], overcoming the shortcomings of traditional medical services, empowering patients, and upgrading the traditional medical service mode. The scientific and technological application of medical community [6] electronic health records has been attracting the attention of many researchers and enterprises. Reference [7] compared achievement of and improvement in quality standards for diabetes at practices using EHRs with those at practices using paper records. Reference [7] examined the effects of electronic health records on the safety of patients in medical facilities. Reference [8] analyzed the costs and benefits of EHRs in six community health centers (CHCs) that serve disadvantaged patients.
However, the above studies do not deeply examine the requirements and data characteristics of the medical community platform for EHR management. In terms of credibility, reliability, and real-time performance, it is necessary to study the directional collection mechanism of archival data in depth, including large-scale mobile terminal sensing under 5G, diversified archival data collection, and archival information sharing within the medical community.
The major challenges are as follows: (i) The diversity and complexity of the medical community seriously restrict the classification efficiency and labeling accuracy of medical community electronic health record data, which affects the intelligent management efficiency of EHRs. (ii) The differences in medical level and service objects between medical institutions in the medical community make the sharing of electronic records, which is an important basis for specialist collaboration, more complex. Inefficient and imprecise data sharing will seriously restrict the medical community's ability to address major diseases and make it difficult to form a complementary development mode. (iii) Collecting the data for medical community electronic health record management accurately and in a timely manner is difficult because it is hard to predict which medical institution will produce what type of electronic health record data at what time.
Our key contributions can be summarized as follows: (i) We design the 5G edge computing-assisted architecture for the medical community. (ii) We formulate the directional data collection (DDC) problem to gather the EMR/EHR data from the medical community while minimizing the service error under the data collection deadline constraint. (iii) We design the data direction prediction algorithm (DDPA) to predict the data collection direction and propose the data collection planning algorithm (DCPA) to minimize the data collection time cost. (iv) We conduct extensive simulations for the designed system and proposed algorithms. The results show that our proposed algorithms can decrease the total time cost by 62.48% and improve the data quality by 36.47% through the designed system.
The rest of the paper is organized as follows. We review the state-of-the-art research in Section 2. We design the 5G-assisted edge computing system in Section 3. We present the system model and formulate the DDC problem in Section 4. We design the intelligent data collection scheme with the assistance of random forests for solving the DDC problem in Section 5. We conduct the simulations in Section 6. We conclude this work in Section 7.

Related Work
The current situation of medical archives management in the medical community is discussed as follows. Reference [9] found significant deficiencies in the practice of warfarin management and suggestive evidence that anticoagulation services can partially ameliorate these deficiencies. Reference [10] described a randomized trial of a program to identify and treat depression among high utilizers of general medical care. Reference [11] designed an intelligent archive management system by integrating the 5G network and the Internet of Things for smart hospitals. Reference [12] used exome sequencing for infants in intensive care units to determine the diagnostic yield and use of clinical exome sequencing in critically ill infants. Reference [13] proposed a novel drug supply chain management scheme using Hyperledger Fabric, based on blockchain technology, to handle secure drug supply chain records.
Regarding data collection, reference [14] extracted security data that plays an important role in detecting security anomalies for security measurement. Reference [15] provided a theoretical model of privacy in which data collection requires the consent of consumers who are fully aware of the consequences of consent. Reference [16] considered a scenario where an unmanned aerial vehicle collects data from a set of sensors on a straight line. Reference [17] proposed a low-redundancy data collection scheme that reduces the delay and energy consumption of monitoring networks by using a matrix completion technique. Reference [18] proposed a practical framework called Privacy Protector for patient privacy-protected data collection, with the objective of preventing these types of attacks.

5G-Assisted Edge Computing System
First, according to the requirements of medical community e-health records management and the complex environment of the regional medical institutions information platform, we design the mathematical model of e-health records management and its 5G application system framework. Second, by deploying multiple mobile terminal nodes, we design the 5G architecture for medical community electronic health records management, which provides a real-time and reliable communication guarantee for large-scale medical community electronic health record data applications. Then, in order to ensure real-time and reliable data sharing of medical community electronic health records, a massive data collection mechanism based on edge computing is established. Finally, based on the above requirements, we combine the large-scale mobile communication of 5G with the real-time massive data collection technology of edge computing to study the application mechanism of directional data collection, so as to guarantee the reliability, credibility, and feasibility of data update and sharing in medical community electronic health records management.
According to the requirements of regional medical institutions information platform construction, we analyze the information interconnection and regional differences between community health service centers and municipal hospitals.
Then, we introduce edge computing into 5G through the organic allocation and deep integration between the mobile terminals and the cloud computing server. Edge computing reasonably allocates storage, computing, and network service resources between the computing center and the mobile terminals, so as to achieve a locally optimal division of labor and cooperation between network service quality and user experience quality. Therefore, the introduction of edge computing into 5G can satisfy the computing and communication needs of mobile terminals with distributed and random characteristics. Hence, edge computing suits the geographical deployment of 5G nodes scattered between community health service centers and municipal hospitals. The edge computing architecture with 5G is shown in Figure 1. Here, the 5G platform is the center, i.e., the municipal hospital, and several subnets of edge community service centers are deployed. The network control ability of these subnets is the same as that of the servers in the platform, so the architecture can effectively reduce the calculation delay and improve the storage efficiency of medical community electronic health record data. The 5G architecture shown in Figure 1 can provide convenience services, health management services, traditional Chinese medicine (TCM) health care services, and other services. This architecture can give full play to its advantages in data sharing and family doctor follow-up, continuously improve the accessibility and effectiveness of services, comprehensively improve service level and satisfaction, and provide medical services and health management services to the majority of residents conveniently and quickly.
At the same time, the medical community platform can address the shortage of medical resources and the difficulty of seeing a doctor, realize the integration of health resources, and improve the level of primary health care through the establishment of complete electronic health records for residents. Then, we integrate the sharing of medical records, test results, medical images, medication records, and patients' basic health information between secondary and tertiary comprehensive medical institutions in the community to realize the sharing of high-quality medical resources in the region.
With the rapid development of 5G edge terminals used to collect electronic health record data, how to reasonably allocate and effectively recover the diverse resources of 5G has become a key problem. In a 5G environment, the distribution and recovery of resources and the reconstruction of the network topology are dynamic.
There is an unknown mapping and interference relationship among 5G real-time resource statistics, computing task resource allocation and task scheduling, and the trusted resource information of 5G network edge computing terminals. These relationships are real-time and random. The main goal of network resource management is to keep 5G system execution efficiency and resource utilization in the best state at all times. It is well known that 5G supports a large amount of traffic. The resource request queue overflows easily, which makes the arrival rate and processing efficiency of resource request signaling and computing task control signaling between the network control center and the edge terminals irregular, and makes the reliability of resource allocation and computing tasks unbalanced among different services. In order to improve the instantaneous resource management level of 5G and the utilization rate of global resources, and to provide better communication support for medical community electronic health records management, we design the 5G edge computing architecture shown in Figure 2, which is deployed with multiple edge terminals, multiple autonomous base stations, and multiple autonomous control networks.
In Figure 2, edge computing terminals share EHR information and exchange unified standard data sources through the regional platform interfaces of medical institutions. According to the edge computing architecture shown in Figure 2, medical institutions improve the interconnection architecture of the regional health information platform and guide the electronic medical record systems and electronic health record management systems of the medical institutions under their jurisdiction. In particular, electronic health records need unified data interface standards across medical institutions, medical insurance, communities, and other related systems, so as to facilitate information sharing. The common data elements established in the 5G control center of the medical community can improve the real-time sharing efficiency of electronic medical records (EMRs) and EHRs. Therefore, the EMR and EHR storage systems of medical institutions in the medical community must follow the standardized description of national public health data element attributes, describe the extracted data element attributes, conduct business modeling, and realize data sharing.
Medical institutions at all levels should use the ID number as the main identification code for the transmission and circulation of information in the diagnosis and treatment of public health services in the business system, so as to ensure the effective collection of archival information. Each edge computing terminal server should ensure the validity and timeliness of data transmission, verification, process tracking, and traceability, so that all kinds of information can be uploaded to the 5G information platform in a timely, accurate, and comprehensive manner. Therefore, how to collect and improve data from the community and medical institutions at all levels in accordance with the electronic medical record information standards, the unified specification of disease coding, and other important databases, so as to ensure data quality, has become the key to medical community archives management. The process of data collection should have the following properties: (i) The process of data collection should be based on the main index of patient identity, correctly associated with the previous diagnosis and treatment data, and form a complete and standardized medical record file that is convenient for medical staff to use. (ii) The process of data collection should be carried out according to the interface specification of the regional information platform. The diagnosis and treatment information should be uploaded and collected into health records in time. (iii) The process of data collection should implement the codes of clinical symptoms, diagnosis, surgery, drugs, inspection, and so on released by the state to ensure the standardization and unification of the diagnosis and treatment information uploaded to the platform. Figure 3 gives a toy example of the directional data collection and application of medical community electronic health records with the above characteristics.
In this scenario, the cloud platform, edge computing terminals, and 5G platform are effectively integrated into the directional data collection of medical community electronic health records, covering community classification, directional data collection, and data storage. The edge computing procedure is as follows: (1) The communities generate their EMR/EHR data, which can be collected by a corresponding opportunistically encountered edge computing terminal. (2) The edge computing terminal analyzes and mines the valid information from the collected EMR/EHR data, e.g., the location, resident ID, and his/her historical records. Then, the dataset is sent to the cloud servers in real time. The edge computing terminal predicts the residents' medical demand from the historical records by using the algorithm proposed in Section 5.1 and returns the prediction results to the 5G cloud servers. (3) The 5G cloud servers periodically compute the data collection route for the data collection terminal via the algorithm in Section 5.2, according to the information on communities and residents from the edge computing terminals. (4) The data is collected by the data collection terminal along the route calculated in the previous step.
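Two of the collection properties listed above, indexing by patient identity and enforcing standardized codes, can be illustrated with a minimal sketch. The record fields (`patient_id`, `time`, `diagnosis_code`), the function name `collect`, and the small code list are hypothetical and chosen only for illustration; the source does not specify a record layout.

```python
from collections import defaultdict

# Hypothetical code list standing in for the officially released diagnosis codes.
STANDARD_DIAGNOSIS_CODES = {"I10", "E11", "J18"}


def collect(archive, record):
    """Attach one visit record to the patient's file, keyed by ID number.

    Returns True if the record was accepted, False if it was rejected
    because its diagnosis code is not in the standard code list.
    """
    if record["diagnosis_code"] not in STANDARD_DIAGNOSIS_CODES:
        return False
    archive[record["patient_id"]].append(record)
    # Keep each file in chronological order so earlier visits come first.
    archive[record["patient_id"]].sort(key=lambda r: r["time"])
    return True


archive = defaultdict(list)
accepted = [
    collect(archive, {"patient_id": "P001", "time": 2, "diagnosis_code": "I10"}),
    collect(archive, {"patient_id": "P001", "time": 1, "diagnosis_code": "E11"}),
    collect(archive, {"patient_id": "P002", "time": 1, "diagnosis_code": "XXX"}),
]
```

In this toy run the two records of patient P001 end up in one chronologically ordered file, while the record with the non-standard code is rejected before it reaches the archive.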

System Model.
Suppose that the 5G platform of the medical community includes $m$ medical institutions and $n$ communities. According to the scope of service and medical level, the mapping relationship between each community and the different medical institutions is established, denoted by $C = \{c_{ij}\}_{i=1,j=1}^{m,n}$. Here, the electronic file data is generated by each medical institution according to the communities it serves, and is used to obtain the medical needs and feedback of residents. Note that each medical institution initiates one data collection task in the time dimension. So, $T = \{t_i\}_{i=1}^{m}$ denotes the task set. For convenience, we define the data collection process as follows.
Definition 1 (data collection task). The process by which an edge computing terminal completes a data collection task is to select, for the 5G platform, the corresponding observation values from a series of candidate data samples corresponding to the community.
For each data collection task $t_i$, we assume that it has a candidate sample set $A_i$ shown as follows:
Then, let $a_i(0)$ denote the optimal data sample corresponding to the task $t_i$. Therefore, the data error $e_i$ caused by task $t_i$ can be calculated by
The resident set that provides the EMR/EHR dataset from all communities can be represented as follows:
Here, $b_i^j$ represents the data in task $t_i$ that is provided by the $j$-th community when the $i$-th task is initiated by the $i$-th medical institution and the corresponding mapping relationship belongs to $C$. So, we can calculate the data matrix corresponding to the data collection tasks from all communities through the following equation:
Here, we redefine the data sample $a_i(k)$ in $A_i$ as $a_i(k)_t$, which is calculated according to the time dependence of edge computing. Note that we can update the mapping relationship between each community and the different medical institutions through the following equation:
Here, $c_{ij}$ is the tolerance threshold of the data collection error, which is a given empirical value.
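The mapping update described above can be sketched in a few lines. This is only an illustration under two loudly stated assumptions, since the exact equations are not reproduced here: the data error is taken to be the absolute deviation of a sample from the optimal sample $a_i(0)$, and a mapping entry is kept at 1 only when that error does not exceed the tolerance threshold. The function names and the separate `tolerance` argument are hypothetical.

```python
def data_error(selected, optimal):
    """Assumed error measure: absolute deviation from the optimal sample a_i(0)."""
    return abs(selected - optimal)


def update_mapping(samples, optimal, tolerance):
    """Return mapping[i][j] = 1 when community j's sample for task i is within tolerance.

    samples[i][j]   -- value reported by community j for task i (None if no data)
    optimal[i]      -- optimal sample a_i(0) of task i
    tolerance[i][j] -- empirical tolerance threshold for the pair (i, j)
    """
    mapping = []
    for i, row in enumerate(samples):
        mapping.append([
            1 if (value is not None
                  and data_error(value, optimal[i]) <= tolerance[i][j]) else 0
            for j, value in enumerate(row)
        ])
    return mapping


samples = [[10.0, 12.5, None], [7.0, 7.2, 9.0]]
optimal = [10.5, 7.0]
tolerance = [[1.0, 1.0, 1.0], [0.5, 0.5, 0.5]]
mapping = update_mapping(samples, optimal, tolerance)
```

With these toy values, a community that reports nothing or deviates by more than the threshold is dropped from the mapping for that task.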
In order to analyze the direction accuracy of data collection conveniently, we give the following definition of data collection aggregation reliability.
Definition 2 (data collection direction accuracy). For $m$ data collection tasks and $n$ communities, the direction of data collection is accurate when each task satisfies the following conditions: (1) The $i$-th task and $j$-th community have the mapping relationship, i.e., $c_{ij} = 1$. (2) $A_i$ and $B_i$ have the same rank for each task $t_i$. (3) All the data samples collected by all the medical institutions have consistency; i.e., the following function $f(A)$ is valid:
(4) Let $\Gamma(t_i)$ denote the data collection time cost. The time cost of the $i$-th task is not larger than the deadline $\tau_i$.
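The four conditions of Definition 2 can be checked mechanically, as in the following sketch. It assumes that $A_i$ and $B_i$ are given as small numeric matrices, that `assigned[i]` (a hypothetical name) lists the communities from which task $i$ collects, and that the consistency function $f(A)$ is supplied by the caller, because its exact form is not reproduced here.

```python
from fractions import Fraction


def rank(matrix):
    """Rank of a small matrix via exact Gaussian elimination."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    r = 0
    for col in range(len(rows[0]) if rows else 0):
        pivot = next((k for k in range(r, len(rows)) if rows[k][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for k in range(r + 1, len(rows)):
            factor = rows[k][col] / rows[r][col]
            rows[k] = [a - factor * b for a, b in zip(rows[k], rows[r])]
        r += 1
    return r


def direction_is_accurate(c, assigned, A, B, time_cost, deadline, consistent):
    for i in range(len(c)):
        # (1) data may only be collected along an existing mapping, c[i][j] == 1
        if any(c[i][j] != 1 for j in assigned[i]):
            return False
        # (2) candidate sample set and community data must have equal rank
        if rank(A[i]) != rank(B[i]):
            return False
        # (4) collection must finish no later than the task deadline
        if time_cost[i] > deadline[i]:
            return False
    # (3) consistency check f(A), supplied by the caller
    return consistent(A)


c = [[1, 0], [1, 1]]
assigned = [[0], [1]]
A = [[[1, 0], [0, 1]], [[1, 2], [2, 4]]]
B = [[[2, 0], [0, 3]], [[3, 6], [1, 2]]]
ok = direction_is_accurate(c, assigned, A, B, [2.0, 3.5], [4.0, 4.0], lambda _: True)
late = direction_is_accurate(c, assigned, A, B, [2.0, 3.5], [4.0, 3.0], lambda _: True)
```

Exact rational arithmetic is used for the rank so that the comparison in condition (2) is not affected by floating-point round-off on small examples.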

Problem Formulation.
The objective of the directional data collection (DDC) problem is to design a data collection scheme based on the 5G edge computing system that gathers the EMR/EHR data from the medical community so as to minimize the service error under the data collection deadline constraint. Let $x_{ij}$ be a binary variable indicating whether the data of task $t_i$ is collected from the $j$-th community: $x_{ij} = 1$ if the data of task $t_i$ is collected from the $j$-th community, and $x_{ij} = 0$ otherwise. The DDC problem can be formulated as follows:
Constraint (a) gives the value range of $x_{ij}$. Constraint (b) ensures that the rank of the candidate data sample set is equal to that of the data in task $t_i$ on the basis of $C$. Constraint (c) ensures that each mapping between a task and its corresponding community is valid. Constraint (d) ensures that the time cost of data collection in each task is not larger than its deadline.
We list the frequently used notations in Table 1.
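To make the structure of the DDC problem concrete, the following toy solver enumerates assignments exhaustively. It is a sketch under explicit assumptions, not the formulation itself: the objective is taken to be the sum of per-pair errors, each task is assumed to collect from exactly one community, and the rank constraint (b) is omitted for brevity. The names `solve_ddc`, `error`, and `time_cost` are hypothetical.

```python
from itertools import product


def solve_ddc(error, c, time_cost, deadline):
    """Exhaustively pick one community per task, minimizing total error.

    error[i][j]     -- assumed service error if task i collects from community j
    c[i][j]         -- mapping (1 if task i may collect from community j)
    time_cost[i][j] -- collection time for that pair
    deadline[i]     -- deadline of task i
    Returns (best_total_error, assignment) or (None, None) if infeasible.
    """
    m, n = len(error), len(error[0])
    best_cost, best_choice = None, None
    for choice in product(range(n), repeat=m):
        # Constraints (c) and (d): valid mapping and deadline respected.
        feasible = all(
            c[i][j] == 1 and time_cost[i][j] <= deadline[i]
            for i, j in enumerate(choice)
        )
        if not feasible:
            continue
        cost = sum(error[i][j] for i, j in enumerate(choice))
        if best_cost is None or cost < best_cost:
            best_cost, best_choice = cost, choice
    return best_cost, best_choice


error = [[0.2, 0.1], [0.4, 0.3]]
c = [[1, 1], [1, 0]]
time_cost = [[1, 5], [2, 2]]
deadline = [3, 3]
best_cost, best_choice = solve_ddc(error, c, time_cost, deadline)
```

Here the lower-error community for task 0 violates the deadline and the lower-error community for task 1 has no mapping, so both tasks fall back to community 0. Exhaustive search grows as $n^m$ and is only meant to show how the constraints prune the search space.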

Intelligent Data Collection Scheme
In this section, we propose our 5G edge computing enabled directional data collection (5EDDC) algorithm. The detailed description of the proposed algorithm is presented in two phases: data direction prediction and data collection planning.

Data Direction Prediction Algorithm.
Medical behaviors and requirements of residents in different communities actually reflect the regeneration direction of EMRs, which contain a lot of medical information, such as common diseases and medical habits. Based on the implementation of a variant of the algorithms in [19], we design Algorithm 1 for solving the problem of data direction prediction. First, we predict the medical behavior of residents through the following steps:
Step 1. Feature extraction takes into account the following features for each behavior generated in a community along the historical records $B_i$, with the last data collection task $t_i$ and the candidate sample set $A_i$ of the current medical community: time of day, medical behavior starting time, medical community name and its location, resident ID, and medical treatment time at each location. The above procedure is denoted as the function FeatureExtraction(dataset).
Step 2. Random forest-based prediction takes the input vector $n_t$ representing the information of a resident, which includes the origin community and destination medical institute, as well as the corresponding features extracted in Step 1. Then, the data generating time of any community can be predicted based on temporal data dependencies and spatial data correlations. The above process is denoted as the function RandomForests.
Second, we predict the data collection direction of each community via the Interval-based Historical Average (IHA) [20], as shown in the following equation:
Here, $E[t_a]$ is the expectation of the data generating time of task $t_a$ based on historical EMR/EHR data, $\lambda$ is the mean absolute error of the random forests [21] for data generating time prediction, and $days$ is the number of days in the historical EMR/EHR dataset. $t_p$ is the first data collection starting time after $E[t_a] + q$ in the $p$-th day of the dataset, where $E[t_a] + q$ is the generating time of EMR/EHR data for recording residents in the $a$-th community on the $q$-th day. Thus, $t_p - E[t_a] - q$ is the historical data collection time on the $q$-th day. Finally, we calculate the EMR/EHR data generating time of the communities and the data collection direction of the medical community.
In Algorithm 1, the Sink($F$, $B$) function is used to find all the data collection sinks of EMR/EHR at its medical community. In addition, Algorithm 1 updates all the data samples and records between any two communities and computes their collecting time through the following steps: (1) Find all $a_i$ that satisfy the tolerance threshold. (2) Extract all the features from the historical dataset.
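Step 1 can be sketched as a small function that turns raw visit records into feature rows suitable as input to a random forest. The record keys (`start`, `community`, `location`, `resident_id`, `treatment_minutes`) and the derived feature names are hypothetical; the source lists the features only by description.

```python
from datetime import datetime


def feature_extraction(dataset):
    """Turn raw visit records into feature rows.

    Each record is a dict with hypothetical keys: 'start' (ISO timestamp of
    the medical behavior), 'community', 'location' (lat, lon), 'resident_id',
    and 'treatment_minutes'.
    """
    features = []
    for rec in dataset:
        start = datetime.fromisoformat(rec["start"])
        lat, lon = rec["location"]
        features.append({
            "hour_of_day": start.hour,
            "start_minute_of_day": start.hour * 60 + start.minute,
            "community": rec["community"],
            "lat": lat,
            "lon": lon,
            "resident_id": rec["resident_id"],
            "treatment_minutes": rec["treatment_minutes"],
        })
    return features


rows = feature_extraction([{
    "start": "2018-03-10T09:30:00",
    "community": "C1",
    "location": (31.65, 120.75),
    "resident_id": "R42",
    "treatment_minutes": 25,
}])
```

Categorical fields such as the community name would still need to be encoded numerically before being passed to a tree-based regressor.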
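The interval-based averaging described above can be sketched as follows. Because the IHA equation itself is not reproduced here, the exact form is an assumption: for every historical day and every integer offset $q$ in $[-\lambda, \lambda]$, the sketch takes the first collection start at or after $E[t_a] + q$ and averages the resulting waits $t_p - E[t_a] - q$. Times are assumed to be integer minutes after midnight, and the function and parameter names are hypothetical.

```python
from bisect import bisect_left


def interval_historical_average(expected, mae, history):
    """Average wait between data generation and the next collection start.

    expected -- E[t_a], expected generating time (minutes after midnight)
    mae      -- lambda, mean absolute error of the predictor (integer minutes)
    history  -- one sorted list of collection start times per historical day
    """
    waits = []
    for starts in history:                  # one entry per historical day
        for q in range(-mae, mae + 1):      # offsets inside the error interval
            k = bisect_left(starts, expected + q)
            if k < len(starts):             # a collection starts at or after E + q
                waits.append(starts[k] - expected - q)
    return sum(waits) / len(waits) if waits else None


history = [[590, 610], [605, 640]]
avg_wait = interval_historical_average(600, 1, history)
```

On the first day the waits for the three offsets are 11, 10, and 9 minutes, and on the second day 6, 5, and 4 minutes, which gives an average of 7.5 minutes.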

Data Collection Planning Algorithm.
Based on the direction prediction of data collection, the DDC problem is equivalent to finding a data route that collects the EMR/EHR data for all selected medical communities with minimum time cost. The above data collection planning (DCP) problem can be formulated as follows:
Moreover, we design the data collection planning algorithm (DCPA) to solve the DCP problem. The basic idea is given as follows (see Algorithm 2): (1) Transform $A_i$ and integrate it into $A$ (line 5). (2) Update $B$ (line 6). (3) For each element in $B$, first divide it into two separate sets $P_1$ and $P_2$, then remove half of the elements in $B$ (lines 8-10), and then find the corresponding subroutes (lines 11-14). (4) Integrate all the subroutes into the final data collection plan $P$ (lines 15-18), where the symbol ⊎ represents the integration of routes.
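The split, solve, and integrate idea behind DCPA can be illustrated with a small recursive planner. This is a sketch of the general divide-and-conquer pattern, not Algorithm 2 itself: the splitting rule (by sorted coordinate), the greedy nearest-neighbor subroute, and the concatenation used to integrate subroutes are all assumptions made for illustration, and the function names are hypothetical.

```python
import math


def nearest_neighbor_route(start, points):
    """Greedy subroute: repeatedly visit the closest unvisited point."""
    route, current, remaining = [], start, list(points)
    while remaining:
        nxt = min(remaining, key=lambda p: math.dist(current, p))
        remaining.remove(nxt)
        route.append(nxt)
        current = nxt
    return route


def plan_route(start, points, leaf_size=3):
    """Split the point set in half, plan each half, then join the subroutes."""
    if len(points) <= leaf_size:
        return nearest_neighbor_route(start, points)
    ordered = sorted(points)                 # split by x-coordinate, then y
    half = len(ordered) // 2
    p1, p2 = ordered[:half], ordered[half:]  # the two sets P1 and P2
    first = plan_route(start, p1, leaf_size)
    # The second subroute starts where the first one ended.
    second = plan_route(first[-1], p2, leaf_size)
    return first + second                    # integrate the subroutes


points = [(5, 5), (1, 0), (2, 0), (6, 5), (1, 1), (7, 5)]
route = plan_route((0, 0), points)
```

Every location appears exactly once in the resulting route, and locations that are close together are visited consecutively, which is what keeps the total travel time low in this toy setting.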

Numerical Experiments
In this section, we conduct extensive simulations to verify the performance of our proposed algorithms with different number of medical institutions, number of communities, days of month, and number of residents of a community.

Data Description.
The dataset used in our experiments is from the electronic records system of Changshu No. 1 People's Hospital. The dataset is representative medical community data. It is generated by the medical community supported by Changshu No. 1 People's Hospital and covers the period from January 1, 2018, to November 30, 2018. The dataset includes the record data of medical institutes and residents, and the GPS data of medical institutes. Each data record contains various fields, such as time of day, medical behavior starting time, medical community name and its location, resident ID, medical treatment time at each location, and the number of patients of the corresponding medical institute. The GPS data contains the latitude, longitude, and treatment time of each medical institute.

Table 1: Frequently used notations.

| Notation | Description |
| --- | --- |
| $c_{ij}$ | Mapping relationship between the $i$-th medical institution and the $j$-th community |
| $C$ | Set of mappings |
| $t_i$ | $i$-th data collection task |
| $T$ | Set of tasks |
| $A_i$ | Set of candidate data samples belonging to the $i$-th task |
| $a_i(0)$ | Optimal data sample corresponding to task $t_i$ |
| $e_i$ | Data error caused by task $t_i$ |
| $b_i^j$ | Data in task $t_i$ provided by the $j$-th community when the $i$-th task is initiated by the $i$-th medical institution |
| $B_i$ | Set of data in task $t_i$ on the basis of $C$ |
| $a_i(k)_t$ | $k$-th data sample calculated according to the time dependence of edge computing |
| $c_{ij}$ | Tolerance threshold of the data collection error between the $i$-th task and the $j$-th community |
| $f(A)$ | Consistency function |
| $\tau_i$ | Deadline of task $t_i$ |

Simulation Setup and Benchmark.
We assume that there are 10 medical institutions providing medical care for 10 communities, which are supported by Changshu No. 1 People's Hospital. The residents receive medical treatment from these medical institutions. In our simulation, we evaluate the total time cost of data collection and the data quality calculated by equation (2). All the simulations were run on a cloud server ECS [23] with a 12-core Intel Xeon Platinum 8269CY and 48 GB of memory. The other parameter settings of our simulations are listed in Table 2.
We implement the data collection algorithm (DCA) in [22] as the benchmark algorithm for comparison, which makes an efficient tradeoff between data collection efficiency and energy consumption through the combination of the energy of the emotional device and the wireless device.

Performance Evaluation.
In this subsection, we evaluate the performance of our algorithms and DCA in the scenario shown in Figure 4. Tables 3 and 4 give the locations of medical institutions and communities in the area, respectively. This information is calculated based on Google Maps. Figure 5 shows the prediction errors on two different days, i.e., March 10, 2018, and October 20, 2018. The prediction results demonstrate the effectiveness of our proposed prediction algorithm DDPA. The average prediction error on March 10, 2018, and October 20, 2018, is 3.21% and 1.93%, respectively.
Figures 6 and 7 show the impact of the number of medical institutions on the total time cost and data quality of our algorithms and DCA, respectively. The data quality of our algorithms is better than that obtained by DCA.
Figures 8 and 9 show the impact of the number of communities on the total time cost and data quality of our algorithms and DCA, respectively. The average total time cost of DCA and our algorithms is 5.75 hours and 1.63 hours, respectively. The average data quality of DCA and our algorithms is 73.27% and 97.59%, respectively. The results show that our algorithms reduce the total time cost of DCA by 71.67% on average and improve the data quality of DCA by 33.20%.
Figures 10 and 11 show the impact of the days of the month on the total time cost and data quality of our algorithms and DCA, respectively. The average total time cost of DCA and our algorithms is 16.15 hours and 9.99 hours, respectively. The average data quality of DCA and our algorithms is 63.31% and 94.01%, respectively. The results show that our algorithms reduce the total time cost of DCA by 38.12% on average and improve the data quality of DCA by 48.49%.
Figures 12 and 13 show the impact of the number of residents of a community on the total time cost and data quality of our algorithms and DCA, respectively. The average total time cost of DCA and our algorithms is 8.92 hours and 2.86 hours, respectively. The average data quality of DCA and our algorithms is 68.44% and 95.21%, respectively. The results show that our algorithms reduce the total time cost of DCA by 67.76% on average and improve the data quality of DCA by 39.11%.

Algorithm 1: Data direction prediction algorithm (DDPA).
Input: $T$, dataset
Output: $A$, $B$
(1) for $i = 2$ to $m$ do
(2) Forests($F$, $A$, $B$);

Algorithm 2: Data collection planning algorithm (DCPA), lines 8-11.
(8) Take $|B|/2$ elements to $P_1$;
(9) Take another $|B|/2$ elements to $P_2$;
(10) Remove $|B|/2$ elements from $B$;
(11) Find subroute $p_{count}$
Overall, our algorithms can significantly decrease the total time cost and improve the data quality through the designed data direction prediction algorithm and data collection planning algorithm.
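The relative figures in this subsection can be re-derived from the reported averages, as the short check below shows. It assumes that, in each pair of averages, the larger time cost and the lower data quality belong to DCA, since that is the only reading consistent with the stated reductions and improvements. The helper names are hypothetical.

```python
def reduction(baseline, ours):
    """Percentage by which `ours` is lower than `baseline`."""
    return (baseline - ours) / baseline * 100


def improvement(baseline, ours):
    """Percentage by which `ours` is higher than `baseline`."""
    return (ours - baseline) / baseline * 100


# (DCA time, our time, DCA quality, our quality) as reported for each figure pair
reported = {
    "Figures 8-9": (5.75, 1.63, 73.27, 97.59),
    "Figures 10-11": (16.15, 9.99, 63.31, 94.01),
    "Figures 12-13": (8.92, 2.86, 68.44, 95.21),
}
for name, (t_dca, t_ours, q_dca, q_ours) in reported.items():
    print(f"{name}: time -{reduction(t_dca, t_ours):.2f}%, "
          f"quality +{improvement(q_dca, q_ours):.2f}%")
```

The values computed from the rounded averages agree with the reported percentages to within about 0.2 percentage points; small differences of this size would be expected if the reported figures were averaged per setting rather than computed from the overall averages.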

Conclusion
In this article, we have designed the 5G edge computing architecture for the medical community to improve the effectiveness and efficiency of EMR/EHR data collection. First, we formulate the directional data collection (DDC) problem to gather the EMR/EHR data from the medical community while minimizing the service error under the data collection deadline constraint. Second, we design the data direction prediction algorithm (DDPA) to predict the data collection direction and propose the data collection planning algorithm (DCPA) to minimize the data collection time cost. Finally, through numerical simulation experiments, we demonstrate that our proposed algorithms can decrease the total time cost by 62.48% and improve the data quality by 36.47% through the designed system.

Data Availability
The labeled dataset used to support the findings of this study is available from the corresponding author upon request.

Conflicts of Interest
The authors declare no conflicts of interest.