Understanding the User-Generated Geographic Information by Utilizing Big Data Analytics for Health Care

There are two main ways to achieve an active lifestyle: the first is to make a deliberate effort to exercise, and the second is to make activity part of the daily routine. The main purpose of this study is to examine the influence of various kinds of physical activity on the density distribution of participants in Shanghai, China, and to model check-in data from a Location-Based Social Network (LBSN) using a mix of spatial, temporal, and visualization methods. The paper evaluates Weibo as a source of big data, and its reliability relative to manually collected evidence, by investigating the relationship between the time, category, frequency, and place of check-ins based on geographic features and their consequences. Kernel density estimation is used for the geographical assessment. Physical activities and their frequency distribution are derived from hour-to-day usage habits. Our observations are based on user check-ins at physical venues such as gyms, parks, and playing fields, the prevalence of check-ins, peak times for visiting amusement parks, and gender disparities; we apply a relative difference formulation to reveal the gender difference more clearly. The purpose of this research is to investigate the influence of physical activity and health-related quality of life on well-being in a sample of Shanghai inhabitants.


Introduction
Location-Based Social Networks (LBSNs) have come a long way since 2007, when Facebook opened up for use. The LBSN model has been widely used for a variety of purposes, including commercial, governmental, and nonprofit work. Though location-based social networking is still regarded as an innovation, the paradigm is rapidly expanding. People have been taking advantage of location-tracking data and the resulting insights to build more effective social applications for marketing, advertising, and commerce, and even to help in the discovery of real-world phenomena. To date, over one billion people around the globe have joined social networks, and many of these social networks do not use geocoding to get their information. Instead, they often rely on manually curated lists of user-submitted location identifiers. As a result, a wealth of new information is generated each day, and researchers are beginning to mine these records in a systematic way. The input is typically supplemented with facts, visuals, geo-locations, and textual data, which can be used for further research on many aspects of people's actions. Past studies used either manually collected data for groups in specific categories, such as leisure, or LBSN data for the whole population with no preset categories. If properly classified, the numerous aspects of LBSN data would prove to be a powerful source of information for assessing people's activity in a variety of fields such as entertainment, education, tourism, dining, and aviation.
In this research, we therefore fill a gap in the use of LBSN data by assessing which recreational places Shanghai citizens prefer to attend. Several studies have attempted to analyze and model human actions through geo-data. Recent research, for instance, leverages check-in records from globally prominent LBSNs such as Twitter, Facebook, and Foursquare to reveal connections and trends among users by gender, skill level, and age group [1][2][3].
Based on the check-in time, the user may have spent some time at a certain place. This is a potential feature of the spatial and temporal aspect. Some spatio-temporal features are related to the check-in time, longitude, latitude, location difference, and check-in time difference. From these features, we can only retrieve information for a location, and the time difference is very small. If we set a threshold on the check-in time difference, then we can select check-in data with different timestamps at the same location. The check-in time can help us to know where a person spent her/his time, and the check-in time difference can help us to understand the daily movement patterns of each user. However, this information does not help us to understand the time spent at different locations or the movement paths in the LBSN.
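The thresholding idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tuple layout `(user, timestamp, longitude, latitude)`, the function name `checkin_gaps`, and the one-hour default threshold are all hypothetical and are not taken from the study's pipeline.

```python
from datetime import datetime, timedelta

def checkin_gaps(checkins, max_gap=timedelta(hours=1)):
    """Return, per user, consecutive check-ins made at the same coordinates
    whose time difference does not exceed max_gap (a hypothetical threshold)."""
    by_user = {}
    for user, ts, lon, lat in checkins:
        by_user.setdefault(user, []).append((ts, lon, lat))

    result = {}
    for user, rows in by_user.items():
        rows.sort()  # chronological order (timestamp is the first field)
        pairs = []
        for (t0, lon0, lat0), (t1, lon1, lat1) in zip(rows, rows[1:]):
            gap = t1 - t0  # check-in time difference
            if (lon0, lat0) == (lon1, lat1) and gap <= max_gap:
                pairs.append((t0, t1, gap))
        result[user] = pairs
    return result

# Made-up sample records, for illustration only.
sample = [
    ("u1", datetime(2016, 5, 1, 18, 0), 121.544449, 31.268159),
    ("u1", datetime(2016, 5, 1, 18, 40), 121.544449, 31.268159),
    ("u1", datetime(2016, 5, 2, 9, 0), 121.47, 31.23),
    ("u2", datetime(2016, 5, 1, 20, 0), 121.47, 31.23),
]
gaps = checkin_gaps(sample)
```

As the paragraph notes, such pairs indicate repeated presence at one place but do not by themselves give the time spent there or the path between places.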
To our knowledge, other researchers have not incorporated this earlier. We focused on three distinct parts of the investigation of Weibo check-in data from the city of Shanghai over two years, from July 2015 to June 2017, to identify spatio-temporal patterns and predict inhabitants' activity using physical activity sites and density estimation. Three key aspects of the assessment are therefore highlighted in this study. Our contribution to the existing literature covers these topics:
(i) Temporal variation by hour, day, and week
(ii) Data collection and the study of physical activity sites
(iii) Using spatial analysis to model and predict density
Section 2 of this study reviews relevant material on big data, LBSNs, and their significance in a range of sectors, as well as articles on Weibo, Shanghai, and China. Section 3 summarizes both the dataset and the analytical approach. Section 4 contains the results and a discussion, while Section 5 contains the study's findings and suggestions.

Related Work
An important question to consider is where "Big Data" is found. Big data analysis cannot exist without data, although this does not mean that big data must include all the data; some data may be more important than others. In some areas, particularly the social sciences, "Big Data" involves massive open online courses (MOOCs), blogs, wikis, video recordings, live tweets, live chats, microblogs, video images, and documents that capture data (i.e., digital documents, websites, images, and videos) in a manner that enables these data to be monitored and managed over time and space. The term is most often associated with Internet data, specifically "Internet Big Data," although it can also be applied to traditional data sources. However, Big Data should not be considered the opposite of "small data," with the latter referring to data that is more easily handled and analyzed [4][5][6][7][8]. Instead, it is the integration of, and the interrelation between, small and large datasets.
Big Data is data collected and processed in both structured and unstructured forms. On the one hand, some Big Data applications focus on collecting data that is big, complex, and diverse. On the other hand, other Big Data applications focus on transforming the collected data into information and knowledge. Ovadia demonstrated and emphasized the importance of "Big Data" for scholars and social scientists, claiming that "Big Data" is too important to overlook since much social science research needs a large amount of data and enormous datasets [9].
Several research fields, including time and human mobility, spatial geography, human behavior, and metropolitan activities, originated with statistical data collected via surveys, visit records, questionnaires, interviews, and other manually compiled datasets [10,11]. Methodologies incorporating geo-information have recently become powerful sources of information for such studies [6,12,13]. Tracking users' movements and actions has become simple because of the fast progress of mobile technology and the extensive use of mobile devices [14]. Though it gives only an approximate location near the base station through which the calls were routed, such a dataset demonstrated competence in forecasting user positions in little time, which was then used to anticipate user activities [15].
Zhu presented numerous GIS (geographic information system) components and their importance in pattern extraction and municipal studies by demonstrating how to evaluate and display the spatio-temporal aspects of recyclable waste collection and recovery [16][17][18]. Other authors used such data for health purposes [19][20][21][22][23][24][25] and medical data security [18,26,27]. These studies are based on endoscope images, medical image registration, and soft tissue modeling [28][29][30]. The term "business intelligence" (BI) first appeared in the late 1990s [31]. At the organizational level, "Big Data" is a significant area. Prediction, ad hoc queries, and aggregation-based reporting, along with processing structured and unstructured data and combining "Big Data"-based systems, all contribute to better decision-making [32][33][34].
BI systems include data warehouses for gathering clean, precise, and comprehensive data from a variety of sources, as well as online analytical processing (OLAP) for real-time multidimensional analysis with operations such as aggregation, filtering, roll-up, pivot, and drill-down [35].
OLAP is a well-recognized and well-respected methodology for "Big Data" research in BI systems [36]. However, while BI, OLAP, and data warehouses are powerful tools for coping with vast volumes of data and a wide variety of operations, they also pose a challenge owing to the considerable cost, space, and computing resources they require [37][38][39][40]. Other authors introduced new data mining techniques [41][42][43]. Digital social networks have been shown to be the most significant source of "Big Data" for research on personal activity, since they are in use and expanding fast in virtually every location across the world.
Users of LBSNs' digital services may upload and share their actions, preferences, and whereabouts, resulting in massive amounts of data for research in many areas. Papers [44,45] go into extensive detail regarding techniques for analyzing human behavior. Lindqvist [46] examined the usage of LBSNs, building on earlier research; examples include practical studies of socio-spatial aspects using LBSNs [47] and a personalized geo-social recommendation based on a dataset from two independent LBSNs, namely, Gowalla and Foursquare [48,49]. By identifying regular users at various locations, researchers in two UK cities used comparable LBSN check-in data to enhance recommendation algorithms.
Li [50] thoroughly examined location-based data from 2.4 million places in 14 states to determine the factors affecting place popularity. The findings identified three major factors that influence a location's popularity: site profile, site age, and site type. Another study of user behavior across numerous venue categories focused on "Food" in Riyadh, Saudi Arabia, and revealed that when users attend food venues, they seem to be more willing to share their behavior. The check-ins of about 19,000 Swarm (Foursquare) members from three metropolitan centers, namely San Francisco, Hong Kong, and New York, were used to evaluate links between different sites at different times of the day [51].
Several studies have been conducted around the world to investigate different user and check-in characteristics using LBSN sources such as Twitter and Foursquare. These characteristics have been used in a variety of sectors, including transportation trends, venue categorization, and urban development and growth [52]. Weibo, a well-known Chinese LBSN, was used in this investigation and proved to be beneficial. In research on Shenzhen [53], Weibo data were used to analyze the appeal of tourist attractions. Another study of people's activity trends and engagement, which likewise used Weibo check-ins, was undertaken in order to observe urban boundaries in Beijing [54]. In a comparable vein, Shi et al. [52] used Weibo data to examine the attributes of tourism sites based on numerous details provided by the LBSN, and the evaluation was combined with sentiment from user reviews. Wu et al. [55] performed a spatio-temporal assessment based on the time of day and the variance in check-in patterns between weekdays and weekends. Studies [56,57] also examined check-in data covering 21 of Wuhan's most famous lakes.

Study Area and Data Source.
Shanghai, like the other megacities in East Asia, experiences urban sprawl. This is particularly the case for low- and middle-income groups and minority groups, such as ethnic minorities (like the Zhuang, Dong, Miao, and Hui), while rich and high-income groups, which have higher purchasing power, are more likely to move into the inner city and tend to be concentrated in the central city. The hour of the day and the difference in check-in trends across weekdays and weekends were used for the spatio-temporal evaluation. Figure 1 displays the study area; we focused on and used the data of 10 districts. The data are gathered from Weibo, which contains people's check-in details while they are at a specific place. The check-ins from Weibo contain latitude, longitude, accuracy (e.g., 1.8 meters), address, time, and content. In addition, Weibo is updated more than 500 million times a day, covering about 1/5 of all of Weibo's activity, which makes it one of the most extensive and authoritative datasets in the field of big data. The spatio-temporal check-ins gathered in this research were collected from the city of Shanghai, which covers four prefectures.
There are three major problems associated with collecting spatio-temporal Weibo check-ins from a single city. First, it is very difficult to extract real locations from a large number of posts with various locations. Thus, we have to use heuristics such as the latitude and longitude of the home location to extract places. Second, as Weibo user accounts contain a location-based tag, the real location of the user is more or less accurate, but the real location of a post may differ from the location of the corresponding Weibo user. Third, some posts contain only a location without other information such as a picture. The number of visits had risen to 500 million by the end of 2018, with 462 million monthly active users and 200 million daily active users. This study focuses on two years of socially collected spatio-temporal check-ins from Weibo in Shanghai, from July 2015 to June 2017. The fundamental reason for using an LBSN is to share activities and observations, which results in the establishment of a new, close social group. This allows researchers to infer a broad spectrum of individual activity and leisure from the geo-data collected by these LBSNs.
This study's data originated from the Weibo LBSN. We used the Python-based Weibo API (application programming interface) to collect check-in data in Shanghai. The dataset was compiled in 2017 and comprises over 3.5 million cumulative check-ins from approximately two million people. For the current study, the data were converted from JavaScript Object Notation (JSON), the standard API data format, to comma-separated values (CSV) using MongoDB. The data processing pipeline is depicted in Figure 2.
JSON is a lightweight data-interchange format that uses human-readable text to transmit data objects, while Java is an object-oriented programming environment [58]. The data were combined into one file in CSV format for further processing and analysis with the tools indicated.
All of the contributors' information, including geolocation, can be recorded in a database. We collected the data in CSV format and then used a criterion to determine the significance of the results. The criterion is depicted in Figure 3. CSV is among the most prevalent formats for databases and spreadsheets, and it uses commas as delimiters between fields. The keys in the JSON file (ID, longitude, latitude, etc.) are used as headers in the CSV file, while the values (5404478798, 121.544449, 31.268159, etc.) are used as the data rows. Table 1 shows an illustration of a "check-in" in the CSV file.
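The key-to-header mapping described above can be illustrated with a minimal sketch. It uses only Python's standard `json` and `csv` modules rather than MongoDB, so it is not the conversion route used in the study; the function name `json_checkins_to_csv` and the three-field record layout are assumptions made for illustration.

```python
import csv
import io
import json

def json_checkins_to_csv(json_text):
    """Convert a JSON array of flat check-in objects to CSV text.
    The keys of the first record become the CSV header row."""
    records = json.loads(json_text)
    fieldnames = list(records[0].keys())  # key order is preserved
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()        # keys -> header
    writer.writerows(records)   # values -> data rows
    return buffer.getvalue()

# One record built from the example values quoted in the text.
sample_json = '[{"ID": 5404478798, "longitude": 121.544449, "latitude": 31.268159}]'
csv_text = json_checkins_to_csv(sample_json)
```

A real check-in record would carry more fields (address, time, content), and nested JSON objects would need flattening before being written this way.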

Social Media Data Analytics Framework.
A social media data analytics framework provides analytical capability to a business unit or enterprise that needs to analyze social media data in its systems. Social media data are the messages, tweets, blog posts, pictures, and so on created by users. A social media data analytics framework can analyze and process these data and provide the required information. Social media data analytics tools perform real-time data analytics and reporting on social media. They are used to analyze social media data, discover valuable insights, and deliver actionable recommendations. Our overall geographical assessment approach is depicted in Figure 4. The first element is broken into two parts: data collection (downloading Weibo data) and data cleaning. Following that, the LBSN data are analyzed, as shown in Figure 5. We used ArcGIS software for the spatial analysis and Tableau for the graphical representation of the temporal analysis.

Spatial Method.
KDE is used to estimate the distribution of data; it is useful for estimating the probability distributions of numerical variables and for producing a density-based visualization of a given image [47,48,52,56,59-68]. The method works by estimating a probability density function from the data and then integrating this function to obtain the corresponding intensity estimate. There are a number of estimation techniques for numerical variables. The most frequently cited are the kernel density estimate (KDE), the Parzen window (PW), and the Epanechnikov window. As the name indicates, the kernel density estimate uses a kernel function as its basis for a density estimate. The Parzen window method is another name for the same kernel-based approach and is often used with a Gaussian kernel function.
The Epanechnikov window technique is a nonparametric density estimation approach that uses the Epanechnikov kernel as its smoothing window. The Epanechnikov window has been used for nonparametric density estimation since the 1960s. The KDE and PW methods are likewise nonparametric: they make no assumption about the form of the underlying distribution. Parametric methods, by contrast, are based on assumed distribution functions. To compute an estimate using a parametric method, it is necessary to specify the distribution family and estimate its parameter values. Parametric methods are also commonly used in image analysis and computer vision. A parametric method assumes that the object to be described is characterized by a fixed set of parameters. Parametric density estimation is widely used to estimate the parameters of the distributions of physical quantities.
We used the KDE method in our research because it has been widely used to model spatial densities in fields such as health, marketing, and the environment [48,60,61]. Because it reveals how phenomena are distributed in geospatial data, many researchers have recently applied the technique to spatial analysis. One basic issue is that the density of points must be determined while the points are distributed over a grid because of the data structure.
There are many methods of determining the density. If the grid cells are large, many points fall into each cell and the points are not sparsely distributed, so the point density per cell is high. The density is therefore determined from the density of the grid cells. This process is called density-grid transformation.
KDE has been used to study the patterns of visitors in green parks [66,69-77]. These studies, mostly concerned with the number of visitors, are based on the assumption that people who visit a green park behave similarly. To identify the number of people who visit a green park, the visitors are often modeled using kernel density estimation. The number of visitors in this method is simply defined by the area of each kernel. The mean over these individuals is called the mean estimator of the total number of people who visit a green park. It is not known whether this method gives an unbiased estimate of the number of people who visit a green park.
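Let $E = \{e_1, \ldots, e_n\}$ be a set of historical data for an individual $i$, where $e_j = \langle x, y \rangle$, $1 \le j \le n$, is the geo-coordinates of a location, and let $h_j$ be the Euclidean distance from $e_j$ to its $k$-th nearest neighbor in the training data. The KDE is stated as follows:

$$\hat{f}(e \mid E) = \frac{1}{n} \sum_{j=1}^{n} K_{h_j}(e, e_j) \qquad (1)$$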
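A KDE with a per-point bandwidth taken from the k-th nearest neighbor, as defined above, can be sketched in plain Python. The text does not state the kernel form, so the isotropic Gaussian kernel below is an assumption, as are the function names and the treatment of longitude and latitude as planar coordinates. The study itself used ArcGIS, so this is an illustration and not the authors' implementation.

```python
import math

def knn_bandwidths(points, k):
    """Bandwidth h_j for each point: Euclidean distance to its k-th nearest
    neighbor. Duplicate points would give h_j = 0 and must be handled first."""
    bandwidths = []
    for j, (xj, yj) in enumerate(points):
        dists = sorted(
            math.hypot(xj - x, yj - y)
            for i, (x, y) in enumerate(points)
            if i != j
        )
        bandwidths.append(dists[k - 1])
    return bandwidths

def gaussian_kernel(e, ej, h):
    """Isotropic two-dimensional Gaussian kernel (assumed form of K_h)."""
    d2 = (e[0] - ej[0]) ** 2 + (e[1] - ej[1]) ** 2
    return math.exp(-d2 / (2.0 * h * h)) / (2.0 * math.pi * h * h)

def kde(e, points, bandwidths):
    """Average of the kernels centered on each historical point e_j."""
    total = sum(gaussian_kernel(e, ej, h) for ej, h in zip(points, bandwidths))
    return total / len(points)

# Made-up coordinates (longitude, latitude) near central Shanghai.
points = [(121.47, 31.23), (121.48, 31.23), (121.47, 31.24), (121.55, 31.27)]
h = knn_bandwidths(points, k=1)
dense = kde((121.47, 31.23), points, h)   # inside the cluster
sparse = kde((121.60, 31.10), points, h)  # far from every point
```

Points in crowded areas receive small bandwidths and isolated points receive large ones, which is the usual motivation for an adaptive bandwidth over a single fixed one.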

Results
Shanghai City, with a population of 22,125,000 residents and a land area of 4,015 square kilometers, has become one of the world's fastest-growing cities [78,79]. Over two years, data were accumulated on recreational check-ins. Every check-in was assigned the value that best matched the physical activity carried out at that place, such as gym, sports, or park exercise. Figure 6 depicts the locations of such activities in our study area; the total number of locations is 865.
We investigated the geographical variation of the check-in data with KDE and mapped the Weibo geolocation check-in dataset with ArcGIS. The overall check-in intensity in Shanghai from July 2015 to June 2017 is depicted in Figure 4. Areas highlighted in black represent a larger number of people, a higher frequency of activity, and greater familiarity with social network usage. It is not surprising that the seven districts appear denser than the other three districts even though the latter have a larger area. Figure 7 shows the temporal fluctuation in the number of visitors over 24 hours. Even though visitors checked in at all hours of the day, most check-ins at the recreational sites investigated were recorded between 5 PM and 9 PM. The upward trend continues until midnight. Figure 8 illustrates the number of check-ins for each day by gender. Weekends have more check-ins than weekdays, and the number of male check-ins is higher than the number of female check-ins on every day, which is striking.
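The hour-of-day and weekday versus weekend summaries reported above amount to simple counts over check-in timestamps. The sketch below shows one way to compute them; the function names and the sample timestamps are hypothetical, and the study produced its temporal charts in Tableau, not with this code.

```python
from collections import Counter
from datetime import datetime

def hourly_counts(timestamps):
    """Number of check-ins for each hour of the day (0-23)."""
    return Counter(ts.hour for ts in timestamps)

def weekend_share(timestamps):
    """Fraction of check-ins made on Saturday or Sunday."""
    weekend = sum(1 for ts in timestamps if ts.weekday() >= 5)
    return weekend / len(timestamps)

# Made-up timestamps: two on a Monday, one on a Saturday, one on a Sunday.
timestamps = [
    datetime(2016, 5, 2, 18, 5),
    datetime(2016, 5, 2, 18, 50),
    datetime(2016, 5, 7, 20, 10),
    datetime(2016, 5, 8, 7, 30),
]
counts = hourly_counts(timestamps)
peak_hour = counts.most_common(1)[0][0]
share = weekend_share(timestamps)
```

Grouping the same counts by a gender field would give the per-day comparison shown in Figure 8.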
Check-ins are disaggregated at the district level to provide a more accurate picture of the distribution of recreational places in Shanghai City. Figure 9 shows that the number of check-ins is greatest in the Pudong region, followed by the Huangpu district.
This pattern can be explained by the fact that the Pudong New Area district is larger than some other districts. Another factor to bear in mind is that check-ins are more concentrated in the city center than in the outskirts. Gender differences are also evident, with male check-ins outnumbering female check-ins in all areas. Figure 10 illustrates the total check-ins made each day in all the districts of the study area. Whether by day or by district, the number of check-ins made by men is higher than the number made by women.
We analyzed our data to compare the gender difference in physically demanding activities, which are necessary for a healthy life. Table 2 shows the percentage of check-ins.

Computational Intelligence and Neuroscience
To compare districts and days, we used the relative difference (d_r) for a given gender. The relative difference expresses the gap between two counts in proportion to their size, so a large absolute difference does not necessarily imply a large relative difference. For example, we calculate the difference between the total male check-ins and the male check-ins per weekday in the districts. Some districts show a relatively large difference, and others show a small one. The relative difference is generally used as a quantitative indicator in quality control and quality assurance.
Finally, the analysis examines the difference between the all-male group and the female-only group in Shanghai over the two-year span. In this context, the all-male group means that the total check-ins by males exceed those by females, whereas the female-only group means that the check-ins by females exceed those by males. Tables 3 and 4 show that there are significant gender differences across both days of the week and districts. The results show that women check in less frequently than men in the districts on all days of the week except Friday. This indicates that male users prefer other days to Fridays, while female users are more active on Fridays. Table 3 shows the difference between male and female check-ins on the different days of the week. As expected, we see significant differences among the days of the week. Overall, Saturday has the highest number of male check-ins, and Friday has the highest number of female check-ins. This indicates that the importance of each day, in terms of check-ins, differs by gender. We also see some differences in the number of check-ins among the different districts.
Table 4 shows that the Jingan, Putuo, Xuhui, and Yangpu districts have a significant difference between male and female check-ins. These results confirm the previous observation that the distribution of male and female users differs across the city.
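The formula for the relative difference d_r is not reproduced in the text. One definition that is common in quality control is the relative percent difference, d_r = 100 * |a - b| / ((a + b) / 2). The sketch below assumes that form, which may differ from the one the authors used, and the check-in counts in it are made up for illustration.

```python
def relative_difference(a, b):
    """Relative percent difference between two counts (assumed definition):
    the absolute difference divided by the mean of the two values, times 100."""
    mean = (a + b) / 2.0
    if mean == 0:
        raise ValueError("relative difference is undefined when both counts are zero")
    return 100.0 * abs(a - b) / mean

# Hypothetical male and female check-in counts for two districts.
large_district = relative_difference(6000, 4000)  # absolute gap 2000
small_district = relative_difference(60, 40)      # absolute gap 20
```

Under this definition both hypothetical districts have the same relative difference even though their absolute gaps differ by a factor of 100, which is why a relative measure is convenient for comparing districts of very different size.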

Discussion
The study of the urban environment is attracting increasing research interest and has developed rapidly in recent years. However, there is still a gap between academic and popular knowledge about activities, and most existing research is conducted in the form of case studies in one or two places. Most of the research data and conclusions are based on the investigation of one city or district, so the results have limited significance for the city as a whole. Moreover, most study data are examined using statistical approaches that do not take spatio-temporal features into account, so they cannot represent the popularity of an activity across time and location. This not only conflicts with the characteristics of the urban area but also fails to meet the demands of modern city design. In this paper, we analyze the characteristics of popular physical activities in Shanghai from a spatio-temporal perspective and compare the gender difference. We study how the popularity of different kinds of physical activity among the Shanghai public has changed over time, which locations are most favored by the public, and in which places physical activity is most favored. Data from Weibo, the largest social media platform in China, are used to conduct the investigation. In summary, we find the following: (1) some places are more popular than others, and (2) the popularity of physical activities differs over time and space. We believe that these findings can not only provide references for urban management but also help to understand the public's favorite locations in a city.
Several limitations exist in this study. Data security is necessary for any dataset or method [80,81]. First, since the data we collected were based on online records, the results may be biased because Internet access habits differ among geographical areas. For example, people in urban areas may have more Internet access, and thus more chances of accessing social media, than people in rural areas. Second, we selected only a single city as the sample, which limits the generalization of our results.
Third, although Weibo check-in data are a very powerful tool for evaluating the quality of physical activities, this approach still contains subjective information, which cannot be quantified. Therefore, the evaluation of data quality in our study should be revised using more data-driven evaluation methods.
Accessibility of the data is a major hurdle to LBSN research because of data security and confidentiality considerations. The ability of LBSNs to reveal the present geolocation of users and their contacts raises serious privacy concerns.
Individual users are worried about their privacy, as are administrative and business users who communicate data over LBSNs. Personal information is sometimes given freely or inadvertently. Though the data are frequently acquired by offering users special rights and prizes in exchange for their information, this is not always the case.
Our findings indicate that this new channel has been relatively successful in terms of both social media check-ins and activity tracking. However, despite this success, the channel covers only roughly half of all locations. For this reason, our findings have important implications for how location-based data might be used. In terms of data validity, the study provides a unique opportunity to compare Weibo check-ins with activity tracking. Only a few studies have compared the two channels. As in other research, we found that the Weibo data were accurate but not complete. As discussed in the introduction, we believe this is not a reflection of the data validity but rather an aspect of how these services are used.
Our findings contribute to the understanding of the behavior of social media users in general. Because check-ins are made by a moderately small group of Weibo users, they have been used in other studies as a good indicator of Weibo users' day-to-day activity [71]. However, our findings reveal that the number of locations has not increased significantly in Shanghai. In contrast to our results, prior studies report that female users are much more active than male users across various activities in Shanghai [69-74, 82, 83].

Conclusion
We looked at users' check-ins in 10 distinct districts of Shanghai, focusing on different aspects of geo-referenced data. We applied the KDE method to check-ins from Weibo microblog users; all the check-ins were posted while the users were doing physical activity in gyms, playing grounds, and parks. Our data show that most people submitted their physical activity check-ins in Shanghai's city center, which is divided into seven districts. Pudong New Area and Huangpu are denser than the other districts, and weekends have more check-ins than other days. The results reveal that male users are more active in physical activities, whether in gyms or on playing grounds, in all the districts of Shanghai, which is a striking finding. The purpose of our research is to raise awareness of the benefits of physical activity for health. The research may be valuable in detecting overcrowded areas in Shanghai so that regulatory or management institutions can monitor and support such districts more efficiently, notably for events, public activities, and urban development.

Conflicts of Interest
The authors declare that there are no conflicts of interest.