Research on Precision Marketing of Real Estate Market Based on Data Mining

As one of China's basic industries, the real estate industry contributes substantially to national GDP every year and plays an important role in stimulating the economy. After years of development, the industry has accumulated a large amount of sales data, including customer and construction data. However, these data are still collected and used in an extensive, coarse-grained way, so the accumulated data are not sufficiently exploited. Therefore, this paper develops a precision marketing management system for real estate enterprises based on data mining technology. Through big data mining, target customers can be accurately segmented and profiled, enabling resources to be matched and predicted. On this basis, precision marketing and multichannel collaborative promotion are implemented, innovating the real estate marketing model in the era of big data.


Introduction
With the rapid development of information technology, Internet+, 5G, cloud computing, and mobile terminals have brought us into the era of big data [1]. Big data technology has become a hot topic: the big data era promotes interaction between people, makes exchanging information more convenient, and links the Internet economy more closely, so that people can create wealth without leaving home [2][3][4]. Big data has penetrated all walks of life, overturning the operating thinking and marketing models of many traditional industries. In this context, big data has reached the real estate industry, where enterprises can use it to understand consumer demand accurately and formulate precise marketing strategies [5].
As an emerging field, data mining has broad application prospects and has been widely used across industries [6,7]. Combining data mining technology with the real estate field has attracted sustained research attention in recent years. It is therefore meaningful to use data mining technology to study the application of big data in real estate marketing.

Overview of Data Mining Technology
Precision marketing is based on collating and analyzing customer data, which makes it possible to grasp customer demand accurately, offer customers appropriate products and marketing measures, and serve the company's interests. Data mining techniques commonly used in practice include cluster analysis, discriminant analysis, and factor analysis [8].

Definition of Data Mining.
Berry and Linoff [9] defined data mining as the process of exploring and analyzing large amounts of data in order to discover meaningful patterns and rules. Part of a large data set may be noisy or fuzzy. The object of data mining can be a database, a file system, or any other organized collection of data. The data set in a real estate database includes real estate sales data and real estate feature data. By randomly sampling part of the data and applying data conversion, analysis, and other processing, we can find the key data needed to develop a real estate sales strategy [10,11].
Data mining is divided into directed and undirected data mining. The purpose of directed data mining is to explain or classify a specific target field. The purpose of undirected data mining is to find patterns or similarities within a batch of data without presetting a target field or class.

Steps of Data Mining.
The specific steps of data mining vary with the industry, the technology itself, and the circumstances. Whether the data are complete and whether the practitioners are skilled also affect the process. Therefore, the industry generally holds that the more systematic and standardized the data mining process is, the more valuable the information it yields. The main steps of data mining can usually be broken down as follows [12][13][14]:
(1) Understand the data and its sources
(2) Acquire relevant knowledge and technology
(3) Integrate and check the data
(4) Remove erroneous or inconsistent data
(5) Establish models and assumptions
(6) Carry out the actual data mining
(7) Test and verify the mining results
(8) Interpret and apply the results
These steps show that a great deal of preparatory work precedes the actual mining. Statistics show that data preprocessing, including data filtering, format conversion, variable integration, and table joining, accounts for more than 80% of data mining work [15][16][17]. The process is shown in Figure 1.
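Because preprocessing dominates the effort, a small sketch of steps (3) and (4) may help make them concrete. It assumes two hypothetical sources (customer rows and sales rows keyed by a `customer_id`), joins them, converts text fields to numbers, and drops records that are unparseable, out of range, or inconsistent. The field names and validity rules are illustrative assumptions, not the paper's actual checks.

```python
# Minimal preprocessing sketch: integrate, convert formats, and filter records.
# All field names and validity rules below are hypothetical, for illustration only.

def to_int(value):
    """Convert a raw field to int, returning None if it cannot be parsed."""
    try:
        return int(str(value).strip())
    except ValueError:
        return None

def preprocess(customers, sales):
    """Join sales rows to customer rows by customer_id and drop bad records."""
    # Index customers by ID; an ID seen with conflicting ages is inconsistent.
    by_id, conflicting = {}, set()
    for row in customers:
        cid, age = row["customer_id"], to_int(row["age"])
        if cid in by_id and by_id[cid]["age"] != age:
            conflicting.add(cid)
        by_id[cid] = {"customer_id": cid, "age": age}

    cleaned = []
    for sale in sales:
        cid = sale["customer_id"]
        customer = by_id.get(cid)
        area = to_int(sale["area_m2"])
        # Drop: unknown customer or inconsistent customer record.
        if customer is None or cid in conflicting:
            continue
        # Drop: unparseable or out-of-range values (hypothetical ranges).
        if customer["age"] is None or not 18 <= customer["age"] <= 100:
            continue
        if area is None or area <= 0:
            continue
        cleaned.append({**customer, "area_m2": area})
    return cleaned
```

For instance, a customer whose age is recorded as `"abc"`, or who appears twice with different ages, is removed, as is a sale whose customer is missing from the customer source.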

Methods of Data Mining.
The purpose of data mining is to extract valuable information, which requires appropriate methods [18]. Data mining can be carried out with different methods and means, and users can choose the method that suits their needs. Several common methods are described below.
Data Summary.
Literally, a summary uses simple, concise sentences to condense a complex issue so that people can grasp its content in a short time. Likewise, data summarization condenses existing data in certain ways, for example by computing statistics such as sums and averages, and presents the results as charts; column charts and pie charts are common [19]. Data mining is itself a process of data summarization, but it goes deeper and requires summarizing the data more comprehensively, from a deep level and a wide angle. Generally, data are analyzed, synthesized, abstracted, and summarized in order from low level to high level to find internal relationships among them, and these patterns are used to judge how the data will develop in the future. By method, data summarization can be divided into multidimensional data analysis and attribute-oriented induction.
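Attribute-oriented induction can be sketched in a few lines: raw attribute values are generalized to higher-level concepts, identical generalized tuples are merged, and each merged group is summarized with a count and an average. The concept hierarchies (age bands, income thresholds) and field names below are hypothetical assumptions chosen only to illustrate the idea.

```python
from collections import defaultdict

# Hypothetical concept hierarchies: raw values -> higher-level concepts.
def age_band(age):
    if age < 30:
        return "young"
    if age < 50:
        return "middle-aged"
    return "senior"

def income_level(monthly_income):
    # Thresholds are illustrative assumptions, not taken from the paper.
    if monthly_income < 8000:
        return "low"
    if monthly_income < 20000:
        return "middle"
    return "high"

def summarize(records):
    """Generalize each record, merge identical generalized tuples,
    and report a count plus the average purchase area per group."""
    groups = defaultdict(list)
    for r in records:
        key = (age_band(r["age"]), income_level(r["income"]))
        groups[key].append(r["area_m2"])
    return {
        key: {"count": len(areas), "avg_area_m2": sum(areas) / len(areas)}
        for key, areas in groups.items()
    }
```

For example, two customers aged 28 and 25 with middle incomes who bought 90 m² and 110 m² collapse into the single tuple `("young", "middle")` with a count of 2 and an average area of 100.0, which is the kind of aggregate that a column or pie chart would then display.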

Classification Mining.
Classification mining classifies data in order to mine their potential value. It assigns data to classes through a mapping (a classifier), so that the data to be processed are classified correctly. There are many ways to construct classifiers; statistical methods, machine learning methods, and decision tree methods are commonly used [20,21].

Clustering.
Clustering refers to aggregating data: data with the same characteristics are grouped, aggregated, and stored separately. This makes the relationships between data clear: data in the same cluster have the same or similar characteristics, whereas data in different clusters have different characteristics, which helps later work. Statistical methods and machine learning methods [22] are commonly used for clustering.
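The paragraph above does not name a specific clustering algorithm, so the following uses k-means, one common machine learning method, purely as an illustration of customer segmentation. It assumes each customer is a numeric feature tuple (here a hypothetical `(age, purchase area)` pair) and uses the first `k` points as initial centroids for determinism; real use would normally standardize features and use a better initialization.

```python
import math

def kmeans(points, k, iterations=100):
    """Plain k-means: assign each point to its nearest centroid, then move
    each centroid to the mean of its assigned points, until nothing changes."""
    # Deterministic initialization with the first k points (a simplification).
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: index of the nearest centroid (Euclidean distance).
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step; an empty cluster keeps its previous centroid.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids
```

With the hypothetical customers `[(25, 80), (28, 85), (45, 140), (50, 150)]` and `k=2`, the two younger buyers of smaller units end up in one cluster and the two older buyers of larger units in the other.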

Design of Precise Marketing Management System Based on Data Mining
Demand Analysis.
The purpose of functional requirements analysis is to clarify the functional indicators of the real estate precision marketing management system, that is, the function points the system should have. Many tools can describe user requirements in software development; this section discusses them in detail using use case diagrams. The overall use case of the precision marketing management system is shown in Figure 2. The users of the target system are mainly enterprise leaders, market leaders, salesmen, and system administrators. The business of the system can be roughly divided into real estate management, house management, building management, sales management, customer management, decision support, and system management.
Real estate management maintains the basic information of real estate projects, including adding, deleting, modifying, and querying real estate information. Its users are mainly market leaders and system administrators.
Building management maintains basic building information, including adding, deleting, modifying, and querying building information. Its users are market leaders and system administrators.
Housing management maintains the specific information of each room, including adding, deleting, modifying, and querying housing information. Its users are market leaders and system administrators.
Sales management maintains basic real estate sales information, including sales opportunity management, sales record management, and sales performance management. Its users are market leaders and salesmen.
Customer management includes adding, deleting, modifying, and querying customer information. The user roles involved are enterprise leaders, market leaders, and salesmen; enterprise leaders can only query customer information.
Decision support includes sales forecasting, performance statistics, and weekly and monthly performance reports. Its users are enterprise leaders, market leaders, and salesmen.
System management serves system administrators and maintains basic system information, including user management, data backup, data restoration, and permission settings.

Overall Framework of the System.
The overall design of the system begins with its overall framework. The target system is used for the daily sales management of real estate enterprises; it mainly serves enterprise leaders, market leaders, salesmen, and system administrators and provides different services to different users. Based on the actual needs of the system, the browser/server (B/S) mode, a Web-based distributed application architecture, is selected. It meets the needs of different types of users and is suitable for developing a precision marketing management system [23].
At run time, all interaction with users is handled on the browser side of the system, that is, the interface layer, while business and data operations are handed over to the web server, so that the browser and the web server cooperate to process requests. The architecture of the precision marketing management system is shown in Figure 3.

Interface Layer.
The interface layer is usually called the user layer or application layer. Its main function is to interact with users: it receives the requests users send to the system and returns the results produced by the server for users to browse. This is the layer users see directly, so their evaluation of the software is shaped directly by their experience of operating it. When a user issues an access request, the interface layer receives it and passes it to the business logic layer, which calls the data access layer as needed. The processing results are then returned to the user's browser as HTML and displayed.

Business Logic Layer.
The business logic layer, located in the middle of the three layers, executes the parts of a user request that require business logic judgment and processing. It is the bridge between the client and the database. When the interface layer receives a user's request, it sends the logic processing part to this layer for execution, and this layer in turn sends data processing requests to the data access layer. Each of the three layers has its own responsibilities from top to bottom; the business logic layer is called by the interface layer, calls the data access layer, and embodies most of the business relationships. Therefore, the business logic layer is the core of the whole architecture.

Data Access Layer.
The data access layer is also known as the application data source layer. Its main function is to operate on the database, including data retrieval and data processing.
Through the data access layer, database tables can be queried, modified, deleted, and updated, and the layer provides data retrieval and execution services for the middle layer. Because the system cannot run without database operations, and the database design of any system is relatively complex and resource-intensive, the performance requirements on the database system are high. The database access mode should therefore be optimized as far as possible to improve the overall efficiency of the system.
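The division of responsibilities across the three layers can be sketched as three small classes, each of which talks only to the layer directly below it. The paper's system is written in C# on .NET; this Python sketch is only an analogy, and all class and method names are hypothetical. An in-memory dictionary stands in for the database.

```python
import html

class CustomerDataAccess:
    """Data access layer: the only code that touches storage."""
    def __init__(self):
        self._rows = {}  # in-memory stand-in for a database table

    def insert(self, customer_id, name):
        self._rows[customer_id] = {"customer_id": customer_id, "name": name}

    def find(self, customer_id):
        return self._rows.get(customer_id)

class CustomerService:
    """Business logic layer: validation and business rules, no storage details."""
    def __init__(self, dao):
        self._dao = dao

    def register(self, customer_id, name):
        if not name.strip():
            raise ValueError("customer name must not be empty")
        if self._dao.find(customer_id) is not None:
            raise ValueError("customer already exists")
        self._dao.insert(customer_id, name.strip())

    def lookup(self, customer_id):
        return self._dao.find(customer_id)

class CustomerPage:
    """Interface layer: turns requests into service calls and results into HTML."""
    def __init__(self, service):
        self._service = service

    def handle_register(self, customer_id, name):
        try:
            self._service.register(customer_id, name)
            return "<p>Registered</p>"
        except ValueError as err:
            return f"<p>Error: {html.escape(str(err))}</p>"

    def handle_view(self, customer_id):
        customer = self._service.lookup(customer_id)
        if customer is None:
            return "<p>Not found</p>"
        return f"<p>{html.escape(customer['name'])}</p>"
```

Wiring is done once, e.g. `CustomerPage(CustomerService(CustomerDataAccess()))`. Replacing the dictionary with a real database only changes `CustomerDataAccess`, which is the maintainability argument made for the three-tier design.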

Advantages of Three-Tier Architecture.
The three-tier architecture was adopted for the following reasons:
(i) The three-tier partition makes the system more flexible and makes later functions and performance easier to maintain and extend. Because the three layers are independent, the database is easy to migrate.
(ii) It follows the software engineering principle of "high cohesion, low coupling."
(iii) The three layers are independent of each other and their functions are clearly defined, which facilitates development, improves development efficiency, and shortens the development cycle.

Deployment and Functional Structure of the System.
This paper discusses a precision marketing management system built on the .NET environment. The system adopts the B/S structure, uses C# for front-end programming and SQL Server 2005 as the back-end database, and uses ADO.NET to connect the front end and the back end. A system developed in B/S mode is deployed in three parts: client, application server, and database server. A request submitted by the client is sent to the application server for processing, which operates on the database to respond to the user's request. The deployment of the system is shown in Figure 4.
In system design, the overall architecture is designed first; once it is determined, the design has an overall direction. The next task is to design the functions of the target system, which is what users care about most, since these functions help them complete specific tasks. The target system is applied to real estate sales management. Therefore, in the requirements analysis stage, the author held in-depth discussions with sales management staff and defined the functional requirements of the system, which laid a good foundation for designing the functional modules. The requirements of modular design are summarized as follows: (1) Modules should be divided hierarchically: the system is first divided from an overall perspective, and the resulting modules are then further divided, so that each final module can be implemented in a definite way.

Scientific Programming
(2) The modules should be as independent as possible; in other words, there should be no association between modules except under special circumstances. This is not absolute, but such associations should be avoided as far as possible. Accordingly, the system is divided into the following functional modules: real estate management, building management, house management, sales management, customer management, user management, decision support, and system management. The function modules are shown in Figure 5.
Conceptual design is the process of data entity design. Based on the previous analysis of the real estate precision marketing management system, part of the data entity design is given below. The main entities of the system include the user entity, house entity, building entity, room entity, customer entity, and sales contract entity. Database logical design builds on the entities obtained in conceptual design and transforms each entity into an actual physical data storage structure.
This section gives the data tables of the real estate precision marketing management system; each table describes the attributes of one entity. The data tables of the target system include the user information table, house type table, room information table, building information table, property information table, property sales information table, customer information table, and sales contract table.
The structure of the user information table is shown in Table 1. It stores the basic user information of the system, including user ID, user name, user type, real name, gender, and age; the user ID is the primary key.
The structure of the house type table is shown in Table 2. It stores the house type information of rooms; its fields are mainly house type ID, house type name, and description, and the house type ID is the primary key.
The structure of the room table is shown in Table 3. It stores the basic information of rooms, mainly room ID, building ID, floor, room number, and house type ID; the room ID is the primary key, and the building ID and house type ID are foreign keys.
The structure of the building type table is shown in Table 4. It stores building type information, including type ID, type name, and type description; the type ID is the primary key.
The structure of the building table is shown in Table 5. It stores the basic information of buildings; its fields include building ID, property ID, unit number, floor area, and building type ID. The building ID is the primary key, and the property ID and building type ID are foreign keys.
The structure of the real estate table is shown in Table 6. It stores the basic information of real estate properties; its fields mainly include the property name, developer, floor area, and building area, and the real estate ID is the primary key.
The structure of the real estate sales information table is shown in Table 7. It stores real estate sales records; its main fields include sales ID, real estate ID, building ID, room ID, average price, and number of households sold. The sales ID is the primary key, and the real estate ID, building ID, and room ID are all foreign keys.
The structure of the customer information table is shown in Table 8. It stores basic customer information, including customer ID, customer name, contact number, age, and occupation; the customer ID is the primary key.
The structure of the sales contract table is shown in Table 9. It stores sales contract information; its fields include sales contract ID, customer name, purchase time, purchase price, and payment method, and the sales contract ID is the primary key.
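The table descriptions above translate fairly directly into DDL. The sketch below uses SQLite through Python's standard `sqlite3` module rather than the SQL Server 2005 database the paper uses, so column types are SQLite-style assumptions, and all table and column names are hypothetical English renderings of the fields listed in Tables 1 to 9. It shows the primary and foreign keys as described in the text.

```python
import sqlite3

# Schema sketch in SQLite (the paper uses SQL Server 2005). Names and types
# are hypothetical renderings of the fields described for Tables 1-9.
SCHEMA = """
CREATE TABLE user_info (
    user_id INTEGER PRIMARY KEY, user_name TEXT NOT NULL, user_type TEXT,
    real_name TEXT, gender TEXT, age INTEGER);
CREATE TABLE house_type (
    house_type_id INTEGER PRIMARY KEY, name TEXT NOT NULL, description TEXT);
CREATE TABLE building_type (
    type_id INTEGER PRIMARY KEY, name TEXT NOT NULL, description TEXT);
CREATE TABLE property (
    property_id INTEGER PRIMARY KEY, name TEXT NOT NULL, developer TEXT,
    floor_area REAL, building_area REAL);
CREATE TABLE building (
    building_id INTEGER PRIMARY KEY,
    property_id INTEGER NOT NULL REFERENCES property(property_id),
    unit_number TEXT, floor_area REAL,
    building_type_id INTEGER REFERENCES building_type(type_id));
CREATE TABLE room (
    room_id INTEGER PRIMARY KEY,
    building_id INTEGER NOT NULL REFERENCES building(building_id),
    floor INTEGER, room_number TEXT,
    house_type_id INTEGER REFERENCES house_type(house_type_id));
CREATE TABLE property_sales (
    sales_id INTEGER PRIMARY KEY,
    property_id INTEGER REFERENCES property(property_id),
    building_id INTEGER REFERENCES building(building_id),
    room_id INTEGER REFERENCES room(room_id),
    average_price REAL, households_sold INTEGER);
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL, phone TEXT,
    age INTEGER, occupation TEXT);
CREATE TABLE sales_contract (
    contract_id INTEGER PRIMARY KEY, customer_name TEXT, purchase_time TEXT,
    purchase_price REAL, payment_method TEXT);
"""

def create_schema(conn):
    # SQLite only enforces foreign keys when this pragma is enabled per connection.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
```

After `create_schema(sqlite3.connect(":memory:"))`, inserting a room that points at a nonexistent building raises `sqlite3.IntegrityError`, which is the referential integrity the foreign keys in Tables 3, 5, and 7 are meant to provide.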

Interface Design.
Interface design mainly defines the external and internal interfaces. The detailed design is shown in Table 10.

Operation Design
(1) Combination of Running Modules. The system mainly treats a window as a module, and each window generally completes a specific function. The main window connects and combines the functions of different modules by opening sub-windows. Each module is relatively independent, so the program is highly portable, and modules cooperate and share data by passing references to data items.
(2) Operation Control. The user opens the login window and enters a user name and password. The system then redirects to the back end corresponding to the user type associated with that name, so that users with different roles and permissions can perform different operations.
(3) Running Time. The running time of each module should be kept within 1-2 seconds (most of which is spent responding to user actions). Because the system adopts a message-driven mode, computing resources are used efficiently.
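The operation control described in (2) can be sketched as a login check followed by a lookup of the modules the user's type may open. The role-to-module mapping below is derived from the use cases in the demand analysis (for example, enterprise leaders may only query customers), but the role names, module names, and password scheme (salted PBKDF2 via Python's `hashlib`) are hypothetical choices, not the paper's implementation.

```python
import hashlib
import hmac

# Hypothetical role -> module mapping derived from the use cases.
ROLE_MODULES = {
    "enterprise_leader": {"customer_query", "decision_support"},
    "market_leader": {"real_estate", "building", "house", "sales",
                      "customer", "decision_support"},
    "salesman": {"sales", "customer", "decision_support"},
    "administrator": {"real_estate", "building", "house", "system"},
}

def hash_password(password, salt):
    """Derive a password hash with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)

def make_user(password, role, salt):
    # In practice the salt should be random per user, e.g. os.urandom(16).
    return {"salt": salt, "hash": hash_password(password, salt), "role": role}

def login(users, name, password):
    """Return the set of modules the user may open, or None on failure."""
    user = users.get(name)
    if user is None:
        return None
    # Constant-time comparison avoids leaking how many bytes matched.
    if not hmac.compare_digest(user["hash"], hash_password(password, user["salt"])):
        return None
    return ROLE_MODULES[user["role"]]
```

With `users = {"wang": make_user("s3cret", "salesman", b"fixed-salt")}`, a correct login returns the salesman's modules (sales, customer, decision support), and the interface would open only those windows; a wrong password or unknown name returns `None`.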

Decision Tree Construction Algorithm.
The decision tree can be constructed from a training set $T = \{\langle x, C_j \rangle\}$, where $x = (a_1, a_2, \dots, a_n)$ is a training example with the $n$ attributes listed in the attribute table $(A_1, A_2, \dots, A_n)$, $a_i$ is the value of attribute $A_i$, and $C_j \in C = \{C_1, C_2, \dots, C_m\}$ is the classification result of $x$ [24][25][26]. The algorithm consists of the following steps:
(1) Select an attribute $A_i$ from the attribute table as the classification attribute.
(2) If attribute $A_i$ has $k_i$ values, divide $T$ into $k_i$ subsets $T_{i1}, \dots, T_{ik_i}$, where $T_{ij} = \{\langle x, C \rangle \mid \langle x, C \rangle \in T \text{ and the value } a_i \text{ of } x \text{ is the } j\text{th value of } A_i\}$.
(3) Delete attribute $A_i$ from the attribute table.
(4) For each $T_{ij}$ ($1 \le j \le k_i$), set $T = T_{ij}$.
(5) If the attribute table is not empty, return to step (1); otherwise, output the decision tree.
At present, mature decision tree methods include ID3, C4.5, CART, and SLIQ.
Information entropy, called the average information quantity in information theory, measures the average information conveyed by a finite set of mutually exclusive and jointly exhaustive events, each of which occurs with a certain probability [27]. If events $X_1, \dots, X_r$ occur with probabilities $p(X_1), \dots, p(X_r)$, the information entropy $H(X)$ is the mathematical expectation of the self-information $I(X_i)$ of each event:
$$H(X) = \sum_{i=1}^{r} p(X_i)\, I(X_i) = -\sum_{i=1}^{r} p(X_i) \log_2 p(X_i), \qquad I(X_i) = -\log_2 p(X_i).$$
In the traditional ID3 algorithm, information entropy is the criterion for attribute selection. The entropy that remains after splitting on each attribute is computed from the data, these values are compared, and the attribute with the smallest remaining entropy (equivalently, the largest information gain) is taken as the root node of the decision tree, so that after the example set is divided into subsets by this attribute, the entropy of the system is minimized.
The goal is for the average path from each non-leaf node to its descendant leaf nodes to be as short as possible, so that the generated decision tree has a small average depth [28]. The more mixed and disordered a training case set is with respect to the target classification, the higher its entropy; the more clearly and orderly it is classified, the lower its entropy. ID3 is based on the principle that "an attribute with greater information gain is more useful for classifying the training cases." At each step, the algorithm selects the attribute in the table that best classifies the training case set. The information gain of an attribute is the decrease in system entropy obtained by using that attribute to partition the samples, so the key operation of ID3 is to calculate and compare the information gain of each attribute [29,30]. Having introduced ID3 in detail, we now summarize its basic strategy:
(1) The tree starts as a single root node representing all training samples, and the process of creating the decision tree begins from it.
(2) The samples at the node are examined. If they all belong to the same class, the node becomes a leaf and is labeled with that class.
(3) Otherwise, the entropy-based measure called information gain is used as heuristic information to select the attribute that best separates the samples, and this attribute becomes the test or decision attribute of the node.
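The entropy and information gain used in step (3) can be computed directly. This sketch assumes each training case is a dictionary of attribute values with a `target` field holding the class; the attribute names in the usage example are hypothetical. Entropy is written as $\sum p \log_2(1/p)$, which is the same quantity as $-\sum p \log_2 p$.

```python
import math
from collections import Counter

def entropy(labels):
    """H = sum of p * log2(1/p) over the class distribution of `labels`."""
    total = len(labels)
    return sum((n / total) * math.log2(total / n) for n in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Entropy of the target minus the weighted entropy after splitting on `attribute`."""
    base = entropy([r[target] for r in rows])
    partitions = {}
    for r in rows:
        partitions.setdefault(r[attribute], []).append(r[target])
    expected = sum(len(part) / len(rows) * entropy(part) for part in partitions.values())
    return base - expected

def best_attribute(rows, attributes, target):
    """ID3's selection rule: the attribute with the largest information gain."""
    return max(attributes, key=lambda a: information_gain(rows, a, target))
```

For four hypothetical cases in which `income` separates the classes perfectly and `age` does not separate them at all, `information_gain` returns 1.0 for income and 0.0 for age, so `best_attribute` picks income.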

Decision Tree Generation and ID3 Algorithm.
The ID3 algorithm proposed by J. R. Quinlan is one of the earliest and best-known decision tree induction algorithms. Given a set of non-class attributes $C_1, C_2, \dots, C_n$, a category attribute $C$, and a training set of records $S$, ID3 constructs a decision tree. The ID3 decision tree induction algorithm is described as follows [31].
// Returns a decision tree
Function ID3 (R: a set of non-class attributes, C: the category attribute, S: a training set)
Begin
  If S is empty, return a single node with the value failure;
  If all records in S have the same value of the category attribute, return a single node with that value;
  If R is empty, return a single node whose value is the most frequent category attribute value among the records in S;
  Let D be the attribute in R with the largest Gain(D, S);
  Let {d_j | j = 1, 2, ..., m} be the values of attribute D;
  Let {S_j | j = 1, 2, ..., m} be the subsets of S consisting of the records whose value of D is d_j;
  Return a tree whose root is labeled D, whose branches are labeled d_1, d_2, ..., d_m, and whose branches lead to the subtrees ID3(R - {D}, C, S_1), ID3(R - {D}, C, S_2), ..., ID3(R - {D}, C, S_m);
End
Combining the previous analysis, this section demonstrates the implementation of data mining with a concrete case. Suppose the database contains the customer information shown in Table 11.
(1) Step 1: Data transformation. The basic customer data are generalized to higher-level concepts as follows. Statistics by age are shown in Table 12.
Statistics by income are shown in Table 13.
Statistics by purchase area are shown in Table 14, and statistics by marital status are shown in Table 15.
(2) Step 2: Compute the expected information and the information gain. The key to constructing a good decision tree is choosing good logical tests or attributes. Smaller trees have been found to predict better, so to build a tree that is as small as possible, the appropriate logical test or attribute must be chosen; here, information gain is used to select attributes. The expected information is calculated as
$$I(s_1, s_2, \dots, s_m) = -\sum_{i=1}^{m} p_i \log_2 p_i, \qquad p_i = \frac{s_i}{s},$$
where $S$ is the data set with $s$ samples, $m$ is the number of classes in $S$, $C_i$ is a class label, $p_i$ is the probability that an arbitrary sample belongs to $C_i$, and $s_i$ is the number of samples in class $C_i$.
Expected information: Info = 0.710086325
Information gain: Gain = 0.289129554
Income has the largest information gain among the attributes, so it is selected as the splitting attribute. Node N is labeled with income, and a branch is grown for each of its values.
Then the tuples are partitioned accordingly.
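As a consistency check on the two reported figures, the information gain relation can be applied to them. This reads the value labeled expected information as the entropy remaining after partitioning on income, which is an interpretation rather than something the text states explicitly:

$$\mathrm{Gain}(\text{income}) = \mathrm{Info}(S) - \mathrm{Info}_{\text{income}}(S) \;\Rightarrow\; \mathrm{Info}(S) = 0.710086325 + 0.289129554 = 0.999215879$$

$$I(47, 44) = -\tfrac{47}{91}\log_2\tfrac{47}{91} - \tfrac{44}{91}\log_2\tfrac{44}{91} \approx 0.99922$$

The sum agrees, to the precision shown, with the entropy of a two-class split of 47 and 44 records, the two counts listed in the purchase-area statistics, which suggests that purchase area is the class attribute and that the case contains 91 records.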
(3) Step 3: Generate the decision tree and extract rules. A decision tree can be generated from the above data, and classification rules are extracted from it, for example:
R1: IF income = middle AND occupation = professor THEN purchase area = big;
Purchase area | Statistical data
120.00 m² | 47
Above 120.00 m² | 44
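Steps 2 and 3 together can be sketched as a recursive ID3 that follows the pseudocode above, plus a walk over root-to-leaf paths that turns each path into an IF ... THEN rule like R1. The records below are invented for illustration (they are not the paper's Table 11 data), and the attribute names `income`, `occupation`, and `area` are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    return sum((n / total) * math.log2(total / n) for n in Counter(labels).values())

def gain(rows, attr, target):
    """Information gain of splitting `rows` on `attr` with respect to `target`."""
    parts = {}
    for r in rows:
        parts.setdefault(r[attr], []).append(r[target])
    expected = sum(len(p) / len(rows) * entropy(p) for p in parts.values())
    return entropy([r[target] for r in rows]) - expected

def id3(rows, attrs, target):
    """Return a leaf label or a tuple (attribute, {value: subtree}):
    empty set -> failure, pure set -> its class, no attributes left ->
    majority class, otherwise split on the highest-gain attribute."""
    if not rows:
        return "failure"
    labels = [r[target] for r in rows]
    if len(set(labels)) == 1:
        return labels[0]
    if not attrs:
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: gain(rows, a, target))
    rest = [a for a in attrs if a != best]
    branches = {}
    # Only values observed in `rows` get a branch, so subsets are never empty here.
    for value in sorted({r[best] for r in rows}):
        subset = [r for r in rows if r[best] == value]
        branches[value] = id3(subset, rest, target)
    return (best, branches)

def extract_rules(tree, target, conditions=()):
    """Emit one IF ... THEN ... rule per root-to-leaf path."""
    if not isinstance(tree, tuple):
        lhs = " AND ".join(f"{a} = {v}" for a, v in conditions) or "TRUE"
        return [f"IF {lhs} THEN {target} = {tree}"]
    attr, branches = tree
    rules = []
    for value, subtree in branches.items():
        rules += extract_rules(subtree, target, conditions + ((attr, value),))
    return rules

# Hypothetical generalized customer records (not the paper's Table 11 data).
ROWS = [
    {"income": "middle", "occupation": "professor", "area": "big"},
    {"income": "middle", "occupation": "clerk", "area": "small"},
    {"income": "high", "occupation": "clerk", "area": "big"},
    {"income": "high", "occupation": "professor", "area": "big"},
    {"income": "low", "occupation": "professor", "area": "small"},
    {"income": "low", "occupation": "clerk", "area": "small"},
]
```

On these records, `extract_rules(id3(ROWS, ["income", "occupation"], "area"), "area")` splits on income first and yields four rules, one of which is `IF income = middle AND occupation = professor THEN area = big`, the same shape as R1.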

Conclusion
With the application of the mobile Internet, 5G, cloud computing, and other network technologies, enterprises, especially in the real estate industry, have an increasing need for big data. The big data era brings real estate marketing both challenges and opportunities. Real estate enterprises must seize the business opportunities of big data, adjust their marketing models in time, and promote their successful transformation and upgrading. Against this background, this paper uses data mining technology to design a precision marketing management system for real estate. Its main advantage is that it forecasts sales by mining and analyzing customer data, providing a reference for sales personnel when formulating marketing strategies.
Data Availability
The dataset can be accessed upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.