Research on Patent-Knowledge Representation and Automatic Classification Based on Situation Mapping

A patent is a type of long-term literature containing the most complete design information in most fields. Thus, it can provide designers with valuable guides for solving various design problems in different fields. Establishing the mapping relationship between the design problem and patent knowledge in different fields is the key to creating an incentive channel for the transfer and combination of multidisciplinary patent knowledge. According to the situation mapping between the design problem and patent knowledge, a method for patent-knowledge representation and automatic patent classification is proposed in this paper. The problem situation is described using four dimensions, namely, function, performance, relationship, and emotion, according to the design-problem types. Multigranular situation attributes are extracted from different design problems. A structured attribute database is established. The mapping relationship between the problem situation and patent knowledge is developed. To realize the effective utilization of the patent knowledge, this study investigates an automatic patent-classification method using the situation attributes as classification categories. An application system of patent knowledge is developed by this method. The system can support the search for patent knowledge related to the design problem and effectively assist designers in achieving an innovative design process.


Introduction
According to a study by the World Intellectual Property Organization, 90%-95% of inventions in the world are reported by patents, and 80% of them are not recorded in any other texts. Effective use of patent resources can shorten product development time by 60% and save 40% of the cost incurred on research and development [1]. Patent knowledge is not only a carrier for innovative achievement but also a significant resource to expand the knowledge space and promote the level of inventions. Therefore, it is vital to effectively exploit the tacit knowledge in patent literature in order to provide designers with more valuable design information for the innovative design of computer-aided products.
One problem is identifying the motive force of knowledge discovery, and it always starts with the representation of the problem situation. Constrained by limited short-term processing capacity and the focus on current complex tasks, the human brain finds it difficult to shift from one mode of thinking to another [2][3][4]. Verifying every possible process for cognitive-information processing can be overwhelming, and the way in which people usually solve problems depends on analogical thinking [5]. Whether they solve familiar or new problems, people always start by finding similarities between the current and previous problems and then activate the knowledge associated with them. The analogical-thinking mechanism of the brain represents a more efficient way of life, thereby enabling humans to almost always find effective solutions to problems rather than scanning the memory by trial and error. With regard to the exertion of the analogical mechanism in the problem-solving process, current researchers believe that analogical problem-solving requires problem solvers to see some similarities between the current and existing problems in the memory and identify whether they are surfaces, features, semantics, relationships, structures, and so on [6]. The essence of analogical thinking is the transfer of design knowledge from one situation to another using a mapping process. It aims to find a set of one-to-one correspondences (often incomplete) between one body of knowledge information and the aspects of another [7][8][9][10]. To accomplish the situation transfer between problems and knowledge, their mapping should be established, and relevant knowledge resources should be organized systematically.
According to user requirements, this paper divides the design problem into four dimensions: function, performance, relationship, and emotion. The design problem in the functional dimension reflects the user's description of the product's functional requirements; the design problem in the performance dimension reflects the user's description of the product's technical performance for those functional requirements; the design problem in the relationship dimension reflects the user's description of the relationship between components of the product; and the design problem in the emotional dimension reflects the user's description of the product's appearance or manifestation. To accurately describe the main design problems, extracting different situation attributes from the design problem is necessary, and these attributes are helpful in clearly defining the design problems. Noh [11] and Cong [12] extracted the inventive principle of a design problem as a situation attribute in order to mine the technical information of patent knowledge and establish a unified representation model of a patent. Li [13] extracted the functional information of a product as a situation attribute of the design problem and realized effective mining of tacit knowledge in patent knowledge. Trappey [14,15], Yu [16], and Lee [17] considered the trend in product technology evolution as situation attributes of design problems to forecast the direction of future technological improvement. Chen [18] considered scientific effects as situation attributes of design problems and deeply analyzed the working principle of technology in a patent. Cardillo [19] and Li [20] selected engineering parameters as situation attributes of design problems to analyze and solve technical conflicts in systems. However, the attributes of a complex system usually contain function, performance, relationship, and emotion.
A mapping relationship between the design problems and patent knowledge that is established from a single situation attribute therefore lacks completeness. This limitation means that such a mapping cannot support the systematic organization of patent knowledge from different aspects of a product design according to the needs of the designers.
To support the complex mapping process between the design problems and patent knowledge, systematically organizing the patent knowledge according to different situation attributes is necessary. However, the workload of manually reading, labeling, and classifying patents according to the situation attributes is heavy, which greatly reduces the efficient utilization of patent knowledge. As a main tool for patent-text processing and automatic patent classification, computer natural-language-processing technology is an effective alternative to manual work to improve the utilization of patent knowledge. Ghareb [21], Labani [22], and Chen [23] proposed several methods for feature selection of patent texts, which could effectively support the attribute extraction of patent texts. Zhu [24] proposed an automatic requirement-oriented patent-classification method to better meet various patent-management requirements. Wu [25] proposed an automatic classification method based on self-organizing maps and support vector machine (SVM), which can help in effectively analyzing the quality of a patent. Lai [26] proposed a new approach based on co-citation analysis of bibliometrics to assess the similarity of patents to support the establishment of a classification system. Liu [27] proposed a hybrid patent-classification method to analyze query patents and effectively predict their classes. Chen [28] proposed a novel three-phase categorization method that could classify patents down to the subgroup level with reasonable accuracy. However, most of the aforementioned studies were limited to the analysis of patent text or effective extraction of keywords to improve the accuracy of automatic patent classification. Only a few studies have addressed how to improve the accuracy of automatic patent classification from multiple dimensions of the whole system, including the function, performance, relationship, and emotion.
A method for patent-knowledge representation and automatic patent classification based on situation mapping is proposed in this paper. According to the types of design problems, we described the problem situation in terms of four dimensions, namely, function, performance, relationship, and emotion, and extracted the granularity attributes from the different situation dimensions. We established a structured attribute database and a mapping relationship between the problem situation and patent knowledge based on the semantic similarity of the situation attributes. In addition, an automatic classification method of patent information based on situation attributes was proposed, and a computer experiment on automatic patent classification using situation attributes as categories was carried out. Once the classification results were adequate for general use, the classifier established in the experiment was used to classify a large number of unknown patent texts. The application system of the patent knowledge was developed by the proposed method, which realized effective use of patent knowledge in solving innovative design problems.

Mapping Process between Problem Situation and Patent Knowledge
Because of the diversity and complexity of the problems in different design phases and different fields, extracting the problem situation attributes from different dimensions is necessary to precisely establish the mapping relationship between the problem situation and patent knowledge. In addition, the process of creatively solving a problem involves concretization of the problem situation, which leads to the required patent knowledge that belongs to different abstract granularities in different design phases. Therefore, extracting the different granular attributes according to the abstraction level of the problem situation is significant in order to establish a complete mapping relationship in the solution process. Next, we present the mapping process between the problem situation and patent knowledge based on multidimensional and multigranular situation attributes.

Mapping Process between the Problem Situation and Patent Knowledge Based on Multidimensional Attributes.
According to the user's requirements of design [29], we select function, performance, relationship, and emotion as the attributes of the problem situation. The function attribute reflects the ultimate purpose of the product design or the initial design requirements, and it reflects the embodiment of the value of the technology. The performance attribute focuses on solving the core problem to improve product performance and perfect the whole system. The relationship attribute aims to build up or cut off the connection among the product components to make the whole system more integrated and perfect. The emotion attribute is mainly used to satisfy the emotional or spiritual desire of the designers by changing the appearance or expression of the products. The mapping process between the problem situation and patent knowledge based on the situation attributes from these dimensions is shown in Figure 1.

Mapping Process between the Problem Situation and Patent Knowledge Based on Multigranular Situation Attributes.
Granulation is a method of summarizing knowledge, and granularity is a measure of knowledge abstraction [30]. Multiple characterizations of the design problems represent the process of gradually clarifying the problem situation. Coarse situation attributes are used to find approximate solutions in patent knowledge. When the problem constraints become more specific about the problem situation, more refined situation attributes are used to search for more accurate solutions in the patent knowledge. This interactive process is continued between the problem and knowledge spaces until the exact solution is obtained. For example, the situation attribute can be considered as "separating substances" extracted from the design problem of "how to clean stains on clothes." According to different solutions, "separating substances" can be defined as "removing substances" or "decomposing substances." Further, "removing substances" can be separated into "removing solids," "removing liquids," "removing gas," and so on. The specific mapping process is shown in Figure 2.
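The coarse-to-fine refinement described above can be sketched as a small tree walk. This is only an illustrative sketch: the nested-dictionary layout and the helper name `refine` are assumptions, and the tree contains only the attributes named in the example.

```python
# Hypothetical attribute tree holding only the example from the text.
ATTRIBUTE_TREE = {
    "separating substances": {
        "removing substances": {
            "removing solids": {},
            "removing liquids": {},
            "removing gas": {},
        },
        "decomposing substances": {},
    }
}


def refine(tree, path):
    """Return the finer-grained attributes available below the given path."""
    node = tree
    for name in path:
        node = node[name]
    return sorted(node)


# Each call moves one granularity level deeper as the problem becomes clearer.
coarse = refine(ATTRIBUTE_TREE, ["separating substances"])
fine = refine(ATTRIBUTE_TREE, ["separating substances", "removing substances"])
```

A designer would stop descending once the patents attached to the current node are specific enough for the problem at hand.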

Normalized Representation of Problem Situation and Extraction of Attributes.
Because of the various representations of a problem situation, great difficulties are encountered in the extraction of situation attributes. Therefore, describing the problem situation in a standardized manner is necessary. According to the cognitive mechanism of human beings and the characteristics of the problem situation, the form "the object is ? and the operation to the object is ?" is used to describe the problem situation in the function dimension. The combination of "operation + object" can be extracted as a situation attribute. For example, in the "Let clothes clean" event, the problem situation is described by "the object of operation is clothes, and the operation used to reach the goal is washing." The situation attribute is "washing + clothes." The performance dimension mainly aims to resolve the technical conflicts of the whole system to improve the product performance.
Thus, the form "to improve the performance of products, the parameter needed to be optimized is ?, and the parameter that becomes worse is ?" can describe the problem situation. The optimized and worsened parameters are extracted as situation attributes. For example, in the event of "improving aircraft flight stability," the problem situation can be described as "to improve the performance of products, the parameter needed to be optimized is the intensity, and the parameter that becomes worse is the weight." The intensity and weight parameters are extracted as situation attributes. In the relationship dimension, because any unit of a system is composed of two components and the components interact, the method of substance-field analysis can be used to optimize the interaction among components of the products. "Component 1 (role sender) is ?, component 2 (role receiver) is ?, and the interaction between them is ?" is used to describe the problem situation. The two components and their interaction are extracted as situation attributes. For example, in the "bearing failure because of dust" event, "component 1 (role sender) is dust, component 2 (role receiver) is bearing, and the interaction between them is harmful effect" is used to describe the problem situation. Dust (component 1), bearing (component 2), and harmful effect are extracted as situation attributes. In the emotional dimension, a concise emotion vocabulary is provided to describe the appearance or expression of products, and it can accurately reflect the feelings of the designer about the product. "The feeling obtained from the product is ?" can be used to describe the problem situation, and the feeling is extracted as a situation attribute. For example, in the "designing a gorgeous dress" event, "the feeling obtained from the product is gorgeous" can be used to describe the problem situation, and the gorgeous feeling is extracted as a situation attribute.
The normalized expression of the problem situation and the extraction of situation attributes are shown in Figure 3.
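One way to picture the four normalized forms is as fill-in templates, each returning the attributes that the text says are extracted. The class and field names below are hypothetical; only the template slots and the four worked examples come from the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FunctionSituation:
    operation: str  # "the operation to the object is ?"
    obj: str        # "the object is ?"

    def attributes(self):
        return (f"{self.operation} + {self.obj}",)


@dataclass(frozen=True)
class PerformanceSituation:
    improved: str  # parameter that needs to be optimized
    worsened: str  # parameter that becomes worse

    def attributes(self):
        return (self.improved, self.worsened)


@dataclass(frozen=True)
class RelationshipSituation:
    sender: str       # component 1 (role sender)
    receiver: str     # component 2 (role receiver)
    interaction: str  # interaction between the two components

    def attributes(self):
        return (self.sender, self.receiver, self.interaction)


@dataclass(frozen=True)
class EmotionSituation:
    feeling: str  # "the feeling obtained from the product is ?"

    def attributes(self):
        return (self.feeling,)


# The four examples given in the text, one per dimension.
examples = [
    FunctionSituation("washing", "clothes"),
    PerformanceSituation("intensity", "weight"),
    RelationshipSituation("dust", "bearing", "harmful effect"),
    EmotionSituation("gorgeous"),
]
extracted = [situation.attributes() for situation in examples]
```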

Establishment of the Structured Attribute Database about the Problem Situation
To realize the complex mapping relationship between the problem situation and patent knowledge, establishing a structured attribute database from different dimensions and different abstract granularities is necessary. From the function dimension, the functional basis is selected as the function attributes according to Hertz's [31] combination and classification of functions and flows. As a type of standardized representation of a function, a functional basis is highly abstracted and integrated. It is divided into two layers. The first layer consists of 11 categories, namely, shift, regulate, absorb, combine, detect, stabilize, import, accumulate, output, produce, and separate. The second layer consists of 52 subcategories, which include stabilizing the motion parameters, stabilizing the process parameters, stabilizing the geometric parameters, and so on. Figure 4 shows these layers. The performance dimension is divided into two layers according to the technical contradictions in TRIZ theory. The first layer consists of 39 categories, including speed, force, power, reliability, and so on. The second layer consists of 102 subcategories, including angular velocity, linear velocity, ampere force, Coulomb force, and so on. These layers are shown in Figure 5. The relationship dimension is divided into two layers according to substance-field analysis in TRIZ theory. The first layer consists of four categories, namely, incomplete, harmful integrity, insufficient integrity, and excess integrity models. The second layer contains approximately 25 subcategories, including the following: the model lacks component 1, the model lacks component 2, the model of component 1 is harmful to component 2, and so on. They are shown in Figure 6. The emotion dimension is also divided into two layers in this paper. The first layer consists of 42 categories, which include color, harmony, stabilization, and so on. The second layer consists of 162 subcategories, which include fashion, simplicity, appropriateness, brightness, maturity, and so on. Figure 7 shows these layers.
To facilitate computer storage of the patent knowledge, structuralizing the situation attributes is necessary. The set of situation attributes for each patent can be represented as $PC = \{F, P, R, E\}$, where $PC$ is the patent knowledge and $F$ (function), $P$ (performance), $R$ (relationship), and $E$ (emotion) are the four dimensions of the situation attributes. The function attributes can be represented as $F = [y_1, y_2, \cdots, y_n]$ and $y_i = (x_1, x_2, \cdots, x_n)$, where $y_i$ and $x_i$ are the situation attributes of the first and second layers, respectively. The attributes of performance, relationship, and emotion can be represented in the same manner.
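As a rough illustration of this structure, the sketch below stores each dimension as a mapping from a first-layer attribute to the second-layer attributes recorded under it. The class name `PatentAttributes`, the `labels` helper, and the pairing of "linear velocity" with "speed" are assumptions made for the example; the text lists these attributes but does not state which subcategory belongs to which category.

```python
from dataclasses import dataclass, field


@dataclass
class PatentAttributes:
    """PC = {F, P, R, E}: each dimension maps a first-layer attribute
    to the tuple of second-layer attributes recorded under it."""
    F: dict = field(default_factory=dict)  # function
    P: dict = field(default_factory=dict)  # performance
    R: dict = field(default_factory=dict)  # relationship
    E: dict = field(default_factory=dict)  # emotion

    def labels(self):
        """Flatten to (dimension, first layer, second layer) triples."""
        return [
            (dim, first, second)
            for dim in "FPRE"
            for first, seconds in getattr(self, dim).items()
            for second in seconds
        ]


# Hypothetical record for one patent; the category pairing is illustrative.
pc = PatentAttributes(
    F={"stabilize": ("stabilizing the motion parameters",)},
    P={"speed": ("linear velocity",)},
)
flat_labels = pc.labels()
```

Flattened triples of this kind are convenient as category labels when the attributes are later used for classification.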

Mapping Relationship between the Problem Situation and Patent Knowledge Based on the Attributes
According to the attribute database, a single mapping relationship between the problem situation and patent knowledge is created. To provide the designers with accurate and abundant patent knowledge resources related to their problems, deeply analyzing the problem situation and patent knowledge is necessary to understand the design intentions and to accurately introduce as much relevant patent knowledge to the designers as possible. Because the design-problem situation and patent knowledge are described in the form of natural language, they are very subjective and different. Establishing a coreference relationship among the situation attributes is an essential requirement to obtain the correlation of patent knowledge. Using the semantic characteristics of natural language is a reliable method for obtaining the relationship among the situation attributes by computing the semantic similarity. However, the meaning of a word is closely related to its context, and some differences exist in different text environments. Thus, the coreference relationship among the attributes may not be obtained reliably because of these contextual differences. Word2vec, released by Google in 2013, is an efficient tool for representing words as numerical vectors [32]. Its CBOW (continuous bag-of-words) model can generate high-dimensional feature vectors of a target word according to the words that occur near the target word in the text. A feature vector can effectively reflect the semantic weight of the target words in the text. Thus, the semantic relationship of the target words can be obtained by calculating the cosine similarity of the feature vectors of the different target words. Therefore, the Word2vec tool is used to establish the coreference relationship among the situation attributes to obtain the correlation of the patent knowledge and effectively expand its application scope. The situation attributes of the design problem can be obtained from the designers.
However, the situation attributes of the patent knowledge cannot be directly obtained from the patent knowledge until deep reading has been performed by an expert group. Because of the large number of patents and their quick updates, the manual deep-reading method and patent classification greatly affect the efficiency of patent-knowledge utilization. According to the characteristics of natural language, computer natural-language-processing technology is used to extract the representative feature words from the patent texts, and an appropriate classification algorithm is selected to automatically classify the patent text using situation attributes as categories. The situation attributes of products or systems fall into four categories, namely, function, performance, relationship, and emotion. Because all of them are described in the form of natural language, all of them can be processed by computer natural-language-processing technology. Because of space constraints, this paper presents only the situation attributes from the function dimension as an example to show the experimental process of automatic classification of the patent text. Other attributes are processed in the same way.
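The cosine-similarity step can be illustrated with a minimal sketch. The three-dimensional vectors below are made-up toy values, not output from Word2vec, and the function names are hypothetical; in practice the vectors would come from a trained embedding model.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(word, vectors, k):
    """Rank every other word by cosine similarity to `word`."""
    scored = [
        (other, cosine_similarity(vectors[word], vec))
        for other, vec in vectors.items()
        if other != word
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy vectors chosen by hand for illustration only.
toy_vectors = {
    "wash": [0.9, 0.1, 0.3],
    "clean": [0.8, 0.2, 0.4],
    "weld": [0.1, 0.9, 0.2],
}
nearest = most_similar("wash", toy_vectors, k=1)
```

With these toy values, "clean" ranks closest to "wash", which is the kind of coreference link between attribute words that the similarity computation is meant to surface.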

Automatic Patent Classification Based on Function Attributes
According to the general method of automatic classification using computer natural-language-processing technology, we propose the automatic classification process based on the situation attributes, as shown in Figure 8. According to the International Patent Classification (IPC), Chinese patents related to engineering technology are downloaded, including operation, transportation, mechanical engineering, weapons, blasting, lighting, heating, electricity, and so on. Situation attributes are extracted from the patents and manually labeled in different dimensions and different abstract granularities. These labeled patents constitute the text set of the automatic classification experiments, and the text set is divided into training and test sets using a certain separation ratio. Feature words, which are representative of the patent information, are selected from the training and test sets. Their feature vectors are obtained by calculating the word frequency in the text set. Then, appropriate classification-training methods are selected to develop a classifier. The test set is classified by the classifier, and the results are compared with its manual labels. Thus, the classifier accuracy can be evaluated. When the classification results finally match the defined target, the classifier can be applied to the classification of any unknown patent. The details of each step will be explained in the later sections.
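The step of dividing the labeled text set by a separation ratio can be sketched as follows. The function name, the fixed random seed, and the placeholder texts are assumptions for illustration; the paper does not state how the split was drawn.

```python
import random


def split_text_set(labeled_texts, train_ratio, seed=0):
    """Shuffle the labeled texts reproducibly and cut them at the given ratio."""
    shuffled = list(labeled_texts)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


# Placeholder (text, label) pairs; the labels are two of the function categories.
labeled = [
    (f"patent text {i}", "separate" if i % 2 else "detect")
    for i in range(10)
]
train_set, test_set = split_text_set(labeled, train_ratio=0.8)
```

Fixing the seed keeps the split repeatable, so that different classifiers are compared on exactly the same training and test texts.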

Patent-Text Preparation.
Because the patent text contains a lot of information, including the patent number, title, abstract, claim, description, and so on, no unified conclusion can be arrived at regarding which parts are currently appropriate to be a representative of the patent text. The title and abstract of a patent summarize the main design problems and invention contents of the patent, covering the core creativity and novelty information of the patent. In addition, Zhang [33] and Liang [34] have tested "title + abstract" as the sample of patent information extraction, and the results show that "title + abstract" can effectively express the main creative information of a patent. Therefore, this paper considers that the form of "title + abstract" can better meet the extraction of situation attributes. 843 samples of invention patents are randomly downloaded from a Chinese

Feature Selection.
To select the feature words that can represent the text information, the text needs to be segmented into individual words. The mature open-source Chinese lexical analysis system ICTCLAS [35] is used to segment the patent title and abstract. Considering the small number of patent samples and the limited length of each text, the term frequency-inverse document frequency (TF-IDF) [36] feature selection method is used to calculate the weight of each feature word in the text. After the words are ranked by weight, the top 80% are selected as feature words in the experiments. TF-IDF evaluates the importance of a word or phrase to a text by counting its frequency. If a word or phrase frequently appears in a text and rarely in the other texts, we consider that the word or phrase has a high ability to distinguish among the categories and is suitable for classification. The formula for calculating the eigenvalues of TF-IDF is expressed as follows:
$$w_{ij} = tf(t_j, d_i) \times idf(t_j),$$
where $tf(t_j, d_i)$ is the term frequency and $idf(t_j)$ is the inverse text frequency.
$$tf(t_j, d_i) = \frac{m_{ij}}{\sum_k m_{ik}},$$
where $m_{ij}$ is the number of times feature word $t_j$ appears in text $d_i$, $k$ runs over the words of text $d_i$, and $\sum_k m_{ik}$ is the total number of times the words appear in text $d_i$.
$$idf(t_j) = \log \frac{N}{n},$$
where $N$ is the number of all texts and $n$ represents the number of texts that contain term $t_j$. TF-IDF tends to filter out the common words and retain important words.
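The weighting can be sketched in a few lines, assuming the common unsmoothed form in which the term frequency is a word's count divided by the total word count of the text and the inverse text frequency is log(N/n). The function name and the toy segmented texts are hypothetical; real input would be the ICTCLAS segmentation output.

```python
import math
from collections import Counter


def tf_idf(texts):
    """texts: list of token lists. Returns one {word: weight} dict per text."""
    n_texts = len(texts)
    # n: number of texts that contain each word at least once.
    doc_freq = Counter(word for tokens in texts for word in set(tokens))
    weights = []
    for tokens in texts:
        counts = Counter(tokens)
        total = sum(counts.values())
        weights.append({
            word: (count / total) * math.log(n_texts / doc_freq[word])
            for word, count in counts.items()
        })
    return weights


# Toy segmented texts standing in for segmented "title + abstract" samples.
segmented = [["wash", "clothes", "wash"], ["dry", "clothes"]]
weights = tf_idf(segmented)
```

In this toy set, "clothes" appears in every text, so its weight is zero, while "wash" appears twice in one text only and receives the largest weight there. This matches the stated tendency of TF-IDF to filter out common words.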

Vectorization of the Patent Text.
After the feature words are selected, the eigenvalue calculated in the feature selection is used as the weight of the feature word.
Thus, text $d_i = (t_1, t_2, \cdots, t_n)$, which is expressed as $d_i = (t_1, w_1; t_2, w_2; \cdots; t_n, w_n)$, is simply expressed as $d_i = (w_1, w_2, \cdots, w_n)$. The feature vectors of all texts form the spatial eigenvector of the whole text set.
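A minimal sketch of this vectorization step: given one weight dictionary per text, fix a shared ordering of the feature words and emit one weight vector per text. The function name and the numeric weights are illustrative assumptions, and a feature word absent from a text is given weight zero.

```python
def vectorize(weights_per_text):
    """Map each {word: weight} dict onto a shared, alphabetically ordered vocabulary."""
    vocabulary = sorted({word for weights in weights_per_text for word in weights})
    vectors = [
        [weights.get(word, 0.0) for word in vocabulary]
        for weights in weights_per_text
    ]
    return vocabulary, vectors


# Illustrative weights only; in the experiments these would be TF-IDF eigenvalues.
example_weights = [
    {"wash": 0.46, "clothes": 0.0},
    {"dry": 0.35, "clothes": 0.0},
]
vocabulary, vectors = vectorize(example_weights)
```

Because every text is projected onto the same vocabulary order, the resulting rows can be stacked into the matrix that a classifier consumes.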

Classification Experiment and Result Analysis.
The 11 function attributes, namely, shift, regulate, absorb, combine, detect, stabilize, import, accumulate, output, produce, and separate, are selected as classification categories, and the text set is divided using two ratios: 80% training and 20% test, and 70% training and 30% test. Different text-learning algorithms are used to establish the classifier on the training set. The classifier is tested on the test set, and the classification accuracy is calculated to analyze the experimental results. At present, various classification-learning algorithms are available, including the K-nearest neighbor, SVM [37], naive Bayesian, multilayer perceptron (MLP) [38], and so on. Because of the differences in the experimental sets, no consistent conclusion can be arrived at to identify which classification algorithm is more effective. The present study selects the MLP and SVM in carrying out the experiments. The F1 score, the harmonic mean of P (precision) and R (recall), is used as the evaluation criterion. P refers to the number of correctly classified texts divided by the number of classified texts, and R refers to the number of correctly classified texts divided by the actual number of texts in the test set. The formula is expressed as follows:
$$F_1 = \frac{2 \times P \times R}{P + R}.$$
The classification results in each category are shown in Figures 9 and 10.
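The per-category evaluation can be sketched as below, reading P as the fraction of texts assigned to a category that truly belong to it and R as the fraction of texts truly in the category that were assigned to it. The function name and the toy label lists are hypothetical and are not the experimental data.

```python
def per_category_scores(y_true, y_pred):
    """Return {category: (precision, recall, f1)} from true and predicted labels."""
    scores = {}
    for category in sorted(set(y_true) | set(y_pred)):
        correct = sum(
            1 for t, p in zip(y_true, y_pred) if t == category and p == category
        )
        predicted = sum(1 for p in y_pred if p == category)
        actual = sum(1 for t in y_true if t == category)
        precision = correct / predicted if predicted else 0.0
        recall = correct / actual if actual else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        scores[category] = (precision, recall, f1)
    return scores


# Toy labels using two of the function categories.
y_true = ["shift", "shift", "detect", "detect"]
y_pred = ["shift", "detect", "detect", "detect"]
scores = per_category_scores(y_true, y_pred)
```

Reporting the score per category, rather than a single overall figure, makes it visible when a few categories lag behind the rest.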
Figure 8: Process of automatic patent classification based on the situation attributes.
From the experimental data, the following conclusions are obtained.

Mobile Information Systems
(1) When a classifier is established using the SVM algorithm, the F1 score is approximately 50% in every category except produce, absorb, and combine. The F1 score in other categories reaches as high as 83%. Compared with random classification, the F1 score is significantly higher, which indicates the feasibility of computer natural-language-processing technology and the clear advantage of automatic classification of the patent text.
(2) The F1 score is higher when the text set is divided into an 80% training set and a 20% test set than when it is divided into a 70% training set and a 30% test set, except for the combine, accumulate,  This result indicates that properly increasing the proportion of the training set in the process of automatic patent-text classification can effectively improve the classification accuracy.
(3) Comparison of the two classification algorithms clearly shows that the accuracy of the SVM algorithm is significantly higher than that of the MLP algorithm except for the produce and combine categories. This result indicates that the SVM algorithm is better suited to automatic classification of the patent text.
At present, the accuracy of automatic patent classification is not ideal. However, the observed impact of different classification algorithms on the classification results provides a basis for subsequently building a larger corpus, so as to effectively improve the accuracy of patent classification.

Development of Patent-Knowledge System Based on Situation Attributes
A total of 65548 patent texts (in the form of "title + abstract") are automatically obtained from a professional patent database using a Web crawler program. Using the situation attributes (from the four dimensions of function, performance, relationship, and emotion) as categories, these patent texts are classified by the automatic classification method proposed in this paper, and a classification set is established.
To analyze the problem situation and retrieve the analogical knowledge in the design process, a patent-knowledge-retrieval system based on situation attributes is developed using Java in the Eclipse development platform. The users can describe the problem situation using natural language from the four dimensions: function, performance, relationship, and emotion. For example, the users can input the situation attribute of the design problem in the form of "operation and object" from the function dimension. When the situation attribute input matches one of the situation attributes in the attribute database, the system pushes to the designer the patent texts that belong to that situation attribute. When the situation attribute input cannot match any situation attribute in the attribute database, the system immediately calculates the semantic similarity between the input and each situation attribute in the database and pushes the three to five situation attributes that are most similar to the input. The users can choose one situation attribute, and the system pushes the patent texts that belong to the selected situation attribute to the users. For example, for the problem of sewage treatment in the manufacturing industry, the situation attribute of "treat + sewage" can be extracted from the function dimension. Then, the user inputs "treat" and "sewage" into the respective text boxes. Next, the user chooses one situation attribute from the five suggested, as required. Finally, the tool provides 810 patents for the user, and the patents' categories include lighting, heating, electricity, and so on, which can effectively inspire the user to complete the innovative design process. The main interfaces of the tool's application process are shown in Figures 11 and 12.
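The match-or-suggest behaviour described above can be sketched as follows. The described system is written in Java; this sketch uses Python for brevity, and every name in it is hypothetical. The token-overlap similarity is only a stand-in for the semantic similarity the system computes, and the index entries are invented placeholders.

```python
def token_overlap(a, b):
    """Toy stand-in for semantic similarity: Jaccard overlap of attribute words."""
    tokens_a, tokens_b = set(a.split(" + ")), set(b.split(" + "))
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def retrieve(query, index, similarity=token_overlap, k=5):
    """Return patents on an exact attribute match, else the k closest attributes."""
    if query in index:
        return {"patents": index[query], "suggestions": []}
    ranked = sorted(index, key=lambda attr: similarity(query, attr), reverse=True)
    return {"patents": [], "suggestions": ranked[:k]}


# Invented attribute index with placeholder patent identifiers.
index = {
    "purify + sewage": ["patent-A", "patent-B"],
    "wash + clothes": ["patent-C"],
}
hit = retrieve("wash + clothes", index)
miss = retrieve("treat + sewage", index, k=3)
```

In the miss case the user would pick one of the suggested attributes, and a second call with that attribute returns its patents through the exact-match branch.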

Conclusion
This study investigates a method of patent-knowledge representation and automatic classification based on situation mapping. The method can effectively assist designers in obtaining patent knowledge related to their design problem so that they can develop an innovative design process. The main content includes the following: (1) On the basis of situation attributes, the mapping process between the problem situation and patent knowledge is developed, and the multidimensional and multigranular attribute database is established. (2) The mapping relationship between the problem situation and patent knowledge is formed, and the coreference relationship among the situation attributes is established. (3) An automatic patent-classification method based on situation attributes is proposed, and using the classification method, an application system of patent knowledge is developed.
Because of the complexity and diversity of problem situations, the mapping relationship between the problem situation and patent knowledge needs to be improved further. Related research will be carried out on the following aspects: (1) Further research will establish more abundant situation attributes and coreference relationships, so as to form various mappings between the design problem and patent knowledge. (2) A more reasonable and efficient automatic classification method needs to be proposed, so as to improve the classification accuracy for Chinese or English patents. (3) The next step of research will consider how to obtain the weight distribution of situation attributes according to user requirements.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors have no conflicts of interest to declare.