The cultural element is the minimum unit of a cultural system. Systematically categorizing, organizing, and retrieving traditional Chinese cultural elements is an essential prerequisite for extracting and using them effectively and rationally, and for realizing the contemporary value of traditional Chinese culture. To build an objective, complete, and reliable classification method and system for traditional Chinese cultural elements, this study takes the text of
Traditional Chinese culture was formed by the precipitation and accumulation of psychological and behavioral characteristics in its long history [
Cultural elements are the basic units that constitute a cultural system. The use of traditional Chinese cultural elements has been increasingly valued by scholars in the fields of cultural creation and management. Incorporating outstanding elements of traditional Chinese culture into production, creative design, urban and rural planning, and construction will not only attract wider audiences and greater economic returns but also promote the creative transformation and development of traditional Chinese culture. This is also a core principle of the “Notice on Building a National Cultural Big Data System” issued by the Publicity Department of the Central Committee of the Chinese Communist Party, and it has become a consensus in industries and fields such as film and television, animation, industrial design, tourism design, and architectural design [
In the economic environment of online platforms, classifying, organizing, and retrieving Chinese cultural elements has become a prerequisite for effectively extracting, rationally using, and ultimately modernizing traditional cultural elements. Research in this area still has several deficiencies. First, scholars have not yet reached agreement on the definition and classification of traditional Chinese cultural elements. Second, most studies on the classification of traditional Chinese cultural elements adopt either subjective classification or a mixture of natural language processing and qualitative analysis; a reliable, purely quantitative method of data analysis is still needed. Third, current classifications are based on themes of cultural elements but lack a system that illustrates the relationships between categories and thereby reconstructs the overall structure of traditional Chinese culture.
Over China’s long history, a vast number of books and classics have been accumulated and preserved in written form to this day. They are the most direct vehicles for analyzing traditional Chinese culture. The systematic classification of these classics has made some progress [
This study proposes that the traditional Chinese cultural system is a set of material and nonmaterial elements formed through long-term survival practice, and that traditional Chinese cultural elements are the basic material and nonmaterial units that constitute this system. The research proposes a method that directly extracts a complete classification of cultural elements from the textual data of traditional Chinese classics, using natural language processing techniques and a complex network model with community detection. The whole process yields an objective, complete, and valid classification system of traditional Chinese cultural elements.
Based on the literature review of previous studies, this paper is organized as follows: Section
Although research regarding cultural elements’ extraction has made much progress, there is still a paucity of work, especially in establishing an objective, complete, and reliable classification method and system for traditional Chinese cultural elements. For example, Zhou et al. propose that cultural elements are condensations of cultural characteristics [
Corrêa et al. use a bipartite graph to represent the semantic structure of the text. They construct edges of the bipartite network between target words and feature words of target word contexts, and ignore the relationship between the feature words. The network model constructed in this way is applied to achieve the word sense disambiguation task, which shows good results and robustness in the case of small samples [
The dataset includes two parts: (1) the full text of the Taiping Imperial Encyclopedia and (2) the China Biographical Database (CBDB), which supports word segmentation during natural language processing of the Taiping Imperial Encyclopedia.
The book—
Electing the
The
The construction process has four steps: OOV detection, word segmentation, extraction of cultural element topics, and association. Mutual information and adjacency entropy, supplemented by rules, are employed for OOV detection. THULAC is then used to segment the text chapters. The TF-IDF algorithm is applied to extract keywords for the cultural element topics. Finally, associations between keywords are obtained by calculating the Ochiai co-occurrence coefficient. The construction process of the complex network of traditional cultural elements is shown in Figure
The construction process of the complex network of traditional cultural elements.
Because there are no spaces between Chinese characters, word segmentation tools must be used to segment the text before keyword extraction. No matter how large the dictionary used to train a segmentation tool, a certain proportion of words in real text fall outside the dictionary; these are called Out-Of-Vocabulary (OOV) words [
Each word is an independent linguistic unit. For words consisting of two or more characters, there are associations between the characters: the stronger the association, the more likely these characters are to form a word. Mutual information can be used to quantify this association between characters [
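As a minimal sketch, assuming the common pointwise form PMI(x, y) = log2 p(xy)/(p(x)p(y)) with probabilities estimated from raw character and bigram counts (the study's own normalization may differ), the cohesiveness of every adjacent character pair can be scored as follows. The function name `char_bigram_pmi` is hypothetical.

```python
import math
from collections import Counter


def char_bigram_pmi(corpus: str) -> dict[str, float]:
    """Score every adjacent character pair in `corpus` by pointwise mutual information.

    PMI(xy) = log2( p(xy) / (p(x) * p(y)) ), where p(x) and p(y) are character
    frequencies and p(xy) is the bigram frequency, all estimated from raw counts.
    A high score means the two characters co-occur far more often than chance,
    which makes the pair a candidate word.
    """
    chars = Counter(corpus)
    bigrams = Counter(corpus[i:i + 2] for i in range(len(corpus) - 1))
    n_chars = sum(chars.values())
    n_bigrams = sum(bigrams.values())
    scores = {}
    for pair, count in bigrams.items():
        p_xy = count / n_bigrams
        p_x = chars[pair[0]] / n_chars
        p_y = chars[pair[1]] / n_chars
        scores[pair] = math.log2(p_xy / (p_x * p_y))
    return scores
```

In practice the scores would be computed over the whole corpus, and pairs above a chosen threshold (an assumption, not a value from this study) kept as OOV candidates.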
Besides using mutual information to measure internal cohesiveness, adjacency entropy is applied to help locate word boundaries. Entropy measures uncertainty, and information entropy quantifies the uncertainty of information. In general, a higher adjacency entropy implies that the neighboring characters of a character or string are more diverse, so its edge is more likely to be a word boundary. Adjacency entropy is divided by direction into left entropy and right entropy. For instance, the left adjacency entropy is calculated as
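The sketch below computes left and right adjacency entropy for a candidate string, assuming the standard Shannon form H = -Σ p(a) log2 p(a) over the characters observed immediately to the left (or right) of each occurrence. The base-2 logarithm and the function names are assumptions for illustration.

```python
import math
from collections import Counter


def _entropy(counts: Counter) -> float:
    """Shannon entropy (base 2) of a frequency table; 0.0 if the table is empty."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def adjacency_entropy(corpus: str, candidate: str) -> tuple[float, float]:
    """Return (left, right) adjacency entropy of `candidate` in `corpus`.

    H = -sum_a p(a | candidate) * log2 p(a | candidate), where a ranges over the
    characters seen immediately to the left (or right) of each occurrence.
    """
    left, right = Counter(), Counter()
    start = corpus.find(candidate)
    while start != -1:
        if start > 0:
            left[corpus[start - 1]] += 1
        end = start + len(candidate)
        if end < len(corpus):
            right[corpus[end]] += 1
        start = corpus.find(candidate, start + 1)
    return _entropy(left), _entropy(right)
```

A candidate would typically be accepted as a word only when both entropies, together with its internal mutual information, exceed thresholds; those thresholds are tuning choices not given here.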
Consequently, by comparing the
A randomly selected part of OOVs in
Num | Category | OOVs |
---|---|---|
1 | Names, ceremony, family names | Shangqing, Yishao, Fuxi, Diku, Erzhurong, Hou’andu, Taishang, Zaofu, Huanwen, Yuanjun, Anlingchan, Cuihong, Shaokang, Caogong, Erzhu, Siming, Wuding, Chiyou, Dingji, Hebo, Zhumeng, Jieli, Shikuang, Baosi |
2 | Ceremony, system | Dixia, Qingxu, Baowei, Qiansui, Jiaosheng, Jiaosi, Zhanshuai, Xiangxing, Yuelv, Ziyi, Shizhen, Liudian, Mingjing, Tuntian, Fengchan, Yudie, Yuanqiu, Jieyu, Lushan |
3 | Direction, geographical names, astronomy | Sufang, Jinyang, Lelang, Fuyu, Qianmu, Yanzhou, Gaochang, Fengxiang, Wusun, Jinxian, Fusang, Wangshi, Tuguhun, Maotou, Huoshan, Chenliu |
4 | Plants, animals, medicinal materials | Juyou, Zhongru, Qiongqiong, Zishiying, Longxu, Zhuyu, Junyi, Luanniao, Qilin, Maifan, Qingling, Xuanniao, Yushi, Pichao, Qicao |
6 | Official, government office | Shilang, Zongzheng, Taifu, Bieji, Yuren, Duzhi, Zuocheng, Liangzhoumu, Sikong, Duhu, Taiguan, Dajiangjun, Taishi, Huangmen, Shaofu |
7 | Utensils | Tuohu, Tanqi, Chupu, Shuijing, Gongshi, Guizan, Hunyi |
8 | Books, work of art | Yifu, Wenshi, Yuanri, Ruiyingtu, Tianyujing, Shangzi, Qingying, Hetu |
9 | Decade of a century, solar term, season | Houwei, Xiaojing, Jidong, Baique, Yiyue, Liuyue, Wugeng, Mingjia |
10 | Behavior | Baochou, Guqin, Xingjian, Chanyunyun, Shangshu, Gujiu |
As an unsupervised learning method, OOV detection based on mutual information and adjacency entropy still needs improvement in precision and recall. Therefore, a rule-based OOV detection method is introduced as a supplement to the unsupervised method to process texts of the
The hierarchical textual structure in
In this process, OOVs are detected by a hybrid method combining mutual information, adjacency entropy, and rules. Words from the CBDB database, such as dynasties, era names, personal names, place names, official titles, and literary works, are also selected. Together, these OOVs and CBDB entries form the custom dictionary for word segmentation, which contains 631,522 words.
Three chapters, “Volume 213-Officials Section 11,” “Volume 782-Barbarian Tribes Section 3-East Barbarian Tribes Subsection 3,” and “Volume 43-Earth Section 8” from the
Manual word segmentation and annotation serve as the gold standard. Automatic segmentation and annotation by THULAC with and without the custom dictionary are then compared against it. THULAC (THU Lexical Analyzer for Chinese) is a Chinese lexical analysis toolkit developed by the Natural Language Processing and Computational Social Science Lab at Tsinghua University [
Segmentation with the custom dictionary outperforms segmentation without it in all three chapters. All three metrics (Precision, Recall, and F-measure) improve markedly in each chapter: in the Earth Section, from 0.55, 0.49, and 0.50 to 0.78, 0.72, and 0.73, respectively; in the Officials Section, from 0.53, 0.50, and 0.50 to 0.72, 0.70, and 0.70; and in the East Barbarian Tribes Subsection, from 0.40, 0.29, and 0.33 to 0.75, 0.65, and 0.67. Considering the three indicators and the tolerance of large-scale text data for noise, THULAC with a custom dictionary meets the research needs. The word-segmentation results of sample texts with the THULAC NLP tool are shown in Table
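For readers reproducing the comparison, a minimal sketch of word-level Precision, Recall, and F-measure follows. It assumes the common span-matching definition, in which a predicted word is correct only if both of its boundaries coincide with a word in the manual segmentation; the exact matching and aggregation rules used for the table are not stated in this section.

```python
def to_spans(words: list[str]) -> set[tuple[int, int]]:
    """Map a segmented sentence to the set of (start, end) character offsets of its words."""
    spans, pos = set(), 0
    for word in words:
        spans.add((pos, pos + len(word)))
        pos += len(word)
    return spans


def segmentation_prf(gold: list[str], predicted: list[str]) -> tuple[float, float, float]:
    """Word-level precision, recall, and F-measure of `predicted` against manual `gold`.

    A predicted word counts as correct only if both of its boundaries match a gold word.
    """
    if "".join(gold) != "".join(predicted):
        raise ValueError("gold and predicted segmentations must cover the same text")
    gold_spans, pred_spans = to_spans(gold), to_spans(predicted)
    correct = len(gold_spans & pred_spans)
    precision = correct / len(pred_spans) if pred_spans else 0.0
    recall = correct / len(gold_spans) if gold_spans else 0.0
    f_measure = 2 * precision * recall / (precision + recall) if correct else 0.0
    return precision, recall, f_measure
```

For example, if the gold segmentation is 史記 / 曰 / 范雎 / 說 and a tool splits the book title into 史 / 記, three of the five predicted words are correct, giving a precision of 0.6, a recall of 0.75, and an F-measure of about 0.67.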
Word-segmentation results of sample texts with the THULAC NLP tool.
Metrics | Earth, no custom dictionary | Dongyi, no custom dictionary | Officials, no custom dictionary | Earth, custom dictionary | Dongyi, custom dictionary | Officials, custom dictionary |
---|---|---|---|---|---|---|
Precision | 0.55 | 0.40 | 0.53 | 0.78 | 0.75 | 0.72 |
Recall | 0.49 | 0.29 | 0.50 | 0.72 | 0.65 | 0.70 |
F-measure | 0.50 | 0.33 | 0.50 | 0.73 | 0.67 | 0.70 |
Word-segmentation results of typical data source with the THULAC tool.
Number | Section | Word-segmentation example | English translation |
---|---|---|---|
1 | Philosophy writing (Zi) | Lizhe^ti^ye^renqing^you^aiyue^wuxing^you^xingmie^gu^li^xiangyin^zhi^li^shizhong^zhi^ai^hunyin^zhi^yi^zhaopin^zhi^biao^zunbei^shangxia^you^ti | “Propriety is the decency. Human emotions have sorrow and joy; the five elements have rise and fall. Therefore, it sets up feast customs, funeral customs, marriage customs, and an imperial appointment system. Everything is in hierarchical order, from top to bottom” |
2 | Heaven | hetuditongji^yue^yuzhe^tiandi^zhi^shi^ye / dunjiakaishantu^yue^huoshan^nanyue^youyun^shi^yuhu | In |
3 | Earth | huayangguozhi^yue^jianwei^nan’an^xiannan^you^emeishan^qu^xian^bashili^ditu^yun^you^xianyao^hanwu^qiu^bu^neng^de | In |
4 | Criminal law | duiyue^chen^shao^ye^song^shi^yun^putian^zhi^xia^mofei^wangtu / shuaitu^zhi^bin^mofei^wangchen^jin^jun^tianzi^ze^wo^tianzi^chen^ye | Answered “I have been familiar with the “Book of Songs” since childhood, and there is a poem in the book saying: “the whole world, is it the king’s soil; all people on the Earth, are the king’s ministers”. Now that the king of Zhou is the ruler of the world, then I am a minister of the emperor” |
5 | Officials | Shiji^yue^Fansui^shuo^Qinzhaowang^yue^Wuzixu^Tuozai^er^chu^zhaoguan^ye^xing^er^zhaofu^zhiyu^Lingshui^wuyi^huqikou^xixing^pufu^Bai^jishou^routan^gufu^chuixiao^qi^yu^wushi^zu^wuguo^helu^wei^bo^shichen^de^jin^mou^ru^zixu^jiazhi^yi^youqiu^zhongshen^bu^fu^jian^shi^chen^zhi^shuo^xing^ye | The |
Different keyword extraction algorithms can meet the needs of different scenarios. For example, the method based on intermittency is especially suitable for short text and single document text [
The TF-IDF algorithm considers both term frequency and inverse document frequency. From the perspective of term frequency, the more often a word appears in a single document, the more prominently it represents that document’s topic. From the perspective of inverse document frequency, the more documents a word appears in, the more general the word is and the less distinctive the topic it represents [
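The two effects described above can be illustrated with a minimal TF-IDF sketch. It assumes the plain variant tf = count/length and idf = log(N/df), so a word that occurs in every document receives weight 0; the exact weighting used in this study is not reproduced here, and `tfidf_keywords` and its `top_k` parameter are hypothetical.

```python
import math
from collections import Counter


def tfidf_keywords(docs: list[list[str]], top_k: int = 5) -> list[list[tuple[str, float]]]:
    """Return the top_k (word, weight) pairs by TF-IDF for each tokenized document.

    tf(w, d) = count(w, d) / len(d)
    idf(w)   = log(N / df(w)), where df(w) is the number of documents containing w,
               so a word that occurs in every document gets weight 0.
    """
    n_docs = len(docs)
    doc_freq = Counter(word for doc in docs for word in set(doc))
    keywords = []
    for doc in docs:
        term_freq = Counter(doc)
        weights = {
            word: (count / len(doc)) * math.log(n_docs / doc_freq[word])
            for word, count in term_freq.items()
        }
        # Highest weight first; ties broken alphabetically for reproducibility.
        ranked = sorted(weights.items(), key=lambda item: (-item[1], item[0]))
        keywords.append(ranked[:top_k])
    return keywords
```

Here each document would be one segmented chapter; the function word 曰 ("said"), which appears in nearly every chapter, is pushed to the bottom of every ranking.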
Let
Finally, the weight coefficient matrix of the theme collection of cultural elements
Topic keyword extraction provides candidate nodes for the complex network of traditional Chinese cultural elements. This section focuses on the associations between the extracted keywords, which establish relationships between nodes and reflect the relevance among topics of traditional cultural elements. The network of traditional Chinese culture element
In the present study, it is set that
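A minimal sketch of turning per-chapter keyword sets into weighted edges with the Ochiai coefficient, O(a, b) = n_ab / sqrt(n_a · n_b), is shown below. Counting co-occurrence at the document level and the `threshold` parameter are assumptions for illustration; the study's actual threshold is not reproduced in this section.

```python
import math
from collections import Counter
from itertools import combinations


def ochiai_edges(doc_keywords: list[set[str]], threshold: float = 0.0) -> dict[tuple[str, str], float]:
    """Weighted keyword co-occurrence edges using the Ochiai coefficient.

    O(a, b) = n_ab / sqrt(n_a * n_b), where n_a is the number of documents whose
    keyword set contains a, and n_ab the number containing both a and b.
    Only pairs with O(a, b) > threshold are kept.
    """
    single = Counter()
    pair = Counter()
    for keywords in doc_keywords:
        single.update(keywords)
        # Sort so that each unordered pair is always counted under the same key.
        pair.update(combinations(sorted(keywords), 2))
    edges = {}
    for (a, b), n_ab in pair.items():
        weight = n_ab / math.sqrt(single[a] * single[b])
        if weight > threshold:
            edges[(a, b)] = weight
    return edges
```

Because the coefficient normalizes by the geometric mean of the two document frequencies, a pair of rare keywords that always appear together can score as highly as a pair of frequent ones.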
Several main statistical characteristics, including average degree
The average path length is also known as characteristic path length, which is defined as the average number of edges in the shortest paths between all vertex pairs, given by
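The definition above can be computed with one breadth-first search per vertex, as in the sketch below. It assumes an undirected, unweighted graph stored as an adjacency dictionary and skips pairs with no connecting path; it additionally tracks the longest shortest path (the diameter), a hypothetical extra. For a network with about 10^4 vertices this all-pairs search is feasible but slow in pure Python.

```python
from collections import deque


def path_statistics(adj: dict[str, set[str]]) -> tuple[float, int]:
    """Average shortest-path length and diameter of an undirected, unweighted graph.

    Runs one breadth-first search per vertex and averages over all ordered pairs
    (i, j), i != j, that are connected; unreachable pairs are skipped.
    """
    total, pairs, diameter = 0, 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
        diameter = max(diameter, max(dist.values()))
    return (total / pairs if pairs else 0.0), diameter
```

On the three-vertex path a–b–c, the six ordered pairs have distances 1, 2, 1, 1, 1, 2, so the average is 4/3 and the diameter is 2.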
In this study, the average path length
The clustering coefficient of a vertex quantifies how densely its neighbors are interconnected. In this study, the clustering coefficient of a traditional Chinese cultural element represents the extent to which its neighboring cultural elements tend to cluster together. A large clustering coefficient, produced by closely related cultural elements, indicates that a specific topic has emerged. The clustering coefficient of a vertex can be expressed as
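A minimal sketch of the local clustering coefficient in its standard form, C_v = 2e_v / (k_v(k_v − 1)), where e_v is the number of edges among the k_v neighbors of v, and of its network average. Assigning 0 to vertices with fewer than two neighbors is an assumed convention; other conventions exclude them from the average.

```python
from itertools import combinations


def local_clustering(adj: dict[str, set[str]], v: str) -> float:
    """C_v = 2 * e_v / (k_v * (k_v - 1)), where e_v counts edges among v's k_v neighbors."""
    neighbors = adj[v]
    k = len(neighbors)
    if k < 2:
        return 0.0  # convention: vertices with fewer than two neighbors get 0
    links = sum(1 for a, b in combinations(neighbors, 2) if b in adj[a])
    return 2 * links / (k * (k - 1))


def average_clustering(adj: dict[str, set[str]]) -> float:
    """Mean local clustering coefficient over all vertices."""
    return sum(local_clustering(adj, v) for v in adj) / len(adj)
```

For a triangle a–b–c with an extra leaf d attached to a, vertex a has one link among its three neighbors (C = 1/3), b and c each have C = 1, d has C = 0, and the average is 7/12.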
The main statistical characteristics of the complex network of traditional Chinese cultural elements are summarized in Table
The main statistical characteristics of the complex network of traditional Chinese cultural elements.
Statistical characteristics | Vertices | Edges | Average degree | Average path length | Diameter | Clustering coefficient |
---|---|---|---|---|---|---|
Value | 10423 | 68923 | 13.23 | 3.41 | 9 | 0.69 |
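Assuming the network is undirected and contains no self-loops or duplicate edges, the average degree in the table follows directly from the vertex and edge counts, since each edge contributes to the degree of two vertices:

$$
\langle k \rangle = \frac{2|E|}{|V|} = \frac{2 \times 68923}{10423} = \frac{137846}{10423} \approx 13.23
$$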
Degree distribution
Figure
Log-log scatter plot of the degree distribution of the complex network of traditional Chinese cultural elements.
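To estimate the power-law exponent from the degree sequence, one option is the discrete maximum-likelihood approximation of Clauset, Shalizi, and Newman, α ≈ 1 + n[Σ ln(k_i/(k_min − 1/2))]^(-1). This is a swapped-in technique, not necessarily the fitting procedure used in this study; it avoids the bias of least-squares fits on log-log plots. The lower cutoff `k_min` is a user-chosen assumption.

```python
import math


def powerlaw_alpha(degrees: list[int], k_min: int) -> float:
    """Approximate discrete MLE of alpha in P(k) ~ k^(-alpha), using only degrees k >= k_min.

    alpha ≈ 1 + n / sum_i ln(k_i / (k_min - 0.5))   (Clauset, Shalizi and Newman)
    """
    if k_min < 1:
        raise ValueError("k_min must be a positive integer")
    tail = [k for k in degrees if k >= k_min]
    if not tail:
        raise ValueError("no degrees at or above k_min")
    return 1 + len(tail) / sum(math.log(k / (k_min - 0.5)) for k in tail)
```

Because the estimate depends on `k_min`, it is usually computed for several cutoffs and the cutoff that best fits the tail is retained.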
Scale-free networks have broad implications for the structure and dynamics of complex systems, one of which is the heterogeneity of vertices: a minority of high-degree core nodes occupy a dominant position, while low-degree vertices lie on the periphery of the network. From this perspective, the influence of traditional Chinese cultural elements is heterogeneous, and most elements play only a local role in this cultural system. However, a few elements co-occur frequently with many other elements, and their influence appears to be pervasive. These elements constitute the core elements of traditional Chinese culture. The top 30 cultural elements with the highest degree, which have a dominant effect in the traditional Chinese cultural complex network, are listed in Table
Top 30 cultural elements with highest degree.
No. | Element | Degree |
---|---|---|
1 |  | 2453 |
2 |  | 1638 |
3 |  | 1412 |
4 |  | 1394 |
5 |  | 1343 |
6 |  | 1270 |
7 |  | 1206 |
8 |  | 989 |
9 |  | 933 |
10 |  | 913 |
11 |  | 902 |
12 |  | 867 |
13 |  | 752 |
14 |  | 711 |
15 |  | 700 |
16 |  | 682 |
17 |  | 679 |
18 |  | 617 |
19 |  | 608 |
20 |  | 603 |
21 |  | 602 |
22 |  | 572 |
23 |  | 547 |
24 |  | 520 |
25 |  | 513 |
26 |  | 506 |
27 |  | 486 |
28 |  | 486 |
29 |  | 452 |
30 |  | 446 |
The clustering coefficient-degree correlation of the network describes the relationship between a node’s degree and its clustering coefficient. In general, in scale-free networks, the clustering coefficient of a node with degree
The scatter distribution of the clustering coefficient of the node
The scaling of
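As a sketch of how such a scaling can be measured, the function below averages the clustering coefficient over all vertices of each degree k to obtain C(k) and fits a straight line to log C(k) against log k by ordinary least squares; a slope near −1 corresponds to C(k) ∝ k^(-1). The binning by exact degree and the least-squares fit are assumptions, not the study's stated procedure.

```python
import math
from collections import defaultdict


def clustering_degree_slope(degree: dict[str, int], clustering: dict[str, float]) -> float:
    """Least-squares slope of log C(k) against log k.

    C(k) is the mean clustering coefficient of all vertices with degree k. Vertices
    with k < 2 or C = 0 are skipped because their logarithm is undefined.
    A slope near -1 corresponds to the scaling C(k) ~ k^(-1).
    """
    by_degree = defaultdict(list)
    for v, k in degree.items():
        if k >= 2 and clustering[v] > 0:
            by_degree[k].append(clustering[v])
    if len(by_degree) < 2:
        raise ValueError("need at least two distinct degrees to fit a slope")
    xs = [math.log(k) for k in by_degree]
    ys = [math.log(sum(cs) / len(cs)) for cs in by_degree.values()]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var
```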
Community detection in complex networks, as employed in this study, provides a way to distinguish the topics of traditional Chinese cultural elements from one another. Community detection has been widely applied in social networks and other complex network analyses. In a complex network, communities are sets of nodes that are densely interconnected within each set but only sparsely connected between sets [
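Candidate partitions are commonly scored by Newman–Girvan modularity Q, which compares the fraction of edges inside communities with the fraction expected at random given the vertex degrees. The sketch below only evaluates Q for a given partition of an undirected, unweighted graph; it is not the detection algorithm used in this study. Heuristics such as the Louvain method search for partitions that maximize this quantity.

```python
from collections import Counter


def modularity(edges: list[tuple[str, str]], community: dict[str, int]) -> float:
    """Newman-Girvan modularity Q of a partition of an undirected, unweighted graph.

    Q = sum_c [ L_c / m - (d_c / (2m))^2 ], where m is the total number of edges,
    L_c the number of edges with both ends in community c, and d_c the sum of the
    degrees of the vertices in c.
    """
    m = len(edges)
    internal = Counter()
    total_degree = Counter()
    for u, v in edges:
        total_degree[community[u]] += 1
        total_degree[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(internal[c] / m - (total_degree[c] / (2 * m)) ** 2 for c in total_degree)
```

For two triangles joined by a single edge, splitting them into their natural communities gives Q = 5/14 ≈ 0.36, while placing every vertex in one community gives Q = 0.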
Figure
Dual-line plot with modularity (on left
Size, proportion, and cumulative proportion of communities (topics) in the complex network of traditional Chinese cultural elements.
No. | Size | Proportion (%) | Cumulative proportion (%) |
---|---|---|---|
1 | 1548 | 14.85 | 14.85 |
2 | 1138 | 10.92 | 25.77 |
3 | 1111 | 10.66 | 36.43 |
4 | 1059 | 10.16 | 46.59 |
5 | 1030 | 9.88 | 56.47 |
6 | 798 | 7.66 | 64.13 |
7 | 690 | 6.62 | 70.75 |
8 | 618 | 5.93 | 76.68 |
9 | 460 | 4.41 | 81.09 |
10 | 425 | 4.08 | 85.17 |
11 | 419 | 4.02 | 89.19 |
12 | 269 | 2.58 | 91.77 |
Others | 858 | 8.23 | 100 |
Communities and topics in the complex network of traditional Chinese cultural elements.
Typical elements in traditional Chinese cultural element complex network’s topics. (a) Confucianism. (b) Taoism. (c) Military and diplomacy. (d) Social rule. (e) Political life. (f) Cosmic astrology. (g) Natural disaster. (h) Landforms and geography. (i) Exotic customs. (j) Craft artifacts. (k) Myths and legends. (l) Natural herbs.
Names and hierarchical structures of cultural topics.
Hierarchy | Name | Typical elements |
---|---|---|
Values | Confucianism | Figure |
Values | Taoism | Figure |
Institutional behavior | Military and diplomacy | Figure |
Institutional behavior | Social rule | Figure |
Institutional behavior | Political life | Figure |
Symbols | Cosmos and astrology | Figure |
Symbols | Natural disaster | Figure |
Symbols | Landforms and geography | Figure |
Symbols | Exotic customs | Figure |
Symbols | Craft artifacts | Figure |
Symbols | Myths and legends | Figure |
Symbols | Natural herbs | Figure |
The Force Atlas algorithm lays out the nodes according to the extent of interdependence between correlated nodes. Thus, nodes with stronger correlations and the communities formed by these nodes are closer to each other in the performed layout. More specifically, colored communities and cultural topics in close positions imply a strong correlation between them, as in Figure
Edgar Schein argues that culture serves two purposes: external adaptation and internal integration [
As shown in Figure
In contrast to Confucianism, however, Taoism is marginal and located in quadrant II. Its position reveals an emphasis on the relationship between humans and nature that formed spontaneously from social life, as in the saying “People follow Earth, Earth follows Heaven, Heaven follows Dao, and Dao follows Nature.” Spontaneous formation here means that Taoist concepts and elements are grounded in myths and legends, which are in turn sublimated from craft artifacts and survival skills in people’s everyday lives. From this perspective, craft artifacts are the materialization of social life, while myths and legends are the sublimation of social life and craft artifacts. Furthermore, Taoism is the theoretical origin of the myths and legends and connects social life to nature. The natural herbs topic, located right above quadrants I and II in Figure
To the right of the natural herbs topic, the topics in quadrant I involve the natural and foreign environments that affect people’s activities. The cosmos and astrology topic records astronomical phenomena related to divination and disasters; the natural disaster topic describes catastrophic natural events, such as wind, rain, thunder, hail, and earthquakes, that caused real social, economic, and property losses. Both topics reflect people’s tendency to seek benefit and avoid harm when the natural environment changed. The landforms and geography topic records mountains, rivers, and administrative divisions within the ruling region. Interestingly, the natural disaster topic is surrounded by elements of the landforms and geography topic, owing to the association between disasters and geographical conditions. Finally, the exotic customs topic records ethnic minorities, external geographical environments, and foreign cultures outside the ruling region. This suggests that, although Buddhism was introduced to China during the Han Dynasty and is inextricably linked with every aspect of traditional Chinese culture, it did not shake the dominant role of Confucianism. Buddhism’s influence centered on people’s adaptation to the external environment and penetrated less deeply into political and military life.
Elements of the military and diplomacy topic mainly occupy quadrant IV and concern the coordination of relations with foreign ethnic minorities. These elements detail not only external aggression, resistance to aggression, conquest, negotiation, and indirect rule but also military activities during dynastic transitions. This is because, as far as the ruling class was concerned, there was no essential difference between foreign incursions and internal disturbances, as both required decisive intervention.
The third quadrant is primarily occupied by the social rule topic, which concerns the coordination and governance of internal society. This topic’s cultural elements include, but are not limited to, names of crimes, laws, rituals, moral concepts, and specific events. Meanwhile, the political life topic, located across quadrants III and IV, consists of cultural elements from the domains of the royal court, officials, government offices, important empresses, relatives, and politicians. It coordinates domestic affairs, military affairs, and diplomacy, and reflects the highest form of social practice.
There is also an intertwined relationship between political life and social rule: unlike the other topics, the two show no clear color division, which to some extent reflects the hereditary monarchical thought and governance models of the ruling class. On the right side of the abscissa, social rule ensures the orderly progress of daily social production, and the topic thus connects back to craft artifacts, which represent daily life.
Typical cultural elements of each topic are shown in Figures
Cultural elements are minimal units of a cultural system. The goal of this study was to build an objective, complete, and reliable categorization system of traditional Chinese cultural elements. Using
(1) Three chapters were randomly selected for testing. An OOV detection method based on mutual information and adjacency entropy was used to identify unregistered words, which were compiled into a custom dictionary for word segmentation of the Taiping Imperial Encyclopedia. In tests with the THULAC NLP tool, Precision, Recall, and F-measure all improved compared with segmentation without the custom dictionary.
The TF-IDF algorithm was used to extract keywords of traditional Chinese cultural element topics, and the Ochiai coefficient was applied to calculate associations between keywords. Setting the number of texts to 1000 and the importance threshold of topics to
(2) The degree distribution of the complex network of traditional Chinese cultural elements follows a power law, where
(3) The results of community detection show that 91.77% of the elements in the complex network of traditional Chinese culture belong to the top 12 communities (cultural topics): Confucianism, Taoism, military and diplomacy, social rule, political life, cosmos and astrology, natural disaster, landforms and geography, exotic customs, craft artifacts, myths and legends, and natural herbs. These topics are arranged in a ring in the Force Atlas force-directed layout. Finally, by mapping the topics onto an orthogonal coordinate system defined by the means and ends of culture, the connotation of each topic and the relationships between topics are explained systematically.
This study provides a fully quantitative and reliable method for categorizing traditional Chinese cultural elements. It still has a few limitations. (1) The textual data source is limited in
The data presented in this study are available on request from the corresponding author.
The authors declare that there are no conflicts of interest regarding the publication of this paper.
This research was funded by the National Key R&D Program of China (grant no. 2017YFB1400400), Youth Talent Promotion Program of Beijing Association for Science and Technology (grant no.2020-2022-16), Social Science Foundation of Beijing (grant no. 19GLC068), and Program for Promoting the Connotative Development of Beijing Information Science & Technology University (grant nos. 521201090A and 5026010961).