A Novel Coverless Text Steganographic Algorithm Based on Polynomial Encryption

To address the low text utilization rate and the ambiguity of secret information extraction in "tag + keyword" coverless information hiding methods, we propose a coverless information hiding method for Chinese text based on polynomial encryption. Secret data are communicated by transmitting analogously forged URLs, which improves the security of mobile computing by avoiding the risk that the carrier texts are maliciously attacked. The text utilization rate and retrieval success rate are improved by letting each tag select multiple keywords, which expands the number of keywords in the index table. Each keyword's location in the secret message is obtained through text vocabulary matching, and the tag index and location information are encrypted by a polynomial for transmission. The split keywords of the secret message can therefore be transmitted out of order. The experimental results show that the secret message can be successfully extracted by sorting the extracted keywords according to the frequency and relevance of their combined tags. In experiments with different test databases, both the hiding success rate and the average hiding capacity are higher than those of existing methods.


Introduction
In recent years, smart terminal devices have been widely used. People can communicate with each other or send files in real time by using instant messaging apps, e.g., WhatsApp, Facebook Messenger, or WeChat. Modern instant messaging applications have greatly facilitated everyday communication. Researchers have paid more attention to mobile computing, especially its security. The openness of the Internet and the uncertainty of information dissemination have led to frequent leakage of citizens' information. According to the 46th China statistical report on Internet development issued by the China Internet Network Information Center (CNNIC) [1], the number of Internet users in China had reached 940 million by the end of June 2020. Among these users, 20.4% had suffered personal information leakage, 17% had suffered online fraud, and 9.9% had had their accounts and passwords stolen. According to the report, the number of tampered websites in China was as high as 147,682 [1].
To ensure that information is not stolen or tampered with, data encryption, intrusion detection, and information hiding are usually used to keep information secure and intact [2]. For data encryption, there are various complex cryptographic algorithms that make the cipher text difficult to crack. However, the cipher text always appears as messy, unreadable codes, which easily attract the attention of attackers. Intrusion detection systems can prevent unauthorized users from accessing the system. However, the information security of legitimate users is greatly harmed once attackers disguise themselves to invade the system and modify the data. Unlike data encryption and intrusion detection, information hiding, or steganography, is used for covert communication in which the secret message is embedded into cover work available for mobile communication. Steganography in particular has high imperceptibility, which means that it is hard for attackers to detect or remove the secret message. Therefore, information hiding technology plays an important role in information security research, including secure data sharing and storage for mobile computing.
Conventional information hiding methods exploit the redundancy of digital carriers and the insensitivity of human vision, modifying the carrier to embed secret information. However, modifying the carrier to carry extra information inevitably leaves traces, which means that attackers can recognize or destroy the secret information by using steganalysis tools. To solve the low concealment and low security of conventional information hiding methods, coverless information hiding was proposed to share the secret information between the sender and receiver without modifying the carrier [3]. The secret message can be selected or generated from the shared carrier. Because the carrier is not modified, coverless methods can resist most steganalysis methods and various malicious attacks.
According to the type of the digital cover, steganography can be divided into four categories: image steganography, video steganography, audio steganography, and text steganography [4]. Among the four cover types, text is frequently used in mobile communication and has obvious advantages, including small data size, legibility, and fast transmission. Therefore, research on coverless text steganographic algorithms has become a hot topic with high theoretical significance and practical value. According to whether the cover text is modified or not, steganography algorithms can be divided into "traditional steganography algorithms" and "coverless steganography algorithms" [5]. Coverless text steganography can be further divided into two categories: methods based on carrier generation [6,7] and methods based on carrier retrieval [8-10]. The steganography methods based on carrier generation need to establish a protocol for natural language text generation, which is shared by the two sides of the communication. A steganographic text is generated as the carrier by a model controlled by the secret information. Fang et al. [6] used a new linguistic stegosystem based on a Long Short-Term Memory (LSTM) neural network to train language generation models, which achieved good concealment. Yang et al. [7,11,12] successively used a Convolutional Neural Network (CNN), a Markov model, and a Recurrent Neural Network (RNN) to extract text features and jointly trained the steganographic text generation model with Huffman coding. The steganography methods based on carrier retrieval need to retrieve the message from the cover text following some rules and communicate by transmitting the cover text. Chen et al. [8] proposed the concept of "tag + keyword" as the core idea for hiding information without modifying the carrier text, where the tag is derived from the mathematical expression of Chinese characters.
The method has high concealment but low hiding capacity, since only one character is hidden in one text. Zhou et al. [13] improved the above method by hiding multiple keywords in one text to increase the hiding capacity. Liu et al. [14-16] used Chinese Pinyin and part of speech as tags and assigned keywords to each tag in a text. The proposed methods used the "Word2Vec" language model to expand the keyword set and increase the success probability of keyword retrieval. Zhang et al. [9] proposed the concept of a "word rank map" to generate stegovectors for keyword retrieval. The method was improved by combining it with a frequent-words hash to resist statistical analysis attacks in [17] and with the frequent-words distance to increase the utilization of retrieved text in [18]. However, the hiding capacity of the "word rank map" based methods in [9,17,18] is still low. In order to improve the coverless information hiding capacity, Ji et al. [10,19] constructed a mapping relationship between binary strings and the number of keywords to hide multiple keywords in one text. The methods above did not consider the distribution of tags in the text when selecting tags, which resulted in low word coverage under a single tag. Chen et al. [20,21] considered the uniform distribution of the parity of Chinese characters' Unicode encoding and designed a coverless information hiding method based on Unicode labels. The method improved the hiding success rate and hiding capacity, but it also needed a huge carrier text database to generate an index file, in which the links for each tag are stored with only one selected keyword per text. In addition, the transmitted secret message is directly hidden in the carrier text, which reduces the security of the method. Wang et al. [22] proposed a coverless text steganography method based on the mathematical expression of Chinese characters.
Wang and Gao [23] further analyzed the parity of Chinese characters' stroke number and its statistical characteristic and generated a communication codebook based on the exploited "space mapping" concept to further improve the embedding rate.
An analysis of recent works shows that retrieval-based coverless steganography algorithms still have drawbacks in text utilization, information extraction accuracy, hiding capacity, and hiding success rate. In this paper, we propose a novel coverless steganography algorithm based on polynomial encryption for hiding secret messages in Chinese text. It uses web page text as the carrier and constructs an index file based on "tag + keyword" methods to collect the keywords' locations in the carriers. The index file is shared by both sender and receiver for keyword retrieval. The novelty of the proposed algorithm lies in two points: (1) mapping one tag to multiple keywords and (2) generating retrieval side-information encrypted by a polynomial for secret information transmission. The rest of this paper is organized as follows: in Section 2, the related works on the encryption and decryption method, the index file structure, and the framework of "tag + keyword" based steganographic schemes are reviewed. Section 3 presents our proposed coverless text steganographic scheme. Section 4 illustrates the feasibility of our proposed steganography by analyzing the experimental results in comparison with other "tag + keyword" methods. This is followed by the conclusion and future work in Section 5.

Related Works
To provide enough background for our proposed scheme, this section briefly reviews the general idea of secret sharing based on polynomial interpolation and the framework of previous steganography schemes based on "tag + keyword."

Secret Sharing Scheme Based on Polynomial Interpolation.
A secret sharing scheme splits a secret into n shares, which are used to reconstruct the original secret. In a (k, n) threshold scheme, any set of fewer than k shares reveals nothing about the secret. In our proposed steganography algorithm, Shamir's secret sharing scheme with threshold k = n is applied to encrypt our location side-information, whose number of values varies. A dynamic key generation mechanism is established, where the number of keys is determined by the number of values in the side-information.

Encryption Process Based on Polynomial.
The n-degree polynomial used for encryption is defined in (1):

$$f(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + \cdots + a_{n-1} x^{n-1} + a_n x^n \bmod q. \tag{1}$$

In the encryption process, the plain text is the side-information with n integer values $P = [p_1, p_2, \ldots, p_n]$, which replace the coefficients $a_1, a_2, \ldots, a_n$ of the above polynomial. A set of n keys $X = [x_1, x_2, \ldots, x_n]$ is generated from a secret value $s \in [0, q)$ under the prime power q, and the first coefficient of the polynomial is set to $a_0 = s$. Then, the polynomial is constructed as follows:

$$f(x) = s + p_1 x + p_2 x^2 + \cdots + p_n x^n \bmod q. \tag{2}$$

Finally, the secret shares produced as output, written here as $P' = [p'_1, p'_2, \ldots, p'_n]$ to distinguish them from the plain text, are calculated as the values $p'_i = f(x_i)$ given by Equation (2).

Decryption Process Using Lagrange Polynomials for Polynomial Interpolation.
When the receiver gets the secret shares $p'_1, p'_2, \ldots, p'_n$ as cipher text, n equations are formed as follows:

$$p'_i = s + p_1 x_i + p_2 x_i^2 + \cdots + p_n x_i^n \bmod q, \quad i = 1, 2, \ldots, n. \tag{3}$$

The plain text, i.e., the n unknown coefficients $p_1, p_2, \ldots, p_n$, can be decrypted using the Lagrange polynomial interpolation method. The Lagrange polynomial is defined in (4), where the corresponding coefficients of the calculated L(x) are the output plain text $P = [p_1, p_2, \ldots, p_n]$.
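
The encryption and decryption steps above can be sketched in Python as follows. This is a minimal sketch under assumptions that the text does not state explicitly: the modulus q is taken to be a prime so that modular inverses exist, the keys are derived as x_i = x0 + i * step mod q (one possible reading of the "initial value" and "step size" mentioned for extraction), and the receiver uses the shared value s as the extra interpolation point (0, s). All function and parameter names are illustrative.

```python
# Minimal sketch of polynomial encryption and Lagrange decryption.
# Assumptions (not from the paper): q is prime, keys are x0 + i*step mod q,
# and (0, s) serves as the (n+1)-th interpolation point. Needs Python 3.8+.

def make_keys(n, x0, step, q):
    """Derive n distinct, nonzero keys from an initial value and a step size."""
    keys = [(x0 + i * step) % q for i in range(n)]
    if 0 in keys or len(set(keys)) != n:
        raise ValueError("keys must be distinct and nonzero modulo q")
    return keys

def encrypt(side_info, s, keys, q):
    """Evaluate f(x) = s + p_1 x + ... + p_n x^n mod q at every key."""
    coeffs = [s] + list(side_info)
    shares = []
    for x in keys:
        acc = 0
        for c in reversed(coeffs):  # Horner's rule
            acc = (acc * x + c) % q
        shares.append(acc)
    return shares

def _times_linear(poly, root, q):
    """Multiply a polynomial (lowest degree first) by (x - root) mod q."""
    out = [0] * (len(poly) + 1)
    for k, c in enumerate(poly):
        out[k] = (out[k] - c * root) % q
        out[k + 1] = (out[k + 1] + c) % q
    return out

def decrypt(shares, s, keys, q):
    """Recover p_1..p_n as the coefficients of the interpolating polynomial."""
    xs = [0] + list(keys)
    ys = [s % q] + list(shares)
    coeffs = [0] * len(xs)
    for j in range(len(xs)):
        basis, denom = [1], 1
        for m in range(len(xs)):
            if m != j:
                basis = _times_linear(basis, xs[m], q)
                denom = denom * (xs[j] - xs[m]) % q
        scale = ys[j] * pow(denom, -1, q) % q  # modular inverse of denom
        for k, c in enumerate(basis):
            coeffs[k] = (coeffs[k] + scale * c) % q
    return coeffs[1:]  # coeffs[0] equals s

q = 257
keys = make_keys(4, x0=7, step=11, q=q)
shares = encrypt([5, 17, 3, 42], s=123, keys=keys, q=q)
print(decrypt(shares, s=123, keys=keys, q=q))
```

Because a polynomial of degree n is fixed by n + 1 points, the n shares together with the shared point (0, s) determine all coefficients, provided every side-information value is smaller than q.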

The Framework of General Coverless Steganographic Schemes Based on "tag + keyword".
Coverless text steganographic schemes based on "tag + keyword" use natural language processing to cut the secret message into different keywords and use big data retrieval to retrieve the stegotext. The general framework of the "tag + keyword" based method is shown in Figure 1. The implementation process includes four main steps: building a large text database, generating index files, collecting the stegotexts, and extracting hidden information.
Step 1. Building a large text database. Using web crawler technology, the text database can be quickly and accurately built by collecting texts from different fields on different websites. A steganography scheme based on "tags + keywords" has to build a huge database of more than 10 GB in order to keep a high hiding efficiency. Such schemes also analyze the frequency of keywords and their combinations in each text, which can be used to select carrier texts more efficiently and systematically.
Step 2. Generating index files. The index file plays a crucial role in keeping retrieval efficient. The index file uses an inverted index structure with three levels. The first level is the tag, which is used to locate the secret message word. All the texts in the database are preprocessed, and the location tag is selected and retrieved according to the binary strings corresponding to the parity of the characters' Unicode values. The first appearance of each location tag in a text is recorded with its keyword in the second level of the index file and with its "text path + file name" in the third level of the index file. After all the texts are processed, the index file is finally created.
Step 3. Collecting the stegotexts. In this process, the secret message is first split into keywords. Each keyword is combined with a corresponding tag, and the "tags + keywords" are retrieved in the index file to obtain the paths of the available texts. One text is selected randomly and added to the set of stegotexts, which are sent to the receiver in sequence.
Step 4. Extracting hidden information. The process of secret information extraction is the reverse of Step 3. When the receiver gets all the stegotexts, the corresponding tags are combined with the texts in sequence. For each text, the keyword can be extracted according to the corresponding location tag. After all the received texts have been processed, the secret message can be constructed by combining all extracted keywords in the correct sequence.

The Proposed Coverless Text Steganographic Scheme

Research Motivation.
Although the existing methods based on "tag + keyword" fulfill the basic requirements for coverless text information hiding, two main disadvantages deserve attention.

Lower Text Utilization.
In order to ensure that the recipient can correctly extract secret keywords from the texts under the guidance of tags, one tag is allowed to combine with only one keyword in a given text during index file generation. A tag may, however, appear multiple times in the rest of that text and correspond to multiple different keywords. Consequently, if a tag appears more than once in the text, only the first appearance with its corresponding keyword is kept and stored in the index file. In order to achieve a higher retrieval success rate, the text database needs to reach tens or even hundreds of GB if a "tag + keyword" method with one-to-one correspondence between tag and keyword is applied. Furthermore, experimental studies show that the probability of one tag appearing multiple times in a text is extremely high. The one-to-one approach is therefore bound to ignore many usable keywords in one text, so the utilization rate of each text is low.

Ambiguity in Secret Information Extraction Caused by Sending and Receiving Texts Nonsequentially.
The protocol for location-tag generation is agreed upon in advance by both communicating parties, which means that the order of the tags and their corresponding cover texts is fixed. The texts can arrive out of order in three cases: the sender does not send the texts in the order of secret information retrieval, the receiver does not extract the secret information in the correct order, or the text transmission is affected by channel delay. In all of these cases, the secret message extracted according to the ordered tags becomes ambiguous.
To solve the problems mentioned above, we propose a novel coverless steganographic method based on polynomial encryption for Chinese characters. The testing text database was constructed by collecting texts from different web pages. The uniform resource locator (URL) of the web page, instead of the web page text, is stored in an index file. The polynomial coefficients are replaced by values calculated from all keywords with their corresponding tags and location indexes in one text. The polynomial encryption algorithm is applied to generate secret information as auxiliary information, which is attached to the end of the text URL. Then, the new counterfeit URL is sent to the receiver to realize the secret communication between the two parties. In this method, one text can be used to hide one or more secret keywords. Tags are used not only to localize the hidden keywords but also to improve the randomness of the algorithm, thereby improving its security. In addition, the proposed method uses the Unicode Standard for character encoding, and the parity of the encoded Chinese characters is used for tag retrieval. It has been shown in the literature [20,21] that the distribution of parity for the encoded web page texts is uniform, which guarantees the uniformity of the generated index file and a high word coverage.

The Framework of the Proposed Method.
The overall framework of our proposed coverless steganographic method based on polynomial encryption is shown in Figure 2. The proposed method is mainly composed of three stages: index file generation, secret information hiding, and secret information extraction, which are introduced in detail in Sections 3.2.1, 3.2.2, and 3.2.3, respectively.

Index File Generation.
The index file is generated in advance to support quick retrieval of secret information. The main steps of index file generation are generating a tag-set, initializing an index structure, preprocessing the web page texts, selecting the keywords according to each tag, and adding the keywords and their corresponding web page links to the index table. Algorithm 1 describes the procedure of index file generation in detail, and Figure 3 shows a text processing example for keyword selection and index table expansion with the tag length m = 4. The structure of the generated index file is illustrated in Figure 4. In lines 6-14 of Algorithm 1, one tag with a specific value may appear more than once in a preprocessed text $t_i$. As long as the keywords following the specific tag are not repeated, which means that the "tag + keyword" is unique, the retrieved keywords and their web page link are also stored in the index structure under their corresponding tag branch. After all texts in the web page text database have been processed, the index file is generated by merging the $2^m$ tag-index branches.
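
The index generation described above can be sketched in Python as follows. This is an illustrative sketch, not the authors' implementation: `first_keyword` is a hypothetical stand-in that simply takes the first two of the four following characters (the paper uses a word segmentation step that is not specified here), Chinese characters are approximated by the basic CJK Unified Ideographs range U+4E00 to U+9FFF, and the example URL is made up.

```python
from collections import defaultdict

def is_chinese(ch):
    # Assumption: basic CJK Unified Ideographs block only.
    return "\u4e00" <= ch <= "\u9fff"

def preprocess(raw_text):
    """Keep only Chinese characters (Algorithm 1, line 4)."""
    return "".join(ch for ch in raw_text if is_chinese(ch))

def parity_bits(text):
    """Map each character to 0 or 1 by the parity of its code point (line 5)."""
    return "".join(str(ord(ch) % 2) for ch in text)

def first_keyword(chars):
    # Hypothetical stand-in for the word segmentation tool.
    return chars[:2]

def build_index(pages, m):
    """Build tag -> keyword -> [urls]; pages maps a URL to its raw text."""
    index = defaultdict(lambda: defaultdict(list))
    for url, raw_text in pages.items():
        text = preprocess(raw_text)
        bits = parity_bits(text)
        seen = set()  # "tag + keyword" pairs already recorded for this text
        for j in range(m, len(text)):
            tag = bits[j - m:j]                      # m bits before position j
            keyword = first_keyword(text[j:j + 4])   # from the 4 next chars
            if keyword and (tag, keyword) not in seen:
                seen.add((tag, keyword))
                index[tag][keyword].append(url)
    return index

pages = {"http://example.com/a": "中人中国a大人"}
index = build_index(pages, m=2)
for tag in sorted(index):
    print(tag, dict(index[tag]))
```

In this toy text the tag "11" occurs twice and is followed by two different keywords, so both are stored, whereas a one-to-one scheme would keep only the first.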

Secret Information Concealing.
In the process of secret information concealing, the secret message is first split into multiple keywords. All the keywords make up a Message-Keyword Set ($S_{MK}$). In the "tag + keyword" retrieval loop, a keyword whose retrieval fails may be further split into smaller keywords. Meanwhile, the set $S_{MK}$ is updated: the keyword that failed retrieval is replaced by smaller keywords with fewer Chinese characters. Then, the split keywords and the remaining keywords in $S_{MK}$ go through the retrieval loop one by one, and $S_{MK}$ is continually updated. The retrieval loop continues until no keyword can be split further and no keywords remain in $S_{MK}$.
The index file is used for each retrieval. For each successful retrieval in the loop, the retrieved "tag + keyword" and its corresponding web page links are recorded. When the loop is finished, all the recorded data sets are used to count how many keywords in $S_{MK}$ appear in every carrier text. The text with the highest number of keywords is chosen first, and two location marks are extracted for each of its keywords. All the location marks make up a Side-Information set ($S_{SI}$) and replace the polynomial coefficients for one application of polynomial encryption. Then, the set of encrypted marks $S_{EM}$ is generated, and the encrypted marks are attached as the confidential information to the end of the web page URL according to agreed rules. After all the processed keywords have been removed from $S_{MK}$, the text with the next highest number of remaining keywords is chosen, and the same processing steps are repeated. Finally, all the analogously forged URLs are sent to the receiver for secret information transmission. The detailed procedure of secret information concealing is described in Algorithm 2.
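
The carrier selection described above is a greedy choice: repeatedly take the text that covers the most remaining keywords. A minimal Python sketch follows. It assumes the retrieval loop has already produced, for each keyword, the set of URLs in which its "tag + keyword" was found. The name `choose_carriers` and the placeholder keys and URLs are hypothetical, and ties are broken alphabetically here, whereas Algorithm 2 selects a URL at random when a keyword appears alone.

```python
def choose_carriers(hits):
    """Greedily pick carrier URLs so that every retrieved keyword is covered.

    hits maps a keyword identifier to the set of URLs whose text contains
    that "tag + keyword". Returns a list of (url, covered_keywords) pairs.
    """
    remaining = {kw: set(urls) for kw, urls in hits.items() if urls}
    plan = []
    while remaining:
        # Count which of the remaining keywords each candidate text covers.
        coverage = {}
        for kw, urls in remaining.items():
            for url in urls:
                coverage.setdefault(url, []).append(kw)
        # Highest count first; alphabetical order breaks ties (assumption).
        best = max(sorted(coverage), key=lambda url: len(coverage[url]))
        covered = coverage[best]
        plan.append((best, covered))
        for kw in covered:
            del remaining[kw]
    return plan

hits = {
    "k0": {"u1", "u2"},
    "k1": {"u1"},
    "k2": {"u1", "u3"},
    "k3": {"u3"},
}
for url, keywords in choose_carriers(hits):
    print(url, keywords)
```

Each selected pair would then yield one side-information set, one polynomial encryption, and one forged URL.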

Secret Information Extraction.
The process of secret information extraction is the reverse of secret information concealing. The recipient first splits each received URL into two parts: the web page URL and the confidential information part. The carrier text is downloaded from the web page URL and preprocessed before keyword extraction. The confidential information is decrypted by Lagrangian interpolation to obtain the side-information. Then, the keyword(s) can be extracted from the carrier text with the side-information. The secret message can be restored accurately even if the sender did not send the web page links in order or the receiver received them out of order. The whole extraction process works as follows:
Step 1: according to the agreed protocol, a disordered tag-set $L = [l_0, l_1, l_2, \ldots, l_{2^m-1}]$ is generated first.
Step 2: when the receiver receives one or more URL messages, each URL is split into two parts: the web page link and the confidential information part. From the web page link, the corresponding carrier text is downloaded. From the confidential information part, the number of encrypted marks and the specific data for each mark can be parsed according to the agreed rules. All the encrypted marks calculated from one URL constitute a set $S_{EM}$, and the number of keywords hidden in the text of a web page can be determined from the number of elements in $S_{EM}$.

Input: web page text database $T = [t_0, t_1, t_2, \ldots, t_{n-1}]$ and length of tags m
Output: index file
1: Generate the tag-set according to the length of tags m. There are $2^m$ tags in total in a tag-set, each of which is a unique binary string of m bits.
2: Initialize an inverted index structure for each tag. The index table is a three-level tree structure, where the first-level index is for tags, the second-level index is for keywords, and the third-level index is for web page links.
3: for each text $t_i$, i = 0, 1, 2, ..., n − 1 do
4:     Apply the preprocessing operations to text $t_i$ in the web page text database to remove all non-Chinese characters (e.g., numbers, mathematical letters, punctuation marks), which generates the preprocessed text $t_i$ with only Chinese characters.
5:     Convert each Chinese character into 0 or 1 according to the parity of its Unicode encoding value. For each preprocessed text $t_i$, a binary string S is generated according to the parity of the character encoding.
6:     for j = m, m + 1, m + 2, ..., length(S) do
7:         Select the m-bit binary string as the tag, tag = S(j − m : j − 1).
8:         Select four characters in $t_i$ following the location index j.
9:         Split the 4 characters into different keywords.
10:        Choose the first keyword to be combined with the tag.
11:        if "tag + keyword" is unique then
12:            Add the "tag + keyword" information details (including the web page link of $t_i$) to the inverted index structure to expand the index file.
13:        end if
14:    end for
15: end for
ALGORITHM 1: Index file generation.
Step 3: according to the initial value, step size, known prime number, and $S_{EM}$, the polynomial is reconstructed by applying the Lagrangian interpolation algorithm. The calculated polynomial coefficients constitute the side-information set $S_{SI}$, which is the output of the decryption algorithm.
Step 4: obtain the preprocessed text by deleting all non-Chinese characters, and convert all Chinese characters into a binary string according to the parity of their Unicode encoding. The number of secret message keywords hidden in the text may be one or more. For each keyword retrieval, the corresponding tag value is selected according to the tag position index stored in $S_{SI}$. The binary string is traversed to search for the tag and generate a word-list that contains all the text words following the tag. Suppose that the tag is $l_i$ and its corresponding keyword is $k_j$. The location of $k_j$ in the word-list is known from the side-information and is used to extract the keyword.
Step 5: after all web page links are processed according to Steps 1-4, the extracted keywords are filled into the corresponding positions of the secret message according to the tag position index. Finally, the secret message is obtained by correctly arranging and combining all keywords.
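
Steps 4 and 5 can be sketched in Python as follows, starting from side-information that has already been decrypted. This is a hedged sketch rather than the authors' code: the side-information is assumed to be a flat list of (word-list position, tag position) pairs, `first_keyword` is a hypothetical two-character stand-in for the segmentation tool, Chinese characters are approximated by the range U+4E00 to U+9FFF, and reassembly assumes the message has at most $2^m$ keywords so that each tag position is used only once.

```python
def is_chinese(ch):
    # Assumption: basic CJK Unified Ideographs block only.
    return "\u4e00" <= ch <= "\u9fff"

def preprocess(raw_text):
    """Delete all non-Chinese characters."""
    return "".join(ch for ch in raw_text if is_chinese(ch))

def parity_bits(text):
    """Binary string from the parity of each character's code point."""
    return "".join(str(ord(ch) % 2) for ch in text)

def first_keyword(chars):
    # Hypothetical stand-in for the word segmentation tool.
    return chars[:2]

def words_after_tag(text, tag):
    """Word-list of all different words that follow the tag in the text."""
    bits = parity_bits(text)
    m = len(tag)
    words = []
    for j in range(m, len(text)):
        if bits[j - m:j] == tag:
            word = first_keyword(text[j:j + 4])
            if word and word not in words:
                words.append(word)
    return words

def extract_keywords(raw_text, side_info, tag_set):
    """Return (tag position, keyword) pairs hidden in one carrier text."""
    text = preprocess(raw_text)
    found = []
    for k in range(0, len(side_info), 2):
        word_pos, tag_pos = side_info[k], side_info[k + 1]
        word_list = words_after_tag(text, tag_set[tag_pos])
        found.append((tag_pos, word_list[word_pos]))
    return found

def reassemble(found):
    """Order keywords by tag position index and join them (Step 5)."""
    return "".join(word for _, word in sorted(found))

tag_set = ["10", "11", "00", "01"]   # disordered tag-set L for m = 2
carrier = "中人中国,大人"
found = extract_keywords(carrier, [1, 1, 0, 0], tag_set)
print(reassemble(found))
```

The marks in this example are deliberately listed out of order. Sorting by the tag position index restores the keyword order, which is why the forged URLs can be sent or received in any sequence.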

Experimental Results and Analysis
The testing database was constructed from 10.28 GB of web page texts, which were collected from the Internet by a crawler tool written in Python. The content of the web page texts covers various aspects, including ancient Chinese poems, novels, and news and knowledge introduction. A number of experiments are implemented in this paper to evaluate the performance of the proposed steganographic algorithm and to compare it with different existing algorithms in terms of hiding success rate and average hiding capacity, respectively.

Input: secret message and index file
Output: analogously forged URLs
1: Generate the disordered tag-set $L = [l_0, l_1, l_2, \ldots, l_{2^m-1}]$ according to the protocol negotiated by the communicating parties, where m is the length of the tag.
2: Apply preprocessing to the secret message to remove any non-Chinese characters, and then use the word segmentation tool to split the secret message into multiple keywords $S_{MK} = [k_0, k_1, k_2, \ldots, k_{num-1}]$, where num is the number of keywords in $S_{MK}$.
3: i = 0 and j = 0
4: while j < num do  ▷ This is the process of retrieval for each split keyword in $S_{MK}$.
5:     Select $l_i + k_j$ as the "tag + keyword" to retrieve in the index file.
6:     if successful retrieval then
7:         Return the set of carrier text URLs $U_j = [u_0, u_1, u_2, \ldots]$ corresponding to $l_i + k_j$
8:         j++
9:         i = mod(j, $2^m$)
10:    else if the retrieval of keyword $k_j$ fails in the tag $l_i$ branch of the index file and the number of characters in keyword $k_j$ is more than 1 then
11:        The keyword $k_j$ is further split into smaller keywords $k_{j1}, k_{j2}, \ldots$
12:        The set $S_{MK}$ is updated, where $k_j$ is replaced by $k_{j1}, k_{j2}, \ldots$
13:        num is recalculated, and let $k_j = k_{j1}$.
14:    else
15:        The retrieval for $k_j$ fails.
16:    end if
17: end while
18: The returned $U_j$ sets corresponding to all successfully retrieved keywords are used to count the occurrence frequency frq of keywords in every carrier text.
19: The text $t_x$ which has the highest value of frq is selected.
20: while frq ⩾ 2 do
21:    for each $k_j$ in $t_x$ with its corresponding tag $l_i$ do
22:        Record two location marks in the side-information set $S_{SI}$. One is the location of $k_j$ in the word-list that contains all the different words following tag $l_i$ in $t_x$. The other is the location of $l_i$ in L.
23:    end for
24:    Apply polynomial encryption to $S_{SI}$ to generate the encrypted marks $S_{EM}$.
25:    Forge an analogous URL with the confidential information in $S_{EM}$.
26:    After removing all the processed keywords from $S_{MK}$, select again the text $t_x$ with the highest value of the renewed frq.
27: end while
28: for each remaining keyword $k_j$ that appears in one or more texts with frq = 1 do
29:    Randomly select a URL in $U_j$ to obtain $t_x$.
30:    Record two location marks in $S_{SI}$.
31:    Apply polynomial encryption with $S_{SI}$ as input to generate $S_{EM}$.
32:    Forge an analogous URL with the confidential information.
33: end for
34: All the analogously forged URLs are sent to the receiver for secret information transmission.
ALGORITHM 2: Secret information hiding.
Security and Communication Networks

Selections of Parameters.
Using a step of 1 GB, we construct five testing databases with 1 GB, 2 GB, 3 GB, 4 GB, and 5 GB of web page data, with texts randomly selected from different topics. For each web page text database, the index file is generated by setting the tag length m = 4, 5, and 6, respectively. The sizes of the index files generated from the different databases with different tag length settings are shown in Table 1. As can be seen from the table, the index file size increases with both the tag length and the database size. Since the index file stores keywords and web page links, its size indirectly reflects the success rate of hiding. A larger index file contains more keywords and links, which makes successful retrieval for information hiding much more likely.
The secret messages for information hiding are collected and stored in four sets. The experiments for performance comparison are implemented on the four message sets (MS) named "MS-50-1," "MS-50-2," "MS-100-1," and "MS-100-2," respectively. The first number (50 or 100) in the name is the number of secret messages contained in the set. The second number (1 or 2) in the name means that the average size of the messages is 1 KB or 2 KB.

Success Rate of Hiding.
The success rate (SR) of hiding, also known as the retrieval success rate, is used to evaluate the quality of the extracted secret message. It is defined mathematically by (5):

$$\mathrm{SR} = \frac{\text{the number of successfully hidden messages}}{\text{the total number of secret messages}}. \tag{5}$$
In this paper, the previous "tag + keyword" method proposed by Chen et al. [20] is simulated and compared with our proposed method in different experimental settings. In terms of the effectiveness of information hiding, our proposed steganography algorithm differs in process design from the previous "tag + keyword" method in two ways. Firstly, in the index file generation process, multiple keywords can be selected for one tag in a text. Even if a relatively small web page text database is used, each tag in the index file can be associated with many more keywords or characters, which are enough for retrieving the keywords of one secret message. Therefore, each word in the text can be fully utilized, and the probability of a secret keyword being retrieved is increased. Secondly, a polynomial encryption algorithm is used to encrypt the location information and tag index, which are appended to the tail of the URLs for transmission. Using the location information and tag index as side-information solves the problem that sending and receiving data out of order may easily cause ambiguity in secret message extraction. Figure 5 shows the hiding SR of the two simulated methods with different cover text databases and secret message sets. Here, the length of location tag m is set to 6. It can be seen from Figure 5 that the hiding success rate of our proposed method approaches 100% for any size of web page text database. For both methods, the hiding success rate goes up as the size of the web page text database increases. Even when the smallest 1 GB database is selected for testing, our proposed method achieves a success rate above 95%, which is much higher than that of the method of Chen et al. [20]. In addition, the hiding success rate hardly changes when the size of the secret message increases from 1 KB to 2 KB and the number of messages increases from 50 to 100, which means that our proposed method is stable.
e performance of AHC for Chen et al.'s method [20] and our proposed method is illustrated in Figure 6. ere are four groups of test values, which show the comparison AHC values with experiments implemented on the four message sets: "MS-50-1," "MS-50-1," "MS-100-1," and "MS-100-2." Here, the length of location tag m is set to be 6.
It can be seen from Figure 6 that our proposed method has a higher hiding capacity under the same size of web page text database. is is due to the index file generated by our proposed method, which contains a large number of keywords with longer word-length (i.e., more characters). erefore, the possibility of the keyword, which will be split into smaller segments, is reduced. And fewer carrier texts can be used for information hiding of a whole secret message, which leads to an increase in the average hiding capacity.
In addition, in order to demonstrate the improvement of our proposed method, there are more "tag + keyword" methods [8,[19][20][21], which are simulated and implemented to evaluate the performance of AHC. Table 2 lists the comparison values of AHC with different "tag + keyword" methods applied to four sets of secret messages. Here, we use the 1 GB web page text database, and the length of location tag m is set to be 6. It can be seen from Table 2 that the average hidden capacity of our proposed method is significantly higher than the existing similar methods proposed in [8,[19][20][21].

Security Analysis.
In terms of security performance, the improvement of our proposed method is demonstrated in four aspects. Firstly, the transmitted data are URLs, which contain the links for downloading the cover texts and the side information for extracting the message keywords. During data transmission, the carrier texts are neither modified nor directly transferred to the receiver, which reduces the risk that the carrier text is attacked maliciously. Secondly, the randomness of tag selection in combination with the message keywords improves the security of our method. For example, the retrieved sets of web page links may differ when different tags are used to retrieve the same keywords, and the keywords selected in the same text by different tags also differ. Thirdly, the number of location-information and tag-index values encrypted in a single polynomial encryption with a specific key is dynamic. It is hard for attackers to guess the encryption process under different keyword splits and combinations. Finally, it is hard for the attacker to reconstruct the corresponding polynomial when the initial value, step size, and prime number are unknown, which means that decrypting the side information is hardly possible.
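To make the role of the initial value, step size, and prime number concrete, the following sketch shows one plausible reading of such a scheme: the side-information values act as polynomial coefficients, the polynomial is evaluated modulo a prime at points derived from the initial value and step size, and the receiver, who knows all three parameters, recovers the coefficients by Lagrange interpolation. All function names and parameter values are assumptions made for illustration; the construction actually used in this paper may differ, and the sketch makes no claim about cryptographic strength.

```python
def evaluate(coeffs, x, p):
    """Horner evaluation of sum(coeffs[i] * x**i) modulo p."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % p
    return acc


def sample_points(x0, step, count, p):
    """Evaluation points x0, x0 + step, ... (mod p); must be distinct."""
    return [(x0 + i * step) % p for i in range(count)]


def encrypt(values, x0, step, p):
    """Treat `values` (each < p) as coefficients and publish the evaluations."""
    xs = sample_points(x0, step, len(values), p)
    return [evaluate(values, x, p) for x in xs]


def mul_linear(poly, root, p):
    """Multiply poly(x) by (x - root) modulo p (low-to-high coefficients)."""
    out = [0] * (len(poly) + 1)
    for i, c in enumerate(poly):
        out[i] = (out[i] - root * c) % p
        out[i + 1] = (out[i + 1] + c) % p
    return out


def decrypt(shares, x0, step, p):
    """Recover the coefficients by Lagrange interpolation modulo p."""
    xs = sample_points(x0, step, len(shares), p)
    n = len(xs)
    coeffs = [0] * n
    for j in range(n):
        basis, denom = [1], 1
        for m in range(n):
            if m != j:
                basis = mul_linear(basis, xs[m], p)
                denom = (denom * (xs[j] - xs[m])) % p
        scale = shares[j] * pow(denom, -1, p) % p  # modular inverse, Python 3.8+
        for i, c in enumerate(basis):
            coeffs[i] = (coeffs[i] + scale * c) % p
    return coeffs


# Hypothetical side information (e.g., a tag index and keyword locations).
side_info = [3, 17, 5, 42]
cipher = encrypt(side_info, x0=7, step=11, p=257)
recovered = decrypt(cipher, x0=7, step=11, p=257)
```

The round trip succeeds only when the receiver uses the same initial value, step size, and prime as the sender; a party who guesses any of them incorrectly interpolates a different polynomial.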

Conclusion and Future Work
The existing "tag + keyword" coverless steganography methods have obvious drawbacks in text utilization rate, ambiguity of secret information extraction, and hiding capacity. In this paper, we propose a coverless steganography algorithm for Chinese text based on polynomial encryption. The novelty of our proposed method lies in two main points. Firstly, in the index file generation process, one tag corresponds to multiple keywords in a carrier text. Therefore, the effectiveness of information hiding improves in terms of full word utilization, and the probability of retrieving a secret keyword is increased. Secondly, a polynomial encryption algorithm is used to encrypt the location information and tag index, which are appended to the tail of the URLs for message transmission. Carrying the location information and tag index as side information resolves the ambiguity in secret message extraction that out-of-order sending and receiving would otherwise cause. The experimental results show that the secret message can be successfully extracted by sorting the extracted keywords according to the frequency and relevance of their combined tags. Compared with the previous "tag + keyword" based methods, our proposed method achieves a higher hiding success rate and a higher average hiding capacity in our experiments. In future research, there are two main issues to explore. One is to analyze the stability of the proposed data hiding scheme under more realistic settings, such as updates to the website cover texts. The other is to extend our proposed data hiding scheme from Chinese to other languages with a similar writing system, e.g., Japanese, Mongolic, and Korean languages. The basic principle of our proposed method can be applied to many other languages that share a similar keyword character feature, namely, that all sentences can be divided into words and the words consist of few characters (normally fewer than four).
Future studies could investigate the feasibility of the data hiding scheme for different languages.

Data Availability
The text data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.