Robust and Blind Audio Watermarking Scheme Based on Genetic Algorithm in Dual Transform Domain

In order to protect the copyright of audio media in cyberspace, a robust and blind audio watermarking scheme based on the genetic algorithm (GA) is proposed in a dual transform domain. A formula for calculating the embedding depth is developed, and two embedding depths with different values are used to represent the “1” and “0” states of the binary watermark, respectively. In the extracting process, the embedding depth in each audio fragment will be calculated and compared with the average embedding depth to determine the watermark bit by bit, so this scheme can blindly extract the watermark without the original audio. GA will be applied to optimize the algorithm parameters for meeting the performance requirements in different applications. Besides, the embedding rule is further optimized to enhance the transparency based on the principle of minimal modification to the audio. Experimental results prove that the payload capacity reaches 172.27 bps, the bit error rate (BER) is 0.1% under the premise that its transparency is higher than 25 dB, and its robustness is strong against many attacks. Significantly, this scheme can adaptively select the algorithm parameters to satisfy the specific performance requirements.


Introduction
In modern society, the development of network technology has greatly promoted the rapid dissemination of network resources. People can obtain network resources efficiently and conveniently, but they are also concerned about copyright infringement of these resources [1]. Therefore, how to protect the copyright of network resources has attracted the interest of many scholars. Encryption is a traditional technology for protecting information, but ciphertext cannot be spread as widely as plaintext [2]. Moreover, a hacker who fails to obtain the correct password may simply corrupt the ciphertext by force, so encryption alone may put information at risk. Information hiding technology is a novel way to secretly embed information into digital media that can be made public, for purposes such as protecting the copyright of the media, transmitting confidential information, and labelling information [3]. It conceals not only the information content but also the transmission behaviour, which reduces the possibility of attack because the hidden information escapes the human perception system. In short, encryption prevents unauthorized parties from obtaining the information, while information hiding conceals the act of transmission itself [4]. Information hiding technology therefore provides a novel mode for protecting information in cyberspace, and it has been widely used in copyright protection, secret communication, content anticounterfeiting, military intelligence, identity authentication, and so on [5]. Four main indicators are used to evaluate the performance of information hiding technology: transparency, robustness, security, and payload capacity. In practice these indicators conflict with one another, so researchers usually prioritize some of them according to the actual application requirements when developing a hiding scheme.
Steganography [6,7] and digital watermarking are the two main branches of information hiding technology. They both must have high transparency; otherwise, the quality of the carriers, which may be images [8], videos [9], audio [10,11], and so on, will be degraded by the embedded information. Steganography usually has a large payload capacity so that it can carry a large amount of confidential information, whereas digital watermarking pays more attention to robustness against external attacks.
Digital watermarking technology protects the copyright of the carrier by embedding a watermark that is difficult for the human perception system to sense [12]. While the carriers are in use, the watermarks hidden in them may be destroyed or lost due to signal processing operations, so a watermarking scheme must be robust enough to withstand those operations. In addition, schemes with blind extraction are more popular because they are more convenient in practical applications. In recent years, digital watermarking technology has been widely used in the field of information security, and many scholars have invested in research on watermarking schemes. The core issue is how to effectively improve the overall performance of a watermarking scheme in practical applications.
Audio media is one of the most common multimedia resources. There are a lot of online songs, conversations, and other audio types in cyberspace every day, and how to protect the copyright of these audio media has attracted the interest of many researchers. However, there are few research results on audio watermarking schemes, mainly due to the following difficulties. The human ear is very sensitive to the drop in audio quality caused by the presence of the watermark, which affects the normal use of audio media [13,14]. In addition, many audio editing tools can be used to modify audio signals conveniently, which can destroy the watermark hidden in the audio or cause it to be lost partially or totally. Many existing audio watermarking schemes have not overcome these difficulties well, so there are still many technical issues to be solved in improving their performance.

Related Works.
According to the embedding domain, audio watermarking schemes are mainly divided into two categories: time domain methods [15,16] and transform domain methods [17][18][19][20][21]. Watermarking schemes developed in the transform domain usually have strong robustness, which helps prevent the loss of the information hidden in audio carriers that may suffer from various attacks. There are many transform domain schemes, based on, for example, the discrete cosine transform (DCT) [17], the discrete wavelet transform (DWT) [18,19], the discrete Fourier transform (DFT) [20], and singular value decomposition (SVD) [21]. DCT concentrates most of the energy of an audio fragment in its low-frequency part, so many scholars use DCT to design audio watermarking schemes. Hu [17] proposed a high-capacity audio watermarking scheme that embedded the watermark into the low-frequency coefficients according to the masking characteristics of the human auditory system in the DCT domain, but its robustness against some conventional signal processing operations was not strong. DWT has multiresolution characteristics, so it is often used to analyse the main components of a signal in both the time domain and the frequency domain [22]. Huang [18] presented an adaptive watermarking scheme that modified the DWT coefficients using the signal-to-noise ratio (SNR) of the audio; this scheme had good transparency because its embedding formula was optimized by minimizing the difference between the original coefficient and the modified coefficient. Hsu [19] designed an audio watermarking scheme based on the two-stage Lagrange principle and minimum-energy scaling optimisation in the wavelet domain. This scheme also had good transparency, but its BER was high, indicating weak robustness against low-pass filtering, time-scaling, and resampling attacks. In [20], a stereo audio watermarking scheme was proposed in the DFT domain. This scheme calculated the similarity of the audio signals in the two stereo channels to develop the embedding and extracting rules. Although its payload capacity was low, it was strongly robust against attacks. In [21], a blind audio watermarking scheme based on SVD was proposed, which mixed the watermark with the diagonal matrix of singular values.
In recent years, many watermarking schemes have been proposed in multitransform domains, and they are usually more robust than schemes developed in a single transform domain. Lei [23] proposed an audio watermarking scheme in a dual transform domain consisting of SVD and DCT. This scheme arranged the audio signal into two-dimensional data blocks, applied SVD and DCT to them in turn, and finally embedded the watermark into the SVD-DCT coefficients with larger values. Some audio watermarking schemes based on DWT and SVD are proposed in [24,25]. Other approaches have also been proposed. Wang [26] proposed an audio watermarking scheme with blind extraction by using exponential modulation; this scheme extracted exponential distance feature parameters by mapping the audio into two-dimensional data and then embedded the watermark in these parameters. The above schemes have promoted the development of watermarking technology, but most audio watermarking schemes still have many shortcomings, such as poor transparency, which may degrade the audio carriers; weak robustness, which may lead to loss of the watermark; low security; nonblind extraction; and insufficient capacity. In addition, the parameters of most existing schemes are set by the designers according to their own experience, so those schemes cannot determine their parameters adaptively in different applications and therefore cannot reach their best performance.
In our work, a robust audio watermarking scheme is proposed by combining the energy concentration characteristics of DCT with the multiresolution characteristics of DWT. Firstly, DWT is applied to the audio carrier to choose the data in a specific frequency band, which can improve transparency. Secondly, DCT is applied to these data to concentrate energy in the low-frequency component that carries the watermark, which can improve robustness. In order to solve the problem of parameter setting, a genetic algorithm is used to optimize the algorithm parameters in different applications.

Contributions.
It can be seen from the above introduction that there are still many problems in audio watermarking that need further research. In this paper, a robust and blind audio watermarking scheme based on GA is proposed in the DWT-DCT dual transform domain. Our contributions are as follows:
(1) Our proposed scheme is developed in the DWT-DCT dual transform domain, and it is strongly robust, which prevents the loss of the watermark hidden in carried audio that may suffer from various attacks. Besides, the embedding rules are further optimized based on the principle of minimal modification to the audio to improve transparency.
(2) Our proposed scheme is blind when extracting the watermark. It employs two different embedding depths to represent the two states of the binary watermark. In the extracting process, the embedding depth in each audio fragment is calculated and compared with the average embedding depth to determine the watermark information bit by bit, so the extracting process does not need the original audio.
(3) Our proposed scheme uses GA to optimize the important parameters adaptively to meet the performance requirements of different applications, which brings out the optimal performance of the scheme. In many existing watermarking schemes, the algorithm parameters are set by the designer according to their own experience, which neither fully exploits the performance of the algorithm nor allows the parameters to be adjusted adaptively in different applications.
The remainder of this paper is organized as follows: the principle of the proposed scheme is described in Section 2, including chaotic encryption of the watermark, the principle of the embedding algorithm, the principle of the extracting algorithm, and optimisation of the parameters based on GA. In Section 3, the performance of the proposed scheme is evaluated in terms of transparency, payload capacity, and robustness, and the scheme is compared with some related schemes from recent years according to their experimental results. Finally, we summarize our work and introduce the future research focus in Section 4.

Principle of the Proposed Scheme
Due to the auditory masking effects, the human auditory system cannot effectively capture the extremely small changes from the frequency components in audio media, so watermarks can be embedded in audio media to protect the copyright of audio media. Figure 1 is the principle diagram of the watermarking scheme.
In the embedding process, the watermark is first encrypted and then embedded into the audio media using the proposed embedding algorithm. Finally, the carried audio with the watermark is uploaded to the Internet. When it is necessary to prove the copyright of the audio media, the corresponding extracting algorithm is applied to the carried audio to extract the encrypted watermark, which is then decrypted using the correct key. The embedding and extracting algorithms are the core of this scheme. The extracting process and decryption are symmetric with the embedding process and encryption, respectively, and only those who have the corresponding extracting algorithm and the correct key can obtain the watermark.

Chaotic Encryption of Watermark.
In order to enhance the security of the watermark, it is necessary to encrypt the watermark before embedding it into the audio carrier. Because chaotic signals are nonperiodic, have a continuous broadband spectrum, are noise-like, and are unpredictable in the long term, chaotic encryption has developed rapidly in recent years as an information security technology and is especially suitable for secure communications and related fields [27].
Assume that the watermark can be converted into a binary stream $W_1$ as follows:

$$W_1 = \{ w_1(q),\ 1 \le q \le L_w \}, \tag{1}$$

where $w_1(q) \in \{0, 1\}$, $L_w$ is the length of $W_1$, and $q$ is the serial number of the element in $W_1$. The logistic mapping equation is applied to generate a binary chaotic sequence $s(q)$ with the same size as $W_1$. The chaotic state evolves as

$$x_{q+1} = \alpha x_q (1 - x_q), \tag{2}$$

and $s(q)$ is obtained from $x_q$ by thresholding with $\delta$ (equation (3)), where $0 < x_q < 1$, $x_1 \in (0, 1)$ is the initial value when $q = 1$, and $\delta \in (0, 1)$ is the threshold. The logistic system is in chaos when $3.5699456 \le \alpha \le 4$. An exclusive OR operation is performed on $W_1$ and $s(q)$ to obtain the encrypted information $W_2$ shown in equation (4), where $\oplus$ represents the exclusive OR operator:

$$W_2 = \{ w_2(q) = w_1(q) \oplus s(q),\ 1 \le q \le L_w \}. \tag{4}$$

The triple key $Ch(x_1, \alpha, \delta)$ is the unique key that can be used to decrypt $W_2$.
In order to mark the start and end positions of $W_2$, a synchronization code should be added to $W_2$. For instance, "1111 1111 0000 0000" can be added in front of $W_2$ as the start flag and after $W_2$ as the end flag.
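The encryption and framing described above can be sketched in a few lines of Python. This is only a minimal illustration: the function names, the key values, and in particular the direction of the threshold test (`x >= delta` gives bit 1) are assumptions, since the thresholding rule of equation (3) is not reproduced in this text.

```python
SYNC = [1] * 8 + [0] * 8  # "1111 1111 0000 0000", the example flag from the text


def logistic_keystream(x1, alpha, delta, length):
    """Return `length` keystream bits s(q) from x_{q+1} = alpha * x_q * (1 - x_q)."""
    bits, x = [], x1
    for _ in range(length):
        bits.append(1 if x >= delta else 0)  # assumed threshold direction
        x = alpha * x * (1.0 - x)
    return bits


def encrypt(w1, key):
    """XOR the watermark bits with the keystream, then add start and end flags."""
    x1, alpha, delta = key
    s = logistic_keystream(x1, alpha, delta, len(w1))
    w2 = [a ^ b for a, b in zip(w1, s)]
    return SYNC + w2 + SYNC


def decrypt(framed, key):
    """Strip both flags and undo the XOR with the same triple key."""
    x1, alpha, delta = key
    w2 = framed[len(SYNC):len(framed) - len(SYNC)]
    s = logistic_keystream(x1, alpha, delta, len(w2))
    return [a ^ b for a, b in zip(w2, s)]


key = (0.3141, 3.99, 0.5)  # hypothetical triple key Ch(x1, alpha, delta)
watermark = [1, 0, 1, 1, 0, 0, 1, 0]
framed = encrypt(watermark, key)
recovered = decrypt(framed, key)
```

Because XOR is its own inverse, decryption regenerates the same keystream from the key and applies it again; a different key yields a different keystream and therefore a wrong watermark.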

Principle of the Embedding Algorithm.
Suppose that $A$ is the original audio with $L_a$ sample points:

$$A = \{ a(k),\ 1 \le k \le L_a \}, \tag{5}$$

where $k$ is the serial number of the element in $A$ and $a(k)$ represents the $k$th sample point. $A$ is evenly divided into $M$ audio fragments $A_l$ ($1 \le l \le M$, $M \ge L_w$), so the length of each audio fragment is $N = \mathrm{floor}(L_a / M)$, where floor( ) indicates that the value in brackets is rounded down. An $r$-level DWT is applied on $A_l$ to obtain one set of approximation coefficients $AC_r$ and $r$ sets of detail coefficients $DC_i$ ($i = 1, 2, \ldots, r$). $AC_r$ contains the main frequency components of $A_l$, and even minor changes in $AC_r$ may cause a serious decline in audio quality, so the watermark is usually not concealed in $AC_r$. $DC_i$ contains the higher frequency components of $A_l$, so the watermark can often be concealed in these frequency bands with less influence on audio quality. $DC_1$ represents the highest frequency band of the audio signal, and the degradation of audio quality caused by embedding the watermark in this band is usually hard for the human ear to perceive. However, since the high-frequency components are vulnerable to attack, the detail coefficients close to $AC_r$ are usually used to carry the watermark.
In the following description, the principle of the embedding algorithm is illustrated by taking the embedding of 1-bit binary information into the $r$-level detail coefficients $DC_r$ of $A_l$ as an example. $DC_r$ is divided into two data blocks: the former block $DC_r(j)$ ($j = 1, 2, \ldots, N/2^{r+1}$) and the latter block $DC_r(j + N/2^{r+1})$, where $j$ is the serial number of the element in $DC_r$. DCT is performed on these two data blocks to obtain two sets of transform domain coefficients (TDC), which are expressed as $TDC_1(r, j)$ and $TDC_2(r, j)$, respectively.
The low-frequency components of $TDC_1(r, j)$ and $TDC_2(r, j)$ can be used to carry the watermark because they contain most of the energy of the TDC, which is beneficial in improving the scheme's robustness. $M_{l1}$ and $M_{l2}$ are the average amplitudes of the low-frequency components of $TDC_1(r, j)$ and $TDC_2(r, j)$:

$$M_{l1} = \frac{1}{N/2^{r+2}} \sum_{p=1}^{N/2^{r+2}} \left| TDC_1(r, p) \right|, \tag{6}$$

$$M_{l2} = \frac{1}{N/2^{r+2}} \sum_{p=1}^{N/2^{r+2}} \left| TDC_2(r, p) \right|, \tag{7}$$

where $p = 1, 2, \ldots, N/2^{r+2}$, and $M_l$ is the average value of $M_{l1}$ and $M_{l2}$:

$$M_l = \frac{M_{l1} + M_{l2}}{2}. \tag{8}$$

In order to embed 1-bit binary information into the audio, $TDC_1(r, p)$ and $TDC_2(r, p)$ are modified by the rules in equations (9) and (10), where $\lambda$ is the embedding depth in the range $(0, 1)$ and $TDC'_1(r, p)$ and $TDC'_2(r, p)$ are the modified coefficients.
Here $M'_{l1}$ and $M'_{l2}$ are the average amplitudes of the modified coefficients $TDC'_1(r, p)$ and $TDC'_2(r, p)$, calculated in equations (11) and (12). The average value $M'_l$ of $M'_{l1}$ and $M'_{l2}$ is calculated in equation (13), and the variation $\Delta$ between $M'_{l1}$ and $M'_{l2}$ is expressed in equation (14). Thus, the embedding depth $\lambda$ of $A_l$ can be calculated according to equation (15).
Two embedding depths with different values can be used to represent the states "1" and "0" of 1-bit binary information, so the embedding rule in equation (16) is used when embedding the $q$th bit of the watermark into $A_l$, where $0 < \lambda_1 < \lambda_2 < 1$, $l = l_0 + q$, and $A_{l_0+1}$ is the first audio fragment to carry the watermark.
The rules in equations (9) and (10) are named the first rule, and the second rule is given in equations (17) and (18). The average amplitudes $M'_{l1}$ and $M'_{l2}$ of $C'_1(r, p)$ and $C'_2(r, p)$ are then calculated in equations (19) and (20) according to the second rule. When $M_{l1} \ge M_{l2}$, the variations $\Delta_1 = |M_{l1} - M'_{l1}|$ and $\Delta_2 = |M_{l2} - M'_{l2}|$ are shown in Figure 2(a) for the first rule and in Figure 2(b) for the second rule. Both $\Delta_1$ and $\Delta_2$ in Figure 2(a) are smaller than those in Figure 2(b), which indicates that the first rule is better than the second rule in improving the scheme's transparency when $M_{l1} \ge M_{l2}$. Similarly, the comparison of $\Delta_1$ and $\Delta_2$ in the two graphs of Figure 3 implies that the second rule should be chosen to embed the information bit when $M_{l1} < M_{l2}$.
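To make the role of the embedding depth concrete, the following Python sketch embeds a depth into two blocks of low-frequency coefficients and measures it again. It is a sketch under stated assumptions, not the authors' exact rule: equations (9), (10), and (15) are not reproduced in this text, so the code assumes that embedding rescales the two blocks so that their average amplitudes become $(1 \pm \lambda) M_l$, and that the depth is recovered as $|M'_{l1} - M'_{l2}| / (M'_{l1} + M'_{l2})$. The function names are hypothetical.

```python
def mean_abs(coeffs):
    """Average amplitude of a block of coefficients."""
    return sum(abs(c) for c in coeffs) / len(coeffs)


def embed_depth(block1, block2, lam):
    """Rescale two coefficient blocks so that their amplitude gap encodes lam.

    Assumed form: the block that is already larger is raised to (1 + lam) * M_l
    and the other is lowered to (1 - lam) * M_l, which mirrors the choice
    between the first and second rule (smaller modification).
    """
    m1, m2 = mean_abs(block1), mean_abs(block2)
    if m1 == 0.0 or m2 == 0.0:
        raise ValueError("a block with zero average amplitude cannot be rescaled")
    m = (m1 + m2) / 2.0
    if m1 >= m2:  # corresponds to the first rule
        target1, target2 = (1 + lam) * m, (1 - lam) * m
    else:         # corresponds to the second rule
        target1, target2 = (1 - lam) * m, (1 + lam) * m
    g1, g2 = target1 / m1, target2 / m2
    return [c * g1 for c in block1], [c * g2 for c in block2]


def measured_depth(block1, block2):
    """Assumed depth formula: normalized gap between the average amplitudes."""
    m1, m2 = mean_abs(block1), mean_abs(block2)
    return abs(m1 - m2) / (m1 + m2)


b1 = [4.0, -2.0, 3.0, -1.0]   # toy low-frequency coefficients, M_l1 = 2.5
b2 = [1.0, -1.5, 0.5, 1.0]    # toy low-frequency coefficients, M_l2 = 1.0
new1, new2 = embed_depth(b1, b2, 0.2)
depth = measured_depth(new1, new2)
```

Under these assumptions the mean amplitude $M_l$ of the two blocks is unchanged by embedding, and the measured depth depends only on a ratio of amplitudes, so it needs no reference to the original coefficients.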
Finally, the inverse discrete cosine transform (IDCT) and the inverse discrete wavelet transform (IDWT) are performed on the modified coefficients in turn to obtain the carried audio fragment $A'_l$. The flowchart of the embedding algorithm is shown in Figure 4. The embedding process can be described as follows:
Step 1: convert the watermark into a binary stream $W_1$ and encrypt it to obtain $W_2$.
Step 2: add the synchronization code at the beginning and end of $W_2$.
Step 3: divide the original audio $A$ into $M$ audio fragments $A_l$ ($1 \le l \le M$).
Step 4: apply DWT on $A_l$ to obtain the $r$-level detail coefficients $DC_r$.
Step 5: divide $DC_r$ to obtain the former block $DC_r(j)$ and the latter block $DC_r(j + N/2^{r+1})$.
Step 6: apply DCT on the two data blocks to obtain $TDC_1(r, j)$ and $TDC_2(r, j)$, respectively.
Step 9: apply IDCT and IDWT in turn to recover the carried audio fragment $A'_l$.
Step 10: repeat Step 4 to Step 7 until all watermark bits are concealed.
Step 11: recombine all audio fragments to recover the carried audio $A'$.
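The transform chain in Steps 4 to 6 and Step 9 can be sketched in pure Python as follows. The wavelet is not named in this text, so the Haar wavelet is used here purely as an assumption, and the DCT is an orthonormal DCT-II written out directly. The helper `process_fragment` and its `modify` callback are hypothetical names; the fragment length is assumed to be a multiple of $2^{r+2}$.

```python
import math


def haar_dwt(x):
    """One Haar DWT level: return (approximation, detail)."""
    s = math.sqrt(2.0)
    pairs = [(x[2 * i], x[2 * i + 1]) for i in range(len(x) // 2)]
    return [(a + b) / s for a, b in pairs], [(a - b) / s for a, b in pairs]


def haar_idwt(approx, detail):
    """Invert one Haar DWT level."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.extend(((a + d) / s, (a - d) / s))
    return out


def dct_basis(n, k, i):
    """Entry (k, i) of the orthonormal DCT-II matrix of size n."""
    scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
    return scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))


def dct(x):
    n = len(x)
    return [sum(x[i] * dct_basis(n, k, i) for i in range(n)) for k in range(n)]


def idct(c):
    n = len(c)  # the matrix is orthogonal, so the inverse is its transpose
    return [sum(c[k] * dct_basis(n, k, i) for k in range(n)) for i in range(n)]


def process_fragment(fragment, r, modify):
    """r-level DWT, split DC_r, DCT, modify low-frequency TDC, then invert."""
    approx, details = list(fragment), []
    for _ in range(r):
        approx, d = haar_dwt(approx)
        details.append(d)
    dc_r = details[-1]
    half = len(dc_r) // 2                    # block length N / 2^(r+1)
    tdc1, tdc2 = dct(dc_r[:half]), dct(dc_r[half:])
    low = half // 2                          # low-frequency part, N / 2^(r+2)
    tdc1[:low], tdc2[:low] = modify(tdc1[:low], tdc2[:low])
    details[-1] = idct(tdc1) + idct(tdc2)
    for d in reversed(details):              # IDWT from level r back to level 1
        approx = haar_idwt(approx, d)
    return approx


frag = [math.sin(0.3 * k) + 0.5 * math.cos(1.1 * k) for k in range(64)]
unchanged = process_fragment(frag, 2, lambda a, b: (a, b))
halved = process_fragment(frag, 2, lambda a, b: ([0.5 * v for v in a], [0.5 * v for v in b]))
```

With an identity `modify` the fragment is reconstructed up to rounding error, which is a quick check that the forward and inverse transforms are consistent before any embedding rule is plugged in.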

Principle of the Extracting Algorithm.
The extracting process is symmetric with the embedding process. When extracting the watermark from the carried audio $A'$, the embedding depth in each audio fragment is calculated according to equation (15), and the 1-bit binary information is then determined by comparing it with the overall average embedding depth $\lambda_0$, which is calculated according to equation (21). The extracting rule is expressed in equation (22). When all binary bits have been extracted from the audio fragments $A'_l$ ($1 \le l \le M$), the data between the start flag and the end flag is the encrypted $W'_2$. The synchronization code is removed, and $W'_2$ is then decrypted using the triple key $Ch(x_1, \alpha, \delta)$ to obtain the watermark. It can be seen from the principle of the extracting algorithm that this scheme has high security because only those who have the corresponding extracting algorithm and the correct key can access the watermark. The flowchart of the extracting algorithm is shown in Figure 5. The process of extracting the watermark can be described as follows:
Step 1: divide the carried audio $A'$ into $M$ audio fragments $A'_l$.
Step 2: apply DWT on $A'_l$ to obtain the $r$-level detail coefficients $DC'_r$.
Step 3: divide $DC'_r$ to obtain the former block $DC'_r(j)$ and the latter block $DC'_r(j + N/2^{r+1})$.
Step 7: repeat Step 2 to Step 6 until all $\lambda$ are calculated.
Step 10: remove the synchronization code and decrypt $W'_2$ to obtain the watermark.
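The blind decision step can be illustrated with a short Python sketch that takes the per-fragment depths as input. Two assumptions are made because equations (21) and (22) are not reproduced in this text: $\lambda_0$ is taken as the arithmetic mean of all measured depths, and a depth above $\lambda_0$ is read as bit "1" (matching $\lambda_1 < \lambda_2$). The function names and the depth values are hypothetical.

```python
def decide_bits(depths):
    """Compare each fragment's depth with the overall mean depth lambda_0."""
    lam0 = sum(depths) / len(depths)
    return [1 if lam > lam0 else 0 for lam in depths], lam0


def find_pattern(bits, pattern, start=0):
    """Index of the first occurrence of `pattern` at or after `start`, or -1."""
    n = len(pattern)
    for i in range(start, len(bits) - n + 1):
        if bits[i:i + n] == pattern:
            return i
    return -1


def payload_between_flags(bits, sync):
    """Return the bits between the start flag and the end flag, or None."""
    first = find_pattern(bits, sync)
    if first < 0:
        return None
    second = find_pattern(bits, sync, first + len(sync))
    if second < 0:
        return None
    return bits[first + len(sync):second]


sync = [1] * 8 + [0] * 8
hidden = sync + [1, 0, 0, 1, 1, 0] + sync
# Hypothetical measured depths: about 0.6 for a "1" fragment, 0.2 for a "0" fragment.
depths = [0.6 if b else 0.2 for b in hidden]
bits, lam0 = decide_bits(depths)
payload = payload_between_flags(bits, sync)
```

The threshold is computed from the carried audio alone, which is why no original audio is needed. The sketch ignores the corner case in which the encrypted payload itself happens to contain the synchronization pattern.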

Optimization of the Parameters Based on GA.
Three important parameters $(r, \lambda_1, \lambda_2)$ have an important effect on the overall performance of the scheme. To achieve the best performance in different applications, GA is used to search for the optimal parameters according to the specific performance indicators. The fitness function Fitness is constructed from transparency, payload capacity, and robustness, as given in equation (23). SNR and BER are expressed in equations (24) and (25), respectively, where $A$ and $A'$ represent the original audio and the carried audio, and $w(q)$ and $w'(q)$ represent the original watermark and the extracted watermark. Cap represents the payload capacity, and $SNR_0$ and $Cap_0$ are the thresholds of transparency and payload capacity.
The population POP consists of $C_1$ chromosomes, and the length of each chromosome is $C_2$. Chromosomes are encoded using a binary encoding approach, as shown in equation (26), where $Chrom_1$ represents the first chromosome; $Bin_1(r)$, $Bin_1(\lambda_1)$, and $Bin_1(\lambda_2)$ represent the binary codes of $r$, $\lambda_1$, and $\lambda_2$; and their lengths are $L_r$, $L_{\lambda_1}$, and $L_{\lambda_2}$, respectively. $r$ is obtained by converting $Bin_1(r)$ from binary to decimal. The transformation relationship between the chromosome and the two parameters $(\lambda_1, \lambda_2)$ is given in equations (27) and (28), where B2D[ ] means converting the data in brackets from binary to decimal and $\lambda_{01} < \lambda_{02} < \lambda_{03} < \lambda_{04} \in (0, 1)$. The detailed process can be described as follows:
Step 1: parameter initialization. Set the crossover probability $p_c$, the mutation probability $p_m$, $\lambda_{01}$, $\lambda_{02}$, $\lambda_{03}$, $\lambda_{04}$, $Cap_0$, and $SNR_0$, and then generate an initial population $POP_0$.
Step 2: calculate the three parameters $(r, \lambda_1, \lambda_2)$ and then execute the embedding algorithm in Section 2.2 to obtain the carried audio.
Step 3: attack test. Apply some attacks on the carried audio, and calculate SNR according to equation (24).
Step 4: choose all qualified chromosomes that meet $SNR > SNR_0$ and $Cap > Cap_0$. Then, execute the extracting algorithm to calculate BER according to equation (25).
Step 5: calculate the fitness value according to equation (23) to obtain the best chromosome with the largest fitness value.
Step 6: apply selection operation by roulette on the best chromosome to generate the transition population $POP'_0$.
Step 7: apply crossover operation on two adjacent chromosomes except for the best chromosome to obtain a new transition population $POP''_0$.
Step 8: apply mutation operation on each chromosome except for the best chromosome to obtain the next generation population $POP_1$.
Step 9: repeat Step 2 to Step 8 until the global optimal chromosome appears.
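The loop in Steps 1 to 9 can be sketched as a small elitist GA in Python. Several pieces are assumptions because equations (23) and (26) to (28) are not reproduced in this text: the chromosome layout and bit lengths, the linear mapping of $\lambda_1$ into $[\lambda_{01}, \lambda_{02}]$ and of $\lambda_2$ into $[\lambda_{03}, \lambda_{04}]$, the offset that keeps $r \ge 1$, and a fitness of $1 - BER$ for chromosomes that satisfy both thresholds. `toy_evaluate` is a made-up stand-in for the real embed, attack, and extract cycle and does not reflect measured results.

```python
import random

L_R, L_LAM = 2, 6                                     # assumed code lengths
LAM01, LAM02, LAM03, LAM04 = 0.05, 0.30, 0.40, 0.90   # hypothetical bounds


def b2d(bits):
    """Binary list to decimal integer."""
    return int("".join(map(str, bits)), 2)


def decode(chrom):
    """Assumed mapping from a chromosome to (r, lambda_1, lambda_2)."""
    full = 2 ** L_LAM - 1
    r = b2d(chrom[:L_R]) + 1
    lam1 = LAM01 + (LAM02 - LAM01) * b2d(chrom[L_R:L_R + L_LAM]) / full
    lam2 = LAM03 + (LAM04 - LAM03) * b2d(chrom[L_R + L_LAM:]) / full
    return r, lam1, lam2


def fitness(chrom, evaluate, snr0, cap0):
    """Assumed fitness: 1 - BER if both thresholds are met, otherwise 0."""
    snr, cap, ber = evaluate(*decode(chrom))
    return 1.0 - ber if snr > snr0 and cap > cap0 else 0.0


def evolve(evaluate, snr0, cap0, size=20, generations=40, pc=0.8, pm=0.02, seed=0):
    rng = random.Random(seed)
    n = L_R + 2 * L_LAM
    score = lambda c: fitness(c, evaluate, snr0, cap0)
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(size)]
    best = max(pop, key=score)
    for _ in range(generations):
        weights = [score(c) + 1e-9 for c in pop]
        # Roulette selection; one slot is reserved for the best chromosome.
        pop = [list(c) for c in rng.choices(pop, weights=weights, k=size - 1)]
        for i in range(0, len(pop) - 1, 2):           # crossover of adjacent pairs
            if rng.random() < pc:
                cut = rng.randrange(1, n)
                pop[i][cut:], pop[i + 1][cut:] = pop[i + 1][cut:], pop[i][cut:]
        for c in pop:                                 # bit-flip mutation
            for j in range(n):
                if rng.random() < pm:
                    c[j] ^= 1
        pop.append(list(best))                        # best chromosome kept unchanged
        best = max(pop, key=score)
    return decode(best), score(best)


def toy_evaluate(r, lam1, lam2):
    """Made-up (SNR, Cap, BER) model used only to exercise the loop."""
    return 40.0 - 20.0 * lam2, 172.27, max(0.0, 0.2 - 0.3 * (lam2 - lam1))


(r_opt, lam1_opt, lam2_opt), best_score = evolve(toy_evaluate, snr0=25.0, cap0=100.0)
```

Keeping the best chromosome out of crossover and mutation, as Steps 7 and 8 describe, guarantees that the best fitness never decreases from one generation to the next.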

Performance Evaluation
In this section, the performance of this scheme is evaluated, including transparency, security, capacity, robustness, and complexity. The detailed experimental parameters can be described as follows: (1)
[Figure 5: Flowchart of the extracting algorithm (core extracting process of information): DCT of the two detail-coefficient blocks, decision of the extracted binary bit as "1" or "0", removal of the synchronization code, and decryption.]
SNR, the subjective difference grade (SDG), and the objective difference grade (ODG) are used to evaluate the transparency of the proposed scheme. For SDG, the original audio and the carried audio are provided to the same group of listeners, who judge the difference and give a subjective score. The closer the average score is to 0, the better the audio quality. ODG is one of the output values of the perceptual evaluation of audio quality (PEAQ), so it can be used to give an objective score from −5 to 0 for the audio. BER is the ratio of the number of erroneous bits to the total number of bits in the extracted information, and it is used to evaluate the robustness of the scheme. The correlation coefficient (NC) measures the similarity between the original information and the extracted information, as defined in equation (29). It lies in the range [0, 1]. The larger the NC, the more similar the original information and the extracted information are, and the stronger the robustness of the scheme.
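The objective metrics can be computed with a few lines of Python. The SNR and BER functions follow the usual definitions that match the description above; the NC function assumes the common normalized-correlation form, since equation (29) is not reproduced in this text. The function names are hypothetical.

```python
import math


def snr_db(original, carried):
    """Signal-to-noise ratio in dB between original and carried audio samples."""
    signal = sum(a * a for a in original)
    noise = sum((a - b) ** 2 for a, b in zip(original, carried))
    return math.inf if noise == 0 else 10.0 * math.log10(signal / noise)


def ber(original_bits, extracted_bits):
    """Fraction of extracted bits that differ from the original bits."""
    errors = sum(a != b for a, b in zip(original_bits, extracted_bits))
    return errors / len(original_bits)


def nc(original_bits, extracted_bits):
    """Assumed form: normalized correlation of the two bit sequences."""
    num = sum(a * b for a, b in zip(original_bits, extracted_bits))
    den = math.sqrt(sum(a * a for a in original_bits)) * \
        math.sqrt(sum(b * b for b in extracted_bits))
    return num / den if den else 0.0


w = [1, 0, 1, 1]
w_extracted = [1, 1, 1, 1]
error_rate = ber(w, w_extracted)      # one wrong bit out of four
similarity = nc(w, w_extracted)
quality = snr_db([1.0, -1.0], [1.1, -1.0])
```

For binary sequences this NC stays within [0, 1] and equals 1 only when the two sequences have the same set of "1" positions.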

Transparency and Capacity.
To illustrate the transparency of this scheme, an audio clip lasting about 3 seconds was picked randomly from the tested audio signals to compare the waveform and spectrum before and after embedding information, as shown in Figure 7. The waveforms of the original audio are very similar to those of the carried audio, with no obvious difference between them, so this scheme has high transparency. In the case of no attack, the average experimental results are listed in Table 1. The results for SNR, SDG, ODG, BER, and NC in Table 1 confirm that this scheme has higher transparency and robustness than the schemes in [22,26]. At a capacity of 172.27 bps, the transparency of our scheme is slightly lower than that of [19], but the BER of our scheme is significantly better than that of [19], which shows that the robustness of our scheme is stronger.
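Since each audio fragment carries one bit, the payload capacity follows directly from the sampling rate and the fragment length. The sketch below is only a plausibility check: the sampling rate of 44.1 kHz and the fragment length of 256 samples are assumptions that are not stated in this section, chosen because they reproduce the reported figure.

```python
def payload_bps(sample_rate_hz, fragment_len, bits_per_fragment=1):
    """Bits embedded per second when each fragment carries a fixed number of bits."""
    return bits_per_fragment * sample_rate_hz / fragment_len


# Assumed values: 44.1 kHz audio and N = 256 samples per fragment.
capacity = round(payload_bps(44100, 256), 2)
print(capacity)  # 172.27
```

Halving the fragment length would double the capacity but leave fewer coefficients in each block for averaging, which is the kind of trade-off the GA search over the parameters is meant to balance.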

Robustness.
In this section, robustness is evaluated in three steps: first, various types of attacks are applied to the carried audio; then, the developed extracting algorithm is used to extract the watermark from the attacked audio; and finally, the BER and NC of the watermark are calculated. The attack types considered in our test are shown in Table 2. Figure 8 compares the waveforms of the original audio and the carried audio after both have been compressed by MP3 at 64 kbps (only a randomly chosen clip lasting about 3 seconds is shown). It can be seen from Figure 8 that there is no obvious abnormality in their waveforms. The waveform comparisons after other attacks are similar and are not shown here one by one. Table 3 shows the BER of the watermark extracted from the carried audio under the above attacks.
The extracted watermarks and their NC values are listed in Figure 9. According to the experimental results in Tables 1 and 3 and Figure 9, the payload capacity and SNR of this scheme reach 172.27 bps and 25 dB, respectively, which indicates that this scheme has a large capacity and high transparency, so it can be used to protect the copyright of audio media without affecting the audio quality itself. The extracted watermarks are very similar to the original watermark under most attacks: except for additive noise at 20 dB, the BER values are below 1.60% and the NC values are above 0.9237. This implies that the scheme has strong robustness, so the watermark will not be destroyed or lost due to conventional signal processing operations while the audio is in use. Compared with the schemes in the other references, this scheme has better overall performance. Although the transparency of our scheme is slightly lower than that in [19], our scheme is more robust against all the attacks listed in Table 2. Under the same capacity, this scheme is more robust than that in [22] against most signal processing operations. The proposed scheme also has a larger capacity and better robustness than that in [26]. The average BER values in [26] reach 2.87% and 17.92% under amplitude scaling, while those of the proposed scheme are only 0.19% and 0.34%. Amplitude scaling is the most common operation that audio media undergo in use, so the proposed scheme is more practical.
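Some of the simpler attacks can be simulated with short Python helpers, for example to drive the attack test step of the GA. This is a sketch under assumptions: the exact attack settings of Table 2 are not reproduced in this text, so the gain, target SNR, and bit depth below are illustrative values, and the function names are hypothetical.

```python
import math
import random


def amplitude_scale(audio, gain):
    """Multiply every sample by a constant gain."""
    return [gain * a for a in audio]


def add_noise(audio, snr_db, seed=0):
    """Add white Gaussian noise so that the result has roughly the given SNR."""
    rng = random.Random(seed)
    power = sum(a * a for a in audio) / len(audio)
    sigma = math.sqrt(power / 10 ** (snr_db / 10.0))
    return [a + rng.gauss(0.0, sigma) for a in audio]


def requantize(audio, bits):
    """Round samples in [-1, 1] to a signed grid with the given bit depth."""
    levels = 2 ** (bits - 1)
    return [round(a * levels) / levels for a in audio]


audio = [0.5 * math.sin(0.05 * k) for k in range(4000)]
louder = amplitude_scale(audio, 1.2)     # illustrative gain
noisy = add_noise(audio, 20.0)           # additive noise at about 20 dB
coarse = requantize(audio, 8)            # illustrative 8-bit requantizing
```

Each helper returns a new list, so the same carried audio can be passed through several attacks independently before the extracting algorithm is run on each result.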

Security and Complexity.
According to the Kerckhoffs criterion, the security of the scheme should not depend only on the scheme itself but should be further strengthened with encryption technology. Therefore, the size of the key space determines the security degree of the scheme. This scheme uses a triple key to encrypt the watermark chaotically. The three parameters of this triple key are all real-valued, so the scheme has an infinite key space in theory; in practice, the parameters are limited by the word length of the computer system, so the key space is finite. In addition, the comparison of the spectrograms in Figure 10 indicates that the characteristics of the carried audio change slightly because of the embedded watermark. The running time of the algorithm can be used to assess its complexity. In our scheme, GA is used to optimize the main parameters of the algorithm, so the overall running time also depends on the search efficiency of GA. The average embedding time of this algorithm is 35.4561 seconds, and the average extraction time is 11.6896 seconds.
From the above analysis, it can be seen that our proposed scheme has better performance, mainly because it combines the advantages of DWT and DCT to enhance robustness, optimizes the important parameters of the scheme by using GA, and adjusts the embedding rules to improve transparency based on the principle of least modification of the TDC. However, the complexity of this scheme is high because searching for the best algorithm parameters with GA takes additional time. In addition, the slight changes in the spectrogram may expose the watermark hidden in the audio media.

Conclusions
In this paper, a robust and blind audio watermarking scheme based on GA is proposed in the dual transform domain, which can be used to protect the copyright of audio media in cyberspace. The scheme is developed in the DWT-DCT dual transform domain, so it is strongly robust, which prevents the loss of the watermark hidden in carried audio that may suffer from various attacks. Furthermore, the scheme uses GA to optimize the important parameters adaptively to meet the performance requirements of different applications. Besides, the scheme adjusts the embedding rules based on the principle of minimal modification to the audio to improve transparency. When embedding the watermark, firstly, the original audio is divided into many audio fragments, and DWT is performed on each audio fragment to obtain one set of appropriate wavelet coefficients to carry the watermark. Secondly, those wavelet coefficients are divided into two data blocks, and DCT is applied to each block to obtain two groups of TDC. Finally, two different embedding depths are used to modify these two groups of TDC to embed the binary watermark according to the designed embedding equation. When extracting the watermark, the embedding depth of each audio fragment is first calculated and then compared with the overall average embedding depth to extract the binary watermark according to the designed extracting equation, so the scheme is blind because it can extract the watermark without the original audio.
Experimental results confirm that the proposed scheme can embed watermarks into audio media without affecting normal use of the audio and can detect them blindly. Compared with other schemes in the relevant references, it achieves excellent performance, such as strong robustness against MP3 compression, additive noise, low-pass filtering, requantizing, resampling, amplitude scaling, and echo jamming. The SNR of the carried audio exceeds 25 dB at a payload capacity of 172.27 bps, which indicates that the proposed scheme has good transparency and a large capacity. However, because of the genetic algorithm, the scheme has high complexity. In future work, we will strive to reduce the complexity and improve the security of the scheme.

Data Availability
All audio signals and images tested in our experiment can be used under the public platform.

Conflicts of Interest
All authors declare no conflicts of interest.