Audio Watermarking Algorithm with a Synchronization Mechanism Based on Spectrum Distribution

To address the difficulty of extracting the watermark when the carried audio is subjected to malicious attacks, an audio watermarking algorithm based on spectrum distribution is proposed. An eigenvalue is designed to represent the spectrum distribution of a specified frequency band, and the binary watermark is embedded by adjusting the difference of the eigenvalues between two adjacent frequency bands. The polarity of the embedding depth is determined by comparing the eigenvalues, which minimizes the modification of the audio and thus improves transparency. The binary watermark can be extracted blindly by judging the difference of the eigenvalues. The improved synchronization mechanism takes the small frame with the largest energy in the voiced frame as the synchronization mark to search for the embedding and extracting locations of the watermark, which makes the algorithm strongly robust. Experimental results show that the proposed algorithm has a large payload capacity, good transparency, low complexity, and strong robustness against most attacks.


Related Works.
The rapid development of the Internet and computer technology has made it very convenient for people to spread various multimedia resources over the network, but the copyright protection of these resources has become an increasing concern. Digital watermarking technology plays an important role in solving copyright protection problems [1][2][3]. It uses a specific embedding algorithm to conceal a watermark that can prove the author's copyright over a multimedia resource. When a copyright dispute occurs, the author extracts the watermark from the resource with the extracting algorithm corresponding to the embedding algorithm to prove ownership. Audio digital watermarking is a security protection technology that secretly embeds a watermark into audio without noticeably degrading audio quality [4], so as to achieve copyright tracking, integrity protection, content authentication, and recovery of advertising timing.
According to their application purposes, audio digital watermarking schemes can be divided into robust watermarking and fragile watermarking. Robust audio watermarking can still extract the watermark with a very small bit error rate (BER) when the carried audio is subjected to malicious attacks, so it is mainly used for audio copyright protection [5,6]. In fragile audio watermarking, the watermark is very sensitive to malicious attacks; such schemes can accurately locate, and even recover, the tampered regions of the audio, so they are mainly used for audio integrity protection [7,8]. Most robust watermarking algorithms are developed in the frequency domain, and the main signal processing tools are the discrete Fourier transform (DFT) [9], the discrete cosine transform (DCT) [10,11], and the discrete wavelet transform (DWT) [12][13][14][15]. Megias et al. [16] presented a blind audio watermarking algorithm in the DFT domain for overcoming synchronization attacks. The algorithm embedded the watermark in the frequency domain and marked the embedding location in the time domain. However, the time-domain synchronization mark was vulnerable to attacks, which could cause extraction to fail because the watermark's location could not be found. Tewari et al. [17] proposed an audio watermarking algorithm in the DCT domain. The algorithm selected the audio frames to be processed by DCT according to a minimum energy threshold, and it embedded the watermark by modifying and quantizing the average energy of the DCT coefficients. This algorithm was strongly robust to MP3 attacks, but its robustness to other attacks needed to be improved. By introducing the spectrum shaping technique of the autoregressive model into vector modulation, Hu and Hsu [18] proposed a blind audio watermarking algorithm in the DWT domain.
The embedding depth of the algorithm was consistent with the auditory masking threshold, and the algorithm had a high embedding capacity, but its robustness was poor. To extract the watermark blindly, many researchers use quantization index modulation (QIM) to design audio watermarking algorithms. Hwang et al. [2] presented an audio watermarking algorithm based on QIM and singular value decomposition (SVD). The algorithm applied SVD directly to the stereo signal and embedded the watermark according to the ratio of the singular values. It was robust against signal processing operations such as amplitude scaling, compression, and resampling.
To improve robustness, many researchers design watermarking algorithms in a hybrid domain. Merrad and Saadi [19] proposed a robust audio watermarking technique in the hybrid DWT-DCT domain that exploits the strong correlation between two consecutive samples. The algorithm was robust against random cropping, echo addition, amplitude scaling, and so on. Q. L. Wu and M. Wu [20] proposed an audio watermarking algorithm in the DWT-DCT domain. This algorithm embedded a binary watermark into audio by modifying the average amplitude of the hybrid transformed coefficients, exploiting the redundancy of the human auditory system. According to the experimental results, it was robust against conventional signal processing attacks. However, it could not resist synchronization attacks, such as jittering and time scale modification (TSM), because it lacked a synchronization mechanism. In short, with the continuous development of audio watermarking research, researchers have developed more and more watermarking algorithms with different performance. Most of them are robust against conventional signal processing operations, but their resistance to synchronization attacks needs to be improved, mainly because they lack effective synchronization mechanisms. A synchronization attack seriously damages the structure of the audio data, which makes it difficult to determine the exact location of the watermark in the audio and causes extraction to fail. Therefore, overcoming synchronization attacks has become a very challenging problem in watermarking research. The most common synchronization attacks are random clipping, jittering, TSM, and pitch shifting modification (PSM). TSM adjusts the playing time of the audio by changing its playing speed, that is, the overall playing time is compressed or expanded, while the sample rate is almost unchanged.
PSM usually modifies the pitch of the audio without changing its playback speed. Jittering means deleting or adding one sample every several samples in the audio. Random clipping refers to randomly cutting out several samples in different parts of the audio. Jiang et al. [21] proposed an audio watermarking algorithm that indirectly realizes synchronization by using the audio frame sequence number, based on the global characteristics of audio frames. This scheme is robust against most signal processing operations and some synchronization attacks. Liu et al. [22] proposed an audio watermarking algorithm that resists synchronization attacks. The algorithm constructed a logarithmic mean feature from the frequency-domain coefficients and then used the residuals of two sets of features to design the watermarking algorithm. Its robustness against signal processing operations and some synchronization attacks was better than that of most existing watermarking algorithms. Hu et al. [23] proposed an audio watermarking algorithm with a synchronization mechanism based on the lifting wavelet transform. The algorithm sorted and reconstructed the approximation coefficients to design the embedding and extracting rules according to the expected bit rate of the watermark. It was strongly robust to synchronization attacks, but its transparency was poor. Wu et al. [24] proposed an audio watermarking algorithm with an implicit synchronization mechanism based on SVD and the genetic algorithm (GA). The synchronization mechanism of this algorithm took the sampling point with the largest amplitude in the voiced frame as the synchronization mark to track the location of the watermark in the audio, and it achieved good results in resisting synchronization attacks.
However, because this synchronization mechanism used a single sampling point as the synchronization mark, the watermark region could be located inaccurately: after the audio was attacked, that sampling point might no longer have the largest amplitude. Existing synchronization mechanisms mainly include exhaustive search, explicit synchronization [25], implicit synchronization, autocorrelation, and constant watermark. Each has its own advantages and disadvantages, and it must be coordinated with the specific watermark embedding and extracting algorithm to bring out the best performance of the algorithm.

Contributions.
The above review of audio watermarking algorithms shows that, provided good audio quality is ensured, robustness is an important performance measure of an algorithm. After analyzing the characteristics of audio in the time domain and the frequency domain, this paper proposes a robust watermarking algorithm based on spectrum distribution and improves the synchronization mechanism in reference [24]. The main contributions are as follows: (1) A blind audio watermarking algorithm based on spectrum distribution is proposed. An eigenvalue that reflects the spectrum distribution of the audio in a narrow frequency band is constructed, and the difference of the eigenvalues between two adjacent frequency bands is modified by adjusting the DCT coefficients. The embedding and extracting algorithms are developed according to the correspondence between the binary watermark and the difference of the eigenvalues. The proposed algorithm has good transparency, strong robustness, and blind extracting ability. To obtain good transparency, the polarity of the embedding depth is determined by comparing the eigenvalues of the two frequency bands, which minimizes the modification of the DCT coefficients and thus preserves audio quality. Since the spectrum distribution of the carried audio does not change much under attack, embedding the watermark in an eigenvalue related to the spectrum distribution makes the algorithm more robust. In addition, the algorithm supports blind extraction, which facilitates its practical application.
(2) The synchronization mechanism in reference [24] is improved. Since most of the audio content is concentrated in voiced frames, the synchronization mechanism in reference [24] took the single sampling point with the largest amplitude in the voiced frame as the synchronization mark. This method was sometimes inaccurate in determining the region where the watermark was located, mainly because, after the audio was attacked, the sampling point with the largest amplitude might no longer be the original one. The improved synchronization mechanism divides the voiced frame into many small frames, uses the small frame with the maximum energy as the synchronization mark, and selects a certain number of consecutive small frames around it to form the embedding region.
The remainder of this paper is organized as follows: Section 1 reviews related work on audio watermarking algorithms in recent years and introduces our contributions. Section 2 describes the improved synchronization mechanism and its implementation steps in detail. Section 3 elaborates the principle of the proposed audio watermarking algorithm in two parts: the principle of embedding the watermark and the principle of extracting it. Section 4 sets out the implementation steps of the embedding algorithm and the extracting algorithm in detail. Section 5 assesses the performance of the proposed algorithm and compares it with three related algorithms. Finally, Section 6 draws conclusions and outlines future research.

Synchronization Mechanism
Synchronization attacks seriously damage the structure of the audio data, leaving the extracting algorithm unable to accurately find the location of the watermark in the audio, so they are a very challenging type of attack [26,27]. Therefore, it is particularly important to design a synchronization mechanism that can accurately search for the location of the watermark in the audio. Here, the synchronization mechanism proposed in reference [24] is improved to achieve better performance. The improved method takes the small frame with the largest energy in a voiced frame as the synchronization mark and uses the audio data in a fixed region around this mark as the carrier of the watermark. The specific steps are described as follows:
Step 1: Convert the watermark $W = \{w(q),\ 1 \le q \le L_w\}$ into a matrix with $L_1$ rows and $L_2$ columns, where $L_w = L_1 \times L_2$ is the length of the watermark and $w(q) \in \{0, 1\}$ is the bit value.
Step 2: Divide the audio into $L_1$ fragments and, from each fragment, select the voiced frame with the largest energy to carry the watermark; the length of each voiced frame is $N_1$.
Step 3: Divide each voiced frame into small frames of $N_2$ sample points and calculate the energy of each small frame.
Step 4: Take the small frame with the largest energy as the synchronization mark and select $(a_1 + a_2 + 1)$ small frames around it as the region used to carry the watermark.
According to the position of the synchronization mark in the voiced frame, there are three cases:
(1) If fewer than $a_1$ small frames precede the synchronization mark, the mark is close to the head of the voiced frame, and the watermark region consists of the first $(a_1 + a_2 + 1)$ consecutive small frames.
(2) If fewer than $a_2$ small frames follow the synchronization mark, the mark is close to the end of the voiced frame, and the watermark region consists of the last $(a_1 + a_2 + 1)$ consecutive small frames.
(3) Otherwise, the watermark region takes the synchronization mark as the reference and includes $a_1$ small frames before it and $a_2$ small frames after it.
After the above steps, the selected audio segment of $(a_1 + a_2 + 1)$ consecutive small frames is used to carry the watermark.
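The region selection in Steps 2 to 4 can be sketched in Python as follows. This is a minimal illustration under assumptions: frames are non-overlapping, a trailing partial frame is ignored, energy is the sum of squared samples, and `loudest_frame_start` is a hypothetical stand-in for voiced-frame selection, since the text does not say how voiced frames are detected. Indices are 0-based, so the index of the synchronization mark equals the number of small frames before it.

```python
from typing import List, Sequence, Tuple


def frame_energies(signal: Sequence[float], frame_len: int) -> List[float]:
    """Energy (sum of squares) of each non-overlapping frame; a trailing partial frame is dropped."""
    count = len(signal) // frame_len
    return [sum(s * s for s in signal[j * frame_len:(j + 1) * frame_len]) for j in range(count)]


def loudest_frame_start(fragment: Sequence[float], n1: int) -> int:
    """Hypothetical stand-in for Step 2: start sample of the highest-energy n1-sample frame."""
    energies = frame_energies(fragment, n1)
    return n1 * max(range(len(energies)), key=energies.__getitem__)


def embedding_region(voiced: Sequence[float], n2: int, a1: int, a2: int) -> Tuple[int, int]:
    """Steps 3-4: inclusive (first, last) small-frame indices of the (a1 + a2 + 1)-frame region."""
    energies = frame_energies(voiced, n2)
    total = a1 + a2 + 1
    if total > len(energies):
        raise ValueError("voiced frame holds fewer than a1 + a2 + 1 small frames")
    mark = max(range(len(energies)), key=energies.__getitem__)  # synchronization mark
    if mark < a1:                          # case (1): fewer than a1 small frames before the mark
        first = 0
    elif len(energies) - 1 - mark < a2:    # case (2): fewer than a2 small frames after the mark
        first = len(energies) - total
    else:                                  # case (3): a1 frames before and a2 frames after the mark
        first = mark - a1
    return first, first + total - 1


# Usage: small frame 3 is the loudest, so case (3) applies and the region is frames 1..5.
voiced = [0.1] * 100
voiced[30:40] = [1.0] * 10
region = embedding_region(voiced, n2=10, a1=2, a2=2)
```

The returned frame indices map to samples `first * n2` through `(last + 1) * n2 - 1` of the voiced frame, which is the segment $x_c(n_1)$ handed to the DCT.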

Principle of the Watermarking Algorithm
In this section, an audio watermarking algorithm with a synchronization mechanism is developed. The proposed algorithm uses the improved synchronization mechanism to identify the embedding and extracting locations of the watermark, and its embedding and extracting rules are based on spectrum distribution. It not only resists conventional signal processing attacks but also performs well against synchronization attacks.

Principle of Embedding Watermarks.
DCT has strong energy compaction and good decorrelation properties, so it has been widely used in image and audio signal processing. Suppose that $x(n)$ is the original audio with $N$ sample points, expressed as
$$x(n) = a_n, \quad 1 \le n \le N, \qquad (1)$$
where $a_n$ is the amplitude of the $n$th sample point. Divide $x(n)$ into $L_1$ audio fragments and, in each fragment, obtain the voiced frame with the largest energy, which has $N_1$ sample points. Use the improved synchronization mechanism in Section 2 to obtain the audio data $x_c(n_1) = a_{n_1}$, $1 \le n_1 \le N_2 \times (a_1 + a_2 + 1)$, for carrying the watermark, where $n_1$ is the position number of a sample in the embedding region. Apply DCT to $x_c(n_1)$, as shown in formulas (2) and (3).
Here $X_c(0)$ is the 0 Hz (DC) component and $X_c(k)$ is the $k$th harmonic component. The frequency $f_k$ of each harmonic component can be calculated by formula (4), where $f_s$ is the sampling frequency. The spectrum of the audio describes the proportion of each harmonic component in the audio, which depends on the frequency and amplitude of the harmonics. Divide $X_c(k)$ into $L$ fragments to obtain the frequency bands $X_l(i)$, each containing $N_3$ spectral lines. The spectrum distribution function (SD) in formula (5) is designed to represent the spectrum distribution of the frequency band $X_l(i)$, where $f_l$ is the initial frequency of the $l$th frequency band, $f_i$ is the frequency of the $i$th harmonic component, and $i = k - lN_3$. After the audio is attacked, assume that the variation of the DCT coefficient is $\Delta_k$; the new spectrum distribution $SD_l'$ is then given by formula (6). Since $\Delta_k$ is very small and $N_2 \times (a_1 + a_2 + 1)$ is large, SD changes only slightly when the audio is attacked. The following experiment verifies this deduction, with parameters $N_2 = 20$, $a_1 = 555$, $a_2 = 1500$, $f_s = 44.1$ kHz, and $N_3 = 200$. Four attacks are applied to the test audio: amplitude scaling, noise corruption with 20 dB, low-pass filtering with 4 kHz, and TSM with +5%. The four curves under attack in Figure 1 are basically consistent with the original curve, which indicates that SD reflects the stability of the audio spectrum structure after attacks.
The experimental results are also consistent with the deduction from formula (6). Since the spectrum structure remains stable after the audio is attacked, the watermarking algorithm can be developed on the basis of this feature.
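Formulas (2) to (4) are not reproduced here, so the sketch below assumes the unnormalized DCT-II, for which basis function $k$ of an $M$-sample block oscillates at $f_k = k f_s / (2M)$; this is an assumed form of formula (4), not a confirmed reproduction. The helper `band_bounds` splits the spectral lines into bands of $N_3$ lines starting from line $b_0$ (the text's index $i = k - lN_3$ corresponds to $b_0 = 0$). The direct $O(M^2)$ transform is for illustration only; a fast DCT would be used on a real 41120-sample region.

```python
import math
from typing import List, Sequence, Tuple


def dct_ii(x: Sequence[float]) -> List[float]:
    """Unnormalized DCT-II: X[k] = sum_n x[n] * cos(pi * k * (2n + 1) / (2M)).

    Direct O(M^2) evaluation, for illustration only.
    """
    m = len(x)
    return [
        sum(x[n] * math.cos(math.pi * k * (2 * n + 1) / (2 * m)) for n in range(m))
        for k in range(m)
    ]


def harmonic_frequency(k: int, m: int, fs: float) -> float:
    """Frequency (Hz) of DCT-II basis k for an m-sample block (assumed form of formula (4))."""
    return k * fs / (2 * m)


def band_bounds(b0: int, n3: int, count: int) -> List[Tuple[int, int]]:
    """Half-open [start, stop) spectral-line ranges of `count` consecutive n3-line bands from line b0."""
    return [(b0 + band * n3, b0 + (band + 1) * n3) for band in range(count)]


# Parameters of the SD stability experiment: the region holds M = N2 * (a1 + a2 + 1) samples.
N2, A1, A2, N3, FS = 20, 555, 1500, 200, 44100.0
M = N2 * (A1 + A2 + 1)                          # 41120 samples, hence 41120 spectral lines
BAND_WIDTH_HZ = harmonic_frequency(N3, M, FS)   # frequency span of one N3-line band, about 107.25 Hz
```

Under this assumption, each 200-line band in the experiment covers roughly 107 Hz, so the $L$ bands examined in Figure 1 form a contiguous frequency range starting near $b_0 f_s / (2M)$.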
According to formula (5), the spectrum distributions of $X_l(i)$ and $X_{l+1}(i)$ are calculated as $SD_l$ and $SD_{l+1}$, and their average value is
$$SD_M = \frac{SD_l + SD_{l+1}}{2}. \qquad (7)$$
The modified coefficients $X_l'(i)$ and $X_{l+1}'(i)$ are given by formulas (8) and (9).
According to formula (5), the spectrum distribution of $X_l'(i)$ is calculated as $SD_l'$, as shown in formula (10). Similarly, the spectrum distribution of $X_{l+1}'(i)$ is $SD_{l+1}' = (1 - \lambda_1) SD_M$. The difference of the spectrum distributions in the two adjacent frequency bands is described by formula (11), where $\lambda_1$ is the embedding depth and $\lambda_1 \in (-1, 1)$; $\lambda_1$ is set according to formula (12). To prevent serious degradation of audio quality caused by excessive modification of the DCT coefficients, a threshold $\lambda_2$ is used to judge whether the watermark can be embedded in two adjacent frequency bands. If $|SD_l - SD_{l+1}| > \lambda_2 SD_M$, the two frequency bands are set as invalid and no watermark bit is embedded in them. Otherwise, the DCT coefficients are modified according to the embedding rules.
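The transparency argument can be made explicit at the level of SD values. Write $D = SD_l - SD_{l+1}$ and take $SD_l' = (1 + \lambda_1) SD_M$ as the counterpart of the stated $SD_{l+1}' = (1 - \lambda_1) SD_M$ (an assumption, since formula (10) is not reproduced). Using formula (7), the change in each band is

$$SD_l' - SD_l = \lambda_1 SD_M - \frac{D}{2}, \qquad SD_{l+1}' - SD_{l+1} = \frac{D}{2} - \lambda_1 SD_M,$$

and the resulting difference is

$$SD_l' - SD_{l+1}' = 2\lambda_1 SD_M.$$

Both bands therefore move by $\left|\lambda_1 SD_M - D/2\right|$. For a fixed $|\lambda_1|$ and $SD_M > 0$, this equals $\left|\,|\lambda_1| SD_M - |D|/2\,\right|$ when $\lambda_1$ has the same sign as $D$, and $|\lambda_1| SD_M + |D|/2$ otherwise, so choosing the polarity of $\lambda_1$ by comparing $SD_l$ with $SD_{l+1}$ gives the smaller change. For a valid pair, $|D| \le \lambda_2 SD_M$, which bounds the change in either band by $(|\lambda_1| + \lambda_2/2)\,SD_M$.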

Principle of Extracting Watermarks.
In the embedding process, each binary watermark bit is embedded into two adjacent frequency bands by adjusting the DCT coefficients. Therefore, the extracting rule recovers the binary watermark bit $w'(q)$ from the difference between the spectrum distributions of two adjacent bands, as expressed in formula (13).

Procedure for the Embedding Watermark.
Figure 2 shows the embedding diagram, and the detailed embedding steps are described as follows:
Step 1: Convert the watermark $w(q)$ into a binary matrix of size $(L_1 \times L_2)$.
Step 2: Divide the original audio into $L_1$ fragments. In each fragment, select the voiced frame with the largest energy to carry the watermark; the length of the voiced frame is $N_1$.
Step 3: Divide each voiced frame into small frames of $N_2$ sample points and select the small frame with the largest energy as the synchronization mark.
Step 4: Select $(a_1 + a_2 + 1)$ consecutive small frames around the synchronization mark as the embedding location of the watermark (see Step 4 in Section 2 for details).
Step 5: Apply DCT to $x_c(n_1)$ to obtain the DCT coefficients $X_c(k)$.
Step 6: Starting from spectral line $b_0$, select $L$ frequency bands from $X_c(k)$, each with $N_3$ spectral lines.
Step 7: Calculate $SD_l$, $SD_{l+1}$, and $SD_M$ of two adjacent frequency bands.
Step 8: If $|SD_l - SD_{l+1}| > \lambda_2 SD_M$, set these two frequency bands as invalid and go to Step 7 for the next two bands; otherwise, go to Step 9.
Step 9: Embed 1 watermark bit into the two frequency bands by modifying their DCT coefficients according to the embedding rules.
Step 10: Repeat Steps 7 to 9 until $L_2$ watermark bits are embedded into the voiced frame.
Step 12: Repeat Steps 3 to 11 until all $L_1$ rows of the binary watermark are embedded into the original audio.
Step 13: Reconstruct all voiced frames carrying the binary watermark to obtain the carried audio $x'(n)$.
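Steps 7 to 10 can be sketched at the level of SD values. This is a hypothetical illustration, not the paper's implementation: it does not modify DCT coefficients (formulas (8), (9), and (12) are not reproduced), it assumes non-overlapping band pairs (0, 1), (2, 3), ..., and it assumes bit 1 maps to $\lambda_1 = +d$ and bit 0 to $\lambda_1 = -d$ for a depth $d \in (0, 1)$; the paper's polarity rule may differ.

```python
from typing import List, Sequence


def embed_bits_in_bands(sd: Sequence[float], bits: Sequence[int],
                        depth: float, lambda2: float) -> List[float]:
    """Return target SD values after embedding `bits` into valid adjacent band pairs."""
    if not 0.0 < depth < 1.0:
        raise ValueError("embedding depth must lie in (0, 1)")
    targets = list(sd)
    pending = list(bits)
    for band in range(0, len(sd) - 1, 2):              # assumed non-overlapping pairs (0,1), (2,3), ...
        if not pending:
            break                                        # Step 10: all bits of this row are embedded
        sd_m = (sd[band] + sd[band + 1]) / 2.0           # formula (7)
        if abs(sd[band] - sd[band + 1]) > lambda2 * sd_m:
            continue                                     # Step 8: invalid pair, left unmodified
        lam1 = depth if pending.pop(0) == 1 else -depth  # hypothetical bit-to-polarity mapping
        targets[band] = (1.0 + lam1) * sd_m              # SD_l'
        targets[band + 1] = (1.0 - lam1) * sd_m          # SD_{l+1}' = (1 - lambda1) * SD_M
    if pending:
        raise ValueError(f"{len(pending)} bit(s) did not fit: too few valid band pairs")
    return targets


print(embed_bits_in_bands([1.0, 1.2, 1.0, 5.0, 2.0, 2.0], [1, 0], depth=0.1, lambda2=0.5))
```

In this usage, the pair (2, 3) is skipped as invalid because $|1.0 - 5.0|$ exceeds $0.5 \times 3.0$, so the second bit lands in the pair (4, 5).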
To improve the robustness of the algorithm, several voiced frames can be selected in each audio fragment to carry the same row of the binary watermark repeatedly; for example, the three voiced frames with the largest energy in the same audio fragment can be selected to carry the watermark.

Procedure for the Extracting Watermark.
Figure 3 shows the extracting diagram, and the detailed extracting steps are described as follows:
Step 1: Divide the carried audio $x'(n)$ into $L_1$ fragments. In each fragment, select the voiced frame with the largest energy to extract the watermark from; the length of the voiced frame is $N_1$.
Step 2: Divide each voiced frame into small frames of $N_2$ sample points and select the small frame with the largest energy as the synchronization mark.
Step 3: Select $(a_1 + a_2 + 1)$ consecutive small frames around the synchronization mark as the extracting region of the watermark (see Step 4 in Section 2 for details).
Step 4: Apply DCT to $x_c'(n_1)$ to obtain the DCT coefficients $X_c'(k)$.
Step 5: Starting from spectral line $b_0$, select $L$ frequency bands from $X_c'(k)$, each with $N_3$ spectral lines.
Step 6: Calculate $SD_l'$, $SD_{l+1}'$, and $SD_M$ of two adjacent frequency bands.
Step 8: Extract 1 watermark bit from these two frequency bands according to formula (13).
Step 9: Repeat Steps 6 to 8 until $L_2$ watermark bits are extracted from the voiced frame.
Step 10: Repeat Steps 2 to 9 until the entire binary watermark is extracted from the carried audio $x'(n)$.
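A matching SD-level sketch of Steps 6 to 9 follows, together with a combiner for the repeated copies carried by several voiced frames. Both the decision rule (a positive $SD_l' - SD_{l+1}'$ decodes as 1) and the invalid-pair test at extraction time are assumptions, since formula (13) and Step 7 are not reproduced; that test only separates valid from invalid pairs if $2d < \lambda_2$ for the embedding depth $d$. Majority voting is likewise an assumed way to merge the three copies, which the text does not specify.

```python
from typing import List, Sequence


def extract_bits(sd_marked: Sequence[float], n_bits: int, lambda2: float) -> List[int]:
    """Read up to n_bits bits from adjacent band pairs (0, 1), (2, 3), ... of one voiced frame."""
    bits: List[int] = []
    for band in range(0, len(sd_marked) - 1, 2):
        if len(bits) == n_bits:
            break
        diff = sd_marked[band] - sd_marked[band + 1]
        sd_m = (sd_marked[band] + sd_marked[band + 1]) / 2.0
        if abs(diff) > lambda2 * sd_m:      # assumed invalid-pair test (mirrors embedding Step 8)
            continue
        bits.append(1 if diff > 0 else 0)    # assumed decision rule standing in for formula (13)
    return bits  # may be shorter than n_bits if too few valid pairs exist


def majority_vote(copies: Sequence[Sequence[int]]) -> List[int]:
    """Combine repeated copies of the same row bit by bit; ties resolve to 0."""
    return [1 if 2 * sum(column) > len(copies) else 0 for column in zip(*copies)]


print(extract_bits([1.21, 0.99, 1.0, 5.0, 1.8, 2.2], n_bits=2, lambda2=0.5))  # [1, 0]
print(majority_vote([[1, 0, 1], [1, 1, 0], [0, 0, 1]]))                     # [1, 0, 1]
```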

Performance Evaluation
In this section, the performance of the proposed algorithm is evaluated in four aspects: payload capacity, transparency, robustness, and complexity. Evaluating transparency means measuring the decline in audio quality caused by embedding, which can be done both subjectively and objectively. The subjective index is the mean opinion score (MOS), and the objective indices are the signal-to-noise ratio (SNR), given in formula (14), and the objective difference grade (ODG), which is the output of the perceptual evaluation of audio quality (PEAQ). Robustness refers to the ability of the algorithm to extract the watermark accurately after the carried audio has been attacked. It is usually evaluated by the BER, the proportion of erroneous bits between the extracted watermark and the original watermark, given in formula (15). The smaller the BER, the stronger the robustness of the algorithm against attacks. According to the standard of the International Federation of the Phonographic Industry (IFPI), SNR should be greater than 20 dB for the audio to have good quality, and BER should be less than 20%.
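The standard definitions of SNR and BER can be written as below. They are assumed to match formulas (14) and (15), which are not reproduced here.

```python
import math
from typing import Sequence


def snr_db(original: Sequence[float], marked: Sequence[float]) -> float:
    """SNR in dB: 10 * log10(sum x^2 / sum (x - x')^2)."""
    if len(original) != len(marked):
        raise ValueError("signals must have the same length")
    signal_power = sum(x * x for x in original)
    noise_power = sum((x - y) ** 2 for x, y in zip(original, marked))
    if noise_power == 0.0:
        return math.inf                     # identical signals: no embedding distortion
    return 10.0 * math.log10(signal_power / noise_power)


def ber_percent(w: Sequence[int], w_extracted: Sequence[int]) -> float:
    """BER in percent: number of differing bits divided by total bits, times 100."""
    if len(w) != len(w_extracted) or not w:
        raise ValueError("watermarks must be non-empty and of equal length")
    errors = sum(a != b for a, b in zip(w, w_extracted))
    return 100.0 * errors / len(w)
```

By the IFPI criteria quoted above, a result is acceptable when `snr_db(...) > 20` and `ber_percent(...) < 20`.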

Security and Communication Networks
Normalized correlation (NC) can also be used to evaluate the robustness of the algorithm. It reflects robustness through the similarity between the extracted watermark and the original watermark, as shown in formula (16). An NC value close to 1 indicates that the extracted watermark is very similar to the original watermark, that is, the algorithm is strongly robust.
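A common form of normalized correlation, assumed here for formula (16), divides the inner product of the two watermarks by the product of their Euclidean norms; for binary 0/1 watermarks, identical inputs give NC = 1.

```python
import math
from typing import Sequence


def normalized_correlation(w: Sequence[float], w_extracted: Sequence[float]) -> float:
    """NC = sum(w * w') / (||w|| * ||w'||); equals 1 when the two watermarks coincide."""
    if len(w) != len(w_extracted):
        raise ValueError("watermarks must have the same length")
    numerator = sum(a * b for a, b in zip(w, w_extracted))
    denominator = math.sqrt(sum(a * a for a in w)) * math.sqrt(sum(b * b for b in w_extracted))
    if denominator == 0.0:
        raise ValueError("NC is undefined for an all-zero watermark")
    return numerator / denominator
```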
The experimental parameters are as follows: (1) The watermarks are the two binary images shown in Figure 4(a) and Figure 4(b); the image in Figure 4(a) has a size of (64 × 64). The payload capacity of the proposed algorithm can be calculated with formula (17):
$$\text{Cap} = \frac{L_w}{t}, \qquad (17)$$

where $t$ is the duration of the audio used to carry the watermark. In our experiment, $t$ is 64 seconds and the binary watermark has 4096 bits, so the payload capacity is 4096 bits / 64 s = 64 bps. Table 1 shows the experimental results, including transparency, evaluated by the SNR (dB), MOS, and ODG of the audio; robustness, evaluated by the BER (%) and NC of the extracted watermark; and the payload capacity, expressed as Cap (bps). According to the results in Table 1, the proposed algorithm has good transparency: at a payload capacity of 64 bps, the average SNR is as high as 26.86 dB, ODG is −0.45, and MOS is 4.7, and the SNR exceeds the 20 dB required by the IFPI standard. The good transparency is mainly because the polarity of the embedding depth is determined by comparing the SD of the two adjacent frequency bands, as shown in formula (12), which minimizes the modification of the DCT coefficients. The waveforms of the original audio and the carried audio are shown in Figures 5(a) and 5(b), respectively (only an audio clip of about 3 seconds is shown so that the details are visible). The corresponding spectrograms are shown in Figures 6(a) and 6(b). Neither the waveforms nor the spectrograms show obvious differences before and after embedding the watermark, which indicates that the proposed algorithm has good transparency.
In Table 1, BER is 0 and NC is 1, which indicates that the watermark can be extracted accurately when no attack is applied to the carried audio. Therefore, the watermark image extracted in Figure 7(w) is identical to the original image in Figure 4(a), and the watermark image extracted in Figure 8(w) is identical to the original image in Figure 4(b). The payload capacity of the proposed algorithm is lower than that in reference [13], but its transparency is better. Both the payload capacity and the transparency of the proposed algorithm are better than those in reference [23]. At the same payload capacity, the transparency of the proposed algorithm is higher than that in reference [24].

Robustness.
This section evaluates the robustness of the proposed algorithm. Different attacks are applied to the carried audio, the extracting algorithm is used to extract the watermark, and the extracted watermark is compared with the original watermark to evaluate robustness quantitatively by BER and NC. The attack types include a variety of conventional signal processing operations and synchronization attacks, as shown in Table 2.
After these attacks on the carried audio, the average BER (%) and NC values under each attack type are listed in Table 3. The extracted watermark images are shown in Figures 7 and 8.
According to the results in Table 3 and Figures 7 and 8, the proposed algorithm shows excellent robustness against noise corruption with 35 dB, MP3 compression with 128 kbps, re-quantization, resampling, echo addition with delays of 100 ms and 50 ms, low-pass filtering with 8 kHz, and amplitude scaling. The extracted watermarks are very clear: all BER values are 0, and all NC values are 1.
The algorithm is also strongly robust against noise corruption with 20 dB, low-pass filtering with 4 kHz, MP3 compression with 64 kbps, jittering, and random cropping. The extracted watermarks are highly similar to the original watermarks: the BER values are below 2.98%, and the NC values are above 0.99.
When resisting TSM, the BER values are relatively high, but they still meet the IFPI standard. The robustness against PSM is weaker; the main reason is that the proposed algorithm takes the small frame with the largest energy in the voiced frame as the synchronization mark. When the carried audio is subjected to PSM, the location of the synchronization mark may be offset. If the offset synchronization mark is still used as the reference to search for the location of the watermark, the extracted information will be inaccurate.

Complexity.
The complexity of an algorithm can usually be measured by its running time, and excessive complexity limits the scope of application of the algorithm. In reference [24], the average running time is 1055.99 seconds for embedding the watermark and 0.8544 seconds for extracting it. In our study, both times are greatly shortened: the average running time is 1.2612 seconds for embedding and 0.6012 seconds for extracting, so the proposed algorithm has low complexity.
The experimental results show that the algorithm has good overall performance. Compared with reference [13], the payload capacity of this algorithm is slightly smaller, but its transparency is higher and it is more robust against most attacks. Compared with reference [23], this algorithm has a larger payload capacity and better transparency, and it is more robust except against noise corruption with 20 dB, TSM, PSM, and jittering. Compared with the algorithm in reference [24] at the same payload capacity, the proposed algorithm has lower complexity, better transparency, and stronger robustness against most attacks, especially low-pass filtering with 4 kHz; the exceptions are noise corruption with 20 dB, TSM, and PSM. This is mainly because the embedding and extracting algorithms are based on the spectrum distribution, which is stable under attack, and because the synchronization mechanism improves on the one proposed in reference [24].

Conclusion
The spectrum distribution of the low-frequency components of audio does not change much after attacks. Based on this feature, a robust and blind audio watermarking algorithm based on spectrum distribution is proposed. The algorithm designs an eigenvalue to represent the spectrum distribution of a frequency band and develops the embedding algorithm by adjusting the difference of the eigenvalues between two adjacent frequency bands. The DCT coefficients are modified according to the embedding rules, and the polarity of the embedding depth is determined by comparing the eigenvalues of the two frequency bands, which minimizes the modification of the DCT coefficients and gives the algorithm good transparency. The binary watermark can be extracted blindly just by judging the difference of the eigenvalues; the original audio is not needed, which is very convenient in practical applications. In this study, the improved synchronization mechanism takes the small frame with the largest energy in the voiced frame as the synchronization mark to obtain the embedding and extracting locations of the watermark, which improves the robustness of the algorithm. Other measures are also taken to improve robustness, such as embedding the watermark repeatedly into three voiced frames. The experimental results and the comparison with similar algorithms show that the SNR of the proposed algorithm reaches 26.86 dB at a payload capacity of 64 bps, so the algorithm has a large payload capacity and good transparency. When the carried audio is attacked by noise corruption, MP3 compression, low-pass filtering, amplitude scaling, TSM, jittering, and random cropping, the extracted watermarks are very similar to the original watermarks, so the algorithm is strongly robust.
Although the proposed algorithm has advantages in transparency, payload capacity, complexity, and robustness against most attacks, it also has shortcomings; in particular, its robustness against TSM and PSM needs to be improved. In future research, we will address these weaknesses and further improve robustness against more attack types.
Data Availability
The data are available upon request.

Conflicts of Interest
The authors declare that there are no conflicts of interest.