Blind Key Based Attack Resistant Audio Steganography Using Cocktail Party Effect

Steganography is a popular technique of digital data security. Among all digital steganography methods, audio steganography is very delicate, as the human auditory system is highly sensitive to noise; hence a small modification in audio can make a significant audible impact. In this paper, a key based blind audio steganography method has been proposed which is built on discrete wavelet transform (DWT) as well as discrete cosine transform (DCT) and adheres to Kerckhoff's principle. Here an image has been used as the secret message, which is preprocessed using Arnold's Transform. To make the system more robust and undetectable, a well-known problem of audio analysis, known as the Cocktail Party Problem, has been explored here for wrapping the stego audio. The robustness of the proposed method has been tested against Steganalysis attacks like noise addition, random cropping, resampling, requantization, pitch shifting, and mp3 compression. The quality of the resultant stego audio and the retrieved secret image has been measured by various metrics, namely, "peak signal-to-noise ratio"; "correlation coefficient"; "perceptual evaluation of audio quality"; "bit error rate"; and "structural similarity index." The embedding capacity has also been evaluated and, as seen from the comparison result, the proposed method has outperformed another existing DCT-DWT based technique.


Introduction
In the present era, communicating through the Internet has become vulnerable, as there may be several intruders who eavesdrop on secret messages in order to capture them and use them for unlawful purposes. Hence it is now necessary to camouflage a secret message in such a way that the stego object cannot be identified as the carrier of a secret message. Camouflaging a secret message through carrier objects is the age-old technique of steganography. However, with the current enormous use of the Internet and the rise of various Steganalysis attacks, an extra shield is required to protect steganography techniques. This is the reason the cocktail party effect in audio steganography has been explored to ensure enhanced security during data transmission.

Audio Steganography Techniques.
In audio steganography, audio is used as the cover media. In [1], the authors have described different spatial and frequency domain techniques of audio steganography. The popular spatial domain techniques are as follows.
Least Significant Bit (LSB) Encoding. This is the simplest method of audio steganography, where the Least Significant Bit of each audio sample is modified with bits of the secret message vector. Because this method is used so extensively, it has become more prone to attack, and its embedding capacity is poor compared to others. To cope with the need for increased capacity, the authors of [2] have proposed an enhanced LSB technique in which it has been shown that modifying the 2nd and 3rd LSB does not make an audible difference in the audio sample. In [3], the authors have suggested another enhancement over the LSB technique by shifting the LSB modification from the 3rd bit to the 4th bit, which yields more embedding capacity compared to previous methods of LSB encoding.
Parity Encoding. In this approach, the audio signal is broken into a number of samples [4]. Depending on a sample's parity bit, the secret message is embedded in the LSB of the sample byte stream.
Echo Hiding. In this method, a short echo signal is introduced as part of the cover audio, where the secret message is hidden [5].
Studies show that the echo signal is inaudible provided the delay between the cover audio and the echo signal is up to 1 ms.
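The LSB encoding described above can be illustrated with a minimal sketch. It assumes integer (for example 16-bit) audio samples and a message already expressed as a list of bits; the function names `embed_lsb` and `extract_lsb` are hypothetical and are not taken from any of the cited works.

```python
def embed_lsb(samples, bits):
    """Write one message bit into the least significant bit of each sample."""
    if len(bits) > len(samples):
        raise ValueError("message is longer than the cover")
    stego = list(samples)
    for i, bit in enumerate(bits):
        # Clear the LSB of the sample, then set it to the message bit.
        stego[i] = (stego[i] & ~1) | bit
    return stego


def extract_lsb(stego, n_bits):
    """Read the first n_bits least significant bits back out."""
    return [sample & 1 for sample in stego[:n_bits]]


cover = [1000, -2001, 15, 32767, -32768, 0]
message = [1, 0, 1, 1, 0]
stego = embed_lsb(cover, message)
recovered = extract_lsb(stego, len(message))
```

Each sample changes by at most one quantization step, which is why the method is hard to hear but also easy to destroy.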
The widespread frequency domain techniques are as follows.
Phase Coding. As the human auditory system cannot perceive phase component modulation, in this technique secret data is embedded by modifying selected phase components of the cover audio signal. Using a psychoacoustic model, a threshold is calculated which can be used as the masking threshold [6]. In [7], the authors have used the difference between the phase values of the selected component frequencies and their adjacent frequencies of the cover signal as a medium to hide secret data bits. This method provides more robustness than the previous approaches.
Spread Spectrum. The basic principle of spread spectrum is to spread the secret message over the frequency spectrum of the cover audio signal. In [8], Direct Sequence Spread Spectrum is used to hide text data in an audio signal. Here a key is used to embed the message into the noise. In [9], the authors have found that a low spreading rate improves the performance of spread spectrum audio steganography. Therefore, they have proposed a technique which decreases the correlation between the original signal and the spread data signal by applying a phase shift in each subband signal of the original audio.

Discrete Wavelet Transforms (DWT).
DWT decomposes a signal into four frequency components, popularly known as subbands. These subbands are Low-Low (LL), Low-High (LH), High-Low (HL), and High-High (HH), as shown in Figure 1. The LL subband describes the approximation details. The HL band captures variation along the x-axis, or horizontal details, and the LH band captures variation along the y-axis, or vertical details [10]. In other words, the low frequency subband is a low-pass approximation of the original signal and contains most of the energy of the signal. The other subbands mainly include detail components which have a low energy level. This is the reason the LH subband is very popular for data hiding.
In [11], the authors have proposed a method that computes the DWT of the cover audio and selects the higher frequencies to embed image data using a low bit encoding technique. In [12], the authors have decomposed the cover audio signal using the Haar DWT and then chosen coefficients in which to embed data. This is done using a precalculated threshold value to flip data. In [13], secret audio is embedded using a synchronizing code in the low frequency part of the DWT of the cover audio.
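To make the four-subband decomposition concrete, the following is a minimal sketch of a single-level 2D Haar DWT in plain Python. It assumes a matrix with even dimensions and an orthonormal Haar filter. The subband naming (first letter for the row filter, second for the column filter) is an assumption, since conventions differ between libraries, and `reshape_to_matrix` is a hypothetical helper for turning a one-dimensional audio signal into a matrix.

```python
import math


def haar_1d(values):
    """One Haar analysis step: scaled pairwise sums (low) and differences (high)."""
    s = math.sqrt(2.0)
    low = [(values[i] + values[i + 1]) / s for i in range(0, len(values), 2)]
    high = [(values[i] - values[i + 1]) / s for i in range(0, len(values), 2)]
    return low, high


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


def haar_dwt2(matrix):
    """Single-level 2D Haar DWT; returns the subbands (LL, LH, HL, HH)."""
    # Filter along every row first.
    row_low, row_high = [], []
    for row in matrix:
        lo, hi = haar_1d(row)
        row_low.append(lo)
        row_high.append(hi)

    def along_columns(m):
        lows, highs = [], []
        for col in transpose(m):
            lo, hi = haar_1d(col)
            lows.append(lo)
            highs.append(hi)
        return transpose(lows), transpose(highs)

    # Then filter the two intermediate results along every column.
    ll, lh = along_columns(row_low)
    hl, hh = along_columns(row_high)
    return ll, lh, hl, hh


def reshape_to_matrix(samples, width):
    """Cut a 1D signal into rows of `width` samples (leftover samples are dropped)."""
    return [samples[r * width:(r + 1) * width] for r in range(len(samples) // width)]


matrix = reshape_to_matrix([float(v) for v in range(16)], 4)
ll, lh, hl, hh = haar_dwt2(matrix)
```

Because the filter is orthonormal, the total energy of the four subbands equals the energy of the input matrix.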

Discrete Cosine Transforms (DCT).
DCT is used to convert a signal from the spatial domain into the frequency domain. DCT decomposes a signal into a series of cosine functions. The two-dimensional DCT can be performed by executing the one-dimensional DCT twice, initially in the $x$ direction and next in the $y$ direction. The formulation of the 2D DCT for an input signal $f(x, y)$ with $M$ rows and $N$ columns and the output signal $F(u, v)$ is given by

\[ F(u, v) = \alpha_u \alpha_v \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x, y) \cos\left[\frac{(2x+1)u\pi}{2M}\right] \cos\left[\frac{(2y+1)v\pi}{2N}\right], \]

where $0 \le u \le M-1$ and $0 \le v \le N-1$, with $\alpha_u = 1/\sqrt{M}$ for $u = 0$ and $\alpha_u = \sqrt{2/M}$ for $1 \le u \le M-1$, and likewise $\alpha_v = 1/\sqrt{N}$ for $v = 0$ and $\alpha_v = \sqrt{2/N}$ for $1 \le v \le N-1$. The inverse 2D DCT is also available to transform frequency domain coefficients back to the spatial domain signal, as specified by

\[ f(x, y) = \sum_{u=0}^{M-1} \sum_{v=0}^{N-1} \alpha_u \alpha_v F(u, v) \cos\left[\frac{(2x+1)u\pi}{2M}\right] \cos\left[\frac{(2y+1)v\pi}{2N}\right], \]

where $0 \le x \le M-1$ and $0 \le y \le N-1$.
As shown in Figure 2(a), the top left coefficient is called the DC coefficient and holds the approximate value of the whole signal; it corresponds to zero frequency. The remaining 15 coefficients are called AC coefficients and hold most of the detailed parameters of the signal; they correspond to nonzero frequencies. There are some DCT coefficients which hold quite similar values. The human brain is less sensitive to changes where all the elements hold more or less the same value. Therefore, this region of similar values can be selected for data hiding purposes. This region is known as the midband region, as shown in Figure 2(b).
In [14], the authors have used a speech signal as cover, where the voiced and nonvoiced parts of the speech are separated by zero crossing count and short time energy. The secret data is embedded by modifying the DCT coefficients of the nonvoiced part. In [15], the authors have decomposed the cover audio into 8 × 8 nonoverlapping blocks, and secret data is hidden in the DC coefficient and the 4th AC coefficient in line. In [16], the authors have embedded secret data in the low frequency component of DCT quantization. In [17], the authors have decomposed the cover audio into 8 × 8 blocks, and then each of those blocks was decomposed further into 4 × 4 frames. Embedding of the secret message depends on the difference between the first or last two frames.
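As an illustration of the 2D DCT and its inverse on a small block, the following plain-Python sketch assumes the common orthonormal (DCT-II) normalization; the function names `dct2` and `idct2` are hypothetical. It is written for clarity on 4 × 4 blocks, not for speed.

```python
import math


def _alpha(k, n):
    """Orthonormal scaling factor for frequency index k in a length-n transform."""
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)


def _basis(x, u, n):
    return math.cos(math.pi * (2 * x + 1) * u / (2 * n))


def dct2(block):
    """Forward 2D DCT of an M x N block given as a list of rows."""
    m, n = len(block), len(block[0])
    out = [[0.0] * n for _ in range(m)]
    for u in range(m):
        for v in range(n):
            total = 0.0
            for x in range(m):
                for y in range(n):
                    total += block[x][y] * _basis(x, u, m) * _basis(y, v, n)
            out[u][v] = _alpha(u, m) * _alpha(v, n) * total
    return out


def idct2(coeffs):
    """Inverse 2D DCT, mapping coefficients back to the spatial block."""
    m, n = len(coeffs), len(coeffs[0])
    out = [[0.0] * n for _ in range(m)]
    for x in range(m):
        for y in range(n):
            total = 0.0
            for u in range(m):
                for v in range(n):
                    total += (_alpha(u, m) * _alpha(v, n) * coeffs[u][v]
                              * _basis(x, u, m) * _basis(y, v, n))
            out[x][y] = total
    return out


block = [[52.0, 55.0, 61.0, 66.0],
         [70.0, 61.0, 64.0, 73.0],
         [63.0, 59.0, 55.0, 90.0],
         [67.0, 61.0, 68.0, 104.0]]
coeffs = dct2(block)
restored = idct2(coeffs)
```

For a block of identical values only the DC coefficient (top left) is nonzero, which matches the description of the DC and AC coefficients above.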

Correlation Coefficient (CC).
A correlation coefficient is a measure of the linear relationship between two random variables. The term was first coined by Karl Pearson in 1896. The value of a correlation coefficient can vary from −1 to 1. A value of exactly −1 or 1 indicates that the two variables are perfectly linearly related. A value of 0 indicates that there is no linear relation between the variables. Moreover, the sign indicates whether the variables are positively or negatively related [18]. There are three types of correlation coefficients: Pearson's coefficient ($r$), Spearman's rho coefficient ($r_s$), and Kendall's tau coefficient ($\tau$). Pearson's coefficient, which is also known as the product-moment correlation coefficient, is the most widely used correlation coefficient. For paired measurements $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ it is given by

\[ r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}, \]

where $\bar{x}$ and $\bar{y}$ are the means of $x_i$ and $y_i$.

Arnold's Transform.
Arnold's transform shows some sort of chaotic nature, as seen in the following transformation function: $\Gamma : T^2 \to T^2$ given by

\[ \Gamma : (x, y) \to (2x + y,\; x + y) \bmod 1. \]

An image is a collection of pixels in a row and column arrangement, which can be organized in a square or nonsquare shape. If the Arnold transform is applied to an image, it scrambles the image over a chosen number of iterations (e.g., iteration 1 will scramble less and iteration 10 will scramble more), which makes the image imperceptible. This unrecognizable image format can be used for secure data hiding, as it does not reveal the existence of any secret data. Hence scrambling an image can be a preprocessing step of a data hiding technique.
Traditionally the Arnold transform could be applied only to square matrices; however, it was later extended to apply to any matrix through a discrete mapping with $x, y \in \{0, 1, 2, \ldots, N-1\}$, where $(x, y)$ is the element of the original matrix, $(x', y')$ is the element of the transformed matrix, and $N$ is the order of the matrix. As shown in Figure 3, the point $(x, y)$ is sheared along the $x$-axis and the $y$-axis to get $(x', y')$.
The Arnold transformation is reversible [19]. There are two ways to recover the original image from the scrambled image: the traditional way relies on periodicity, and the better approach is to use the inverse matrix, which is also known as the Reverse Arnold Transformation [20]. In [21], the authors have used Arnold's transformation to scramble the image before embedding it into the DWT coefficients of the cover audio. In [22], the authors have embedded a scrambled image in "Redundant Discrete Wavelet Transform" coefficients using the Singular Value Decomposition (SVD) technique. In [23], the authors have proposed data hiding in the DWT and DCT domains using SVD, where the secret image is scrambled before embedding.
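The scrambling and its inverse-matrix recovery can be sketched as follows. This is only an illustration under stated assumptions: it uses the discrete form of the map given in the text, (x, y) → (2x + y, x + y) mod N, on a square N × N image, and the function names `arnold` and `inverse_arnold` are hypothetical. Other works use a different but equivalent shear matrix.

```python
def arnold(image, iterations):
    """Scramble a square image by iterating (x, y) -> (2x + y, x + y) mod N."""
    n = len(image)
    for _ in range(iterations):
        scrambled = [[0] * n for _ in range(n)]
        for x in range(n):
            for y in range(n):
                scrambled[(2 * x + y) % n][(x + y) % n] = image[x][y]
        image = scrambled
    return image


def inverse_arnold(image, iterations):
    """Undo `arnold` with the inverse matrix [[1, -1], [-1, 2]] mod N."""
    n = len(image)
    for _ in range(iterations):
        restored = [[0] * n for _ in range(n)]
        for x in range(n):
            for y in range(n):
                # (x, y) is a scrambled position; map it back to its source.
                restored[(x - y) % n][(2 * y - x) % n] = image[x][y]
        image = restored
    return image


secret = [[r * 4 + c for c in range(4)] for r in range(4)]
scrambled = arnold(secret, 3)
recovered = inverse_arnold(scrambled, 3)
```

The map is a bijection on the pixel grid because the transformation matrix has determinant 1, so no pixel value is lost during scrambling.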

Cocktail Party Problem.
The Cocktail Party Problem is a classic example of source separation which is very popular in digital signal processing. In this problem, several people are talking to each other in a banquet room and a listener is trying to recognize one specific speech from that crowd of partying guests. The human brain can distinguish one explicit signal component from a mixed signal combination in real time, which is popularly known as "Auditory Scene Analysis." However, in digital signal processing, it is difficult to extract only one speaker's voice from the rest in a cocktail party situation.
In [24], Colin Cherry first revealed the ability of the human auditory system to separate a single speech or audio from a combination of voices, which may turn into noise, through properties like pitch, gender, rate of speech, and/or direction of speech. This task of separating single source audio from noise is known as the dichotic listening task [25]. In [26], the authors have reviewed the same techniques to train a machine to segregate signals. In [27], Broadbent has concluded that simultaneous listening can be performed for small messages, not for long ones. The human ability to identify audio from a mixed signal can be improved by listening with two ears [28].
It has been seen that, in ideal circumstances, the signal detection threshold of binaural listening is 25 dB more than that of monaural listening. In [29], it has been stated that the cocktail party effect can be explained by the Binaural Masking Level Difference (BMLD). As per BMLD, for binaural listening the desired signal coming from one direction is ineffectively masked by noise generated in a different direction. In [30], Kassebaum et al. discussed two methods for signal separation: Back Propagation (BP) and Self-Organizing Neural Network (SONN). That experiment was carried out through a 4 kHz channel using a modem data signal and a male speech signal. It has been concluded that BP requires more inputs and training time than SONN.
In [31], the authors have discussed 3 types of approach to solve the Cocktail Party Problem: (i) temporal binding and oscillatory correlation; (ii) cortronic network; (iii) blind source separation.
In [32], von der Malsburg explained the temporal binding technique. He stated that a neuron carries two distinct signals and that the binding is accomplished by correlation. The synchronization allows neurons to create a topological network. In [33], von der Malsburg and Schneider proposed a cocktail party processor enhancing this idea: the Oscillatory Correlation, which is the basis of Computational Auditory Scene Analysis. In [34,35], a multistage neural model has been proposed to separate speech from interfering sounds using oscillatory correlation.
In [36], the authors have proposed a biological approach to solve the Cocktail Party Problem using an artificial neural network named the cortronic network. A cortronic neural network describes the connections among neurons in several regions, showing the output links of each neuron and the strength of the connections.
Blind Source Separation (BSS) is the technique of separating a signal from a mixed source without having knowledge of the source signals or the process of mixing. There are different methods of BSS, among which Principal Component Analysis (PCA), Independent Component Analysis (ICA), and Time and Frequency domain approaches are significant. PCA and ICA are both statistical approaches which are better than the Time or Frequency domain approach, since the Fourier components of data segments are fixed in the frequency domain whereas in the statistical domain the transformation depends on the data to be analyzed [37].
PCA is a mathematical technique of transforming a large correlated dataset into a small number of major components known as principal components [38]. It is related to the mathematical theory of Singular Value Decomposition (SVD), which is used to implement PCA [39]. Independent Component Analysis can also be implemented with SVD, though there are subtle differences between PCA and ICA. The aim of PCA is to find decorrelated variables whereas the aim of ICA is to find independent variables. PCA and ICA both perform matrix factorization for linear transformation, though PCA performs low-rank matrix factorization whereas ICA performs full-rank matrix factorization. The advantage of ICA over PCA is that PCA just removes correlations whereas ICA removes correlations and higher order dependencies [40]. ICA has extensive use in biomedical imaging and audio processing [41]. ICA can also be used for transformation to independent variables by multiplying the observed data with a demixing matrix [42]. It depends on the fact that there are as many sources as channels of data available, which are to be separated as independent sources; by utilizing this fact, ICA is used in Blind Source Separation. In [43], the author described a fast method for ICA using fixed point iteration. This algorithm is popularly known as FastICA.

Contents of Table 1 ("✓" marks an advantage and "N" marks a disadvantage):

Temporal binding and oscillatory correlation
Advantages (as shown in [44], this strategy helps): ✓ robustness against loss of network elements; ✓ richness of representation; ✓ processing speed enhancement.
Disadvantages (as stated in [45], this strategy results in): N inflexible refocusing of the system onto events rapidly occurring in sequence.

Cortronic network
Advantages (as mentioned in [36], in this method): ✓ there is no requirement for having knowledge of background sounds such as static, traffic, and music.
Disadvantages (as shown in [36], this technique is): N costly to implement, as it requires a separate artificial neural network.

Blind source separation
Advantages (as shown in [46], in this technique): ✓ there is no need for having knowledge of the source signals or the process of mixing; ✓ no need for defining a cut-off frequency for separation; ✓ low computational complexity; ✓ helps signal enhancement.
Disadvantages (as reported in [47], in this method): N convergence speed is slow.
Table 1 compares the existing techniques for solving the Cocktail Party Problem. It can be noted that each of these techniques has its own advantages and disadvantages. However, as the blind steganographic approach is considered more robust and secure than nonblind steganography techniques, the "Blind Source Separation" approach has been chosen in this proposed method for solving the cocktail party effect.

Proposed Method
3.1. In a Nutshell. Steganography can be broadly grouped into two types: blind and nonblind techniques. The technique where the cover object is not required to retrieve the secret is called blind steganography. The method where the cover object is required to regain the secret is called the nonblind or cover escrow technique of steganography. To create a highly robust method of steganography, a blind steganography technique has been proposed here.
In this proposed method, an image has been used as the secret message. This secret image is scrambled using the Arnold transform. Then the Haar filter is applied for a two-dimensional DWT on the cover source audio. Since audio is a one-dimensional signal, it must be reshaped into a two-dimensional matrix to perform the 2D DWT. Haar is simple, fast, and memory efficient compared to other available DWT filters like Daubechies and Coiflets. After the DWT is applied, the LH subband is chosen for further decomposition into 4 × 4 blocks, to which the two-dimensional DCT is applied. As shown in Figure 2(b), in Section 2.1, the midband region of those 4 × 4 blocks is chosen, and embedding is performed by equation (9), where mid(Ḟ(·)) indicates the midband frequency region; α is the embedding factor; and PN is the pseudorandom number. Equation (9) is further explained in Section 3.6; the embedding factor (α) is discussed in Section 3.4 and the pseudorandom number (PN) is discussed in Section 3.5.
After embedding, the resultant cover becomes the stego audio. To increase the security of the proposed method, this stego audio is blended with other audio signals to produce the cocktail party effect; afterwards it is securely transmitted through the web to reach the intended recipient. Even if an intruder is able to break into the communication channel and get access to the transmitted media, he would neither be able to decipher the cocktail party effect to identify the stego audio nor be able to decode the stego audio to recognize the secret message without knowing the key required for extraction, whereas the intended receiver, knowing the key as well as the entire algorithm, is able to easily extract the implanted secret message without any loss of data. The proposed method is also tested against well-known Steganalysis attacks and the outcomes are quite impressive (discussed in Section 4.3); hence this technique provides complete security.
Once the intended recipient receives the cocktail effect, s/he can separate the audios using the demixing algorithm (discussed in Section 3.8) and can also apply the extraction procedure on them, as the recipient is aware of the key. The extraction algorithm performs correlation between the coefficients and extracts the secret bits, from which the scrambled secret image can be generated. Finally, by applying the inverse Arnold transform, the secret image can be reconstructed. The flowcharts for the embedding and extraction procedures are shown in Figures 4 and 5, respectively.

Input Preparation
Cover Audio Source. Any speech or music can be used here as a cover audio source. For this demonstration, popular English songs have been chosen, as mentioned below. All the audio sources have been sampled at 44.1 kHz (44100 Hz) in monochannel with 16-bit depth, cut to 26 seconds' duration for optimizing the embedding capacity calculation, and finally saved as .wav files.
The following are the audio sources used for this research experiment:
(1) "My Heart Will Go On" by Celine Dion from the film "Titanic" → saved as tt.wav
(2) "Beat It" by Michael Jackson from the album "Thriller" → saved as mj.wav

3.4. Embedding Factor. In equation (9) in Section 3.1, the embedding factor (α) is multiplied with PN to offset the increment of the DCT coefficient value such that, after embedding, the stego audio will not have any audible noise. Hence the value of α must be between 0 and 1. After repeated experiments, it has been observed that when the value of the embedding factor nears 1, the extracted message has very high PSNR and SSIM, which indicates high robustness; however, at the same time audible artifacts appear in the stego audio, which distinguish it from the cover audio. This signifies that a value of α near 1 compromises imperceptibility. On the other hand, if the value of α approaches 0, the stego audio is just like the original cover audio (the PSNR between these two audios reaches around 100 dB), whereas the extracted secret image is completely corrupted. These test results indicate that, to get an optimum outcome, a tradeoff must be made between robustness and imperceptibility.
While experimenting with several cover audios along with various secret images, it has also been noticed that keeping a constant value of the embedding factor (α) cannot ensure a similar quality outcome after extraction. Hence it was decided to set α depending on the cover to generate the optimal result. As the data hiding takes place in the LH subband of the DWT, the maximum coefficient value of the LH subband has been chosen as one of the terms of the following formula:
Embedding Factor (α) = Multiplicative Factor × Max (coefficients of LH). (10)
Finally, for this proposed method, the value of the Multiplicative Factor has been universally set to 0.2, based on the experimental outcome, as shown in Table 2.
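Formula (10) can be written as a short sketch. The function name `embedding_factor` and the sample LH values are hypothetical. The sketch takes the plain maximum of the LH coefficients, as the text states; whether the coefficient magnitude is intended instead is not specified, so that choice is an assumption.

```python
MULTIPLICATIVE_FACTOR = 0.2  # value reported in the text, chosen from Table 2


def embedding_factor(lh_coefficients, multiplicative_factor=MULTIPLICATIVE_FACTOR):
    """Formula (10): scale the largest LH coefficient by the multiplicative factor."""
    largest = max(max(row) for row in lh_coefficients)
    return multiplicative_factor * largest


# Illustrative LH subband values only; real values come from the cover audio.
lh = [[0.10, -0.40],
      [0.25, 0.05]]
alpha = embedding_factor(lh)
```

Tying α to the cover in this way lets a quiet cover receive a smaller perturbation than a loud one.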

Pseudorandom Number.
For embedding the secret into the cover, a "pseudorandom number" (PN) has been used in this proposed method; PN is generated using a Linear Feedback Shift Register (LFSR), as shown in Figure 6. Here the LFSR has been designed using only the right shift operator, and the operation of this shift register is completely deterministic. It must be initialized with a set of numbers and, at any given point, the value of the LFSR can be determined from its present state.
In this proposed method, two simple algorithms have been designed to generate two different sets of PN values for a given key with the same initial sequence of numbers. This initial sequence can be altered at any time. Here, for ease of illustration, "0 0 0 0 1" has been chosen as the initial sequence.
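A right-shifting LFSR with the initial sequence "0 0 0 0 1" can be sketched as below. The feedback taps and the way the key enters the generator are not recoverable from the text, so both are assumptions made for illustration: the taps correspond to the primitive polynomial x^5 + x^2 + 1, and the key is treated as the number of initial shifts that are discarded. The function name `lfsr_bits` is hypothetical, and this sketch is not the paper's SRPN1 or SRPN2.

```python
def lfsr_bits(key, count, initial_state=(0, 0, 0, 0, 1), taps=(2, 4)):
    """Return `count` output bits of a 5-bit right-shifting LFSR.

    Assumption: `key` is the number of warm-up shifts discarded before
    output starts; `taps` are the state positions XORed into the feedback.
    """
    state = list(initial_state)
    out = []
    for step in range(key + count):
        feedback = 0
        for t in taps:
            feedback ^= state[t]
        emitted = state[-1]               # rightmost bit leaves the register
        state = [feedback] + state[:-1]   # right shift, feedback enters on the left
        if step >= key:
            out.append(emitted)
    return out


sender_pn = lfsr_bits(key=7, count=31)
receiver_pn = lfsr_bits(key=7, count=31)   # same key, same sequence
wrong_key_pn = lfsr_bits(key=8, count=31)  # different key, different sequence
```

Because the register is deterministic, sender and receiver obtain identical sequences from the same key and initial state, which is what blind extraction relies on.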
Description: The algorithms below generate endless nonsequential lists of numbers in binary base using a Linear Feedback Shift Register.
Input: A number as Key
Output: Pseudorandom numbers, PN1[] and PN2[], respectively.
Algorithm 1: written as function SRPN1 (Key)
Step 1: store Key in a local variable;
Step 2: set initial state of shift register as state = [0 0 0 0 1]
Step 3: set PN1 = [];
Step 4:

3.6. Embedding Algorithm. To ensure more security and imperceptibility, in this proposed method the secret message is embedded in the transform domain using the discrete wavelet transform (DWT) as well as the discrete cosine transform (DCT).
Description: algorithm for embedding secret data.
Step 12: find the mid-band coefficient region of Ḟ(·) and term it mid(Ḟ(·));

Extraction algorithm:
Step 5: apply 2D DWT on the input to decompose it into LL, LH, HL and HH;
Step 6: apply 2D DCT over LH and get Ḟ(·)
Step 7: find the mid-band coefficient region of Ḟ(·) and term it mid(Ḟ(·))
Step 8: (comparison of correlation coefficients) then set the extracted bit at (i, j) to 0, else set it to 1; end;
Step 9: reshape the extracted image bits to get the secret scrambled image
Step 10: set the number of iterations
Step 11: call function iArnold on the scrambled image with that number of iterations, which returns the secret image

Experimental Results and Analysis
This proposed method has been applied on several sets of cover audio and secret images, though, for efficient use of space, here only 2 sets of robustness test results have been presented for Steganalysis attacks.

Adherence to Kerckhoff's Principle. In this research article, a key based steganography technique has been proposed. Hence it should follow Kerckhoff's principle of cryptography [48], which says that an exemplary method should be secure even if the public is aware of all the details of that method except the key. As mentioned in Section 3.5, the LFSR has been used here both at the sender's end and at the receiver's end. It requires a unique key to generate the same set of pseudorandom numbers [49], which are used in embedding equation (9) and again in Step 8 of the extraction algorithm for comparing correlation coefficients. If the exact same key is not used during embedding and extraction, then the LFSR will generate a different set of pseudorandom numbers, with which the secret image cannot be extracted from the stego audio. Hence the proposed method complies with Kerckhoff's principle.

Outcome of Quality Metrics
Embedding Capacity (EC). EC is measured by the ratio between the size of the hidden message (in bits) and the size of the cover (in bits), as shown in (11) below. In this research experiment, it has been observed that hiding a secret image of size 128 × 128 requires a cover audio size of 1048576 bits, which implies an embedding capacity of 1.5625%. Similarly, to implant a 64 × 64 secret image, 262144 bits of cover audio are needed; this again confirms the embedding capacity as 1.5625%.

\[ \text{capacity} = \frac{\text{size of hidden data}}{\text{size of cover data}} \times 100\%. \tag{11} \]
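The arithmetic behind the two capacity figures can be checked with a few lines. The sketch assumes one bit per pixel for the secret image, which is consistent with the binary images and the bit counts quoted above; the function name `embedding_capacity` is hypothetical.

```python
def embedding_capacity(hidden_bits, cover_bits):
    """Equation (11): hidden size over cover size, as a percentage."""
    return hidden_bits / cover_bits * 100.0


# One bit per pixel is assumed for the binary secret image.
capacity_128 = embedding_capacity(128 * 128, 1048576)
capacity_64 = embedding_capacity(64 * 64, 262144)
```

Both cases reduce to the same ratio of 1/64, since the cover size grows by the same factor of four as the image.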
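Peak Signal-to-Noise Ratio (PSNR). PSNR represents the ratio between the maximum possible power of a signal and the power of the noise that degrades it, measured between a reference signal and a test signal. The mathematical representation for PSNR is as follows:

\[ \mathrm{PSNR} = 10 \log_{10}\left(\frac{\mathrm{Max}_{sf}^{2}}{\mathrm{MSE}}\right), \]

where $\mathrm{Max}_{sf}$ is the maximum signal value or maximum fluctuation in the input image data type (e.g., for the 8-bit unsigned integer data type, $\mathrm{Max}_{sf}$ is 255) and MSE is the Mean Squared Error, which is given by

\[ \mathrm{MSE} = \frac{1}{MN} \sum_{i=1}^{M} \sum_{j=1}^{N} \left(S_{\mathrm{Ref}}(i, j) - S_{\mathrm{Test}}(i, j)\right)^{2}, \]

where $S_{\mathrm{Ref}}$ represents the original signal; $S_{\mathrm{Test}}$ represents the degraded signal; $M$ and $N$ represent the numbers of rows and columns of the signal matrix, respectively; $i$ represents the index of the row and $j$ represents the index of the column.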
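Structural Similarity Index (SSIM). SSIM is a measurement of similarity, calculated through luminance, contrast, and structural differences between two images as given below:

\[ \mathrm{SSIM}(S, E) = \frac{2\mu_S \mu_E}{\mu_S^{2} + \mu_E^{2}} \cdot \frac{2\sigma_S \sigma_E}{\sigma_S^{2} + \sigma_E^{2}} \cdot \frac{\sigma_{SE}}{\sigma_S \sigma_E}, \]

where $\mu_S$ and $\mu_E$ are the means of the secret image S and the extracted image E, respectively; $\sigma_S$ and $\sigma_E$ are the standard deviations of S and E; and $\sigma_{SE}$ is the correlation of S and E.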
Bit Error Rate (BER). BER is defined as the number of error bits divided by the total number of transmitted bits, as shown in the following equation:

\[ \mathrm{BER} = \frac{\text{number of error bits}}{\text{total number of transmitted bits}}. \]

Here the BER is calculated between the original secret image and the extracted secret image.
Perceptual Evaluation of Audio Quality (PEAQ). PEAQ is a standardized metric to evaluate audio quality utilizing human perceptual properties, the output of which is given on a scale of 1 to 5 (where 1 signifies poor and 5 implies excellent) depending on the Mean Opinion Score (MOS) of all listeners. The quality of the output audio is measured by comparing it with a reference audio.
Normalized Cross-Correlation (NCC). NCC is computed between the extracted image and the reference image. NCC is used to produce a surface plot, which depicts the functional relationship between two independent variables, mapped onto a plane parallel to the x-y plane.
Figure 7 shows the surface plot of NCC between the secret and extracted image. Table 4 shows the quality analysis of the cover and stego audio in terms of PSNR, PEAQ, and CC.
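Three of the metrics used in this section (PSNR, BER, and the Pearson correlation coefficient from Section 2.2) can be sketched in plain Python. The function names are hypothetical, images are assumed to be lists of rows, and bit streams and CC inputs are assumed to be flat lists. SSIM and PEAQ are omitted because they need windowing and a perceptual model, respectively.

```python
import math


def mse(ref, test):
    """Mean squared error between two equally sized matrices."""
    rows, cols = len(ref), len(ref[0])
    total = sum((ref[i][j] - test[i][j]) ** 2
                for i in range(rows) for j in range(cols))
    return total / (rows * cols)


def psnr(ref, test, max_value=255):
    """PSNR in dB; infinite when the two signals are identical."""
    err = mse(ref, test)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / err)


def bit_error_rate(sent, received):
    """Fraction of bit positions that differ."""
    errors = sum(1 for a, b in zip(sent, received) if a != b)
    return errors / len(sent)


def pearson_cc(xs, ys):
    """Pearson product-moment correlation coefficient of two sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


secret_bits = [1, 0, 1, 1]
extracted_bits = [1, 1, 1, 0]
ber = bit_error_rate(secret_bits, extracted_bits)
```

A perfect extraction gives a BER of 0, an infinite PSNR, and a correlation coefficient of 1.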

Robustness Tests by Steganalysis Attacks
By Random Cropping. On average, English music or a full song has a duration of over 5 minutes, that is, more than 300 seconds. In this proposed method, only 25 seconds of audio is required to hide a secret image of size 128 × 128. This secret can be kept anywhere within the stego, that is, at the start, at the end, or after a given number of seconds; in short, the secret can be moved throughout the cover and the exact place of hiding is not predetermined. That is why 9 out of 10 attempts of random cropping leave the secret image intact, as the stego has been cropped elsewhere. The remaining 1 out of 10 attempts corresponds to the case in which the stego audio has been cropped in the region that carries the secret.
By Adding White Gaussian Noise. In this type of attack, "Additive White Gaussian Noise" (AWGN) is added to the stego audio to distort the hidden message. AWGN can be added to any signal, and it has uniform power.
By Resampling. While writing audio data into a file, the sampling rate of the audio is generally denoted as Fs. In the resampling attack, this sampling rate is first changed to a higher or lower frequency while saving the same audio in a new file. As resampling affects the audio file length, the modified audio is cut or filled with zeros to maintain the same length as the original cover. Once saved, the modified audio is resampled again to revert it to the original sampling frequency. By this, no audible differences will be noted; however, it will distort the embedded secret message (if any). Table 6 shows the result of such a resampling attack.
By Requantization. The number of bits required to express each audio sample is known as bit depth. It is a measurement of sound accuracy: the higher the bit depth, the more precise the sound. In the requantization attack, the bit depth of the stego audio is changed to corrupt the embedded secret image. Table 7 illustrates the outcome of the extraction process after the requantization attack.
By Pitch Shifting. Pitch means the tone of a signal; it describes the quality of a sound by the rate of vibrations. In the pitch shifting attack, the original pitch of an audio is raised or lowered without modifying its length, to destroy the hidden message embedded in the stego audio. Here pitch shifting has been done by utilizing a time-scale modification algorithm called "Phase Vocoder" [50], the result of which is shown in Table 8.
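The noise addition attack from this section can be sketched as follows. The sketch assumes, as the text does for its tests, that the signal power is 0 dBW unless stated otherwise, so the noise power follows directly from the requested SNR. The function name `add_awgn`, its parameters, and the test tone are hypothetical, and the fixed seed is only there to make the example repeatable.

```python
import math
import random


def add_awgn(signal, snr_db, signal_power_dbw=0.0, seed=0):
    """Add white Gaussian noise at the requested SNR.

    Assumption: the signal power is taken as `signal_power_dbw` rather than
    measured, so noise power (dBW) = signal power (dBW) - SNR (dB).
    """
    rng = random.Random(seed)
    noise_power = 10.0 ** ((signal_power_dbw - snr_db) / 10.0)
    sigma = math.sqrt(noise_power)
    return [sample + rng.gauss(0.0, sigma) for sample in signal]


stego = [math.sin(0.01 * k) for k in range(20000)]
attacked = add_awgn(stego, snr_db=20)
```

Lower SNR values mean stronger noise, so an SNR of 20 dB is a harsher attack than 40 dB.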

By MP3 Compression.
In this Steganalysis attack, the stego .wav file is compressed to MP3 format to eliminate redundant data, which is expected to remove the embedded secret message completely. Here the mp3write MATLAB function has been used to convert the stego .wav file into mp3 format, and the mp3read MATLAB function has been applied to read from the mp3 file during the extraction process. Table 9 reflects the extraction outcome from three different mp3 files of the same stego audio, encoded with bitrates of 128 kbps, 192 kbps, and 320 kbps, respectively.

Comparison with Existing Method.
For comparison with the proposed method, research articles published in SCI indexed journals have been searched for work in which data hiding in audio has been performed by DWT along with DCT and the extraction mechanism is blind. The authors of [51] have proposed a DCT-DWT based data hiding technique using a 16-bit Barker code as the synchronizing code to accommodate a 64 × 64 binary image as the secret message. The comparison results presented in Table 10 show that the proposed method has outperformed the existing one in terms of quality and robustness against Steganalysis attacks. In Table 10, "✓" signifies "satisfactory result obtained"; "N" signifies "unsatisfactory result or method does not comply"; and "-" implies "details not mentioned."

Conclusion
Secret communication using age-old steganography techniques often increases the chances of detectability through perceivable noise. Hence, in this article, the cocktail party effect has been considered, which has effectively reduced the probability of detection. This has also been demonstrated with the help of different Steganalysis techniques. Additionally, PSNR, CC, and PEAQ values have been analyzed to determine the perceptual noise recorded due to secret message embedding and extraction. Since all the above results verify the undetectability and robustness of the system, it can be concluded that this audio steganography technique is successful in secret communication with very high robustness.
In future, this proposed method can be further improved by utilizing the speaker diarization technique, which determines "who spoke when." Application of speaker diarization along with speech recognition would identify a speaker's voice, and this concept would permit segregating a secret audio stream into multiple speech segments, providing another novel approach to data hiding.

Figure 7: Surface plot of NCC between secret and extracted image.

Security and Communication Networks

Table 4: Quality analysis of cover and stego audio. Columns: secret image and cover audio; graphic plot of cover audio; graphic plot of stego audio; PSNR.

The function mod N is important to regenerate the original N × N image. The Arnold transform combines a shear in the x-axis, a shear in the y-axis, and the modulo function.

Table 1: Advantage and disadvantage of different approaches for solving Cocktail Party Problem.

3.3. Scrambling and Descrambling Algorithm for Secret Image. The "Arnold transform" algorithm randomizes the input image over a number of iterations to create the scrambled image.
Input: Any binary image, number of iterations
Output: Scrambled image
Algorithm: written as function Arnold (image, number of iterations)
Step 1: Find out the size of the image and store its dimensions

Input: Any scrambled binary image, number of iterations
Output: Descrambled image
Algorithm: written as function iArnold (scrambled image, number of iterations)
Step 1: Find out the size of the scrambled image and store its dimensions

Table 2: Experimental results with different embedding and multiplicative factors.
Figure 6: Simplified block diagram of LFSR.

Input: two monochannel .wav files (S1 and S2) having the same duration and a sampling rate of 44100 Hz
Output: .wav files having cocktail sound effect (S3 and S4)
Algorithm: written as function Mixing (S1, S2)
Step 1: set Gain Factor as a decimal strictly between 0 and 1
Step 2: read S1 and S2 in sig1 & sig2 while keeping their respective sampling frequencies stored in Fs1 and Fs2
Step 3: set Mixed1 = sig1 + (Gain Factor × sig2) and Mixed2 = sig2 + (Gain Factor × sig1);
Step 4: write Mixed1 in audio file S3 with Fs1 and write Mixed2 in audio file S4 with Fs2

Demixing algorithm (Section 3.8):
Step 1: read S3 and S4 into two signal vectors while keeping their respective sampling frequencies stored in Fs1 and Fs2
Step 2: find the complex conjugate transpose of both signal vectors and store the results
Step 3: create one matrix from the two transposed vectors
Step 4: apply FastICA to this matrix;
Step 5: extract two sources from the FastICA output as source1 and source2
Step 6: write source1 in S1 with Fs1 and source2 in S2 with Fs2
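Step 3 of the mixing algorithm is a 2 × 2 linear mixture, which can be sketched directly. The function names `mix` and `unmix_with_known_gain` are hypothetical. The second function is not the paper's demixing method, which uses FastICA and does not know the gain; it only illustrates that the mixing matrix [[1, g], [g, 1]] is invertible for 0 < g < 1, so the sources are in principle recoverable from the two mixtures.

```python
def mix(sig1, sig2, gain):
    """Step 3 of the mixing algorithm: cross-blend two equally long signals."""
    if not 0.0 < gain < 1.0:
        raise ValueError("gain factor must lie strictly between 0 and 1")
    if len(sig1) != len(sig2):
        raise ValueError("signals must have the same duration")
    mixed1 = [a + gain * b for a, b in zip(sig1, sig2)]
    mixed2 = [b + gain * a for a, b in zip(sig1, sig2)]
    return mixed1, mixed2


def unmix_with_known_gain(mixed1, mixed2, gain):
    """Invert the mixing matrix directly (illustration only, not FastICA)."""
    det = 1.0 - gain * gain  # determinant of [[1, g], [g, 1]]
    sig1 = [(m1 - gain * m2) / det for m1, m2 in zip(mixed1, mixed2)]
    sig2 = [(m2 - gain * m1) / det for m1, m2 in zip(mixed1, mixed2)]
    return sig1, sig2


stego_audio = [0.5, -0.25, 0.75, 0.0]
other_audio = [0.1, 0.2, -0.3, 0.4]
mixed1, mixed2 = mix(stego_audio, other_audio, gain=0.6)
back1, back2 = unmix_with_known_gain(mixed1, mixed2, gain=0.6)
```

In the blind setting the receiver estimates this inverse from the statistics of the two mixtures alone, which is the role FastICA plays in the demixing algorithm.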

Table 3 shows the quality outcome of the secret and extracted images with respect to PSNR, SSIM, BER, and correlation coefficient (CC, discussed in Section 2.2).

Table 3: Quality analysis of secret and extracted image.

Table 6: Outcome of resampling attack.

Table 8: Experimental results of pitch shifting attack.

Table 9: Experimental results of MP3 compression attack.

AWGN is distributed with respect to time. As shown in Table 5, to test the robustness of the proposed method, AWGN is added to the stego audio signal at SNR (Signal-to-Noise Ratio) values of 20, 30, and 40 dB per sample, assuming the power of the stego signal is 0 dBW (decibel-watt is a unit of power in decibel scale, relative to 1 watt).