Adaptive Threshold Learning in Frequency Domain for Classification of Breast Cancer Histopathological Images



Introduction
Cancer is the second leading cause of death globally, and breast cancer has now replaced lung cancer as the most common cancer in the world [1,2]. Accurate diagnosis of breast cancer is crucial for successful treatment and for reducing mortality rates. There are different detection techniques used for diagnosis, such as biopsy, ultrasound (US), magnetic resonance imaging (MRI), and infrared thermography. Among them, biopsy is the most reliable and widely used technique for detecting breast cancer [3]. This requires pathologists to observe the morphology of the tissue under a microscope and make a diagnosis [4]. However, pathologists who can make accurate diagnoses require years of training, and the diagnostic process is time-consuming and laborious [5]. Targeted computer-aided diagnosis systems can help improve the efficiency of breast cancer tissue pathology diagnosis and facilitate the examination of more patients [6].
In recent years, convolutional neural network (CNN) methods, which have demonstrated significant advantages in image classification tasks, have been widely applied to medical image classification. They have also become the mainstream approach in breast cancer histopathology image classification research and have shown significant advantages over traditional methods on publicly available datasets [7][8][9][10]. However, some studies have found that the output of a well-trained CNN in image classification tasks can vary significantly due to minor variations in the input, and that the quality of the image can significantly affect the accuracy of classification [11,12]. Histopathological images generated by optical microscopes contain noise, which mainly originates from three sources [13][14][15]: (1) Uneven or excessively strong lighting from the light source can create reflections or shadows, leading to noise in the image such as glare or black spots. (2) The instrument itself may also contribute noise that affects the quality of the image. For example, the optical components of a microscope may have issues such as chromatic aberration or distortion, leading to noise in the image such as color shifts or distortions. (3) The preparation process of tissue specimens can also affect the quality of the image. For instance, the slicing process may result in uneven thickness, cracks, etc., leading to noise in the image such as discontinuity or blurriness.
Therefore, when classifying histopathological images, it is necessary to denoise them to ensure the accuracy and reliability of the analysis results.
Wavelet transform [16,17], Gaussian filtering [18], and other filtering methods [19] are commonly used for image denoising. Among them, the wavelet transform is widely used due to its ability to preserve useful information in the original signal [20,21]. Threshold selection is an important step in wavelet denoising, as it determines which detail coefficients will be filtered out. However, the commonly employed threshold calculation formulas yield varying thresholds. For instance, the Sqtwolog [22] method tends to opt for higher thresholds, while the Minimax [23] method leans towards selecting lower thresholds. The choice of which formula to use is a difficult problem and often relies on experimentation and expertise.
Previous studies often conducted experiments on simulated signals with added noise. Some researchers would empirically try several different threshold calculation formulas and then use SNR (signal-to-noise ratio) as an evaluation metric to select the formula that yielded the best denoising effect. Some studies did not use threshold calculation formulas but instead directly experimented with a large number of thresholds and selected the one that maximized the SNR of the denoised image. However, comparing the SNR metric requires prior knowledge of both the signal and the noise, which is not suitable for real images where the noise is not clearly known. In addition, in classification tasks, the ultimate evaluation metric is accuracy. Because the decision-making process of the model is like a black box, accuracy is not necessarily maximized when SNR is at its highest. Moreover, attempting to apply numerous thresholds directly on real images would consume a significant amount of computational resources due to the high cost of deep learning training.
This paper proposes an adaptive wavelet threshold method, which combines the threshold selection step with deep learning by treating the thresholds as parameters of the CNN model and training them together with the model. This approach links the threshold to the model's classification results and uses back-propagation during training to find an appropriate threshold for the image and task. Experimental results indicate that the thresholds trained in our study outperform those computed using both the Sqtwolog and Minimax formulas.

Related Work
2.1. Wavelet Threshold. The wavelet transform, as an effective signal processing technique, has been widely applied in the field of image denoising. In the area of denoising histopathological images, wavelet transform methods have also been extensively researched. For example, in [24], denoising methods including wavelet denoising were tested on pancreatic histopathological images, while in [25], wavelet denoising was applied prior to the classification of breast cancer histopathological images.
Factors affecting the effectiveness of wavelet denoising mainly include the selection of wavelets, threshold selection, threshold function selection, and the number of wavelet transforms. The threshold selection step is crucial to the denoising effect as it determines the range of detail coefficients that will be filtered out. Commonly used threshold selection methods include Sqtwolog, RigrSure, VisuShrink, and Minimax [22,23]. Minimax and RigrSure are relatively conservative and tend to retain more high-frequency coefficients. The other two methods, Sqtwolog and VisuShrink, especially Sqtwolog, can remove more noise but may also remove useful high-frequency signals.
The choice of threshold selection method is a challenging issue, and previous research has often required experimentation to assess which one is better. For example, [26] simulated a noisy signal and then tried the four threshold selection methods mentioned above, using the SNR (signal-to-noise ratio) metric to evaluate them. The authors in [27] conducted a more comprehensive comparison of threshold functions. The authors in [28] not only experimented with existing threshold selection methods but also designed new ones and compared them using the SNR metric. In addition to simply comparing threshold selection methods, [29] also designed an iterative search method for experimenting with a large number of thresholds and used the SNR metric to evaluate them. While [29] is the work most similar to ours, there are still significant differences. The authors in [29] conducted experiments using simulated signals with added noise and evaluated the thresholds based on SNR. In contrast, we use actual images and optimize thresholds based on the classification loss. As a result, our threshold selection method can be directly applied to images with uncertain noise, and the selected thresholds are closely related to the specific image and classification task at hand.

Breast Cancer Classification.
CNN is the predominant methodology within the realm of image classification. Due to its exceptional performance, CNN has been used in the vast majority of breast cancer histopathological image classification research in recent years, as exemplified by various studies [7,10,30-41]. These investigations predominantly employ common models such as VGG [42], DenseNet [43], Xception [44], and ResNet50 [45]. Among these studies, [7,32,38,41,46] simply employ transfer learning to optimize common models, and [34] specifically fine-tunes the final two residual blocks after applying transfer learning. On this foundation, some research [10,33,39,40,46] has used ensemble learning methods, integrating the classification results of multiple models to enhance accuracy. In addition, attention mechanisms have been integrated in [10]. Some studies have shifted their focus onto the design of loss functions. For instance, the authors in [31] considered the binary classification of benign and malignant together with the multiclass classification of the various subclasses, to reduce the likelihood of errors between the benign and malignant categories. Also, the authors in [37] penalize overconfident low-entropy output distributions and adapt the predictions to accommodate uniform distributions, rendering them more applicable to various circumstances. Other studies concentrate more on clinical issues. Unlike previous research that used public datasets for classification, [35] used an independently gathered dataset and incorporated two categories that have become clinically meaningful in recent years: progesterone receptor (PR) status (positive or negative) and HER2 receptor status (positive or negative). However, current research on breast cancer histopathological image classification is essentially based on methodologies proposed in earlier years.
In recent years, numerous scholars have continued to experiment with enhancing the CNN, and advanced baseline networks with superior performance have been proposed. For example, [47,48] studied the use of large convolution kernels to expand the receptive field and enhance the extraction of shape information. The authors in [49] researched various data augmentation strategies, the authors in [50] studied various training tricks for ResNet, and ConvNeXt [51] modified the network architecture of ResNeXt to make it more modern and improve model performance in classification. Among them, ConvNeXt, which was publicly released in 2022, conducted detailed research and surpassed the previous first-place Swin Transformer [52] method on the competition dataset. This paper adopts some of the improvements proposed by ConvNeXt and uses the modified ResNeXt as the baseline in our experiments.

Methods
This section gives a detailed description of the proposed method. The first part introduces the wavelet denoising algorithm, and the second part introduces the network structure.

Wavelet Denoising.
As shown in the upper part of Figure 1, the wavelet denoising includes three steps. The first step is the discrete wavelet transform (DWT), which decomposes the original image containing noise into four coefficients: CA (approximation coefficient), CV (vertical detail coefficient), CH (horizontal detail coefficient), and CD (diagonal detail coefficient).
The second step is to determine the threshold and then perform denoising on CH, CV, and CD. Since the threshold affects the image input to the classification model, which in turn affects the model's output and loss function, the selection of the threshold is crucial. By linking the threshold with the loss function, the threshold can be optimized during the training process and gradually approach a threshold with a good denoising effect.
The third step is to perform the inverse discrete wavelet transform (IDWT) on CA and the denoised CV, CH, and CD to reconstruct the denoised breast cancer histopathological image. The following is a detailed introduction to DWT, IDWT, and adaptive threshold denoising.

DWT and IDWT. For a one-dimensional input signal x, DWT decomposes it into a set of approximation coefficients a_j and detail coefficients d_j by filtering with h and g, the low-pass and high-pass filters of the orthogonal wavelet, followed by downsampling. Here x_{j,n} is a discrete sample in the input signal, j is the decomposition level, and n is the displacement.
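For orientation, the one-level analysis step can be written in Mallat's widely used filter-bank notation. The indexing below (position index p, square brackets, and writing the level-0 coefficients as a_0 = x) is an assumption of this sketch and may differ from the notation used by the authors.

```latex
a_{j+1}[p] = \sum_{n} h[n - 2p]\, a_j[n], \qquad
d_{j+1}[p] = \sum_{n} g[n - 2p]\, a_j[n], \qquad
a_0 = x .
```

Each level halves the number of samples, because the filtered signal is kept only at even shifts 2p.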
DWT can be applied recursively to the approximation coefficients of the previous level, with the original signal x at level 0. IDWT reconstructs the original signal from the approximation and detail coefficients up to the maximum level J. Processing a two-dimensional image is similar to processing a one-dimensional signal: DWT and IDWT are performed in the horizontal and vertical directions sequentially, with horizontal and vertical shifts h and v, respectively. Since the common practice for wavelet denoising is to apply DWT and IDWT only once, we focus on the CA, CV, CH, and CD coefficients at level j = 1. Higher-level coefficients are not examined in this paper.
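As a concrete illustration of a single-level two-dimensional transform, the following minimal sketch implements the Haar case in plain Python. It is not the authors' implementation: the function names are hypothetical, the image is assumed to be a list of rows with even height and width, and the sub-band naming (CH from low-pass rows and high-pass columns, CV the reverse) follows one common convention that may differ from the paper's.

```python
import math

S = 1.0 / math.sqrt(2.0)  # orthonormal Haar filter coefficient


def haar_1d(signal):
    """One-level 1-D Haar DWT: returns (approximation, detail)."""
    approx = [(signal[i] + signal[i + 1]) * S for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * S for i in range(0, len(signal), 2)]
    return approx, detail


def ihaar_1d(approx, detail):
    """Inverse of haar_1d: interleaves the reconstructed sample pairs."""
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) * S, (a - d) * S])
    return out


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


def _along_columns(matrix):
    """Apply haar_1d to every column; returns (approx, detail) as row lists."""
    pairs = [haar_1d(col) for col in transpose(matrix)]
    return transpose([p[0] for p in pairs]), transpose([p[1] for p in pairs])


def haar_2d(image):
    """One-level 2-D Haar DWT. Returns (CA, CH, CV, CD)."""
    row_pairs = [haar_1d(row) for row in image]   # horizontal pass
    low = [p[0] for p in row_pairs]
    high = [p[1] for p in row_pairs]
    ca, ch = _along_columns(low)                  # vertical pass on low band
    cv, cd = _along_columns(high)                 # vertical pass on high band
    return ca, ch, cv, cd


def ihaar_2d(ca, ch, cv, cd):
    """Inverse of haar_2d: undo the vertical pass, then the horizontal pass."""
    def inverse_columns(a, d):
        cols = [ihaar_1d(ac, dc) for ac, dc in zip(transpose(a), transpose(d))]
        return transpose(cols)

    low = inverse_columns(ca, ch)
    high = inverse_columns(cv, cd)
    return [ihaar_1d(a, d) for a, d in zip(low, high)]
```

Each sub-band has half the height and half the width of the input, and applying `ihaar_2d` to unmodified sub-bands returns the input image up to floating-point error. Denoising happens between the two calls, by shrinking CH, CV, and CD.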

Adaptive Threshold Denoising.
First, we need to obtain an appropriate threshold, which is treated as a set of parameters and optimized through back-propagation. Specifically, we initialize a four-dimensional tensor whose only element is 1 and input it into a convolutional layer with input channels = 1, output channels = 9, kernel size = 1, stride = 1, and padding = 0. The number of output channels is set to 9 because the breast histopathological images have three channels (R, G, and B), and each channel generates three detail coefficients (CH, CV, and CD) after the wavelet transform. The nine elements output by the convolutional layer are used for denoising the nine coefficients. In this way, changes in the threshold alter the quality of the image and affect the loss value, and the threshold is then optimized through back-propagation.
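To make the role of this convolutional layer concrete, the sketch below reproduces its arithmetic in plain Python. With a single input element equal to 1, a 1 × 1 convolution with nine output channels reduces to nine independent scalars, one per output channel. The presence of a bias term, the function names, and the order in which outputs are assigned to channel and sub-band pairs are all assumptions of this illustration, not details confirmed by the paper.

```python
# Hypothetical ordering of the nine outputs: (R, G, B) x (CH, CV, CD).
CHANNELS = ("R", "G", "B")
SUBBANDS = ("CH", "CV", "CD")


def conv1x1_on_constant(weights, biases, x=1.0):
    """A 1x1 convolution applied to a single-element input x.

    With one input channel, output channel c is simply w_c * x + b_c,
    so the nine outputs are nine free scalars that training can adjust.
    """
    if len(weights) != len(biases):
        raise ValueError("weights and biases must have the same length")
    return [w * x + b for w, b in zip(weights, biases)]


def named_thresholds(weights, biases):
    """Map the nine outputs to (colour channel, detail sub-band) pairs."""
    values = conv1x1_on_constant(weights, biases)
    names = [f"{c}-{s}" for c in CHANNELS for s in SUBBANDS]
    if len(values) != len(names):
        raise ValueError("expected one value per channel/sub-band pair")
    # abs() mirrors the text: the learned value may be negative.
    return {name: abs(v) for name, v in zip(names, values)}
```

Because every threshold is produced by ordinary layer parameters, any framework that back-propagates through the layer will update the thresholds along with the rest of the network.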
The method for optimizing parameters is shown in (11):
$$\theta_{t+1} = \theta_t - \epsilon_t \nabla_{\theta} L(\theta_t) \tag{11}$$
where L is the loss function, ϵ_t is the learning rate, and θ_t is the parameter at the t-th iteration. The learning rate is the most important optimization parameter, as a high learning rate can cause the model to skip over the optimal parameters, while a low learning rate can lead to being trapped in local optima. During training, the threshold parameters and the classification model parameters are optimized together. However, the range of threshold values (usually a few tens) is much larger than the range of parameter changes in the classification model. Therefore, we set a larger learning rate specifically for the threshold.
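The need for a separate learning rate can be seen in a toy experiment. The sketch below applies a plain gradient-descent update to two scalars under a quadratic loss. The loss, the start and target values, and the step count are hypothetical choices made only for illustration; the two learning rates are taken from the upper ends of the ranges reported in the experiments.

```python
def sgd_step(params, grads, learning_rates):
    """Plain gradient-descent update with one learning rate per parameter."""
    return {k: params[k] - learning_rates[k] * grads[k] for k in params}


def toy_gradients(params, targets):
    """Gradient of the toy loss 0.5 * sum((p - target) ** 2)."""
    return {k: params[k] - targets[k] for k in params}


def train(learning_rates, steps=50):
    # Hypothetical values: a threshold that must travel from 1 to 12,
    # and a network weight that only needs a tiny adjustment.
    params = {"threshold": 1.0, "weight": 0.0}
    targets = {"threshold": 12.0, "weight": 0.01}
    for _ in range(steps):
        grads = toy_gradients(params, targets)
        params = sgd_step(params, grads, learning_rates)
    return params


separate = train({"threshold": 2e-1, "weight": 5e-5})
shared = train({"threshold": 5e-5, "weight": 5e-5})
```

With the shared small rate the threshold barely leaves its initial value within 50 steps, whereas the separate rate brings it close to the target. In PyTorch, this kind of setup is usually expressed with per-parameter-group options of the optimizer.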
After obtaining the threshold, we also need a threshold function to perform denoising. Experimental results have shown that threshold functions with a sudden jump from 0 to 1, such as the commonly used hard thresholding function sgn(), or similar functions that change from 0 to 1 within an extremely small range, perform poorly. In this paper, a sigmoid function is used to construct the threshold function, as shown in (12):
$$x_{\text{denoised}} = x_{\text{original}} \cdot \operatorname{sigmoid}\big(10\,(x_{\text{original}} - \operatorname{abs}(\text{threshold}))\big) \tag{12}$$
where x_original is the original detail coefficients (CH, CV, or CD), x_denoised is the denoised detail coefficients, and threshold is the learned threshold. The term x_original − abs(threshold) compares the magnitude of the detail value with the threshold and determines whether the value should be filtered out. Since the optimized parameters may be positive or negative, abs is used to ensure that the threshold is positive, making it easier to compare with x_original.
After applying x_original − abs(threshold), the values in x_original that exceed the threshold become positive, while those below the threshold become negative. Before the sigmoid function is applied, the resulting values are multiplied by 10. Multiplying by 10 makes the function's output vary from approximately 0.007 to 0.993 within the range (−0.5, 0.5) centered on the threshold, resulting in a steeper filter. Using a factor that is too large, such as a few hundred, would create a transition that is too steep and would seriously degrade the optimization of the threshold. Conversely, using a factor that is too small would flatten the transition and reduce the filter's effectiveness. Experimental results have shown that 10 is a suitable value. After the sigmoid function is applied, the output of the threshold function is close to 0 when x_original − abs(threshold) is less than −0.5 and close to 1 when x_original − abs(threshold) is greater than 0.5. Assuming the threshold is 10, the effect of sigmoid((x_original − abs(threshold)) * 10) is shown in Figure 2. Finally, the output of the threshold function is multiplied by x_original, so that the values in x_original that are greater than abs(threshold) are mainly preserved, while the values that are less than abs(threshold) are mostly filtered out.
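The behaviour described above can be checked numerically. The following sketch implements the sigmoid gate with the standard library only, together with the derivative of the gate with respect to the threshold. The function names and the `steepness` parameter are hypothetical labels for this illustration; the factor 10 and the use of abs() come from the text.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def gate(x, threshold, steepness=10.0):
    """Soft gate: near 0 below abs(threshold), near 1 above it."""
    return sigmoid((x - abs(threshold)) * steepness)


def denoise(coeffs, threshold, steepness=10.0):
    """Multiply each detail coefficient by its gate value."""
    return [x * gate(x, threshold, steepness) for x in coeffs]


def gate_grad_wrt_threshold(x, threshold, steepness=10.0):
    """d gate / d threshold = -steepness * s * (1 - s) * sign(threshold).

    abs() is not differentiable at 0; sign is taken as +1 there.
    """
    s = gate(x, threshold, steepness)
    sign = 1.0 if threshold >= 0 else -1.0
    return -steepness * s * (1.0 - s) * sign


low = gate(9.5, 10.0)    # 0.5 below the threshold
high = gate(10.5, 10.0)  # 0.5 above the threshold
grad_10 = gate_grad_wrt_threshold(10.5, 10.0, steepness=10.0)
grad_300 = gate_grad_wrt_threshold(10.5, 10.0, steepness=300.0)
```

With a factor of 10, a coefficient 0.5 away from the threshold still yields a usable gradient, while with a factor of 300 the gradient underflows to zero in floating point. This is one way to read the observation that very steep threshold functions hamper optimization of the threshold.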

Classification Network. The classification network proposed in this paper is shown in the lower part of Figure 1. It has two improvements over the traditional ResNeXt: a changed stage compute ratio and fewer activation functions. These improvements follow the recent exhaustive findings of ConvNeXt. Experiments demonstrate that the improved and more modern ResNeXt performs well in the breast cancer histopathology image classification task in this paper.

Changing Stage Compute Ratio.
In the lower part of Figure 1, we employed the ResNeXt-50 structure, whose detailed configuration is presented in Table 1. The "output" column indicates the output dimensions in terms of height (H) and width (W), while the "convolution layers" column describes the shape of the convolution. The first parameter of the convolution layer specifies the size of the convolution kernel, the second parameter denotes the number of channels, and "C = 32" indicates that the convolution is partitioned into 32 groups. The number of residual blocks is specified outside the parentheses. Based on the ConvNeXt research, we changed the stage ratio from the conventional (3,4,6,3) to (3,3,9,3).

Fewer Activation Functions. The block designs of the original ResNeXt (left) and the improved ResNeXt (right) that we used are shown in Figure 3. Compared with the original ResNeXt, we eliminate the ReLU layer below the 1 × 1 conv layer in the residual block.

Experiment
4.1. Datasets. The BreaKHis and BACH breast cancer histopathological image databases are used in our study. The BreaKHis dataset is the earliest public large-scale non-full-field breast cancer histopathological image dataset, and the BACH database is a representative multiclass dataset.

BreaKHis.
BreaKHis was used as an eight-class dataset, containing four benign tumor types (adenosis, fibroadenoma, phyllodes tumor, and tubular adenoma) and four malignant tumor types (ductal carcinoma, lobular carcinoma, mucinous carcinoma, and papillary carcinoma). The dataset includes 7909 images collected from 82 patients using four magnification factors (40x, 100x, 200x, and 400x), as shown in Figure 4. The specific image distribution is shown in Table 2. In this dataset, each image has three RGB channels with 8-bit depth per channel and a resolution of 700 × 460 pixels.

BACH.
BACH is a four-class dataset of 400 breast cancer histopathological images, distributed as follows: 100 normal tissue, 100 benign abnormality, 100 in situ carcinoma, and 100 invasive carcinoma. These images are RGB images with 2048 × 1536 pixels each and a pixel scale of 0.42 μm × 0.42 μm, as shown in Figure 4.

Preprocess.
The proposed method was evaluated on the BreaKHis dataset and the BACH dataset. To reduce the memory and computation overhead, the original images in the BreaKHis dataset were downsampled from 700 × 460 pixels to 256 × 256 pixels, and the original images in the BACH dataset were downsampled from 2048 × 1536 pixels to 512 × 512 pixels.
To reduce the overfitting caused by the small number of images, we used the data augmentation methods of vertical and horizontal mirroring, random rotations, and random cropping. We also normalized the data.
To evaluate the proposed method, we used k-fold (k = 4) cross-validation and divided each dataset into four folds, each containing 25% of the images. During training, three folds were used for training, while the remaining fold was used for validation. As the proposed method searches for suitable thresholds during the training process, the thresholds obtained in each experiment differed slightly. To facilitate comparison with existing threshold selection methods, we first conducted one round of training and validation to obtain a set of thresholds. This set of thresholds was then used for the 4-fold cross-validation. As mentioned earlier, the parameters controlling the threshold require a higher learning rate than those in the classification model. For the convolutional layer controlling the threshold, the learning rate used for optimization ranges from a minimum of 2e-3 to a maximum of 2e-1. For the convolutional layers in the classification network, the learning rate ranges from a minimum of 5e-7 to a maximum of 5e-5. The experiments were conducted in the PyTorch environment, using an NVIDIA Titan X GPU.

Evaluation.
In this paper, we use accuracy (ACC) to evaluate the performance of the classification models. As shown in (13), accuracy is the fraction of correctly predicted examples among all examples:
$$\mathrm{ACC} = \frac{T_p + T_n}{T_p + T_n + F_p + F_n} \tag{13}$$
where T_p and T_n are the numbers of true positive and true negative samples, respectively, and F_p and F_n are the numbers of false positive and false negative samples.
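As a small worked example of the metric, the sketch below computes accuracy both from the four confusion counts and, for the multiclass case, directly from label lists. The function names and the sample counts are hypothetical.

```python
def accuracy_from_counts(tp, tn, fp, fn):
    """Accuracy from true/false positive/negative counts."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("at least one sample is required")
    return (tp + tn) / total


def accuracy_from_labels(y_true, y_pred):
    """Multiclass accuracy: share of predictions equal to the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


binary_acc = accuracy_from_counts(tp=40, tn=45, fp=5, fn=10)
multi_acc = accuracy_from_labels([0, 1, 2, 3, 3], [0, 1, 2, 2, 3])
```

The multiclass form is the one that applies to the eight-class BreaKHis and four-class BACH settings, where a prediction counts as correct only if it matches the exact class.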

Results
We conducted comparative experiments using two representative traditional threshold selection methods, Sqtwolog and Minimax, as shown in (14) and (15), respectively. The Sqtwolog method achieves thorough denoising, but it is also prone to mistaking useful signals for noise and removing them. The Minimax method is more conservative, and it performs better when less of the noise is distributed in the high-frequency range of the signal.
$$\lambda_{\text{Sqtwolog}} = \sqrt{2 \ln N} \tag{14}$$
$$\lambda_{\text{Minimax}} = \begin{cases} 0.3936 + 0.1829 \log_2 N, & N > 32 \\ 0, & N \le 32 \end{cases} \tag{15}$$
where N represents the signal length, which in a 2D image refers to the number of pixels. λ is multiplied by the estimated noise standard deviation σ of the image to obtain the threshold used for denoising. The noise estimate is shown in (16):
$$\sigma = \frac{\operatorname{median}(|d|)}{0.6745} \tag{16}$$
where d denotes the detail coefficients.
In our experiments, we performed threshold denoising on each detail coefficient of the RGB channels separately. Because of the two datasets, the four magnification levels, and the three channels of each image, the thresholds used were too numerous to be displayed in full. As an example, Table 3 shows the thresholds selected by these two traditional methods and the thresholds selected through training by our adaptive method, using the 40x images from the BreaKHis dataset.
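For readers who wish to reproduce the two baseline thresholds, the sketch below computes them with the standard library. It uses the widely published definitions of the universal (Sqtwolog) and Minimax rules together with the median-based noise estimate. The function names are hypothetical, and the value of N used in the example (a 128 × 128 sub-band) is an assumption chosen for illustration rather than a setting confirmed by the paper.

```python
import math
import statistics


def sqtwolog_lambda(n):
    """Universal threshold multiplier: sqrt(2 * ln(n))."""
    return math.sqrt(2.0 * math.log(n))


def minimax_lambda(n):
    """Minimax threshold multiplier; zero for very short signals."""
    if n > 32:
        return 0.3936 + 0.1829 * math.log2(n)
    return 0.0


def estimate_sigma(detail_coeffs):
    """Robust noise estimate: median absolute coefficient / 0.6745."""
    return statistics.median(abs(c) for c in detail_coeffs) / 0.6745


def threshold(detail_coeffs, n, rule):
    """Final threshold = rule-specific multiplier times the noise estimate."""
    return rule(n) * estimate_sigma(detail_coeffs)


n = 128 * 128  # assumed number of coefficients in one level-1 sub-band
lam_universal = sqtwolog_lambda(n)
lam_minimax = minimax_lambda(n)
```

For any N above 32 the universal multiplier is larger than the Minimax one, which matches the description of Sqtwolog as the more aggressive rule.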
To validate the effectiveness of our method, we conducted experiments on two datasets: BreaKHis and BACH. The experiments compared the results obtained by using the Sqtwolog method, the Minimax method, and our adaptive method for denoising, as well as the results obtained without threshold denoising.
The results in Table 4 show that our adaptive threshold selection method combined with the improved ResNeXt model yields the best results across images at different magnifications. In addition, with the traditional threshold selection methods, whether the larger Sqtwolog thresholds or the smaller Minimax thresholds, the results are not only inferior to those of our method but also inferior to the results obtained without wavelet threshold denoising. This suggests that the thresholds selected by these methods based on pixel count and median may not be appropriate, resulting in degraded classification results. In contrast, our adaptive method integrates the threshold selection step into deep learning, experiments with a large number of thresholds during training, and continuously adjusts the threshold using the loss function and back-propagation, enabling the selection of more appropriate thresholds. We also compared against the results of the Inception-v4+SEP and DenseNet121 + SE methods from other papers. The results of these two methods were inferior to those of our method.
In addition, the results of the original ResNeXt and the improved ResNeXt are compared, and it can be seen that the two improvements from the ConvNeXt research slightly improved the classification results. We also used the improved ResNeXt as a baseline on the BACH dataset.
Our method not only selects appropriate thresholds but also does so efficiently. Traditional methods often require trying multiple thresholds and selecting the appropriate one based on the results. Due to the sensitivity of deep learning models to input changes, comparing the effect of different thresholds on classification results on a well-trained model is
Table 5 shows the results of different methods on the BACH dataset. Similar to the results on BreaKHis, our adaptive method performs the best, while the Sqtwolog and Minimax methods still perform worse than no denoising. Additionally, the variance of the results using our adaptive method and the Minimax method is relatively large. This may be because the BACH dataset has much less data than the BreaKHis dataset, resulting in more unstable results. We also compared against the results of the AHoNet and DenseNet121 + SE methods from other papers. The results of these two methods were inferior to those of our method.

Analysis
To observe the changes of the adaptive wavelet threshold during training, we conducted an experiment using 40x images from the BreaKHis dataset. We recorded the threshold at the end of each epoch and plotted the threshold change curves in Figures 5, 6, and 7. As shown in Table 3, the thresholds selected by the Minimax method were mostly around 20, with the maximum being 37.23. The adaptive wavelet threshold ranged from approximately 0 to 35 during training, which covers the values of the Minimax method. This indicates that our adaptive method can generate thresholds similar to those of the Minimax method during training and did explore them. Therefore, the threshold selected by our adaptive method is smaller than that of the Minimax method because it was optimized to a smaller value during training, not because training-related parameter settings prevented the threshold from approaching the values selected by the Minimax method.
The thresholds selected by the Sqtwolog method range from 35.92 to 55.52, which is not far from the range of thresholds explored in this experiment. However, the accuracy obtained by this method is inferior to that of the Minimax method and the adaptive method. Therefore, we speculate that the larger thresholds selected by the Sqtwolog method are less suitable, and that the adaptive threshold not reaching 55.52 or larger values during training is the result of effective optimization.

Visualization
To verify the effectiveness of our adaptive threshold method, we compared the effects on images of the thresholds selected by different methods, as shown in Figure 8. The selected images were from the validation set of the BreaKHis dataset with a magnification of 40x. The first row shows the original images. The second, fourth, and sixth rows show the images after denoising using different thresholds. The third, fifth, and seventh rows show the differences between the denoised images and the original images. To facilitate observation, we normalized these difference images.
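A difference image of this kind can be produced as in the sketch below. The paper does not specify the normalization that was used, so min-max scaling of the absolute difference to the 8-bit range 0 to 255 is an assumption of this illustration, as are the function name and the list-of-rows image representation.

```python
def normalized_difference(original, denoised):
    """Absolute per-pixel difference, min-max scaled to integers in 0..255."""
    diff = [
        [abs(o - d) for o, d in zip(row_o, row_d)]
        for row_o, row_d in zip(original, denoised)
    ]
    lowest = min(min(row) for row in diff)
    highest = max(max(row) for row in diff)
    if highest == lowest:
        # Identical images (or a constant offset): nothing to highlight.
        return [[0 for _ in row] for row in diff]
    scale = 255.0 / (highest - lowest)
    return [[round((v - lowest) * scale) for v in row] for row in diff]


example = normalized_difference(
    [[10, 20], [30, 40]],
    [[10, 19], [26, 40]],
)
```

Bright pixels in the result mark the locations where denoising changed the image most, which is how removed edges and removed noise become visible.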

International Journal of Intelligent Systems
For the images in the first and second columns, the difference images show that the threshold selected by the Sqtwolog method has removed much boundary and texture information. This is also evident in the denoised images, where the boundaries of the tissue are noticeably blurred. The threshold selected by the Minimax method causes similar problems, but to a lesser extent, because the threshold selected by the Minimax method is smaller. Our adaptive method does not have this issue. There is no obvious boundary or texture information in the difference images, and the denoised images still display this information clearly. This indicates that, compared to traditional methods, our adaptive method can effectively preserve useful information.
In areas with a large number of cells, it is difficult to distinguish which information is noise. Therefore, to observe the denoising effect of different threshold selection methods, we selected an image with a large blank area as an example, as shown in the third column. After confirmation by a doctor, the light-colored spots in the red boxes in the original image are noise. The white spots in the red boxes in the difference images indicate that all three thresholds have a denoising effect on this noise. For further analysis, we examined the pixel values in the red boxes in the difference images.
Taking the maximum pixel value in the upper right red box of the difference image as an example, the maximum values corresponding to the Sqtwolog method, the Minimax method, and our adaptive method are 168, 190, and 204, respectively. This roughly indicates that the noise removed by the adaptive threshold accounts for a higher percentage of all removed information, which supports the advantage of the adaptive method shown in Table 4.

Conclusion
In this study, we propose an adaptive threshold selection method that combines the optimization process of deep learning with wavelet denoising threshold selection. This method establishes forward and backward propagation between the threshold and the classification loss, allowing the threshold to be optimized during training. Compared to traditional thresholding methods, this method can efficiently select appropriate thresholds that balance noise removal and information preservation, leading to higher accuracy of the classification model.
We conducted experiments on the BreaKHis dataset and the BACH dataset, and compared our adaptive method with representative traditional thresholding methods: the Sqtwolog method, which tends to produce higher thresholds, and the Minimax method, which tends to produce lower thresholds. The results showed that our adaptive method outperformed the Sqtwolog and Minimax methods and also achieved improved accuracy compared to using the original images.
In addition, we speculate that our adaptive threshold selection method is applicable not only to breast cancer histopathological images but also to other noisy images for which threshold values are difficult to determine because the noise frequency is unknown or the boundary between the noise frequency and the useful information frequency is unclear.
Our research also has certain limitations. This study proposes a method to derive thresholds through training, yet the selection of wavelets is another difficult issue that demands both expertise and experimentation. The Haar wavelet and the db2 wavelet are commonly employed in wavelet denoising. In our experiments, we tried both the Haar wavelet and the db2 wavelet, and the db2 wavelet produced worse results. If the parameters associated with wavelet design could be incorporated into the model for training, the model might learn the optimal wavelet for a given image and task. This is a considerable challenge and sets the course for our future research.

Figure 7: Curve of CV coefficients in training.

Table 3: Thresholds selected by different methods.

Table 5: Results on the BACH dataset.