Automatic Histogram Specification for Glioma Grading Using Multicenter Data

Multicenter sharing is an effective method to increase the data size for glioma research, but the data inconsistency among different institutions hindered the efficiency. This paper proposes a histogram specification with automatic selection of reference frames for magnetic resonance images to alleviate this problem (HSASR). The selection of reference frames is automatically performed by an optimized grid search strategy with coarse and fine search. The search range is firstly narrowed by coarse search of intraglioma samples, and then the suitable reference frame in histogram is selected by fine search within the sample selected by coarse search. Validation experiments are conducted on two datasets GliomaHPPH2018 and BraTS2017 to perform glioma grading. The results demonstrate the high performance of the proposed method. On the mixed dataset, the average AUC, accuracy, sensitivity, and specificity are 0.9786, 94.13%, 94.64%, and 93.00%, respectively. It is about 15% higher on all indicators compared with those without HSASR and has a slight advantage over the result of a manually selected reference frame by radiologists. Results show that our methods can effectively alleviate multicenter data inconsistencies and lift the performance of the prediction model.


Introduction
Glioma is a prevalent fatal brain disease and the most malignant, which accounts for approximately 24.7% of all primary brain and other central nervous system tumors and 74.6% of malignant tumors [1]. The World Health Organization's guidelines for glioma diagnosis and treatment are divided into four levels, namely, I-II and III-IV for lowgrade glioma (LGG) and high-grade glioma (HGG) [2]. In clinical applications, biological behavior, treatment options, and prognoses of patients with glioma of different grades are clearly different. Therefore, the accurate preoperation grading of Glioma is important. Magnetic Resonance Imaging (MRI) is characterized by multidirectional tomography and multiparameter high-resolution soft-tissue imaging and is widely used to evaluate the tumor heterogeneity [3]. MRI is commonly used in glioma grading because it can accurately display the location and size, and it correlates well with histological characteristics. In medical research, high quality data is difficult to obtain in a single institution, so it needs to be shared through multicenter. However, the difference of multicentre data is a serious challenge.
In the acquisition of multicenter glioma MRI data, due to differences in acquisition equipment and parameters result significant differences in data samples in terms of specification, size, contrast, and brightness. Differences among the data lead to deviations in the grading effect of glioma [4]. Figure 1 shows the differences among multicenter data. Figure 1(a) shows that some of the data contain skulls and the others do not contain. Figure 1(b) shows that these data have different scales and number of slices. A previous study indicated that pixel size and slice thickness remarkably affect space and strength characteristics of the calculation [5,6]. Therefore, voxel size must be unified to reduce the error in calculating characteristics. Figure 1(c) presents six images with different contrasts. These differences are due to deviation in acquisition equipment and parameters.
In order to alleviate the inconsistency of contrast of multicenter data, it is necessary to enhance the image data appropriately. Histogram correction is a commonly used technique in image enhancement [7]. Among them, histogram equalization (HE) and normalization techniques [8,9] are often used to adjust the overall brightness of images. This type of methods mainly expands the range of gray value in the histogram without adjusting its intensity. These methods only enhance the overall image and improve its brightness, but the contrast between tumors and other tissues is not considerably enhanced [10,11]. Histogram specification (HS) can change the frequency of grayscale values and enhance any local brightness [12,13], but it mainly adjusts the local brightness according to the reference frame. In general, the reference frame is selected by a radiologist. However, this process consumes a considerable amount of time and the final reference frame may not be the best choice. Therefore, it is an urgent problem to automatically search for the best reference frame of histogram instead of searching by a radiologist. This paper presents a histogram specification with automatic selection of reference frames for magnetic resonance images, called HSASR, specifically used to replace radiologists manually selecting reference frames. This method can enhance the consistency of tumor intensity in the whole medical image. In this work, we apply HSASR to the preprocessing of multicenter glioma data, and the effectiveness of this method is proved by Glioma grading experiment. The results showed that this method had certain practical value in glioma grading research.
The main contributions of this study are as follows: (1) This paper proposes a histogram specification with automatic selection of reference frames (HSASR). This method can automatically select the suitable reference frame of histogram instead of radiologists during image enhancement, so as to enhance the consistency of brain tumors image contrast. (2) This paper proposes a set of image standardization algorithms to make the preprocessed multicenter data have better consistency, improve the adaptability of data, and improve the accuracy of the glioma prediction model.

Related Works
In recent years, researchers have gradually adopted multicenter data to replace single institution in clinical medical research. However, multicenter data also faced many challenges. Berenguer et al. [14] carried out a test-retest phantom study of individual image acquisition parameters. The impact of image parameters on the image cannot be analyzed because of the difference of the model or machine manufacturer. Therefore, variations because of the use of images were eliminated by exclusively scanning phantoms. In addition, Hugo et al. [15] indicated that multicenter structure MRI studies have stronger statistical efficacy than single-institution studies. However, central differences in contrast sensitivity and spatial uniformity lead to differences in tissue classification or image registration that may reduce or completely offset the enhanced statistical efficacy of multicenter data. Therefore, maintaining data in a standard environment is important. Nyul et al. [16] mentioned some problems in the original MRI scale preprocessing method and attempted to use the median value and other percentile numerical methods, such as landmark, to solve these problems; they obtained robust results after preprocessing. The validity of the standardized new landmark was also mentioned in the study, after which the image brightness level and contrast consistency were significantly improved. Bakas et al. [17,18] focused on the data preprocessing method, which has a certain enlightening effect on the preprocessing method of this study. In preprocessing, contrast enhancement of multicenter data image is an important challenge. It was shown that a high-contrast medical image could lead to a better interpretation of the different adjacent tissues in the imaged body part [19,20]. Accordingly, the resulting enhanced image, which is in terms of signal intensities of different tissues, can facilitate the automated segmentation, feature extraction, and classification of these tissues. Existing image enhancement techniques (empirical or heuristic) are remarkably related to a particular image and usually aimed at improving image contrast. However, no unified standard is available to measure the quality effect of image enhancement [21][22][23][24].
Image enhancement can be divided into two categories: spatial domain method and frequency domain method. Image enhancement in spatial domain usually corrects histogram. Histogram equalization is a commonly used global grayscale image enhancement technique, and the grayscale value is uniformly redistributed based on the cumulative density function of the histogram [25]. However, equalization fails to consider the intensity of the grayscale value. The average brightness of the image can make the regions with large original intensity fade, whereas dark areas will brighten after image equalization. Sengee et al. [26] suggested an extension method of BBHE based on the neighbourhood metric. This method involved a few steps: first, a large histogram was divided into the subregion using the neighbourhood metric and process independently. This method results in boundary noise and possibly uneven histogram brightness in two parts. In addition, MedGA [24] was an enhancement method that HE combined with genetic algorithm to directly improve the histogram frequency of images and has achieved obvious results. However, this method could only be limited to the presence of two grayscale regional tissues and not enhance complex brain tumors.

Data Collection.
Glioma data used in the study were obtained from two separate sources, BraTS2017 and Glio-maHPPH2018 datasets. The BraTS2017 dataset came from a variety of scanning instruments from 19 medical institutions. BraTS2017 dataset includes 210 HGG data and 75 LGG data; the data was already segmented when it was acquired. GliomaHPPH2018 dataset was obtained from multiple equipment of different models from multiple manufacturers, including 4 Siemens equipment (1 1.5t equipment, 3 3T equipment) and 4 GE equipment (2 1.5t equipment, 2 3T equipment), as shown in Table 1. The magnetic resonance data collected on these devices are different from each other. After data inspection, there are up to 186 default inspection protocols for the devices, and contrast-enhanced T1-weighted imaging (CET1) was used in this study, including 35 CET1 sequences (see Table 2), layer thickness of 5 mm to 6 LGG data. A threedimensional region of interest (ROI) for all the tumors in GliomaHPPH2018 was depicted by two senior radiologists in Henan provincial people's hospital. The ROIs were segmented slice-by-slice on the axial plane using the CET1 sequences. After discussion between the two radiologists, the final decision is made to select the optimal segmentation file.

Histogram Specification.
HS, an effective image enhancement technique, is an extension of histogram equalization that can effectively alleviate the problems of histogram equalization. Let r � r ij be an H × W discrete input digital image with L gray levels, and let L � {0, 1, . . ., L − 1}. The histogram or gray level probability density of an image is defined as follows: where N � H × W and N is the number of pixels with a gray level. The cumulative distribution function (CDF) of P r is presented as follows: Similarly, if the specified reference image is z, then the gray cumulative distribution function is where M and N are the number of pixels with gray level L. HS attempts to obtain transformation function y � F(x) and maps gray level x in the original image to gray level y, such that the transformed image can have a histogram similar to the reference histogram. To preserve the inherent information of the original image, function F should be a monotonically increasing function. This function can be obtained by using the following equation: Therefore, the gray level of the map can be obtained by using where v − 1 z is the inverse of v z . In the discrete case, the inverse function usually does not exist. The inverse function is usually replaced by the best objective function to approximate the y of a particular gray level x as follows: Equation (6) represents the absolute value of the difference between the original image's cumulative histogram and gray level functions of the specified reference histogram, and then the minimum value is selected as the y value. With this transformation rule, each gray level x can be mapped to y. Therefore, the mapped image will be similar to the desired histogram.
The reference frame in the histogram is usually selected by the doctor. However, the selected reference image may not be suitable, thereby consuming a considerable amount of the radiologist's time. To solve this problem, an optimized grid search strategy with coarse and fine search is proposed to select the suitable reference frame automatically.

HSASR.
In this study, an improved grid search method is proposed to solve the problem of selecting a specified reference image. The grid search is generally used to divide grids of the same length in a certain spatial range based on the proposed coordinate system. Each coordinate point represents a parameter, and these searched parameters are calculated and analysed based on the step size. Finally, optimal parameters are considered the output. Given that this method must traverse all the corresponding points in the grid, many unnecessary invalid calculations are generated, resulting in an exponential increase in time.
To reduce time consumption, this study improves the directional grid optimization search based on data characteristics. On the basis of the characteristics of glioma itself, the method is improved by using coarse segmentation and subdivision. Horizontal and vertical searches are used for coarse segmentation and subdivision, respectively. The specific steps of the improved method are as follows: (1) The number of clusters N and the number of slices K of each tumor are determined. Each cluster is set to i, and the initialization step size is K/2. A 2D grid is established by using N and K, and the grid nodes are the corresponding reference slices of N and K. (2) In the coarse search, each time it searched by step size from the beginning of each set (horizontally), the median slice in each cluster was selected as the reference frame. Then, 30% of data in all datasets are randomly selected for histogram specification (the same sample is selected each time), and then the performance of data after histogram specification is tested. (3) Select the sets of all the best area under the curve (AUC) values according to threshold T and perform the next fine search. (4) Fine search: in the optimal cluster, the search is initiated from the middle section (two pointers are set to point to the middle section), and pointer 1 searches upward successively. If reference frames do not contain tumors, then the search is stopped. If the difference between the current and previous values is greater than T, then the search is stopped, and the optimal value is selected. At the same time, similar to the previous step, pointer 2 is searched downward. (5) The optimal reference slice map is the final output. If multiple optimal values exist, then selection is made based on accuracy and specificity indicators. If  several optimal reference frames existed, then the first optimal reference frame is selected as the final reference frame.
Algorithm 1 describes the HSASR algorithm. The parameter T in Step 1 prevents the collection of additional optimal values from being missed. The T in Step 2 saves search time. Figure 2 presents the 2D diagram of the improved optimization grid search. In this study, N represents the number of data samples, K denotes the number of slices of each sample, and K/2 is the middle slice of the sample. First, a horizontal coarse search is performed. The pale blue circles indicate the performance values of intermediate slices in each cluster. The optimal clusters are selected for longitudinal fine search, the middle slice is considered the initial value, and both ends are searched simultaneously. The vertical circles and dark blue squares represent the performance values in the optimal set and the selected final reference frame, respectively. Figure 3 shows the overall process of multicenter data grading prediction. First, data are imported, data preprocessing is performed, features are extracted by using the feature calculation method, features are selected, and model training is finally conducted. Contribution points of this study are in the preprocessing stage. The HSASR method proposed is the most important part of preprocessing.
Furthermore, to unify the multicenter data in preprocessing, maintaining the consistency of brain tissue structure is also important. The GliomaHPPH2018 dataset in this study contained skull images, which are removed at the time of acquisition in the BraTS2017 dataset. This situation is not only a certain impact on HS but also introduced difficulties in combining the two datasets due to the gap between them. In this study, the FSL tool is firstly used in location registration, and then the skulls are removed based on the bet script. In this study, 238 data skulls are removed, and brain tissues and regions of interest are relatively intact.
In addition, spatial consistency of the data image is maintained. Multiple voxel ranges occur in both datasets. On the basis of Convert3D, we adopt the shortest distance interpolation resampling technology to carry out scale resampling in each data sample. The size of each data sample is unified to 240 × 240 × 155, and the label sample is also converted into a file of the same size. In addition, the distance between unified slices is 1 mm, and the original position was [0, − 239, 0].
To solve the problem of inconsistent brightness and contrast of multicenter data, this paper proposes a histogram specification with automatic selection of reference frames for magnetic resonance images. On the basis of the characteristics of glioma itself, this paper puts forward an optimized     and 43 texture features are calculated in the original data and eight wavelet decomposition spaces, respectively. For all image histological features, model training is conducted directly after the processing of missing values. The calculated features are imported into the model script of this study for prediction. The model script is divided into two parts, namely, feature selection and classifier. Five selection methods and nine classifiers are combined to set up the classifier, and the training and testing sets are imported into the model for prediction. For the designs of experiment 2 and 3 in the next section, 80% and 20% of data are selected from the data table as training and testing sets, respectively. The prediction is then repeated 10 times. The average value of final results is obtained, and the classifier and selection methods with the best results are finally selected with the rating device. The selection methods used in the study are SelectKBest (f_classif ) for classifying the tag features between tasks for ANOVA f values, principal component analysis (PCA), kernel principal component analysis (KPCA), independent component correlation algorithm (ICA), and factor analysis (FA). The classifiers used in this study are decision tree (DT), random forest (RF), bagging (BAG), binary search tree (BSA), naive Bayes (NB), multilayer perception (MLP), support vector machine (SVM), logistic (LR), and k-nearest neighbour (KNN). For the two separate datasets in experiment 1, the stable training model is selected by using a tenfold cross-validation scheme, and then the grading prediction is finally performed.

Experimental Comparison Designs.
To verify the effectiveness of processed multicenter data, this study proposed the following process designs: (1) Take the processed GliomaHPPH2018 and BraTS2017 datasets as the training and testing sets, respectively. Then, use the two unpreprocessed datasets as contrast experiments. (2) Mix together 521 data samples from Glio-maHPPH2018 and BraTS2017, preprocess the data, carry out feature selection and model training, and analyze the results. (3) Use single sequences from the processed and unprocessed GliomaHPPH2018 and BraTS2017 as comparison experiment. (4) Make comparison experiments between this research method and common image enhancement methods on the mixed dataset. Finally, compare these results with the untreated data. In addition, to verify the effectiveness of HSASR, compare the research results with the reference diagram results selected by radiologists.

Analysis of Image Enhancement.
The purpose of weakening the multicenter data is to relieve the differences between groups and retain individual characteristics. That is to say,  Note: GH is for GliomaHPPH2018, and BT is for BraTS2017. GH + BT represents 80% of the data as training and 20% as testing.

LGG
LGG True lable  under the premise of retaining tumor morphological characteristics, the image data of different medical institutions have similar contrast and brightness. In this study, contrast and brightness of multicenter glioma data images are significantly improved after image enhancement in preprocessing. Figure 4 shows the MRI results of the CET1 sequence in 12 patients. The left-hand side of Figure 4 presents that the first and second rows are LGG without HSASR, and the third and fourth rows are HGG without HSASR. All processed glioma data are shown on the right side of the figure. The examples show on the left and right sides had a one-to-one correspondence. From the perspective of image, the range of contrast and brightness of data without HSASR in Figure 4 are relatively complex and the tumor area is fuzzy and difficult to distinguish. The data after HSASR are consistent in contrast and brightness, the tumor area is significantly enhanced, and basic features (tumor morphology and lesion range) of the original image can be retained.

Performance of Glioma
Grading. This paper aims to reflect the effectiveness of these methods indirectly by using model prediction indicators. To validate the effect of the processed multicenter data further, this section conducts the verification of grading results based on the following indicators: AUC, accuracy (ACC), sensitivity (SEN), and specificity (SPE). Table 3 presents the results of Glio-maHPPH2018and BraTS2017 datasets as the training and testing set, respectively. The results show that the data with HSASR is better predicted, and the indexes are significantly improved compared with the unprocessed data, as shown in Figure 5. Figure 5(a) shows the ROC curve in which BraTS2017 is a training set and GliomaHPPH2018 is a testing set, and Figure 5(b) shows the ROC curve in which the testing set and the training set interchange. It can be observed from the figure that when testing set and training set swap, the better results can also be obtained. The results show that the data processed by the HRASR method have good adaptability and can alleviate the difference among multicenter data images. Experiment 2 in Table 3 presents the results of the mixed dataset. The calculated characteristics are imported into the proposed grading model for 10 iterations of random prediction, and the average value is subsequently obtained. Finally, the model with the best average performance is selected as the final model. The results show that AUC after processing generally increased by more than 15%, and other indicators also improve significantly. Experiment 3 in Table 3 shows that the data processed by HSASR on a single dataset also achieve a good grading effect.

Experiment 1 in
Experiment 4 in Table 4 shows the comparison experiment conducted by three different methods on the mixed   dataset. These methods include the HE method, reference frame method of manual selection by radiologists, and the method proposed in this paper. The results show that compared with the other two methods, the method proposed in this paper significantly improves the grading effect of glioma. The reference frame selected by radiologists based on experience shows significant improvement in glioma grading, but it can be seen from the results that the performance of the selected reference frame is worser than that of the automatically selected. Table 5 gives the performance comparison of similar work in the references using glioma dataset. In [27], after data preprocessing, they used a modified version of AlexNet for classifying MR brain images into three classes like healthy brain, LGG, and HGG. However, the amount of data of this work is so small that cannot verify the robustness of the proposed method. Reference [28] also involved data preprocessing, but our data is twice as large, and the result verified by the grading model is better than this method. Figure 6 is the confusion matrix corresponding to the final model of mixed glioma data in experiment 2, and LGG and HGG represent the low-level label and high-level label, respectively. Figure 6(a) shows the confusion matrix before processing, and Figure 6(b) shows the predicted data distribution after processing. Figure 6 shows a significant decrease in the number of glioma predicted incorrectly after preprocessing. Meanwhile, in order to intuitively evaluate the performance differences of the model before and after preprocessing, Figures 7 and 8 list the performance heat maps of grading prediction before and after preprocessing, respectively. The horizontal and vertical coordinates and corresponding results represent the classifier, feature selection or dimension reduction method, and the maximum average AUC value, respectively. These data indicate that the overall performance of the data after preprocessing is about 15% higher than that without HSASR processing. Figure 9 illustrates the experimental results of the grid search in this study. In this study, five reference slices with the highest performance indexes were selected from two datasets via rough search, and the reference slices with the best performance were screened out. The N axis, K, circle, and square denoted the number of samples, the number of slices in each sample set, the value of AUC of the reference image selected by the HSASR, and the predicted performance value of the reference image selected by the doctor, respectively. The obtained performance of the reference frame selected by the method is both low and high, indicating that the difference in reference objects had a significant impact on the result of grading. The proposed method can replace radiologists in choosing the best reference frame and save a considerable amount of time. Hence, the proposed method has certain application value in clinical trials.

Conclusions
Inconsistencies among data prevent multicenter data from playing to its shared advantage. This paper proposes a histogram specification method with automatic selection of reference frames for magnetic resonance images to alleviate the problem of contrast inconsistencies among multicenter data. The core of histogram specification is to change the local brightness of the image according to the reference frame, but the reference frame of traditional histogram specification is usually manually selected by radiologists. This method not only increases the workload of radiologists, but also cannot guarantee the optimal reference frame. In the method of this paper, the search range is firstly narrowed by coarse search intraglioma samples, then the suitable reference frame in histogram is selected by fine search within sample selected by coarse search. Finally, the effectiveness and feasibility of the proposed method are verified by a grading experiment based on two datasets. The results show that multicenter data processed by this method have good  Figure 9: Diagram of the proposed method (Note: the red circle and blue square represent the AUC of machine and manually selected reference images, respectively. The black square represents optimal reference frame).
adaptability, which improves the grading results and has certain practical value for clinical prediction.

Data Availability
The datasets used in this paper are public dataset (BraTS2017) and Henan Provincial People's Hospital (GliomaHPPH2018) dataset; BraTS2017 can be obtained through the following URL: https://www.med.upenn.edu/ sbia/brats2017/data.html.

Conflicts of Interest
The authors declare that they have no conflicts of interest.