Homogeneity Test and Correction of Daily Temperature and Precipitation Data (1978–2015) in North China

Homogeneity of climate data is the basis for quantitative assessment of climate change. Using the MASH method, this work examined and corrected the homogeneity of daily average, minimum, and maximum temperature data from 404 national meteorological stations and daily precipitation data from 397 national meteorological stations in North China during 1978–2015. Based on the meteorological station metadata, the results are analyzed and the differences before and after homogenization are compared. The results show that breakpoints are pervasive in the temperature data, and most of them appear after 2000. The stations with many breakpoints are mainly located in Beijing, Tianjin, and Hebei Province, where meteorological stations are densely distributed. The number of breakpoints in the daily precipitation series in North China during 1978–2015 also peaked in 2000. These breakpoints, which indicate inhomogeneity, may be caused by the large-scale replacement of meteorological instruments after 2000. After correction by the MASH method, the annual average temperature and minimum temperature decrease by 0.04°C and 0.06°C, respectively, while the maximum temperature increases by 0.01°C. The annual precipitation declines by 0.96 mm. The overall trends of temperature change before and after the correction are largely consistent, while the homogeneity of individual stations is significantly improved. In addition, the correction reduces the majority of the precipitation series, and the correction amplitude is relatively large. During 1978–2015, the temperature in North China shows a rising trend, while the precipitation tends to decrease.


Introduction
North China is an important base of grain, cotton, and edible oil production, as well as a major wheat-growing region. The study of meteorological elements such as precipitation and temperature in North China is an important part of meteorological research and is of great significance for agriculture. Such research requires high-quality meteorological data, so the homogeneity of the data must be tested and corrected so that they objectively reflect the true processes of climate change.
Methods commonly used for this purpose include the SNHT (standard normal homogeneity test), Rhtests, and the MASH (multiple analysis of series for homogenization) method.
The SNHT was first proposed by Alexandersson [1] for homogeneity tests of precipitation data and was later improved [2]. Khaliq and Ouarda [3] developed the method further by expanding the range of sample sizes for which critical values of the T statistic are available from the original [10, 250] to [10, 50000]. The SNHT is widely used in research on the homogeneity of meteorological series because it is simple and intuitive [4–12]. While it is widely applied to approximately normally distributed meteorological data such as annual average temperature and annual precipitation, it is not suitable for data with pronounced variability such as seasonal or daily temperature data.
The need for homogeneity tests and correction of meteorological data on monthly or daily scales motivated the development of the Rhtests and MASH methods. Rhtests, established by the Climate Research Centre of the Environment Ministry of Canada, can test and correct time series of meteorological data on annual, monthly, and daily scales. It is based on the penalized maximal t test (PMT) [13] and the penalized maximal F test (PMFT) [14]. Generally, to improve the accuracy and reliability of Rhtests results, the detected breakpoints need to be confirmed against the metadata [15]. Thus, for daily time series, comparing the Rhtests results with metadata station by station takes a great deal of labor and time.
The MASH procedure was developed by Szentimrey [16,17] of the Hungarian Meteorological Service and is one of the methods recommended by "COST (European Cooperation in Science and Technology) Action ES0601: Advances in homogenization methods of climate series: an integrated approach (HOME)" for testing the homogeneity of meteorological data series [18]. This method does not assume that the reference series is homogeneous or fixed. Comparing series from stations in the same climatic region makes it possible to determine probable breakpoints and to correct meteorological data on annual, monthly, and daily scales. Depending on the meteorological element, an additive (e.g., temperature) or multiplicative (e.g., precipitation) model can be applied, and the two can be converted into each other by taking logarithms. Because the MASH method is concerned with relative homogeneity, it does not depend on metadata. A previous study [19] shows that metadata do not have a large effect on the results of MASH tests, while the choice of reference series does. Relative homogeneity methods also have some drawbacks [20].
The assumption that climatic patterns are identical within a geographical region does not hold well, because different terrains can produce local microclimates, especially where mountains and plains are both present. Representative applications of the MASH method include the following. Li and Yan [19] applied the MASH method to test the homogeneity of daily temperature data from 1960 to 2006 in Beijing. Li et al. [21] used the MASH method to test the homogeneity of daily average, maximum, and minimum temperature of 545 stations in China from 1960 to 2011. Lakatos et al. [22] applied this method to the meteorological data in the Carpathian region.
In this study, the MASH method is used to test the homogeneity of the temperature and precipitation data in North China. This study differs from previous work in three respects. First, the station selection is different. In previous studies [19,23], to ensure a consistent length of the meteorological series participating in the homogenization test, stations with shorter series were eliminated, which left a sparse network of stations in the test. However, the MASH method is concerned with relative homogeneity, which relies on mutual comparison between stations. To balance the number and distribution density of the participating stations against the length of the series, the starting point of the series was set to 1978. To allow more stations to participate in the homogenization test, the temperature data of stations with insufficient series length were added in turn. Second, this study also applies the homogenization test to precipitation data. Third, the study area of previous work was either Beijing or the whole of China [19,23], whereas the area selected in this study is North China.
Homogenization of spatially and temporally discontinuous daily precipitation data has always been difficult [24,25]. Previous studies on the homogenization of precipitation data were based on continuous monthly or annual precipitation data [1,5,6,7,12]. The MASH method uses a multiplicative model for precipitation data and then applies a logarithmic transformation, so that the raw data are converted into nonzero data and the discretely distributed raw data become continuously distributed. Compared with the SNHT method, the MASH method selects reference stations more reasonably, and its practice of using each station in turn as a reference is more rigorous. The MASH procedure rests on rigorous mathematical principles and forms a complete system. Its approach to selecting reference series also allows it to check and correct the homogeneity of a large number of time series in a relatively short time.
Therefore, this work employs the MASH method to test the homogeneity of the temperature and precipitation data in North China recorded at 404/397 meteorological stations during 1978–2015 and to correct the inhomogeneities detected.

Data and Methodology
2.1. Data. Figure 1 shows the area of North China and the locations of the weather stations. The data used in this paper include (1) latitudes, longitudes, and elevations of 404 meteorological stations in North China; (2) daily average temperature, maximum temperature, and minimum temperature and daily precipitation of these stations from 1978 to 2015, which are from the China National Meteorological Information Center; and (3) metadata of these stations.

Methodology.
This work uses the MASH method to homogenize the aforementioned data, i.e., to first detect the inhomogeneities in the time series and then correct them. As a relative method, the MASH procedure establishes optimal difference series between the candidate series and the reference series. Using point estimates and confidence intervals, the breakpoints and shifts of the difference series can be detected and estimated. Then, the breakpoints and shifts of the candidate series are obtained. Finally, based on these results, the candidate series are corrected. Since the composition of the difference series is not fixed but is computed with each station taken in turn, the entire calculation iterates many times to find the most probable breakpoints and shifts of the candidate series. This process can also be carried out with the maximum likelihood method.
The procedure is presented briefly below.
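A single candidate series can be expressed as

$$X_j(t) = \mu(t) + E_j + IH_j(t) + \varepsilon_j(t), \quad j = 1, \ldots, N, \tag{1}$$

where $X_j(t)$ is the candidate series, $\mu(t)$ is the unknown climate change signal (temporal trend), $E_j$ is the spatial expected value (spatial trend), $IH_j(t)$ is the inhomogeneity signal, and $\varepsilon_j(t)$ is a normal white noise series. The reference series can then be expressed by $X_i(t)$, $i \in N_{-j}$, where $N_{-j}$ is the set of all stations other than $j$. Thus, the difference series of the candidate series and the reference series is

$$Z_j(t) = X_j(t) - \sum_{i \in J} \lambda_{ji} X_i(t), \quad J \subseteq N_{-j}. \tag{2}$$

Substituting Equation (1) into (2) yields

$$Z_j(t) = \Big(E_j - \sum_{i \in J} \lambda_{ji} E_i\Big) + \Big(IH_j(t) - \sum_{i \in J} \lambda_{ji} IH_i(t)\Big) + \Big(\varepsilon_j(t) - \sum_{i \in J} \lambda_{ji} \varepsilon_i(t)\Big), \tag{3}$$

where $\lambda_{ji}$ is the weighting factor, with $\sum_{i \in J} \lambda_{ji} = 1$. By establishing the difference series, the common climate change signal $\mu(t)$ is filtered out. In order to ensure the maximum correction effect, the noise term in Equation (3) is required to have minimum variance:

$$\operatorname{Var}\Big(\varepsilon_j(t) - \sum_{i \in J} \lambda_{ji} \varepsilon_i(t)\Big) \to \min. \tag{4}$$

To realize Equation (4), the weighting factors $\lambda_{ji}$ ($i \in J \subseteq N_{-j}$) are written in vector form as

$$\boldsymbol{\lambda}_j = \mathbf{C}_{J,J}^{-1}\left(\mathbf{c}_{j,J} + \frac{1 - \mathbf{1}^{T}\mathbf{C}_{J,J}^{-1}\mathbf{c}_{j,J}}{\mathbf{1}^{T}\mathbf{C}_{J,J}^{-1}\mathbf{1}}\,\mathbf{1}\right), \tag{5}$$

where $\mathbf{c}_{j,J}$ is the candidate-reference covariance vector, $\mathbf{C}_{J,J}$ is the reference-reference covariance matrix [16,26], and $\mathbf{1}$ is a vector of ones. The covariance matrix $\mathbf{C}$ thus determines the optimal weighting factors that minimize the variance. The optimal difference series obtained in this way can be used to detect and correct inhomogeneities efficiently. For the candidate series $X_j(t)$, there is a total of $2^{N-1} - 1$ difference series with optimal weights, obtained by changing the composition of the reference series $X_i(t)$ ($i \in J \subseteq N_{-j}$). Through iteration in the MASH procedure, applying point estimates and confidence intervals, breakpoints and shifts in these difference series can be detected and then attributed to the candidate series. With this processing, the candidate series are corrected [27].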
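The weighting step described above can be illustrated with a minimal Python sketch. It is not the MASH software: the function names `optimal_weights` and `difference_series`, the synthetic four-station data, and the use of the sample covariance of the raw series are all assumptions made for illustration. The weights are the standard solution of a minimum-variance problem under the constraint that they sum to 1.

```python
import numpy as np

def optimal_weights(c_jJ, C_JJ):
    """Minimum-variance weights that sum to 1 (constrained least squares).

    c_jJ : covariances between the candidate and each reference series
    C_JJ : covariance matrix of the reference series
    """
    c_jJ = np.asarray(c_jJ, dtype=float)
    C_JJ = np.asarray(C_JJ, dtype=float)
    ones = np.ones(len(c_jJ))
    cinv_c = np.linalg.solve(C_JJ, c_jJ)   # C^{-1} c
    cinv_1 = np.linalg.solve(C_JJ, ones)   # C^{-1} 1
    # Lagrange multiplier that enforces sum(weights) == 1
    gamma = (1.0 - ones @ cinv_c) / (ones @ cinv_1)
    return cinv_c + gamma * cinv_1

def difference_series(candidate, references, weights):
    """Candidate minus the weighted combination of the reference series."""
    return np.asarray(candidate) - np.asarray(references) @ np.asarray(weights)

# Synthetic illustration (not the paper's data): four stations share one
# climate signal; station 0 is the candidate and gets an artificial 0.8 shift.
rng = np.random.default_rng(0)
n = 1000
mu = np.linspace(0.0, 1.5, n)
station_means = np.array([10.0, 11.0, 9.5, 10.5])
series = mu[:, None] + station_means + rng.normal(0.0, 0.3, size=(n, 4))
series[600:, 0] += 0.8

cov = np.cov(series, rowvar=False)
weights = optimal_weights(cov[0, 1:], cov[1:, 1:])
z = difference_series(series[:, 0], series[:, 1:], weights)

# The shared trend cancels, so the mean jump in z estimates the shift.
estimated_shift = z[600:].mean() - z[:600].mean()
print(weights.sum(), round(estimated_shift, 2))
```

Because the weights sum to 1, the shared signal drops out of the difference series regardless of its shape, which is why the artificial step remains visible while the trend does not.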

Data Preprocessing.
Sorting and statistics of the original meteorological data in North China show that stations with complete meteorological data after 1978 account for 99.02% of the total. To ensure the accuracy of the correction and maximize the number of series involved, the meteorological data of North China from 1978 to 2015 are selected. Within this period, 18 stations still have more than one month of missing daily precipitation data, which can be interpolated in the MASH procedure [16].
First, the data are transformed into a software-readable format. The MASH method does not assume that the reference series is homogeneous, since some researchers have argued that assuming a constructed reference series to be homogeneous is false and lacks a theoretical basis [27]. Nor is the reference series fixed. Through the comparison of stations in the same climatic region, probable breakpoints can be found and corrected. The additive model is applied to correct the temperature data, and the homogenized temperature data are obtained after multiple iterative calculations. For precipitation data, the multiplicative model is used. Since the daily precipitation data are discontinuous and contain zero values, the multiplicative model processes the data into nonzero data so that the zeros do not affect the calculation.
The length of each series is kept consistent during the calculation. A total of 397 meteorological stations reach the required length, and the remaining 7 series of insufficient length are added in turn. Each time a series is added, a new calculation is carried out. The closer to the year 2016, the more stations are involved. However, because too much data is missing, the precipitation series cannot be added cyclically in this way, so only 397 precipitation series are finally involved in the calculation.
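The relationship between the multiplicative and additive models can be shown with a short Python sketch. The precipitation values and the 10% bias factor are made up, and only strictly positive values are used; how zero-precipitation days are treated inside the MASH procedure is not reproduced here.

```python
import numpy as np

# Hypothetical precipitation totals (mm), all strictly positive.
precip = np.array([12.0, 30.5, 48.2, 75.0, 20.3, 9.8])
factor = 1.10  # assumed multiplicative bias after an instrument change

biased = precip.copy()
biased[3:] *= factor  # the last three values are inflated by 10%

# In log space the multiplicative factor becomes a constant additive shift.
log_diff = np.log(biased) - np.log(precip)

# An additive correction in log space maps back to a multiplicative one.
corrected = np.exp(np.log(biased[3:]) - np.log(factor))
print(log_diff)
print(corrected)
```

After the logarithm, the same shift-detection logic used for temperature can in principle be applied, and exponentiating the corrected values returns them to millimetres.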

Homogenization of Temperature Series.
Because the series are added in a loop, each new series enters the calculation with a different time span and produces a new breakpoint file. Taking the average temperature series of the initial 397 stations from 1978 to 2015 as an example, at the 0.01 significance level, breakpoints are detected in about 98% of the series, and over 75% of the series contain 4 to 11 breakpoints. As the number of series involved in the calculation increases, the number and frequency of breakpoints also change (Figures 2(a)-2(c)). The years with breakpoints detected in each series during each calculation are accumulated, and repeated years are removed to obtain the total set of breakpoint years. In total, 5,485 breakpoints are present in the daily average temperature series, 5,833 in the daily minimum temperature series, and 4,509 in the daily maximum temperature series. The frequency of breakpoints in the daily average temperature series is in the range of [12, 16], while in the daily minimum temperature series it is in the range of [10, 20] and in the maximum temperature series it is in the range of [7, 15]. It can be seen from Figure 3(a) that most of the breakpoints occur after 2000, peaking in 2006–2007, a pattern that is consistent across the minimum, maximum, and average temperature series. The statistics of the total breakpoints also show that the frequency of breakpoints is relatively high after 2000. In terms of spatial distribution across the 404 meteorological stations, the Beijing-Tianjin-Hebei region, where meteorological stations are dense, has more breakpoints and more stations with breakpoints (Figure 4).
Breakpoints are most numerous in the minimum temperature series and least numerous in the maximum temperature series. The maximum temperature generally occurs around noon, when solar radiation warms the ground quickly, producing a steep vertical temperature lapse rate and poor atmospheric stability; the atmosphere is therefore well mixed across scales, which leads to small temperature differences among stations. By contrast, the minimum temperature generally occurs around sunrise, when atmospheric stability is higher and mixing between different stations is insufficient, resulting in large temperature differences among stations. Given the calculation principle of relative homogeneity methods, these differences are unlikely to be treated as climate change signals and are more likely to be identified as inhomogeneity signals through comparisons between stations. This may be why breakpoints are most numerous in the minimum temperature series and least numerous in the maximum temperature series. However, this inference still needs further research to confirm.
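The accumulation of breakpoint years across repeated calculations can be sketched in Python as a set union followed by a per-year count. The run results below are invented, and `yearly_counts` is a hypothetical helper, not part of the MASH output format.

```python
from collections import Counter

# Hypothetical breakpoint years for one station from three successive runs
runs = [
    [1985, 2001, 2006],
    [1985, 2001, 2006, 2007],
    [2001, 2006, 2010],
]

# Union across runs: a year counts once no matter how many runs flag it.
unique_years = sorted(set().union(*runs))

def yearly_counts(per_station_years):
    """Number of stations with a breakpoint in each year."""
    counts = Counter()
    for years in per_station_years:
        counts.update(set(years))  # de-duplicate within a station first
    return dict(sorted(counts.items()))

print(unique_years)
```

Counting each station at most once per year keeps a station that is flagged in several runs from inflating the annual totals.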
The metadata show that more than 40% of the meteorological stations in North China have been relocated since 2000. Around 2000, the China Meteorological Administration also began constructing automatic meteorological stations to replace the original manual observation [28]. At that time, temperature monitoring throughout the country gradually changed from manual observation with liquid-in-glass thermometers to platinum resistance temperature sensors. Previous studies generally agree that the discrepancies caused by this large-scale instrument change are within the allowable error range of automatic stations [29]. However, the change inevitably introduced some degree of inhomogeneity into the observational data [23]. From 2003, the vast majority of meteorological stations gradually moved from two-track observation by manual and automatic stations to single-track observation by automatic stations. In 2006, most meteorological stations replaced manual observation data with automatic station data for preparing reports. The large-scale replacement of instruments and the change of observation mode could lead to the significant increase in breakpoints.
With the strict algorithm in the MASH procedure, no breakpoint is detected in the daily average temperature series of the Xinbarag Left Banner (50618) station in Inner Mongolia, probably because few stations are located around it, so the reference series are assigned low weights. After correcting the inhomogeneous series of all other stations in North China, the overall annual average temperature drops by 0.04°C, with an upward trend from 2012 (Figure 5(a)). Breakpoints are detected in all the daily minimum temperature series. The average annual minimum temperature decreases by 0.06°C after the correction; the correction amplitude is relatively large in 1992, and an upward trend appears from 2012 (Figure 5(b)). No breakpoint is detected in the daily maximum temperature series of the Xinbarag Left Banner (50618) station either. After correcting the inhomogeneous maximum temperature series, the average increases by 0.01°C, again with a large correction amplitude in 1992 and an upward trend since 2012 (Figure 5(c)), which is consistent with the work of Zhang [30]. As shown in Figures 2 and 3, few breakpoints are detected around 1992. According to the meteorological station metadata, few stations were moved and no large-scale instrument replacement took place in 1992. The marked increase of the correction amplitude for the minimum and maximum temperature series in 1992 is an issue that needs further study.
From the spatial distribution of the differences before and after the correction, the average and minimum temperature series have much larger correction amplitudes than the maximum temperature series (Figure 6). Meteorological stations located in the Taihang Mountains, the Yinshan Mountains, and the junction area between Hebei Province and Shandong Province have relatively large correction amplitudes. After the adjustment, many annual average and minimum temperature series drop. Of the average temperature series, 222 have negative differences and 182 have positive differences. For the minimum temperature, 228 series have negative differences and 176 have positive differences. For the maximum temperature, 209 series have negative differences and 195 have positive differences. Although many temperature series have lower values after correction, the series with positive differences have larger correction amplitudes. As a result, the overall trends of temperature change before and after correction are largely the same, while the inhomogeneity of the data from individual stations has been significantly reduced.
After adjustment, the linear trends of 253 average temperature series rise and 151 decline; in the minimum temperature series, the linear trends of 240 series rise and 164 drop; and in the maximum temperature series, the linear trends of 218 series rise and 186 decline. The linear trends of the average and maximum temperature series after adjustment are positive except for the shorter series of stations 54519 (with data beginning on January 1, 1998) and 50924 (with data beginning on January 1, 2006). The 16 minimum temperature series that had negative (downward) trends have been adjusted to positive (upward) trends. The overall trends of the adjusted temperature series are upward, and the adjusted series are more in line with the trend of climate warming. This also indicates that the MASH method performs well at middle-high latitudes.
The spatial variations of the trends before and after adjustment are shown by spatial interpolation with the inverse distance weighting method (Figure 7). Here, we take the average temperature series of the Wutai Mountains station and the Huairou station as examples.
The Wutai Mountains station (53588) was built in October 1955 at the top of Zhongtai in the Wutai Mountains (39°02′N, 113°32′7″E) at an elevation of 2895.8 m. On January 1, 1998, the station was moved to Muyu Mountain in the Wutai Mountains (38°57′7″N, 113°31′7″E), at an elevation of 2208.3 m. The relocation lowered the station by 687.5 m and placed it 20 km from the old site, causing an obvious rise in temperature since 1998. After adjustment with the MASH method, the upward trend of the temperature at the Wutai Mountains station dropped significantly. The Huairou station (54419) was relocated in 1996 from 40°18′N, 116°37′E to 40°22′N, 116°38′E, a horizontal distance of about 7.5 km, while its elevation changed from 63.1 m to 75.7 m. The relocation caused the linear trend of the annual average temperature to drop from 0.046°C/a before 1996 to −0.000097°C/a. After adjustment with the MASH method, the trend is 0.034°C/a.
These data variations show that the inhomogeneity of the two series is greatly reduced.
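The linear trends compared here are slopes of a least-squares line fitted to the annual values. A minimal Python sketch follows, using made-up annual series; the 0.03°C/a slope and the 0.5°C step in 1998 are assumptions chosen only to show how an uncorrected upward shift inflates the fitted trend.

```python
import numpy as np

def linear_trend(years, values):
    """Least-squares slope of values against years (units per year)."""
    slope, _intercept = np.polyfit(
        np.asarray(years, dtype=float), np.asarray(values, dtype=float), 1
    )
    return slope

years = np.arange(1978, 2016)
clean = 10.0 + 0.03 * (years - 1978)      # made-up series, 0.03 degC per year
with_break = clean.copy()
with_break[years >= 1998] += 0.5          # artificial upward step in 1998

trend_clean = linear_trend(years, clean)
trend_break = linear_trend(years, with_break)
print(trend_clean, trend_break)
```

Comparing the two slopes for each station, before and after correction, gives the kind of rise or decline counts reported in the text.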
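The inverse distance weighting interpolation used for the trend maps can be sketched as follows. This is a generic illustration in Python: the function name `idw`, the power parameter of 2, and the use of plain Euclidean distance on the coordinates are assumptions, since the text does not state the settings used for Figure 7.

```python
import numpy as np

def idw(known_xy, known_values, query_xy, power=2.0):
    """Inverse distance weighted estimate at each query point."""
    known_xy = np.asarray(known_xy, dtype=float)
    known_values = np.asarray(known_values, dtype=float)
    query_xy = np.asarray(query_xy, dtype=float)
    # Pairwise distances, shape (n_query, n_known)
    dist = np.linalg.norm(query_xy[:, None, :] - known_xy[None, :, :], axis=2)
    result = np.empty(len(query_xy))
    for k, row in enumerate(dist):
        exact = row == 0.0
        if exact.any():
            # A query point on top of a station takes that station's value.
            result[k] = known_values[exact][0]
        else:
            w = 1.0 / row ** power
            result[k] = np.sum(w * known_values) / np.sum(w)
    return result

# Two hypothetical stations with trends of 0.01 and 0.03 degC per year
stations_xy = [(0.0, 0.0), (2.0, 0.0)]
trends = [0.01, 0.03]
estimates = idw(stations_xy, trends, [(1.0, 0.0), (0.0, 0.0)])
print(estimates)
```

Nearby stations dominate each estimate, so an isolated station with a large uncorrected shift produces a local bullseye on the map, which is one way relocations such as those described above become visible.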

Homogenization of Precipitation Series.
Homogeneity tests show that 135 of the 397 precipitation series have breakpoints, and the majority of these have 1 to 2 breakpoints each (Figure 8). The years with breakpoints are centered around 2000 (Figure 9). Since 2000, the automatic meteorological observation system has been gradually put into operation at meteorological stations nationwide. Precipitation observation has accordingly shifted gradually from manual to automatic measurement. This instrument change led to inhomogeneity in the daily precipitation series. Compared with temperature data, precipitation data have larger spatial and temporal variations, especially marked spatial discontinuity; thus, relatively few breakpoints are detected in the precipitation series, and it is difficult for the homogeneity test to obtain satisfactory results. With improved test methods, more breakpoints might be detected in the precipitation series. A previous study shows that it is difficult to reveal the implicit impact on precipitation data of a typical short-distance horizontal relocation [31]. In terms of spatial distribution (Figure 10), the detected breakpoints are more common in areas with densely distributed meteorological stations, which is similar to the temperature data.
The corrected annual precipitation is 0.96 mm less than before. The correction amplitude is largest for 1998, reaching −3 mm (Figure 11). According to the meteorological station metadata, few stations were moved around 1998, and little large-scale instrument replacement took place then. Such fluctuations in the correction amplitude still need further research.
From the spatial distribution of the differences in annual precipitation before and after adjustment, the series with large correction amplitudes are mostly distributed in southern Hebei Province and the Taihang Mountains (Figure 12). After the correction, the values of 339 precipitation series decrease and those of 58 series increase. The corrected series with lower precipitation have relatively larger correction amplitudes, which is consistent with the trend of decreasing precipitation in North China in the past 50 years [32].

Advances in Meteorology
Of the adjusted annual precipitation series, the linear trends of 187 series rise and 217 decline. After the adjustment, the linear trends of 204 series are negative, compared with 201 before adjustment. The overall trends before and after the adjustment are both negative, at −0.15 and −0.16, respectively, which shows that precipitation in North China was gradually decreasing. Compared with the temperature series, fewer breakpoints are detected in the precipitation series, but the correction amplitude is much larger. Figure 13 shows that the areas with large differences before and after the correction are mainly distributed in the vicinity of the Taihang Mountains, Yanshan Mountains, and Yinshan Mountains.

Discussion
In Section 3.1, we find that breakpoints are most numerous in the minimum temperature series, followed by the average temperature series, and least numerous in the maximum temperature series. However, in actual observations, the observation methods and instruments for the three are unified. In China, the daily maximum and minimum temperature data are derived from minute-by-minute observation data, while the daily average temperature is the average of four observations (at 02, 08, 14, and 20 h). Before minute data became available, the daily maximum and minimum temperature data were read manually from maximum and minimum thermometers. Therefore, in theory, if inhomogeneities are caused by nonclimatic factors, the number of breakpoints in the maximum, average, and minimum temperature data should be consistent. However, the majority of existing studies, whether using the SNHT or Rhtests method, have reached the similar conclusion that fewer breakpoints are present in the maximum temperature data than in the minimum temperature data on different time scales [30,33]. Li and Deng [34] hold that the reason for this phenomenon may be station relocation, because the minimum temperature is more sensitive to changes in distance. Based on data extracted from the US Climate Reference Network (USCRN), they found that minimum temperature variations between stations at short distances are much larger than maximum temperature variations.
In this study, we count the breakpoints in the minimum temperature series and in the maximum temperature series. Combined with the station metadata, among the 244 stations with more breakpoints in the minimum temperature series than in the maximum temperature series, only 54 stations (22.1%) have relocation years that coincide with the years immediately before or after breakpoints.
That is to say, the change in distance caused by station relocation may not be the reason why breakpoints are more numerous in the minimum temperature series than in the maximum temperature series.
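The comparison between relocation years and breakpoint years can be sketched in Python as below. The station records are invented, and the one-year tolerance is an assumption standing in for "the years before and after breakpoints"; the last line simply reproduces the reported share of 54 out of 244 stations.

```python
def coincides(relocation_years, breakpoint_years, tolerance=1):
    """True if any relocation falls within `tolerance` years of a breakpoint."""
    return any(
        abs(r - b) <= tolerance
        for r in relocation_years
        for b in breakpoint_years
    )

# Hypothetical station records: (relocation years, breakpoint years)
stations = {
    "A": ([1998], [1997, 2006]),
    "B": ([2003], [1990, 2010]),
    "C": ([], [2001]),
}
matched = [name for name, (moves, breaks) in stations.items()
           if coincides(moves, breaks)]

# Share reported in the text: 54 of 244 stations
share_percent = round(100 * 54 / 244, 1)
print(matched, share_percent)
```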
Combined with the principle of the homogenization algorithm, we infer that, under the same calculation conditions, the minimum temperature series, with their larger variations and regional differences, are likely to have some normal values mistakenly identified as breakpoints, whether the absolute or the relative homogenization method is used. Another consequence of the greater sensitivity of the minimum temperature to distance may be that distance-induced differences between the reference series and the candidate series are more likely to be mistakenly detected as breakpoints.
Based on the inference above, it is necessary to improve the homogenization algorithm according to the characteristics of different meteorological elements.

Conclusions
In this paper, the MASH method is used to test and correct the inhomogeneity of the daily average, maximum, and minimum temperature and daily precipitation series of 404/397 national meteorological stations in North China. Combined with the meteorological station metadata, the results are analyzed and the overall differences before and after the correction are compared:
(1) After 2000, the change from manual to automatic measurement and the replacement of instruments caused inhomogeneity in the temperature and precipitation series. After 2006, manual report preparation at most meteorological stations was replaced by automatic reporting, making the inhomogeneity of the temperature series more prominent.
(2) The overall annual average temperature after the correction decreases by 0.04°C; the annual minimum temperature decreases by 0.06°C, and the annual maximum temperature increases by 0.01°C. After the correction, more temperature series are lowered than raised, while the raised series have larger variation amplitudes. The temperature in North China during 1978–2015 shows an upward trend. The overall trend of the corrected temperature is largely the same as before, while the homogeneity of the series of individual stations has been significantly improved.
(3) The annual precipitation corrected by the MASH method is 0.96 mm less than before. After the correction, most of the precipitation series have less precipitation and relatively larger correction amplitudes. Precipitation in North China tends to decrease gradually from 1978 to 2015.
(4) Compared with the original series, the annual minimum and maximum temperatures corrected by the MASH method have abnormal increases in 1992, while the corrected annual precipitation has an unusually large change in 1998. The causes of these anomalies need further study.

Data Availability
The data used in this article include (1) latitudes, longitudes, and elevations of 404 meteorological stations in North China; (2) daily average temperature, maximum temperature, and minimum temperature and daily precipitation of these stations from 1978 to 2015, which are from the China National Meteorological Information Center; and (3) metadata of these stations.
The data used to support the findings of this study can be downloaded from the website data.cma.cn.

Figure 1: Locations of the weather stations and North China.

Figure 3: Annual breakpoint numbers in the average (a), minimum (b), and maximum (c) temperature series of 397∼404 meteorological stations in North China.

Figure 4: Spatial distribution of breakpoints in (a) the average temperature series (T ave ), (b) the minimum temperature series (T min ), and (c) the maximum temperature series (T max ) of 404 meteorological stations in North China.

Figure 5: Differences of average annual temperature (a), average annual minimum temperature (b), and average annual maximum temperature (c) before and after correction.

Figure 6: Spatial distribution of differences before and after correction of (a) annual average temperature series (T ave ), (b) annual minimum temperature series (T min ), and (c) annual maximum temperature series (T max ) from 404 meteorological stations in North China.

Figure 7: Linear trend changes ( °C/a) before and after the adjustment of the average, minimum, and maximum temperature of 404 stations in North China.

Figure 9: Annual breakpoint numbers in the daily precipitation series of 397 meteorological stations in North China.

Figure 11: Differences of annual precipitation before and after data correction.

Figure 13: Linear trend changes (mm/a) before and after adjustment of annual precipitation series of 397 stations in North China.