About the Relation between Sunshine Duration and Cloudiness on the Basis of Data from Hamburg

The aim of this paper is to relate two meteorological parameters, relative (bright) sunshine duration and cloudiness, using data from two stations in the city of Hamburg, Germany. We test the classic linear relationship, as well as newer polynomial extensions suggested in the literature. The results of the regression are interpreted against a theoretical background recently put forward by Badescu. The suggested relations can be borne out, but we also point out difficulties due to data quality and insufficiency.


Introduction
Knowledge of the relationship between sunshine duration and cloudiness is very important for the practical forecast of insolation [1]. Both are key meteorological parameters in solar energy technology, on short as well as longer time scales. Datasets of sunshine duration are widespread and easily available for many parts of the world and for long time intervals. In contrast, reliable estimates or measurements of cloudiness have been slow to emerge. For use in solar energy technologies, cloudiness is fundamental [1] because the insolation reaching the ground depends on the amount and type of the clouds blocking the direct solar radiation [2]. Because cloudiness is much more difficult to measure accurately, it has long been considered necessary to look for a relationship between the widely available sunshine duration measurements and the formerly rather elusive estimates of cloudiness.
This short work draws on data from one single German city, Hamburg. We shall use as our data source only two stations located in this city. The first one is a transmitter mast of the "Norddeutscher Rundfunk" (Northern German Broadcasting Corporation, NDR), while the second is Hamburg's airport. At first, the data from these locations are evaluated separately and then they are compared with each other. We expect that both stations should deliver similar relations as Hamburg is a flat and rather homogeneous city.
Correlations of both quantities from many cities have been used in proposing a relationship between them. In recent times, the level of sophistication has been raised by Badescu, who carried out many case studies for Romania, from at least 1990 on. He summarized the findings of his collaborators in a book he edited recently [3]. The general conclusion is that one may use a linear relationship between cloudiness and cloud shade for Romania. But he also has summarized a probabilistic theory that can lead to more general relationships, which had been explored or proposed earlier but on uncertain grounds. This theory of Badescu's will serve as the backdrop for the present work.
There are many studies which do not connect sunshine duration and cloudiness directly but instead infer the global insolation from observations of sunshine data (e.g., Aksoy et al. [1], Suehrcke [4]). Where only one of the two quantities is measured, it is an advantage to have prior information on the correlation between them, especially for cities where either the sunshine data or the cloudiness data are lacking. The missing data can then be calculated from measurements of the other parameter.
We shall seek to confirm the simplest possible relation between sunshine and cloudiness, which is the linear one. It will be seen that such a linear relation, though it has mostly been assumed, may not be sufficiently accurate for certain purposes or time scales. We shall explain deviations by introducing an irradiance threshold value. An important advantage of our study, in contrast to other publications dealing with the same problem for Hamburg, is the relatively long data set, beginning in 1996 and ending in 2011. The fact that we use data from two similar stations may be useful in validating the results of a single station. The issue of representativeness can thus be assessed, at least for Hamburg.
The structure of the paper is as follows. Section 2 discusses the data and the methods used for this work. Section 3 presents the results and their interpretation, while Section 4 compares the results for the two stations. The paper ends with Section 5, which gives the conclusion and an outlook on one or two possibilities for improving the present study.

Data and Methods
The two stations are a transmitter mast of the NDR and the weather station at Hamburg airport. The distance between them is about 14.3 km. We treat the stations separately at first.

NDR Transmitter Mast.
The transmitter mast is about 300 m high. It is located in Billwerder, Hamburg, on flat grassland. It has been in use for weather measurements by the Meteorological Institute of the University of Hamburg since 1967. In the following, it will simply be referred to as the "weather mast." The geographical coordinates are 53°31′0.9″N and 10°06′10.3″E. The mast stands only about 30 cm above sea level, so the instruments near the ground are approximately at sea level. The surroundings consist of agricultural districts and flat, cultivated land. It is a good location for the purposes of the present work, as well as for many other measurements [5]. Sunshine duration is recorded with a pyranometer, which measures the global incoming shortwave radiation from the upper half-space. Taking into account the World Meteorological Organization's sunshine criterion that the sun is effectively shining only if the direct solar radiation exceeds a threshold of 120 W/m² [6], one can determine the sunshine duration from the measured record.
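The threshold criterion can be illustrated with a minimal Python sketch. It assumes that a series of direct irradiance values at one-minute resolution is already available; how the direct component is obtained from the pyranometer record is not described here, and the function name and sample values are hypothetical.

```python
# WMO sunshine criterion: direct irradiance must exceed 120 W/m^2.
WMO_THRESHOLD_W_M2 = 120.0


def sunshine_minutes(direct_irradiance_w_m2):
    """Count one-minute samples whose direct irradiance exceeds the threshold."""
    return sum(1 for value in direct_irradiance_w_m2 if value > WMO_THRESHOLD_W_M2)


# Hypothetical one-minute samples in W/m^2, for illustration only.
samples = [0.0, 85.0, 120.0, 121.5, 430.0, 610.0, 95.0]
print(sunshine_minutes(samples))  # 3 (a value of exactly 120 does not count)
```

Summing these minutes over a day and dividing by the day's maximum possible sunshine duration would give the relative sunshine duration used in the rest of the paper.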
Cloudiness is measured at the weather mast by means of a ceilometer. It is placed on the ground and is of the type Vaisala CT25K, which measures the cloud base in steps of 30 m up to a level of 7500 m. From the backscattering profile of a laser beam and by means of an algorithm, it is possible to determine the cloudiness of up to four overlying cloud layers. This instrument has been working at the weather mast since 2003.

Hamburg Airport.
The airport is located at 53°38′7.95″N, 9°59′40.92″E. The terrain is level, but the area is surrounded by trees. Two different devices measure the sunshine duration at the airport. One is the classical Campbell-Stokes sunshine recorder, a polished glass sphere that focuses the incoming sunbeam on a card with the day's hours printed on it, thereby burning a trace whose length is proportional to the day's sunshine. The other device is a SONI sunshine recorder, which is based on an electronic method. It is the source of the data used.
Cloudiness at the airport is measured in the old-fashioned way by a human observer, whose subjective assessment is subject to greater or lesser uncertainty.

The Database.
For this short account, a time period of 16 years was used, from 1 January 1996 (00:00) to 31 December 2011 (23:59). An exception is the cloudiness time series from the weather mast, which starts only in November 2003. A statistical AR(1) process was considered for extending this series back into the past, but that part of the investigation is not included in this study. Fortunately, the cloudiness time series of the airport is complete for the whole period selected.
Overall, four time series were used for this paper: one of sunshine duration and one of cloudiness from each location. Since the airport data were available only as daily means, the weather mast data, which are recorded every minute, were averaged over each day. There is a small gap in the year 2005 in the sunshine duration series of the weather mast, which is due to problems with the instrument [5].
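The reduction of one-minute values to daily means could look like the following sketch. It is only an illustration under stated assumptions: the record layout (timestamp, value), the use of `None` for missing samples, and the function name `daily_means` are hypothetical and not taken from the original processing.

```python
from collections import defaultdict
from datetime import datetime


def daily_means(records):
    """Average (timestamp, value) pairs per calendar day, skipping missing values."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for timestamp, value in records:
        if value is None:  # missing sample, e.g. during an instrument outage
            continue
        day = timestamp.date()
        sums[day] += value
        counts[day] += 1
    # Days without any valid sample do not appear in the result.
    return {day: sums[day] / counts[day] for day in sorted(sums)}


# Hypothetical one-minute records, for illustration only.
records = [
    (datetime(2005, 6, 1, 12, 0), 0.50),
    (datetime(2005, 6, 1, 12, 1), None),
    (datetime(2005, 6, 1, 12, 2), 0.70),
    (datetime(2005, 6, 2, 12, 0), 1.00),
]
means = daily_means(records)
for day, mean in means.items():
    print(day, round(mean, 3))
```

Leaving out days without valid samples, rather than filling them, matches the decision above not to introduce artificial data.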

Theoretical Background.
With our purpose being only to report new results for Hamburg on the correlation between sunshine and cloudiness, we shall not go into details of the theory beyond what is needed to interpret those results. We follow Badescu [3], who introduced the sunshine number. The sunshine number $\xi(t)$ is a random Boolean variable in time, defined as follows:

$\xi(t) = 0$ if the sun is covered by clouds at time $t$, and $\xi(t) = 1$ otherwise, (1)

where $t$ is any instant of the daylight duration. Because the dynamics of cloud cover is too complicated, or even unknown, we may regard $\xi(t)$ as a random variable. Choosing an (arbitrary) interval of time $\Delta t$ centered on $t$, we define the probability of the sun being covered by clouds during that interval as $p(\xi = 0; t, \Delta t)$. Since $\xi$ is a Bernoulli-distributed variable, the complementary probability of the sun not being covered by clouds is $p(\xi = 1; t, \Delta t) = 1 - p(\xi = 0; t, \Delta t)$. We now briefly introduce two independent measures for these probabilities. If we denote by $s(t, \Delta t)$ the sum of those time units corresponding to sunshine during the time period $\Delta t$, centered on $t$, then we may define the probability of sunshine during that period as the usual relative sunshine:

$\sigma(t, \Delta t) = s(t, \Delta t) / \Delta t$. (2)

A probability measure for $p(\xi = 0; t, \Delta t)$ is more difficult to define theoretically. Badescu [7], following earlier work on geometric probability as developed by Santaló (cf., e.g., [8]), derived an expression for that probability, equation (3), with the total cloud cover amount $C$ defined by equation (4). These expressions involve the areas of the sun's disk and of the clouds in the celestial vault, projected onto the plane tangent to the Earth's surface at the point of observation, and the perimeters bounding those areas. They also involve the total area of this plane, limited by the perimeter of the horizon, and the area of its intersection with the clouds, which can be viewed as a random quantity. For further details, we must refer the reader to the works cited.
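The probability (3) may be written as the cloud cover $C$ plus a small correction term, equation (5). For $C = 0$ the correction takes the value given by equation (6), while for $C \to 1$ it should vanish. This is indeed the case under the convention adopted for overcast skies. For $C \neq 0$, equation (7) holds, in which one factor may be taken to depend on $C$. Since the quantities associated with the sun's disk are much smaller than those associated with the clouds, we may write as a good approximation

$p(\xi = 0; t, \Delta t) = 1 - \sigma(t, \Delta t) \approx C(t, \Delta t)$. (8)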
This is nothing else than the classic assumption that cloudiness and sunshine ought to add up to one. The quantity $S = 1 - \sigma(t, \Delta t)$, with $\sigma$ the relative sunshine, is also known as the cloud shade. It will be used as the variable against which we shall plot the cloud cover $C$. Both $S$ and $C$ are available from routine meteorological observations or measurements. Ideally, both variables should be perfectly correlated. This is what we set out to ascertain with the data from the two stations in Hamburg. Of course, as we can only estimate the two quantities, we shall distinguish the estimates by a tilde on the corresponding variables. Following Badescu [3], we at first accept the approximation

$\tilde{S} \approx \tilde{C}$ (9)

as a good one. But we must bear in mind that sunshine measurements are contaminated by various factors, and this fact led the World Meteorological Organization to define bright sunshine by excluding solar irradiance below a threshold value of 120 W/m² [6]. Therefore, the estimated relative sunshine is always smaller than the theoretical one, $\sigma > \tilde{\sigma}$, or $\sigma = \tilde{\sigma} + \delta$, with $\delta > 0$. But first we shall test the goodness of the approximation (9). Hence, our estimated cloud shade will be defined as

$\tilde{S} = 1 - \tilde{\sigma}$. (10)

In the following regression plots, the estimated cloud shade $\tilde{S}$ is shown against the observed cloudiness $\tilde{C}$, in order to test the validity of their approximate equality, as has been traditionally assumed (at least for measuring sites without complex topography or other obstacles to the sun's direct irradiation).
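The definition of the estimated cloud shade as one minus the relative sunshine can be sketched in a few lines of Python. The function name, the argument units (hours), and the clamping step are assumptions made for illustration; they are not taken from the original processing.

```python
def cloud_shade(sunshine_hours, max_possible_hours):
    """Estimated cloud shade: one minus the relative sunshine duration."""
    if max_possible_hours <= 0:
        raise ValueError("maximum possible sunshine duration must be positive")
    relative_sunshine = sunshine_hours / max_possible_hours
    # Clamp to [0, 1] to absorb small measurement or rounding overshoots.
    relative_sunshine = min(1.0, max(0.0, relative_sunshine))
    return 1.0 - relative_sunshine


# Hypothetical day: 4 h of bright sunshine out of 16 h possible.
print(cloud_shade(4.0, 16.0))  # 0.75
```

Under the classic assumption, this value would be compared directly with the observed daily mean cloudiness expressed as a fraction between 0 and 1.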

The Weather Mast.
Due to the lack of cloudiness measurements before 2004, the corresponding time series at the weather mast is shorter than the time period envisaged in this study. An attempt to complete it by an AR(1) process, though partly successful, will not be explored here, as we want to avoid any artificial data in answering our question above. Measured data from the weather mast are reduced to daily means for the eight years from 2004 to 2011. As an example, the time series of the sunshine duration of the weather mast, together with its possible maximum value (red curve), is shown in Figure 1. The data gap within the year 2005, mentioned in Section 2, is visible. A plot of measured cloud shade $\tilde{S}$ against cloudiness $\tilde{C}$ is shown in Figure 2.
It contains 5,844 daily mean values and also shows the linear relationship $\tilde{S} = \tilde{C}$, corresponding to the approximation (8). This ideal relationship is not, however, borne out by the figure. Most of the data points lie above the line, so that $\tilde{S} > \tilde{C}$. Extreme values of $\tilde{C}$ are 0.9, while cloud shade does achieve values of 1. There is some accumulation of data points around 70% cloudiness, with cloud shade values greater than 90%. On the other hand, for very small cloudiness, cloud shade may reach values up to 0.65, with frequent values around 0.1. This is theoretically unsatisfactory because without clouds we expect a vanishing cloud shade. A slight curvature is observed from around 50% cloudiness upwards. It thus seems that a better fit can be achieved with a polynomial regression model. Figure 3 shows the same plot as Figure 2, together with the regression lines. The least RMSE (root mean square error) indicates the best-fitting polynomial order of the regression (Table 1). On the basis of the regression lines, one can see that the cubic and the fourth-order fits are almost equal, so that there was no need for higher orders. In general, there is almost no difference in goodness of fit among the four regression lines, and Occam's razor would require choosing the linear relationship. Table 2 shows the values of the regression parameters, and the formula with the "best" fit is

$\tilde{S} = 0.83046\,\tilde{C}^3 - 0.95267\,\tilde{C}^2 + 1.3959\,\tilde{C} + 0.081687$. (11)

The Airport.
The time series of the airport are complete, so there was no need to adopt any procedure to fill in values. Figure 4 shows the observed cloudiness plotted against the measured cloud shade. Note the higher scatter of the points, but we must not forget that more points were available for the plot. There is an upward trend for higher values of cloudiness, similar to what we saw in the case of the weather mast. As before, there is cloud shade for no cloudiness. We shall soon discuss the reasons for this.
Table 3 shows that the quadratic polynomial fits the data with the least RMSE. The four fitting polynomials are also shown in Figure 4. A linear relationship with a displaced intercept would also be acceptable as an average relationship between the quantities plotted. The quadratic relationship, equation (12), follows from the polynomial coefficients given in Table 4. Quadratic relations between the solar irradiance and the relative sunshine duration have been reported in the literature [9]. How this translates to a possible relation between solar irradiance and cloudiness is discussed in [10]. Aksoy et al. [1], on the other hand, found a linear relation between the sunshine duration and a satellite-derived cloud index. This result is certainly not directly comparable to our data from surface observations. We did not examine any cloud index.
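The comparison of polynomial orders by RMSE can be sketched as follows. This is a minimal, dependency-free illustration and not the procedure actually used for the tables: the least-squares fit is done through the normal equations, and the data points are hypothetical.

```python
def polyfit(x, y, order):
    """Least-squares polynomial coefficients [a0, a1, ...] via the normal equations."""
    n = order + 1
    # Normal equations: sum_j a_j * sum(x^(i+j)) = sum(y * x^i) for each row i.
    matrix = [[sum(xi ** (i + j) for xi in x) for j in range(n)] for i in range(n)]
    rhs = [sum(yi * xi ** i for xi, yi in zip(x, y)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(matrix[r][col]))
        matrix[col], matrix[pivot] = matrix[pivot], matrix[col]
        rhs[col], rhs[pivot] = rhs[pivot], rhs[col]
        for row in range(col + 1, n):
            factor = matrix[row][col] / matrix[col][col]
            for k in range(col, n):
                matrix[row][k] -= factor * matrix[col][k]
            rhs[row] -= factor * rhs[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(matrix[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (rhs[i] - tail) / matrix[i][i]
    return coeffs


def rmse(x, y, coeffs):
    """Root mean square error of the polynomial with the given coefficients."""
    residuals = [yi - sum(a * xi ** k for k, a in enumerate(coeffs))
                 for xi, yi in zip(x, y)]
    return (sum(r * r for r in residuals) / len(residuals)) ** 0.5


# Hypothetical daily means (cloudiness, cloud shade), for illustration only.
cloudiness = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
shade = [0.10, 0.18, 0.25, 0.36, 0.44, 0.58, 0.70, 0.85, 0.93, 0.99]

for order in range(1, 5):
    coeffs = polyfit(cloudiness, shade, order)
    print(order, round(rmse(cloudiness, shade, coeffs), 4))
```

For nested least-squares fits, the in-sample RMSE defined this way cannot increase with the polynomial order. A comparison in which a lower order wins therefore needs a penalty for the number of parameters or an evaluation on held-out data; which variant underlies the tables is not stated in the text.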

Remarks on the Results.
The traditional assumption that $\tilde{S} = \tilde{C}$ is not borne out by our data. In particular, clear skies are associated, on average, with some cloud shade. We therefore reexamine the formulas in the previous section, writing, from (5) and (8), the cloud shade as the cloud cover plus a very small positive correction term. This means that on average the estimates should obey the inequality $\tilde{S} > \tilde{C}$, with the difference diminishing with increasing cloudiness (cf. (6)). Another problem arises with the estimated value of the relative sunshine, which is, as we saw before, always smaller than the theoretical value. Thus, the relation between the estimated values, equation (14), contains an additional positive term. Again, even for a vanishing correction term, this leads us to infer $\tilde{S} > \tilde{C}$, which is what the figures show. We may develop theoretical grounds to show how this additional term depends on the cloudiness, but we will not do that here. We rewrite (14), the corrected relation between estimated cloud shade and observed cloudiness, to express empirically the threshold function, equation (15), which clearly leads us to expect positive values of the cloud shade for vanishing cloudiness [3]. $\tilde{S}$ can now achieve its highest value of 1 (no sunshine throughout the whole daytime) for a value of cloudiness less than that for total cloud cover. This is also borne out by the figures. If we had independent means to determine the threshold function, say as a polynomial in $\tilde{C}$, then (15) would become a polynomial relation between $\tilde{S}$ and $\tilde{C}$, assuming the small correction term to be negligible, and this would justify the polynomial fits we have used above.
Possible further work would be to find a functional form for the threshold function, and hence for (14), but we reserve this for another, more theoretical account.
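The argument of these remarks can be sketched in symbols. The notation is an assumption made for illustration and does not reproduce the numbered equations of the original: $S$ and $C$ denote the theoretical cloud shade and cloud cover, tildes mark estimates, $\varepsilon$ is the small geometric correction, $\delta$ is the positive threshold term by which the estimated relative sunshine $\tilde{\sigma}$ falls short of the theoretical $\sigma$, and the $a_k$ are hypothetical polynomial coefficients.

```latex
\begin{align*}
S &= 1 - \sigma = C + \varepsilon, \qquad \varepsilon > 0 \text{ small},\\
\tilde{\sigma} &= \sigma - \delta, \qquad \delta > 0,\\
\tilde{S} &= 1 - \tilde{\sigma} = S + \delta = C + \varepsilon + \delta,\\
\delta(\tilde{C}) &= \sum_{k=0}^{n} a_k \tilde{C}^{k}
\;\Longrightarrow\;
\tilde{S} \approx a_0 + (1 + a_1)\,\tilde{C} + \sum_{k=2}^{n} a_k \tilde{C}^{k}
\qquad (\varepsilon \approx 0,\ C \approx \tilde{C}).
\end{align*}
```

Read this way, a nonzero intercept $a_0$ corresponds to the cloud shade observed under clear skies, and the higher-order coefficients correspond to the curvature seen in the regression plots.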

Comparing the Two Stations
After having described our results we would like to compare cloud shade and cloudiness of the two stations. Because of the closeness of the two stations and the relative uniformity of topography between them, we would expect very similar results. Figure 5 compares the cloudiness at both the airport and the weather mast. We plotted one cloudiness against the other, using only 2,922 values (2004 to 2011) because of the missing values for the previous period at the weather mast. We expect a perfect correlation clustering tightly around the bisecting line, and for higher values of cloudiness that is what we see. However, for little cloudiness most of the data points lie below this line. This means that there is, on average, a bias towards greater values of cloudiness for the airport, as compared to the weather mast's values. A likely interpretation of this shift is to be sought in the human observers at the airport. Due to problems of the perspective, the human observer easily overestimates cloudiness, especially when there are clouds with strong vertical development further away from zenith. On the other hand, at the weather mast, the cloudiness is measured with a ceilometer and therefore is subject only to instrumental and possibly algorithmic errors. This explains the bias for lower cloud covers.
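A comparison of this kind can be quantified with a mean bias and a correlation coefficient. The sketch below is only an illustration: the function names and the four paired values are hypothetical, and the original comparison is presented graphically rather than through these statistics.

```python
def mean_bias(a, b):
    """Mean of the paired differences a - b."""
    return sum(x - y for x, y in zip(a, b)) / len(a)


def pearson_r(a, b):
    """Pearson correlation coefficient of two equally long, non-constant series."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


# Hypothetical paired daily cloudiness values (fractions), for illustration only.
airport = [0.30, 0.50, 0.80, 1.00]
mast = [0.10, 0.40, 0.75, 1.00]

print("bias (airport - mast):", round(mean_bias(airport, mast), 4))
print("correlation:", round(pearson_r(airport, mast), 4))
```

A positive bias that shrinks when only days with high cloudiness are selected would correspond to the pattern described for Figure 5.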
A better match with expectations is shown in Figure 6. Here, data points are closer to the bisecting line of the regression graph. Moreover, more data were available, namely, 5,844 daily mean values for the time period between 1996 and 2011. This means that the sunshine of one station is representative for Hamburg.
This would certainly also be the case with cloudiness, if instead of a human being at the airport there were instruments to record the cloudiness. The ideal case would have been almost identical equations for the relation of sunshine duration and cloudiness, but due to different measurement techniques and instruments, differences show up in the results. Still, the results found are roughly comparable for the two stations at a linear distance of 14.3 km, and hence the two stations can be accepted as representative for the city of Hamburg.

Conclusion and Outlook
Inspired by the results of Badescu [3] and Akinoglu [9] about a small-order polynomial relation between cloudiness and sunshine, we wanted to check their findings with data from Hamburg. We were led to choose two different polynomials for the relation between cloudiness and sunshine duration in terms of cloud shade. The results, which refer to daily means, tend to confirm their findings, even if there are differences between the two spatially close stations. In any event, the analysis should be repeated with data from comparable instruments and also with more data, before a single linear or quadratic relationship can be established as stable enough to be representative of Hamburg. Other relations are to be expected if shorter time periods are analyzed.
Badescu [3] found a linear relation between cloudiness and cloud shade for Romania. Of course, for cities with other determining factors like a complicated relief, linear or quadratic relations are expected to become harder to confirm.
The relations suggested for Hamburg or Romania have practical applications, especially in the field of solar energy. With their help, it is possible to estimate the amount of solar energy reaching the Earth's surface merely on the basis of cloud amount. A relation often used, the Ångström-Prescott equation, can be adopted if there is just one parameter available at a specific location [1, 11]. With it, it is possible to calculate the daily insolation at locations where the measuring network does not provide all the necessary parameters.
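The Ångström-Prescott equation relates the daily global insolation $H$ to the extraterrestrial insolation $H_0$ and the relative sunshine duration $n/N$ through $H/H_0 = a + b\,(n/N)$. The sketch below illustrates it in Python. The default coefficients $a = 0.25$ and $b = 0.50$ are commonly quoted general-purpose values and are an assumption here, not values fitted to Hamburg; the function name and the example numbers are hypothetical.

```python
def angstrom_prescott(h0, relative_sunshine, a=0.25, b=0.50):
    """Daily global insolation from extraterrestrial insolation and n/N.

    h0 and the result share the same unit (for example MJ per m^2 per day).
    The coefficients a and b are site dependent; the defaults are assumptions.
    """
    if not 0.0 <= relative_sunshine <= 1.0:
        raise ValueError("relative sunshine must lie between 0 and 1")
    return h0 * (a + b * relative_sunshine)


# Hypothetical day: extraterrestrial insolation 40 MJ/m^2, relative sunshine 0.5.
print(angstrom_prescott(40.0, 0.5))  # 20.0
```

Where only cloudiness is observed, the relative sunshine could be approximated from a fitted cloud shade relation before applying the equation, with the caveats on the fit quality discussed in the text.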
The knowledge of incoming solar radiation at the Earth's surface is certainly helpful in deciding whether a solar plant will be cost-efficient, or which location is the most suitable for reducing costs. Many more applications are related to the topic of this paper, but our aim has been only to confirm either the classic or the newer relations between the quantities involved, by drawing on recent data from two reliable stations located in the cloudy city of Hamburg.