Skin cancer is diagnosed in more than 2 million individuals annually in the United States. It is strongly associated with ultraviolet exposure, with melanoma risk doubling after five or more sunburns. Solar activity, characterized by features such as irradiance and sunspots, undergoes an 11-year solar cycle. This fingerprint frequency accounts for relatively small variation on Earth when compared to other uncorrelated time scales such as daily and seasonal cycles. Kolmogorov-Zurbenko filters, applied to the solar cycle and skin cancer data, separate the components of different time scales to detect weaker long term signals and investigate the relationships between long term trends. Analyses of crosscorrelations reveal epidemiologically consistent latencies between variables, which can then be used in regression analysis to calculate a coefficient of influence. This method reveals that strong numerical associations, with correlations >0.5, exist between these small but distinct long term trends in the solar cycle and skin cancer. This improves the modeling of skin cancer trends on long time scales despite the stronger variation in other time scales and the destructive presence of noise.
Cancer is known to have genetic and environmental risk factors. Particular types of cancer can have a greater association with one factor than with others. One such example is that of skin cancer. Skin cancer (SC) is an unregulated growth of abnormal skin cells named after the type of skin cell from which they arise, for example, basal, squamous, and melanoma. Skin cancer is the most common cancer in the United States, affecting over 2 million annually [
The relationship between sunlight exposure, in particular that of the ultraviolet portion of the electromagnetic spectrum, and the increased likelihood of developing skin cancer has been a frequent subject of research. Studies indicate that approximately 90% of nonmelanoma skin cancer is associated with ultraviolet exposure [
The sun is an engine of nuclear fusion and as a result exhibits several measurable characteristics associated with solar nuclear activity. One defining feature is that solar activity is not constant across time. The intensity of solar activity undergoes an approximate 11-year cycle resulting in a naturally occurring pattern of maximums and minimums. Likewise, many characteristics associated with solar activity exhibit a strong cyclic nature with this approximate 11-year period [
Some of the solar characteristics that exhibit the solar cycle include electromagnetic radiation, irradiation, luminosity, magnetic field strength, magnetic polarity, flares, sunspot number, and solar wind. Every characteristic variable has unique methods of measurement and measurement histories. Likewise, the relationship of each variable to the underlying solar cycle phenomenon is different and not automatically synchronous or perfectly correlated [
Total solar irradiance (TSI) measures the total power across the entire electromagnetic spectrum emitted by the sun and received per unit surface area. Measurements may be made in orbit (OTSI) or at ground elevations (GTSI). Sunspots, another solar characteristic, are patches on the surface of the sun characterized by locally diminished brightness and temperature that correspond to the changing magnetic field within the sun [
These time series data sets illustrate one difficulty when investigating long term trends and correlations between variables in order to determine association structures and causative relationships. The relative strength of the components from one time scale can obscure those operating in a different time scale, interfering with the detection of a signal of interest. The separation of different time scales into constituent components allows for the proper unobstructed analysis. Kolmogorov-Zurbenko (KZ) filters (
There is scarce research into global effects of solar irradiation and intensity changes upon individual diseases and disease rates. The objective of this study is to examine the long term changes in SC and SN as a proxy for TSI and investigate the solar cycle and skin cancer relationship. This study demonstrates KZ filtration of signals into different time scale components precisely because different time scales result from different sources and may interfere in the analysis of each individual component [
The sunspot number time series dataset comes from a record with dates spanning the years from 1749 until the present consisting of monthly observed sunspot counts. This record is available from the Solar Influences Data Analysis Center (
Skin cancer records arise from case level data in the SEER, or Surveillance, Epidemiology, and End Results database, 1973–2009 (Surveillance, Epidemiology, and End Results (SEER) Program (
For the analysis it is necessary to prepare the data and establish common unit time measures, in this case monthly observations. While several of the variables include measurements on shorter units of time, employing shorter units of measure becomes unnecessary when exploring long term trends, global scale changes, and events with great periods of latency. SN and TSI data sets have a time series representation with summarized monthly observations. To convert the SC case dataset to the same observational time scale we collapse cases into the count within each month.
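The collapse of case level records into monthly counts can be sketched as follows. This is a minimal illustration only, assuming each case carries a diagnosis date; the function name `monthly_counts` and the sample records are hypothetical and are not taken from the SEER data.

```python
from collections import Counter
from datetime import date

def monthly_counts(diagnosis_dates, start, end):
    """Count cases per calendar month, keeping months with zero cases.

    start and end are inclusive (year, month) tuples.
    """
    counts = Counter((d.year, d.month) for d in diagnosis_dates)
    series = []
    year, month = start
    while (year, month) <= end:
        series.append(((year, month), counts.get((year, month), 0)))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return series

# Hypothetical case records (illustrative only, not SEER data).
cases = [date(1984, 1, 5), date(1984, 1, 20), date(1984, 3, 2)]
print(monthly_counts(cases, (1984, 1), (1984, 3)))
# [((1984, 1), 2), ((1984, 2), 0), ((1984, 3), 1)]
```

Months without cases are kept as explicit zeros so that the resulting series stays evenly spaced, which the later filtering and crosscorrelation steps rely on.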
Upon initial inspection, skin cancer case data exhibits a nonconstant variance among the observations as well as a growth rate which one would expect from changing population statistics across time (Figure
SN and SC monthly data.
The long term growth rate mentioned appears to have two distinct periods with different rates, occurring before and after approximately 1984. The possible reasons for this change are numerous and worth investigation but are not explored in this research. For the purpose of this analysis, the SC data used spans 1984 through 2009, a period of relatively stable and consistent growth. For the time series analysis, the linear trend is removed from the natural logarithm transformed SC cases. The linear trend in log scale corresponds to exponential growth in the original case data. After trend removal, the remaining deviations from the trend comprise our dataset for continued analysis.
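The log transformation and trend removal described above can be sketched as a least squares fit of ln(count) against a monthly time index. This is an illustrative sketch rather than the code used in the study; the function name `log_linear_detrend` and the synthetic series are assumptions, and all counts are assumed to be strictly positive.

```python
import math

def log_linear_detrend(counts):
    """Fit ln(count) = a + b*t by least squares and return (a, b, residuals).

    t is the observation index (0, 1, 2, ...); counts must be positive.
    """
    n = len(counts)
    t = list(range(n))
    y = [math.log(c) for c in counts]
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    b = sxy / sxx                      # slope of the trend in log scale
    a = y_mean - b * t_mean            # intercept of the trend in log scale
    residuals = [yi - (a + b * ti) for ti, yi in zip(t, y)]
    return a, b, residuals

# Synthetic series with exact exponential growth (illustrative only).
counts = [100 * math.exp(0.0034 * t) for t in range(24)]
a, b, resid = log_linear_detrend(counts)
print(round(b, 4))
```

For a series that grows exactly exponentially the residuals vanish; for real case data they are the deviations from the trend that carry forward into the analysis.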
The absence of any sizable long term linear trends throughout our time period of interest for SN and our other datasets representing solar activity makes a similar process of trend analysis and removal unnecessary. In fact, solar activity does exhibit even longer term patterns of fluctuation, patterns across centuries, much greater than the 11-year period of interest [
Most visible in the frequency domain, different time scales are likely rooted in different physical processes and thus arise from different causes. Our datasets viewed in a time domain appear as a compilation of the various influential time scales. Each dataset exhibits several strong features indicative of their respective time scales. The solar data most prominently exhibit a cyclic pattern with an approximate 11-year period. This is the solar cycle referenced. A smoothed Kolmogorov-Zurbenko periodogram displays a spike at the frequency,
SN spectra resulting from application of KZP algorithm with parameters
Viewed in the time domain, SC exhibits different characteristics. The first characteristic is the visible upward trend across time discussed previously. The second is a cyclic pattern that appears to repeat with an approximate one-year period. A corresponding DZ smoothed periodogram has a peak near 0 corresponding to the trend and a spike at a frequency corresponding to 1 year (Figure
Ln(SC) spectra resulting from application of KZP algorithm with parameters
Due to the strength of signals present throughout other time scales, a given time scale of relatively less signal strength may be obscured. In order to investigate a particular time scale it is necessary to separate and remove those that are interfering. Kolmogorov-Zurbenko filters are low pass filters characterized by two parameters [
KZ filtered Ln(SC) spectra resulting from application of KZP algorithm with parameters
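As a rough illustration of the low pass filtering step, a KZ filter with window m and k iterations can be written as k repeated passes of a centered moving average of width m. The sketch below is not the implementation used in the study; the function names are hypothetical, and the choice to shrink the window at the ends of the series is an assumption about edge handling.

```python
import math

def moving_average(x, m):
    """Centered moving average with odd window m; the window shrinks at the edges."""
    half = (m - 1) // 2
    out = []
    for i in range(len(x)):
        window = x[max(0, i - half): i + half + 1]
        out.append(sum(window) / len(window))
    return out

def kz_filter(x, m, k):
    """Apply k iterations of a centered moving average of odd width m."""
    if m < 1 or m % 2 == 0:
        raise ValueError("window m must be a positive odd integer")
    y = list(x)
    for _ in range(k):
        y = moving_average(y, m)
    return y

# Synthetic monthly series containing only an annual cycle (illustrative only).
annual = [math.sin(2 * math.pi * t / 12) for t in range(120)]
smoothed = kz_filter(annual, 13, 2)
# Largest remaining amplitude away from the edges of the series.
print(max(abs(v) for v in smoothed[12:108]))
```

With m = 13 and k = 2 on monthly data, a cycle with a 12-month period is reduced to well under 1% of its original amplitude away from the edges, while much slower variation passes through largely intact.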
The data sets are then crosscorrelated to better understand the relationship between them. Note that for this study each pairwise crosscorrelation between two datasets only utilizes observed points beginning with the latest commencement of any dataset timespan and likewise ending at the first cessation of any dataset timespan. To account for possible latencies, or lags, in any possible causal relationship, data points from one variable are paired with opposing data points counted backward in time by
After crosscorrelations are calculated for all possible latencies, the latencies associated with peak correlations are selected and used to perform regression analysis between the variables. Regression analysis, in this case simple linear regression, provides a good fit and allows us to characterize the relationship and see how one variable is associated with the movement in another. Finally, the coefficient of explanation,
After KZ(13,2) filters are applied to SN, OTSI, and GTSI, crosscorrelation between paired datasets confirms the strong correlations between the long term variation of each of these variables. The peak value in crosscorrelation between SN and OTSI is
SN and OTSI long term (>1 yr) time scale components.
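The lagged crosscorrelation used throughout the analysis can be sketched as the Pearson correlation between one series and a second series shifted backward in time. The version below is a simplified illustration: it assumes both series share the same monthly index and length, so the overlap shortens as the lag grows, and the names `pearson` and `lagged_crosscorrelation` are hypothetical.

```python
import math

def pearson(u, v):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)

def lagged_crosscorrelation(driver, response, max_lag):
    """Correlate response[t] with driver[t - lag] for lag = 0, ..., max_lag."""
    n = len(response)
    return {lag: pearson(driver[:n - lag], response[lag:])
            for lag in range(max_lag + 1)}

# Synthetic example: an 11-year (132-month) cycle and a copy delayed by 24 months.
driver = [math.sin(2 * math.pi * t / 132) for t in range(400)]
response = [math.sin(2 * math.pi * (t - 24) / 132) for t in range(400)]
cc = lagged_crosscorrelation(driver, response, 60)
best_lag = max(cc, key=cc.get)
print(best_lag)  # 24
```

Because the driving signal is periodic, peaks recur at lags separated by roughly one cycle length, which is why a set of candidate latencies appears rather than a single one.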
Pairing OTSI and SN at 0 latency enables characterization of the relationship between these variables. Fitting a linear regression model produces a slope coefficient of 0.0081 (Figure
OTSI plot versus SN with 0 latencies.
The slope coefficient indicates an increase of 0.0081 W/m² associated with each additional unit of monthly sunspot count, or, more befitting the range of SN, 0.81 W/m² per 100 SN. This value enables extension of the analysis from SN to TSI despite the short observational history of TSI.
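The regression step and the rescaling of its slope can be illustrated with a small ordinary least squares sketch. The data below are synthetic and noise-free, built from the reported slope of 0.0081; the baseline value of 1360.0 and the function name `fit_line` are assumptions made only for illustration.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x.

    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    return slope, intercept, 1.0 - ss_res / ss_tot

# Hypothetical noise-free data: baseline 1360.0 is an illustrative assumption.
sn = [0, 25, 50, 100, 150, 200]
otsi = [1360.0 + 0.0081 * s for s in sn]
slope, intercept, r2 = fit_line(sn, otsi)
per_100_sn = 100 * slope   # change in W/m² associated with 100 additional SN
print(round(slope, 4), round(per_100_sn, 2))
```

Expressing the slope per 100 SN simply multiplies it by 100, which puts the coefficient on a scale comparable to the swing of a solar cycle.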
With SC it was necessary to transform the data prior to crosscorrelation. First was the previously described natural logarithm transformation. Second was the removal of an upward trend. The construction of a linear trend using least squares estimation resulted in a slope coefficient of 0.0034. Given the natural log scale, 0.0034 corresponds to approximately 4.2% growth per year. Crosscorrelations between SC cases reach maximum correlations
SC crosscorrelations with SN peak at candidate latencies of 10.0, 19.9, 31.8, 42.2, 52.3, and 62.5 years, and so forth. These peak crosscorrelations range from a minimum of 0.34 occurring at 31.8 years to a maximum correlation of 0.68 at 42.2 years (Figure
SN and Ln SC long term (>1 yr) time scale components.
These correlations correspond to coefficients of explanation, ranging from
With the peaks in crosscorrelation at these candidate years, we plot our transformed SC dataset, best summarized as the deviations from the natural log SC trend, against SN data that is lagged by those respective latencies (Figure
Deviation of LN skin cancer monthly cases versus SN monthly count with the (a) 10.0, (b) 19.9, (c) 31.8, (d) 42.2, (e) 52.3, and (f) 62.5 years of latency.
Evidence suggests there is a relatively small but distinct solar cycle effect on long term SC case variation. The relative influences from other time scales, such as the long term trend and seasonal component, cloak this long term solar cycle effect. Kolmogorov-Zurbenko filters provide an effective tool to separate and screen interfering time scales, and identification of this effect is made possible by separating it from the influence of other uncorrelated time scales. Although this effect accounts for only a small percentage of the total variation in skin cancer incidence, investigating this particular frequency offers a benefit not available at other time scales: the solar cycle fingerprint is a singularly identifiable source, which enables an analysis of the coefficient of influence. Crosscorrelation at different latencies accounts for the unknown delay between risk exposure and cancer detection. The latency of peak crosscorrelation is used to determine the magnitude of long term effects and characterize the relationship between variables. Once identified, the coefficient of influence between changes during the solar cycle and SC can be applied to actual observed changes in solar intensity in other time scales even when the underlying source is indeterminate. It should be noted that the ecological design of this study, while providing a risk modifying analysis of the health effect, has both advantages and disadvantages. It is well suited for the analysis of data grouped, as in this case, both geographically and across time. This comes at the expense of generalized conclusions for the population at large that may not apply individually. Here, this does not hinder an attempt to identify the global scale, long term component of skin cancer variation.
TSI is a natural choice as a representative variable of the solar cycle effect on skin cancer. The known risk of ultraviolet light exposure on skin cancer is a compelling argument in favor of its use. Unfortunately, at this time the observational history is too short: detecting an 11-year cycle while accounting for a multidecade latency requires a longer record than is available for TSI or for any specific segment of the electromagnetic spectrum such as ultraviolet light. These more accurate TSI records, though of limited research potential here, nevertheless support the analysis that can be performed with SN. Although orbital TSI has the shortest history and ground based TSI suffers from regional influences limiting its usefulness to study global effects, both produced results similar to and compatible with those obtained using SN. Crosscorrelations with SC in the long term time scale component gave evidence of the presence of the solar cycle effect. The extension of the crosscorrelation analysis requires a far lengthier history to investigate reasonable cancer latencies. In the future, with several additional years of data, extending this analysis using TSI measures may produce interesting and even more definitive results.
With SN as the only tenable solar cycle variable with sufficient history, the study proceeds by removing the linear trend from natural logarithm transformed SC case data. A linear regression coefficient of 0.0034 on the log scale indicates that skin cancer cases grew at approximately 4.2% per year for several decades. This outpaces the approximate 1% population growth in the United States during a similar period. Clearly, population growth cannot alone account for the growth in skin cancer diagnoses, an interesting result and one worthy of continued investigation. Also, prior to 1984, SC data suggests a steady though lower growth rate than that after 1984. Future analysis could be extended to include data from this earlier period, following a more detailed analysis of the reasons behind the abrupt rate change and with proper accommodation of this feature.
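The conversion from the log scale slope to an annual growth rate can be written out as a short worked calculation. It assumes that the slope b = 0.0034 is expressed per month, which is consistent with the monthly observations used throughout but is not stated explicitly alongside the coefficient; the symbols b and g are introduced here only for illustration.

```latex
% Assumption: b is the monthly slope of the linear trend in ln(SC).
\[
  g \;=\; e^{12b} - 1 \;=\; e^{12 \times 0.0034} - 1 \;=\; e^{0.0408} - 1 \;\approx\; 0.0416 \;\approx\; 4.2\% \text{ per year}
\]
```

Under the same assumption, the uncompounded approximation 12b = 0.0408 gives nearly the same figure, since the exponential is close to linear for slopes this small.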
Crosscorrelations between SN and SC displayed the cyclic pattern with an approximate 11-year synchronicity when plotted at different latencies, further supporting the presence of a solar cycle component. These crosscorrelations attain a peak value of
When SC is plotted against SN at the latencies corresponding to each peak crosscorrelation, the association between the long term skin cancer and sunspot number datasets can be described by a linear relationship with a linear coefficient. These slope coefficients share very similar values near 0.0003, with only two exceptions creating a range from 0.0002 to 0.0004. While the time series tools do not indicate a preference for one particular latency, the similarity of the linear coefficients means that no selection is needed to interpret the relationship between SC and SN. Given the log skin cancer scale, the coefficient of linear regression has a clear interpretation: differences between natural logarithm transformed values correspond to changes on a percentage scale. A typical solar cycle can decrease to a near zero sunspot count in a given month and can peak at 150, 200, or even 250 sunspots during solar maximum [
An immediate extension of this analysis in future research is the application of the same methods to individual skin cancer types with both known sun exposure risk factors and those that have conflicting evidence as to the effect of sun exposure. In order to reveal the existence of a small and obscured solar cycle effect this study relied upon including all skin cancers for sufficient history and data records. This comes at the expense of including possibly uncorrelated cancer types, diminishing signal strength and reducing model fit. Provided that sufficient data resources are available, future analyses using this method applied to investigate particular skin cancers may be more illuminating and result in more refined models.
Knowing that associated skin cancer risk increases with increased solar activity during solar maximums and that this occurs in a well-known, predictable, cyclic pattern, there is opportunity to more effectively target education and prevention campaigns aimed at reducing skin cancer prevalence. The methods outlined in this analysis are equally applicable to similar research where the detection of a signal within a particular time scale is obscured by relatively stronger signals from different time scales, or by destructive noise. This research could be extended to the relationship between the solar cycle and other diseases that may have a long term hidden effect, or to other risk factors of disease.
Within the scope of this research project, the data was limited to records obtained within the United States. With additional datasets, particularly those outside of the US, extending this research would better clarify results to more accurately determine true global long term effects of the solar cycle. The Kolmogorov-Zurbenko filter has previously been extended and formalized in several useful applications including a spatial filter. With additional existing data elements it is possible to extend this research to include spatial data from the cancer database. Rather than pooling the data for a global effect it would then be possible to determine regional effects and develop regional models. This could first be performed by banding latitudes to account for the effect of latitude on irradiation intensity. Secondly, the analysis could be refined to individual locations accounting for local variation in meteorology and geography.
These are just a few of the possible extensions and applications of the research methods and results outlined in this study. Modeling and forecasting are only likely to improve with improved data, additional years of observations, and the inclusion of more accurate representative solar radiation variables as they become available, highlighting the need for continued TSI data collection. This study illustrates the importance of investigating long term effects that may be hidden by other time scales or noise but that significantly contribute to the understanding of disease risk and prevention.
The authors declare that there is no conflict of interests regarding the publication of this paper.