Modeling Metabolism and Disease in Bioarcheology

We examine two important measures that can be made in bioarcheology on the remains of human and vertebrate animals. These remains consist of bone, teeth, or hair; each shows growth increments and each can be assayed for isotope ratios and other chemicals in equal intervals along the direction of growth. In each case, the central data is a time series of measurements. The first important measures are spectral estimates in spectral analyses and linear system analyses; we emphasize calculation of periodicities and growth rates as well as the comparison of power in bands. A low frequency band relates to the autonomic nervous system (ANS) control of metabolism and thus provides information about the life history of the individual of archeological interest. Turning to nonlinear system analysis, we discuss the calculation of SM Pinus' approximate entropy (ApEn) for short or moderate length time series. Like the concept that regular heart R-R interval data may indicate lack of health, low values of ApEn may indicate disrupted metabolism in individuals of archeological interest and even that a tipping point in deteriorating metabolism may have been reached just before death. This adds to the list of causes of death that can be determined from minimal data.


Introduction
Big data sets are revolutionizing science. They promote insights, facilitate comprehension, and order priorities for further studies using models and powerful computers. In the past decade important advances have been made using big data sets; they range from astronomy to climate change and from biology to geology. Bioarcheology, however, has not benefited from this trend, seemingly, because big data in bioarcheology are difficult to obtain.
Bioarcheology, as defined here, is cross-disciplinary research encompassing the study of human and animal remains. The best preserved tissues are bones, teeth, and occasionally hair.
Here we show that such archived materials provide sufficient data to model life's activities such as metabolism, growth, and biologic rhythms of individuals who have died decades or even millennia ago.
Many preserved tissues have growth marks left during life which reflect the rates of growth and by extension metabolism. For example, there are "scale like" markings on hair shafts which occur at more or less regular intervals which can be measured (Figure 1). Similarly on teeth surfaces or bone sections growth lines can easily be discerned. For all of these we use the term repeat intervals (RIs) from Bromage et al. [1] to denote the histological evidence on archived remains that betray life's activities such as metabolism and growth.
We hypothesized that the growth lines (GL) in hair, measured by microscopy as a time series, provide direct measurements of hair growth rate, which in turn depends on metabolism and therefore is a proxy for that individual's metabolism during life [1,2]. By analogy heart rate time series variability provides insight into autonomic nervous system (ANS) function and can hint at diseased states [3,4].
In death, forensic time series have been linked to ANS function and may reflect on the individual's life history; these time series include the repeat intervals between growth lines (RIs) in scalp hair expressed as sizes of hair scales measured by microscopy. Also the repeat intervals between Perikymata Grooves (PG) or Striae of Retzius (SR) in the enamel in human teeth and growth lines in archosaur teeth provide other time series [1,2,5]. In addition, there are time series of osteocyte density in bone [6]. Oxygen, hydrogen, or carbon isotope ratios as well as other chemicals in hair measured along fixed intervals in the direction of growth provide time series. Here we use spectral analysis of such time series as proxies of metabolism, which provide insight into dynamic processes in operation in the individual's past life.

Materials and Methods
The annual growth rate can often be computed in the time domain.

Annual Growth Rate and Preprocessing Forensic Time
Series. The forensic time series may be discrete time , = 1, . . . , , as in the growth lines in bone and teeth or for the scale sizes in hair, or a sample = ( ), = 1, . . . , , and = Δ from a continuous time process such as chemicals measured in successive sections of bone of equal length Δ . For the discrete time process take Δ = 1, so that in both cases we have a discrete time series { } of sample size (length) . Usually this series will need to be preprocessed before it can be considered stationary Gaussian, the typical assumption for its spectral analysis.
Examining the plot versus time , that is, versus , it may show a nonzero mean, a trend over time, or an obvious annual cycle. We detrend the series if necessary by fitting a regression linê= + and replacing the series by its residuals −̂thereafter. The mean of the series is subtracted; the mean corresponds to the power at the zero frequency on the spectra, but our interest in spectral analysis sets aside consideration of the mean for separate analysis.
The next step in standardizing the time series { } is to divide by its standard deviation. This preserves all the frequency content of the series and makes two different time series (perhaps even with different units of measurement) comparable. The situations where we would not standardize both series to variance = 1 is when our interest is the comparison of the variability (variances or the power in specified frequency bands) between the series.
If examination of the plot versus distance along the hair shows an obvious annual cycle, then we can proceed directly to computing the annual growth rate of the hair.
Example 1 (mammoth). The hydrogen isotope ratio measurements ( ) at multiples of 0.3 cm are taken along a hair from a mammoth [7,8]. There is a partial annual sinusoid evident, whose periodicity is 52 weeks.
Fitting the annual sinusoid as well as a trend yields the function of length along the hair in cm: Predicted = −158 −0.727 * cm + 8.69 * sin (−0.196 * cm + 3.98) as reported in [7]. The frequency of the sinusoid is 0.196 radians/cm. Converting radians to cycles we have frequency = (0.196 radians/cm)/(2 radians/cycle) = 0.0312 cycles/cm. This times the annual growth rate (cm/year) gives the number of cycles per year, which is equated to 1 cycle/year. Thus This is the growth rate reported in Sharp et al. [7].

Computing Periodicities by Spectral Analysis of Forensic
Time Series. To identified periodicities that are more frequent than annual and less frequent than daily we compute the power spectrum of the discrete time standardized version of our annually adjusted time series using SAS PROC SPECTRA and the Fast Fourier Transform [9]. The mean is removed because it corresponds only to the power at frequency = 0, which is not our interest. Dividing each series by its own standard deviation (SD) removes the last difference in units between series, which is appropriate if we are not interested in comparing variances. In other settings the comparison of means or variance may be the goal, so this information is retained for such an analysis. Note that -axis is no longer measured in cm but in the number of Δ , just as, in the mammoth Example 1 where Δ = 0.3 cm. Now we give the spectral parameter definitions.
To be explicit, let the discrete time, stationary, Gaussian time series representing a series of measured intervals be { ( ), for = 1, . . . , } with continuous spectral density ( ), where is the frequency on the -axis. Then the periodogram ( ) is an estimate of ( ). One has Note that each sinusoid as a function of has a frequency (radians per unit of ; in this case, radians per observation) and a corresponding period 2 / . Dividing by 2 radians per cycle gives a unit of cycle per observation as an alternative scale. For heartbeat, the frequency unit would be cycles per RR interval. For teeth, frequency units would be cycles per PG deposition (SR, Lines of Anderson (LA), or GL deposition). For the mammoth hair, the frequency units would be cycles per Δ increment. The units of the periodogram (and the spectral density) can be seen from the fact (proof not shown) that the sum of ( )Δ is the variance of the ( )'s. The unit for power density on theaxis is the measurement unit squared divided by the unit of the -axis.
There is a well-known problem with the periodogram as an estimator of the spectral density; it is not consistent; it does not become better as the sample size gets larger. Thus, the usual (and better) estimate of ( ) is the spectral density estimator̂( ), which is a smoothed and locally weighted average of the periodogram [10,11]. One haŝ [ 2 ] . (3) The symbol [ ] represents the integer part of . Spectral density ( ) is symmetric about = 0 by definition (definition not shown) and ( ) is symmetric. Since , called the spectral window, is taken to be symmetric, the estimated spectral density is symmetric, which allows one to plot the spectral density only for the nonnegative frequencies 0 ≤ ≤ 2 . Note that 4 / = 2Δ , where the extra 2 represents the sum over the negative and the -axis should also be scaled by dividing by 2 , and finally that (4 / ) ÷ 2 = 2/ is the coefficient in (2).
Let us return to the mammoth example; the estimate of the spectral density of the standardized series in Figure 2 There are high frequency (0.42) and a low frequency (0.15) spectral peaks. Rearranging (1) above and including Δ provide the formula for computing periodicity. One has period = Δ (freq) (growth rate) .
(1 ) For the low frequency peak, we compute a periodicity of 3.25 weeks. Consider where each observation represents one measurement interval with Δ = 0.3 cm/obs.
Similarly, the periodicity of the high frequency peak is 1.2 weeks.

Nyquist Folding Frequency.
There is a remaining issue; forensic time series are not measured continuously and the use of Δ affects the computed spectral density; one cannot hope to measure frequencies higher than a certain value taking place within an interval of length Δ . Furthermore, the spectral density is folded over at the Nyquist folding frequency with the high frequency content above being added to the low frequency content below . For the Smithsonian mammoth hair, Nyquist folding frequency = .5 Δ = .5 cycles/obs 0.3 cm/obs = 1.67 cycles/cm.
This folding frequency times the growth rate gives a frequency of 1.03 cycles/week with a corresponding periodicity of 1/1.03 or approximately 1.0 week. Since we are not examining periodicities this low or lower, there may be no foldback contamination in the above results. We have excluded the daily cycles from our interest; a much smaller Δ would have been necessary for this purpose. Though we are not examining the daily cycles directly, it could be folded back and contaminate our spectral density computation. The high frequency that folds back to a low frequency is called an alias. The aliasing problem sometimes requires a detailed discussion. The aliases of a given frequency are + 2 , where = ±1,±2,±3,.... The daily frequency is 7 cycles/week, and for = −3, it is the alias of +0.82 cycles/week, but the observed peak is at 1/1.2 = 0.83 cycles/week. Now we are uncertain whether the observed high frequency peak is real at a periodicity 1.2 weeks or it is a contamination from the daily cycle at 1/7 weeks. See discussion. It is clear that one should formally consider the effect of the Nyquist folding frequency.

Low Frequency-High Frequency Ratio.
To compute the power in a given band of frequencies, the spectral density is integrated over the band; that is, the spectral density times Δ is summed over the frequencies in the band.
Thus, the total power or power in frequency bands is obtained as areas under the curve where the units of theaxis cancel. While the computation of power as areas (AUCs) under the spectral density from (2) above is typical, we also wish to compute the asymptotic standard error of such estimates. The formulae for power in a frequency band are adapted from Priestley [12, page 427]. One has Again the factor of 2 represents the negative frequencies. When the time series is standardized, these formulae are dimensionless and can give a measure of the spectral shape. With these asymptotic means and SEs, one can compute a -statistic and value for the comparison of two AUCs. Note that these formulae in Priestley were developed for the periodogram ( ) instead of the locally smoothed periodogram, the estimated spectral density function̂( ). Both are approximately equal to the spectral density function ( ). Koopmans [13] uses the estimated spectral densitieŝ( ) for these formulae.  are independent 2 , chi-square random variables. Since = , the expected value of the random AUC is the targeted AUC. The variance of the random AUC is the targeted variance in (3). We now model the complicated distribution of the random AUC, a weighted sum of chisquare random variables, as a single distribution ≈ 2 by the standard method of equating moments. One has estimates In terms of the moments in (3)   Here, the high frequency band is 0.27 < < 0.5 with a width of 0.23, which introduces a bias relative to low frequency band width of 0.20. We could rerun our statistics for the comparable interval 0.30 < < 0.5, but we modify our estimate of AUC 3 to be (0.20/0.23) * 0.322 = 0.28 and LF/HF = 2.10. So, we develop the following two-tailed -test. In spectral theory, the two AUCs are based on disjoint frequencies and are nearly independent. They would be more independent if the spectral windows did not overlap. We have already computed = 0.0545 and = 10.8 for the statistic AUC 2 .
For an example of comparison between two spectra, we add the data for a hair sample from a Jarkov Siberian mammoth (Figure 2(d), in blue). For the frequency band 0.07 ≤ < 0.27, including the low frequency peaks, we have AUC 2 = 0.587 ± 0.253 (SE) for Smith and AUC 2 = 0.527 ± 0.199 (SE) for Jarkov.
A test of the difference shows no difference: Comment. For a spectral analysis, these sample sizes are small: = 33 and = 44. For chemical analyses these can be larger. Growth). Longer quiescence in hair growth (telogen [14]) indicates disrupted metabolism (longer intervals of oscillations of the system) and may be a marker of the tipping point in metabolism before complete cessations of rhythmic oscillations that are the hallmarks of biological systems.

Tipping Points and Telogen Duration (Quiescence in
Here we compute the quiescent period in a basic model of reduced annual growth rate in hair. In normal human hair the telogen phase lasts approximately 3 months divided into approximately 4 periods. Hair grows for approximately 8 years and then, normally, falls out (metabolism ceases in this particular hair). Stress is known to lengthen the telogen; hormonal levels, age, and metabolism also affect the duration of the telogen. We cannot know at the time of modeling of the hair growth in what stage each hair is at the time of death. The basic model assumes (1) hair growth rate after the telogen continues independently of the telogen preceding it and independent of the telogen duration and (2) metabolism continues during the telogen phase as it was before and after, but since the hair is not growing, information about metabolism is missing in the hair record. Let be the annual quiescence period measured in weeks and let and be the reference and reduced annual growth rates of the hair, respectively.
is reduced because of the augmented quiescence period . The relationship is   The historical record may contain information about metabolism; for example, Isabella had marked hair loss before she died. However, since direct measurement of quiescence was unlikely to be recorded, computations based on comparison of growth rates give information on quiescence periods as illustrated above.
We now show graphically how the quiescence period in our basic model affects the observed sinusoid representing the annual growth cycle. We will also check whether affects the observed low frequency sinusoid representing autonomic neural system (ANS) control. Normal annual hair growth rate measured in hair growth is approximately 16 cm/year. Thus we begin with growth of 20 cm/year when there is no quiescence period ( = 0 and = 0) and weekly measurements that average 0.385 cm (see Figure 3(a)). The annual sinusoid as a function of weeks is followed by differentiation (see Figure 3(b)). Now three (3) months of quiescence are marked as missing (red) in Figures 3(a) and  3(b).
However the quiescence periods are not observed; thus the observable result is in Figures 3(c) and 3(d).
When the periodic function in Figure 3(c) is identified as an annual cycle, computations would consider the cycle as though on a 52-week -axis. Thus, the annual growth rate is computed as 16 cm/year. We have shown how a growth rate of 20 cm/year becomes 16 cm/year in our basic model with = 3 months out of the year.

Nonlinear Time Series Analysis.
We have examined spectral analysis in the frequency domain, which can be considered as linear systems analysis. There are also methods for nonlinear time series analyses and their application to chaos in dynamical systems [15]. The name most associated with this field is Takens [16]. The field provides measure of chaos, which can arise from nonlinear dynamical equations. There is also a more purely mathematical analysis [17]. We illustrate just one practical method from this large field.

Approximate Entropy Measure.
Approximate entropy (ApEn) as described by Pincus et al. [18,19] quantifies regularity in time series data. ApEn and other measures have been used extensively in the analysis of biological time series [20].
In heart rate variability the low frequency/high frequency ratios reflect the autonomic nervous system (ANS) control of the activity of the cardiac pacemaker; in our analysis these ratios reflect the ANS pacemaker of metabolism and thus the ANS control of metabolism. In heart rate variability, disease such as diabetes decreases the variability; the heart rate is fixed at a higher rate but the variability in heart rate is reduced, a sign of ANS failure due to the disease. Conversely, high variability in ApEn reflects the robustness of the system. Bone, teeth, and hair also reflect metabolism and as such reflect ANS control. ApEn depends on three parameters: the length of the time series ( ), the width of the window that defines the patterns ( ), and the tolerance that defines the closeness of the patterns ( ). ApEn measures how the pattern ( ) repeats itself within tolerance ( ) over the course of the time series. ApEn( , ) is a statistic that estimates the logarithmic difference for and + 1 in the conditional probability that runs of patterns that are close for previous repetitions remain close. Consequently, a large ApEn corresponds to an irregular time series and a small ApEn corresponds to a regular time series. ApEn for lengths of > 50 has been found to be reproducible, but the literature suggests it can also be used with ≤ 50. The pattern matching window can vary, but values between 1 and 3 are generally used. We used = 2 and + 1 = 3. A small tolerance value ( ) corresponds to a fine pattern matching and a large value corresponds to a coarser comparison. Our was selected to be scale invariant as a percentage of the standard deviation of the time series being analyzed. We found values of between 60% and 70% discriminated best for our analysis.
Thus, we use ApEn to measure the logarithmic likelihood that similar patterns of data length ( ) that are similar remain so within a tolerance ( ) on the next incremental ( + 1) comparison. In this analysis smaller values of ApEn indicate greater regularity in the data. Larger values are indicative of greater irregularities, more chaotic systems.
Example 5. The "Zweloo woman" was exhumed from a bog in Netherlands in 1951.
We examined six scalp hairs, 2000 years after her death. The approximate entropy (ApEn) was computed for the repeat intervals (RIs) defined by the sizes of hair scales along the length of the hair and for each of her six hairs separately; with = 64-105 repeat intervals, pattern width = 2 and tolerance = 80% of the total standard deviation of the time series. The mean ApEn for Zweloo's hairs was 0.84±0.05 (SD). Since tolerance ( ) and, to a limited extent, pattern ( ) are "free" parameters, these choices can be partially validated by comparison to a control group using the same parameters.
Here a control group of 4 individuals had a mean ApEn of 0.71 ± 0.10.

Results and Discussion
For the methods outlined above, some operational aspects are now considered.
The Nyquist folding frequency probably is not a problem for measured RIs, since generally they are not sampling from a more continuous series. The RIs for hair are deposited in multiple of whole days with the multiple of days being related in an algometric fashion to the species' body mass; the whole days for periodic deposits to tooth enamel were 1 day for smallest bodied primates to 11 days for largest bodied primates and 8 days for humans [1]. Thus the daily cycles are not explicitly present in the RI data. This is not so for the continuous chemical record. The usual engineering solution to the Nyquist folding frequency problem is to design so that there is no power for frequencies beyond . This is a consideration for the chemical times series, since the sampling rate Δ may be under the experimenter's control. Even if this is not an option open to us, this spectral peak could still be real (uncontaminated), provided we knew the power of the daily cycle was low.
The data segments caused by missing values are pooled, laid end-to-end. The laying short time series end-to-end (concatenated) to form a longer series may cause difficulty. However, this difficulty is handled automatically in a similar situation for the spectral analysis of heart rate recordings where gaps occur due to technical problems or are introduced to eliminate periods of anomalous heartbeats for separate consideration. We follow this convention unless the difficulties become too large.
An important assumption for the distributions of spectral estimates in the section so named is that the choice of band width for the spectral window is wide enough for that the smoothed estimate of each spectral density function is consistent and narrow enough that estimates of adjacent spectral densities are approximately independent. The series length must be large enough that the two conditions on the spectral window can be met.
Among several additional methods in use for nonlinear time series analysis, there are generalizations of the two basic methods (ApEn and FFT) used herein. First is the replacement of the deterministic rules used in approximate entropy (ApEn) with fuzzy logic rules yielding an improved algorithm (fApEn) that could help with the choice of the tolerance parameter ( ) [21,22]. Second is the Hilbert-Huang Transform (HHT) which is designed as a time-frequency analysis of nonlinear, nonstationary time series [23]. Compared to the spectral analysis using Fast Fourier Transform (FFT), which is designed as a frequency analysis of linear time series that are stationary in time, the HHT could help with periodicity estimation. The Hilbert-Huang Transform is implemented in a file exchange (hht) in MATLAB and in a package (hht) in the R language. For our basic model of quiescence, we assume that the growth rate when the hair is growing is always constant and normal. Then the length of the quiescence intervals is the major effect on the annual growth rate and the effect is algebraic. If the disruption in metabolism affects both and the growth rate when the hair is growing, then we would need a model that connects increased quiescence to change (lowering) in the (instantaneous) growth rate when growing.
Does quiescence affect the frequency (periodicity) of the low frequency peak, the peak most related to autonomic nervous system (ANS) control? No doubt it does in the same manner that quiescence affects the annual growth cycle. Nonetheless, the computation of the periodicity of the low frequency peak is from the same hair growth record where we compute the growth rate usually reported. As such, it is comparable and useable on its face.