A Bayesian Approach to Multistage Fitting of the Variation of the Skeletal Age Features

Accurate assessment of skeletal maturity is important clinically. Skeletal age assessment is usually based on features encoded in ossification centers. Therefore, it is critical to design a mechanism to capture as much as possible characteristics of features. We have observed that given a feature, there exist stages of the skeletal age such that the variation pattern of the feature differs in these stages. Based on this observation, we propose a Bayesian cut fitting to describe features in response to the skeletal age. With our approach, appropriate positions for stage separation are determined automatically by a Bayesian approach, and a model is used to fit the variation of a feature within each stage. Our experimental results show that the proposed method surpasses the traditional fitting using only one line or one curve not only in the efficiency and accuracy of fitting but also in global and local feature characterization.


Introduction
Hand X-ray shown in Figure 1 is commonly used for skeletal age assessment in pediatric radiology. A discrepancy between skeletal maturity and the chronical age may indicate the presence of some abnormality in skeletal growth. This abnormality has been found to be related to various diseases such as endocrine disorders [1], metabolic/growth abnormalities [2], malformations and bone dysplasias [3], and gonadal dysgenesis [4]. Therefore, the assessment of skeletal maturity has become more and more important clinically. Clearly the accuracy in assessment is of the first concern.
Features encoded in ossification centers form the basis for assessment. If we know the exact characteristics of the features with regard to different stages of ages, we can do the best job on assessment. In reality, one needs a mechanism to capture such characteristics of features. Given data of a feature with respect to skeletal ages, a simple and common approach is to fit a line or a curve, which in turn is used for future prediction of new patients or assisting radiologists to understand the variation rules of the feature.
For instance, Figure 2(a) shows the variation of a ratio feature [5,6] in vertical axis with regard to the increasing skeletal age along the horizontal axis from newborn to 19 year old boys. (More details on this ratio are provided in Section 3.2.) Here in the figure, a single line is used for fitting the values of the feature. Obviously, a line is not enough to capture the characteristic of the values of the feature. A quadratic curve, shown in Figure 2(c), does not do a good job either. Fitting a more complex curve does not seem to be a feasible approach. This is because sometimes there are available only a small amount of data which could restrict the learning of complex curves, and local properties (with respect to the time) of the feature are often lost when fitting a global complex curve, and thus leading to inaccurate future prediction.
In this paper, we propose to fit the variation of features of the skeleton age via a multistage fitting approach. With our approach, we divide the skeletal age axis into several stages or phases, and within each stage, a relative simple model (line or curve) is employed for the purpose of fitting. Usually, the variation of a feature does not follow a simple rule 2 Journal of Biomedicine and Biotechnology when skeletal age increases. Instead, it often shows different variation patterns among different stages of age. As shown in Figures 2(b) and 2(d), multistage fitting not only can capture the entire pattern of feature variation but also carry the local properties regarding the skeletal age. A critical question is then, how does one determine the appropriate positions to separate the stages? The proposed Bayesian cut in this paper provides an answer via a Bayesian approach.
The rest of the paper is organized as follows. In Section 2, we describe our models for fitting, where the Bayesian cut is introduced. In Section 3, we present our experimental results on multi-stage fitting for artificial and real data. We conclude our paper in Section 4.

The Proposed Method
In this section, we first describe our proposed method for a simple case and then extend it to a general scenario.
Given a sequence of values f 1 , f 2 , . . . , f n , which denotes the skeletal age f in an ascending order, consider the linear relationship between f and one feature y found in the hand X-ray (e.g., length of digit). Usually, such a linear relationship varies as the skeletal age increases. That is, one linear form established for one interval of the skeletal age may not hold for the next interval, where a different linear form should be used. The time where two linear forms differ is called a change point. Our model that takes into account linear relationships and change points is stated as follows: . . .
where t 1 , . . . , t k−1 (correspondingly f 1 , . . . , f k−1 ) indicate the sequential change points, t j − t j−1 ≥ 3 (j = 1, . . . , k), and ji (for all i) are independent N(0, σ 2 j ) and ji (for all i, j) are independent of each other. In the model, the parameters β j1 , β j2 , σ 2 j , t j are all unknown, which will be estimated in light of the given data. The interval [t j − t j−1 ] represents the jth stage or phase, denoted by ph j . The main task here is to estimate the times t j . Given the estimates of t j , the linear forms and the associated parameters can be obtained through the traditional regression technique. We note that the requirement t j − t j−1 ≥ 3 ( j = 1, . . . , k) is needed for estimation of the regression lines. When k = 2, the model will be reduced to the two-phase regression with a single change point in [7].
The above model that uses only one dependent variable f can be generalized to include multiple independent variables. This generalization leads to the following model: , and ji are as the same as before. We refer p as the cardinality of the input vector f i , denoted by C(f i ), and the number of sample points in ph j as the cardinality of [t j − t j−1 ], denoted by C(ph j ). We note that though linear regression is used for each phase in model (2), this model certainly encompasses other nonlinear cases such as polynomial forms.
We now describe a Bayesian approach to estimate the change points.
. . , y tj ) T by y j , (y 1 T , . . . , y k T ) T by y, and (t 1 , . . . , t k−1 ) by t. For simplicity, we assume the noninformative or uniform prior for − → β j ( j = 1, . . . , k), ln(σ j 2 ) and t. Noninformative priors are used when information about parameters is completely unknown or when proper priors such as conjugate priors do not apply. (For a vigorous discussion on the choice of priors, see [8].) We can show the following main result (see the Appendix). Given the data y and the uniform prior for and t, where the number k is predetermined, the posterior probability that change points occur at t is where J = ( t 2 (n−kp)/2 Journal of Biomedicine and Biotechnology has its maximum, that is, t * = arg max t p(t | y). We call t * the Bayesian cut, and the value 2 (n−kp)/2

Experiments
In this section, we perform the Bayesian cut on two data sets: one is synthesized and the other is real. We use the synthesized data for performance evaluation in terms of recovery of changing points. The real data are used to discover the Bayesian cut and describe the feature in a multistage way which has more accurate prediction of the skeletal age compared with fitting by a single line or curve. Both linear and nonlinear regression are used for comparison. For convenience, we call the fitting with a single line or curve the single fitting and the fitting with the Bayesian cut the Bayesian cut fitting.

Synthesized Data.
We consider five cases or models describing the relationship between the dependent and independent variables. These are shown in Table 1 where the input vector f i for models m 1 , m 2 , m 3 , m 4 , and m 5 is respectively. The data are generated according to the setting given in Table 2. Specifically, β ji is randomly chosen from (−5.0, 5.0). ji is generated from a normal distribution with mean 0 and variance σ 2 j randomly selected from (0, 5 C(fi)−1 ). The number of sample points of the jth phase C(ph j ) is randomly selected from the set {(C(f i ) + 1), . . . , (C(f i ) + 1) + s}, where s is predetermined. f i takes the value of i for i = 1, 2, . . . , t k . Note that we use a variable bound for σ 2 j for taking into account the influence of the highest degree of the polynomial. Also, we use the variable number of sample points for each phase by introducing unbalance and scalability factors such that the performance evaluation will be more objective.To present a quantity on the performance of the Bayesian cut, we use the metric absolute deviation (AD), defined as where t * j represents the jth element of t * (the Bayesian cut). Intuitively, the smaller AD is, the closer is the Bayesian cut t * to the true change points t. Table 3 shows the AD values. They are obtained by ranging k from 2 to 4 and s from 1 to 10. For given k, s, and a given model, 50 trials are performed to generate data, leading to 50 datasets {(F, y)}. We find the Bayesian cut t * for each (F, y) and a given model. The final AD score is obtained by averaging the 50 runs.
Our findings can be summarized as follows. Regardless of linear or nonlinear regression, the Bayesian cut performs well with low AD scores. Introducing the unbalance and scalability factors does not deteriorate the performance of the Bayesian cut significantly. The Bayesian cut scales well when the number of change points increases.

Real Data.
In this part, we apply the Bayesian cut fitting to some real data from our database shown in Table 4. This table describes feature values with regard to the increasing skeletal age that ranges from newborn to 19-year-old boys (shown in column 1) labeled by radiology experts. In order to obtain features independent of the size and the length of digits, two ratio features are used according to the paper [5]. One is L 1 /L 2 , the ratio of the length of distal phalanx L 1 to that of middle phalanx L 2 of the middle digit, and the other is L 2 /L 3 , the ratio of the length of middle phalanx L 2 to that of proximal phalanx L 3 . See Figure 3 for illustration of L 1 , L 2 , and L 3 . These two features correspond to columns 2 and 3 which are generated in the light of the algorithm in [6]. Columns 4 and 5 represent normalized values of L 1 /L 2 and L 2 /L 3 , respectively. This normalization is done according to (x − μ)/σ, where μ is the expectation of x and σ is the variance. In our experiments, only normalized values are used. Figure 4 shows some of the Bayesian cut fitting, where features n(L 1 /L 2 ) and n(L 2 /L 3 ) are used, models describing the relationship between the feature and the skeletal age are    Table 1, and k takes values of 2, 3, and 4. In Figure 4, the horizontal axis represents the age and the horizontal axis indicates the feature. For model m 1 , the blue straight line across the entire age range is from the single (line) fitting. For model m 2 , the blue curve across the entire age range is from the single (quadratic) fitting. All red (broken) lines are from the Bayesian cut fitting.

Conlcusion
In this paper, we propose the Bayesian cut fitting to describe features in response to the skeletal age. In the semantic space derived by our approach, the axis of skeletal age is divided into meaningful stages, within each of which the variation pattern of a feature is consistent so that a traditional regression technique can apply to model the relationship between the skeletal age and the feature. Our approach is inspired by the observation that the variation pattern of a feature can differ in different periods of the skeletal age. A critical issue is to determine the times or change points when the variation pattern of a feature changes. This is handled by the Bayesian cut proposed in this paper. Simulations have been used to demonstrate the efficiency of the Bayesian cut fitting in terms of recovery of change points. The experiments on real data show that given a type of relationship (e.g., linear or quadratic) between the skeletal age and a feature, the Bayesian cut fitting surpasses the traditional single fitting when the consistency of the variation pattern (over the entire skeletal age range) of the feature is suspected. One major issue which is not addressed in this paper is the determination of k, the number of stages. Selection of k depends on the given data and the practical need. We leave this as our future research work.