Structured spatial point patterns appear in many applications within the natural sciences. The points often record the locations of key features, called landmarks, on continuous object boundaries, such as anatomical features on a human face. In other situations, the points may simply be arbitrarily spaced marks along a smooth curve, such as on handwritten numbers. This paper proposes novel exploratory methods for the identification of structure within point datasets. In particular, points are linked together to form curves that estimate the original shape, of which the points are the only recorded information. Nonparametric regression methods are applied to polar coordinate variables obtained from the point locations, and periodic modelling allows closed curves to be fitted even when data are available on only part of the boundary. Further, the model allows discontinuities to be identified to describe rapid changes in the curves. These generalizations are particularly important when the points represent shapes which are occluded or intersecting. A range of real-data examples is used to motivate the modelling and to illustrate the flexibility of the approach. The method successfully identifies the underlying structure, and its output could also be used as the basis for further analysis.

Many scientific investigations involve the recording of spatially located data. These data might summarize objects within an image as digitized versions of continuous curves. Once the data are collected, the original context is often lost, and the aim of the analysis is to identify which points are associated with each other and to link the points to reconstruct the original shape. The linked points can then be seen as estimates of continuous curves and object outlines. If the original scene contains multiple structures, then the analysis must also divide the points into groups, with separate curves used to describe the points in each group. It is important to note that this is likely to form only the first part of an analysis and hence can be seen as exploratory data analysis.

This paper looks at the use of smoothing splines to identify and describe geometric patterns in sets of points. It is assumed that the points lie on smooth curves but that a dataset may contain multiple intersecting curves. It is vital that this be done in a nonparametric way so that the widest possible range of patterns can be highlighted. In general, these are closed, or nearly closed, curves and so a transformation to polar coordinates is used to simplify the analysis. Intersecting curves are described by allowing discontinuities in the fitted curves. These procedures are illustrated using simulated data and varied real datasets describing human faces, gorilla skulls, handwritten number 3's, and an archaeological site. These provide a wide variety of point patterns and reinforce the general usefulness of the proposed methods. For a detailed mathematical description and applications of shape-based analysis of points, refer to, for example, Batschelet [

To allow for this wide variety of possible curves, a nonparametric fitting approach, such as splines, can be used (see, e.g., [

It is important to note that there are many existing general frameworks for performing spline-based regression. For example, multivariate adaptive regression splines (MARS) [

A brief introduction to splines, along with the extension to circular data, is given in Section

A smoothing spline is a nonparametric curve estimator that is defined as the solution to a minimization problem. It provides a flexible smooth function for situations in which a simple polynomial or nonlinear regression model is not suitable. For a set of

The above objective function consists of two parts: the first measures the agreement of the function and the data and the second is a roughness penalty reflecting the total curvature—this can also be interpreted in a Bayesian setting as the likelihood and prior. Hence, for given

Figure

Smoothing spline fits to

For this dataset, the periodic nature of the sine function has, so far, been ignored, and it is clear that the fitted values at the extreme left and right do not match exactly. For such datasets, made up of angles or directions, ignoring the periodic nature of the measurements when smoothing may produce unacceptable edge effects. A simple approach for dealing with this issue will now be considered.
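The edge effect can be checked numerically. The following is a minimal sketch, assuming hypothetical simulated data (noisy sine values) and SciPy's `UnivariateSpline` as one possible smoothing spline implementation; the sample size, noise level, and smoothing value `s` are illustrative choices and are not taken from the paper.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

rng = np.random.default_rng(0)

# Hypothetical data: noisy observations of sin(theta) on [0, 2*pi).
n = 60
sigma = 0.1
theta = np.sort(rng.uniform(0.0, 2.0 * np.pi, n))
r = np.sin(theta) + rng.normal(0.0, sigma, n)

# Cubic smoothing spline. In SciPy, `s` is a target for the residual sum of
# squares, so a larger value gives a smoother curve; n * sigma**2 is a
# rule-of-thumb choice (an assumption, not the paper's setting).
spline = UnivariateSpline(theta, r, k=3, s=n * sigma**2)

# Compare the fitted value and slope at the two ends of the period.
# A non-periodic fit gives no guarantee that these agree.
left, right = 0.0, 2.0 * np.pi
slope = spline.derivative()
value_gap = float(abs(spline(left) - spline(right)))
slope_gap = float(abs(slope(left) - slope(right)))
print(f"value gap: {value_gap:.3f}, slope gap: {slope_gap:.3f}")
```

Nothing in the ordinary fit forces either gap to be zero, which is the mismatch that a periodic treatment is meant to remove.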

Suppose that the dataset is made up of paired angles and distances which will be denoted as

As an illustration, consider Figure

Periodic spline fits to

Once the curve is fitted, a residual sum of squares (RSS), calculated on the original data values, can be used as a measure of goodness-of-fit. Here this will be calculated using the radial distances with definition

Of course, the approach could lead to a poor fit if the data are not periodic, but to prevent this it is possible to allow for a discontinuity in the relationship. Here the approach of Gu [

Suppose that the points

Consider the

(a) Data from sine function with a discontinuity; (b) best two-part curve; (c) residual sum of squares for two-part curves.
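The search for the best two-part curve can be sketched as a scan over candidate split positions, keeping the split with the smallest total RSS. The sketch below is illustrative only: it assumes hypothetical simulated data, and it deliberately swaps in a cubic least-squares polynomial (`np.polyfit`) for the spline on each part to keep the example short. The function names `rss_poly` and `best_two_part_split` and the minimum part size are assumptions, not the paper's notation.

```python
import numpy as np

def rss_poly(x, y, degree=3):
    """Residual sum of squares of a least-squares polynomial fit.

    A cubic polynomial stands in for the spline fit on each part.
    """
    coef = np.polyfit(x, y, degree)
    return float(np.sum((y - np.polyval(coef, x)) ** 2))

def best_two_part_split(x, y, min_size=8, degree=3):
    """Scan split positions k; the first k ordered points form part one.

    Returns the best k, its total RSS, and the RSS for every candidate k.
    """
    totals = {}
    for k in range(min_size, len(x) - min_size + 1):
        totals[k] = (rss_poly(x[:k], y[:k], degree)
                     + rss_poly(x[k:], y[k:], degree))
    k_best = min(totals, key=totals.get)
    return k_best, totals[k_best], totals

# Hypothetical data: a sine curve with a jump after the first 50 points.
rng = np.random.default_rng(1)
n = 80
jump_at = 50
theta = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
r = np.sin(theta) + rng.normal(0.0, 0.05, n)
r[jump_at:] += 1.5

k_hat, rss_hat, rss_by_split = best_two_part_split(theta, r)
print("estimated split position:", k_hat)
```

Plotting `rss_by_split` against the split position gives a profile comparable to panel (c), with the minimum marking the estimated discontinuity.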

To motivate the modelling, consider an unobserved true scene containing a few objects of various shapes and sizes, with possible overlap. However, instead of the scene being recorded faithfully, only partial information is taken and, in particular, only points along the edges of the objects are recorded. These points might be chosen to identify features with special significance or they might simply be at equal or random locations along the edge. Further, due to overlaps, points from the full edge may not be in the dataset. Once collected, there is no record of which points are from which object, and no record is kept of possible object shapes nor even the number of objects. Hence, let the dataset consist of a collection of

Figure

Real datasets: (a) human face and (b) handwritten number 3.

Before the periodic smoothing spline approach can be applied, the data must first be transformed to polar coordinates. First define a

To illustrate the transformation and the subsequent spline smoothing, consider the simulated data in Figure

(a) Simulated data; (b) polar coordinate data with fitted spline curve; (c) data with back-transformed fitted curves. In (b) and (c) the solid curves use standard splines, whereas the dashed curves use the periodic spline.
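The transformation between panels (a), (b), and (c) can be sketched as follows. This is a minimal illustration under stated assumptions: the centre is taken to be the centroid of the points unless one is supplied, and the function names `to_polar` and `to_cartesian` are hypothetical. The example points lie on an ellipse chosen purely for demonstration.

```python
import numpy as np

def to_polar(xy, centre=None):
    """Convert planar points to (angle, radius) pairs about a centre.

    Assumption: if no centre is given, the centroid of the points is used.
    Angles are mapped to [0, 2*pi) and the output is sorted by angle.
    """
    xy = np.asarray(xy, dtype=float)
    if centre is None:
        centre = xy.mean(axis=0)
    d = xy - centre
    r = np.hypot(d[:, 0], d[:, 1])
    theta = np.mod(np.arctan2(d[:, 1], d[:, 0]), 2.0 * np.pi)
    order = np.argsort(theta)
    return theta[order], r[order], centre

def to_cartesian(theta, r, centre):
    """Back-transform (angle, radius) pairs to planar coordinates."""
    return np.column_stack([centre[0] + r * np.cos(theta),
                            centre[1] + r * np.sin(theta)])

# Hypothetical example: 12 points on an ellipse centred at (3, 1).
t = np.linspace(0.0, 2.0 * np.pi, 12, endpoint=False)
xy = np.column_stack([3.0 + 2.0 * np.cos(t), 1.0 + np.sin(t)])

theta, r, centre = to_polar(xy)
back = to_cartesian(theta, r, centre)
print("centre:", np.round(centre, 3))
```

Any curve fitted to the (angle, radius) pairs can be passed through `to_cartesian` in the same way to draw it over the original points.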

Figure

(a) Occluded data; (b) polar coordinate data with fitted spline curve; (c) data with back-transformed fitted curves. In (b) and (c) the solid curves use standard splines, whereas the dashed curves use the periodic spline.

To summarize, the application of smoothing splines to periodic point data has proved successful in the simulated examples. The modification of duplicating the data is a simple yet effective way to create closed curves and to interpolate where data are missing. The approach has provided a robust and informative reconstruction of the unknown curve from the data.
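The duplication idea can be sketched in a few lines. The example below is an illustration under stated assumptions, not the paper's implementation: copies of the data shifted by one full turn in each direction are appended before an ordinary smoothing spline is fitted, SciPy's `UnivariateSpline` is used as the smoother, and the data (a noisy oval with an arc removed to mimic occlusion) are hypothetical. The function name `fit_periodic_spline` and the smoothing value are likewise assumptions.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

def fit_periodic_spline(theta, r, s):
    """Fit a smoothing spline to the data plus copies shifted by +/- 2*pi.

    The fit is then only read off on [0, 2*pi], where the two ends
    approximately agree because each sees the same neighbouring data.
    """
    order = np.argsort(theta)
    theta, r = theta[order], r[order]
    theta_ext = np.concatenate([theta - 2.0 * np.pi, theta, theta + 2.0 * np.pi])
    r_ext = np.tile(r, 3)
    # Three copies of the data, so the RSS target is scaled by three.
    return UnivariateSpline(theta_ext, r_ext, k=3, s=3.0 * s)

# Hypothetical polar data: an oval outline with an occluded arc.
rng = np.random.default_rng(2)
sigma = 0.02
theta = np.sort(rng.uniform(0.0, 2.0 * np.pi, 90))
theta = theta[(theta < 2.0) | (theta > 3.0)]  # drop the occluded arc
true_r = 1.0 + 0.3 * np.cos(2.0 * theta)
r = true_r + rng.normal(0.0, sigma, theta.size)

fit = fit_periodic_spline(theta, r, s=theta.size * sigma**2)

# Back-transform the fitted radii to obtain a (nearly) closed curve
# about the origin, including the arc where no data were observed.
grid = np.linspace(0.0, 2.0 * np.pi, 361)
r_hat = fit(grid)
closed_curve = np.column_stack([r_hat * np.cos(grid), r_hat * np.sin(grid)])
closure_gap = float(np.linalg.norm(closed_curve[0] - closed_curve[-1]))
print(f"distance between curve start and end: {closure_gap:.4f}")
```

Closure here is approximate rather than exact; a spline with explicit periodic boundary conditions would be an alternative design if exact closure were required.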

To allow for intersecting and overlapping curves the points are partitioned into

In what follows the full dataset will, without further explanation, be referred to using either

Now consider estimation of the model unknowns from observed data. Start by supposing that a dataset is available but that the group membership information is intact; then the group centres could be estimated as

Now consider the case when the group membership is unknown and must be inferred from the data. The aim is to find linked points by fitting curves. Some datasets have more than one curve and some have intersecting curves. In such cases, classifying the points into groups first helps to fit curves that correctly represent the data.

In general, this can be thought of as a change point problem, as already discussed, to address the lack of stationarity in the values. A change point occurs at some point in the data if all of the values up to and including it share a common curve while all those after the change point share another. This is exactly the same situation as that discussed in Section

The previous sections have illustrated the proposed exploratory data analysis tools on simulated examples, whereas in this section the success of the approach is demonstrated on a varied range of real datasets. The aim is not to construct formal equations defining the shapes but to stimulate further analyses.

The first experiment is conducted on data extracted from the human face [

(a) Face data; (b) polar coordinate data with fitted spline curves; (c) back-transformed fitted curves. In (b) and (c) the solid curves use standard splines, whereas the dotted curves use periodic splines.

This dataset, taken from Dryden and Mardia [

(a) Schematic diagram of a gorilla skull with anatomical landmarks for a male gorilla; (b) landmarks in polar coordinates and spline curves; (c) landmarks along with back-transformed fitted spline curves. In (b) and (c) the solid curves use standard splines, whereas the dotted curves use the periodic spline.

Figure

Another dataset, again taken from Dryden and Mardia [

(a) A typical

The data in Figure

(a) Magnetic survey data for part of an iron-age archaeological site; (b) selected pits along with back-transformed fitted two-part spline curve; (c) polar coordinates and fitted two-part spline curve.

The data centres are calculated for each subset, the small circles are the data in the first subset, and “

Making sense of clouds of points, apparently randomly placed across a 2D region, is a key task in many statistical investigations. When the points are recorded without additional information, the first task is to infer structure by linking points using a data-driven approach. This paper has proposed and investigated a simple, yet effective method based on change point identification and nonparametric spline smoothing. It provides an intuitive explanatory tool to identify patterns in the point locations. When it is assumed that the structures form lines and curves, the change points divide the data into subsets, with the splines providing a flexible method to infer the shape of the structures. The method has easily dealt with occlusions and intersections in scenes with multiple curves. Similar results might be achieved by applying more general modelling approaches, such as MARS, RARS, RCMARS; for details see, for example, [

There is scope for extending the approach to include larger numbers of curves where it is not possible to divide the curves with a single change point. The nature of the problem is closely related to classification where the group membership is missing. This strongly suggests that a probabilistic approach might be considered based on statistical distribution models. This would then fit into the general framework where the EM algorithm has proven very useful. Also, there is a need to extend the approach to deal with unordered points and with shapes that are not star-shaped. These are areas of possible future work. Further, it is of interest to develop a similar procedure which would allow more formal modelling and model selection, perhaps following the approach of generalized additive modelling [

The applications are many and varied. The illustrative examples include data points that are anatomical landmarks defined by geometrical features, equally spaced but blindly placed points along smooth curves, and extreme-intensity points in grey-scale images. Further, the results of the analysis have provided new variables which could be the starting point for other analyses. Hence there is potential for this to be a valuable exploratory data analysis method in the tool-kit of applied statisticians and applied scientists.

The authors declare that there is no conflict of interests regarding the publication of this paper.