A Cluster-Based Method for Improving Analysis of Polydisperse Particle Size Distributions Obtained by Nanoparticle Tracking

Optical tracking methods are increasingly employed to characterize the size of nanoparticles in suspensions. However, the sufficient separation of different particle populations in polydisperse suspensions is still difficult. In this work, Nanosight measurements of well-defined particle populations and Monte-Carlo simulations showed that the analysis of polydisperse particle dispersions could be improved with mathematical methods. Logarithmic transformation of measured hydrodynamic diameters led to improved comparability between different modal values of multimodal size distributions. Furthermore, an automatic cluster analysis of transformed particle diameters could uncover otherwise hidden particle populations. In summary, the combination of logarithmically transformed hydrodynamic particle diameters with cluster analysis markedly improved the interpretability of multimodal particle size distributions as delivered by particle tracking measurements.


Introduction
It has often been shown that the size of nanoparticles determines, among other factors, their biological or even toxic effects [1]. However, the exact description of a nanoparticle suspension is a challenging issue, for example, during toxicological in vitro testing of nanoparticles [2][3][4].
During the past 5 years, Nanoparticle Tracking Analysis (NTA) has become increasingly important in nanotoxicology for describing the size distribution of nanoparticle suspensions [5]. As a basic principle, the Brownian motion of laser-illuminated nanoparticles (NPs) is captured by a CCD camera mounted on a conventional light microscope, and particle trajectories are tracked by image processing software. The particle size distribution is then obtained via the Stokes-Einstein relation [6]. The variance of the size distribution depends on the duration of the observed particle tracks [7] and, in particular, on the mean particle size. Thus, a broadening of the size distribution is to be expected if the mean diameter increases. Owing to this broadening effect, the proportions of different particle populations are hard to assess from the modal values of a polydisperse suspension. From this consideration, it appears intuitively clear that the larger the broadening effect is, the more difficult it becomes to separate populations of particles with a small difference in size.
The purpose of this paper is to demonstrate these features by means of Monte-Carlo simulations of polydisperse suspensions. Furthermore, we show methods useful for the analysis and improved interpretation of polydisperse particle size distributions (PSDs). To this end, different proportions of size-defined particle populations were simulated, and the resulting size distributions were analysed. To effectively attenuate the broadening effect and to increase the interpretability of polydisperse distributions, the calculated particle diameters are logarithmised to stabilize their variance; that is, the logarithm of each diameter is calculated and used for further analysis. To exploit the logarithmised diameters, a cluster analysis is used. The efficacy of these procedures in improving the analysis of size distributions is validated by simulations and verified by experimental results.

Theory: Nanoparticle Size Distributions via Particle Tracking
To track the Brownian motion of NPs in suspension, we used a Nanosight instrument (LM10), which combines a conventional light microscope with a laser illumination device. The laser light is guided approximately perpendicular to the optical axis and scattered by the NPs, which can therefore be viewed according to Huygens' principle. A CCD camera captures the diffraction patterns of the diffusing particles at 30 frames per second. The Nanosight software then detects the center of each diffraction pattern and measures the length of the trajectory [6]. Based on the trajectory data, the mean square displacement $\overline{r^2}$ of a particle can be calculated in several ways [8,9]. Given a trajectory consisting of $N$ steps, the following formula computes the mean squared distance between two successive particle positions $\mathbf{x}_i$ and $\mathbf{x}_{i+1}$:

$$\overline{r^2} = \frac{1}{N}\sum_{i=1}^{N}\left\lVert \mathbf{x}_{i+1}-\mathbf{x}_{i}\right\rVert^{2}. \tag{1}$$

The distance $\lVert \mathbf{x}_{i+1}-\mathbf{x}_{i}\rVert$ is referred to as the step length and is easily calculated if the magnification and the pixel size of the camera are known. The value $\overline{r^2}$ is associated with the estimated two-dimensional (2D) diffusion coefficient (DC) $D$; for free Brownian motion in two dimensions,

$$D = \frac{\overline{r^2}}{4\,\Delta t}, \tag{2}$$

where $\Delta t$ is the time difference between two subsequent positions. Each diffusion coefficient is usually weighted by its track length. For this purpose, a particle diameter estimated from a trajectory composed of $N$ steps is added $N$ times to the data set, provided that a minimum number of steps was recorded. This weighting counteracts a source of error caused by the fact that smaller particles diffuse in and out of the field of view more rapidly, and therefore more often, than larger particles [10]. Finally, the hydrodynamic diameter $d$ of a particle is determined using the Stokes-Einstein relation:

$$d = \frac{k_B T}{3\pi\eta D}, \tag{3}$$

in which $T$ is the temperature in Kelvin, $\eta$ the viscosity of the suspension, and $k_B$ the Boltzmann constant. In any case, the accuracy of a particle size distribution measurement depends on observing enough particles (>200).
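The chain from trajectory to weighted diameter data described above can be illustrated with a minimal Python sketch. It assumes free 2D diffusion (mean square displacement equal to 4 D Δt) and positions already converted to metres; the function names and the default minimum of 10 steps (the cut-off used for the examples later in this work) are choices made for this illustration and are not taken from the NTA software.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K


def mean_square_displacement(track):
    """Mean squared step length of a 2D track [(x, y), ...] given in metres."""
    steps = list(zip(track[:-1], track[1:]))
    return sum((x1 - x0) ** 2 + (y1 - y0) ** 2
               for (x0, y0), (x1, y1) in steps) / len(steps)


def diffusion_coefficient(track, dt):
    """2D diffusion coefficient in m^2/s, assuming free diffusion: MSD = 4 * D * dt."""
    return mean_square_displacement(track) / (4.0 * dt)


def hydrodynamic_diameter(d_coeff, temperature, viscosity):
    """Stokes-Einstein diameter in metres (temperature in K, viscosity in Pa*s)."""
    return K_B * temperature / (3.0 * math.pi * viscosity * d_coeff)


def weighted_diameters(tracks, dt, temperature, viscosity, min_steps=10):
    """Add the diameter of each track N times (N = number of steps).

    Tracks with fewer than min_steps steps are skipped (assumed cut-off).
    """
    data = []
    for track in tracks:
        n_steps = len(track) - 1
        if n_steps < min_steps:
            continue
        d_coeff = diffusion_coefficient(track, dt)
        data.extend([hydrodynamic_diameter(d_coeff, temperature, viscosity)] * n_steps)
    return data
```

For a camera running at 30 frames per second, `dt` would be `1 / 30`; values such as 298.15 K and 8.9e-4 Pa·s for water are assumptions that have to be replaced by the actual measurement conditions.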

Data Acquisition.
Particle size measurements were carried out with well-defined polystyrene standard particles sized 50 nm (Thermo Scientific, 3050A NIST), 100 nm (Kisker-Biotech, PPs-0.1), 150 nm (Thermo Scientific, 3150A NIST), and 200 nm (Kisker-Biotech, PPs-0.2). Particle suspensions were diluted with particle-free, double-distilled H2O to obtain 30-50 particles within one field of view, pipetted onto the stage of a Nanosight LM10 laser device (530 nm), and viewed with an intensified CCD camera (Andor-DL-658M-OEM) mounted on the instrument. Tracking data were recorded for 160 s using the NTA Software version 2.2. All measurements were repeated at least threefold. For mixed particle populations, camera settings were adapted such that the smallest particles could be recognized by the software. The numerical composition of mixed particle suspensions was studied by scanning electron microscopy (SEM). The aqueous suspensions were identical to those measured in the Nanosight experiments but were at least 1000-fold less diluted. The suspension was dried on Thermanox slices and sputtered with a thin layer of gold (40 nm, Sputter Coater S150B, Edwards, North Walsham, UK). Scanning electron microscopic examinations were done with the Gemini DSM 982 (ZEISS, Oberkochen, Germany; 15 kV). At least 1000 particles from 10 different images were evaluated.

Virtual Polydisperse Suspensions.
NTA measurements strongly depend on instrument and software settings such as camera gain, threshold mode and value, background subtraction, expected particle size, and other parameters. The homogeneity of the suspension also influences the size distribution, and results are prone to be biased towards larger particles [11].
To eliminate these sources of error, "semireal" tracking data of a polydisperse suspension were generated from monodisperse tracking data. For this purpose, tracks from real monodisperse particle suspensions were gathered within one data file using predefined proportions (see Figure 1). The resulting semireal tracking data were used for statistical purposes and for validating the methods described hereafter.
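The pooling step can be sketched as follows. This is only an illustration under stated assumptions: the function name `build_virtual_suspension`, the dictionary layout, and the choice to define the proportions per track (rather than per step) are hypothetical and not taken from the procedure in Figure 1.

```python
import random


def build_virtual_suspension(monodisperse_tracks, proportions, n_tracks, seed=0):
    """Pool tracks of monodisperse measurements into one virtual data set.

    monodisperse_tracks: dict mapping a label (e.g. "100 nm") to the list of
        tracks recorded for that monodisperse suspension
    proportions: dict mapping the same labels to the desired fraction of tracks
    n_tracks: total number of tracks in the virtual suspension
    """
    rng = random.Random(seed)
    pooled = []
    for label, fraction in proportions.items():
        n_wanted = round(fraction * n_tracks)
        # Sampling without replacement keeps every real track unique;
        # rng.sample raises ValueError if too few tracks are available.
        chosen = rng.sample(monodisperse_tracks[label], n_wanted)
        pooled.extend((label, track) for track in chosen)
    rng.shuffle(pooled)
    return pooled
```

Keeping the label next to each track is a design choice of this sketch: it preserves the ground truth so that cluster proportions can later be compared with the known composition.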

Simulation of Mono-and Polydisperse Suspensions.
To analyse the properties of PSDs, a Monte-Carlo simulation of the Brownian motion of single particles with a specific diffusion coefficient was applied. According to Michalet [9], the step length probability density function (SPDF) of a particle with diffusion coefficient $D$ can be described as

$$p(r)\,dr = \frac{r}{2D\,\Delta t}\exp\!\left(-\frac{r^{2}}{4D\,\Delta t}\right)dr, \tag{4}$$

where $r$ is the step length and $\Delta t$ the time between two successive positions. Using the so-called transformation method [12], it is possible to generate random step lengths with the distribution of (4). Additionally, the length of a track (i.e., the number of steps) can be simulated by resampling NTA measurements. By applying a combined simulation of step length and track length, mono- as well as polydisperse SPDFs are easily simulated. Tracks of particles in monodisperse suspensions were simulated by defining the diffusion coefficient $D$ of the particles and the total number of steps. The flow chart in Figure 2 describes the simulation procedure. The resulting k-tuple of diameters was used for estimating the PSD.
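A minimal sketch of such a simulation is given below. It assumes that the step length density is the 2D Rayleigh form p(r) = r / (2 D Δt) · exp(−r² / (4 D Δt)), whose cumulative distribution can be inverted in closed form, and that track lengths are resampled with replacement from a list of observed lengths. All function and parameter names are illustrative; the sketch does not reproduce the flow chart of Figure 2 in detail.

```python
import math
import random

K_B = 1.380649e-23  # Boltzmann constant in J/K


def sample_step_length(d_coeff, dt, rng):
    """Draw one step length by inverting the CDF F(r) = 1 - exp(-r^2 / (4 D dt))."""
    u = rng.random()  # uniform on [0, 1)
    return math.sqrt(-4.0 * d_coeff * dt * math.log(1.0 - u))


def simulate_track_diffusion(d_coeff, n_steps, dt, rng):
    """Simulate one track and return the diffusion coefficient estimated from it."""
    msd = sum(sample_step_length(d_coeff, dt, rng) ** 2 for _ in range(n_steps)) / n_steps
    return msd / (4.0 * dt)


def simulate_diameters(true_diameter, observed_track_lengths, n_tracks,
                       dt, temperature, viscosity, seed=0):
    """Return n_tracks simulated diameter estimates for a monodisperse suspension."""
    rng = random.Random(seed)
    stokes = K_B * temperature / (3.0 * math.pi * viscosity)
    d_true = stokes / true_diameter  # Stokes-Einstein relation solved for D
    # Track lengths are resampled with replacement from the observed lengths.
    lengths = rng.choices(observed_track_lengths, k=n_tracks)
    return [stokes / simulate_track_diffusion(d_true, n, dt, rng) for n in lengths]
```

A polydisperse suspension can then be simulated by calling `simulate_diameters` once per particle population and concatenating the results in the desired proportions.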

Cluster Analysis.
Finite mixture density models assume that a population is a mixture of subpopulations (clusters) with various densities. These models are also used in medicine and biology for several purposes (e.g., clustering genes or detection of action potentials) [13][14][15][16]. A mixture of $c$ normal densities with different mean values $\mu_k$, variances $\sigma_k^2$, and population proportions $\pi_k$ is defined as follows:

$$f(x) = \sum_{k=1}^{c} \pi_k\,\varphi\!\left(x \mid \mu_k, \sigma_k^{2}\right), \tag{5}$$

where $\varphi$ denotes the normal density. All population proportions add up to 1. The measured size distributions can likewise be construed as a mixture of normal densities with unknown parameters. For estimating the number of clusters $c$ and their parameters, the statistical software R 2.15 [17] with the MCLUST 3 [18,19] package was used. MCLUST utilizes the Bayesian Information Criterion [20] and the Expectation-Maximization (EM) algorithm [21] for determining the number of clusters and the density parameters, respectively. The initial parameters are estimated and then iteratively optimised by maximizing the log-likelihood up to a certain convergence criterion. If all clusters have the same variance, the number of possible local maxima decreases. With the MCLUST implementation of the EM algorithm, we were able to account for such parameter constraints.

Table 1 lists the absolute and relative counts of steps taken per particle population. The resultant PSD of the untransformed, weighted diameter data is shown in Figure 3. Although there were 1.8 times more 200 nm particles than 100 nm particles in the modelled suspension, the modal values had nearly the same level. Similarly, although the ratio of 200 nm to 50 nm particles was 3.3 to 1, the modal values suggested a ratio of 1.3 to 1. The reason for this misinterpretation is what we call the "broadening effect," which is caused by the constant coefficient of variation of the PSDs: the standard deviation increases linearly with particle size. Figure 4 illustrates this heteroscedasticity for simulated and measured data.
A further disadvantage of the PSD in Figure 3 is that the 200 nm population covers the 150 nm population almost completely. As shown in the next section, a PSD based on the logarithmic transform of the diameter data, combined with a cluster analysis, offers a remedy for both misleading modal values and covered particle populations.

Logarithmised Data and Cluster Analysis.
It is a known property of the logarithm that it reduces the heteroscedasticity of random variables; that is, it stabilizes the variance. For this reason, the diameter data were logarithmised to gain more comparability between particle populations in the PSD of polydisperse suspensions. The logarithmised version of Figure 3 is shown in Figure 5. Compared with the modal values of Figure 3, the modal values of the black PSD in Figure 5 better represent the true proportions of the particle composition given in Table 1. Despite the variance-stabilizing transform, the 150 nm particle population remains hidden behind the 200 nm population. To uncover such hidden populations and to determine quantitative values for the population ratios, an MCLUST cluster analysis was performed. It must be emphasized that the logarithmised diameters are better suited for a cluster analysis because the variances of the particle populations can be assumed to be equal.
The result of such a cluster analysis is shown as colored PSDs in Figure 5. Clusters whose means differed by less than 7 nm were merged. Table 2 lists the cluster means and the proportions of the individual clusters. The proportions of clusters 1, 3, 4, and 5 are in good agreement with the true proportions listed in Table 1. Clusters 2 and 6 are false clusters, which may appear in both simulations and experiments. False clusters represent, however, only small particle populations and could easily be filtered using a threshold proportion.
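The variance-stabilizing effect can be checked numerically with a short sketch. It relies on one modelling assumption: if squared step lengths are exponentially distributed, the sum of N squared steps is gamma distributed, so a diameter estimated from an N-step track equals the true diameter times N / G with G drawn from Gamma(N, 1). The track length of 30 steps and the number of tracks are arbitrary illustration values.

```python
import math
import random
import statistics


def simulated_diameters(true_diameter_nm, n_steps, n_tracks, rng):
    """Diameter estimates (nm) for n_tracks tracks of n_steps steps each.

    Assumes D_est = D_true * G / n_steps with G ~ Gamma(n_steps, 1), hence
    d_est = d_true * n_steps / G (the diameter is inversely proportional to D).
    """
    return [true_diameter_nm * n_steps / rng.gammavariate(n_steps, 1.0)
            for _ in range(n_tracks)]


def sd_raw_and_log(diameters):
    """Standard deviation of the diameters and of their natural logarithms."""
    return (statistics.stdev(diameters),
            statistics.stdev([math.log(d) for d in diameters]))


rng = random.Random(1)
for size in (50, 100, 150, 200):
    sd_raw, sd_log = sd_raw_and_log(simulated_diameters(size, 30, 5000, rng))
    print(f"{size:>3} nm: SD = {sd_raw:6.2f} nm, SD of log = {sd_log:.3f}")
```

Under this model the standard deviation of the raw diameters grows in proportion to the particle size, whereas the standard deviation of the logarithmised diameters depends only on the track length.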
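For readers without access to R, the idea behind the cluster analysis can be sketched in plain Python. This is not the MCLUST implementation: it is a simplified stand-in that fits a one-dimensional normal mixture with one shared variance by EM, selects the number of clusters with the BIC in the "smaller is better" convention, and drops clusters below a threshold proportion. The quantile-based starting values, the default threshold of 3%, and all names are assumptions of this sketch; the merging of clusters with close means is not included.

```python
import math


def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_equal_variance(data, c, n_iter=500, tol=1e-9):
    """EM fit of c normal clusters sharing one variance.

    Returns (means, variance, proportions, log_likelihood); the log-likelihood
    is the one evaluated in the last E-step.
    """
    n = len(data)
    xs = sorted(data)
    means = [xs[int((k + 0.5) * n / c)] for k in range(c)]  # quantile start (assumed)
    grand_mean = sum(data) / n
    var = sum((v - grand_mean) ** 2 for v in data) / n / c ** 2
    props = [1.0 / c] * c
    log_lik = -math.inf
    for _ in range(n_iter):
        # E-step: responsibility of every cluster for every data point
        resp, new_log_lik = [], 0.0
        for v in data:
            w = [props[k] * normal_pdf(v, means[k], var) for k in range(c)]
            total = sum(w)
            new_log_lik += math.log(total)
            resp.append([wk / total for wk in w])
        # M-step: update means, shared variance, and proportions
        weights = [sum(r[k] for r in resp) for k in range(c)]
        means = [sum(r[k] * v for r, v in zip(resp, data)) / weights[k] for k in range(c)]
        var = sum(r[k] * (v - means[k]) ** 2
                  for r, v in zip(resp, data) for k in range(c)) / n
        props = [w / n for w in weights]
        converged = new_log_lik - log_lik < tol
        log_lik = new_log_lik
        if converged:
            break
    return means, var, props, log_lik


def bic(log_lik, c, n):
    """BIC (smaller is better); c means + (c - 1) proportions + 1 shared variance."""
    return 2 * c * math.log(n) - 2.0 * log_lik


def select_clusters(log_diameters, max_c=6, min_proportion=0.03):
    """Choose c by BIC and keep only clusters above the threshold proportion."""
    n = len(log_diameters)
    fits = {c: em_equal_variance(log_diameters, c) for c in range(1, max_c + 1)}
    best_c = min(fits, key=lambda c: bic(fits[c][3], c, n))
    means, var, props, _ = fits[best_c]
    return [(m, p) for m, p in zip(means, props) if p >= min_proportion], var
```

The function `select_clusters` expects logarithmised diameters; applying `math.exp` to the returned cluster means converts them back to diameters.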

Validation by Virtual Polydisperse Suspensions.
For the validation of our method described in Section 4.2, a virtual polydisperse suspension was generated from real measurements of four monodisperse suspensions of polystyrene particles with diameters of 50 nm, 100 nm, 150 nm, and 200 nm, respectively. Figure 6 shows the PSD of each measurement. The virtual polydisperse suspension was then generated according to the proportions given in Table 1. The results of the cluster analysis are shown in Figure 7 and the relative proportions in Table 3. It can be seen that the cluster mean values are in good accordance with the measured modal values of the monodisperse PSDs. Even the hidden cluster (cluster 3) was detected, although the difference between calculated mean value and measured diameter was somewhat larger than observed for all other peaks. This may be due to the low number of tracks which were integrated for this subpopulation. The false cluster (cluster 5) had the smallest proportion and might be filtered by a reasonable threshold setting of, for example, 3%. This virtual experiment shows that the properties of the measured PSDs were adequately reflected by the cluster analysis of logarithmised data.

Verification by Measurement.
To verify the method experimentally, we prepared a defined suspension of 100 nm and 150 nm polystyrene particles with equal number concentrations of both particle types. To this end, an SEM analysis of the mixed suspension was carried out, a representative micrograph of which is shown in Figure 8(b). The result of the conventional NTA analysis is shown in Figure 8(a): the modal values were 101 nm and 134 nm. However, the density function shows different peak heights, suggesting a lower content of the larger particles. A cluster analysis was carried out for the logarithmised diameter data of Figure 8(a), and the results are shown in Figure 8(c). Cluster means were 104 and 144 nm, respectively. The estimated proportions were nearly equal (1 : 1.04). This ratio was then used for a simulation experiment of a bimodal suspension of particles with 101 nm and 134 nm diameter. The untransformed PSD is shown in Figure 8(d). It can be seen that the ratio of the peaks in Figure 8(d) is nearly the same as in Figure 8(a). Because of the latter result, and because the results were in good agreement with the SEM ratio, we conclude that the proportion was correctly determined by the cluster analysis.

Applicability and Limitations of the Method.
A fundamental assumption of the proposed method is that the standard deviations of the subpopulations in a polydisperse suspension are equal after logarithmic transformation. If this is correct, the logarithmic transform is a good method to reduce the impact of the particle size on the broadening effect of the PSD. Another factor influencing the broadening of the PSD is the mean number of steps contributing to the analysed tracks, because the standard deviation of the estimate decreases for longer tracks [22]. This is, of course, equivalent to the mean time interval during which a particle is successfully tracked. Figure 9 illustrates the influence of the track length on the standard deviation (SD) for untransformed (a) and logarithmised data (b) of four particle types. In contrast to the untransformed data, the SD of the logarithmised data is nearly identical for all four types of particles if the number of steps per track is the same.
One may object that the different mean step lengths of different particle populations in a polydisperse suspension lead to different SDs. This would contradict the assumption of equal SDs and, consequently, would impair the cluster analysis. To avoid this difficulty, we removed all tracks with a length below 10 steps in the examples of this work, thus avoiding the tracking of "random noise" in particle videos. The remaining tracks of the 100 nm, 150 nm, and 200 nm particle suspensions had mean track lengths of 56, 37, and 65 steps, respectively. Regardless of some variations in SD which might be due to this disparity, the cluster analysis worked convincingly well with these data. Nevertheless, a cluster analysis may fail if data with larger variation in SD are being processed.
Accepting the above limitations, the methodological improvements suggested in this study may be helpful to analyse more complex and multimodal systems. For example, the apparent size of nanoparticles suspended in biological fluids such as serum or cell culture fluid increases due to agglomeration and/or the formation of a protein corona [23]. These processes are highly dynamic and can be observed during single or repeated Nanosight measurements [24]. As changes of particle size or agglomeration state upon protein coating may be moderate, quantitative evaluation may benefit from our cluster analysis, provided that the number of peaks is not too high (e.g., see [25]). Here, we have used up to five simulated particle populations and found this a reasonable upper limit of complexity.

Conclusion
We presented a method for improving the analysis of polydisperse particle size distributions based on the logarithmic transform of the estimated diameters to reduce the heteroscedasticity, which is partly due to the constant coefficient of variation of the diameter data. Transformed data were then subjected to a cluster analysis, which was able to uncover hidden populations. Calculated cluster proportions were validated and verified by Monte-Carlo simulations of polydisperse suspensions, NTA measurements, and SEM images. The procedure appears helpful to correctly interpret the composition of polydisperse particle suspensions. The novel method for Monte-Carlo simulation of polydisperse suspensions and the concept of virtual polydisperse suspensions seem to be useful for further investigations of the properties of polydisperse size distributions.

Conflict of Interests
The authors declare that they do not have a direct financial relation with the trademarks mentioned in their work that might lead to a conflict of interest.