Perspective Biological Markers for Autism Spectrum Disorders: Advantages of the Use of Receiver Operating Characteristic Curves in Evaluating Marker Sensitivity and Specificity

Autism Spectrum Disorders (ASD) are a heterogeneous group of neurodevelopmental disorders. Recognized causes of ASD include genetic factors, metabolic diseases, toxic and environmental factors, and a combination of these. Available tests fail to recognize genetic abnormalities in about 70% of ASD children, where diagnosis is solely based on behavioral signs and symptoms, which are difficult to evaluate in very young children. Although it is advisable that specific psychotherapeutic and pedagogic interventions are initiated as early as possible, early diagnosis is hampered by the lack of nongenetic specific biological markers. In the past ten years, the scientific literature has reported dozens of neurophysiological and biochemical alterations in ASD children; however no real biomarker has emerged. Such literature is here reviewed in the light of Receiver Operating Characteristic (ROC) analysis, a very valuable statistical tool, which evaluates the sensitivity and the specificity of biomarkers to be used in diagnostic decision making. We also apply ROC analysis to some of our previously published data and discuss the increased diagnostic value of combining more variables in one ROC curve analysis. We also discuss the use of biomarkers as a tool for advancing our understanding of nonsyndromic ASD.


Definition of ASD
The Diagnostic and Statistical Manual of Mental Disorders, fifth edition (DSM-5), issued in May 2013 by the American Psychiatric Association, provides new diagnostic criteria for Autism Spectrum Disorders (ASD), which now include Asperger syndrome, classic autism, childhood disintegrative disorder, and pervasive developmental disorders not otherwise specified. It classifies ASD by levels 1 to 3 for mild, moderate, or severe, based on the degree of support the patient requires.
ASD is four to five times more prevalent in males than in females (1 in 42 boys versus 1 in 189 girls); the Centers for Disease Control and Prevention (CDC) estimated in 2014 that 1 in 68 children aged 8 years was affected by ASD in the USA [1]. Another recent estimate [2] put the burden of ASD at 1 out of 132 persons (i.e., 7.6 per 1000 persons), with little variation around the world. This discrepancy may reflect both a real increase in the occurrence of ASD and in its diagnosis (in 2012 CDC estimated the rate of ASD in US children to be 1 out of 88) and the fact that ASD diagnosis is often "lost" when children progress into adulthood, being replaced by a "generic" intellectual disability and/or hidden under late-developing neuropsychiatric illnesses [3].
Affected children usually suffer from impaired social interactions, speech disabilities ranging from language delay to lack of speech, repetitive and/or compulsive behaviors and echolalia, hyperactivity, deficits in memory, learning, motor skills, or other neurological functions, abnormal excitability, hyper- or hyposensitivity to sensory stimuli, anxiety, and difficulty adapting to new environments/habits. Frequent association with comorbidities such as sleep and gastrointestinal problems has also been reported [4,5]. A recent review [6] points out the four broad domains of development that are predictive of ASD: sensory-motor, attentional, social-emotional, and communication. Deficits in these areas may appear as early as 6-9 months of age, although most manifest during the second year. A reliable diagnosis can be made by an experienced physician around age 2; however, many children do not receive a final diagnosis until they are much older [7,8].

Etiology: Genetics, Environment, and Their Relationships
Betancur [9] listed 103 disease genes and 44 genomic loci recognized in ASD subjects. Among them, 99 genes were classified as syndromic autism genes, since the autism trait arises within the context of a complex syndrome with known genetic origin, such as fragile X syndrome, tuberous sclerosis, or Rett syndrome. These single gene disorders account for 3-5% of ASD. Moreover, advances in genetic testing, notably chromosomal microarray analysis, enabled the identification of de novo Copy Number Variations (CNV) in about 30% of affected children. In this way, about 300 rare ASD-associated CNV regions have been identified [10,11]. Besides CNV, more than 500 single gene mutations have been identified by whole-exome and whole-genome sequencing [12,13].
Several lines of evidence support the notion that genetics may also play some role in the remaining 70% of ASD cases. One of the more convincing is the high heritability of ASD. A very high concordance among monozygotic twins (over 90%) was recorded in a 1995 study [14]. The genetic complexity of ASD is however supported by the low linkage association in siblings as well as in dizygotic twins, which have only a 6% concordance [14]. Moreover, a study examining parents of 69 people with ASD and parents of 52 controls showed that parents of ASD subjects presented mild forms of autistic-like features [15], the same "broader autism phenotype" recognizable in ASD siblings. Based on these observations, it is reasonable to conceive nonsyndromic ASD as a complex genetic trait, resulting from the combination of multiple de novo mutations, CNV, and rare genetic variants, with possible additive effects, which may account for the high heterogeneity in clinical presentation. In a recent review [16] Bourgeron compared all available information on the genetics of early-onset neurodevelopmental disorders in order to identify a common core of altered pathways affecting neuronal homeostasis. Pathways associated with early-onset neurodevelopmental disorders fall in the domains of cytoskeletal organization, synapse, translation, chromatin remodeling, and metabolism. However, despite the similarity and overlap of symptoms common to most if not all early-onset neurodevelopmental disorders, in particular the presence of epilepsy and of cognitive impairment in many ASD patients, and despite the marked heterogeneity of clinical presentation of ASD, specific clinical traits characterize ASD and lead to specific diagnosis. Hence, the quest for the identification of biomarkers can focus on the core ASD symptoms.
Alongside genetic factors, the concurrence of a multiplicity of environmental factors is strongly emerging in the etiology of ASD. In contrast with [14], a more recent study, which examined 192 pairs of twins [17], concluded that "susceptibility to ASD has moderate genetic heritability and a substantial shared twin environmental component." Environmental factors include metabolic diseases [18], immune disorders [19], infectious diseases [20], nutritional factors [21], GI microbiota [22], and a variety of toxic substances, including pesticides, heavy metals, and atmospheric pollutants [23]. Estimating the contribution of environmental factors to ASD onset is particularly complex, since it is often difficult to discriminate one factor from the other or to identify the correct cause-effect relationship; for instance, disruption of the immune system or of hormonal homeostasis by pollutants may be erroneously categorized; GI microbiota may affect the presence and the diffusion of toxic metabolites [24]. One should also be aware that the contribution of environmental factors may be underestimated for temporal reasons; for instance, in order to affect neurodevelopment, the exposure to the environmental factor(s) should fall within a still undefined critical window of susceptibility and may thus be missed; evaluations performed in tissues or biological fluids having a rapid turnover may fail to display the presence of the toxic compound. In their recent review, Rossignol et al. [23], while pointing out a number of limitations and weaknesses found in the literature dealing with the effects of environmental factors on ASD etiology, nevertheless concluded that an association could be found between some pollutants and ASD (with stronger evidence for air pollutants and pesticides).
Moreover, they reviewed a number of papers showing that ASD children bore a number of genetic polymorphisms that could decrease the expression of enzymes, such as PON1 and GST, able to efficiently eliminate environmental toxicants. These results add a new dimension to the toxicological studies on ASD. In fact, the decreased or impaired expression of an enzyme involved in detoxification may sum up with the increase of oxidative stress, be it of environmental or of genetic origin, and with other features, such as male-related hormonal factors which make males more susceptible to pollutants [25,26] as well as to ASD. These considerations support the concept that the ASD trait is the result of a multiplicity of genetic and environmental factors.

ASD Biomarkers
Generally speaking, biomarkers are biological parameters that differ between normal and pathological processes and can be used as indicators for diagnosis, prognosis, risk assessment of a disease, and evaluation of therapeutic outcomes. As briefly discussed above, the widely accessible chromosomal microarray analysis fails to identify genetic markers in about 70% of children carrying nonsyndromic ASD. Since the clinical phenotype of ASD overlaps, especially in the early ages, with many other clinical conditions, such as Attention Deficit and Hyperactivity Disorders (ADHD), Semantic Pragmatic Disorder, or severe Specific Language Impairment, the lack of specific biomarkers for ASD makes diagnosis very difficult for pediatricians, in particular when dealing with a very mild phenotype of the autistic spectrum.
ASD biomarkers are also needed for prognostic purposes. In fact, Autism Spectrum Disorders are generally considered lifelong conditions, but people with ASD exhibit outcomes that vary widely [27], especially when diagnosis and/or psychotherapeutic/pedagogic interventions are early, that is, at age 2 [28]. Indeed, some cases may evolve into other psychiatric conditions, such as ADHD [29], while others may experience a very good outcome, since a minority of individuals with ASD may even lose the diagnosis [30,31].
A few very good reviews have addressed the issue of ASD biomarkers in the past few years [32,33]. In the present review we focused on studies dealing only with peripheral biomarkers. In fact, peripheral biomarkers are potentially easier and less expensive to analyze than genome-wide sequencing or brain imaging, which require data acquisition procedures that are difficult to apply on a large scale. Moreover, some biological material, such as urine, is easy to obtain even from very young children. Another important feature of peripheral biomarkers is their potential for pointing to the biochemical pathways that, when altered, lead to the core ASD phenotype, which is shared also by syndromic ASD subjects and by ASD cases identified by CNV analysis.

ROC Curves
Our review is also characterized by the choice of examining only studies which included the calculation of the Receiver Operating Characteristic (ROC) curve. In our opinion, the ROC curve should become the gold standard for the identification of parameters that are sensitive and specific enough to support ASD diagnosis, while its utility in prognosis, risk assessment, and evaluation of therapeutic interventions still awaits further studies.
A ROC curve plots the sensitivity of a marker against 1 - specificity over all possible cut-off values, and thus summarizes how well the marker separates cases from controls. The Area Under the Curve (AUC) provides a useful metric to compare different biomarkers. While an AUC value close to 1 indicates an excellent predictive marker, a curve that lies close to the diagonal (AUC = 0.5) has no diagnostic utility. An AUC value close to 1.00 is always accompanied by satisfactory values of specificity and sensitivity of the biomarker [34]. For a discussion of the use of ROC curves in translating biomarkers to clinical practice, see [35]. When studying prospective ASD biomarkers, high sensitivity means that autism will be identified in most cases, while high specificity means that few, if any, healthy individuals will test positive. Very interestingly, the combined ROC analysis of two distinct parameters increased their specificity (see, e.g., [36]), which suggests that one might resort to the combination of a panel of (related) parameters rather than to a single parameter alone.
Typically, when a diagnostic model is built upon a set of prospective markers, elaborated by using a "training" subset of data, a common practice to estimate its performance consists of feeding the model with randomly selected data ("testing" subset) and examining its ability to correctly classify these data as belonging to either of the two groups (e.g., healthy and pathological). Such a procedure has the disadvantage of requiring large sets of data, since about one-third is set apart for validation. In the light of the fact that ROC analysis is able to predict the sensitivity and the specificity of the markers, we advance here the proposal to evaluate whether it might make the cross-validation procedure useless.
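To make the quantities discussed above concrete, the following minimal Python sketch computes sensitivity, specificity, ROC points, and a nonparametric AUC for a single marker. The marker values, the function names, and the convention that higher values indicate ASD are all illustrative assumptions and are not taken from any of the reviewed studies.

```python
def auc_nonparametric(cases, controls):
    """AUC as the fraction of case/control pairs in which the case has the
    higher marker value (ties count one half)."""
    wins = 0.0
    for c in cases:
        for h in controls:
            if c > h:
                wins += 1.0
            elif c == h:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def sensitivity_specificity(cases, controls, cutoff):
    """Sensitivity and specificity when 'value >= cutoff' is read as positive."""
    true_pos = sum(1 for v in cases if v >= cutoff)
    true_neg = sum(1 for v in controls if v < cutoff)
    return true_pos / len(cases), true_neg / len(controls)


def roc_points(cases, controls):
    """(1 - specificity, sensitivity) pairs, from the strictest cut-off down."""
    points = [(0.0, 0.0)]
    for cutoff in sorted(set(cases) | set(controls), reverse=True):
        sens, spec = sensitivity_specificity(cases, controls, cutoff)
        points.append((1.0 - spec, sens))
    return points


# Hypothetical marker values, not taken from any reviewed study.
asd_values = [5.1, 6.3, 4.8, 7.0, 5.9, 6.6]
control_values = [3.2, 4.9, 4.1, 3.8, 5.0, 4.4]

auc = auc_nonparametric(asd_values, control_values)
sens, spec = sensitivity_specificity(asd_values, control_values, cutoff=5.05)
print(f"AUC = {auc:.3f}, sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

With these made-up values, 34 of the 36 case/control pairs are correctly ordered (AUC of about 0.94), and the cut-off of 5.05 identifies 5 of 6 cases while misclassifying no control. A marker that decreases in cases would need the comparison reversed.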
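The training/testing procedure described above can be sketched as follows. This is only an illustration under stated assumptions: the data are hypothetical, the split is stratified so that both subsets contain cases and controls, and the cut-off is chosen on the training subset by maximizing sensitivity + specificity - 1 (the Youden index), which is one common choice and not necessarily the one used in the reviewed studies.

```python
import random


def stratified_holdout(cases, controls, test_fraction=1 / 3, seed=0):
    """Set aside about one-third of each group as the testing subset."""
    rng = random.Random(seed)

    def split(values):
        shuffled = list(values)
        rng.shuffle(shuffled)
        n_test = max(1, round(len(shuffled) * test_fraction))
        return shuffled[n_test:], shuffled[:n_test]

    train_cases, test_cases = split(cases)
    train_controls, test_controls = split(controls)
    return (train_cases, train_controls), (test_cases, test_controls)


def sens_spec(cases, controls, cutoff):
    """Sensitivity and specificity when 'value >= cutoff' is read as positive."""
    sens = sum(v >= cutoff for v in cases) / len(cases)
    spec = sum(v < cutoff for v in controls) / len(controls)
    return sens, spec


def youden_cutoff(cases, controls):
    """Cut-off maximizing sensitivity + specificity - 1 (assumed criterion)."""
    candidates = sorted(set(cases) | set(controls))
    return max(candidates, key=lambda t: sum(sens_spec(cases, controls, t)) - 1)


# Hypothetical marker values, not taken from any reviewed study.
asd_values = [5.1, 6.3, 4.8, 7.0, 5.9, 6.6, 5.4, 6.1, 4.6]
control_values = [3.2, 4.9, 4.1, 3.8, 5.0, 4.4, 3.5, 4.7, 5.2]

train, test = stratified_holdout(asd_values, control_values)
cutoff = youden_cutoff(*train)            # learned on the training subset only
test_sens, test_spec = sens_spec(*test, cutoff)  # checked on unseen data
print(f"cut-off = {cutoff}, test sensitivity = {test_sens:.2f}, "
      f"test specificity = {test_spec:.2f}")
```

The sketch makes the data cost visible: with nine subjects per group, only three per group remain for testing, so the test estimates are very coarse.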

Search Strategy, Selection Criteria, and Limitations of Reviewed Studies
Identification of the studies was carried out through an extensive literature search using the PubMed database (National Library of Medicine, National Institutes of Health, Bethesda, MD, USA; http://www.ncbi.nlm.nih.gov/pubmed) mainly based on specific keywords and was updated to June 29, 2015. The search strategy included the terms Autism (mesh) AND Receiver Operating Characteristic Curve (mesh) OR ROC (mesh). Only articles reporting peripheral parameters were taken into consideration. Articles that did not present unique or new data were excluded from the analysis. One hundred and thirteen citations were obtained and manually reviewed; 20 [36-55] fulfilled the selection criteria and were used to collect the data about ROC curve analysis.
Despite the large number of studies about ASD, relatively few calculated ROC curves and, consequently, few data are available about the sensitivity and specificity of a parameter. Most of the studies reporting ROC analysis dealt with people of paediatric age, and relatively few data are available about adolescent and adult patients. Most of them come from a few research groups, and the data are thus limited to a few geographical areas. Moreover, many of the reviewed studies had small sample sizes. In most of the studies the control group consisted of healthy, neurologically normal children. Studies are also needed that compare subjects with clinical conditions that overlap in part with ASD (such as ADHD, as mentioned above), subjects with known aetiology such as Down's syndrome, and subjects with cognitive impairment without autistic features. Independent, larger, geographically different studies, extended to other early-onset neurodevelopmental disorders, are thus required to confirm (or disconfirm) available data.

Results and Discussion
Tables 1-6 report published studies presenting peripheral parameters where ROC curves were calculated. Putative biomarkers are grouped into six different biochemical categories: neurotransmitters and neurotrophins, oxidative stress markers, fatty acids and phospholipids, inflammation markers, metabolites, toxic biomarkers, and metals and cations. Selection is updated at June 29, 2015.
Most parameters reported in Tables 1-6 have ROC curves that identify them as highly sensitive and highly specific putative markers, fulfilling the requirements reported in [34] and suggesting that they may be considered for further evaluation as bona fide ASD biomarkers.
Some data reported by the Saudi Arabia group [40,44,45] have AUC value of 1, which should correspond to 100% sensitivity and 100% specificity; however, they are puzzlingly reported with a specificity lower than 100%, a result that is not discussed.
In the light of the usefulness of the ROC curves in evaluating the quality of putative biomarkers, we reexamined some of the data previously published by our group [56]. Notably, the best value (AUC = 1) was reached by a parameter (erythrocyte Na+,K+-ATPase activity), where the values of autistic and typically developing children did not show any overlap (Figure 1).
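The link between AUC = 1 and non-overlapping values can be checked directly. The short Python sketch below uses hypothetical numbers and assumes, for illustration only, that the marker is higher in cases. It shows that when the two groups do not overlap, a cut-off placed between them yields both 100% sensitivity and 100% specificity, whereas no such cut-off exists once the groups overlap.

```python
def auc(cases, controls):
    """Fraction of case/control pairs in which the case value is higher
    (ties count one half)."""
    pairs = [(c, h) for c in cases for h in controls]
    score = sum(1.0 if c > h else 0.5 if c == h else 0.0 for c, h in pairs)
    return score / len(pairs)


def perfect_cutoff(cases, controls):
    """Midpoint between the groups if they do not overlap, otherwise None.
    Assumes the marker is higher in cases (illustrative convention)."""
    if min(cases) > max(controls):
        return (min(cases) + max(controls)) / 2
    return None


def sens_spec(cases, controls, cutoff):
    """Sensitivity and specificity when 'value >= cutoff' is read as positive."""
    sens = sum(v >= cutoff for v in cases) / len(cases)
    spec = sum(v < cutoff for v in controls) / len(controls)
    return sens, spec


# Hypothetical, fully separated groups (not data from [56]).
separated_cases = [8.2, 9.1, 7.9, 8.8]
separated_controls = [5.0, 6.1, 5.7, 6.4]

cutoff = perfect_cutoff(separated_cases, separated_controls)
print(auc(separated_cases, separated_controls))                # 1.0
print(sens_spec(separated_cases, separated_controls, cutoff))  # (1.0, 1.0)

# Hypothetical overlapping groups: AUC < 1 and no perfect cut-off.
overlap_cases = [5.1, 6.3, 4.8]
overlap_controls = [3.2, 4.9, 5.0]
print(auc(overlap_cases, overlap_controls) < 1.0)              # True
print(perfect_cutoff(overlap_cases, overlap_controls))         # None
```

A reported sensitivity or specificity below 100% alongside AUC = 1 therefore refers to a cut-off other than the separating one, or to rounding of the AUC.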
Six other parameters which differed in a significant way between the two groups of children had fair-to-good ROC curve values (Figures 2(a)-2(f)). The combination of the six distinct parameters in one ROC curve analysis is shown in Figure 2(g). In order to be able to combine the six sets of data, raw data were standardized according to the following formula [57]:
$$z = \frac{x - \mu}{\sigma}$$
where $x$ is the raw score, $z$ is the standard score, $\mu$ is the mean of the population, and $\sigma$ is the standard deviation of the population. The absolute value of $z$ represents the distance between the raw score and the population mean in units of the standard deviation. $z$ is negative when the raw score is below the mean and positive when above. ROC analysis shows that the combination of different putative biomarkers increases both their sensitivity and their specificity as diagnostic tools. Notably, ROC analysis of most markers reported in Tables 1-7 shows that they are more sensitive than specific. Although sensitivity is a desirable quality for biomarkers (sensitive biomarkers rarely miss true positive cases), more specific biomarkers are needed for a correct classification of cases. Figure 2 shows, as a representative example, how both sensitivity and specificity may be dramatically increased when more than two parameters are combined in one ROC curve; in fact, with such a combination, the AUC reaches a value of 0.93.
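The standardization step, and one possible way of merging several markers into a single score per subject, can be sketched in Python as follows. The combination rule used here (averaging the z-scores after flipping the sign of markers that decrease in cases) is an assumption made for illustration; the marker names and values are hypothetical, and the population mean and standard deviation are estimated from the pooled sample.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standard scores z = (x - mu) / sigma, with mu and sigma taken from
    the pooled sample (population standard deviation)."""
    mu, sigma = mean(values), pstdev(values)
    return [(x - mu) / sigma for x in values]


def combined_score(markers, directions):
    """One score per subject: mean of sign-aligned z-scores.

    markers:    dict name -> raw values, one per subject, same subject order
    directions: dict name -> +1 if the marker rises in cases, -1 if it falls
    """
    standardized = {name: z_scores(vals) for name, vals in markers.items()}
    n_subjects = len(next(iter(markers.values())))
    return [
        mean([directions[name] * standardized[name][i] for name in markers])
        for i in range(n_subjects)
    ]


# Hypothetical data: first three subjects are cases, last three controls.
markers = {
    "marker_a": [9.0, 8.0, 10.0, 5.0, 6.0, 4.0],  # higher in cases
    "marker_b": [1.0, 2.0, 1.5, 3.0, 2.5, 3.5],   # lower in cases
}
directions = {"marker_a": +1, "marker_b": -1}

scores = combined_score(markers, directions)
for label, score in zip(["case"] * 3 + ["control"] * 3, scores):
    print(f"{label:8s} {score:+.2f}")
```

Because every marker is expressed in standard deviation units and oriented in the same direction, the combined score can be passed to any ROC routine exactly as a single marker would be.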
Figure 1: When the AUC value is 1.00, the curve degenerates into a segment which lies parallel to the x-axis on top of the graph. The parameter of the figure was previously published by our group [56]. Values are shown in Table 7. ROC curve analysis was based on nonparametric methods. The confidence intervals of ROC curves were set at 95%.
The choice of examining only studies where ROC analysis was carried out has greatly limited the number of reported parameters. Notably, however, they fall into categories that bear many similarities to the theoretical classification adopted by Ratajczak [32]. In fact, even this limited number of peripheral biomarkers seems to be representative of ASD-relevant pathophysiological pathways that are presumably shared by all ASD patients. The high or excellent AUC score obtained by the parameters reported in Tables 1-7 and in Figure 1 is not sufficient, however, to promote such putative biomarkers to bona fide ASD biomarkers. In fact, we already stressed the limitations of the studies here examined, which need to be confirmed by independent studies using larger population samples.
In our opinion, the use of combined ROC curves, rather than being an artefactual expedient, has the merit of highlighting the fact that, in heterogeneous and multifactorial conditions such as ASD, only a (correct) combination of peripheral parameters may be able to maximize the predictive value of the tests. Moreover, in order to be useful for diagnosis and prognosis, putative biomarkers should be evaluated in studies assessing patients with confounding or overlapping clinical features and in longitudinal studies.

Table 7: Features and ROC curve analyses of peripheral biomarkers reported in Figure 2 and published in [56]. Combined score has a value <
Figure 2: Some parameter values increase in autistic children with respect to typically developing ones, while others decrease. ROC curve analysis of a combination of multiple parameters, albeit with opposite sign, increases both sensitivity and specificity. Values of these parameters, reported in [56], are shown in Table 7. ROC curve analyses were based on nonparametric methods. The confidence intervals of ROC curves were set at 95%.

Conclusions
ASD are a group of early-onset neurodevelopmental diseases whose causes are still poorly understood; growing evidence suggests that autism is a multifactorial disease influenced by genetic and environmental factors.
To date, autism diagnosis is based exclusively on clinical observation of altered behavior and can be made only around two years of age, since in younger children clinical diagnosis is difficult and uncertain. Therefore, valid biomarkers are needed that would allow earlier and more reliable diagnosis. In addition, good biomarkers could provide predictive information on the clinical outcome of autism and help monitor the outcome of pharmaceutical or nutraceutical treatments.
The importance of the availability of strong biomarkers in ASD research cannot be overestimated. Indeed, even the discovery of the biological networks underlying ASD pathophysiology could be boosted by their identification, as could the development of new and personalized treatments able to cure or, at least, alleviate the symptoms of the disease.
This review analyzes the literature data to identify a panel of peripheral markers associated with ASD, by focusing on studies which made use of ROC analysis, a method that evaluates both the sensitivity and the specificity of a putative marker. At present, however, ROC analysis has not been used extensively enough to provide an exhaustive analysis of ASD biomarkers.
It is suggested here that ROC analysis be adopted as the gold standard to assess the quality of putative biomarkers, thus providing invaluable benefits to ASD research and its clinical applications.