Ground-Truthing Validation to Assess the Effect of Facility Locational Error on Cumulative Impacts Screening Tools

Researchers and government regulators have developed numerous tools to screen areas and populations for cumulative impacts and vulnerability to environmental hazards and risk. These tools all rely on secondary data maintained by government agencies as part of the regulatory and permitting process. Stakeholders interested in cumulative impacts screening results have consistently questioned the accuracy and completeness of some of these datasets. In this study, three cumulative impacts screening tools used in California were compared, and ground-truth validation was used to determine the effect of database inaccuracy. Ground-truthing showed substantial locational inaccuracy and error, of up to 10 kilometers, in hazardous facility databases and statewide air toxics emission inventories. These errors resulted in significant differences in cumulative impact screening scores generated by one screening tool, the Environmental Justice Screening Method.


Introduction
Over the past three decades, researchers in the fields of environmental justice (EJ) and environmental health have demonstrated the existence of regional- and local-scale differences in exposure to air pollution, as well as in calculated health risk and the impacts of ambient air quality on the health status of residential populations. The patterns of disparity in cumulative impacts and exposure correlate with several socioeconomic indicators, including race and measures of wealth. Different causal factors contribute to the disparities in health status, but it is probable that differences in exposure to environmental hazards and risk play an important role. In California, there is particularly strong evidence indicating patterns of both disproportionate exposure to air pollution and air toxics and associated health risks among communities of color and lower income groups (e.g., [1-4]). These same highly impacted communities also face challenges associated with social determinants, such as low social and economic status, as well as psychosocial stressors, which make it more difficult to cope with exposure and health disparities.
The problem of cumulative impacts is not fully addressed by current regulatory and permitting practice, in part because of a reliance on traditional methods of risk assessment to decide, for example, whether a specific polluting facility can operate under existing law. Risk is typically calculated using single stressors and is reported on a chemical-by-chemical, medium-by-medium, and source-by-source basis. Each regulatory authority only reviews those projects or facilities within its mandate and jurisdiction, with no integrated enforcement or review action across jurisdictions. Framing and identifying priorities by single-risk magnitude and single-scope regulation in this way ignores the fact that, in many communities, residents are exposed to multiple environmental hazards and experience the cumulative impact of the attendant risks. The one-dimensional facility-by-facility regulatory approach ignores the multiplicity of factors that affect these communities and, in doing so, fails to adequately protect public health and safety.
Cumulative Impact Tool Development. The development of tools, approaches, and methodologies for assessing cumulative impacts on vulnerable communities within a cumulative risk framework is rapidly evolving. Several methods, developed by both academic researchers and state and federal regulatory agencies, have been applied in selected regions to aggregate and map the geographic distribution of cumulative impacts and to include consideration of the relative vulnerability of different communities to negative environmental impacts. These cumulative impacts tools are intended to be used by environmental and regulatory agencies for screening-level activities, such as planning and prioritization, and to assist in decision-making on such activities as permitting and determination of environmental remediation actions (i.e., "cleanup" levels). All of these cumulative impacts methods (1) define a set of indicator metrics that track different aspects of exposure, risk, and vulnerability for different geographic units in the region of study; (2) use spatial analysis techniques in a Geographic Information System (GIS) to "screen" areas to characterize their indicator profile; and (3) apply index scores to geographic locations to summarize their relative indicator profile and facilitate mapping and interpretation of the spatial patterns.
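As an illustration of step (3), the following minimal sketch turns raw indicator values into a summed index per geographic unit. It assumes a simple rank-based quintile scheme; that scheme, the function names, the indicator names, and the tract values are all hypothetical and are not taken from EJSM, CEVA, or CES, each of which uses its own scoring rules.

```python
from typing import Dict


def quantile_scores(values: Dict[str, float], n_bins: int = 5) -> Dict[str, int]:
    """Rank areas on one indicator and assign a score from 1 to n_bins.

    Higher indicator values receive higher scores. Ties are broken by
    input order, which is a simplification made for this sketch.
    """
    ordered = sorted(values, key=values.get)
    n = len(ordered)
    return {area: 1 + (rank * n_bins) // n for rank, area in enumerate(ordered)}


def index_scores(indicators: Dict[str, Dict[str, float]], n_bins: int = 5) -> Dict[str, int]:
    """Sum the per-indicator bin scores into a single index per area."""
    totals: Dict[str, int] = {}
    for values in indicators.values():
        for area, score in quantile_scores(values, n_bins).items():
            totals[area] = totals.get(area, 0) + score
    return totals


# Hypothetical indicator values for five hypothetical tracts.
example_indicators = {
    "hazard_proximity": {"T1": 0.0, "T2": 3.0, "T3": 7.0, "T4": 1.0, "T5": 9.0},
    "pct_below_poverty": {"T1": 10.0, "T2": 40.0, "T3": 25.0, "T4": 5.0, "T5": 30.0},
}
example_index = index_scores(example_indicators)
```

In this toy example tract T5 receives the highest combined index because it ranks high on both indicators, which is the kind of relative profile a screening map is meant to convey.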
Requirements for Cumulative Impact Analysis. A wide variety of health and exposure indicators have been used in various studies. These include proximity to air pollution emissions and hazardous waste sources [1,3,5-8], exposure to specific substances such as pesticides and lead [9,10], exposures to outdoor air pollution and associated health risks [4,11-13], differences in regulatory enforcement and clean-up [14,15], body burden measurements [16], and the distribution of environmental benefits due to regulatory implementation (e.g., clean air, water, and access to recreational areas [17,18]).
Residents in EJ communities point out that inequality in exposure exists for many different pollutants and types of environmental hazards and that the resulting cumulative impacts (CI) have exacerbated health disparities in these communities. Many neighborhoods bear the combined, or cumulative, burden of air pollution emissions from numerous industrial facilities and land uses, as well as emissions from mobile sources on high volume roads and freeways, and emissions associated with smaller facilities that either operate illegally or are not subject to regulatory oversight. This is of particular concern where the exposures affect populations that are, because of age or chronic health conditions, particularly sensitive to air pollution. Areas where these "sensitive receptors" spend much of their time are referred to as sensitive land uses by the California Air Resources Board [19]. Sensitive land uses include schools, childcare centers, urban parks and playgrounds, healthcare facilities, and senior residential facilities.
Support for Cumulative Impact Analysis. The National Environmental Justice Advisory Council, EJ advocates, and community organizations have long argued that scientists and regulatory agencies should incorporate the cumulative impacts of environmental and psychosocial stressors when ranking the priorities for regulatory enforcement activities instead of using the traditional chemical-by-chemical and source-specific assessments of potential health risks of environmental hazards, which do not reflect the multiple environmental and psychosocial stressors faced by vulnerable communities. These stakeholders have voiced their concern and have called for additional methods to consider and include cumulative impacts in developing regulatory and enforcement priorities.
Regulatory agencies have responded to this need by embracing the National Research Council's call for the development of "cumulative risk frameworks" within their scientific programs and enforcement activities.
The consideration of the effects of cumulative impacts originally gave rise to Presidential Executive Order 12898, "Federal Actions to Address Environmental Justice in Minority Populations and Low-Income Populations," in 1994, which directed the federal agencies "to identify and address the disproportionately high and adverse human health or environmental effects of their actions on minority and low-income populations, to the greatest extent practicable and permitted by law," and to "develop strategies for implementing environmental justice." The lead agency in this effort has been US EPA, through its Office of Environmental Justice and its leadership role in the Interagency Working Group. EPA's Office of Research and Development through its Sustainable and Healthy Communities Research Program and the 10 EPA Regional offices have also developed robust environmental justice initiatives.
EPA Cumulative Impact Tools and Application Domains. EPA Region 9's in-house and externally funded development and application of cumulative impacts screening-level tools, like the Environmental Justice Screening Method (EJSM), are part of its urban air toxics strategy, which has a major focus on mobile source air toxics. EPA Region 9's goal is to integrate EJ measures into land use and zoning development planning (i.e., residential, transportation, industrial, etc.). EPA Region 9 has previously applied cumulative impacts screening tools to federally mandated Resource Conservation and Recovery Act actions, and as a result, environmental remediation plans have been modified.
A key emphasis area for EPA Region 9 is the San Joaquin Valley (SJV), because it is a nonattainment area for PM2.5 (i.e., particles less than 2.5 μm in diameter) and has high asthma rates. The current projection is that the SJV will not be in PM2.5 compliance until 2023. The Interstate 5 and State Route 99 transportation corridors, along with agricultural pesticides (with particle-bound NH3), are believed to be the main contributors to the PM2.5 nonattainment and high asthma rate problems in the SJV. EPA Region 9 also requires a methodology to assess whether national or regional emissions trading programs are the cause of disparate exposure impacts on vulnerable communities. The EJSM has the potential to address these EPA Region 9 priority areas and assist the agency in incorporating cumulative impacts screening results into decisions having environmental impacts.
The US EPA developed four cumulative impacts tools: (1) the Environmental Justice Strategic Enforcement Assessment Tool (EJSEAT), a pioneering effort from the Office of Solid Waste and Emergency Response to help it prioritize resources; (2) the Census Tract Ranking Tool for Environmental Justice (CenRANK), developed by an EPA contractor, to add data richness and analytical capability to EPA's screening efforts; (3) EJSCREEN, a screening tool released publicly in 2015 that uses nationally consistent data to identify areas with disproportionately high and adverse environmental health burdens and communities that are potentially overburdened, and to help EPA regional offices prioritize permits in these areas; and (4) the Social Vulnerability Index (SVI), developed by EPA Region 9 and designed to aggregate and display the social determinants of health as a base map for program-specific environmental information. The SVI uses US Census Tract data to determine where the socially vulnerable populations are located in EPA Region 9, but this tool does not assess the cumulative impact of environmental hazards (air pollution exposures), or their proximity, on those vulnerable populations. The EJSM, initially funded by both CARB and US EPA, was designed to address the need for this type of analysis. This research effort applied EJSM to validate and correct hazard facility locations and to use the corrected data in EJSM and the two other cumulative impacts screening methods (CEVA and CES) to assess the impact of incorrect facility location on cumulative impacts scores.
California-Based Cumulative Impacts Tools. In California, the Office of Environmental Health Hazard Assessment (OEHHA) maintains a Cumulative Impacts and Precautionary Approaches Work Group, which has advised the California Environmental Protection Agency (CalEPA) in its efforts to develop guidelines for consideration of cumulative impacts within the different CalEPA programs. Academic researchers in California have developed two cumulative impacts tools to assist in screening-level analysis in overburdened communities in California, the Environmental Justice Screening Method (EJSM) [8], and the Cumulative Environmental Vulnerability Assessment (CEVA) screening tool [20]. EJSM is a screening-level cumulative risk assessment tool, which is an analytically robust and procedurally transparent method to assess and compare the cumulative impact of environmental and social stressors across neighborhoods within a region. EJSM has an emphasis on air pollution impacts and vulnerability according to the specific recommendations of the California Air Resources Board [19] but also includes impact and vulnerability with respect to poor drinking water quality and adverse climate change effects. CEVA is a screening tool used to identify concentrations of cumulative environmental hazards in areas with low social, economic, and political resources, to help these communities prevent, mitigate, or adapt to these conditions; it has been applied to selected areas in California. CalEPA OEHHA has developed an additional cumulative impacts screening methodology called the California Communities Environmental Health Screening Tool (CalEnviroScreen or CES) [21], which is used to identify communities that experience disparate health impacts from multiple sources of air pollution. 
These three cumulative impacts screening methodologies differ significantly from each other in analytical approaches, model algorithms, and other details (e.g., the geographic unit for analysis, some indicator metrics used, and methods of index scoring), but they share many common features, including use of standard data sources, primarily databases, maintained by California state regulatory agencies for permitting and analysis, augmented by land use or business information from municipalities and private companies. These data sources are not only used in cumulative impacts screening, but they are fundamental components in the processes through which regulators and policy developers assess and characterize "place-based" environmental exposure and risk.
Use of Ground-Truthing in Cumulative Impacts Screening. One critique of EJ-based cumulative impacts screening focuses on concerns that the resultant output data is flawed due to locational inaccuracy, lack of completeness, and errors from infrequent updating of the input data sources and that the use of the flawed input data for cumulative impacts screening introduces significant error into screening results. To address this criticism, "ground-truthing" was used to validate these data. The term ground-truthing was introduced into EJ parlance from the field of cartography, where aerial imagery or remote sensing data, used to map surface features such as vegetation or land use, is checked or validated using observations "on the ground" [22]. Ground-truthing in the context of this research project entails verifying whether hazards indicated in regulatory databases are active, accurately described, and actually located at the reported location [23].
We used ground-truthing techniques to (a) validate the locational accuracy of established facilities and land uses from standard business/facility and regulatory databases as a way to check their accuracy before use in cumulative impacts screening tools and (b) determine the impact on cumulative impacts screening scores using unchecked/nonvalidated (with respect to locational and other errors) hazard and facility data as a test of EJSM's susceptibility to identifying false positives (i.e., recorded locations of environmental hazards that are incorrectly shown to be concentrated in a given area and falsely indicate that an area has a high air pollution "loading" or impact). After ground-truthing, the screening results were then compared using both the uncorrected data (i.e., data obtained from original source(s) "as-is") and the corrected data (i.e., data obtained from original source(s) with (i) subsequent correction applied to facility location(s) and/or (ii) removal of nonexistent facilities or addition of new facilities based on visual confirmation and GPS location) to determine the degree to which the results are affected. For example, if standard databases erroneously indicate that hazards are located or concentrated in a given area, that location might be falsely interpreted as an area of high pollution impact, or a "false positive," distorting

Methods
This cumulative impacts analysis was performed using the three cumulative impacts screening tools (EJSM, CEVA, and CES) in the San Joaquin Valley (SJV) region of Central California, comprising eight counties and 71,161 square kilometers (km²) in area (Figure 1). Two different methods of validation were used. Field-based ground-truthing was completed in three cumulative impacts analysis areas, Arvin, Huron, and Stockton (Figure 1), which were selected to represent the very large and diverse San Joaquin Valley region with reasonable geographic variation and on the basis of the divergence in screening scores among the three methods in these areas. These analysis areas differ from one another, but all have a high number of reported environmental hazards. Arvin, with an area of 24 km², is located southeast of the population and commerce center of Bakersfield. Huron, with an area of 816 km², is a somewhat isolated community almost completely dependent on agriculture and is a historically persistent environmental justice community. Central Stockton, with an area of 3.4 km², is also an EJ community. Field-based ground-truthing validation of all facility information for the three test areas was conducted, in which all reported facilities were visited and validated for locational accuracy and operational status.
Additional field-based ground-truthing in the three cumulative impacts analysis sites was carried out in a systematic search by driving the public roadway network, to locate and validate facility locations not included in the regulatory databases. The facility information for those sites was built in the field as geospatial data layers using ArcMap GIS software, running on a laptop computer in the vehicle and using an external high-accuracy GPS receiver. Software allowed the receiver location to position the cursor in the ArcMap session so that observer location could be tracked on the display and the GPS position could be used to correct these locations or add new features (new facilities), as needed. In each case, locational accuracy was verified and corrected if necessary. In addition, the name and type of each field-identified facility were compared to the information recorded in the standard regulatory or business/facility database. Facilities were also checked for activity to determine whether they were closed or relocated, and duplicate facility records were removed.
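The record-cleaning step described above (dropping closed or relocated facilities and removing duplicate records) can be sketched as follows. This is a minimal illustration under stated assumptions: the `FacilityRecord` fields and the rule of matching duplicates on a normalized name and address are hypothetical and are not the procedure documented for this study.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FacilityRecord:
    """Hypothetical facility record; field names are illustrative only."""
    name: str
    address: str
    active: bool  # operational status observed in the field


def normalize(text: str) -> str:
    """Lower-case and collapse whitespace so trivial variants compare equal."""
    return " ".join(text.lower().split())


def clean_records(records: List[FacilityRecord]) -> List[FacilityRecord]:
    """Drop inactive facilities and keep the first of any duplicate records."""
    seen = set()
    kept: List[FacilityRecord] = []
    for rec in records:
        if not rec.active:
            continue  # closed or relocated facility
        key = (normalize(rec.name), normalize(rec.address))
        if key in seen:
            continue  # duplicate of a record already kept
        seen.add(key)
        kept.append(rec)
    return kept


# Hypothetical input: one duplicate pair and one closed facility.
raw = [
    FacilityRecord("Valley Auto Body", "12 Main St", True),
    FacilityRecord("VALLEY AUTO  BODY", "12 main st", True),
    FacilityRecord("Old Gas Stop", "99 Frontage Rd", False),
    FacilityRecord("Westside Packing", "400 County Rd", True),
]
cleaned = clean_records(raw)
```

Exact matching on name and address is deliberately conservative; records that differ in spelling or suite number would still need manual review of the kind performed in the field.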
As a separate validation test, the reported locations of all hazardous facilities for the entire eight-county SJV region were mapped using the best-known location: either the geographic coordinates reported in the standard regulatory or business/facility databases or the geocoded address of the facility provided by the applicable regulatory agency. Each facility location was then evaluated in Google Earth Pro, using the available aerial imagery, geocoding capability, and real estate tax parcel information to review all facility data, verify locations, and correct them as needed.
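One way to quantify the locational error found in such a review is the great-circle distance between the reported and the corrected coordinates. The sketch below uses the haversine formula with a mean Earth radius; the function names and sample coordinates are hypothetical, and the 100 m default simply mirrors the threshold used in the Table 1 summary rather than a documented processing step.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, an approximation


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def is_mislocated(reported, corrected, threshold_m: float = 100.0) -> bool:
    """True if the reported point lies at least threshold_m from the corrected point."""
    return haversine_m(reported[0], reported[1], corrected[0], corrected[1]) >= threshold_m


# Hypothetical coordinates: 0.001 degrees of latitude is roughly 111 m.
error_m = haversine_m(36.000, -119.000, 36.001, -119.000)
flagged = is_mislocated((36.000, -119.000), (36.001, -119.000))
```

For errors of a few kilometers or less, a projected coordinate system would give similar results; the haversine form is used here only because it needs no GIS library.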

Results and Discussion
Several of the hazard facility databases and all sensitive land use types used in California EJ cumulative impacts screening tools were validated, including the following: (i) CARB Facilities of Interest (CARB-FOI), which are industrial and commercial facilities from the California Emission Inventory Development and Reporting System (CEIDARS) statewide air toxics emissions inventory that are of greatest concern to CalEPA regulators because of the amounts, toxicity, and possible impacts of their emissions; (ii) facilities reporting to the California AB2588 air toxics "Hot Spot" inventory; (iii) California Department of Toxic Substances Control (DTSC) permitted hazardous waste handling facilities and generators; (iv) auto paint and body shops from the Dun and Bradstreet Business Locator Service; (v) gas stations as reported by the California Department of Food and Agriculture Division of Measurement Standards; and (vi) sensitive land uses: schools, childcare centers, urban parks and playgrounds, healthcare facilities, and senior residential facilities [19], with locations obtained from State agencies, permit databases, county real estate tax parcel information, and the Cal-Atlas Geospatial Clearinghouse.
Geography Journal 5
Table 1: Location errors discovered in field validation in Arvin, Huron, and Stockton by facility type. Note: this is a summary of the number of facilities reported in the standard regulatory or business/facility databases (uncorrected) and facilities found during ground-truth validation (corrected), as well as the number of facilities located inaccurately by at least 100 meters for each cumulative impacts analysis site.
Field-based ground-truth validation of Arvin, Huron, and Stockton revealed that location inaccuracy and error in these databases are substantial (Table 1). Facilities were also found that are of the same type as those recorded in the agency databases but were not listed in them. These "new" facilities were mapped and included as well. For example, the field researcher used the road network to confirm the presence and activity of an AB2588 "Hot Spot" facility or childcare facility, compared its "real-world" location to the reported location, and then corrected or updated the reported location if necessary. If similar facilities were found, their locations and attribute information were added to the geospatial data layer. Ground-truth validation in these areas indicated that the AB2588 "Hot Spot" database is the most locationally inaccurate one and tends to overstate the hazard exposure due to numerous facility location errors and duplicate facilities. Errors in the other regulatory or business/facility databases are significant, but not quite as problematic.
The results of validation of all hazardous facility sites in the eight-county SJV area using Google Earth Pro also demonstrated considerable inaccuracy in these databases (Table 2). One-third of CARB-FOI air toxics emitters were mislocated to a degree that would result in inaccurate cumulative impact scores using the screening tools described above (Figure 2). The accuracy of auto paint and body shops and hazardous waste facilities was considerably better, but errors in these datasets still contribute to inaccurate screening scores. Gas stations appear to be far more accurately located, as estimated by validating a randomly selected subsample.

Effect on Cumulative Impacts Screening Scores.
[Figure legend (CARB-FOI facilities): CARB-FOI site as reported; corrected location; location correction.]
After corrections were made to each geospatial dataset, EJSM hazard proximity metrics and land use scores were recalculated for the SJV region to determine the impact of using nonvalidated (with errors) versus validated (errors corrected) facility information for one screening method. The Environmental Justice Screening Method (EJSM) methodology was applied, using the location-corrected facility information, to look for differences resulting from using unchecked (error-filled) versus validated (errors corrected) information and to assess the degree to which cumulative impacts score metrics changed. Any given census tract containing inaccurately located facilities could have either a higher or a lower score, depending on the degree of change in the hazard proximity metrics resulting from correction of facility locations. Table 3 shows the distribution of change in hazard proximity and sensitive land use scores for the 760 census tracts in the SJV region. A significant number of tracts have different scores as a result of error correction, and the distribution of census tracts mapped against the change in hazard score (i.e., the hazard proximity and sensitive land use score, obtained from hazard proximity metrics and land use information) is nearly Gaussian. The values from −4 to +4 represent the amount by which the tract-level hazard score changed as a result of correcting the facility database information.
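The tabulation of score changes can be sketched as below. The code assumes that a change is computed as the corrected score minus the uncorrected score, so that a negative value marks a tract whose impact was overstated by the uncorrected data; that sign convention, the function names, and the sample tract scores are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict


def score_change_distribution(uncorrected: Dict[str, int], corrected: Dict[str, int]) -> Counter:
    """Count tracts by score change (corrected minus uncorrected)."""
    return Counter(corrected[tract] - uncorrected[tract] for tract in uncorrected)


def summarize_changes(distribution: Counter) -> Dict[str, int]:
    """Collapse the distribution into lower, higher, and unchanged tract counts."""
    return {
        "lower": sum(n for delta, n in distribution.items() if delta < 0),
        "higher": sum(n for delta, n in distribution.items() if delta > 0),
        "unchanged": distribution.get(0, 0),
    }


# Hypothetical tract-level hazard scores before and after correction.
scores_uncorrected = {"t1": 5, "t2": 7, "t3": 4, "t4": 6}
scores_corrected = {"t1": 3, "t2": 7, "t3": 5, "t4": 8}

distribution = score_change_distribution(scores_uncorrected, scores_corrected)
summary = summarize_changes(distribution)
```

Plotting the counts in `distribution` against the change values would give a histogram of the kind summarized in Table 3.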
After the appropriate corrections were made to the applicable databases, a total of 247 census tracts received lower hazard proximity and sensitive land use scores. The incorrect data led to overstating the cumulative impacts in those tracts.
Similarly, 313 tracts received higher hazard proximity and sensitive land use scores as a result of error correction, meaning that the uncorrected data had understated the cumulative impacts in those tracts; there was no change during the rescoring activity in 200 of the 760 census tracts. Figure 3 shows the geographic pattern of change in EJSM scores resulting from using corrected data. The greatest understatement of hazard proximity and sensitive land use scores was in West-Central SJV, a sparsely populated and mostly agricultural region with substantial oil and gas production facilities. Most of the tracts with overstated hazard proximity and sensitive land use scores were census tracts surrounding population centers in the SJV (e.g., Stockton, Fresno, Modesto, and Bakersfield).
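As a quick arithmetic check, the shares implied by the reported tract counts (247 lower, 313 higher, 200 unchanged) can be computed directly. The function name is hypothetical; the counts are those stated in the text.

```python
from typing import Dict


def tract_shares(lower: int, higher: int, unchanged: int) -> Dict[str, float]:
    """Return the total tract count and each group's share as a percentage."""
    total = lower + higher + unchanged
    return {
        "total": total,
        "lower_pct": round(100 * lower / total, 1),
        "higher_pct": round(100 * higher / total, 1),
        "unchanged_pct": round(100 * unchanged / total, 1),
    }


shares = tract_shares(lower=247, higher=313, unchanged=200)
# shares["total"] is 760; the percentages are 32.5, 41.2, and 26.3.
```

Taken together, roughly three-quarters of the 760 tracts changed score in one direction or the other after correction.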

Conclusions
The primary goal of this study was to evaluate the accuracy of regulatory databases used in cumulative impacts screening, validate and correct the facility-level data used in the screening methodology to characterize hazard proximity, and determine the degree to which errors affect the accuracy of screening scores. Accuracy validation was accomplished using three different methods of validation or ground-truthing: (1) field-based ground-truthing validation of all reported facility information for three selected test areas; (2) finding and recording hazardous facilities in the field that are of the same type as those in the regulatory database, but not included in the database itself; (3) reviewing and correcting all reported facility locations for the entire SJV region using Google Earth Pro. Using the validated and corrected facility data, cumulative impact screening scores were recalculated using the method in the EJSM, which employs a sophisticated approach to characterizing hazard proximity based upon CARB recommendations for land use planning to provide health-protective distance buffers around certain land uses and facility types. Differences in scores resulting from using unchecked (with error) versus validated (errors corrected) information provided a comprehensive test of false positives and false negatives across the entire SJV region. These differences were significant, demonstrating the importance of error-checking and database validation in this context. Of the 760 census tracts in the study region, nearly one-third (n = 247; 32.5%) received lower hazard proximity screening scores after correction; the uncorrected data led to overstating the cumulative impacts in those tracts. Similarly, 313 tracts (41.2%) received higher screening scores after correction; the uncorrected, inaccurate data understated the cumulative impacts in those tracts. There is also a geographic pattern to the corrected screening scores.
The rural west-central portion of the SJV experienced the greatest increase in score after errors were removed. Tracts in this region tend to be relatively large and sparsely populated, and agriculture and energy production are intense. Areas with lower hazard proximity scores were concentrated in the urban and suburban areas surrounding the population centers of the SJV region: Stockton, Modesto, Fresno, and Bakersfield. The locational error rate tends to be higher, and error distances tend to be greater, in rural regions of California for several reasons. Road networks are less regular and address ranges are not as uniform as in urban areas, so address geocoding accuracy suffers. Many hazard types in these regions are larger in size and, consequently, not as well represented by a geocoded point.
Finally, regulatory reporting practice often accepts low-accuracy or generalized locations, locations are commonly not verified by the government agency, and there is little to no penalty for reporting locations inaccurately or incorrectly. This highlights the need for local, regional, and state governments to maintain accurate data sources and to invest resources in assuring accuracy in order to facilitate reliable and correct cumulative impacts analyses for vulnerable communities, regardless of which screening method is used.

Disclosure
This article has been subjected to Agency review and approved for publication. Mention of trade names or commercial products does not constitute endorsement or recommendation for use.