The Campi Flegrei Blind Test: Evaluating the Imaging Capability of Local Earthquake Tomography in a Volcanic Area

1 Dipartimento Centro di Ricerche Sismologiche (CRS), Istituto Nazionale di Geofisica e di Oceanografia Sperimentale (OGS), Trieste, Italy 2 Dipartimento di Scienze Fisiche, Università degli Studi di Napoli “Federico II”, Napoli, Italy 3 Dipartimento di Geofisica della Litosfera (GDL), Istituto Nazionale di Geofisica e di Oceanografia Sperimentale (OGS), Trieste, Italy 4 Centro Nazionale Terremoti (CNT), Istituto Nazionale di Geofisica e di Vulcanologia (INGV), Roma, Italy 5 Géosciences Montpellier, UMR 5243 CNRS, University Montpellier 2, Montpellier, France 6 Stanford Rock Physics Laboratory, Stanford University, Stanford, CA, USA 7 Institut des Sciences de la Terre (ISTerre), The Université Joseph Fourier, Grenoble, France


Introduction
The Campi Flegrei is a volcanic area located about 15 km west of Naples. Its structure along with its peculiar morphology, which is characterized by the large depression of the caldera and a high number of volcanic cones and craters, makes it a unique attraction in the world for both scientists and tourists. The caldera was generated by two episodes of collapse due to explosive eruptions, which happened 6,000 (the Campanian Ignimbrite eruption) and 12,000 years ago (Neapolitan yellow tuff eruption), respectively [1,2]. In the last 10,000 years, the bottom of the Neapolitan yellow tuff caldera has been interested by ground deformation phenomena. These phenomena are still ongoing as evidenced by the bradyseismic crises, which occurred in the years 1970-1972, 1982-1984, 1989, 1994, and 2000. Since the area is highly populated, several studies and research projects have been carried out to improve the scientific knowledge pertinent to risk mitigation.
During the bradyseism suffered in the Campi Flegrei between 1982 and 1984, Wisconsin University deployed a network of seismological stations to record the swarm of local earthquakes. The dataset from the 1982-1984 crisis was first used for seismic tomography purposes by Aster 2 International Journal of Geophysics and Meyer [3]. During 2000-2003, the Italian Department of Civil Protection, together with the scientific coordination of the National Group of Volcanology, boosted a research program aimed at investigating the buried 3D structure beneath the Campi Flegrei and improving the information about the state of the caldera. The identification and reconstruction of the possible magmatic feeding system were one of the main goals, and two research lines were implemented, one geared towards gaining the maximum information from the existing data-for instance the 1982-1984 earthquake dataset-, the other aimed at acquiring and interpreting new data, mainly from active seismics. The first research branch-which this work originated from-preceded the second one by a couple of years, however, they subsequently went ahead simultaneously for several years.
The purpose of this work is to analyse the local earthquake tomography method's suitability to provide a reliable seismic velocity image of Campi Flegrei Earth's structure with the network geometry set up for the 1982-1984 campaign. To fulfil this target, a numerical blind test was performed by creating a (1) hypothetical 3D model of the underground structures forming the Campi Flegrei caldera and (2) a synthetic dataset of arrival times by using the same network geometry and number of earthquakes as the 1982-1984 experiment.
This study has mainly methodological value today, since the results of the other researches going ahead simultaneously have provided a more complete and constrained outline of the Campi Flegrei "system." For instance, some years later, Vanorio et al. [4] repicked the arrival times of the whole dataset. Additional insight into the 3D caldera structure around the Neapolitan bay was also provided by the SERAPIS project, an active seismic experiment carried out in 2001 [5,6]. Several years later, in 2008, Battaglia et al. [7] published a 3D image of the structure of the Campi Flegrei caldera with a linearized tomographic inversion using a combined dataset from an active seismic survey (1528 shots produced during the SERAPIS experiment in 2001) and a passive seismic campaign (606 earthquakes recorded by the 1982-1984 campaign). The obtained images were characterized by higher resolution and confidence level than those provided by previous studies. They confirmed the presence of a ring with high V P -the buried trace of the rim of the Campi Flegrei caldera-in the southern part of the bay of Pozzuoli and extended its trace inland. In addition, a body with low V P /V S ratio was imaged over the large part of the area at a depth of about 3-4 km and was interpreted as the covering layer of formations enriched in gas under supercritical conditions.
As already mentioned, the goal of this work is analyzing thoroughly the capability of the 1982-1984 dataset to provide a reliable tomographic image of Campi Flegrei Earth's structure through a numerical blind test. A 3D model of the local structure has been created, and theoretical arrival times of P-and S-waves have been calculated for about 550 earthquakes recorded by 14 receivers. The resulting dataset consists of about 5,800 P-and S-time arrivals: it corresponds to the amount of experimental data and represents roughly the overall spatial distribution of the seismic events of the 1982-1984 swarm [2]. This study is an entirely numerical experiment; however, we emphasize the fact that both model and data are strictly constrained to the real structure and the scientific information available for the Campi Flegrei area.
The blind test's organizers were mostly interested in evaluating the results provided by expert groups in solving the problem in question each using its own strategy, rather than evaluating the theoretical performance of different tomographic methods. Their point of view was that the specific choices made by a team when approaching the problem are in the very least as important as the method itself. Subsequently, this paper gives more priority to (1) the strategy adopted by any (expert) user for getting the best results, rather than analysing and comparing the method's technicalities, and (2) whether the obtained results provide some useful information on the specific features introduced in the synthetic model. The results and the indications obtained cannot be exported directly to any situation; however, they have a general significance and allow one to better understand the benefits and limits of the local earthquake tomography (LET) method and the effectiveness of the strategy adopted for the inversion.
The tomographic inversion has been performed by four independent teams, using different methods. The synthetic dataset was distributed to the participating teams, with no additional information about either the synthetic velocity model or the earthquake hypocenters used to create the dataset itself. Furthermore, teams did not exchange any information amongst themselves. The output data were delivered to the blind test organizers in accordance with a number of specifications assigned at the beginning of the test. The organizer's team carried out all postprocessing operations. As the final solution of linearized tomographic inversion depends on the a priori 1D reference (i.e., initial) model [8], the strategy chosen by each team to define the initial model is an integral part of this study. This paper is organized as follows. Sections 2 and 3 describe the 3D model and the dataset of the synthetic arrival times, which were calculated and used as an input for the tomographic inversions. Section 4 introduces the tomographic methods used and describes the most relevant details of the settings adopted for this study by each team. Section 5 describes the results, discusses the reliability of the inverted fields, and compares the results obtained by the different inversion methods to the original model. Finally, Section 6 discusses the results and draws some conclusions.

3D Model
The study volume corresponds to an area of 14 km × 12 km and 8 km in depth ( Figure 1). The area is centred on Pozzuoli Bay and includes Monte di Procida to the west, Posillipo to the east, Capo Miseno to the south, and part of the Quarto Plain to the north.
The 3D model was built by the organizing team using the Gocad software [9]. It does not include topography nor does it take into account the Earth's surface curvature, since these features are not essential for the specific area and scale of investigation. Regions are defined by boundaries and physical properties. The physical properties of interest for International Journal of Geophysics 3  Figure 2). The purpose of the organizers was to perform a test for evaluating the capability of the tomographic method to recognize a number of features (e.g., the presence of a low V S velocity volume representing a magmatic chamber at depth, some magmatic chimneys, or V P /V S anomalies representing gas-saturated zones) for a situation similar to the one which yielded the 1982-1984 dataset. With this in minid, a realistic background model has been created considering the information provided by the available studies [3,[10][11][12][13][14][15], and then adding to it some generated but plausible details. The model represents a hypothetical structure of the caldera and the surrounding Earth ( Figure 3). In particular, the model assumes the presence of a magmatic chamber, filled by fluid magma at the bottom of the volume, and some intrusions of fluid magma in the upper layers. The information about the depth of the magma chamber, at the time the synthetic test was designed, came from the papers of Ferrucci et al. [15] and D'Antonio et al. [14]. Ferrucci et al. [15] identified an interface at about 4 km depth where a strong P-SV conversion was observed. This interface has been interpreted as the seismic marker of a magma chamber roof, and together with petrological estimates of the magma chamber depth [13], led to a likely estimate of the magma chamber roof and emplacement between 4 and 6 km. Moreover, since the observed temperature field in boreholes at Campi Flegrei showed the presence of strong lateral gradients [13], we postulated the existence of some magmatic intrusions reaching shallow depths (up to 2.5 km).
For the formations that represent the background velocity field (i.e., carbonatic rock, tuff, and weakly consolidated soil), the velocity increases gradually with the depth. The magma unit has V S = 300 m/s. It takes up the base of the structure at about 7 km deep as well as a number of isolated bodies, which represent hypothetical volcanic chimneys. It should be emphasized that the structural model defined for the blind test represents only one of the possible conjectures based on the data available at the beginning of the project. The V P /V S ratio is generally between 1.75 and 1.85. For weakly consolidated soils and fluid magma, it has a value of 2.0 and greater than 9, respectively. The result of the construction of the model is a set of vector GoCad structures [9], which can be easily managed and modified. Figures 4 and 5 show the structure of the model for V P , V S and V P /V S ratio in terms of slices along horizontal planes and vertical EW-oriented sections, respectively, with the same graphical representation that will be used in Section 4 to illustrate the results of the tomographic inversion. From these figures, one can clearly recognize both the location and size of the velocity anomalies as well as the overall velocity distribution of the background medium.
The 1982-84 seismicity was mostly concentrated in a volume beneath the Pozzuoli Bay at a depth rarely exceeding 5 km. The 550 sources are localized within the same volume, however hypocenters are aligned along five different planes (three vertical and two horizontal, resp.), in order to allow an easier evaluation of the inversion results ( Figure 6). 145 earthquakes out of 550 are deeper than 4 km with a maximum hypocentral depth of 7.55 km.
It is instructive to regard the synthetic model here introduced in light of the information made available by the following recent studies. The velocity distribution represented in Figure 7 has been obtained by merging two tomographic models: the SERAPIS model, obtained from the inversion of an active seismic dataset [16], and the LET model of Vanorio et al. [4]. In the instance when the two models overlap, they have been averaged [17]. Since the SERAPIS model includes only P wave velocities, in regions not covered by the Vanorio et al.'s model a constant V P /V S = 1.8 was assumed.      Table 1 for the description of the lithological units.
This velocity model ( Figure 8) clearly shows the central depression of the caldera, filled by volcano-sedimentary materials with P velocities lower than 4 km/s. The buried rim of the caldera is evidenced as a high velocity ring at shallow depths [6,16]. At the center of the caldera, beneath the town of Pozzuoli, there is evidence of a volume with low V P /V S values, which was intepreted by Aster and Meyer [3] and Vanorio et al. [4] as related to rock materials percolated by International Journal of Geophysics  International Journal of Geophysics   condensed brines. Below this volume, the V P /V S values show a marked decrease, interpreted as evidence of overpressured gas-bearing rocks at supercritical conditions [4]. The earthquake hypocenters shown in Figures 7 and 8 consist of 3298 selected events, recorded at Campi Flegrei between 3/7/1982 and 4/14/1985. We have plotted only events with no less than 7 phase pickings, relocated in the model described above, using a nonlinear approach (NonLinLoc software) and an EDT cost function. Most of the hypocenters are distributed in two main clusters. The first has an ellipsoidal shape, centered at the town of Pozzuoli, with a WNW-ESE elongation, a major axis of about 5 km and a minor axis of about 3 km. The second, smaller cluster has an arclike shape and is located in the southern part of the Pozzuoli Bay [18].

Synthetic Dataset
For the calculation, the final model has been discretized into a 3D regular mesh with spatial steps of 50 m, with a resulting grid of 280 × 240 × 160 nodes. Since the number of receivers is much smaller than those of the sources, we have applied the reciprocity principle to reduce the overall computational time.
The synthetic dataset of theoretical P-and S-wave arrival times has been computed using the numerical solution of the eikonal equation developed by Podvin and Lecomte [19]. This method is implemented in the NonLinLoc software package developed by Lomax [20].
The synthetic dataset of arrival times consists of 3830 Pphases and 2056 S-phases. In order to make the synthetic dataset more realistic and take implicitly into account the high-frequency noise always present in the recorded signals (e.g., environmental seismic noise or high-frequency scattering effects [21]), the computed arrival times have been contaminated by a Gaussian random error, with an average value of 0.05 s. Figure 9 shows the modified Wadati diagrams [22] of the calculated synthetic dataset, the same with noise superimposed, and the 1982-1984 dataset repicked by Vanorio et al. [4], but with the largest outliers removed, respectively. The latter panels show an acceptable consistency for the purpose of this work in terms of both V P /V S ratio and (ΔTP, ΔTS) dispersion, which corresponds to about half as much as the dispersion of the real dataset. We emphasize that the random error applied to the synthetics defines a dataset with very good quality, which is what we wanted in order to evaluate the best resolving capability of the tomographic method.
In addition to the dataset of arrival times, the set of bibliography studies and relevant data used for building up the model [3,[10][11][12] were shared among the participants. Finally, a common grid for output data as well as output formats was agreed upon.
International Journal of Geophysics     As a result of the tomographic inversion blind test, each group was asked to provide the data of earthquake locations and (V P , V S ) values for a grid with a spatial step of 200 m for each direction. In addition, each team had to perform an analysis of reliability of the final results using their own tools/methods. No checkerboard test was planned, since the evaluation of the resolution power of the deployed network did not fall under the purposes of this blind test.
Before introducing the tomographic approaches adopted by the four participating teams and discussing the obtained results, let us analyse the theoretical resolution of the dataset provided as an input for the blind test. We must reiterate that we are not analysing the resolution of the 1982-1984 dataset. Figure 10 shows the ray coverage calculated for the given dataset and the true model. It is immediately apparent that the resolution capability decays quickly from 5 km down. In the XZ plane, the wide extension of one of the source horizontal alignments can be easily recognized at a depth of 3 km. The width of the resolved volume changes strongly above or below that depth, from 7 km above and 4-5 km beneath it.
A more quantitative estimate of the parts of the volume passed through by the wave paths is given in Figures 11 and  12, which show the null space map projected along Y -and Zslices. The null space value, computed by the singular value decomposition of the tomographic matrix, is considered as the best indicator of the reliability of the tomographic results [23]. The inversion is considered reliable where low null space values are estimated, while nodes with values equal to 1 are empty. The maps shown in Figures 11 and 12 correspond to the actual null space associated with the dataset provided to the participating teams and the true 3D model. It has been computed for a regular grid with inter-node distance of 400 m, a value that can be assumed as the maximum resolution achievable for the real waveforms of the 1982-1984 dataset, on the basis of the estimated background velocity model and the maximum frequency of the recorded signals. The null space maps show clearly the volume that can be imaged with good reliability for both V P and V S . Its width varies from a maximum of 6-7 km for V P (5-6 km for V S ) in the middle of the volume to 2-3 km or less in the peripheral zones. In particular, the reliability decays strongly not only below 4-5 km of depth, as already evidenced by the ray density maps, but also at the most shallow "layer" (i.e., Z < 0.5 km), as a result of the poor coverage provided by the overall geometry of stations and hypocentres. Comparing these images to the corresponding images of the true velocity model displayed in Figures 4 and 5, we argue that (1) most of the low-velocity bodies (e.g., magma chamber or several isolated bodies) will not be imaged adequately, and (2) International Journal of Geophysics   several details of the internal model will be lost, owing to the resolution limit estimated at about 400 m.

Tomographic Approaches
In this section, we introduce the tomographic methods used for this blind test and describe the most relevant details of the settings adopted by each team. For each method, we describe the following topics, in specific paragraphs: method description, grid selection, initial 1D model and locations, and inversion. A summary of the values adopted for the main parametres and settings by each method is provided in Table 2.

Disjoint Inversion and Integrated Analysis between
Tomographic Software CAT3D and Earthquake Location NonLinLoc (CAT3D + NLL) This approach consists of an iterative procedure, where earthquake location and travel time inversion are applied in separated, sequential steps for the current velocity model. Earthquakes are localized by NonLinLoc software [20], while the tomographic inversion of the arrival times is performed through CAT3D software [24]. NonLinLoc calculates travel times using the numerical solution of the eikonal equation developed by Podvin and Lecomte [19]. Earthquake location is performed using P velocity (km/s)  nonlinear search techniques that define the maximumlikelihood hypocenter among a set of nodes distributed on a regular grid. In this study, the Oct-tree, nested grid-search method developed by Lomax and Curtis [25] is used. Travel time inversion is computed by CAT3D software. It includes both velocity and interface depth estimates (the latter are used for both reflection and refraction seismic data), assessment of the reliability of the inverted structure, and different optimisation options, such as staggered and adaptive grids. The inversion algorithm uses the SIRT (simultaneous reconstruction technique) and the ART (algebraic reconstruction technique) methods [26]. The ray-tracing algorithm uses a modified version of a minimum time raytracing developed by Böhm et al. [27]. In order to increase the resolution of the tomographic image, the inversion procedure uses the staggered grid approach [28], which takes advantage of low-resolution grids with a fast and stable inversion. According to this approach, the velocity field is calculated as the average of different velocity fields calculated separately for a number of different grids, rigidly shifted in space. The resulting velocity field represents a composite image where the influence of each single grid is reduced, and the boundaries of velocity anomalies can be described in a much finer and more complex representation.

Grid Selection.
For the blind test described in this paper, we used a base regular grid defined by voxels, each measuring 920 × 950 × 400 m; in the final grid, after applying the staggered grid method, the dimension of the voxels was 460 × 475 × 225 m. The resolution analysis was guided by testing the reliability of the tomographic system (area discretization + acquisition) by computing the null space energy for different grids with respect to the same acquisition system of the experiment. For the chosen grid, the null space values are less than 0.2 over 80% of the ray covered area.

Initial 1D Model and
Locations. The initial model, used in the first step of localization and inversion, was a  low-resolution 1D depth-velocity function, represented by a stepwise gradient model ( Figure 13). The chosen V P velocity derived from general information of geological knowledge of the investigated area found in the bibliography distributed to participants (see Section 3). V S was derived from V P assuming a constant value of V P /V S = 1.8, that was inferred from the modified Wadati diagram of the input dataset ( Figure 9(b)).
International Journal of Geophysics

Inversion.
With regard to the inversion, about 10% of the events initially distributed was excluded from the process, since localization was poorly constrained or had high residual. As a result, 3338 P and 1909 S arrivals, associated to 451 localized events, were used for the inversion, with respect to the 3830 P and 2056 S arrivals and 550 events initially provided. The final V P and V S velocity models were obtained after five iterations, with rms residuals of 0.084 s and 0.106 s for P-and S-wave arrivals, respectively, and 0.092 s of rms total residual.

Method Description.
Simulps14-and the family of all the versions related to it-is a program that performs simultaneous inversion of V P , V P /V S , and hypocenter locations [29]. The code is based on the tomographic inversion method developed by Thurber [30] and modified by many others. It performs a joint inversion for hypocentres and a 3D velocity crustal model by minimizing the sources-stations travel time residuals and using, in the velocity inversion procedure, the Marquardt damped least-squares approach [31]. The velocity of the generic point is calculated by a weighted linear interpolation of the velocities at the surrounding eight nodes. The code makes a series of runs first calculating the hypocentre parameters and then the velocity anomalies. It usually stops when a F-test reports a little significance between successive iterations. The solution reliability depends on several factors, such as: (1) a realistic starting model, (2) little phase reading errors (i.e., close to 0.01 s), (3) homogeneous distribution of earthquakes and stations, and (4) a good choice of the damping parameter. SIM14 is derived from Simulps13q code [32,33], but it has been improved by introducing a full 3D shooting ray tracer based on Hamiltonian perturbation theory and paraxial ray tracing [34,35].

Grid Selection.
For this blind test, the SIM14 grid consists of 11 × 11 × 6 = 726 nodes with spacing variable between 1 and 3 km. The results of the inversion were then interpolated onto a finer grid in accordance with the organizer's instructions.

Initial 1D Model and Locations.
The initial 1D Pvelocity model was inferred from a number of previous studies (see Section 3) (Figure 13). The initial S-velocity field was obtained from V P by assuming a constant value of V P /V S = 1.80, that was inferred from the modified Wadati  diagram of the input dataset. The initial localization, that was used as an input for the tomographic software, was computed by Hypo71 [36]. For the inversion, we excluded events whose initial localization was poorly constrained or had high residual. As a result, 510 earthquakes out of 550 were used, for a total of 3575 and 1934 P-and S-arrival times, respectively.

Inversion.
The inversion was performed according to the following working scheme: (1) for V P /V S fixed at the initial value, the damping value for V P inversion was determined as the best tradeoff curve in the diagram modelvariance versus data variance [37]; (2) P-velocity inversion was performed; (3) the damping value for V P /V S inversion was determined (with procedure similar as in step 1); (4) V P and V P /V S were inverted jointly to obtain the final solution [38]. The optimal damping values were equal to 6 and 1 for V P and V P /V S , respectively. The quality of the solution was estimated by computing the resolution matrix for all nodes. Low values of resolution (near to zero) indicate that little or no information has been received from the data, and the velocity solution in the damped least squares inversion remains close to the initial model.  [30]. The main difference between the two approaches lies in the velocity model parameterization:

Tomographic Analysis
Michelini and McEvilly adopt cubic B-splines basis functions for interpolation whereas Thurber adopts linear interpolation. Furthermore, in this study, the sparse matrix inversion is performed by LSQR method [40], whereas the original version of the code uses LU decomposition. The adopted technique includes some weak V P /V S conditioning [41] to avoid the insurgence of artefacts due to diverse ray coverage between P-and S-waves rather than true anomalies. Another notable feature of this technique is the scheme to determine the amount of damping to be introduced at each iteration step [42], which permits the amount of damping to decrease while stabilizing the inversion in a consistent and objective manner.

Grid Selection.
The model grid set for the blind test inversion consists of 20 × 20 × 6 = 2400 nodes, for a total of 4800 inversion nodes (i.e., 2400 P-and 2400 S-wave velocity parameters). The grid spacing is 0.75 km, 0.6 km, and 1 km along X, Y , and Z coordinates, respectively. The shallowest layer of nodes was put at the surface (0 km) while the deepest layer is at 5 km depth.

Initial 1D Model and Locations.
The initial 1D velocity model was inferred by an analysis of the input arrival time dataset itself. To this end, a genetic algorithm was used to perform the global search of the parameter space made by P wave velocity and layer thickness for eight layers and find the model that minimizes the overall location residual for all events. Locations were calculated by the Hypoellipse program [43]. The initial model obtained in this way ( Figure 13) features a final weighted root mean square (RMS) of the residual times of 0.056 s (P and S combined).

Inversion.
The team using SIM28 thinks that a careful selection of the data is important in order to obtain robust and accurate results, since event bias or mislocations will inevitably map into velocity anomalies. With this in mind, events featuring a small number of arrival phases or poor accuracy were removed. Events with more than 10 arrival phases were selected as a basic data set for the inversion, for a total of 248 earthquakes (1960 P-phases and 1294 S-phases) out of the 550 events provided initially. Once the simultaneous inversion was performed, all the remaining earthquakes were relocated. Five iterations were performed to carry out the 3D velocity inversion. The amount of damping required to regularize the inversion was found from the tradeoff curve between the model norm and rms misfit. The final rms residual time of 0.039 s (P and S combined) reduces the initial one by about 31%. However, similar reductions were already obtained at the second iteration indicating that, probably owing to the complexity of the true model, it is difficult to perturb the initial model within a linearized iterative inversion scheme.
The robustness of the results was eventually tested against a number of different discretizations of the model. For example, the velocity model was shifted along depth, and different layers of nodes were used. The resolved models, however, have been found to be consistent among each other. In order to estimate the better resolved parts of the velocity model, we have summed up and normalized the partial derivatives at each grid node (it is analogous to the calculation of the hit counts).

Method Description. The TLR3 method (tomography, location, and relocation, version 3) was introduced by
Latorre et al. [44] and updated by Gautier et al. [42]. It consists of a delayed-time tomography method, which inverts simultaneously both the velocity distribution and the hypocenter parameters, thus providing a smooth velocity model estimated on a 3D, regularly spaced, rectangular grid. TLR3 implements an iterative linearized tomographic inversion where first arrival times are computed by solving the Eikonal equation with a finite difference algorithm [19]. Ray tracing is performed using an a posteriori ray-tracing method that is based on time gradients. More precise travel times and partial derivatives, both for slowness fields and for hypocenter parameters, are evaluated along the ray paths and stored in a sparse matrix.
Normalization and scaling of the derivative matrix are performed to remove influences of parameter units and take into account the sensitivity of the data to each class of parameters. The new linear system is then inverted by LSQR technique [40]. Both the velocity models and the hypocenter parameters are then updated and used as input for the next iteration.

Grid Selection.
TLR3 uses two different grids. The first one is related to the inversion and the second one is only used for solving the Eikonal equation and is removed once the rays have been traced. In this blind test, the first one is composed of 71 × 71 × 46 nodes in the X, Y , and Z directions, respectively, with regular spacing of 200 m. Velocity field is parameterized by trilinear interpolation between grid nodes. The second grid is composed of 141 × 141 × 91 cubic cells with size of 100 m.

Initial 1D Model and Locations.
Defining the initial 1D V P and V S models is the first step of the inversion scheme. This task is accomplished by a trial and error procedure, in which 1D P-and S-wave velocity models are randomly extracted. Earthquake locations are computed for these 1D models by Hypo71 code [36]. The model quality is evaluated on the basis of the time residuals of event localizations. Figure 13 shows the V P and V S models obtained at the end of the step above and chosen as initial models for the inversion. Earthquake locations associated with that model represent the initial locations used for the inversion.
The influence of the 1D initial model on the inverted solutions was then evaluated by a statistical analysis with an approach similar to that described in [4,44]. A family of 200 models randomly distributed around the initial 1D model was defined by perturbing the initial V P value. Then, each model was inverted and rms residuals evaluated. Despite the large variability of the initial residuals (from 0.14 to 0.18 s), results show that the residuals obtained after inversion lay within a very narrow interval ranging from 0.056 s to 0.057 s. Finally, the initial 1D model arose from the average taken from the best 20 models of the previous family.

Inversion.
We inverted the entire dataset of 550 events provided for the blind test, without any particular preliminary data selection. Inverted arrival times consisted in 3830 P-phases and 2056 S-phases. Data were kept within the inversion scheme as long as travel time residuals resulted lower than 1.5 s.
The reliability of the inversion is assessed by a resolution grid, which is calculated through a series of spike tests centred at each grid node of the final tomographic model [44]. We resampled the inversion grid in more detail into a grid of nodes spaced 400 m, and computed a synthetic set of new arrival times by applying a small perturbation (i.e., like a spike) of 400 m/s and 200 m/s to the P-and S-velocity models, respectively, at each node. These datasets represent the input data for the inversion for the spike tests. The retrieved velocity perturbation is normalized and stored into the resolution grid.
In our iterative inversion scheme, travel times and partial derivatives along the rays are estimated by integrating the slowness with a 10 m integration step. In order to normalize and scale the derivative matrix, synthetic tests using the real event station geometry provided the optimal set of weights for this tomographic study. Considering that weight for the    Figure 6. The grey shaded areas and black lines locate the virtual alignment planes of the true sources (see also Figure 6).

Results of the Tomographic Inversions
Earthquake tomography inverts observed travel times for both hypocentral locations and velocity structure. Therefore, the blind test results are analysed by looking at those two elements separately.
International Journal of Geophysics Since the initial velocity model plays an important role in linearized tomographic inversions, let us start our discussion with some comments on the initial models adopted by the four teams ( Figure 13). Despite the different strategies followed, all initial models concur with Z ≤ 3 km, and two of them (namely, CAT3D and SIM14) concur with the whole depth range very nicely. At a depth of 3-6 km, TLR3 model features a higher velocity bump, but from a depth of 6 km it is again in alignment with the other two. On the contrary, SIM28 model closely complies with both CAT3D and SIM14 20 International Journal of Geophysics   Figure 16: V P fields obtained by the different methods along vertical EW sections at Y = 5, 6, 7 km. The red dots represent the sources localized by each method near each section. Inverted models are masked by the resolution matrices that correspond to each method. See text for more details. down to 5 km, but after that depth it settles down at a nearly constant V P value of about 4.3 km/s, far off from that set by the other methods.
Let us now analyze the inverted hypocenters, which are shown in Figure 14 and can be directly compared to both the trace of the virtual alignment planes (gray shaded areas and black lines) and the true locations ( Figure 6). All methods are able to recognize the alignment planes, though with differing degrees of accuracy. In this respect, SIM14 is the least accurate; whereas SIM28 is generally the most accurate method. However, this result is also due to the strategy adopted by the SIM28 team, who retained only the best initial locations as the basic data set. TLR3 also provides very accurate results. It should be noted that, in this case, the hypocenter distribution is less correlated to the original one in the XZ projection, while the image of horizontal plane of sources features a gentle curvature and slope in the YZ projection. CAT3D + NLL also features quite a good correlation, although a larger scattering can be noticed in the YZ projection, as an effect of a shift upwards of the sources aligned on the horizontal plane at Z = 3 km. If we consider that location and velocities are not inverted jointly, the quality of the results achieved by this method is very good. Finally SIM14 provides the worst results, with the largest International Journal of Geophysics   Figure 17: V P fields obtained by the different methods along horizontal planes at depths Z = 0, 1, 2, 3, 4, 5, 6 km. Other details as in Figure 16.
scattering in all projection planes and the lowest correlation to the original alignments. The quality of the accuracy of the localizations has also been analysed by estimating the shift of the hypocentres with respect to the true locations (Table 3 and Figure 15). Again, TLR3 and SIM28 methods provide the lowest shifts and variability. Note however that both TLR3 and CAT3D + NLL tend to underestimate the source depth (i.e., estimated hypocenters are shallower than the true ones) of about 300 m and 500 m, respectively. On the other hand, SIM14 estimates a good hypocentral average value but individual locations are usually affected by quite high errors.
Let us now examine the results obtained for the inverted velocity fields. Figures 16 and 17 show some images of the V P velocity fields along three vertical equilatitudinal sections and seven horizontal planes, respectively (see Figure 6 for locations). The inverted fields are compared directly to the true one. Similarly, Figures 18, 19, 20, and 21 show the V S and V P /V S fields along the same slices. Finally, Figures 22, 23, and  24 show the difference between the inverted and true fields along the same vertical slices displayed in Figures 16, 18, and 20. In all figures, the inverted fields are masked by the resolution (i.e., reliability) maps calculated independently by each team, as already described in Section 4. For better graphical impact, resolution values, which range continuously between 0 and 1, have been preliminarily clustered into three distinct values, that is, 0.0, 0.5, and 1.0, which correspond to "not reliable", "just reliable", and "reliable" inversion, respectively.
All methods predict quite reliable tomographic inversion in the middle part of the studied volume, that is, for 4-5 km ≤ X ≤ 10-11 km, 2-3 km ≤ Y ≤ 8-9 km, and 1 km ≤ Z ≤ 4-5 km. Outside those limits, the resolution decreases quite rapidly. In particular, poor information can be gained on the structure below 4-5 km of depth, where the true model features some massive low velocity anomalies.
In the analysis of the velocity structures inverted by earthquake tomography, we pay attention mainly to two different features, that is, (1) the imaging of the background velocity field, and (2) the recognition of the anomalous bodies, such as magmatic chambers and chimneys.
In reference to the first aspect, the inverted background fields seem to reproduce the true one reasonably well up to a depth of about 5 km, with the exception of SIM14, whose V P background field underestimates the velocity of the shallow structure at 0-2 km of depth (see, e.g., the vertical slices of Figures 16, 18, and 22-24). As we will see below, this is due to the poor accuracy of the initial velocity model and the use of an overly high damping in the inversion. All models gradually fail starting from 4-5 km depth downwards, because of the progressive reduction of the number of sources; however, the reliability of the inversions decays quickly from this depth down.

24
International Journal of Geophysics In general, SIM14 tends to underestimate the velocity fields for both V P and V S (Figures 22-24). On the contrary, the other methods feature weaker deviations with both a positive and negative sign. The comparison of V P /V S (Figures 20, 21, and 24) shows that SIM28 and SIM14 are more stable but fail to recover the high V P /V S values of the magma at a depth of 4-6 km, partially recovered by TLR3 although with some instabilities. CAT3D + NNL demonstrate the highest instabilities, especially in the shallowest part of the model (Figure 21), corresponding to the yellow tuffs at a depth of 1-3 km (light brown unit in Figure 3).
Most of the low-velocity intrusions have not been solved, and we remark the main causes of this default here: (1) only a few hypocenters are located beneath these bodies, (2) ray paths tend to go through high-velocity zones, so that low-velocity volumes are poorly sampled, and (3) those bodies are too small to be solved with the assumed network geometry. Also the strong heterogeneities that the real model features at the surface, within the first km, are not imaged, because of the poor station coverage and the small number of events.
A more precise and quantitative information on the capability of the inverted models of imaging the background field is provided by Figures 25 and 26, which illustrate the average performance of the inversions at varying depths. for the inverted values demonstrates the distribution of all values (i.e., like a histogram) at the given depth. In the calculation, each value is weighted by the corresponding resolution value (i.e., 0.0, 0.5, or 1.0) to take into account the reliability properly. The true model is represented by a few, well-separated values, which refer to either the background model-note for instance the gradients corresponding to the incoherent soils and tuff formations in the upper part or to carbonates at depth-or to the anomalies introduced into it. A comparison with the model shown in Figures 3-5 and Table 1 will greatly help interpreting these diagrams. In most cases, the distribution of the inverted values is unimodal and quite well centered around the value of the background model. Loose distributions are only seen rarely, mainly for the CAT3D method, and correspond to large variability of the V P /V S parameter, sometimes with an opposite trend to the real model. None of the magma units is imaged correctly, whether in the middle of the model or at a larger depth. Figure 25 illustrates the strong influence of the initial model on the final results very clearly. This is particularly evident for SIM28, where the inverted velocity moves away from the true model at 4.5 km and follows the initial model tightly.
As a matter of fact, the whole model inverted by SIM28 reproduces the initial model closely, and this is a clear indication of an overly high damping in the inversion setup. Figure 26 shows the mean deviation between the true and inverted models at increasing depths. In addition, in this case, the resolution matrix is taken into account in the calculations. In particular, the misfit between the inverted model v(x, y, z) and the reference model v 0 (x, y, z) was evaluated as a function of depth z by applying the following formula: where the weight w(x, y, z)-a number with values of 0.0, 0.5, or 1.0-is the resolution index provided by each team and S is the generic horizontal section. All methods reproduce the V P and V S background fields reasonably well down to a depth of about 4 km. In particular, SIM28 and TLR3 feature a slightly better accuracy than SIM14 and CAT3D + NLL for V P and V S in the 0-3 km range. From a depth of 3 km, the average accuracy obtained by all methods decreases progressively due to the gradual reduction of the number of sources. The V P /V S error has a similar trend, that is, it increases at a depth greater than 3 km. The inverted models show little lateral variation compared to the true one, and no method is able to resolve the low-velocity anomalies that represent melted magma and were defined with the 3D model. This result was expected to a degree since the velocity model features very strong contrasts that could only be tackled by multichannel activesource tomography experiments and recognizing specific 28 International Journal of Geophysics phases (i.e., reflections, refractions, converted phases, etc.). In some cases, the tomographic images feature some artificial anomalies and methods suggest somewhat contradictory interpretations. While CAT3D + NLL and SIM14 predict high-and low-velocity ratios at a depth of 1-2 km and 3-6 km, respectively, TRL3 provides nearly the opposite. In general, the high V P /V S values are either ignored (as in the case of SIM28 that applies upper-and lower-limit constraints to V P /V S ) or mislocated (as in the case of CAT3D + NLL and SIM14). Amongst the tested software, the most sensitive method to V P /V S variations is TLR3, which retrieves the highest/lowest values in the low centre of the sections, closer to the true location of the anomalies. It is important to note that in the TLR3 method, V P /V S parameters are not inverted directly, but rather depend on the V P and V S fields resolved independently by the inversion procedure. As predicted by the reliability analysis, none of the tested methods resolves the true model at a depth beyond 5-6 km, due to the lack of data. In this respect, a tomographic inversion of such a dataset can hardly provide useful information about the magmatic chamber simulated at model bottom.

Conclusions
The model on which this blind test was carried out is just a fictitious representation of the Earth's structure beneath International Journal of Geophysics    the Campi Flegrei area. It does not represent the real structure, since it is not constrained by real observations in a number of details such as (1) the low-velocity anomaly representing melted magma or the magmatic chambers; (2) the distribution of microearthquakes along planes, which was defined in order to make the interpretation of the results easier; (3) the V P /V S ratio anomalies. Nevertheless, the model was conceived to represent at best the range and the mean values of the expected physical parameters at Campi Flegrei, the heterogeneity within its structure, the overall volume International Journal of Geophysics  Figure 26: Mean deviation of the inverted velocity models and V P /V S ratio from the true ones as a function of depth. Deviation is calculated for each method as the weighted rms difference between the true and inverted models along horizontal planes, using the resolution matrix as a weight. Note that a logarithmic scale is used to represent the quite large dynamics of the values adequately.
source of microearthquakes, and the capability of the whole 1982-1984 dataset. Therefore, the test of different tomographic methods on this model is important not only as a means of understanding the information that can be recovered from the real dataset but also to quantify the variability that arises from the use of different inversion methods.
In general, all the tested earthquake tomography methods provide reliable low-resolution images of the background velocity field of the Campi Flegrei area. However, none of them succeeds in the second goal, namely, to detect the presence of either a low-velocity body (i.e., the magmatic chamber beneath 4-5 km of depth) or smaller isolated bodies, representing possible magmatic chimneys and/or intrusions in our model. This failure can be attributed to a number of reasons, some pertinent to the specific case study and others of a more general nature. First of all, there is the poor coverage resulting from the overall geometry of stations and hypocentres. Most earthquakes are shallow (Z < 4 km) and, as a result, most ray paths do not go through the low S-velocity anomalies of the magma chamber. The resolving power of the tomographic method based on direct wave arrivals decreases to such a degree at that depth, that there is little that can be done with any technique to compensate for this deficit, unless other phases of the waveform are recognized and inverted.
Second, transmission tomography cannot recognize sharp changes of velocity such as those occurring at interfaces. As a result, the time delay due to the presence of an isolated low S-velocity body is spread over a larger volume of the model. The fact that the target is represented by a low S-velocity anomaly, rather than a P-velocity one, is indeed a major difficulty for tomographic approaches. Direct Sphases can be imbedded within complex waveforms, and S-delays can be easily missed or erroneously picked among converted phases. This might be quite a recurrent situation for the 1982-1984 waveform dataset suggesting a validation of the travel time dataset based on a further careful analysis of the waveforms. However, only a waveform inversion approach is likely to improve the image of the magmatic chamber beneath and other fine details, provided the ray coverage is good.
Hypocenters are located with acceptable quality, with errors in the order of some hundred metres for all methods. Alignment planes are poorly imaged by some methods (i.e., SIM14), and estimated locations are usually shallower than the true ones owing to the ambiguity between depth and origin time.
This blind test also shows, in some way contrary to expectations, that the disjoint location inversion method (CAT3D + NLL) estimates locations and velocity images with accuracy comparable to methods that perform location inversion jointly.
The philosophy behind the design of this blind test was to evaluate the effectiveness of the whole strategies adopted by different expert groups. In this regard, we can highlight the following issues. The initial model does influence the final results of the inversion, but the amount of it comes from the inversion setup, as for instance the tuning of the damping parameter. In this respect, the solutions produced by SIM28 do not move away from the initial solutions, even where possible (see, e.g., Figure 25). The correct tuning of the parameters ensures the range of the inverted velocity including the true background field well, independent of the initial velocity model (see again Figure 25), for three of the four groups. Another issue concerns the strategy adopted for gridding: each group adopted quite different choices, but this does not seem to have a relevant effect on the results. Finally, being clear on the outstanding role of the data available and the acquisition geometry, considering that only well-established methods were involved in this blind test, we can argue that a relevant, if not determinant, factor for achieving reliable results is the experience of the group in setting up the overall model and tuning the parameters, which drive the inversion.