Guidance on Setup, Calibration, and Validation of Hydrodynamic, Wave, and Sediment Models for Shelf Seas and Estuaries

The paper is motivated by a present lack of clear model performance guidelines for shelf sea and estuarine modellers seeking to demonstrate to clients and end users that a model is fit for purpose. It addresses the common problems associated with data availability, errors, and uncertainty and examines the model build process, including calibration and validation. It also looks at common assumptions, data input requirements, and statistical analyses that can be applied to assess the performance of models of estuaries and shelf seas. Specifically, it takes account of inherent modelling uncertainties and defines metrics of performance based on practical experience. It is intended as a reference point both for numerical modellers and for specialists tasked with interpreting the accuracy and validity of results from hydrodynamic, wave, and sediment models.


Introduction
Although a need to standardise model build, calibration, and validation processes around one agreed approach is widely acknowledged, only limited guidance is available (e.g., [1,2]), and often ambiguous and sometimes conflicting advice is offered in the grey literature (e.g., [3,4]). A wide variety of different modelling practices are employed by consultants and academics, and frequently insufficient attention is given to the potential errors associated with the measured (and modelled) data used for model calibration and validation. This can result in poor model performance and unreliable model predictions. Without an agreed methodology and a performance standard for model calibration and validation, there is a risk that the quality of different approaches will vary, efforts will be wasted following inefficient or inappropriate calibration methods, and inconsistencies in methodologies will make model intercomparisons problematic.
This paper provides an evidence-based review and presents examples of calibration data sources and of model calibration and validation practices for estuarine and shelf sea models. It is intended to provide guidance on the assessment and use of model calibration data and to offer procedural clarity and simplification to the model calibration and validation process. In doing so, it acknowledges that some degree of compromise between the complexity of the natural system and the model representation must be reached. For this reason, the paper does not address complex modelling issues around wave-driven currents, littoral drift, and shoreline evolution where specialist models (e.g., the nonhydrostatic version of XBeach and CFD) must be employed.
Since the accuracy of the model calibration depends critically on the calibration data used, attention is given to some of the most common issues associated with data quality.
The paper also provides the end users of model data with more specialist guidance on (a) modelling approaches, (b) the calibration procedures most frequently applied, and (c) the uncertainty in the model predictions. The paper draws on practical experience of modelling and expands on the earlier and limited guidance on model calibration and validation that focuses on Eulerian point-based criteria defining model performance (e.g., [2,5,6]). It also takes account of results and recommendations from modelling case studies where calibration issues have been the focus of the work (e.g., [7-10]).
Specifically, the paper describes (1) general factors that must be considered at the outset of all numerical modelling activities, (2) the quantitative assessment of model performance, (3) data sources and modelling guidelines for hydrodynamic, wave, and noncohesive and cohesive sediment models, and (4) morphological models. Special attention is given to one of the greatest challenges to the modelling community concerned with measuring and modelling sediment transport and associated erosion and accretion. While the focus of the work is based on practical applications of modelling shelf sea and estuarine processes, many of the issues discussed are relevant to a wide range of geophysical models.

What Is Model Calibration and Validation?
It is important from the outset to define the terms commonly used by numerical modellers: (a) calibration is a process which requires the adjustment of certain model parameters to achieve the best performance of the model for specific locations and applications; (b) verification ascertains whether the model correctly implements the assumptions made; and (c) validation seeks to establish the agreement between the predictions and the observations (e.g., [11]). Validation is achieved by running the model using data covering an alternative period and/or a different location without making any additional adjustment to the model parameters (e.g., [12]). Of course, the accuracy of the model outputs cannot be proved to be greater than the accuracy of the original calibration data used, and validation does not imply verification, nor does verification imply validation. However, in practice, when measured data are available for the system being modelled, validation is often blended with verification [11]. If a comparison of measurements and model results suggests that the predictions from the model are close to the measurements, then the implemented model is assumed to be both a verified implementation of the assumptions and a valid representation of the system being modelled.
Irrespective of the model accuracy, the model calibration must (a) express the level of agreement achieved, (b) express how realistic the representation of the processes is, and (c) define the criteria by which it has been judged as being fit for purpose. The quantitative assessment of data error, accuracy, and uncertainty in models then defines metrics against which model performance can be judged.
As an illustration of the typical calibration and validation processes applied in most coastal and estuarine models, a schematic diagram of steps followed for a hydrodynamic model is shown in Figure 1. In the initial model run, model parameters are set to the recommended values provided by the modelling software guidance (i.e., "factory settings"). Critical parameters affecting model performance (e.g., bed roughness) are then adjusted to achieve the best possible agreement between model predictions and measurements. Care must be taken to ensure the values set for these adjusted parameters are physically meaningful and appropriate [1]. Achieving a good model calibration for the wrong reasons is as bad as a poorly calibrated model.
A useful first step in the calibration and validation process is the determination of the most sensitive parameters in the model. While expert judgment can be helpful, less-experienced model users should undertake sensitivity analyses. Here, the aim is to determine the rate of change in model output with respect to changes in model inputs (parameters). To undertake sensitivity analyses, it is necessary to identify key model parameters and to define the parameter precision required for the calibration (e.g., [13]). Sensitivity analysis approaches can be (a) local, where parameter values are changed one at a time, or (b) global, where all parameters are adjusted simultaneously. Both approaches have drawbacks. For example, in local analyses, the sensitivity of one parameter often depends on the value of other related parameters so that the correct values of other fixed parameters cannot be determined. In global sensitivity analyses, many simulations are required. Despite these drawbacks, both approaches provide insight into the sensitivity of the model parameters and are necessary steps in the model calibration process. However, "manual" calibration of models, where parameters are adjusted in a stepwise fashion, can be very time-consuming and inefficient.
The second step in the calibration process is undertaken to reduce the uncertainty in the model predictions. Normally, this uses carefully selected values for model input parameters and compares model predictions with observed data for the same conditions. In common with the process described in step 1, this is often done iteratively without any fixed rule and is guided by the experience of the user and knowledge of the processes being modelled.
Advances in Civil Engineering
The third step in the calibration process involves validation of the model output of interest (e.g., water level, flow speed, and direction). Validation involves running a model using parameters that were determined during the calibration process and comparing the predictions to observed data not used in the calibration.
The use of automated techniques for model calibration is now widespread (e.g., [14]). Typically, autocalibration procedures rely on Monte Carlo or other sampling schemes to estimate the best choice of values for multiple input parameters. For example, the autocalibration procedure described by van Liew et al. [15] is based on the shuffled complex evolution algorithm of Duan et al. [16], which allows for the calibration of model parameters based on a single objective function. While autocalibration can provide a powerful, labour-saving tool that can be used to substantially reduce the frustration and subjectivity that frequently characterises manual calibrations, care must be exercised when using these approaches to ensure the theoretical boundaries for each specific input parameter are not violated.
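The idea of sampling-based autocalibration within fixed parameter bounds can be illustrated with a minimal sketch. It is a plain uniform random search, not the shuffled complex evolution algorithm cited above, and the function toy_model (a Manning flow-speed formula), the slope, the depths, and the bounds are all hypothetical stand-ins for a real model run and real observations.

```python
import math
import random


def toy_model(manning_n, depths, slope=1e-4):
    """Hypothetical stand-in for a model run: Manning flow speed (m/s) per depth."""
    return [h ** (2.0 / 3.0) * math.sqrt(slope) / manning_n for h in depths]


def rmse(predicted, observed):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed))
                     / len(observed))


def random_search_calibration(observed, depths, bounds, n_samples=2000, seed=1):
    """Sample the parameter uniformly inside its bounds and keep the best fit."""
    rng = random.Random(seed)
    best_n, best_score = None, float("inf")
    for _ in range(n_samples):
        candidate = rng.uniform(*bounds)  # candidates stay inside the bounds
        score = rmse(toy_model(candidate, depths), observed)
        if score < best_score:
            best_n, best_score = candidate, score
    return best_n, best_score


depths = [2.0, 5.0, 10.0, 20.0]            # illustrative water depths (m)
observed = toy_model(0.030, depths)        # synthetic "observations"
best_n, best_score = random_search_calibration(
    observed, depths, bounds=(0.015, 0.050))
print(f"best Manning n = {best_n:.4f}, RMSE = {best_score:.4f} m/s")
```

Because every candidate is drawn from inside the stated bounds, the search cannot return a value outside the range the user regards as physically meaningful, which is the safeguard the paragraph above asks for.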
Frequently, the evaluation of postcalibration numerical model results is subjective and based on specialist interpretation of graphical output only. Examples of this approach include water-level curves or discharge time series, current vector distributions, and spreading patterns of heat and spills. Indeed, for many practitioners, a good visual fit between model predictions and observations is often sufficient to demonstrate good model performance without the need to quantify this further. Objective measures of model performance are also not new and are used, for example, in the Deltares (semi-) automated model calibration tools and in adjoint modelling (e.g., [17]). However, the increasing complexity of model functionality, and the use of model output by technical end users requiring information on model accuracy to reduce risks, has led to an increasing need for better guidance on how to quantify and evaluate the performance of models. A description of the calibration process applied to a biological-physical model [18] provides a useful example of typical procedures followed in the calibration process.

General Considerations
Irrespective of the model being used, there are several generic elements that require consideration prior to and during the model build phase. These elements will each impact on model performance and include bathymetry, bed roughness, model grid setup, the incorporation of specific structures and features, data accuracy and uncertainty, and model boundary conditions.
3.1. Bathymetry.
One of the most common problems associated with the calibration of a hydrodynamic model concerns errors in the underlying bathymetric data. The use of accurate bathymetry is pivotal in all shelf sea and estuarine modelling studies, and effort is required to ensure that the best possible bathymetric information is used. As standard practice, the analysis of bathymetric data should ensure (e.g., through a data review of the study area) that the most recent bathymetry survey data are used. Key features and contours should be checked against historical maps and charts. LiDAR data acquired over water surfaces must be discarded. Suitable grid dimensions should be determined that reflect the spatial distribution of the bathymetric data, and where data are already gridded, poor interpolations/reductions/extrapolations onto model grids must be identified by reference to the original data sources.
A summary of bathymetric and topographic data requirements for models is shown in Table 1 [3]. Here, a distinction is made between application types, with the most exact being associated with scheme designs (e.g., flood defences) and less accuracy required for appraisal and/or strategy studies. These distinctions are used in other tables and are useful as they define the accuracy of key data required to build a model for different applications. The correct use of the most appropriate data for a given application can save time and effort. While Table 1 reflects the bathymetric and topographic data requirements for modelling estuaries, including specifications for average distances between survey positions, the minimum acceptable channel cross-section spacing, survey age, and the age and resolution for LiDAR data, these requirements provide equally useful guidelines for shelf sea models.
Careful checks on the horizontal and vertical survey datum should always be undertaken prior to any model runs, and models should always aim to use a common reference datum. Typically for vertical positions, national reference points (e.g., Ordnance Datum Newlyn (ODN) in the UK), chart datum (related to the lowest astronomical tide or to mean lower low water), or mean sea level (MSL) are widely used. However, while national reference points are useful in local-scale models, MSL has wider utility in larger regional models at all geographical locations. In UK waters, the Vertical Offshore Reference Frame (VORF; http://www.ucl.ac.uk/vorf) provides spatial maps of values that can be used to convert between vertical datums. Similarly, VERTCON [19] and more recently VDATUM (http://vdatum.noaa.gov/) in the USA allow vertical transformation of geospatial data among a variety of tidal, orthometric, and ellipsoidal vertical datums. In addition, satellite altimetry data can also be used to inform the offset between one or more tidal layers and the relevant satellite or geoid-based datum.
To illustrate a simple datum error, Figure 2(a) shows nearshore bathymetry from a coastal location in southern Portugal with a clear vertical datum problem. This issue is resolved in Figure 2(b) using a simple datum correction. This is a simple case for illustrative purposes only, and often datum errors are more complex and harder to correct.
Other errors can arise in hydrodynamic models due to (a) changes in charting properties (e.g., older Admiralty charts from the UK projected to OSGB which has now changed to WGS84), (b) data types, which have inherent weaknesses (e.g., poorly interpolated bathymetry which may lead to an underestimation of depth and thus tidal volume), and (c) postprocessing using GIS or other "smoothing" software. To minimise the errors introduced in the model bathymetry, it is recommended initially to visually inspect the raw bathymetry using suitable software (e.g., MATLAB or Fledermaus). Abrupt changes in elevations and spikes in data should be treated with caution. It is also helpful to examine gradients and, where possible, to compare interpolated data with known soundings.
Careful consideration should be given to data interpolation since interpolation routines can vary significantly between programs, and the options available (e.g., linear, nearest neighbour, inverse distance weighted, and spline methods) can also result in significantly different answers. Furthermore, some interpolation methods are better suited to sparse data sets (e.g., inverse distance weighted) and others to well-populated data distributions (e.g., nearest neighbour). The selection of the interpolation method should always recognise this. It is important also to consider the scale of features on the bed that require resolving in a model. For example, large bedforms, such as sand banks, redirect flows and must be resolved in the model. Smaller bedforms such as sand waves provide a resistance to the flow that can be parameterised through the bed friction term. Eliminating the need to resolve these features individually can reduce the model run time. In other applications, sand waves may need to be resolved on an individual basis to assess, for example, migration rates and pipeline or cable routes.
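To show how two of the interpolation options named above can return different depths from the same soundings, the following minimal sketch implements inverse distance weighting and nearest-neighbour lookup. The function names, the weighting power of 2, and the three soundings are illustrative assumptions, not values from any particular software package.

```python
import math


def idw(samples, x, y, power=2.0):
    """Inverse distance weighted depth at (x, y) from (xs, ys, depth) soundings."""
    numerator = 0.0
    denominator = 0.0
    for xs, ys, depth in samples:
        distance = math.hypot(x - xs, y - ys)
        if distance == 0.0:
            return depth  # query point coincides with a sounding
        weight = 1.0 / distance ** power
        numerator += weight * depth
        denominator += weight
    return numerator / denominator


def nearest_neighbour(samples, x, y):
    """Depth of the sounding closest to (x, y)."""
    return min(samples, key=lambda s: math.hypot(x - s[0], y - s[1]))[2]


# Three hypothetical soundings: (easting m, northing m, depth m)
soundings = [(0.0, 0.0, 5.0), (10.0, 0.0, 9.0), (0.0, 10.0, 7.0)]
print("IDW depth at (4, 0):", round(idw(soundings, 4.0, 0.0), 2))
print("nearest-neighbour depth at (4, 0):", nearest_neighbour(soundings, 4.0, 0.0))
```

Comparing the two estimates at the same query point against a known sounding is one simple way of carrying out the check on interpolated data recommended earlier.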
Time and care spent ensuring that the underlying bathymetry has been correctly interpolated (datum, projection) and that it is free of spurious values and correctly represents the features of interest will contribute to improving model performance. Without good underlying bathymetric data, the task of trying to calibrate a hydrodynamic model will be extremely difficult, especially in shallow coastal and estuarine areas. For example, Cea and French [20] have investigated how errors in bathymetry can impact on the performance of estuarine shallow water models. They demonstrate that correcting errors in the measured depth can be significantly more efficient than a "classic" calibration approach based only on adjustment of the hydrodynamic roughness of the bed. Their proposed bathymetry calibration framework may offer improved performance from the current generation of numerical models. Further guidance on the use of bathymetry in models is given by Plant et al. [21,22] and Mourre et al. [23].

Bed Roughness.
The hydrodynamic roughness of the bed (hereafter termed "bed roughness") is a primary calibration variable for all coastal and estuarine models. It is also essential for modelling other processes accurately such as sediment transport and wave attenuation. Irrespective of the method chosen for defining bed roughness, values are typically manipulated iteratively by the user within the ranges reported in the literature. Any bed roughness can be generally assigned to a so-called equivalent sand roughness, k_s [24]. The equivalent sand roughness depends on the arrangement (pattern), distance (density), and shape of the roughness elements such as sand grains and ripples. However, in most models, bed roughness is typically parameterised by (a) a drag coefficient defined at a specified height above the bed, (b) the Manning number, n [25], or (c) the Chézy number, C [26].
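The roughness parameterisations listed above are often converted into one another when a value reported in one form is needed in another. The sketch below assumes the widely used open-channel relations C = h^(1/6)/n (Manning to Chézy, SI units), C = 18 log10(12h/k_s) (White-Colebrook form), and C_d = g/C^2; none of these formulas is given in the text above, and the resulting C_d is a depth-averaged coefficient rather than one defined at a fixed height above the bed. Function names and the example values are hypothetical.

```python
import math

G = 9.81  # gravitational acceleration (m/s^2)


def chezy_from_manning(n, depth):
    """Chezy C (m^0.5/s) from Manning n (s/m^(1/3)) and water depth (m)."""
    return depth ** (1.0 / 6.0) / n


def chezy_from_ks(ks, depth):
    """Chezy C from equivalent sand roughness k_s (m), White-Colebrook form."""
    return 18.0 * math.log10(12.0 * depth / ks)


def drag_from_chezy(chezy):
    """Depth-averaged drag coefficient C_d (dimensionless) from Chezy C."""
    return G / chezy ** 2


depth = 10.0  # illustrative water depth (m)
c_manning = chezy_from_manning(0.025, depth)
c_ks = chezy_from_ks(0.05, depth)
print(f"n = 0.025     -> C = {c_manning:.1f}, C_d = {drag_from_chezy(c_manning):.5f}")
print(f"k_s = 0.05 m  -> C = {c_ks:.1f}, C_d = {drag_from_chezy(c_ks):.5f}")
```

Because the Manning and k_s forms both depend on depth, a single roughness value implies a drag coefficient that varies across the model domain, which is worth keeping in mind when comparing calibrated values between studies.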
To illustrate the range of drag coefficient values appropriate for different estuarine and coastal environments, Table 2 shows empirically derived values of the drag coefficient C_100 measured at 1 m above the bed for different bottom types. In the absence of data to define the bed roughness accurately, these "typical" values are often employed in model applications. However, in many cases, this is an oversimplification, and care must be taken to obtain as much information as possible about bed characteristics so that appropriate bed roughness values can be assigned.
In "industry-standard" models such as MIKE21 and Delft3D, "roughness maps" can be used to define the spatial distribution of bed roughness values across the model domain. A good account of this approach is given by Lefebvre and Lyons [27]. Figure 3 presents a typical example of a bed roughness map showing the spatial distribution of (a) the measured median grain size, D_50, obtained from seabed samples and (b) the derived drag coefficient, C_d, which accounts for D_50 and bedforms detected in a multibeam survey.
Bed roughness has been mapped using an ADCP (e.g., [28]), and high-resolution bathymetry and granulometry samples have been used by Huybrechts et al. [29] to derive bed roughness maps used in a TELEMAC model (cf. [30,31]). Recently, the use of high-resolution multibeam sonar has provided bathymetric data at a resolution of less than 1 m and revealed the details of sea bed features (e.g., DORIS (http://www.dorsetwildlifetrust.org.uk/doris.html)) as well as provided information on sediment properties. The use of these data in modelling studies is currently experimental and requires high computing power to resolve the details. However, it offers the possibility of better defining bed roughness and thus may contribute significantly to reducing the effort needed for model calibration.

Model Grid Setup.
The selection and setup of the model grid is a very important initial stage in the model build process. While some models still employ regular grid structures, most models now employ some form of flexible mesh usually comprising triangular elements. This approach allows high resolution of areas of interest and lower resolution over areas where bathymetry and/or processes are largely spatially invariant. In virtually all coastal and estuarine applications, the use of flexible mesh models provides the best model grid solution (e.g., [32]).
Taking a generic estuarine model as an example, some key points about model grids emerge: (a) the model grid should be designed to ensure the grid resolution can define the main morphological features (including structures) that could have influence on hydrodynamics; (b) narrow channels and banks should have at least three grid cells (preferably five) to determine the base or crest widths; (c) as far as possible areas of increased grid resolution should follow the course of the main channels, particularly in a curvilinear grid; and (d) when considering the location of upstream and downstream boundaries in the model, boundaries should not be fixed too close to areas of interest. However, this may be constrained by the actual aims of the modelling as well as available boundary data. Further examples of the issues arising when defining a model grid are given by Hsu et al. [33], Kernkamp et al. [34], Liu and Ren [35], and Maynard and Johnson [36]. Table 3 shows an example of the typical model grid resolution required for estuarine models intended for studies of water levels and flow velocities [3]. Table 3 indicates also the minimum number of model grid points required to correctly represent features such as channels and sand banks in the model. Similar grid requirements apply equally to coastal models. It is noted that a fine grid resolution (<2 m) is required to correctly represent the deliberate breaching of flood defences when designing managed realignment schemes.
During the process of grid generation, irrespective of the modelling software used, interpolation of the bathymetry will take place. Again using the example of a generic estuarine model to illustrate some key points, the following checks should be made on completion of bathymetric interpolation processes: (a) the gridded bathymetry must show the same characteristics as the original bathymetry; (b) the gridding process must not displace and/or narrow/constrict channels; (c) different interpolation methods should be assessed; (d) channels must not be widened or narrowed, particularly when these make up a considerable proportion of the estuary cross section; and (e) depths adjacent to the boundaries should be inspected to ensure correct interpolation has occurred.
A further key point to note concerns the spatial resolution in the computational grid of a given model. Typically, a model prediction is only applicable at the spatial resolution defined by the computational grid. In contrast, the measurements, typically used to calibrate and verify model predictions, are obtained at a single location and represent the local environment only. Thus, when comparing a model prediction with a measured value at a point, consideration must be given to the tolerance of this spatial resolution. In some circumstances, data extraction from a grid cell adjacent to the measurement location may better represent the actual conditions at the measurement point if that point lies close to the boundary of a model element. Depending on the model grid resolution, it is advisable therefore to extract model outputs from all grid cells adjacent to the measurement location and to make comparisons with the observations.
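The recommendation to compare observations with every grid cell adjacent to the measurement location can be sketched as follows. The example assumes a structured grid stored as nested lists of time series (a flexible mesh would use element adjacency instead), uses mean absolute error as an arbitrary score, and all names and values are hypothetical.

```python
def mean_absolute_error(predicted, observed):
    """Mean absolute difference between two equal-length time series."""
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(observed)


def neighbourhood(i, j, n_rows, n_cols):
    """Cell (i, j) plus every adjacent cell that lies inside the grid."""
    return [(a, b)
            for a in range(i - 1, i + 2)
            for b in range(j - 1, j + 2)
            if 0 <= a < n_rows and 0 <= b < n_cols]


def compare_adjacent_cells(model_series, observed, i, j):
    """Score the cell containing the instrument and each of its neighbours."""
    n_rows, n_cols = len(model_series), len(model_series[0])
    return {cell: mean_absolute_error(model_series[cell[0]][cell[1]], observed)
            for cell in neighbourhood(i, j, n_rows, n_cols)}


# Hypothetical current speeds (m/s) at three times on a 2 x 2 grid;
# the instrument sits in cell (0, 0), close to its edge with cell (0, 1).
observed = [0.50, 0.80, 0.40]
model_series = [
    [[0.40, 0.70, 0.30], [0.52, 0.79, 0.41]],
    [[0.60, 0.95, 0.55], [0.45, 0.70, 0.35]],
]
scores = compare_adjacent_cells(model_series, observed, 0, 0)
best_cell = min(scores, key=scores.get)
print("best-matching cell:", best_cell)
```

Reporting the scores for all neighbouring cells, rather than only the best one, keeps the comparison transparent and avoids selecting a cell simply because it flatters the calibration.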
Taking the modelling of fluvial bedforms as an example, El Kheiashy et al. [37] discuss the selection of an appropriate model grid. In common with many modelling applications, a compromise must be reached between the resolution needed to define the bathymetry accurately and the consequent execution time required for a certain grid resolution. Their study showed that the apparent bed resistance (shear stress) and bedform steepness decreased with increasing grid spacing. Increasing the grid spacing also created artificial bedform fields giving rise to grid-dependent resistance. The model grid therefore has a significant influence on the model predictions. It is therefore important to be aware of these issues when interpreting model results and to check them whenever possible against all available data sources.

Model Boundary Conditions.
Experienced modelling practitioners will ensure that the intended boundary type is being used at each open boundary and that the cell notation and order of data are correct. Indeed, most industry-standard models (e.g., Delft3D) give a visual representation of the boundaries for checking purposes. It is recommended practice to align boundaries with the dominant flow direction, tidal characteristics (avoiding amphidromes), waves, or geographic features. The input data to estuarine and shelf sea model boundaries typically fall into two types: (a) a water level and (b) a flux (discharge). A water-level boundary is normally obtained from existing models or measurements. In the case of model-derived boundary data, knowledge of how the boundary data are produced by the larger model is required to define the accuracy and reliability (e.g., the number of constituents used and the spatial and temporal resolution). However, a water-level boundary may not be applicable in areas with little or no tidal height variation.
Noting that a flux (discharge) carries momentum, whereas water-level variations across the model domain have to generate momentum, it can be argued that a flux (discharge) boundary condition is a more robust option for model calibration purposes. However, it is usually much more difficult to describe and apply. Reliance on water levels only can lead to serious model underperformance, and wherever possible, attempts must be made to use water level and flux at the model boundaries.
So that the appropriate model calibration accuracy can be obtained, the following boundary condition issues also need to be fully understood: (a) spikes in modelled boundary data attributable to instabilities in the original boundary data and (b) the selection of boundary data from a larger model domain in unsuitable locations (e.g., close to land or to elements that dry).
A flux (discharge) can be applied at any model boundary (e.g., the point of freshwater input into an estuarine model). Generally, these data are provided by measurements or derived from a coarser-scale model. The quality of these data depends on the accuracy of the measuring device or model. It is recommended that in areas of small tidal variation, or where multiple boundaries are included, at least one boundary is of a flux (discharge) type. It may be noted that using flexible grids (or nested models), the model domain can be extended to provide more robust boundary conditions owing to the large phase differences and gradient effects across the model domain. However, in practice, many model setups use only water levels as the primary driving force, and in many applications, this proves to be successful.
It is recommended that if measured tidal levels are used as model boundary conditions, then these are checked to ensure consistent phase and amplitude with values obtained from the harmonic constituents. While there will be small differences in amplitude attributable to meteorological effects, the phase should be very similar.
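A minimal sketch of the phase and amplitude check described above is given here for a single constituent. It assumes evenly spaced samples spanning a whole number of M2 periods, so that simple cosine and sine projections recover the amplitude and phase; a real record would normally need a full multi-constituent least-squares harmonic analysis. The function names and the synthetic series are hypothetical.

```python
import math

M2_PERIOD_HOURS = 12.4206  # principal lunar semidiurnal constituent


def fit_constituent(times_h, levels, period_h=M2_PERIOD_HOURS):
    """Amplitude and phase (rad) of one constituent, level = A*cos(w*t - phase).

    Valid for evenly spaced samples covering a whole number of periods.
    """
    omega = 2.0 * math.pi / period_h
    n = len(levels)
    a = 2.0 / n * sum(h * math.cos(omega * t) for t, h in zip(times_h, levels))
    b = 2.0 / n * sum(h * math.sin(omega * t) for t, h in zip(times_h, levels))
    return math.hypot(a, b), math.atan2(b, a)


def phase_lag_minutes(phase_measured, phase_predicted, period_h=M2_PERIOD_HOURS):
    """Lag of the measured signal behind the prediction, in minutes."""
    diff = (phase_measured - phase_predicted + math.pi) % (2.0 * math.pi) - math.pi
    return diff / (2.0 * math.pi) * period_h * 60.0


# Synthetic example: 400 samples over exactly 4 M2 cycles (illustrative values).
n_samples, n_cycles = 400, 4
times = [k * n_cycles * M2_PERIOD_HOURS / n_samples for k in range(n_samples)]
omega = 2.0 * math.pi / M2_PERIOD_HOURS
measured = [0.10 + 1.52 * math.cos(omega * t - 0.75) for t in times]
predicted = [1.50 * math.cos(omega * t - 0.70) for t in times]

amp_m, ph_m = fit_constituent(times, measured)
amp_p, ph_p = fit_constituent(times, predicted)
lag = phase_lag_minutes(ph_m, ph_p)
print(f"amplitude ratio = {amp_m / amp_p:.3f}, phase lag = {lag:.1f} min")
```

An amplitude ratio close to one with a phase lag of only a few minutes would be consistent with the small meteorological differences mentioned above, whereas a large lag would point to a timing or time-zone problem in the boundary data.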

Assessing Model Performance
In engineering and environmental modelling studies, the use of quantitative model evaluation methods is perceived as providing more objective, consistent, and reproducible model validation and assessment. However, it is also self-evident that systematic or random errors in model results can be identified quickly by the human eye. In practice, the assessment of model output is most effective when both qualitative and quantitative approaches are employed. For example, in most shelf sea or estuarine applications, a combined visual and quantitative evaluation may be achieved by presenting the spatial distribution of current vectors for visual examination together with statistics that quantify differences between measured and predicted current speeds from several locations. These statistics can provide useful additional information about spatial coherence, correlations, and consistency and will often indicate explanations and origins of the possible differences between the model results and the measurements.
It is also important that model results receive expert assessment, ideally against a conceptual understanding of processes in each model domain established using a range of data resources. This might include some obvious checks on current speed, phase, and direction as well as more detailed investigations of sedimentation patterns. It is recommended that the initial assessment of model performance by whatever means should be undertaken before running models for extended periods. However, the period chosen for this preliminary examination depends on the processes being modelled. For example, a model of tidal currents run over one or two tidal cycles should be sufficient to determine how well the model is performing and which adjustments might be necessary. On the other hand, a model of sediment transport may require considerable time before the effects of net sediment movement are evident through changes in the bathymetry.
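As an illustration of statistics that quantify differences between measured and predicted current speeds at several locations, the sketch below computes bias, root-mean-square error, and the Pearson correlation coefficient per station. The choice of these three measures, the station names, and the data are assumptions made for the example only.

```python
import math


def bias(predicted, observed):
    """Mean error (predicted minus observed); positive means overprediction."""
    return sum(p - o for p, o in zip(predicted, observed)) / len(observed)


def rmse(predicted, observed):
    """Root-mean-square error."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed))
                     / len(observed))


def correlation(predicted, observed):
    """Pearson correlation coefficient between predictions and observations."""
    n = len(observed)
    mean_p, mean_o = sum(predicted) / n, sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    return cov / math.sqrt(var_p * var_o)


# Hypothetical current speeds (m/s): station -> (predicted, observed)
stations = {
    "A": ([0.42, 0.81, 1.10, 0.65], [0.40, 0.85, 1.05, 0.60]),
    "B": ([0.30, 0.55, 0.90, 0.48], [0.45, 0.70, 1.02, 0.66]),
}
for name, (pred, obs) in stations.items():
    print(f"station {name}: bias = {bias(pred, obs):+.3f} m/s, "
          f"RMSE = {rmse(pred, obs):.3f} m/s, r = {correlation(pred, obs):.3f}")
```

A station with high correlation but a large negative bias, such as the hypothetical station B here, suggests a systematic underprediction rather than a timing problem, which is the kind of diagnostic information the statistics are intended to add to a visual comparison.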

Error, Accuracy, and Uncertainty of Model Calibration Data.
As it defines the metric against which model performance will be judged, the assessment of error, accuracy, and uncertainty in the data used for model calibration is an important step in the modelling process. Indeed, the accuracy of a numerical model is governed in part by the degree of error present in the model calibration data. It is essential therefore to quantify error, accuracy, and uncertainty through understanding of the instrumentation, the instrument deployment method, and its location as well as any data postprocessing issue.
It is necessary to distinguish between systematic and random measurement errors. All measurements are prone to systematic errors resulting, for example, from imperfect instrument calibration (zero error) and changes in the environmental conditions. Similarly, random errors are usually present in a measurement or other observations and result from inherently unpredictable fluctuations in the readings of a measurement apparatus or in the experimenter's interpretation of an instrumental reading or the environment. Different results for ostensibly the same repeated measurement are a clear indication of a random error. The error can be quantified by comparing multiple measurements and reduced by averaging multiple measurements. Systematic errors cannot be detected this way because they always "push" the results in the same direction. However, when identified, they are easier to eliminate from a data set using trend removal techniques (e.g., regression analysis).
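The distinction drawn above can be demonstrated numerically. In the minimal sketch below, averaging many readings shrinks the random error (the standard error of the mean falls as 1/sqrt(N)) but leaves a constant offset untouched, and a linear instrument drift is removed by a least-squares fit. The water level, offset, noise level, and function names are illustrative assumptions.

```python
import math
import random
import statistics


def standard_error(samples):
    """Standard error of the mean; decreases as 1/sqrt(N) for random errors."""
    return statistics.stdev(samples) / math.sqrt(len(samples))


def remove_linear_drift(times, values):
    """Subtract a least-squares linear trend while keeping the series mean."""
    t_mean = statistics.fmean(times)
    v_mean = statistics.fmean(values)
    slope = (sum((t - t_mean) * (v - v_mean) for t, v in zip(times, values))
             / sum((t - t_mean) ** 2 for t in times))
    return [v - slope * (t - t_mean) for t, v in zip(times, values)]


rng = random.Random(42)
true_level, offset, noise_sd = 2.00, 0.10, 0.05   # metres; illustrative values
readings = [true_level + offset + rng.gauss(0.0, noise_sd) for _ in range(10000)]
print(f"mean of readings = {statistics.fmean(readings):.3f} m "
      f"(true level {true_level:.2f} m)")
print(f"standard error   = {standard_error(readings):.4f} m")

drifting = [2.0, 2.5, 3.0, 3.5]                   # sensor drifting 0.5 m per step
print("detrended series:", remove_linear_drift([0, 1, 2, 3], drifting))
```

The mean converges on the true level plus the offset, not on the true level itself, which is why a systematic error has to be identified independently (for example, against a reference instrument) before it can be removed.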
Instruments collecting data from different spatial locations may also apply range-dependent spatial averaging to the recorded data, leading to variable spatial resolution. For example, the horizontal averaging across spreading ADCP beams results in a measurement footprint that increases in size with distance from the instrument. Taking as an example the calibration of a 2D depth-averaged hydrodynamic model using ADCP data, it is first necessary to derive the depth-mean current from the ADCP measurement. This requires making assumptions about the vertical structure of the marine boundary layer (often occupying the region from the bed to the air-water interface) before time and spatially averaging the ADCP data to obtain a depth-mean current speed. These data processing steps introduce errors which are difficult to quantify. These problems are further compounded when attempting to extract a meaningful depth-mean representation of the current direction, especially in areas subject to significant current veering (e.g., adjacent to sand banks). Furthermore, if a given measurement footprint is within a highly turbulent flow field, then the accuracy of the mean flow measurement will be governed by the sampling time and can lead to significant errors if the flow is not sampled correctly at that location. With this example in mind, when comparing predictions from a grid point in a model with measurements from single or multiple locations, attention must be given to spatial and temporal inconsistencies that might lead to calibration error and/or bias.
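To make the depth-averaging step concrete, the sketch below derives a depth-mean speed from ADCP bins under one possible assumption about the vertical structure: a power-law profile u(z) = a z^(1/m) with m = 7. This profile is an assumption introduced for the example, not a recommendation from the text, and the bin heights, speeds, and function name are hypothetical. Integrating the profile from the bed to the surface gives a depth mean of a h^(1/m) / (1 + 1/m).

```python
def depth_mean_from_bins(heights, speeds, depth, m=7.0):
    """Depth-mean speed from ADCP bins, assuming u(z) = a * z**(1/m).

    heights: bin-centre heights above the bed (m); speeds: bin speeds (m/s).
    The coefficient a is fitted by least squares, then the fitted profile is
    integrated analytically over the full water depth.
    """
    exponent = 1.0 / m
    basis = [z ** exponent for z in heights]
    a = sum(u * b for u, b in zip(speeds, basis)) / sum(b * b for b in basis)
    return a * depth ** exponent / (1.0 + exponent)


# Hypothetical deployment: 10 m depth, bins covering 1.5 m to 8.5 m above the bed
# (the near-bed and near-surface layers are not sampled by the instrument).
depth = 10.0
heights = [1.5 + k for k in range(8)]
speeds = [1.2 * z ** (1.0 / 7.0) for z in heights]   # synthetic profile

depth_mean = depth_mean_from_bins(heights, speeds, depth)
bin_average = sum(speeds) / len(speeds)
print(f"profile-based depth mean = {depth_mean:.3f} m/s")
print(f"plain average of bins    = {bin_average:.3f} m/s")
```

The difference between the two printed values illustrates how the treatment of the unsampled layers alone can shift the calibration target, before any consideration of direction or turbulence.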

Sensitivity Analyses.
Sensitivity analyses are used to study how the uncertainty in the output from a model can be apportioned to different sources of uncertainty in its inputs. Sensitivity analyses are undertaken by varying input parameters (within a physically realistic range) and examining the model response. Sensitivity analyses can be useful for a range of purposes including (a) testing the robustness of model results in the presence of uncertainty, (b) increasing the understanding of the relationships between input and output variables in a model, (c) identifying errors in the model by encountering unexpected relationships between inputs and outputs, and (d) simplifying models by identifying model inputs that have no effect on the output, or identifying and removing redundant parts of the model structure. Sensitivity analyses can also help to reduce uncertainty by identifying the model inputs that cause the greatest uncertainty in the output, thereby allowing adjustments to increase the robustness of the model. Importantly, by making model results more understandable, compelling, or persuasive, sensitivity analyses can enhance interactions between modellers and the end users of modelling output. Sensitivity analyses are therefore a vital part of evaluating if a model is fit for purpose, and time must be set aside in any modelling study to undertake a credible model sensitivity study.
One area of sensitivity analysis that requires special consideration concerns the sensitivity of a given model to errors in the input data (e.g., bathymetry, water level, and depth-mean current speed).
This is especially important when there are errors and/or uncertainty in more than one input data set, which can result in compounded errors in the model output. For example, in an estuarine sediment transport model, errors in the water depth and/or current speed at a given location will result in an over- or underestimation of the bed shear stress. Since sediment transport is related to a power of the bed shear stress (typically quadratic for bed load and cubic for suspended load), small errors in predicted bed shear stress can result in large errors in predicted gross and net sediment transport.
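The compounding effect described above can be illustrated with a short numerical sketch. It assumes that the bed shear stress is proportional to the square of the depth-mean speed (the quadratic stress law) and that the transport rate is proportional to a power n of the bed shear stress. The function name and the 10% speed error are illustrative assumptions and are not taken from any particular model.

```python
def transport_error(speed_error, stress_exponent):
    """Fractional error in transport rate caused by a fractional error in speed.

    Assumes tau ~ U**2 and q ~ tau**n, so that q ~ U**(2n).
    speed_error is fractional, e.g. 0.10 for a 10% overestimate.
    """
    stress_ratio = (1.0 + speed_error) ** 2
    transport_ratio = stress_ratio ** stress_exponent
    return transport_ratio - 1.0


# Exponents follow the text: quadratic (n = 2) and cubic (n = 3) dependence on stress.
for n in (2, 3):
    err = transport_error(0.10, n)
    print(f"n = {n}: 10% speed error -> {100 * err:.0f}% transport error")
```

Under these assumptions, a 10% overestimate of current speed gives a 21% overestimate of bed shear stress and roughly a 46% (n = 2) or 77% (n = 3) overestimate of the transport rate.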

Time Series and Statistical Output.
In many cases, the presentation of data in time-series format helps to reveal the goodness of fit between model and observation data, with gaps between observed and predicted data giving a visual indication of discrepancies between the model predictions and the calibration data. Calibration should aim to minimise these discrepancies, and statistical analysis should be used to quantify the goodness of fit. Additionally, it is informative to compare like-with-like values using a scatter plot showing observed versus modelled values. Some examples are provided below.
To quantify the temporal aspect of the model calibration further, statistical approaches are used to demonstrate, in a clear and understandable way, that confidence can be placed in the model performance over time. The Danish Hydraulic Institute (DHI) Quality Indices Matrix, which calculates several goodness-of-fit statistics for comparison between observations and simulated results, is an appropriate methodology to adopt. When necessary, and when data quality permits, additional types of analysis may be appropriate, such as Brier skill score analysis [38] or indices of agreement (e.g., [39]).
Simple statistics that demonstrate the level of agreement between measured/observed data and model prediction at a chosen location in the model domain include the mean and peak differences (often expressed as a percentage) and the standard deviation. In addition, there are several quality indices that can be used to demonstrate the statistical agreement between model predictions and observations (Table 4). In the table, $O_i$ and $S_i$ are the measured and predicted values of a given parameter at time $t_i$, respectively, and $N$ is the total number of data points. The statistics are now defined.
Accuracy expresses the difference between the measured and modelled data, which is defined as $\mathrm{dif}_i = S_i - O_i$. In all cases, the aim should be to reduce the value of $\mathrm{dif}_i$ to the smallest value practicable. Ideally, as a minimum requirement, $\mathrm{dif}_i$ should not exceed 10%, although this will be highly variable depending on the parameter being considered and the accuracy of the calibration data used in the model. The accuracy of the modelled data can also be quantified using the root mean square error (RMSE) statistic (Table 4). The RMSE value is often expressed as a percentage, where lower values indicate less residual variance and thus better model performance.
The bias expresses the difference between an estimator's expectation and the true value of the parameter being estimated and can be defined as being equal to the mean error in the data. Systematic bias reflects external influences that may affect the accuracy of statistical measurements. Detection bias is where a phenomenon is more likely to be observed and/or reported for a set of study subjects. Reporting bias involves a skew in the availability of data, such that observations of a certain kind may be more likely to be reported and consequently used in research.
The agreement or otherwise between measured/observed data and model prediction time series is frequently quantified using the Pearson product-moment coefficient, R (Table 4). It is essential to test the statistical significance of the correlation coefficient. In most cases, the Pearson method (one- or two-tailed) is appropriate. In statistical significance testing, if the null hypothesis is true, the p value is the probability of obtaining a test statistic at least as extreme as the one that was observed. One often "rejects the null hypothesis" when the p value is less than 0.05 or 0.01, corresponding, respectively, to a 5% or 1% chance of rejecting the null hypothesis when it is true. When the null hypothesis is rejected, the result is said to be statistically significant. In estuarine and shelf sea modelling studies, statistical significance at around the 95% confidence level is judged to be acceptable for most practical applications.
A range of statistical indices of model performance has been developed (e.g., [40][41][42][43]). The widely used Brier skill score, BSS [38], and Willmott's dimensionless index of agreement [44] compare the mean square difference between the prediction and observation with the mean square difference between baseline prediction and observation. For example, perfect agreement gives a BSS score of 1, and negative values indicate that predictions are worse than the baseline value. van Rijn et al. [45] provide an interpretation of BSS values where 0 < BSS < 0.3, 0.3 < BSS < 0.6, 0.6 < BSS < 0.8, and BSS > 0.8 indicate poor, reasonable/fair, good, and excellent performance, respectively. However, it has been recognised that the larger errors, when squared, overweight the influence of those errors on the sum of squared errors.
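A minimal sketch of the Brier skill score calculation is given here, assuming the common form BSS = 1 − MSE(prediction, measurement) / MSE(baseline, measurement), with the baseline taken as the preevent condition. The function names, the treatment of values falling exactly on a class boundary, and the sample bed levels are assumptions made for illustration; the class labels follow the van Rijn et al. [45] interpretation quoted above.

```python
def brier_skill_score(predicted, measured, baseline):
    """BSS = 1 - MSE(predicted, measured) / MSE(baseline, measured)."""
    n = len(measured)
    mse_model = sum((p - m) ** 2 for p, m in zip(predicted, measured)) / n
    mse_baseline = sum((b - m) ** 2 for b, m in zip(baseline, measured)) / n
    if mse_baseline == 0.0:
        raise ValueError("baseline equals the measurements; BSS is undefined")
    return 1.0 - mse_model / mse_baseline


def classify_bss(bss):
    """Map a BSS value to a qualitative class (boundary handling is an assumption)."""
    if bss > 0.8:
        return "excellent"
    if bss > 0.6:
        return "good"
    if bss > 0.3:
        return "reasonable/fair"
    if bss > 0.0:
        return "poor"
    return "no better than baseline"


# Hypothetical bed-level changes (m) at four positions.
baseline = [0.0, 0.0, 0.0, 0.0]    # preevent condition
measured = [1.0, 2.0, 1.0, 2.0]    # measured postevent condition
predicted = [0.8, 1.8, 1.2, 2.2]   # modelled postevent condition

score = brier_skill_score(predicted, measured, baseline)
print(f"BSS = {score:.3f} ({classify_bss(score)})")
```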
This issue has recently been addressed by Willmott et al. [46], who present a nontrivial improvement to the earlier index of agreement recommended for a wide range of model performance applications. Examples of model skill assessments for estuarine models are given by Sheng and Kim [47] and Warner et al. [48].
The scatter index, SI, is the RMSE normalised with the mean value. In most cases, the scatter index provides a useful indication of the model performance. However, taking wave model results as an example, the scatter index may appear to understate the skill of the model, as it tends to be large in shelf sea applications. The reason is that the RMSE of the significant wave height is normalised with the average significant wave height, which is usually rather small in shelf sea regions. For example, an RMSE of 0.25 m in the significant wave height in complex field conditions seems reasonable, but if the mean value is only 0.5 m, the scatter index attains the rather high value of 50%.
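The bias, RMSE, and scatter index can be computed with a few lines of code. The sketch below assumes that bias is the mean of $S_i - O_i$ and that the scatter index is the RMSE divided by the mean of the observations; the function names and the wave heights are hypothetical and were chosen to reproduce the worked example in the text (RMSE of 0.25 m, mean of 0.5 m).

```python
import math


def bias(observed, simulated):
    """Mean error: average of (simulated - observed)."""
    return sum(s - o for o, s in zip(observed, simulated)) / len(observed)


def rmse(observed, simulated):
    """Root mean square error between simulated and observed values."""
    n = len(observed)
    return math.sqrt(sum((s - o) ** 2 for o, s in zip(observed, simulated)) / n)


def scatter_index(observed, simulated):
    """RMSE normalised by the mean observed value (assumed normalisation)."""
    return rmse(observed, simulated) / (sum(observed) / len(observed))


# Hypothetical significant wave heights (m) with a mean observed value of 0.5 m.
observed = [0.25, 0.75, 0.25, 0.75]
simulated = [0.50, 0.50, 0.50, 0.50]

print(f"bias = {bias(observed, simulated):.2f} m")
print(f"RMSE = {rmse(observed, simulated):.2f} m")
print(f"SI   = {100 * scatter_index(observed, simulated):.0f}%")
```

In this example the bias is zero even though the scatter index is 50%, which illustrates why the statistics should be read together rather than in isolation.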
The diagnostic model performance index MPI indicates the degree to which the model reproduces the observed changes of the waves. Like the scatter index, it is defined in terms of RMSE values in the form MPI = 1 − (RMSE/RMSC). Here, the definition of RMSC is identical to that of RMSE, except that all $S_i$ values are replaced by the incident $O_i$ values. For a perfect model (RMSE = 0), the value of the MPI would obviously be 1, whereas it would be 0 for a model that (erroneously) predicts no changes (RMSE = RMSC) (cf. [49]).

[Table 4: Quality indices for comparing model predictions with observations. Entries: bias (average bias); RMSE; correlation (Pearson product-moment coefficient, R); Brier skill score (BSS), where $X_p$ is the postevent condition predicted by the model, $X_m$ is the measured postevent condition, and $X_b$ is the preevent condition; and skill (index of agreement [44]), defined from the time series and time averages of the model and observed values.]

5.1. Data Sources.
Water-level gauges and pressure sensors typically provide information on the water level relative to a defined datum at a suitable temporal resolution (typically no more than 30-minute intervals). Ideally, water level and current information should be obtained from as many key locations within the model domain as possible, and specifically, in areas of interest and areas of significant variation. Typically, errors associated with these kinds of data include (a) incorrect time references (e.g., GMT/BST), (b) errors in datum corrections (see below), (c) errors in correctly defining the measured data locations in the model domain, and (d) instrument calibration error. Problems with the measuring device often appear as offsets and/or spikes in the measured data. Spikes should be either removed or substituted with artificial data. Interpolation over large gaps in the data should not be attempted, and alternative data sources with better temporal cover should be sought.
To define the confidence limits of the measured data, a quality review is required. This may result in the rejection of some data, or the adoption of other data with stated caveats.
The more data that are available (depending on quality, format, and spatial and temporal resolution), the more reliable the model calibration is likely to be. To minimise potential uncertainties in model performance and to optimise model calibration, common misunderstandings and typical errors and uncertainties in hydrodynamic model input data are described below along with some suggested approaches which can aid model setup and calibration.

Water Level.
A model calibration for water level should include examination of amplitude, phase, and asymmetry. Specifically, the test should look at (a) differences in maximum and minimum surface elevations; differences in tidal phase at high and low water; and RMSE (noting that this is not corrected for bias, and unless the bias is insignificant, this parameter can be difficult to interpret), (b) bias, and (c) scatter index (SI). It is recommended that the minimum level of model performance required for shelf sea areas is (a) water levels to within ±0.10 m (or to within 10% and 15% of spring and neap tidal ranges, respectively) and (b) timing of high water to within ±15 minutes. For estuaries, it is recommended that the minimum level of model performance required is (a) water levels to within ±0.10 m at the mouth and ±0.30 m at the head (or to within 10% and 15% of spring and neap tidal ranges, respectively) and (b) timing of high water to within ±15 minutes at the mouth and ±25 minutes at the head.

Current Speed.
In 2D depth-average hydrodynamic models, current speed predictions should be examined with respect to amplitude, phase, direction, and asymmetry. Specifically, the test should look at (a) differences in peak flow speeds (ebb and flood tides), (b) mean flow direction, (c) RMSE, (d) bias, and (e) SI. However, appropriate depth-average current speed values must normally be derived from either point measurements at some reference height in the water column or measured vertical current profile data (e.g., ADCP data). In both cases, depth-average current speed can be calculated using the 1/7 power law (e.g., [50], p. 49) or similar. Normally for 3D hydrodynamic models, ADCP data can be used directly for calibration at one or more levels in the model. However, if the model layers are large in vertical extent, they may span one or more ADCP measurement bins, and the 1/7 power law or similar must be applied to interpolate an appropriate current speed value for the model layer.
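The conversions described above can be sketched as follows. The sketch assumes the simplest form of the 1/7 power law, u(z) = u_s (z/h)^(1/7), applied over the full water depth h, for which the depth-mean speed is (7/8) u_s; the formulation given in [50] may differ in detail. The function names are hypothetical.

```python
def depth_mean_from_point(u_z, z, h):
    """Depth-mean speed from a point measurement u_z at height z above the bed.

    Assumes u(z) = u_s * (z / h) ** (1/7) over the full depth h,
    whose depth average is (7/8) * u_s.
    """
    if not 0.0 < z <= h:
        raise ValueError("z must lie within the water column (0 < z <= h)")
    u_surface = u_z * (h / z) ** (1.0 / 7.0)
    return (7.0 / 8.0) * u_surface


def layer_mean_from_depth_mean(u_mean, z_bottom, z_top, h):
    """Mean speed over a model layer [z_bottom, z_top] for the same profile."""
    if not 0.0 <= z_bottom < z_top <= h:
        raise ValueError("layer must satisfy 0 <= z_bottom < z_top <= h")
    # Analytical integral of (z / h) ** (1/7) over the layer, scaled by u_s = (8/7) * u_mean.
    numerator = z_top ** (8.0 / 7.0) - z_bottom ** (8.0 / 7.0)
    return u_mean * numerator / (h ** (1.0 / 7.0) * (z_top - z_bottom))


# Hypothetical example: 0.9 m/s measured 1 m above the bed in 10 m of water.
u_bar = depth_mean_from_point(0.9, 1.0, 10.0)
u_top_layer = layer_mean_from_depth_mean(u_bar, 5.0, 10.0, 10.0)
print(f"depth-mean speed = {u_bar:.2f} m/s, upper-half layer mean = {u_top_layer:.2f} m/s")
```

A layer spanning the whole water column returns the depth-mean speed itself, which provides a simple consistency check on the two functions.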
It is recommended here that predicted current speeds from 2D and 3D hydrodynamic models in shelf sea areas and estuaries be within ±0.20 m/s (or ±10% to 20%) of the measured speed. To express the accuracy of tidal current speed predictions by models, Cefas (www.cefas.defra.gov.uk/media/.../report-on-rst-asmo-workshop.pdf, accessed March 2014) expresses performance in terms of error in the maximum predicted velocity so that errors of < 0.05 m/s, < 0.1 m/s, < 0.2 m/s, and > 0.2 m/s express very good, good, moderate, and poor performance, respectively.
Results from statistical analyses of model performance need to be interpreted with care. The RMSE value provides a quantitative measure of how well the model fits the data based on the mean of the data. However, if there is significant bias in the data, then the goodness of this fit is not an appropriate statistic to use. It is recommended here that bias < 0.2, SI < 0.5, and RMSE < 0.2 demonstrate a statistically significant fit.

Current Direction.
Since current direction is derived from vector quantities, it cannot be treated in the same way as other parameters (e.g., speed). However, the accuracy of predicted current direction can be examined using time-series plots and quantified, for example, using bias and SI statistics. To remove ambiguity from current direction data, the following steps are recommended: (a) detect whether the absolute difference between the directions is greater than 180°; (b) if it is greater than 180°, then add 360° to the lesser direction before subtracting the greater direction; or (c) if it is less than 180°, then calculate the absolute difference between the directions. This method returns an absolute (positive) value describing the difference in directions, which will never exceed 180°. For practical applications, it is suggested that preserving the sign (negative or positive) of the direction difference is not necessary, and doing so prevents a meaningful mean bias from being calculated from those differences. Once the absolute difference between the directions has been calculated, it is possible to calculate the bias. For shelf sea areas and estuaries, the minimum level of model performance is recommended here to be ±10° and ±15°, respectively.
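Steps (a) to (c) above can be sketched as a short function. The function names and the sample directions are hypothetical; directions are assumed to be in degrees, and inputs are wrapped into the range 0° to 360° before comparison.

```python
def direction_difference(dir_a, dir_b):
    """Absolute difference between two directions in degrees (0 to 180)."""
    lesser, greater = sorted((dir_a % 360.0, dir_b % 360.0))
    difference = greater - lesser
    if difference > 180.0:
        # Step (b): add 360 to the lesser direction, then subtract the greater.
        difference = (lesser + 360.0) - greater
    return difference


def mean_direction_bias(observed, simulated):
    """Mean of the absolute direction differences over a time series."""
    differences = [direction_difference(o, s) for o, s in zip(observed, simulated)]
    return sum(differences) / len(differences)


print(direction_difference(350.0, 10.0))                    # 20.0, not 340.0
print(mean_direction_bias([350.0, 90.0], [10.0, 100.0]))    # 15.0
```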

Bed Shear Stress.
Except for some specialist research instruments, for example, a field-deployable shear plate prototype reported by Oebius [51] and laboratory-based shear plates reported by Grass et al. [52] and Rankin and Hires [53], no reliable direct way of measuring the bed shear stress is yet available. For most practical applications, the use of measurements to calibrate/validate bed shear stress values predicted by a model is therefore not possible.
When considering bed shear stress in the context of hydrodynamic and/or sediment transport modelling, it is critically important to distinguish between the skin friction component of total bed shear stress responsible for sediment mobilisation and transport and the form drag imposed on the flow by pressure losses in the wake of bed obstacles such as bedforms. Most models predict the total bed shear stress using the quadratic stress law. This relates a depth-average flow speed to stress via a drag coefficient that characterises the hydrodynamic "roughness" of the bed. For skin friction bed shear stress, the roughness parameter expresses the drag attributable only to sediment grains. Form drag (in part responsible for maintaining suspended sediment status) is then obtained through a partitioning approach. It is very important to understand how a model deals with drag partitioning and to use any resulting estimate of bed shear stress correctly. Soulsby [50] provides a clear account of bed shear stress components and their calculation and application.
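For reference, the quadratic stress law is commonly written in the form below. The notation is assumed here for illustration: $\tau_0$ is the bed shear stress, $\rho$ is the water density, $C_d$ is the drag coefficient, and $\bar{U}$ is the depth-average flow speed.

```latex
\tau_0 = \rho \, C_d \, \bar{U}^{2}
```

Because the stress scales linearly with $C_d$ and with the square of $\bar{U}$, the value of the drag coefficient applies to the total stress or to the skin friction component only according to the roughness used to derive it.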
When dealing with subgrid-scale bedforms, it is normal to parameterise bed roughness using a friction coefficient or an equivalent grain roughness. This can vary spatially (and in some cases temporally) and provides a means of moderating or enhancing the local bed shear stress and thereby "tuning" the model against observational flow data. This needs to be undertaken with care to avoid implementation of unrealistic friction coefficient values, and guidance on appropriate friction coefficients should be sought (e.g., software guides for the model being used; Soulsby [50]). To avoid significant over- or underestimation of sediment transport, it is recommended that bed shear stress be estimated to within ±0.05 N/m² for shelf sea and estuarine models. However, small errors in bed shear stress can be compounded over time, especially in morphological models. It is also noted that bed shear stress data can be usefully postprocessed to obtain estimates of bedforms and bed load and suspended transport using a range of empirical formulae [50]. However, these estimates are constrained by the data used to generate them and the accuracy of the algorithms used to estimate hydrodynamic stresses.
A simple illustration of a depth-average hydrodynamic model calibration using bed roughness is shown in Figure 4. Figure 4(a) shows time series of measured and predicted current speed at locations $P_1$ and $P_2$ in the mouth of a small tidal inlet. In this initial model run, a drag coefficient, $C_d$, value of 0.035 is assumed, leading to an underestimation of the current speed by the model. Reference to available bed sediment data suggests that a $C_d$ value of 0.02 is more appropriate, resulting in much better agreement between the measured and predicted current speed values (Figure 4(b)). However, it was also noted that erosion had occurred in the inlet since the last bathymetric survey. Iterative adjustments to the water depth in a subsequent series of model runs finally resulted in very good agreement between the measured and predicted current speed (Figure 4(c)). The lowering of the bed of the inlet channel by 0.45 m was subsequently confirmed by a repeat bathymetric survey undertaken after the modelling was completed.
As a further example of calibration targets to achieve, required hydrodynamic model performance statistics for estuarine flooding models from Defra/EA [3] are shown in Table 5. The statistics include RMSE for storm surge elevation ($h_{surge}$), RMSE for high water levels ($h_{max}$), the tolerances for predicted peak water levels, RMSE for flow velocity ($U$), the tolerances for predicted fluvial inputs ($Q$), the flood area required to be predicted correctly for two or more historical floods ($A$), and the predicted flood depth error ($d_{err}$). While these statistical tests are specific to the flooding application and are exacting since flood predictions must be accurate, they are typical of the model performance criteria that should be used for all shelf sea and estuarine models.

Visual Observations.
Visual observations of wave parameters (height, period, and direction) taken from ships of opportunity are sometimes available for long periods (decades). This data source has clear limitations and many potential errors, particularly in stormy weather (cf. [59]). Other significant limitations include the number of observations and the extent of data coverage, which is typically limited to shipping lanes. However, there are many areas of the world where past wave measurements by other means are absent, and visual observations may be the only source of information.

Buoy and Platforms.
Surface-following buoys are the most common instrument used to measure waves, with deployment depths between 10 m and a few hundred metres (cf. [56,57,60]). There is a large variation in the quality of the data available from these devices depending on their age and type. Typically, the latest devices can capture an estimate of the main wave parameters (significant wave height, $H_s$; mean and peak period, $T_m$ and $T_p$; and the related directional information such as mean direction, mean directional spread, kurtosis, and skewness for the full 2D spectrum). Typically, the measurements are taken at 1-hour intervals. Wave-measuring buoys are accurate instruments, and the related error for $H_s$ is usually only a few percent. Uncertainty occurs due to sampling variability and resolution of the frequency distribution (peak periods). In the high $H_s$ range, the buoys tend to "slip" around the highest crests. This introduces a negative bias in the estimation of the higher wave height values.
The altimeter provides information on wind speed and wave height, and the scatterometer provides a wider band of information on wave and wind parameters. However, in areas of complex geometry (typically shelf sea and estuarine areas), satellite data usually provide a poor estimate of sea state due to the strong spatial gradients which cannot be well resolved by the satellite sensors. Other limitations include poor temporal coverage due to satellite overpass frequency, which can prevent acquisition of high-frequency time series for a chosen location. However, developments reported by Young et al. [62] demonstrate clearly that useful global wave data sets can now be assembled using data from a range of remote sensing platforms and that these data have high utility in regions of the world where wave data are scarce. Wave data can also be obtained using HF (e.g., [63]) and X-band (e.g., [64]) radar systems and through the use of video (cf. Argus Video [65]). Although these approaches require some calibration, each has a capability of measuring nearshore waves and can help calibrate and validate wave models in complex regions where reflection, refraction, and diffusion processes may be present.

Performance Guidelines.
Typically, for waves, the required model performance at the calibration and validation stage is judged to be acceptable if the wave model outputs are biased to within (a) ±10% of the mean observed height, (b) ±20% of the mean observed period, and (c) ±15° of the mean observed direction. Considering design, appraisal, and strategy applications, Table 6 provides practical wave model performance guidelines concerned with model resolution, minimum record (or hindcast) lengths required to define extreme wave statistics, and RMSE values for $H_{m0}$ and average peak $H_{m0}$ [3].
These wave model performance statistics are intended only as a guide, and often more stringent agreement between observed and modelled data may be required. Equally, these criteria might be too exacting for all regions of the modelled area. Meeting these criteria for at least 90% of position/time combinations is likely to be a less stringent and acceptable criterion in most circumstances. In cases where waves from more than one direction are present simultaneously (e.g., swell and wind sea), mean wave direction is meaningless and reference must be made to the directional wave spectra to characterise the observed and modelled wave field. Scatter plots and correlation statistics are also useful to demonstrate agreement between measured and modelled wave direction for multidirectional sea states. It is also helpful in some circumstances to examine directional wave spreading since many third-generation spectral wave models tend to underestimate this parameter (cf. [66,67]).
Examples of useful plots that help assess wave model performance are shown in Figure 5, which shows predicted $H_s$, $T_p$, and direction data from the SWAN model [68] and measured values from a Directional Waverider buoy. Good agreement between the model and the observations is demonstrated. Figure 6(a) shows a scatter diagram of [...]. These diagrams both help identify agreement or otherwise between measured and modelled wave data, and it is recommended that these visual checks are used when evaluating wave predictions.
In the case of spectral wave models, it is also helpful to examine the frequency domain differences in the energy distribution between model and measured spectra. The same approach applies to directional wave spectra, noting, however, that it is unusual to have measured wave spectra from more than one location in the model domain.

Sediment Models
Accurately simulating the behaviour of sediments in numerical models presents one of the greatest challenges. The principal aim of sediment models is to reproduce the observed spatial and temporal variations in erosion and accretion. Here, guidance is provided for the calibration of numerical models for sand (median grain diameter, $D_{50}$ > 63 µm) and silt/mud ($D_{50}$ < 63 µm); sediment coarser than sand (e.g., shingle) cannot be represented reliably through 2-dimensional modelling. Attention is first given to the essential data required to successfully calibrate sediment models of estuaries and open shelf sea environments.
The methods used to measure bed load, suspended load, and net sediment transport are reviewed briefly, drawing attention to potential errors and uncertainties that must be considered in the model calibration process [69]. The issues associated with the calibration of cohesive, noncohesive, and mixed grain-size sediment models are then discussed.
It is emphasised from the outset that a primary requirement of all sediment modelling is accurate information about the physical properties of the sediment (grain-size distribution, bulk density, porosity, etc.), bedforms (active and moribund), and the spatial distribution and thickness of the sediments. Also, sight should not be lost of the fact that although the physical characteristics of sediments may be well expressed, biological mediation and the behaviour of some cohesive sediment remain difficult to parameterise in models [70], and although some of these problems can be overcome using in situ measurements (e.g., field flumes [71]), these are costly to deploy and are not normally undertaken in practice.

Data Sources.
Obtaining suitable data of sufficient quality for the calibration of a sediment model is a widely recognised challenge [72]. In addition to sediment data obtained directly from in situ water and bed samples (grabs), typically, there are two primary types of data which are required for the calibration of sediment models: (a) measurements of the sediment transport flux (bed load and suspended load) over time scales of a few tidal cycles and (b) measurements of bed-level changes attributable to local erosion and accretion to provide information on net sediment transport over a period of weeks and months. A comprehensive review of instrumentation used to measure sediment transport is given by Williams [73].

Measuring Bed Load and Bedforms.
In estuaries and shelf sea environments, bed load is the dominant mode of sediment transport for sand. Sediment traps, frequently used to measure bed load in rivers, have been deployed in estuarine and shelf sea environments with mixed success (cf. [74]). The Arnhem, Helley-Smith, and Delft Nile samplers are the most commonly used devices owing to their robustness and ease of handling in the field. However, their accuracy depends on the number of samples collected, which may be restricted by high analysis costs.
The method involves deploying a quantity of a material at a known location and subsequent sampling campaigns on a grid of sample positions to determine the dispersion of the material. Both approaches use materials with the same dynamic behaviour as the natural sediments and with sufficiently distinct characteristics to make them easily detectable in very low concentrations (http://www.partrac.com/ (accessed on 1 August 2014)). Sediment tracing techniques have value in studies examining the sediment flux and have been used effectively to study dredging impacts and disruptions to sediment supply attributable to structures. A comprehensive review of tracing techniques and options is given by Black et al. [76].
Passive acoustic techniques using hydrophones to record the sediment-generated noise, SGN, arising during bed load transport of coarse sediments have been used (e.g., [77]), but to be effective, they require objective calibration, which can be very costly. However, improvements to processing software and computing power now allow automated analysis of video images to detect particle displacements at subsecond temporal resolution for the entire field of view, and the visual analysis of bed load images is providing useful data. Attempts to quantify bed load have also exploited the bottom tracking feature of ADCPs in combination with conventional pressure difference samplers. Together, these instruments can be used to determine the bed load transport velocity and bed load transport rate, respectively (e.g., [78,79]). Rates of bed load transport have also been inferred from rates of bedform migration measured using rotary sonar devices (e.g., [80,81]). At a much larger scale, remote sensing techniques have been applied to link large bedform migrations with bed load sediment transport rates (e.g., [82]).

Measuring Suspended Load.
Only very fine sand, silt, and mud are transported in suspension in estuarine and shelf sea environments. In the simplest approach used to quantify the suspended load, water samples are collected in situ to determine the concentration of suspended particulate matter (SPM) and the grain-size distribution either at the surface or at a specified depth in the water column using, for example, triggered water bottles or pump sampling (cf. [83,84]). Samples can be collected either at discrete times or at set times throughout a tidal cycle. There is a low-to-moderate level of certainty in the resultant SPM data owing to potential errors in the way the water samples are collected, the short temporal sampling period, and the presence of varying quantities of organic particles. The deployment of colocated current meters enables the sediment flux to be determined, which in turn can provide useful information on sediment resuspension and settling velocity.
Turbidity meters can provide continuous or discrete measure of SPM concentrations by detecting the attenuation of light passing through the instrument's sampling volume.
They are best suited to suspensions of silt and clay-size particles. Self-logging turbidity meters are capable of accurately recording turbidity at a single depth within the water column for long periods [85]. Turbidity data are also obtained using a CTD probe (conductivity, temperature, and depth) equipped with a turbidity sensor. In estuaries and shelf sea locations, CTDs are normally lowered and raised through the water column for a period of 12.5 hours to provide information on the temporal changes in the SSC profile during a tidal cycle. However, there is the potential for the sensors to become fouled over time giving erroneous data, especially if the sensor becomes exposed during low water and the optical systems are compromised by sediment and/or biological films, leading to unrealistically high turbidity values. In addition, the gain setting (sensitivity of the instrument) must be correctly adjusted to accommodate the range of SPM concentrations in the area. For example, turbidity measurements could reach the upper limit of the instrument if the gain setting is too high or barely register turbidity if the gain setting is too low. This problem can be overcome using an instrument with a logarithmic response, which allows measurements of SPM spanning several orders of magnitude. Although this overcomes the problems associated with saturation and aliasing, the overall instrument precision is reduced as a result. Optical turbidity instruments are calibrated using primary solutions such as formazin, and turbidity is expressed in formazin turbidity units (FTU) or nephelometric turbidity units (NTU). The main problem with this approach arises when conversions are made between FTU (or NTU) and in situ water samples, where differences can be as large as ±200% [86]. Furthermore, the material in suspension may be a complex mixture of organic and inorganic particles, which adds further complexity to the conversion between FTU and SSC. Calibration is therefore required to convert the measurements into SPM concentrations in units that are meaningful for sediment modelling purposes (e.g., mg/l). This can be achieved either by collecting water samples at specific times to calibrate the measurements or by calibrating the instrument in laboratory conditions for a range of concentrations before and after deployment. It is important that calibration is performed over an applicable range of SPM concentration values. However, there are additional problems attributable to flocculation when measuring muddy sediments. In these cases, water sampling can destroy the delicate floc structures, and changes in temperature/salinity can enhance or reduce flocculation potential. Both factors can lead to errors, and thus, these kinds of data must be treated with caution in the context of model calibration [87].
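The conversion from turbidity to SPM concentration using paired water samples can be illustrated with a minimal sketch. It assumes a simple linear relationship between turbidity (FTU) and SSC (mg/l), which may not hold for every sediment type. The function names and the sample values are hypothetical and are not taken from any of the cited studies.

```python
def fit_linear_calibration(turbidity_ftu, ssc_mg_l):
    """Least-squares fit of SSC = slope * turbidity + intercept."""
    n = len(turbidity_ftu)
    if n < 2 or n != len(ssc_mg_l):
        raise ValueError("need at least two paired samples of equal length")
    mean_x = sum(turbidity_ftu) / n
    mean_y = sum(ssc_mg_l) / n
    sxx = sum((x - mean_x) ** 2 for x in turbidity_ftu)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(turbidity_ftu, ssc_mg_l))
    if sxx == 0.0:
        raise ValueError("turbidity samples must span a range of values")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def turbidity_to_ssc(turbidity_ftu, slope, intercept, calibrated_range):
    """Convert one reading to SSC and flag whether it lies inside the
    turbidity range covered by the calibration samples."""
    low, high = calibrated_range
    ssc = slope * turbidity_ftu + intercept
    return ssc, low <= turbidity_ftu <= high


# Hypothetical paired samples (turbidity in FTU, gravimetric SSC in mg/l).
samples_ftu = [10.0, 20.0, 40.0, 80.0]
samples_ssc = [25.0, 45.0, 85.0, 165.0]

slope, intercept = fit_linear_calibration(samples_ftu, samples_ssc)
calibrated_range = (min(samples_ftu), max(samples_ftu))

# A reading inside the calibrated range and one outside it.
ssc_inside, ok_inside = turbidity_to_ssc(50.0, slope, intercept, calibrated_range)
ssc_outside, ok_outside = turbidity_to_ssc(200.0, slope, intercept, calibrated_range)
```

The range flag reflects the point made above that calibration should cover the applicable range of SPM concentrations; readings outside that range are extrapolations and deserve extra caution.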
Optical backscatter sensors (OBSs) measure turbidity and suspended solids concentrations by detecting infrared light scattered from SPM (cf. [88,89]). OBS instruments are best suited to suspensions of silt and clay-size particles. The response of the OBS sensors depends strongly on the size, composition, and shape of the suspended particles, and calibration like that used for turbidity sensors is required to obtain SPM concentration data. OBS instruments are subject to the same problems with biofouling and other optical contamination as turbidity sensors (e.g., [90]). SSC profiles can be obtained using vertical arrays of OBS. Further information is given by, for example, Kineke and Sternberg [91], Hoitink and Hoekstra [92], and Boss et al. [93].
Aerial and satellite remote sensing imagery can be used in some circumstances to indicate the advection rate and direction of suspended sediment plumes in the surface and near-surface layers of the water column. Remote sensing algorithms have been widely used to extract information on suspended sediment concentrations from multispectral sensor data (e.g., [94,95]).
For sand-size particles, the use of multiple-frequency acoustic backscatter (ABS) to measure the concentration of suspended sediment is becoming more widespread. Inversion techniques can be applied to obtain suspended sediment concentration (SSC) profiles directly (e.g., [96]), and information about the grain size in suspension can also be extracted (e.g., [97]). Typically, these instruments measure SSC at intervals of 1 cm up to a few metres above the bed where the bulk of suspended sediment is present. SSC profiles can also be derived from ADCP data, albeit with less spatial resolution, using a similar acoustic inversion technique (e.g., [98]). While one or more samples are required for calibration and measurements are spatially averaged, the instrument can provide useful SSC profile information over extended periods.

Estimations of Net Sediment Transport.
In many modelling studies, the required outcome concerns the prediction of net sediment transport over periods of days, weeks, or months.
There are several useful data sources that can assist the model calibration process for this aim. In areas where frequent (e.g., annual) maintenance dredging is undertaken, information is likely to be available to describe the frequency and volumes of sediment removed. These data can be used to define changes in bed levels (e.g., accretion amounts and rates over known periods), and through comparisons between predicted accretion rates and rates derived from dredging data, it may be possible to calibrate a sediment model, albeit with limited accuracy (e.g., [99]). In addition, dredging volume data can be used to provide an indication of the interannual variability of accretion and guide the modelling process. However, owing to sediment loss during the dredging activities and to uncertainty about the bulk density of the material removed, these measurements may not be as accurate as might be desired and should only be used to provide an indication of the volumes of sediment accreting in the area. It is not possible to attribute accretion to a mode of transport, and thus, sediment formulae that predict total sediment transport must be used.
Several acoustic systems have been developed to image the bed at a large scale including echo sounding devices and side-scan and multibeam sonar (e.g., [100–102]). These data can be used to determine net sediment budgets and transport pathways and assist model calibration (e.g., [103,104]). Repeat subtidal bathymetric surveys can provide valuable information on bedform mobility from which net sediment responses can be determined (e.g., [105]). At the scale of estuaries, Mason et al. [77] illustrate how areas and volumes of sediment accretion and erosion can be estimated using the waterline method employing remote sensing and hydrodynamic modelling. Recent advances in LiDAR now make it possible to penetrate water to depths exceeding 10 m provided water clarity is good enough and thus allow subtidal survey opportunities (e.g., [106]). Large-scale changes in morphology and/or bathymetry in coastal and estuarine environments brought about by sediment mobilisation, transport, and accretion can also now be monitored routinely with systems such as ARGUS (http://www.planetargus.com/ (accessed on 1 August 2014)) and X-band radar (http://www.oceanwaves.de/ (accessed on 1 August 2014)) (e.g., [107]). Although remote sensing would never be selected to generate a primary bathymetric data set, it has been used in situations where monitoring of rapid bathymetric changes may be required (e.g., following beach nourishments, breaching, etc.). Bathymetric and topographic survey data are obtained at irregularly spaced locations. To make the data usable in a numerical model, it is necessary to use an interpolation routine to transform the data onto a grid of regularly spaced data points. Care must be taken to select the most appropriate interpolation method as any error will impact on calculations of change in bathymetry and topography.
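The transformation of irregularly spaced soundings onto a regular grid can be sketched as follows. Inverse-distance weighting (IDW) is used here purely as an illustration; it is only one of many possible interpolation methods and is not a method recommended in this paper. The function names, grid parameters, and soundings are hypothetical.

```python
import math


def idw_interpolate(points, x, y, power=2.0):
    """Inverse-distance-weighted bed level at (x, y).

    points: iterable of (x_i, y_i, z_i) soundings.
    """
    numerator = 0.0
    denominator = 0.0
    for xi, yi, zi in points:
        distance = math.hypot(x - xi, y - yi)
        if distance == 0.0:
            return zi  # grid node coincides with a sounding
        weight = 1.0 / distance ** power
        numerator += weight * zi
        denominator += weight
    return numerator / denominator


def grid_soundings(points, x0, y0, spacing, nx, ny):
    """Return a ny-by-nx list of interpolated bed levels (row index = y)."""
    return [
        [idw_interpolate(points, x0 + i * spacing, y0 + j * spacing)
         for i in range(nx)]
        for j in range(ny)
    ]


# Hypothetical soundings: (easting m, northing m, bed level m).
soundings = [
    (0.0, 0.0, -5.0),
    (10.0, 0.0, -6.0),
    (0.0, 10.0, -4.0),
    (10.0, 10.0, -7.0),
]

grid = grid_soundings(soundings, x0=0.0, y0=0.0, spacing=5.0, nx=3, ny=3)
```

Repeating the gridding with a different method or a different `power` and differencing the resulting grids gives a simple indication of how sensitive computed bathymetric change is to the interpolation choice.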
LiDAR data are especially useful for intertidal areas. Although spatial positioning is accurate (typically ±5 cm), the vertical accuracy is at best ±20 cm. Furthermore, standing water on the beach can result in spurious data, and significant postprocessing may be required. Although the use of LiDAR to determine accurate accretion and erosion rates is not recommended, it does provide extensive (and rapid) spatial cover which may prove to be useful in several applications. In some instances, an assessment of changes in beach topography might be enhanced through reference to fixed, identifiable structures (e.g., quay walls and engineering structures) which can be used to calibrate repeat surveys.

Sediment Transport Models.
Shelf sea and estuarine models normally provide output defining the predicted cumulative erosion/sedimentation for a stated bulk density, giving the cumulative change in bed level over the model period. Total sand transport is usually expressed as a net value over a specified period, allowing transport vectors to be plotted which may be comparable with information directly available from the literature.
A wide range of sediment transport formulae are available to predict bed load transport, suspended load transport, and total load transport of noncohesive and cohesive sediments (e.g., [108]). All are derived to represent the best fit to empirical data sets derived in the laboratory or in some cases from the field.
The sediment calibration data available will determine the accuracy of the model and limit how much validation is possible. Typically, a sediment model will be calibrated using SSC data, with validation utilising measured sedimentation rates (e.g., [109]). It is important to keep in mind that the sediment transport model is driven by modelled hydrodynamics and that highly nonlinear relationships exist between bed shear stress, flow turbulence, and sediment mobilisation, transport, and accretion. Thus, any limitation with the initial hydrodynamic calibration could impact significantly on the sediment model. It is therefore critically important to obtain the best hydrodynamic calibration possible.
When modelling sediment transport, it is important to recognise the heterogeneity of the seabed and the homogeneity of most sediment transport models. It is therefore essential that the roughness maps previously described are used to characterise as accurately as possible the physical properties of the sediments (grain roughness) and the morphology of the bed (form drag). It should be remembered when interpreting model outputs that, since sediment transport formulae are empirical and are based on a limited amount of calibration data from laboratory and/or field studies, the prediction of sediment responses to hydrodynamic forcing is at best accurate to within a factor of two [110–112].

Sediment Properties.
As the physical and dynamic properties of noncohesive sediments are less complex, the amount of information needed to set up and calibrate sand transport models is less than that required for mud. In the absence of measurements, the specific density, porosity, and bulk density for quartz sand are assumed to be 2650 kg/m³, 0.45 (dimensionless), and 1460 kg/m³, respectively. The median grain diameter (D_50) is normally obtained from grain-size analysis of samples or from published data (e.g., BGS (http://www.bgs.ac.uk/discoverymetadata/13605549.html)). While the spatial distribution of sand-sized sediment and information on the depth of any deposit is helpful, it is rarely available, leading to ambiguity about sediment source limitations. Information about bedforms is available either from the observations described above or from theoretical equations linking bedform dimensions with the sediment grain size and the hydrodynamic regime [50].
The threshold bed shear stress, a critical parameter in sediment models defining the bed shear stress required to mobilise the sediments, is normally calculated using a selected empirical formula (e.g., [50]) and expressed as a Shields parameter.
For cohesive sediment transport models, the following sediment data are normally required: sediment density; grain size; settling velocity; and the threshold bed shear stress for erosion and deposition. If this information cannot be obtained from in situ measurements and/or analysis of samples, Whitehouse et al. [113] provide a good account of formulae for deriving some useful properties of cohesive sediments. It is common practice to measure "wet" sediment density in situ using a density probe, which can then be converted to "dry" density (e.g., [113]). To obtain the correct dry density required by some models, it is recommended that the porosity factor is changed until the wet density is the same as the measured density. Typically, porosity values between 0.75 and 0.98 should be applied for sediments consolidated for less than 1 year and then 0.25 to 0.75 for longer periods of consolidation.
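The relationship between porosity, dry density, and wet density can be sketched as follows. The sketch assumes a fully saturated sediment, so that wet density is the dry density plus the mass of pore water, and it inverts that relationship directly instead of adjusting the porosity iteratively. The seawater density of 1025 kg/m³, the measured wet density of 1200 kg/m³, and the function names are assumptions made for illustration only.

```python
RHO_SEDIMENT = 2650.0  # kg/m3, quartz grain density quoted in the text
RHO_WATER = 1025.0     # kg/m3, assumed seawater density (not from the text)


def dry_bulk_density(rho_s, porosity):
    """Mass of solids per unit bulk volume."""
    return (1.0 - porosity) * rho_s


def wet_bulk_density(rho_s, porosity, rho_w):
    """Bulk density of saturated sediment (solids plus pore water)."""
    return dry_bulk_density(rho_s, porosity) + porosity * rho_w


def porosity_from_wet_density(rho_wet, rho_s=RHO_SEDIMENT, rho_w=RHO_WATER):
    """Porosity for which the saturated bulk density equals rho_wet."""
    return (rho_s - rho_wet) / (rho_s - rho_w)


# Quartz sand with porosity 0.45: about 1460 kg/m3, as quoted for sand.
sand_dry_density = dry_bulk_density(RHO_SEDIMENT, 0.45)

# Hypothetical density-probe reading for a recently deposited mud.
measured_wet_density = 1200.0
mud_porosity = porosity_from_wet_density(measured_wet_density)
mud_dry_density = dry_bulk_density(RHO_SEDIMENT, mud_porosity)
```

For the hypothetical mud reading, the derived porosity falls inside the 0.75 to 0.98 range quoted above for sediments consolidated for less than 1 year.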
Typical settling velocities for mud range from 0.003 m/s to 0.0001 m/s and can be calculated based on the grain size (if known) using empirical formulae (cf. [113]). However, caution must be exercised when using this approach due to flocculation, which can increase significantly the size (and hence settling velocity) of particles in suspension. Flocs may also incorporate organic matter in their matrix, thereby affecting the density (and possibly reducing the settling velocity). Furthermore, sampling of suspended sediments in situ frequently destroys the flocs or alters significantly their physical properties. There is no simple solution to this problem, and modelling assumptions and limitations must be stated clearly. Although knowledge of mineralogy, salinity, turbulent kinetic energy, and water temperature makes it possible to calculate the potential floc size, this is further complicated by temporal and spatial variations in these parameters. It is also noted that many mud models use the SSC as a parameter for defining the settling velocity, not grain size.
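As an order-of-magnitude illustration of a grain-size-based estimate, the sketch below uses Stokes' law for a single small sphere settling at low Reynolds number. This is a substitute chosen for simplicity and is not the formulation given in [113]; it ignores flocculation entirely. The water density, dynamic viscosity, and grain diameters are assumed values.

```python
def stokes_settling_velocity(diameter_m, rho_s=2650.0, rho_w=1025.0,
                             viscosity_pa_s=1.0e-3, g=9.81):
    """Stokes settling velocity (m/s) of a single sphere.

    Valid only for small particles at low particle Reynolds number;
    rho_w and viscosity_pa_s are assumed values for seawater.
    """
    return (rho_s - rho_w) * g * diameter_m ** 2 / (18.0 * viscosity_pa_s)


# Hypothetical silt-sized grains, diameters in metres.
w_10_micron = stokes_settling_velocity(10.0e-6)
w_20_micron = stokes_settling_velocity(20.0e-6)
w_60_micron = stokes_settling_velocity(60.0e-6)
```

Under these assumptions, grains of roughly 10 to 60 microns settle at velocities of the same order as the range quoted above. The quadratic dependence on diameter shows why flocculation, which changes both the effective size and the density of the settling unit, can alter settling velocity so strongly.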
The critical bed shear stress for erosion, τ_crit_E, can be estimated using the Mitchener et al. [114] formula, which accounts for sediment density. Several methods to measure τ_crit_E exist and comprise laboratory devices to analyse samples from the field and carousel flumes for field deployment [71]. These can be very effective and allow investigation of how τ_crit_E changes as erosion of a given sample proceeds (normally increasing). As with most sediment dynamics, extreme care should be exercised when attempting to parameterise physical properties and processes using empirical approaches.
The critical bed shear stress for deposition, τ_crit_D, is frequently used as a calibration parameter. However, it is highly dependent on the local conditions. Generally, values between 0.1 N/m² and 0.3 N/m² provide effective calibration settings for mud models. It is important to note that the default value in some models may not be appropriate for a given case (e.g., in Delft3D τ_crit_D = 1000 N/m² and must be changed prior to any model runs). Examples to guide the use and calibration of cohesive sediment models are provided by van Kessel et al. [115] and Carniello et al. [116].
An example of model output and SSC calibration data is shown in Figure 7. In this case, SSC is measured continuously using a turbidity instrument. A second example of model output and SSC calibration data is shown in Figure 8. In this case, SSC data were obtained from water samples. Both Figures 7 and 8 demonstrate that the general pattern of SSC is similar for the modelled and measured data. For most applications, the aim should be to achieve a model calibration of ±20% of the measured average concentrations. In areas where time series of SSC measurements are available from multiple sites, a calibration level of ±30% for average SSC at most of the sites would be deemed a good level of calibration. If there are only discrete values of SSC from water samples (or a handheld turbidity probe), experience shows that calibration of only ±40% is achievable since the discrete measurements are subject to higher levels of uncertainty.
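The percentage criteria for average SSC can be checked with a short sketch such as the one below. The tolerance values are those quoted in the text; the dictionary keys, function names, and SSC values are hypothetical and serve only to illustrate the comparison.

```python
# Tolerances (percent of measured average SSC) quoted in the text.
TOLERANCE_PERCENT = {
    "single_site_time_series": 20.0,
    "multi_site_time_series": 30.0,
    "discrete_samples": 40.0,
}


def mean_percent_difference(modelled, measured):
    """Signed difference between mean modelled and mean measured SSC,
    expressed as a percentage of the measured mean."""
    mean_modelled = sum(modelled) / len(modelled)
    mean_measured = sum(measured) / len(measured)
    return 100.0 * (mean_modelled - mean_measured) / mean_measured


def meets_ssc_guideline(modelled, measured, data_type):
    """True if the mean SSC difference is within the stated tolerance."""
    difference = mean_percent_difference(modelled, measured)
    return abs(difference) <= TOLERANCE_PERCENT[data_type]


# Hypothetical SSC values in mg/l.
measured_ssc = [100.0, 150.0, 200.0]
modelled_close = [120.0, 160.0, 215.0]   # mean is 10% above measured
modelled_high = [150.0, 200.0, 250.0]    # mean is about 33% above measured

close_ok = meets_ssc_guideline(modelled_close, measured_ssc,
                               "single_site_time_series")
high_ok_strict = meets_ssc_guideline(modelled_high, measured_ssc,
                                     "multi_site_time_series")
high_ok_discrete = meets_ssc_guideline(modelled_high, measured_ssc,
                                       "discrete_samples")
```

A comparison of averages says nothing about phase or peak errors, so it would normally be read alongside time series plots such as those in Figures 7 and 8.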

Noncohesive and Cohesive Sediments: Sedimentation.
Provided sufficient good quality data are available, sedimentation rates provide one of the best means of validating the longer-term performance of sediment models and provide an integrated view of the net result of modelled suspended sediments and bed load. Given the complexity of sediment transport and the errors associated with measurements and empirical sediment transport formulae alluded to above, it is normal practice to apply a scaling factor to the modelled sediment transport rates. In effect, this is a global correction factor to the sediment transport rates predicted by the model that provides an effective means of matching the model predictions of sedimentation with the observations. The scaling factor has no physical meaning and simply represents many complex physical processes not present within the model including, for example, biological and sediment consolidation factors which can significantly alter the physical properties of the sediment with respect to mobilisation and transport.
As a general rule, the scaling factor should be less than 5. Higher values indicate a more significant issue with the accuracy of the modelling, or with site-specific complexity, that may require a nonstandard approach. In such cases, the model approximations cannot be relied upon to describe sedimentation/accretion, and field monitoring is recommended to supplement the model deficiencies.
Dredging data are frequently used as a measure of long-term sedimentation and are normally expressed as the volume of sediment removed from an area per year. Although such data are useful, they are frequently complicated by a poorly defined relationship between the dredged volume and the rarely provided bulk density value, which can give rise to significant errors. Recourse must be made to estimated bulk density values, and sensitivity analyses should be used to quantify sedimentation for a plausible range of values. When validating a model using dredging data, the volume of sediment accumulation predicted by the model (normally over a 15-day spring-neap cycle) would normally be scaled to match as closely as possible the measurements. In recognition of the many sources of errors and uncertainty, a model predicting the dredged volumes to within 50% of the measured rates is normally deemed to be satisfactory for most practical applications. For example, in a study of the Humber, the modelled sedimentation volume was 2,180,000 m³/yr, while the average volume of sediment dredged was 1,830,000 m³/yr, with the values ranging from 790,000 m³/yr to 3,915,000 m³/yr over a 5-year period (Mott MacDonald, pers. comm.).
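The comparison with dredging data can be sketched using the Humber figures quoted above. The linear annualisation of a 15-day result and the function names are assumptions for illustration; in particular, simple multiplication by 365/15 ignores seasonal and storm-driven variability.

```python
def annualise(volume_m3, period_days):
    """Scale a volume accumulated over period_days to a yearly rate
    (simple linear scaling, an assumption made for this sketch)."""
    return volume_m3 * 365.0 / period_days


def scaling_factor(measured_m3_yr, modelled_m3_yr):
    """Global factor that would bring the model into line with the data."""
    return measured_m3_yr / modelled_m3_yr


def within_fraction(modelled, measured, fraction=0.5):
    """True if modelled is within +/- fraction of the measured value."""
    return abs(modelled - measured) <= fraction * measured


# Humber values quoted in the text (m3/yr).
modelled_volume = 2_180_000.0
dredged_mean = 1_830_000.0
dredged_min, dredged_max = 790_000.0, 3_915_000.0

relative_difference = (modelled_volume - dredged_mean) / dredged_mean
satisfactory = within_fraction(modelled_volume, dredged_mean)
inside_observed_range = dredged_min <= modelled_volume <= dredged_max
residual_factor = scaling_factor(dredged_mean, modelled_volume)

# Hypothetical 15-day model result converted to a yearly rate.
yearly_from_cycle = annualise(15_000.0, 15.0)
```

In this example the modelled volume is about 19% above the mean dredged volume, which is within the 50% criterion and inside the interannual range of the dredging record.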

Morphological Models
Morphological modelling in estuarine and coastal environments is challenging, and a useful description of the range of approaches employed is provided by Roelvink [117]. The primary limitation to the accuracy of morphological models concerns the length of time over which the model is run, with results from long runs (e.g., monthly-decadal) likely to deviate significantly from reality [118,119].
From the outset, it is very important to establish a conceptual understanding of sediment transport and historical morphological changes in each study area before attempting a morphological model. This must draw together existing evidence and provide a qualitative description of the process controls and how the morphology of the system responds to these drivers. For long-term assessments of morphology, this also requires consideration of climate change factors. A conceptual understanding can provide the hypothesis (e.g., sources and sinks) with which to test the performance of the model and to provide some guidance on expected magnitudes and directions of sediment transport and the associated morphological changes.
The largest constraint to calibrating morphological models is the availability of high-quality data sets that adequately describe the model parameters over a sufficient length of time. In an assessment of data requirements by Splinter et al. [120], it was concluded that (a) calibration of a seasonally dominated site required longer data sets but was less sensitive to sampling interval and (b) calibration of a storm-dominated site required shorter and more frequently sampled data sets. Most studies show that morphological calibrations that are based on short observational records (i.e., less than one year) are not robust. To determine initial estimates of calibration coefficients and to hindcast short-term (1-5 years) shoreline variability, Splinter et al. [120] recommend monthly monitoring programs for at least two years. For longer-term predictions of morphology, longer data sets are required to improve the performance of the models.

iterate towards an equilibrium condition and predict large-scale changes in sediment balances (sediment budget) over medium- to long-term periods, and (c) XBeach, a deterministic process-based model suitable for predicting morphological changes resulting from storm impacts (e.g., [128,129]). An example of outputs from an XBeach model set up to predict the impact of a shore-normal groyne is shown in Figure 9.
However, bed-level changes occur over long time scales compared with the hydrodynamic forcing, and thus until recently, owing to computational limitations, shelf sea morphodynamic models have been unable to predict very far into the future using traditional morphodynamic upscaling techniques such as the "continuity correction" method. To overcome this limitation, Lesser et al. [122] and Roelvink [130] have developed the morphological acceleration factor (MORFAC) concept to enable morphological predictions to extend over decadal (e.g., [131]) and centennial [132] time scales.
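The essence of the MORFAC concept, in which the bed-level change computed in each hydrodynamic time step is multiplied by a constant factor so that a short hydrodynamic simulation represents a longer morphological period, can be sketched as follows. The function names and all numerical values are hypothetical; appropriate factors are site specific and need to be tested for each application.

```python
def morphological_duration_days(hydrodynamic_days, morfac):
    """Morphological period represented by an accelerated simulation."""
    return hydrodynamic_days * morfac


def accelerated_bed_update(bed_level_m, bed_change_per_step_m, morfac):
    """Apply one hydrodynamic step of bed change, scaled by MORFAC."""
    return bed_level_m + morfac * bed_change_per_step_m


# Hypothetical case: a 15-day spring-neap cycle run with a factor of 24.
represented_days = morphological_duration_days(15.0, 24.0)

# Hypothetical single step: 1 mm of accretion computed by the model.
new_bed_level = accelerated_bed_update(-5.0, 0.001, 24.0)
```

Because each step of bed change is amplified, sensitivity tests with different factors give a simple check that the acceleration itself is not distorting the predicted morphology.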
Whilst the certainty in predicting morphology cannot be proven, it may be possible to bound the uncertainty using sensitivity analysis for key process drivers and to determine a range of possible outcomes. Where possible, a range of different morphological modelling approaches should be applied, and where there is general agreement between approaches, then it may be possible to draw additional confidence from the results using an ensemble of model outputs.
Defining what is and what is not a good morphological model performance depends on the spatial and temporal scales considered. At a minimum: (a) the observed/measured sedimentation-erosion patterns must be broadly in agreement with the model outputs; (b) contour plots of measured and computed sedimentation and erosion need to agree as closely as practicable; (c) predicted volume changes over control areas must agree as closely as practicable with soundings or dredging figures; and (d) the shape, migration, and area change of measured and computed cross sections are required to agree.
The incomplete description of the physics underpinning morphological processes and an imperfect knowledge of the initial conditions and parameters will always lead to increasing errors in the model predictions and limit the ability of shelf sea and estuarine morphodynamic models to accurately predict the future true state of the environment.

Improving Predictions and Reducing Uncertainty
An alternative emerging approach to address the problem of model prediction uncertainties involves the application of data assimilation techniques. These techniques keep model parameters fixed and produce an updated model state that matches as closely as possible the true state by combining observational data with model predictions. This updated model state is then used to initiate the next model forecast. However, even if the initial system state can be described flawlessly, model parameters simplify the physical processes and, by doing so, will result in the growth of prediction errors. At present, assimilation methods being developed to improve morphological forecast reliability are producing encouraging results (e.g., [133–137]). For example, ad hoc data assimilation schemes and techniques using more refined heuristic tuning of model state variables are being used to improve the performance of suspended sediment transport models (e.g., [138,139]).
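The idea of combining a model forecast with an observation to obtain an updated state can be illustrated with a minimal scalar sketch. It uses a variance-weighted (Kalman-type) update for a single state variable and is not the scheme used in any of the cited studies. The function name, the bed levels, and the error variances are hypothetical.

```python
def assimilate(forecast, forecast_variance, observation, observation_variance):
    """Variance-weighted combination of a forecast and an observation.

    Returns the updated (analysis) state and its error variance.
    """
    gain = forecast_variance / (forecast_variance + observation_variance)
    analysis = forecast + gain * (observation - forecast)
    analysis_variance = (1.0 - gain) * forecast_variance
    return analysis, analysis_variance


# Hypothetical bed level (m) at one grid cell with equal error variances.
analysis_level, analysis_variance = assimilate(
    forecast=-5.0, forecast_variance=0.04,
    observation=-4.6, observation_variance=0.04,
)
```

The updated state lies between the forecast and the observation, weighted by their assumed uncertainties, and would be used to initialise the next forecast. Operational schemes apply the same principle to full spatial fields with correlated errors.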
Two main types of uncertainty pervade morphological models: (a) scenario uncertainty, which stems from uncertainty about the nature of the future weather and weather events (magnitude and frequency) responsible for driving morphological change, and (b) response uncertainty, which relates to the uncertainty in predicting how the morphological system will respond to given forcing conditions. To reduce uncertainty in morphological model predictions, the ensemble approach widely adopted by climate change scientists (e.g., [140]) may prove to be helpful. The ensemble modelling approach aims to address uncertainty arising from two main sources: (a) incomplete description of the physical processes bringing about morphological changes and (b) limited computing power that constrains how accurately processes can be parameterised in models. For example, in models of estuaries or shelf sea systems, subgrid scale processes such as turbulence can only be represented in a simple way. There are two possible routes to take in ensemble modelling: (a) perturbed-physics studies investigate how model predictions are affected by the choice of input parameters by systematically running a single model with different parameter values and (b) multimodel studies investigate how predictions differ between different models. The effect of the initial conditions of the model can also be tested using both approaches.

Summary and Conclusions
The modelling guidance presented in the paper has drawn on published guidelines and on the extensive practical experience of the authors and their colleagues using a range of model types in modelling projects concerned with, for example, managed realignment, environmental impact assessments for offshore wind farms, tidal energy, coastal defences, dredging/disposal sites, beach and estuarine morphodynamics, barrages, and cooling water discharges. Statistical guidelines to establish calibration standards for a minimum level of performance for coastal and estuarine hydrodynamic and sediment models are summarised in Table 7 and are based in part on the recommendations from Evans [1] and Bartlett [2].
While naturally these guidelines remain open to challenges from modellers requiring more exacting model performance, they have been found to deliver models with a good prognostic performance across a broad range of metrics and recognise the practical limitations imposed on model calibration processes by the accuracy and the temporal and spatial resolutions of the available calibration data. Their use in coastal and estuarine modelling studies is therefore recommended.

Figure 1: Schematic diagram of typical model calibration and validation steps required for a hydrodynamic model.

Figure 2: (a) Bathymetry with a clear vertical datum problem and (b) the same bathymetry after datum correction.

Figure 3: (a) Measured median grain size, D_50, obtained from seabed samples and (b) the derived drag coefficient, C_d, which accounts for D_50 and bedforms detected in a multibeam survey.

Figure 4: Simple examples of a hydrodynamic model calibration: (a) measured (P_1 and P_2) and predicted (M_1 and M_2) water levels at two locations with C_d set to 0.0035 in the model; (b) the same plot as (a) with C_d = 0.002; and (c) the same plot as (a) with the bathymetry corrected by +0.45 m.

Figure 5: Comparisons between measured and predicted wave time series showing H_s, T_p, and direction.

Figure 6: (a) Scatter diagram showing the relationship between measured and modelled H_s and (b) a peak over threshold (POT) analysis of H_s between modelled and measured wave data.

Figure 7: Time series of (a) measured water level and (b) SSC (continuous) and predicted SSC from a calibrated cohesive sediment model.

Figure 8: Details of measured water level and SSC (intermittent samples) and predicted SSC from a calibrated cohesive sediment model.

Figure 9: Examples of XBeach model output: (a) baseline bathymetry; (b) baseline erosion/accretion after 8 tidal cycles with oblique waves; (c) scheme starting bathymetry with a shore-normal groyne; (d) scheme erosion/accretion after 8 tidal cycles with oblique waves; and (e) snapshot of tidal and wave-induced flows around the groyne.

Table 2: Empirically derived values of the drag coefficient (at 1 m above the bed) for different bottom types; from Soulsby [50].

Table 4: Example statistics to demonstrate the level of agreement between measured/observed data and model prediction.

Table 5: Model performance statistics for flood studies in estuaries including storm surge elevation (h_surge), high water levels (h_max), predicted peak water levels, flow velocity (U), predicted fluvial inputs (Q), flood area predicted correctly for two or more historical floods (A), and predicted flood depth error (d_err) (A = design; B = appraisal; C = strategy; and U = unsatisfactory); from Defra/EA [3].
Column headings: Type; RMSE for h_surge (m); RMSE for h_max (m); Predicted peak water level (mm); RMSE for U (m/s); Q (% of measured flows); A (%); d_err (m).
Here, the line of unity indicates that the model slightly overestimates H_s. In Figure 6(b), a peak over threshold (POT) analysis for modelled and measured H_s data is shown.

Table 7: Statistical guidelines to establish calibration standards for a minimum level of performance for coastal and estuarine hydrodynamic and sediment models. The table is based in part on the recommendations from Evans [1] and Bartlett [2].