We seek to provide practical lower bounds on the prediction accuracy of path loss models. We describe and implement 30 propagation models of varying popularity that have been proposed over the last 70 years. Our analysis is performed using a large corpus of measurements collected on production networks operating in the 2.4 GHz ISM, 5.8 GHz UNII, and 900 MHz ISM bands in a diverse set of rural and urban environments. We find that the landscape of path loss models is precarious: typical best-case performance accuracy of these models is on the order of 12–15 dB root mean square error (RMSE) and in practice it can be much worse. Models that can be tuned with measurements and explicit data fitting approaches enable a reduction in RMSE to 8-9 dB. These bounds on modeling error appear to be relatively constant, even in differing environments and at differing frequencies. Based on our findings, we recommend the use of a few well-accepted and well-performing standard models in scenarios where
Predicting the attenuation of a radio signal between two points in a realistic environment has entertained scientists and experimenters for more than 70 years. This is for good reason: accurate predictions of path loss and propagation have many important applications in the design, rollout, and maintenance of all types of wireless networks. As a result, there has been no shortage of models proposed in the literature that claim to predict path loss within some set of constraints. Yet, despite the large quantity of work done on modeling path loss, there is an important shortcoming that this paper begins to address: there have been relatively few comparative evaluations of path loss prediction models using a sufficiently representative data set as a basis for evaluation. Those studies that do exist make comparisons between a small number of similar models. And, where there has been substantial work of serious rigor done, for instance in the VHF bands where solid work in the 1960s produced well-validated results for analog television (TV) propagation, it is not clear how well these models work for predicting propagation in different types of systems operating at different frequencies. The result is that wireless researchers are left without proper guidance in picking among dozens of propagation models. Further, among the available models it is not clear which is best or what the penalty is for using a model outside of its intended coverage. In [
In this paper, we implement and analyze 30 propagation models spanning 65 years of publications using five novel metrics to gauge performance. Although many of these models are quite different from one another, they all make use of the same basic variables on which to base their predictions: position (including height and orientation) of the transmitter and receiver, carrier frequency, and digital elevation model and land cover classification along the main line-of-sight (LOS) transmit path. These models are a mix of approaches: purely empirical, analytical, stochastic, or some combination thereof. In addition, we make use of explicit measurement-based approaches to put a lower bound on the accuracy of direct fitting methods. The present study does not include ray-tracing models (e.g., [
The focus in this paper is the efficacy of these models at
In the end, the results show that no single model is able to predict path loss consistently well. Even for the seemingly simple case of long links between well-positioned antennas in a rural environment, the available models are unable to predict path loss at an accuracy that is usable for any more than crude estimates. Indeed, no model is able to achieve a Root Mean Square Error (RMSE) of less than 14 dB in rural environments and 8-9 dB in urban environments—a performance that is only achieved after substantial hand tuning. Explicit data-fitting approaches do not perform better, producing 8-9 dB RMSE as well. This conclusion motivates further work on more rigorous
Table
Models studied along with their categorization, required input, coverage remarks, relevant citations, and year of (initial) publication.
Name | Short name | Category | Coverage notes | Citations | Year |
---|---|---|---|---|---|
Friis Freespace | friis | Foundational | | [ | 1946 |
Egli | egli | Basic | 30 MHz < | [ | 1957 |
Hata-Okumura | hata | Basic | 1 km < | [ | 1968 |
Edwards-Durkin | edwards | Basic/Terrain | | [ | 1969 |
Allsebrook-Parsons | allsebrook | Basic/Terrain | | [ | 1977 |
Blomquist-Ladell | blomquist | Basic/Terrain | | [ | 1977 |
Longley-Rice Irregular Terrain Model (ITM) | itm | Terrain | 1 km <, 20 MHz < | [ | 1982 |
Walfisch-Bertoni | bertoni | Basic | | [ | 1988 |
Flat-Edge | flatedge | Basic | | [ | 1991 |
TM90 | tm90 | Basic | | [ | 1991 |
COST-231 | cost231 | Basic | 1 km < | [ | 1993 |
Walfisch-Ikegami | walfisch | Basic | 200 m < | [ | 1993 |
Two-Ray (Ground Reflection) | two.ray | Foundational | | [ | 1994 |
Hata-Davidson | davidson | Basic | 1 km < | [ | 1997 |
Oda | oda | Basic | | [ | 1997 |
Erceg-Greenstein | erceg | Basic | | [ | 1998 |
Directional Gain Reduction Factor (GRF) | grf | Supplementary | Dir. Recv. Ant., | [ | 1999 |
Rural Hata | rural.hata | Basic | | [ | 2000 |
ITU Terrain | itu | Terrain | | [ | 2001 |
Stanford University Interim (SUI) | sui | Basic | | [ | 2001 |
Green-Obaidat | green | Basic | | [ | 2002 |
ITU-R | itur | Basic | 1 km | [ | 2002 |
ECC-33 | ecc33 | Basic | 1 km | [ | 2003 |
Riback-Medbo | fc | Supplementary | 460 MHz | [ | 2006 |
ITU-R 452 | itur452 | Terrain | | [ | 2007 |
IMT-2000 | imt2000 | Basic | Urban | [ | 2007 |
deSouza | desouza | Basic | | [ | 2008 |
Effective Directivity Antenna Model (EDAM) | edam | Supplementary | Directional antennas | [ | 2009 |
Herring Air-to-Ground | herring.atg | Basic | | [ | 2010 |
Herring Ground-to-Ground | herring.gtg | Basic | | [ | 2010 |
At a high level, a model's task is to predict the value of
It is worth noting that among the models we have implemented, very few were designed for exactly the sort of networks we are testing them against. Indeed, some are very specific about the type of environment in which they are to be used. Table
The first models worth considering are purely analytical models derived from the theory of idealized electromagnetic propagation. These models are simple to understand and implement and as a result they have been widely adopted into network simulators and other applications and often function at the center of more complex models. Important examples include Friis equation for free space path loss between isotropic transmitters [
The models that we call “basic” models are the most numerous. They compute path loss along a single path and often use corrections based on measurements made in one or more environments. In general, they use the distance, carrier frequency, and transmitter and receiver heights as input. Some models also have their own parameters to select between different modes of computation or fine tuning. Here we subdivide these models into deterministic and stochastic categories. The stochastic models use one or more random variables to account for channel variation (and hence are able to predict a distribution instead of a median value). The Egli model [
Terrain models are similar to the basic models but also attempt to compute diffraction losses along the line of sight path due to obstructions (terrain or buildings, for instance). They are an order of magnitude more complex but are immensely popular, especially for long propagation distances at high power in the VHF band (i.e., television transmitters). Important examples include the ITM [
Supplementary models cannot stand on their own but are instead intended to make corrections to existing models. These models are best subdivided into the phenomenon they are wishing to correct for: stochastic fading [
There are also two major categories of models that we are not considering in this study: many-ray (ray-tracing) models and active-measurement models. Although to some extent these models typify the state-of-the-art with respect to propagation modeling, they are not the models that are widely used in simulators and propagation planning tools. To a large extent, this is because they have greater data requirements. Many-ray models require high-resolution data describing the environment and substantial computation time. These predict the summed path loss along many paths by uniform theory of diffraction (or similar) [
Active-measurement models take the perspective that the only way to make realistic predictions is to combine an
The vast majority of existing work analyzing the efficacy of path loss models has been carried out by those authors who are proposing their own improved algorithm. In such cases, the authors often collect data in an environment of interest and then show that their model is better able to describe this data than one or two competing models. Unfortunately, this data is rarely published to the community, which makes comparative evaluations impossible. One noteworthy exception is the work of the European Cooperation in the field of Scientific and Technical Research Action 231 (COST-231) group in the early 1990s, which published a benchmark data set (900 MHz measurements taken in European cities) and produced a number of competing models that were well-performing with respect to this reference [
Similarly, there was substantial work done in the USA, Japan, and several other countries in the 1960s and 1970s to derive accurate models for predicting the propagation of analog TV signals (e.g., [
There are several studies similar to this work, which compare a number of models with respect to some data. In [
In this section, we describe data sets collected to act as a ground truth basis for comparison to model predictions. These measurements were collected over the course of several years in multiple environments and with differing (but consistent) hardware. They range from “clean” measurements taken in rural New Zealand to “noisy” measurements collected in the urban center of a large US city along with some special measurements to investigate points of particular interest, such as measurements with phased-array and directional antennas, and some in suburban environments. Overall, these data sets combine to paint a unique picture of the real-world wireless radio environment at varying levels of complexity. Table
Summary of data sets.
Campaign | Name | Environment | Type | Frequency | Method | Sites | Measurements |
---|---|---|---|---|---|---|---|
A | wart | Campus | Point-to-point | 2.4 GHz | Packet | 7 | 33,881 |
A | wart/snow | Campus | Point-to-point | 2.4 GHz | Packet | 7 | 24,867 |
B | pdx | Urban | Urban mesh/infrastructure | 2.4 GHz | Packet | 250 | |
B | pdx/stumble | Urban | Urban mesh/infrastructure | 2.4 GHz | Packet | 59,131 | 200,694 |
C | boulder/ptg | Campus | Infrastructure/downstream | 2.4 GHz | Packet | 1,693 | 1,693 |
C | boulder/gtp | Campus | Infrastructure/upstream | 2.4 GHz | Packet | 329 | 329 |
D | cost231 | Urban | Infrastructure/downstream | 900 MHz | CW | 2,336 | 2,336 |
E | wmp/a | Rural | Point-to-point/infrastructure | 5.8 GHz | Packet | 368 | 2,090,943 |
E | wmp/g | Rural | Point-to-point/infrastructure | 2.4 GHz | Packet | 368 | 20,314,594 |
With the exception of the COST-231 data (campaign D in Table
To get an idea of how accurate commodity radios are in measuring Received Signal Strength (RSS), some calibration experiments were performed in a conducted (cabled) setting. Each of four radio cards was directly connected to an Agilent E4438C Vector Signal Generator (VSG). The cards were all Atheros-based, Lenovo-rebranded Mini-PCI Express cards from the same chipset family (brand and model line) as those used for all of our packet-based measurements. The VSG was configured to generate 802.11 frames and the laptop to receive them. For each of the four cards, many samples were collected while varying the transmit power of the VSG between −20 dBm and −95 dBm (lower than the receive sensitivity threshold of just about any commodity 802.11 radio) in 5 dB increments. Finally, a linear least squares fit was performed, finding a slope of 0.9602 and adjusted
However, there is a drawback to this approach. Packet-based methods necessarily “drop” measurements for packets that cannot be demodulated. All receivers have fundamental limits in their receive sensitivity that are a function of their design. However, because packet-based measurement techniques rely on demodulation of packets to determine the received signal strength, they have a necessarily lower sensitivity than receivers that calculate received power from pure signal (continuous wave measurements). Additionally, without driver modification, commodity receivers generally update noise floor measurements infrequently. For the purpose of analyzing accuracy of median path loss prediction, these limitations are not problematic. In one sense, commodity hardware “loses” only the least interesting measurements—if we are unable to decode the signal at a given point, we are at least aware that the signal is
It should be noted that packet-based measurement methods are not appropriate for all modeling tasks—the tradeoff between convenience and affordability of commodity hardware versus the completeness of the measurements must be considered. For instance, if the goal of a measurement campaign is to sense signals or interference near the noise floor in order to predict capacity for next generation protocols or if the goal is to model delay spread or Doppler shift, then packet-based measurements will not be sufficient. However, our work here has less demanding data requirements than these applications. For the purpose of measuring median Signal-to-Noise Ratio (SNR) at a given point in space from the perspective of a typical receiver, packet-based measurements made with commodity hardware are both sufficiently accurate and convincingly representative.
In cooperation with the Waikato Applied Network Dynamics (WAND) research group at the University of Waikato [
The network used in our study is a large commercial network that provides Internet access to rural segments of the Waikato region in New Zealand (as well as some in other regions). Our overall approach to measurement involves periodically broadcasting measurement frames from all nodes and meanwhile recording any overheard measurement frames. Every two minutes, each device on the network transmits a measurement frame at each supported bitrate. Meanwhile, each device uses a monitor mode interface to log packets. Because this is a production network, privacy concerns are of clear importance which is why all measurements are made with injected packets and a Nondisclosure Agreement (NDA) was required for use of parts of the data that contained sensitive information (principally client locations).
The network is arranged in the typical hub-and-spoke topology, as can be seen in Figure
The largest of three disconnected sections of the network (80 × 100 km). Link color indicates strength: blue implies strong, red implies weak. Backhaul nodes (mainly 5.8 GHz) are red and CPEs are light blue.
After collection, the data requires scrubbing to discard frames that have arrived with errors. Because there is substantial redundancy in measurements (many measurements are made between every pair of participating nodes), discarding some small fraction of (presumably randomly) damaged frames is unlikely to harm the integrity of the data overall. As a rule, any frame that arrives with its checksum in error, or that comes from a source that produces fewer than 100 packets, is discarded. For the work here, one representative week of data collected between July 25th, 2010, and August 2nd, 2010 is used. Because detailed documentation about each node simply did not exist, some assumptions were made for analysis. The locations of nodes for which there is no specific Global Positioning System (GPS) reading are either hand coded or, in the case of some client devices, geocoded using an address. Antenna orientations for directional antennas are assumed to be ideal, pointing in the exact bearing of their mate. All nodes are assumed to be positioned 3 m off the ground, which is correct for the vast majority of nodes. While these assumptions are not perfect and are clearly a source of error, they are reasonably accurate for a network of this size and complexity. Certainly, any errors in antenna heights, locations, or orientations are on the same scale as those errors would be for anyone using one of the propagation models analyzed to make predictions about their own network of interest.
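The scrubbing rule can be illustrated with a minimal Python sketch. The record layout (`src`, `fcs_ok`, `rss_dbm`) is hypothetical and not taken from the measurement software, and the sketch assumes that the 100-packet threshold is applied to frames that already passed the checksum test; the text does not say in which order the two filters are applied.

```python
from collections import Counter

MIN_PACKETS_PER_SOURCE = 100  # threshold stated in the text


def scrub_frames(frames, min_packets=MIN_PACKETS_PER_SOURCE):
    """Drop damaged frames, then drop frames from sparsely observed sources.

    `frames` is a list of dicts with hypothetical keys:
    'src' (source identifier) and 'fcs_ok' (True if the checksum verified).
    """
    # Rule 1: discard any frame whose checksum is in error.
    valid = [f for f in frames if f["fcs_ok"]]
    # Rule 2 (assumed order): count surviving frames per source and
    # discard sources that contribute fewer than `min_packets` frames.
    per_source = Counter(f["src"] for f in valid)
    return [f for f in valid if per_source[f["src"]] >= min_packets]


# Small synthetic trace: source "a" is kept, source "b" is too sparse.
trace = (
    [{"src": "a", "fcs_ok": True, "rss_dbm": -70} for _ in range(100)]
    + [{"src": "a", "fcs_ok": False, "rss_dbm": -92}]
    + [{"src": "b", "fcs_ok": True, "rss_dbm": -81} for _ in range(99)]
)
clean = scrub_frames(trace)
```

Counting before the checksum filter instead would only change which borderline sources survive; either order implements the stated rule.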
In the end, our scrubbed data for a single week constitutes 19,235,611 measurements taken on 1328 links (1262 802.11 b/g links at 2.4 GHz and 464 802.11a links at 5.8 GHz) from 368 participating nodes. Of these nodes, the vast majority are clients and hence many of the antennas are of the patch panel variety (70%). Of the remaining 30%, 21% are highly directional point-to-point parabolic dishes, and omnidirectional and sector antennas account for 4.5% each.
In addition to the “baseline” measurements in a rural setting, we collected measurements in three additional environments to complete the picture of the urban/suburban wireless propagation environment. The three campaigns cover the three transceiver configurations that are most important in the urban wireless environment (see Figure
Visual schematic of three urban data sets. A: roof-to-roof measurements from CU WART (Wide area radio testbed), B: ground- (utility poles) to-ground (mobile node) measurements in Portland, and C: roof-to-ground and ground-to-roof measurements from CU WART.
The first data set, A, was collected using the University of Colorado at Boulder (CU) Wide Area Radio Testbed (WART), which is composed of six 8-element uniform circular phased-array antennas [
University of Colorado Wide Area Radio Testbed (CU-WART).
The second set of urban measurements, B, involves three data sets from three urban municipal wireless networks: a (now defunct) municipal wireless mesh network in Portland, OR, the Google WiFi network in Mountain View, CA, and the Technology For All (TFA) network in Houston, TX. All three data sets involve data collected with a mobile client. As a standard practice we truncate the precision of the GPS coordinates to five significant digits, which has the effect of averaging measurements within a 0.74 m (
In this network, 70 APs are deployed on utility poles in a 2 km by 2 km square region. Each AP has a 7.4 dBi omnidirectional antenna that provides local coverage in infrastructure mode. These measurements were collected during the summer of 2007. This data set, which consists of both laborious point testing and extensive war-driving data, is most representative of ground-to-ground links in urban environments. Collection involved a two-stage process. First, a mobile receiver was driven on all publicly accessible streets in the 2 km by 2 km region. The receiver was a Netgear WGT-634u wireless router running OpenWRT Linux [
The Google WiFi network [
Google WiFi Network in Mountain View, CA.
The final set of street-level infrastructure measurements comes from the community wireless mesh network constructed by Rice University and the TFA nonprofit organization in Houston, TX [
TFA-Wireless Network measurements in Houston, TX.
The final data set, C, involves two sets of measurements: one from the CU WART and one set of published measurements from a well-placed transmitter in Munich, Germany.
The first data set was collected using a mobile node (a Samsung brand “netbook”) with a pair of diversity antennas. In this experiment, the 6 rooftop CU WART nodes were configured to transmit 80-byte “beacon” packets every
All nodes, including the mobile node, were configured to log packets using a second monitor mode (promiscuous) wireless interface. The mobile node was additionally instrumented with a USB GPS receiver that was used both to keep a log of position and to synchronize the system clock so that the wireless trace was in sync with the GPS position log. These measurements were collected during the summer of 2010. During the experiment, the mobile node was attached to an elevated (nonconducting) platform on the front of a bicycle. The bicycle was pedaled around the CU campus on pedestrian paths, streets, and in parking lots. This data set is most representative of an infrastructure wireless network where a well-positioned static transmitter must serve mobile clients on the ground. This data set is subdivided into the upstream part (boulder/gtp) and the downstream part (boulder/ptg).
The second group of measurements is from a reference data set collected by the COST-231 group at 900 MHz [
Each of the 30 models is implemented from its respective publication in the Ruby programming language. Only one of the models, the ITM [
Terrain models require access to a Digital Elevation Model (DEM). In the case of the International Telecommunications Union Radiocommunication Sector (ITU-R) 452 model, a Landcover Classification Database (LCDB) is required as well. The DEM used for the networks in the United States is a publicly available raster data set from the United States Geological Survey (USGS) Seamless Map Server, providing 1/3-arcsecond spatial resolution. The US LCDB is also provided by the USGS as a raster data set, which is generated by the USGS using a trained decision tree algorithm. For the New Zealand data sets, DEM and LCDB data are provided by the Environment Waikato organization. The DEM has a vertical precision of 1 m and an estimated accuracy of 5-6 m RMSE. The GDAL library [
In our implementation of Hata-Okumura and its derivative models, a few crude corrections are made to antenna heights in the event that they fall outside of the model's coverage (and would therefore produce anomalous results). First, the minimum of the two heights is subtracted from both so that they are relative. For instance, antenna heights of 30 and 40 m become 0 and 10 m. Then, heights are swapped if necessary so that the transmitter height is always higher than the receiver height (at this point the receiver height will be zero). Next, one is added to both heights, keeping the relative difference but setting the receiver height to 1 m. For instance, 0 and 10 m would become 1 and 11 m. Finally, the transmitter height is decreased or increased as necessary so that it is above the minimum (30 m) and below the maximum (200 m).
These corrections are necessary to use the Hata-Okumura model with transmitter or receiver heights that would otherwise produce meaningless (infinite) results. It is not certain what the impact is on the model performance by making these corrections. However, it stands to reason that even if the performance is negatively impacted, an inaccurate prediction will still be closer to the true answer than an anomalous (infinite) prediction. We believe this to be acceptable due diligence in terms of applying the model outside of its domain of coverage (where the accuracy of predictions is already questionable).
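The height corrections described above can be sketched in a few lines of Python. This is an illustration rather than the Ruby implementation used in the study; the function name is hypothetical, and the sketch follows the worked example in the text (0 and 10 m become 1 and 11 m), which implies that 1 m is added to both heights.

```python
def correct_hata_heights(h_tx_m, h_rx_m, tx_min_m=30.0, tx_max_m=200.0):
    """Return (transmitter, receiver) heights adjusted for Hata-style models.

    Steps, in the order given in the text:
      1. make heights relative by subtracting the smaller one from both,
      2. swap so the transmitter is the higher antenna (receiver is now 0),
      3. add 1 m to both (assumption taken from the worked example),
      4. clamp the transmitter height into [tx_min_m, tx_max_m].
    """
    base = min(h_tx_m, h_rx_m)
    h_tx_m, h_rx_m = h_tx_m - base, h_rx_m - base   # step 1
    if h_tx_m < h_rx_m:
        h_tx_m, h_rx_m = h_rx_m, h_tx_m             # step 2
    h_tx_m, h_rx_m = h_tx_m + 1.0, h_rx_m + 1.0     # step 3
    h_tx_m = min(max(h_tx_m, tx_min_m), tx_max_m)   # step 4
    return h_tx_m, h_rx_m


# 30 m and 40 m become 0/10, then 10/0 after the swap, then 11/1,
# and the transmitter is finally raised to the 30 m minimum.
example = correct_hata_heights(30, 40)
```

Note that the final clamp can change the height difference again, so the relative geometry is only preserved for links whose corrected transmitter height already falls between 30 and 200 m.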
To obtain results we ask each model to offer a prediction of median path loss for each link in the data. The model is fed whatever information it requires, including DEM and LCDB information. The model produces an estimate of the loss
Some models come with tunable parameters of varying esotericism. For these models, we try a range of reasonable parameter values without bias towards those expected to perform best.
This entire process requires a substantial amount of computation but is trivially parallelizable. To make the computation of results tractable, we subdivide the task of prediction into a large number of simultaneously executing threads and merge the results after completion. This must occur in two sequential stages. During the first stage, path profile information is extracted and prepared for each link in parallel, and during the second stage this information is provided to each algorithm for each link, which can also be done in parallel. With the merged data in hand, each prediction is compared with an oracle value for the link. This oracle value is computed from the measured received signal strength for the link as well as known values for the transmitter power and antenna gain.
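A minimal Python sketch of this two-stage pipeline and of the oracle value follows. It is not the study's implementation: the function and field names (`link_id`, `distance_km`, `freq_mhz`) are hypothetical, the oracle assumes a simple link budget in which transmit power and both antenna gains are known, and the only model shown is the standard free-space formula, used here purely as a stand-in.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def oracle_path_loss_db(tx_power_dbm, tx_gain_dbi, rx_gain_dbi, rss_dbm):
    """Measured path loss implied by a link budget (assumed form)."""
    return tx_power_dbm + tx_gain_dbi + rx_gain_dbi - rss_dbm


def friis_db(profile):
    """Free-space path loss in dB for distance in km and frequency in MHz."""
    return (20.0 * math.log10(profile["distance_km"])
            + 20.0 * math.log10(profile["freq_mhz"]) + 32.44)


def run_two_stage(links, extract_profile, models, workers=4):
    """Stage 1: one path profile per link. Stage 2: every model on every profile."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Stage 1 must finish before stage 2 starts.
        profiles = list(pool.map(extract_profile, links))
        jobs = [(name, fn, p) for p in profiles for name, fn in models.items()]
        losses = list(pool.map(lambda job: job[1](job[2]), jobs))
    # Merge: (link id, model name, predicted loss in dB).
    return [(job[2]["link_id"], job[0], loss) for job, loss in zip(jobs, losses)]


links = [{"link_id": 1, "distance_km": 1.0, "freq_mhz": 2400.0}]
# A real stage 1 would read DEM and land cover data; here the profile is the link itself.
results = run_two_stage(links, lambda link: dict(link), {"friis": friis_db})
measured = oracle_path_loss_db(18.0, 7.4, 2.0, -80.0)
error_db = results[0][2] - measured  # prediction minus oracle, for this one link
```

Threads keep the sketch short; for CPU-bound models in CPython a process pool would be the more usual choice, and the two-stage structure stays the same.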
It is worth noting that very few of the models tested were designed with the exact sort of network that we are studying in mind. Indeed, some are very specific about the type of environment in which they are to be used. In this study both appropriate and “inappropriate” models are given an equal chance at making predictions for our network—there is no starting bias about which should perform best.
The performance of the models is analyzed with respect to several metrics in order of decreasing stringency: RMSE and spread-corrected root mean square error (SC-RMSE), competitive success, individual accuracy relative to spread, skewness, and rank correlation.
RMSE is the most obvious and straightforward metric for analyzing the error of a predictive model of this sort. As discussed above, for a given model we compute an error value (
Schematic explaining error (
Computing SC-RMSE is identical to RMSE as shown in (
The competitive success metric is the percentage of links in a given data set that a given model has made the best prediction for. For each link we keep track of the model that makes the prediction with the smallest
We would expect that when analyzing many models, if one model (or a set of related models) is dominant for a given environment, then it would score near 100 on this metric. Because the percentage points are divided evenly between all models tested, if we test a large number of models, this metric may be spread too thinly to be useful for analysis (i.e., too many similar models share the winnings and no single model comes out on top).
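As an illustration, the competitive success metric might be computed as in the following Python sketch. The data layout is hypothetical, and two details are assumptions: the "best" prediction is taken to be the one with the smallest absolute error, and ties are awarded to whichever model comes first in the input.

```python
from collections import Counter


def competitive_success(errors_by_model):
    """Percentage of links on which each model made the best prediction.

    `errors_by_model` maps a model name to a list of per-link errors in dB;
    all lists must cover the same links in the same order.
    """
    models = list(errors_by_model)
    n_links = len(errors_by_model[models[0]])
    wins = Counter()
    for i in range(n_links):
        # Assumed criterion: smallest absolute error wins the link.
        best = min(models, key=lambda m: abs(errors_by_model[m][i]))
        wins[best] += 1
    return {m: 100.0 * wins[m] / n_links for m in models}


scores = competitive_success({
    "model_a": [1.0, -5.0, 2.0, 0.5],
    "model_b": [-2.0, 3.0, 10.0, 4.0],
})
```

Because every link is awarded to exactly one model, the scores always sum to 100, which is why a large field of similar models spreads the winnings thinly.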
The individual accuracy metric is the percentage of links where the given model is able to make a prediction within one or two standard deviations of the measured spread
The fourth metric is skewness, which is simply the sum of model error across all links
This metric highlights those models that systematically over- and underpredict. Some applications may have a particular cost/benefit for under- or overpredictions. Models that systematically overpredict path loss (and therefore underpredict received signal strength) score a high value on this metric. Models that systematically underpredict score a large negative value. And, models that make an equal amount of under- and overpredictions will score a value of zero.
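A short Python sketch of this metric is given below. It assumes that the per-link error is the predicted path loss minus the measured path loss, which matches the sign convention described above (overprediction of path loss gives a positive score), and that the metric is a plain unnormalized sum.

```python
def skewness(predicted_db, measured_db):
    """Sum of signed model errors (predicted minus measured loss) over all links."""
    return sum(p - m for p, m in zip(predicted_db, measured_db))


over = skewness([100.0, 110.0], [95.0, 100.0])      # overpredicts on both links
balanced = skewness([100.0, 90.0], [105.0, 85.0])   # errors of -5 and +5 cancel
```

The second call shows why a zero score does not by itself indicate an accurate model: errors of opposite sign cancel in the sum, so the metric is best read alongside RMSE.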
Our final metric is rank correlation using Spearman’s
We begin by explicitly fitting the data to a theoretical model and looking at the number of measurements required for a fit. This gives an initial estimate of expected error for direct (naïve) fits to the collected data. Then, to analyze the performance of the algorithms, we apply five domain-oriented metrics of decreasing stringency. We discuss the performance results for each data set with respect to these metrics, as well as general trends and possible sources of systematic error. Finally, explicit parameter fitting of the best models is performed, and this best-case performance is used to define practical lower bounds on model prediction error.
In this section we attempt to explicitly fit the relationship between attenuation and distance as a straight line on a log/log plot. To this end, we extend the classic equation for freespace path loss from [
Figure
Explicit power law fits to data. Fit parameters are provided on the plots. Panels: COST-231, WMP, TFA.
Table
Summary of results by data set.
Name | Top three performing models by SC-RMSE | Ideal RMSE | |||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
wart | 1.86 | 9.05 | 13.26 | 15 | flatedge | 13.73 | itu.terrain | 13.89 | hatao | 14.03 | 1.96 |
wart/snow | 1.92 | 9.25 | 13.36 | 15 | itu.terrain | 13.93 | flatedge | 14.16 | hatao | 14.19 | 1.87 |
pdx | 2.25 | 19.53 | 7.8 | 5 | allsebrook200 | 8.38 | hatal | 8.97 | davidson | 9.37 | 1.14 |
pdx/stumble | 1.79 | 27.08 | 8.96 | 40 | allsebrook400 | 8.34 | itur25 | 10.50 | hatam | 10.51 | 1.02 |
boulder/ptg | 0.79 | 19.56 | 7.36 | 20 | allsebrook400 | 7.90 | ecc33m | 9.38 | hatam | 10.47 | 0.94 |
boulder/gtp | 0.27 | 10.88 | 3.67 | 5 | allsebrook400 | 5.45 | hatal.fc | 7.15 | edwards200 | 8.51 | 1.01 |
cost231 | 6.25 | 51.19 | 6.36 | 15 | edwards200 | 9.23 | hatam | 9.99 | itur25 | 10.55 | 1.23 |
wmp | 0.62 | 13.74 | 13.92 | 15 | flatedge | 15.34 | allsebrook200 | 16.72 | egli | 16.83 | 5.98 |
tfa | 0.95 | 22.76 | 7.89 | 20 | herring.atg | 8.90 | allsebrook200 | 9.03 | flatedge | 10.83 | 1.43 |
0.54 | 6.15 | 7.37 | 30 | davidson | 13.56 | itu.terrain | 16.12 | hatal | 16.83 | 2.93 |
In order to understand how many measurements are needed to create a fit of this sort, we take successively increasing random samples of the data sets and use these subsets to generate a fit. We then look at how the residual error of the model (with respect to the complete data set) converges as the subsample size increases. Figure
Number of samples required for naïve fit for the “Google” data set. Plots show the fit standard error for fits to increasingly large random samples, and a horizontal line is given at the RMSE obtained for all points.
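The subsampling experiment can be sketched in Python as follows. The fitted form is an assumption: a generic log-distance line, loss = intercept + 10 · exponent · log10(d), stands in for the extended free-space equation, whose exact terms are not reproduced here. All function names are hypothetical.

```python
import math
import random


def fit_log_distance(distances_km, losses_db):
    """Ordinary least squares fit of loss = intercept + 10 * exponent * log10(d)."""
    xs = [10.0 * math.log10(d) for d in distances_km]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(losses_db) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, losses_db))
    exponent = sxy / sxx
    return exponent, mean_y - exponent * mean_x


def rmse_of_fit(exponent, intercept, distances_km, losses_db):
    """RMSE of a fitted line evaluated against a (possibly larger) data set."""
    residuals = [y - (intercept + 10.0 * exponent * math.log10(d))
                 for d, y in zip(distances_km, losses_db)]
    return math.sqrt(sum(r * r for r in residuals) / len(residuals))


def convergence_curve(distances_km, losses_db, sample_sizes, seed=0):
    """Fit on random subsamples of growing size; score each fit on the full data."""
    rng = random.Random(seed)
    indices = range(len(distances_km))
    curve = []
    for k in sample_sizes:
        pick = rng.sample(indices, k)
        exponent, intercept = fit_log_distance(
            [distances_km[i] for i in pick], [losses_db[i] for i in pick])
        curve.append((k, rmse_of_fit(exponent, intercept, distances_km, losses_db)))
    return curve


# Noise-free synthetic data with exponent 3 and intercept 100 dB.
dist = [0.1 * i for i in range(1, 51)]
loss = [100.0 + 30.0 * math.log10(d) for d in dist]
exponent, intercept = fit_log_distance(dist, loss)
curve = convergence_curve(dist, loss, [5, 50])
```

With real measurements the curve would fall toward the full-data RMSE as the sample size grows; the noise-free data here only checks that the fit recovers known parameters.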
Figure
Five metric results for all data sets combined. Panels: RMSE, Success/Accuracy, Rank Correlation, Skewness.
Looking first at the results for the rural (WMP) data, the best performing models achieve an RMSE on the order of 15 dB. The best models are the Allsebrook model (with its terrain roughness parameter set to 200 m) at just under 18 dB RMSE (16.7 dB when corrected) and the Flat-Edge model (with 10 “buildings” presumed) at 16.5 dB RMSE (15.3 dB when corrected). The urban models do much better in terms of RMSE. The best models achieve an RMSE on the order of 10 dB and the worst (of the best) approach more than 50 dB. The overall winners are the Hata model, the Allsebrook model, the Flat-Edge model, and the ITU-R model. This follows from expectations because all of these models were derived for predicting path loss in urban environments. The Hata model and Allsebrook model are based on measurements from Japanese and British cities, respectively. The Flat-Edge model is a purely theoretical model based on the Walfisch-Bertoni model, which computes loss due to diffraction over a set of uniform screens (simulating buildings separated by streets). Table
For the second metric, competitive success, look to the leftmost (red) bar in the second of the plots. For most of the data sets, there is no clear winner, with the best models sharing between 10 and 15 percent of the winnings. This indicates that there is no single model that outperforms all others. There are a few exceptions. For the PDX data set, the Davidson model takes 40% of the winnings; in the COST-231 data set, the ITU-R 25 model takes 30%; in the Google data set, the Davidson model takes more than 30%; and, in the downstream Boulder measurements (boulder/gtp), the Davidson model again takes 25% of the winnings. There is not, however, a single model or two that outperform all others in a large subset of our data. Hence, we can conclude that
The third metric is the percentage of predictions within one (or two) standard deviations of the true median value. This metric requires multiple measurements at each point in order to estimate temporal variation in the channel. Of our data sets, six have this data available: WMP, COST-231, PDX/Stumble, Google, TFA, and WART. For the WMP data the best performing models (Allsebrook, Flat-Edge, Herring Air-to-Ground, and ITU-R) score between 10% (for within one standard deviation) and 20% (for within two standard deviations) on this metric. We see similar results for our other data sets but different winners. For the PDX/Stumble data the winners are Herring Air-to-Ground, Hata, and ITU-R 25. For the WART data set, the winners are the ITM, ITU-Terrain, and Blomquist. For the COST-231 data set the winners are Herring Air-to-Ground, Hata, and Allsebrook. Again, the best performing model appears to be largely environment dependent.
Our fourth metric is skewness. The interpretation of this metric is largely application dependent: it is hard to know in advance whether over- or underestimates are more harmful. If a model makes an equal number of over- and underestimates (resulting in zero skewness) but has a large RMSE, is it better than a model that systematically overestimates but has a small RMSE? The Hata model is particularly well behaved by this metric, producing a value near zero for all data sets. As one would expect, the Hata-derived models (e.g., ITU-R 25 and Davidson) perform similarly. The rest of the models vary considerably from data set to data set, although ITU-R 452 performs well for some data sets.
The final metric is rank correlation. For just about all of the models we see a rank correlation around 0.5, which indicates a moderate (but not strong) correlation between measured and predicted rank orderings. Models that perform particularly poorly by this metric occasionally achieve much lower values. A result near zero indicates that there is no noticeable correlation between rank orderings. The COST-231 rank correlations are substantially higher than those of all other data sets. We believe this is related to the fact that the COST-231 data more closely fits theoretical expectations of the relationship of path loss to distance. Hence, models that use something like the Friis equation at their core will produce rank values that are closer to the data in this data set. Overall, however, there does not seem to be a consensus about which model performs best at rank ordering: the winners are different for each data set.
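To make the rank correlation metric concrete, here is a minimal sketch that assumes the coefficient is Spearman's rho, computed as the Pearson correlation of the ranks, with tied values sharing their average rank. The text above does not name the coefficient, so this choice and all function names are assumptions.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(measured, predicted):
    """Spearman rank correlation between measured and predicted path loss."""
    return pearson(average_ranks(measured), average_ranks(predicted))


# Toy values in dB, invented for illustration: two of four links swap rank.
print(spearman([100.0, 110.0, 120.0, 130.0], [105.0, 101.0, 125.0, 140.0]))
```

A model can score well here while having a large RMSE, since only the ordering of links matters, not the absolute predicted values.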
In order to determine the minimum obtainable error with these models, we take two well-performing models that have tunable parameters, Allsebrook-Parsons and Flat-Edge, and proceed by searching the parameter space to find the best possible configuration (data from the Boulder, WART, and PDX data sets were used for this experiment). The Allsebrook-Parsons model takes three parameters (besides carrier frequency, which is common to nearly all the models):
For the Allsebrook model, the
Explicit parameter fitting for the Allsebrook and Flat-Edge model parameters.
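The parameter search described above can be sketched as an exhaustive grid search that keeps the configuration with the lowest RMSE. The model used here is a hypothetical two-parameter stand-in, not the Allsebrook-Parsons or Flat-Edge model, and the parameter names, grid values, and link data are invented for illustration.

```python
import itertools
import math


def rmse(errors):
    """Root mean square of a list of prediction errors in dB."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def grid_search(model, grid, links):
    """Evaluate every parameter combination; return (best_params, best_rmse).

    model(distance_km, **params) returns a predicted path loss in dB.
    grid maps each parameter name to a list of candidate values.
    links is a list of (distance_km, measured_loss_db) pairs.
    """
    names = sorted(grid)
    best = None
    for combo in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, combo))
        errors = [model(d, **params) - loss for d, loss in links]
        score = rmse(errors)
        if best is None or score < best[1]:
            best = (params, score)
    return best


def toy_model(distance_km, offset_db, exponent):
    """Hypothetical tunable model used only to exercise the search."""
    return offset_db + 10.0 * exponent * math.log10(distance_km)


links = [(0.1, 90.0), (1.0, 120.0), (10.0, 150.0)]
grid = {"offset_db": [110.0, 120.0, 130.0], "exponent": [2.0, 3.0, 4.0]}
best_params, best_rmse = grid_search(toy_model, grid, links)
print(best_params, best_rmse)
```

Scoring the winning configuration on links that were held out of the search gives a more honest error estimate than reusing the links it was tuned on.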
If we consider 9 dB to be the minimum achievable error of a well-tuned model, it is interesting to note that approximately the same performance can be achieved with a straight line fit through a small number (
In order to understand which variables may serve to explain model error, we performed a factorial analysis of variance (ANOVA) using spread-corrected error as the fitted value and transmitter height, receiver height, distance, line-of-sight (a boolean value based on path elevation profile), and data set name as factors. Although all of these variables show moderate correlations (which speaks to the fact that many models add corrections based on these variables), some explain the variance much better than others. Perhaps not surprisingly, distance and data set name are the biggest winners with extremely large
Correlation between model accuracy and link distance for each data set. Distance is bucketed by kilometer. Panel: WMP.
One conclusion from this is that hybrid models, which combine the strengths of multiple simpler models, may perform better than any one model alone. To understand the possible benefit of hybridized models, we implemented three hybrid models and applied them to the WMP data. The WMP data was chosen because it includes the largest variety of link lengths. The first uses the Hata model (for medium cities) for links under 500 m (where it is well-performing) and the Flat-Edge model (with 10 “buildings”) for longer links (hatam.flatedge10). This model performs marginally better than all other models, producing a corrected RMSE of 14.3 dB. Very slightly better performance is achieved by combining the Hata model with the Egli model (14.2 dB RMSE). We also tried using the TM90 model for links less than 10 miles and the ITM for longer links (tm90.itmtem), but this combination is not well-performing with respect to our measurements. Treating this tuning and hybridization as an optimization problem, with the goal of producing the best performing configuration of the existing models, is a promising project for future work. In taking this approach, however, one must be careful to avoid overfitting a model to the data available.
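A distance-switched hybrid of the kind described above can be sketched as a small wrapper that dispatches on link length. The two component models below are hypothetical log-distance placeholders with invented coefficients; they stand in for real models such as Hata or Flat-Edge and are not implementations of them.

```python
import math


def make_hybrid(short_model, long_model, switch_km):
    """Build a model that uses short_model below switch_km and long_model otherwise."""
    def hybrid(distance_km):
        if distance_km < switch_km:
            return short_model(distance_km)
        return long_model(distance_km)
    return hybrid


def short_range(distance_km):
    """Placeholder short-link model (coefficients invented for illustration)."""
    return 120.0 + 35.0 * math.log10(distance_km)


def long_range(distance_km):
    """Placeholder long-link model (coefficients invented for illustration)."""
    return 115.0 + 25.0 * math.log10(distance_km)


# Switch at 500 m, mirroring the threshold used for the first hybrid above.
hybrid = make_hybrid(short_range, long_range, switch_km=0.5)

for d in (0.1, 0.5, 2.0):
    print(d, round(hybrid(d), 1))
```

A hard switch like this generally produces a discontinuity in predicted loss at the threshold. Choosing the threshold by fitting adds one more tunable parameter, which bears on the overfitting concern raised above.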
As an example of what these results mean for real applications, consider Figure
Comparison of predicted coverage maps for Portland, OR using two well-performing models, with and without the same scale Gaussian error included. True green indicates predicted received signal at −30 dBm, and true red indicates predicted received signal at the noise floor (−95 dBm). Intermediary values are linearly interpolated between these two color values. Panels: Allsebrook; Allsebrook with noise; Hata; Hata with noise.
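The color scale used in these maps can be sketched as a linear interpolation between pure red at the noise floor and pure green at the strong-signal end. Clamping values outside the range and rounding to 8-bit channels are assumptions of this sketch, since the caption does not say how out-of-range values are drawn.

```python
def signal_to_rgb(rss_dbm, strong_dbm=-30.0, floor_dbm=-95.0):
    """Map a predicted received signal strength (dBm) to an (R, G, B) tuple.

    floor_dbm maps to true red and strong_dbm maps to true green; values in
    between are linearly interpolated, and values outside the range are
    clamped (an assumption made here).
    """
    t = (rss_dbm - floor_dbm) / (strong_dbm - floor_dbm)
    t = max(0.0, min(1.0, t))
    red = round(255 * (1.0 - t))
    green = round(255 * t)
    return (red, green, 0)


for level in (-95.0, -62.5, -30.0):
    print(level, signal_to_rgb(level))
```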
Comparing these maps to the empirical and operator assumed coverage maps shown in Figure
Measurements from “pdx/stumble” data set. Signal strength at measurement points is plotted as green (light) when it is strong to red (dark) when it is weak. The operator-assumed coverage is given as 500′ circles centered at each AP, and the goal coverage area is given as larger 1000′ circles centered at each AP.
Yet, the future holds promise. Consider the final column in Table
In this section, we discuss several important observations based on the results above.
One interesting additional observation from this data is that modeling path loss from directional transmitters is especially difficult. This can be seen in the fact that our data from the directional CU-WART testbed is particularly noisy. There have been attempts to model this phenomenon explicitly in the past [
It is worth noting that some algorithms will generate errors when used outside of their intended coverage. If we give these models the benefit of the doubt and only make use of those predictions where no errors or warnings were generated, the overall performance looks better. For instance, the corrected RMSE for ITM (with parameters for a temperate environment) on the WMP data set improves from 28.2 dB to 23.1 dB if the most egregious errors are discarded (which stem from problems predicting refraction over certain terrain types and are only 290 of 2492 predictions) and down to 17.3 dB when only those predictions that generate zero warnings are used (which usually stem from links that are too short and are only 696 of 2492 predictions). This is a substantial improvement: at 17.3 dB corrected RMSE, the ITM is performing on par with the best of the other models.
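The filtering described above can be sketched as computing RMSE three times over progressively stricter subsets of the predictions. The record fields `error_db`, `had_error`, and `warnings` are hypothetical names chosen for this sketch, and the four records are invented; they are not the ITM's actual output format or values from the WMP data.

```python
import math


def rmse(errors):
    """Root mean square of a list of errors in dB; NaN for an empty list."""
    if not errors:
        return float("nan")
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def rmse_by_strictness(predictions):
    """RMSE over all predictions, those without errors, and those without warnings."""
    everything = [p["error_db"] for p in predictions]
    no_errors = [p["error_db"] for p in predictions if not p["had_error"]]
    no_warnings = [p["error_db"] for p in predictions
                   if not p["had_error"] and p["warnings"] == 0]
    return {
        "all": rmse(everything),
        "no_errors": rmse(no_errors),
        "no_warnings": rmse(no_warnings),
    }


# Toy records, invented for illustration only.
records = [
    {"error_db": 3.0, "had_error": False, "warnings": 0},
    {"error_db": -4.0, "had_error": False, "warnings": 0},
    {"error_db": 30.0, "had_error": True, "warnings": 0},
    {"error_db": 12.0, "had_error": False, "warnings": 2},
]
print(rmse_by_strictness(records))
```

Reporting the number of predictions that survive each filter alongside the RMSE, as is done above, keeps the improvement in perspective.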
In a result that appears completely counterintuitive, the rural data set is much more difficult to model than our urban data sets. To look for sources of systematic error, we analyzed the covariance (correlation) between “best prediction error” (the error of the best prediction from all models) and various possible factors. There appears to be no significant correlation between error and carrier frequency (and therefore modulation scheme or protocol) or antenna geometry. However, there is a large correlation between error and distance. It is our hypothesis that the reason the WMP data is especially difficult to model has to do with two factors. First, because researchers have assumed that rural environments are “easy” or “solved,” there has been substantially more work in developing (empirical) models for urban environments. The majority of state-of-the-art rural models, on the other hand, are largely analytical and were mostly developed 30 or more years ago (e.g., the ITM). Second, this data set has an exceptionally large variety of link lengths, and as has been shown, prediction error is strongly correlated with distance for many models.
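One way to reproduce this kind of analysis is to take, for each link, the smallest absolute error achieved by any model and correlate it with link distance. The sketch below assumes Pearson correlation and absolute error magnitudes; the model names "a" and "b" and all numbers are invented for illustration.

```python
def best_prediction_error(errors_by_link):
    """For each link, the smallest absolute error (dB) over all models.

    errors_by_link: list of dicts mapping model name to signed error in dB.
    """
    return [min(abs(e) for e in link.values()) for link in errors_by_link]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


# Toy data: signed errors of two hypothetical models on four links.
errors = [
    {"a": 5.0, "b": -2.0},
    {"a": -4.0, "b": 6.0},
    {"a": 6.0, "b": 9.0},
    {"a": -8.0, "b": 12.0},
]
distances_km = [1.0, 2.0, 3.0, 4.0]

best = best_prediction_error(errors)
print(best, pearson(distances_km, best))
```

The same `pearson` call can be repeated with carrier frequency or antenna height in place of distance to compare candidate explanatory factors.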
In this work, we have performed the first rigorous evaluation of a large number of path loss models from the literature using a sufficiently representative data set from real (production) networks. Besides providing guidance in the choice of an appropriate model when one is needed, this work was largely motivated by a need to create baseline performance values. Without an existing well-established error bound for these approaches, it is impossible to evaluate the success (or failure) of more complex approaches to path loss modeling (and coverage mapping). For the models implemented here and the data sets analyzed, it is possible to say that
Direct approaches to data fitting, such as a straight line fit to the log/log relationship between path loss and distance, produce a similar level of error: 8-9 dB for urban environments and
Among the most important outcomes of this work is a set of guidelines for researchers, which can help provide direction in the complicated landscape of path loss prediction models. As a general rule, when it is feasible to make direct measurements of a network, one should do so. We have shown that a small number of measurements can have substantial power in terms of tuning the models we have studied and in fitting parameters for basic empirical models. When it is not possible to make measurements of a network, the careful researcher should choose from standard well-accepted models such as Okumura-Hata or Davidson, which generally have the least systematic skew in predictions and are among the best performing models overall. In simulation studies, we advocate a repeated-measures (Monte Carlo) experimental design in which stochastic models are used, so that a realistic channel variance can be modeled. For this application, the recent proposal of Herring appears to be a good choice or, for the greatest comparability, the Hata model with stochastic lognormal fading. Although there are a large number of models from which to choose, our work here shows that in many cases the most important factors that a researcher should consider are having a realistic expectation of error and choosing a model that enables repeatability and comparability of results.
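As a minimal sketch of the last recommendation, the code below draws repeated path loss samples from the standard Okumura-Hata urban formula (small/medium city mobile antenna correction) plus zero-mean Gaussian noise in dB, which corresponds to lognormal fading in linear units. The 8 dB standard deviation, the link parameters, and the function names are assumptions chosen for illustration, not values recommended by this study.

```python
import math
import random


def hata_urban_db(freq_mhz, dist_km, base_height_m, mobile_height_m):
    """Okumura-Hata median path loss (dB), urban, small/medium city correction.

    Nominal validity: 150-1500 MHz, 1-20 km, base antenna 30-200 m, mobile
    antenna 1-10 m. Using it outside that range is an extrapolation.
    """
    log_f = math.log10(freq_mhz)
    log_hb = math.log10(base_height_m)
    mobile_correction = ((1.1 * log_f - 0.7) * mobile_height_m
                         - (1.56 * log_f - 0.8))
    return (69.55 + 26.16 * log_f - 13.82 * log_hb - mobile_correction
            + (44.9 - 6.55 * log_hb) * math.log10(dist_km))


def sample_path_loss(freq_mhz, dist_km, base_height_m, mobile_height_m,
                     sigma_db=8.0, trials=1000, seed=0):
    """Monte Carlo samples: Hata median plus Gaussian shadowing in dB.

    sigma_db is an assumed value; a fixed seed keeps runs repeatable.
    """
    rng = random.Random(seed)
    median = hata_urban_db(freq_mhz, dist_km, base_height_m, mobile_height_m)
    return [median + rng.gauss(0.0, sigma_db) for _ in range(trials)]


samples = sample_path_loss(900.0, 1.0, 30.0, 1.5)
print(round(hata_urban_db(900.0, 1.0, 30.0, 1.5), 1))
print(round(sum(samples) / len(samples), 1))
```

Fixing the random seed and reporting the assumed standard deviation alongside the results supports the repeatability and comparability goals stated above.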