Dataset Paper Mapping the Slums of Dhaka from 2006 to 2010

1 Department of Epidemiology, Mailman School of Public Health, Columbia University, 722 West 168th Street, Room 517, New York, NY 10032, USA
2 Department of Tropical Medicine, School of Public Health and Tropical Medicine, Tulane University, 1440 Canal Street, New Orleans, LA 70112, USA
3 Geography Department, Humboldt-Universität zu Berlin, Unter den Linden 6, 10099 Berlin, Germany
4 Department of Public Health Medicine, School of Public Health, University of Bielefeld, P.O. Box 100131, 33501 Bielefeld, Germany


Introduction
In recent decades, high rates of urbanization in low- and middle-income countries have led to the development of megacities with more than 10 million inhabitants [1,2]. Megacity development is often accompanied by a loss of governability due to weak political structures [2]. In addition, health and social infrastructures are poor or nonexistent, and the housing sector cannot meet the demand of the large numbers of rural migrants arriving in the cities every day [3].
In Dhaka, the capital city of Bangladesh, the population increases by half a million each year, a rate that would result in a population of almost 23 million by 2025 [1,4]. With over 15 million inhabitants today, Dhaka is the second fastest growing megacity in the world [1]. Many of the city's immigrants initially concentrate in slums, due to poverty and limited alternatives [5]. Environmental conditions within the slums are poor and infrastructure is deficient [4,6,7], with negative impacts on the physical and psychological well-being of urban slum residents [8-12]. The Centre for Urban Studies estimated that the total population of Dhaka's slums more than doubled between 1996 and 2005, from 1.5 to 3.4 million people. The limited knowledge about slum settlement size, distribution, and dynamics presents an enormous challenge for urban health [8].
Various slum classifications exist in the literature, yet there is no universal definition for a slum community or for slum housing. Moreover, slum characteristics are not consistent across countries or even across cities. A widely applied definition is that of UN-Habitat [13], which defines a slum household as one or a group of individuals living under the same roof in an urban area and lacking one or more of the following five amenities: (1) durable housing, (2) sufficient living area, (3) access to improved water, (4) access to improved sanitation facilities, and (5) secure tenure. Identifying slums by these criteria requires attributes only available from field sources [14]; remote sensing cannot directly discern demographics, internal conditions, or microstructure beyond the spatial resolution of the sensor. Nonetheless, remote sensing can provide an excellent adjunct or alternative to field surveys of urban environments. Satellites can periodically monitor land use and land cover change and cover large areas and whole cities with relatively short processing turnaround. Remote sensing applications are best able to identify slums through the features of housing density, structure, and roof composition [15,16]. Identification from imagery can be performed manually, algorithmically, or through a combination of methods. All methods rely on areas of known slum established in fieldwork or preceding studies. The classification must be customized to the locale since slums can develop through the degradation of formal housing or through informal processes.
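The UN-Habitat household criterion quoted above can be written as a simple rule. The following is a minimal sketch; the class and field names are hypothetical and chosen only for illustration. It also makes the limitation visible: none of the five inputs can be read directly from a satellite image.

```python
from dataclasses import dataclass, fields


@dataclass
class Household:
    # Hypothetical field names, one per amenity in the UN-Habitat definition.
    durable_housing: bool
    sufficient_living_area: bool
    improved_water: bool
    improved_sanitation: bool
    secure_tenure: bool


def is_slum_household(h: Household) -> bool:
    """A household counts as slum if it lacks one or more of the five amenities."""
    return not all(getattr(h, f.name) for f in fields(h))


# Example: everything present except secure tenure.
example = Household(True, True, True, True, False)
print(is_slum_household(example))
```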
Several notable studies have previously attempted to map urban slums or land use land cover change (LULCC) in the Dhaka region. Griffiths et al. [17] employed a support vector machine method to classify LULCC from several multispectral and multitemporal data sources for the years 1990, 2000, and 2006. Dewan and Yamaguchi [18] analyzed Landsat imagery using a supervised maximum likelihood classification algorithm for the years 1975, 1992, and 2003. While the results of both studies contained only a single urban class, Netzband et al. [14] additionally differentiated the socioeconomic status of urban residential areas in central Dhaka using the normalized difference vegetation index (NDVI) from Quickbird data. Kit et al. [15] measured the spatial heterogeneity of Quickbird imagery to map the slums of Hyderabad in India. The most comprehensive, nearly complete slum maps of Dhaka were published by the Centre for Urban Studies (CUS) in their 2005 census and mapping of slums (CMS) [19]. The dataset was constructed through the visual inspection of IKONOS 2003 one-meter panchromatic images followed by verification with a large-scale ground survey. However, years later, the report's original authors recognized the need for an updated mapping project [16]. The 2005 CMS has become outdated due to the volatility of residential development and severe flooding [20] in the intervening period. Hence, recent data on slum location and distribution are scarce in Dhaka, and field studies covering large areas can be cost prohibitive and time consuming.
This study aimed to acquire new large-scale spatial data on Dhaka slums efficiently through the use of remote sensing. Quickbird satellite images and freely available ancillary sources were employed to map the distribution of slums for the Dhaka metropolitan area (DMA) in 2006 and 2010. The dataset provides highly sensitive, high-resolution, multiyear slum delineation and change data. Slum distribution maps such as ours identify areas of concentrated poverty, poor environmental conditions, and health inequalities in Dhaka [4, 7, 9-12, 16, 21]. The data can provide direct assistance to policymakers and be utilized for future spatial studies of land-use change and urban expansion.

Methodology
To delineate slum area from nonslum area in Dhaka, we primarily used very high-resolution (VHR) satellite imagery. The mapping process was applied to the Dhaka metropolitan area (DMA), a region encompassing both the wards of the Dhaka city corporation (DCC) and the unions of the DMA. The 91 wards and 10 unions comprise a total land area of 306 km². All editing and processing were performed in ArcGIS 10 [22]. Shapefiles were drawn using the editor tool. Ward and union boundary files were obtained from the Centre for Urban Studies, the organization responsible for the 2005 CMS [16,19]. The shapefiles were projected to WGS 1984, UTM Zone 46N, and georeferenced to the Quickbird images.
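The choice of UTM Zone 46N follows from Dhaka's longitude. The sketch below shows the standard zone arithmetic; the Dhaka coordinates are approximate values assumed for illustration, and the function names are hypothetical. The formula ignores the special-case zones around Norway and Svalbard, which do not affect Bangladesh.

```python
import math


def utm_zone(lon_deg: float) -> int:
    """Standard 6-degree UTM zone number for a longitude in degrees."""
    return int(math.floor((lon_deg + 180.0) / 6.0)) % 60 + 1


def wgs84_utm_epsg(lon_deg: float, lat_deg: float) -> int:
    """EPSG code for WGS 84 / UTM: 326xx in the north, 327xx in the south."""
    base = 32600 if lat_deg >= 0 else 32700
    return base + utm_zone(lon_deg)


# Approximate coordinates of central Dhaka (assumed, for illustration only).
DHAKA_LON, DHAKA_LAT = 90.41, 23.81

zone = utm_zone(DHAKA_LON)
central_meridian = zone * 6 - 183  # degrees east
print(zone, central_meridian, wgs84_utm_epsg(DHAKA_LON, DHAKA_LAT))
```

With these inputs the zone is 46, its central meridian is 93°E, and the corresponding EPSG code is 32646.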
We do not believe that the 18-day discrepancy has meaningfully impacted our results. Due to a slight offset, the 2010 scene georeferencing was adjusted to match the 2006 images using control points. The 2005 CMS was the primary source of ground-verified data. It is important to note that those maps were based on 2003 IKONOS imagery and subsequent confirmation by fieldwork in 2004/2005 [16,19]. Slum distribution in 2006 and 2010 has most likely changed significantly since that study, in part due to a major flood in 2004 that damaged a large amount of urban infrastructure [20].
Additional ground-verified data were derived from our survey of 15 Dhaka slums from 2007 to 2009 for diverse spatial-epidemiological studies [9-12, 21, 37]. Sampled slum households were geolocated with GPS, and photos were taken with respect for the privacy of subjects.
The Google Earth [38] application displays high-resolution imagery from various time periods. The satellite sources include IKONOS, SPOT, and Quickbird. These images were of lower quality than our own, but they proved enormously helpful in presenting alternate solar illumination angles and seasons. Many city blocks became easier to interpret under different lighting conditions. Google Earth also links to geolocated photos posted by users. These amateur photos offer a glimpse of ground conditions that helped clarify the true nature of suspected slum buildings. For example, covered markets can be indistinguishable from slums on satellite imagery due to similar construction materials [16]. The plentiful user photos of markets allowed us to avoid mislabeling these areas.
The final slum datasets were created in three stages: (1) suspected slums were demarcated in ArcGIS over 2006 Quickbird satellite images; (2) using the previous output as a base, new slum additions and subtractions were then mapped over 2010 Quickbird images; (3) the 2006 and 2010 slum maps were compared and changes were mapped.
The 2006 mapping procedure was as follows. Slum polygons were first drawn in ArcGIS 10 over the January 6, 2006, Quickbird scenes. The mapping focused on slums greater than 1 acre in area. Smaller units were judged too difficult to differentiate by eye. After completion, the following processing was performed to aggregate polygons and delete small, isolated slums. First, polygons within a distance of 10 m of each other were aggregated into a single polygon using the aggregate tool. This was done because the editing process resulted in many thousands of overlapping polygons. Second, a 10 m buffer was created around all polygons greater than 1 acre, using the selection and buffer tools. Finally, all polygons intersecting the buffer were selected and exported to a new shapefile. In effect, this removed all polygons that were smaller than 1 acre and did not lie within the 10 m zone surrounding a polygon larger than 1 acre. An isolation distance of 10 m was chosen because this represented about the average width of major roads. Slums separated by more than this distance were thereby considered separate clusters.
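The selection step described above (keep every polygon that intersects the 10 m buffer of a polygon larger than 1 acre) can be sketched without GIS software. This is an illustrative simplification, not the ArcGIS workflow itself: axis-aligned rectangles in metres stand in for the real slum polygons, the aggregation step is omitted, and all names are hypothetical.

```python
from math import hypot

ACRE_M2 = 4046.8564224  # one international acre in square metres
ISOLATION_M = 10.0      # isolation distance used in the text

# Simplifying assumption: (xmin, ymin, xmax, ymax) rectangles in metres.
Rect = tuple[float, float, float, float]


def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])


def gap(a: Rect, b: Rect) -> float:
    """Shortest distance between two rectangles (0 if they touch or overlap)."""
    dx = max(0.0, a[0] - b[2], b[0] - a[2])
    dy = max(0.0, a[1] - b[3], b[1] - a[3])
    return hypot(dx, dy)


def select_slums(rects: list[Rect]) -> list[Rect]:
    """Keep rectangles lying within 10 m of any rectangle larger than 1 acre.

    Large rectangles are at distance 0 from themselves, so they are always kept.
    """
    large = [r for r in rects if area(r) > ACRE_M2]
    return [r for r in rects if any(gap(r, big) <= ISOLATION_M for big in large)]


big = (0.0, 0.0, 100.0, 50.0)     # 5000 m2, larger than 1 acre
near = (105.0, 0.0, 125.0, 20.0)  # small, 5 m from big: kept
far = (150.0, 0.0, 170.0, 20.0)   # small, 50 m from big: removed
print(select_slums([big, near, far]))
```

Testing the distance to a large polygon against 10 m is equivalent to testing intersection with its 10 m buffer, which is why no explicit buffer geometry is needed in this sketch.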
To calculate the total slum area by administrative unit, that is, by the ward or union containing each slum, the intersect tool was used to split the slum polygons along the boundaries of the polygons of Dataset Item 1 (Spatial Data). For cleaning, artifact polygons smaller than 0.02 acres were selected and deleted. Furthermore, the dissolve tool was used to aggregate all slum polygons within one ward or union in order to calculate the slum area by admin level. The output was deemed the final 2006 slum map (Dataset Item 2 (Spatial Data)).
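The cleaning and dissolve steps amount to dropping fragments below 0.02 acres and summing the remaining area per ward or union. A minimal sketch of that arithmetic follows, assuming the fragments have already been split along administrative boundaries; the unit labels and areas are made-up illustration values, and the function name is hypothetical.

```python
from collections import defaultdict

ACRE_M2 = 4046.8564224  # one international acre in square metres
ARTIFACT_ACRES = 0.02   # cleaning threshold stated in the text


def slum_area_by_unit(fragments):
    """Sum fragment areas per administrative unit.

    fragments: iterable of (unit_label, area_m2) pairs.
    Returns {unit_label: (area_acres, area_m2)}.
    """
    totals = defaultdict(float)
    for unit, area_m2 in fragments:
        if area_m2 / ACRE_M2 < ARTIFACT_ACRES:
            continue  # artifact polygon, deleted during cleaning
        totals[unit] += area_m2
    return {unit: (m2 / ACRE_M2, m2) for unit, m2 in totals.items()}


# Made-up fragments: the 50 m2 sliver is about 0.012 acres and is dropped.
fragments = [("Ward A", 5000.0), ("Ward A", 50.0), ("Union B", 8093.7128448)]
for unit, (acres, m2) in slum_area_by_unit(fragments).items():
    print(f"{unit}: {acres:.3f} acres, {m2:.1f} m2")
```

The two returned values per unit correspond to reporting the same total in acres and in square metres.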
The 2010 mapping procedure was as follows. To maintain consistency and comparability with the 2006 slum borders, those polygons were used as a base from which modifications were made to reflect changes in 2010 slum distribution. First, a copy of the 2006 slum shapefile was overlaid onto the 2010 Quickbird images. Additions to 2006 slums were drawn into an addition shapefile, and no-longer-existing 2006 slums were drawn into a subtraction shapefile. After completion, the additions were joined to the 2006 polygons with the union tool, and the subtractions were deleted with the erase tool. The resulting 2010 slum shapefile was then processed in the same manner as the initial 2006 shapefile. Isolated polygons smaller than 1 acre and not within the 10 m zone surrounding a polygon larger than 1 acre were removed, polygons were split along ward and union borders, and artifacts were cleaned.
To calculate the slum area by admin level (ward and union), the same procedure was used as with Dataset Item 2 (Spatial Data). The output was deemed the final 2010 slum map (Dataset Item 3 (Spatial Data)).
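The union and erase steps of the 2010 update can be illustrated with plain set operations. In this sketch each slum map is modelled as a set of grid cells rather than vector polygons; this raster stand-in and all names are assumptions made for illustration only.

```python
# Simplifying assumption: a slum map is a set of (column, row) grid cells.
Cell = tuple[int, int]


def update_slum_map(base: set[Cell],
                    additions: set[Cell],
                    subtractions: set[Cell]) -> set[Cell]:
    """Join the additions to the base map (union), then remove subtractions (erase)."""
    return (base | additions) - subtractions


slum_2006 = {(0, 0), (0, 1), (1, 0)}
additions = {(1, 1), (2, 1)}  # newly observed slum cells
subtractions = {(0, 0)}       # cells where slum no longer exists

slum_2010 = update_slum_map(slum_2006, additions, subtractions)
print(sorted(slum_2010))
```

Using the earlier map as the base means that unchanged borders are carried over exactly, so only genuine edits show up later as change.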
The change detection procedure was as follows. Shapefiles representing the growth and removal of slum between 2006 and 2010 were created through erase procedures. The new slum area was obtained by subtracting the 2006 slum polygons (Dataset Item 2 (Spatial Data)) from the 2010 slum polygons (Dataset Item 3 (Spatial Data)). The lost slum area was obtained by subtracting the 2010 slum polygons (Dataset Item 3 (Spatial Data)) from the 2006 slum polygons (Dataset Item 2 (Spatial Data)). Dataset Item 4 (Spatial Data) presents the location of slum growth and new settlement between 2006 and 2010 (cf. Figure 1). Dataset Item 5 (Spatial Data) presents the pattern of slum decline between 2006 and 2010. Together, these maps represent slum volatility. To calculate the slum area by ward and union, the same procedure was used as with Dataset Item 2 (Spatial Data).
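The two erase operations are set differences taken in opposite directions. The sketch below models each map as a set of grid cells, which is an assumption made for illustration (the dataset itself uses vector polygons), and the names are hypothetical.

```python
# Simplifying assumption: a slum map is a set of (column, row) grid cells.
Cell = tuple[int, int]


def detect_change(earlier: set[Cell], later: set[Cell]) -> tuple[set[Cell], set[Cell]]:
    """Return (new, lost): cells only in the later map, cells only in the earlier map."""
    new = later - earlier
    lost = earlier - later
    return new, lost


map_2006 = {(0, 0), (0, 1), (1, 0)}
map_2010 = {(0, 1), (1, 0), (1, 1), (2, 1)}

new, lost = detect_change(map_2006, map_2010)
print(sorted(new), sorted(lost))

# Consistency check: removing the lost cells from the earlier map and adding
# the new ones must reproduce the later map.
assert (map_2006 - lost) | new == map_2010
```

By construction the growth and decline layers never overlap, which is why they can be distributed as two separate items.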
The strengths and limitations of this dataset are as follows. The dataset described in this paper successfully depicts the location and distribution of Dhaka's slums through visual inspection of VHR satellite imagery, but it also has some important limitations worth mentioning. First, identification is based entirely on visual interpretation and comparison with known slum appearances. The borders of most slums were fairly obvious, but difficulties arose in dense urban areas of mixed commercial and residential status. Using best judgment, these areas were marked as slum and, consequently, are expected to have high rates of false positives. Areas of heavy foliage cover that could obscure settlements were less common but likely contain the majority of false negatives. Second, the method is vulnerable to biases of human perception and interpretation. Third, the output is solely the distribution of slum land cover. The dataset does not contain information on housing conditions, availability of utilities, population density, or demographics. As a result, the classifications do not abide by the programmatic definition of slum previously mentioned. The slum maps require the attribute information available from fieldwork to be useful for targeted planning and policymaking. Fourth, the slum size calculation per ward or union depends strongly on the shapefile used for those administrative areas. However, our shapefile in Dataset Item 1 (Spatial Data) is comparable to that of the latest population and housing census 2011 [39] in order to facilitate comparisons.
Despite these limitations, our dataset can have applications in targeted programs that maximize allocations for the underserved or in studies related to urban planning, public health, and the environment, more specifically, as follows: (i) the optimal allocation of public services, including sanitation, electricity, and other infrastructure [14]; (ii) targeted NGO and government health intervention campaigns, for example, immunizations, cholera treatments, and health education [8,40,41]; (iii) the epidemiological modeling of infectious disease [42]; (iv) the promotion of sustainable urbanization and land use [18]; (v) the conservation of wetlands that act as floodplains and water retention areas [43]; (vi) disaster management of severe flooding, the risk of which is increasing due to climate change [4]; (vii) studies of the interactions between megacity growth and climate change [18]; (viii) models to predict socioeconomic factors and environmental degradation [14].

Dataset Description
The dataset associated with this Dataset Paper consists of 5 items which are described as follows.
Dataset Item 1 (Spatial Data). ESRI shapefile containing polygons that represent the 91 wards of the Dhaka City Corporation and the 10 unions of the Dhaka Metropolitan Area. The ward/union boundaries were derived from the CMS and adapted to fit with the names of the most recent population and housing census. This shapefile is made up of seven separate files, each in a different format and containing different information with regard to the shapefile. For example, the geometric information of the spatial feature itself (e.g., spatial boundary of Dhaka wards and unions) is stored in the "*.shp" format, while the "*.dbf" format contains attribute data attached to the shapefile. The format "*.prj" defines the coordinate system and projection for the shapefile, and the formats "*.sbn," "*.sbx," and "*.shx" contain indices. Finally, the format "*.shp.xml" contains the geospatial metadata of the shapefile, a descriptive document of the shapefile in XML format. The associated data table (spatial.data.1.dbf) contains 101 entries with the columns "ward union" and "name" providing information for each polygon. The column "ward union" gives the number or name of the ward or union, respectively. The column "name" gives additional information on the ward or union name where this administrative area is part of another one.
Dataset Item 2 (Spatial Data). ESRI shapefile containing polygons that represent the delineations of slums larger than 1 acre in the Dhaka Metropolitan Area as seen in 2006 Quickbird images [23-27], split along the ward/union boundaries derived from the CMS and adapted to fit with the names of the most recent census. This shapefile is made up of seven separate files, each in a different format and containing different information with regard to the shapefile. For example, the geometric information of the spatial feature itself (e.g., spatial boundary of the slum cluster) is stored in the "*.shp" format, while the "*.dbf" format contains attribute data attached to the shapefile. The format "*.prj" defines the coordinate system and projection for the shapefile, and the formats "*.sbn," "*.sbx," and "*.shx" contain indices. Finally, the format "*.shp.xml" contains the geospatial metadata of the shapefile, a descriptive document of the shapefile in XML format. The associated data table contains 95 entries, one for each ward or union in which slums were found. The column "ward union" gives the number or name of the ward or union in which they were found, respectively. Furthermore, the column "SUM area a" gives the total slum area by ward/union in acres, while the column "SUM area s" gives the total slum area by ward/union in m².
Dataset Item 3 (Spatial Data). ESRI shapefile containing polygons that represent the delineations of slums larger than 1 acre in the Dhaka Metropolitan Area as seen in 2010 Quickbird images [28-36], split along the ward/union boundaries derived from the CMS and adapted to fit with the names of the most recent census. This shapefile is made up of seven separate files, each in a different format and containing different information with regard to the shapefile. For example, the geometric information of the spatial feature itself (e.g., spatial boundary of the slum cluster) is stored in the "*.shp" format, while the "*.dbf" format contains attribute data attached to the shapefile. The format "*.prj" defines the coordinate system and projection for the shapefile, and the formats "*.sbn," "*.sbx," and "*.shx" contain indices. Finally, the format "*.shp.xml" contains the geospatial metadata of the shapefile, a descriptive document of the shapefile in XML format. The associated data table contains 92 entries, one for each ward or union in which slums were found. The column "ward union" gives the number or name of the ward or union in which they were found in 2010, respectively. Furthermore, the column "SUM area a" gives the total slum area by ward/union in acres, while the column "SUM area s" gives the total slum area by ward/union in m².
Dataset Item 4 (Spatial Data). ESRI shapefile containing polygons that represent the areas of slum that exist in Dataset Item 3 (Spatial Data) but do not exist in Dataset Item 2 (Spatial Data), split along the ward/union boundaries derived from the CMS and adapted to fit with the names of the most recent census. These data reveal the expansion of slum settlements into areas previously not containing slum. This shapefile is made up of seven separate files, each in a different format and containing different information with regard to the shapefile. For example, the geometric information of the spatial feature itself (e.g., spatial boundary of the slum cluster) is stored in the "*.shp" format, while the "*.dbf" format contains attribute data attached to the shapefile. The format "*.prj" defines the coordinate system and projection for the shapefile, and the formats "*.sbn," "*.sbx," and "*.shx" contain indices. Finally, the format "*.shp.xml" contains the geospatial metadata of the shapefile, a descriptive document of the shapefile in XML format. The associated data table contains 60 entries, one for each ward or union in which such areas were found. The column "ward union" gives the number or name of the ward or union in which they were found, respectively. Furthermore, the column "SUM area a" gives the total slum area by ward/union in acres, while the column "SUM area s" gives the total slum area by ward/union in m².
Dataset Item 5 (Spatial Data). ESRI shapefile containing polygons that represent the areas of slum that exist in Dataset Item 2 (Spatial Data) but do not exist in Dataset Item 3 (Spatial Data), split along the ward/union boundaries derived from the CMS and adapted to fit with the names of the most recent census. These data reveal the loss of slum settlements from areas previously containing slum. This shapefile is made up of seven separate files, each in a different format and containing different information with regard to the shapefile. For example, the geometric information of the spatial feature itself (e.g., spatial boundary of the slum cluster) is stored in the "*.shp" format, while the "*.dbf" format contains attribute data attached to the shapefile. The format "*.prj" defines the coordinate system and projection for the shapefile, and the formats "*.sbn," "*.sbx," and "*.shx" contain indices. Finally, the format "*.shp.xml" contains the geospatial metadata of the shapefile, a descriptive document of the shapefile in XML format. The associated data table contains 66 entries, one for each ward or union in which such areas were found. The column "ward union" gives the number or name of the ward or union in which they were found, respectively. Furthermore, the column "SUM area a" gives the total slum area by ward/union in acres, while the column "SUM area s" gives the total slum area by ward/union in m².
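Each item above is described as the same set of seven component files. Before loading an item, it can be useful to confirm that all components are present. The following is a minimal sketch: the base name "spatial.data.1" is inferred from the table name given for Dataset Item 1, the other file listing is invented for illustration, and the function name is hypothetical.

```python
# The seven component suffixes listed in the dataset description.
SHAPEFILE_SUFFIXES = (".shp", ".dbf", ".prj", ".sbn", ".sbx", ".shx", ".shp.xml")


def missing_components(stem: str, filenames: list[str]) -> list[str]:
    """Return the suffixes from the list above that have no matching file."""
    have = set(filenames)
    return [s for s in SHAPEFILE_SUFFIXES if f"{stem}{s}" not in have]


# Invented directory listing in which four components are absent.
listing = ["spatial.data.1.shp", "spatial.data.1.dbf", "spatial.data.1.shx"]
print(missing_components("spatial.data.1", listing))
```

In the ESRI shapefile format only the main file, the index file, and the attribute table (".shp", ".shx", ".dbf") are mandatory, so a missing ".sbn", ".sbx", or ".shp.xml" file usually costs speed or metadata rather than data, while a missing ".prj" leaves the coordinate system undefined.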

Concluding Remarks
The dataset presented here can be considered a stepping stone for further research of slums and urban expansion in Dhaka.
To be considered accurate, the slum maps may need additional verification with ancillary data from fieldwork or alternate remote sensing techniques. Nonetheless, the distribution data are of sufficient spatial resolution to be compared with the 2005 CMS and other LULCC mapping attempts in order to reveal urban trends and model the growth of informal settlements.

Figure 1: Map showing the generated shapefiles of Dhaka slums for 2010.