Hough-Transform-Based Interpolation Scheme for Generating Accurate Dense Spatial Maps of Air Pollutants from Sparse Sensing

. Air pollution is a significant health risk factor and causes many negative effects on the environment. Thus, arises the need for studying and assessing air-quality. Today, air-pollution assessment is mostly based on data acquired from Air Quality Monitoring (AQM) stations. These AQM stations provide continuous measurements and considered to be accurate; however, they are expensive to build and operate, thus scattered sparingly. To cope with this limitation, typically, the information obtained from those measurements is generalized with interpolation methods such as IDW or Kriging. Yet, the mathematical basis of those schemes defines that pollution extremum values are obtained at the measuring points. In addition, they are not considering the location of the pollution source or any physicochemical characteristics of pollutant hence do not reveal the real spatial air-pollution patterns. This research introduces a new interpolation scheme which breaks the interpolation process into two stages. At the first stage, the source of pollution and its estimated emission rate are inferred through a detection procedure which is based on the Hough Transform. At the second stage, based on the detected source location and emission, spatial dense pollution maps are created. The method requires, for its computation, to assume a dispersion model. To this end, any model can be used as sophisticated as it may be. Spatial maps created with simplified dispersion models in a computational simulation, show that the suggested interpolation scheme manages to create more accurate and more physically reasonable maps than the state-of-the-art.


Introduction
Air pollution is a significant risk factor for multiple health situations including eye irritation, breathing difficulties, lung cancer, heart diseases and respiratory infections [1].
In addition, air-pollution causes many negative effects on the environment like decreased visibility, acid rain, global warming, climate change, water quality deterioration and ecosystems destruction [2]. Thus, arises the need for studying and assessing air-quality's characteristics, dispersion patterns and behavior. Today, numerous air-pollution studies are based on data acquired from Air Quality Monitoring (AQM) Stations [3]. However, AQM are typically scattered sparingly, mainly near main roads, industrial factories, or near highly populated areas [4]. Thus, the AQM network has a limited ability to account for spatial variability of pollution levels in heterogeneous regions, such as urban areas, which in return, renders exposure assessment as a difficult task [5]. To cope with the measurements sparsity, the information obtained from those measurements is often generalized with mathematical methods to improve the spatio-temporal coverage. To this end, interpolation schemes are sought.
Interpolation is a mathematical method of constructing a continuous function that obtains the measured values (or close values) at the measuring point. Environmental interpolation is based on the assumption that data attributes are continuous over space and spatially dependent [6]. Grossly speaking, interpolation methods can be divided into deterministic and geostatistical methods. The first include Inverse Distance Weighted (IDW), Nearest Neighbor (NN) and radial basis functions [6], while the latter involve, for example, various types of Kriging methods [7]. Next, we focus on IDW and ordinary Kriging, owing to their frequent use in spatial maps creation.
There are many studies in the field of air pollution modeling that used IDW or Kriging for creating dense spatial map of air pollution. IDW, for example, was applied for examining the ratio between low birth weight and air pollution exposure during pregnancy [8]. In that research, the IDW interpolation was utilized for estimating PM10 levels at future mothers' home address. Clark et. al [9] examined the effect of early life exposure to air pollution on development of childhood asthma. For estimating the average exposure level of an area, IDW interpolation was applied. Ventura et. al [10] introduced multi-objective pollutant AQMs optimization. In their research, they applied a Kriging interpolation scheme for creating dense spatial maps. Sarigiannis and Saisana [11] used Kriging interpolation method to create pollution maps of CO and O3 as an additional input to their multi-objective optimization scheme, which was based on remote sensing satellites.
IDW and Ordinary Kriging are both well-known and widely used interpolation methods. However, these methods are not appropriate for creating air-pollution spatial dense maps for several reasons: The mathematical basis of those schemes defines that all interpolated values over the study area are essentially a weighted average of the measurements points, thus extremum values cannot be obtained at any other place than the measuring points. In addition, these methods do not consider the location of pollution sources or any physicochemical characteristics of pollution. Hence, the resulted dense pollution maps do fall short in describing accurately the real spatial patterns of pollution. Regarding these in the interpolation process is expected to result in better and more accurate interpolation methods.
This research introduces a Hough Transform-Based Interpolation (HTBI) method, which generates accurate dense pollution maps through finding sources' locations and the utilization of an air pollution dispersion model. The Hough Transform is a mathematical method, originated in image processing, used for detecting geometric shapes, like lines, circles or ellipses [12]. The main idea is converting from representing the shape in x,y coordinates (Cartesian) system to a parametric space, where the feature of interest is best represented. In this research, a feature space, which will represent best the source location is devised.
The method consists of two phases: at first, based on ambient concentration and assuming a dispersion model, the HTBI detects the sources' emission rates and locations. Then, using this information, the interpolation scheme builds the continuous pollution field. The suggested HTBI scheme applies no constraint on the assumed dispersion model. Hence, any dispersion model found in the literature (e.g., [13]-[15]), as sophisticated as it may be, can be incorporated into the suggested scheme.

Notation
The It is worthwhile noting that the discussion here is limited to a single pollutant interpolation, i.e., the generation of a dense map of the specific pollutant is based on a set of sparse measurements of the same pollutant.

Interpolation Scheme
Each sample ∈ { }, represents a measurement in . W.L.O.G, if we order { }, and { } is sorted so is the emission rate of source ; is a weighted combination of the contributions from all the sources, ��⃗ . Assuming a dispersion model, , so the k th element of the vector ����⃗ is the decay coefficient of source k, , in location i; sensor i's measurement, , is all sources contributions at i and is given by: Consequentially, forming the set { } as a vector, all sensors' measurements can be represented by the following matrices multiplication: Given [ ], we assume that there exists a matrix , which satisfies: For finding and , a search on the entire Ω is suggested. To this end, Ω is divided into disjoint catchments. We assume that each catchment, ⊆ Ω is small enough so the pollution is uniform all over it. For each of the catchments an estimated emission rate � is calculated, based on accepted measurements from single sample ; where is a single row of : Thus, � introduces the estimated emission rate from the single source , had it was located at , based on the single measured sample at . The same process is applied for all for each of the sensors: Applying Equation (5), results in each having its unique set of � �⃗ , one estimate for each sensor. Using the standard deviation (STD) of the estimates, the catchment with the lowest STD is the approximated location of . Once the source location, , is obtained, the emission rate of is estimated by the average of the catchment's estimates: Having the estimated emission rate , of the source , and its estimated location, , with the dispersion model , we can now estimate the dense pollution map over Ω: The process is illustrated in the simple example of Fig. 1, where three sensors are deployed in a region with one source (see Fig. 1a). While the catchments can assume any geographical region and shape, for the sake of simplicity, the region, Ω, is divided into squared catchments, forming a squared grid. Sensor 1, which measures a pollution level of 33 (i.e. 1 = 33), is located at catchment (1,3); Sensor 2, located at (2,4), measures 29; and Sensor 3, at (3,3) measures a level of 30. Keeping in mind the source's location, , is unknown, Fig. 1b demonstrates the execution of Equation (4), where each catchment is assigned with the estimated source's emission rate if the source was located in this catchment, given Sensor 1's measurement, and an exponential isotropic decay dispersion model, with an extinction coefficient . I.e., for , the Cartesian distance from the source, the pollution level at each location on the map is given by [16]: If the source was located at (2,2), then the estimated emission rate, � , based on Sensor 1's measurement, should have been 38. If the source was located at (1,4), then � , according to Sensor 1, would be 47.3. Fig. 1c and Fig. 1d are the estimation maps, generated in the same fashion as b, for Sensor 2 and Sensor 3 respectively. Assuming the dense pollution maps are a collection of isolines, the estimated emission rate values of the three sensors should agree in one grid location [17]. To evaluate the agreement, we compute, in each , the three sensors' estimates' standard deviation. The lower the STD, the higher the agreement. This is illustrated in Fig. 1e. The smallest STD, indeed is obtained at location (1, 1), where, in this example the source is located. . 1. Source location identification through HTBI assuming simple radial dispersion model.

Dispersion Models
As mentioned earlier, represents the pollution decay function of the dispersion model. The suggested scheme, HTBI, does not apply any constraint on the dispersion model used. It can be any model, as long as it allows to compute the expected pollution on any given location on the map, given the emission rate and all other meteorological parameters required by the specific model in use. In this research, two models were used, the above isotropic decay dispersion model [16] (Equation (8)) , and the wellknown Gaussian Plume Dispersion (GPD) model [18]: Where x is the downwind, y is the crosswind and z is the vertical distances of from the source; � is the time-averaged wind speed at the hight of release H; and σy and σz represent the standard deviations of the crosswind and vertical Gaussian distribution of the pollutant concentration, respectively. The model also assumes full reflection from the ground.

Computational Simulation
For generating a continuous pollution field, the two types of dispersion models, described above were used. Specifications of the models are: =8 ton/hr; wind speed (for the GPD model):4 m/hr; wind direction (GPD model): 285°; effective stack-height: 120m.
The continuous fields were sampled by the set of sensors described in Fig. 2 . To simulate real conditions, additive white Gaussian noise with Signal to Noise Ratio (SNR) of 10% (10 dB) was added to the readings of the sensors. Each sensor is now reporting the ambient level in its location as derived from the dispersion model with noise added. See Table 1 for ambient data measured in each sensor, for the radial and the GPD models.
Using only the noisy readings obtained from the sensors, { }, the source's location is estimated and then the dense pollution maps are created. (1) (2) The results obtained for the radial dispersion model (Equation (8)) are displayed in Fig.  3. The highest ambient pollution level is located at the source location and exponentially decay as moving away. However, both IDW and Kriging models create a pollution map in which the maximum pollution level is obtained at the closest sensor to the sources' location, and decay as the distance from the source decreases ( Fig. 3(a) and (b) respectively). HTBI, on the other hand, find the accurate source's location and then computes the accurate dense pollution map (Fig. 3 (c)). The interpolation results for the GPD model are presents in Fig. 4. As both IDW and Kriging do not consider physicochemical characteristics nor atmospheric conditions, the maximum of the dense pollution maps is found at the closest sensor downwind from the sources' location ( Fig. 4(a) and (b) respectively). Moreover, the created maps demonstrate a roughly radial dispersion around this point, which is not the true condition, due to the wind. HTBI, as presented in Fig. 4(c), does manage to create a dense spatial map which complies with the Gaussian plume behavior. This is attributed to the fact that the HTBI method does incorporates the Gaussian model, as it can incorporate any dispersion model. The suggested algorithm is deterministic in nature, i.e., for the same input, the system will produce the same output. Therefore, the uncertainty in the system stems from the uncertainty of the measurements, i.e., measurements noise [19]- [22]. The results of Fig. 3 and Fig. 4 were obtained at a noise level of 10%. (SNR, of 10dB). For evaluating the robustness of the algorithm, different noise levels were tested with the system. The radial model (Equation (8)) showed stability even with up to 50% errors (SNR of 3dB). The Gaussian model's (Equation (7)) robustness showed dependency on the catchments size. For larger catchment sizes (e.g., cell size of 40m 2 ), our algorithm showed stability up to 10% SNR. However, increasing the spatial resolution to a cell size of 20m 2 , the HTBI showed higher sensitivity to measurement noise and showed the correct source location and interpolation maps for noise levels of up to 5% (13 dB). For lower SNR values the algorithm faced difficulties in locating the source and consequentially generating the dense pollution maps.

Conclusions
IDW and Ordinary Kriging are well-known and commonly-used interpolation methods for creating dense spatial maps, however they are not considering the physicochemical properties of the pollution characteristics nor the source location, therefore not accurate for this task. In this research, we introduced the Hough Transform Based Interpolation (HTBI), a two-phase interpolation scheme, which addresses these limitations. At the first phase, the HTBI detects sources' locations and their estimated emission rate. Using this information, at the second phase, a dense pollution spatial map is built. The method incorporates an air-pollution dispersion model into its calculations. This may be any dispersion model that can be found in the literature. Comparing between the dense pollution maps created by the HTBI, IDW and Ordinary Kriging shows that the HTBI creates spatial maps, which represents the true pollution maps better and thus, is more accurate and sensible interpolation scheme. However, this work showed a computational simulation of a simple configuration with only one emission source. Implementing the method to a real-word situation is challenging. Air pollution emitted from many sources including industrial zones and transportation (line source). Hence, HTBI should be adjusted to face with this complex situation of multi sources detection. Despite the above, HTBI indeed can be used in its current form, a single source detection. We can imagine at least two scenarios in which such configuration applies. The first is indeed when a single source can be identified. For example, when considering SO2 which is emitted mainly from factories, and the study area contains only single industrial zone. The second is a case of leaks and we would like to identify the leak's source. In these cases, HTBI will be able to produce better and accurate spatial pollution maps than the existing methods. Current work, carried out these days, is focusing on the implementation of HTBI in exactly such scenarios. Another aspect this work sheds light on is the number of sensors and the way they scattered in the study area. It is obvious that the higher the number of sensors, the easier it will be to locate the source. There is a need for further research in finding the optimal number of sensors in a given area. The parameters that should be considered are the size of the study area, the characters of it (an open area is not the same as crowded urban area.), the coverage capacity and accuracy of the sensors and more.