On Distance Mapping from non-Euclidean Spaces to Euclidean Spaces

Most Machine Learning techniques traditionally rely on some form of Euclidean distance, computed in a Euclidean space (typically $\mathbb{R}^d$). In more general cases, data might not live in a classical Euclidean space, and it can be difficult (or impossible) to find a direct representation for it in $\mathbb{R}^d$. A mapping of distances from a non-Euclidean space to a canonical Euclidean space is therefore needed. We present in this paper a possible distance-mapping algorithm, such that the behavior of the pairwise distances in the mapped Euclidean space is preserved, compared to those in the original non-Euclidean space. Experimental results of the mapping algorithm are presented on a specific type of dataset made of timestamped GPS coordinates: we compare the original and mapped distances, and discuss the standard errors of the mapped distributions.


Introduction
Traditionally, most data mining and machine learning methods have relied on the classical Euclidean distance (the Minkowski distance with exponent 2), or variations thereof, over data that lies in $\mathbb{R}^d$ (with $d \in \mathbb{N}^+$). One problem with this approach is that it forces the data provider to process the original, raw data so that it lies in $\mathbb{R}^d$, even when this is not natural. For example, encoding arbitrary attributes (such as words from a specific, finite set) as integers creates an underlying order between the elements, which has to be handled carefully. It also imposes a particular distance between the elements, and the possible choices in the conversion process are practically unlimited and difficult to justify. We therefore investigate the possibility of not converting the data to a Euclidean space and instead retaining it in its original space, provided that we have a distance function between its elements. In effect, in this paper we "only" require metric spaces, over potentially non-Euclidean sets. Since most data mining and machine learning methods rely on Euclidean distances and their properties, we want to verify that, in such a case, the distances in a non-Euclidean metric space can behave close enough to Euclidean distances over a Euclidean space. The reasoning behind this is that traditional Machine Learning and Data Mining algorithms might expect the distances between elements to behave in a certain manner and to respect certain properties, which we attempt to mimic with the following distance-mapping approach.
A distance-mapping algorithm, in the context of this paper, is an approach that takes a set of objects as well as their pairwise distance function in the specific non-Euclidean space (so, a non-Euclidean metric space), and maps those pairwise distances to a canonical Euclidean space, in such a way that the distance distribution among the objects is approximately preserved in the mapped Euclidean space.
Distance-mapping algorithms are a useful tool in data applications such as data clustering and visualisation. Another good use case for distance mapping is mutual information [3,7,10], which in information theory quantifies the mutual dependence between two (or more) sets of random variables. Mutual information is most often estimated by constructing k-nearest-neighbor graphs [7,10] of the underlying data, which rely on Euclidean distances. Hence, when the data lies in a potentially non-Euclidean space, there is a strong need to re-calculate its distances so that they behave like Euclidean ones.
In section 2, we first introduce the basic notation used to describe the distance-mapping approach in sections 3, 4 and 5. After presenting the Extreme Learning Machine used for function estimation in section 6 and briefly discussing important implementation details in section 7, we finally present and discuss results over a synthetic data set in section 8.

Notations
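As in the data privacy literature, one traditionally defines a dataset of $N$ records by $X = [x_1, \dots, x_N]^T$, the matrix of $N$ samples (records) with $d$ attributes $A^{(1)}, \dots, A^{(d)}$. We denote by $a_i^{(j)}$ the value of attribute $A^{(j)}$ for record $x_i$, and by $X^{(j)}$ the set of all the possible values for a certain attribute $A^{(j)}$. Hence, we can see the vector $[a_1^{(j)}, \dots, a_N^{(j)}]^T$, whose entries belong to $X^{(j)}$, as a discrete random variable for a certain attribute over all the $N$ samples.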
Let us consider a metric space $\mathcal{X}^{(j)} = (X^{(j)}, d^{(j)})$ using the set $X^{(j)}$ defined above, endowed with the distance function $d^{(j)}: X^{(j)} \times X^{(j)} \to \mathbb{R}^+$. In general, $\mathcal{X}^{(j)}$ need not be a Euclidean metric space.
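Since the approach only requires $d^{(j)}$ to be a metric, it can be useful to spot-check the metric axioms on a finite sample of $X^{(j)}$ before mapping. The following is a minimal sketch of our own, not part of the method described here; the function name `check_metric_axioms` and the tolerance are hypothetical. Such a check can only falsify the axioms on the sample, never prove them for the whole set.

```python
import itertools
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def check_metric_axioms(points: Sequence[T], d: Callable[[T, T], float],
                        tol: float = 1e-9) -> bool:
    """Return False as soon as d violates a metric axiom on the sample."""
    for x in points:
        if abs(d(x, x)) > tol:                       # d(x, x) = 0
            return False
    for x, y in itertools.combinations(points, 2):
        dxy = d(x, y)
        if dxy < -tol:                               # non-negativity
            return False
        if abs(dxy - d(y, x)) > tol:                 # symmetry
            return False
        if x != y and dxy <= tol:                    # d(x, y) > 0 for x != y
            return False
    for x, y, z in itertools.permutations(points, 3):
        if d(x, z) > d(x, y) + d(y, z) + tol:        # triangle inequality
            return False
    return True


# Usage: the discrete metric on words is a valid (non-Euclidean) metric,
# while the squared difference on numbers breaks the triangle inequality.
words = ["red", "green", "blue"]
print(check_metric_axioms(words, lambda a, b: 0.0 if a == b else 1.0))  # True
print(check_metric_axioms([0.0, 1.0, 2.0], lambda a, b: (a - b) ** 2))  # False
```

The discrete metric is the kind of distance one might put on the finite word sets mentioned in the introduction, without imposing any order on them.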

Distances over non-Euclidean spaces
Now we consider two metric spaces $\mathcal{X}^{(i)} = (X^{(i)}, d^{(i)})$ and $\mathcal{X}^{(j)} = (X^{(j)}, d^{(j)})$. Let us assume $\mathcal{X}^{(i)}$ to be a canonical Euclidean space, with $X^{(i)} = \mathbb{R}^d$ and $d^{(i)}$ the distance induced by the Euclidean norm, while $\mathcal{X}^{(j)}$ is a non-Euclidean space endowed with a non-Euclidean distance function $d^{(j)}$. Assume $x^{(j)}$ and $y^{(j)}$ are two sets of discrete, independent and identically distributed (iid) random variables for a certain attribute over $X^{(j)}$. The distances within the metric space $\mathcal{X}^{(j)}$ can then be described by another set of random variables $z^{(j)}$:
$$z^{(j)} = d^{(j)}\left(x^{(j)}, y^{(j)}\right), \tag{1}$$
where the values of $z^{(j)}$ are in $\mathbb{R}^+$.
We denote by $f_{z^{(j)}}(d)$ the probability density function (PDF) of $z^{(j)}$, which describes the pairwise distance distribution over the non-Euclidean metric space $\mathcal{X}^{(j)}$. In the same way, we define $f_{z^{(i)}}(d)$ to be the distribution of pairwise distances over the Euclidean metric space $\mathcal{X}^{(i)}$.
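In practice, $f_{z^{(j)}}$ is only available through samples. A minimal sketch, assuming that the realisations of $z^{(j)}$ are taken over all distinct pairs of the $N$ records (our assumption; the text speaks of iid draws of $x^{(j)}$ and $y^{(j)}$), collects the $N(N-1)/2$ distances and evaluates their empirical CDF. The function names are hypothetical.

```python
import bisect
import itertools
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def pairwise_distances(points: Sequence[T], d: Callable[[T, T], float]) -> List[float]:
    """Sorted realisations of z = d(x, y) over all N(N-1)/2 distinct pairs."""
    return sorted(d(x, y) for x, y in itertools.combinations(points, 2))


def empirical_cdf(sorted_z: Sequence[float], t: float) -> float:
    """Fraction of observed distances <= t, an estimate of F_{z^(j)}(t)."""
    return bisect.bisect_right(sorted_z, t) / len(sorted_z)


# Usage with a toy metric space: integers under d(x, y) = |x - y|.
z = pairwise_distances([0, 1, 3], lambda a, b: abs(a - b))
print(z)                      # [1, 2, 3]
print(empirical_cdf(z, 2))    # 0.6666666666666666
```

The quadratic number of pairs is what makes the later integration and fitting steps expensive for large $N$.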
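We assume that there is a way to transform the distribution $f_{z^{(j)}}(d)$ of the non-Euclidean distances into a mapped distribution $f^{\mathrm{map}}_{z^{(j)}}(d)$, in such a way that $f^{\mathrm{map}}_{z^{(j)}}(d)$ is as close as possible to the Euclidean distance distribution $f_{z^{(i)}}(d)$:
$$\lim_{N \to \infty} f^{\mathrm{map}}_{z^{(j)}}(d) = f_{z^{(i)}}(d), \quad \forall d \in \mathbb{R}^+, \tag{2}$$
meaning that, in the limit case, the two probability density functions are equal at every evaluated point.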
As we use a limited number $N$ of realisations of the random variables to estimate the distribution $f_{z^{(j)}}(d)$, the limit over $N$ rests on the assumption that we can "afford" to draw a sufficiently large number $N$ of samples for the estimate of $f_{z^{(j)}}(d)$ to be close enough to $f_{z^{(i)}}(d)$. In section 4, we present the mapping approach used in this paper, which solves an integral equation so as to obtain equal probability masses.

Mapping solution
We propose to use Machine Learning (more specifically, Universal Function Approximators [4]) to map the distribution $f_{z^{(j)}}$ of the non-Euclidean distances to the distribution $f_{z^{(i)}}$ of the Euclidean distances, since most Machine Learning techniques are able to fit a mapping from a continuous input to a different continuous output.
Then, given a certain distance $z = d^{(j)}(x, y)$ obtained over $X^{(j)}$, we calculate $\alpha$ such that
$$\int_0^{z} f_{z^{(j)}}(t)\,dt = \int_0^{\alpha} f_{z^{(i)}}(t)\,dt. \tag{3}$$
That is, we want to obtain $\alpha$ values so that the probability masses of the distances in the non-Euclidean metric space $\mathcal{X}^{(j)}$ and in the Euclidean metric space $\mathcal{X}^{(i)}$ are the same. The value $\alpha$ is the mapped distance over $\mathcal{X}^{(i)}$. To obtain $\alpha$, we first calculate the integral on the left-hand side of Eq. 3 for the given $z$; second, we express the integral on the right-hand side of Eq. 3 as a function of $\alpha$; $\alpha$ can then be solved for with Eq. 3.
Section 5 below describes, in practice, the algorithm that achieves the proposed distance mapping.
To calculate the integral on the left-hand side of Eq. 3, we first need to construct the distribution function $f_{z^{(j)}}$, using Machine Learning for functional estimation. The algorithm for distance mapping is as follows:
A.1 Draw as many samples as possible from $x^{(j)}$ and $y^{(j)}$ (random variables over $X^{(j)}$);
A.2 Compute the corresponding distances $z^{(j)} = d^{(j)}(x^{(j)}, y^{(j)})$;
A.3 Build a histogram of these distances;
A.4 Use a Machine Learning algorithm to learn this histogram: this creates an un-normalized version $\tilde{f}_{z^{(j)}}(t)$ of $f_{z^{(j)}}(t)$;
A.5 Compute numerically the integral of $\tilde{f}_{z^{(j)}}(t)$ over its domain to obtain the normalizing constant
$$C^{(j)} = \int_0^{\infty} \tilde{f}_{z^{(j)}}(t)\,dt; \tag{4}$$
A.6 Normalize the estimated function from A.4 with the constant $C^{(j)}$;
A.7 This yields a functional representation $g^{(j)}(t) = \tilde{f}_{z^{(j)}}(t)/C^{(j)}$ of $f_{z^{(j)}}(t)$ that behaves as an estimate of the PDF of $z^{(j)}$;
A.8 Finally, integrate $g^{(j)}(t)$ from 0 to $z$ (the given distance value) to obtain a value denoted $\beta$; this is also done numerically:
$$\beta = \int_0^{z} g^{(j)}(t)\,dt; \tag{5}$$
A.9 Denote by $F_{z^{(i)}}(\alpha)$ the cumulative distribution function (CDF) of the Euclidean distances $\alpha = d^{(i)}(x, y)$. Solving Eq. 3 now becomes
$$F_{z^{(i)}}(\alpha) = \beta \iff \alpha = F^{-1}_{z^{(i)}}(\beta), \tag{6}$$
where $F^{-1}_{z^{(i)}}$ is the inverse of the CDF in the mapped Euclidean space $\mathcal{X}^{(i)}$. The distances are then mapped from $z$ to $\alpha$.
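The steps above (histogram, learned density, normalisation, integration up to $z$, inversion of the target CDF) can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' implementation: the Machine Learning model of step A.4 is replaced by a piecewise-linear interpolation of the histogram, the integrals of Eqs. 4 and 5 use the trapezoidal rule on $[0, \max z]$, and the inverse target CDF $F^{-1}_{z^{(i)}}$ is passed in as a function. All names are hypothetical.

```python
import bisect
import itertools
import math
from typing import Callable, List, Sequence, Tuple


def histogram(z: Sequence[float], bins: int) -> Tuple[List[float], List[float]]:
    """Bin centres and counts of the distances on [0, max(z)] (max(z) > 0)."""
    width = max(z) / bins
    counts = [0.0] * bins
    for v in z:
        counts[min(int(v / width), bins - 1)] += 1.0
    return [(k + 0.5) * width for k in range(bins)], counts


def interpolator(xs: List[float], ys: List[float]) -> Callable[[float], float]:
    """A.4 stand-in (not the paper's ML model): piecewise-linear curve through
    the histogram, held constant before the first and after the last centre."""
    def f(t: float) -> float:
        if t <= xs[0]:
            return ys[0]
        if t >= xs[-1]:
            return ys[-1]
        k = bisect.bisect_right(xs, t) - 1
        w = (t - xs[k]) / (xs[k + 1] - xs[k])
        return ys[k] + w * (ys[k + 1] - ys[k])
    return f


def trapezoid(f: Callable[[float], float], a: float, b: float, n: int = 2000) -> float:
    """Composite trapezoidal rule on [a, b]; 0 for an empty interval."""
    if b <= a:
        return 0.0
    h = (b - a) / n
    return h * (0.5 * (f(a) + f(b)) + sum(f(a + k * h) for k in range(1, n)))


def map_distance(z: float, z_samples: Sequence[float],
                 inv_target_cdf: Callable[[float], float], bins: int = 20) -> float:
    """Map one non-Euclidean distance z to alpha."""
    top = max(z_samples)
    f_tilde = interpolator(*histogram(z_samples, bins))   # histogram + A.4
    c_j = trapezoid(f_tilde, 0.0, top)                    # A.5, Eq. (4) on [0, max z]

    def g(t: float) -> float:                              # A.6-A.7: g^(j)
        return f_tilde(t) / c_j

    beta = trapezoid(g, 0.0, min(z, top))                 # A.8, Eq. (5)
    beta = min(max(beta, 0.0), 1.0)                       # guard against round-off
    return inv_target_cdf(beta)                           # A.9, Eq. (6)


# Usage: 201 evenly spaced points on [0, 1] with d(x, y) = |x - y|, mapped onto
# the distances of two iid U(0, 1) variables (inverse CDF 1 - sqrt(1 - beta)).
pts = [k / 200 for k in range(201)]
z_obs = [abs(x - y) for x, y in itertools.combinations(pts, 2)]
print(map_distance(0.5, z_obs, lambda b: 1.0 - math.sqrt(1.0 - b)))
```

Because the source distances here already follow the target distribution, the mapping should stay close to the identity, which is a convenient sanity check for any replacement of the interpolation stand-in by a trained model.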
Note that this algorithm is independent of the nature of $\mathcal{X}^{(i)}$: at this point, $\mathcal{X}^{(i)}$ can be any metric space. In the following, we look at the two possibilities of mapping a non-Euclidean space to a Euclidean space or, for completeness' sake, to another non-Euclidean space.
So, we are presented with two possibilities:
- $\mathcal{X}^{(i)}$ is the canonical Euclidean space, i.e. $X^{(i)} = \mathbb{R}$ and $d^{(i)}$ is the Euclidean distance over $\mathbb{R}$;
- $\mathcal{X}^{(i)}$ is not the canonical Euclidean space: the set of all necessary values $X^{(i)}$ does not have to be $\mathbb{R}$, while $d^{(i)}$ is the Euclidean distance.

First case
In the case of $\mathcal{X}^{(i)}$ being the canonical Euclidean space, we can find analytical expressions for $f_{z^{(i)}}$ in Eq. 3 by making assumptions on how the variables $x^{(i)}$ and $y^{(i)}$ are distributed. If such assumptions are not acceptable for some reason, it is always possible to revert to the estimation approach described above, or possibly to solve analytically, as below, for other well-known distributions.
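If $x^{(i)}$ and $y^{(i)}$ are normally distributed
We assume that $x^{(i)}$ and $y^{(i)}$ are iid and follow a normal distribution $\mathcal{N}(\mu, \sigma^2)$ with mean $\mu$ and variance $\sigma^2$. Their difference $x^{(i)} - y^{(i)}$ is then distributed as $\mathcal{N}(0, 2\sigma^2)$, so $z^{(i)} = d(x^{(i)}, y^{(i)}) = |x^{(i)} - y^{(i)}|$ follows a folded normal distribution with underlying mean 0 and variance $2\sigma^2$ (i.e. a half-normal distribution): $z^{(i)} \sim \mathcal{N}_f(0, 2\sigma^2)$. The probability density function of $z^{(i)}$ can then be written as
$$f_{z^{(i)}}(d) = \frac{1}{\sigma\sqrt{\pi}} \exp\left(-\frac{d^2}{4\sigma^2}\right), \quad d \geq 0. \tag{7}$$
Its CDF follows as
$$F_{z^{(i)}}(\alpha) = \operatorname{erf}\left(\frac{\alpha}{2\sigma}\right). \tag{8}$$
Finally, with $\beta$ calculated in Eq. 5, we can easily solve
$$\alpha = 2\sigma \operatorname{erf}^{-1}(\beta). \tag{9}$$
If $x^{(i)}$ and $y^{(i)}$ are uniformly distributed
We now assume that $x^{(i)}$ and $y^{(i)}$ are iid and follow a uniform distribution $\mathcal{U}(a, b)$, with $a < b$. The probability density of the distances $z^{(i)}$ is then triangular:
$$f_{z^{(i)}}(d) = \frac{2(b - a - d)}{(b - a)^2}, \quad 0 \leq d \leq b - a, \tag{10}$$
which means that
$$F_{z^{(i)}}(\alpha) = \frac{2(b - a)\alpha - \alpha^2}{(b - a)^2}, \quad 0 \leq \alpha \leq b - a. \tag{11}$$
Solving $F_{z^{(i)}}(\alpha) = \beta$ as before, we finally have
$$\alpha = (b - a)\left(1 - \sqrt{1 - \beta}\right), \tag{12}$$
given that the other root, $(b - a)(1 + \sqrt{1 - \beta})$, is not acceptable in our case: it would make $b - a - \alpha$ negative, i.e. it lies outside the support $[0, b - a]$ of the distances.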
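The closed-form CDFs and inverses of the normal and uniform cases can be checked numerically. A minimal sketch using only the Python standard library: since the standard library provides `math.erf` but no inverse error function, `erfinv` is obtained here from the standard normal quantile `statistics.NormalDist().inv_cdf`, using $\operatorname{erf}(x) = 2\Phi(x\sqrt{2}) - 1$. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def erfinv(y: float) -> float:
    """Inverse error function for -1 < y < 1, via erf(x) = 2*Phi(x*sqrt(2)) - 1."""
    return NormalDist().inv_cdf((1.0 + y) / 2.0) / math.sqrt(2.0)


def alpha_normal(beta: float, sigma: float) -> float:
    """Normal case: alpha = 2 sigma erfinv(beta); needs 0 <= beta < 1."""
    return 2.0 * sigma * erfinv(beta)


def cdf_uniform(alpha: float, a: float, b: float) -> float:
    """Uniform case: CDF of |x - y| for x, y iid U(a, b), 0 <= alpha <= b - a."""
    span = b - a
    return (2.0 * span * alpha - alpha * alpha) / (span * span)


def alpha_uniform(beta: float, a: float, b: float) -> float:
    """Uniform case: the root of cdf_uniform = beta lying in [0, b - a]."""
    return (b - a) * (1.0 - math.sqrt(1.0 - beta))


# Round trip: feeding alpha back into the corresponding CDF recovers beta.
for beta in (0.1, 0.5, 0.9):
    a_n = alpha_normal(beta, sigma=1.5)
    a_u = alpha_uniform(beta, a=2.0, b=5.0)
    print(beta, math.erf(a_n / (2 * 1.5)), cdf_uniform(a_u, 2.0, 5.0))
```

The round trip only confirms that each inverse matches its CDF; whether the normal or uniform assumption fits a given dataset is a separate question.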

If the distances $|x^{(i)} - y^{(i)}|$ are Rayleigh distributed
A Rayleigh distribution arises, for instance, when a distance is the magnitude of a two-dimensional vector whose orthogonal components are independent, zero-mean normal variables with equal variance. If we assume that the distances $z^{(i)} = |x^{(i)} - y^{(i)}|$ follow a Rayleigh distribution with scale parameter $\sigma$, then
$$f_{z^{(i)}}(d) = \frac{d}{\sigma^2} \exp\left(-\frac{d^2}{2\sigma^2}\right), \quad d \geq 0. \tag{13}$$
The cumulative distribution then becomes
$$F_{z^{(i)}}(\alpha) = 1 - \exp\left(-\frac{\alpha^2}{2\sigma^2}\right), \tag{14}$$
and the mapped distances $\alpha$ can be solved for as
$$\alpha = \sigma\sqrt{-2\ln(1 - \beta)}. \tag{15}$$
Other distributions
We have discussed above the most common distributions of the distances $z^{(i)}$ in the canonical Euclidean space $\mathcal{X}^{(i)}$. In specific circumstances, the distances $z^{(i)}$ may of course follow other, less common distributions, which our implementation does not yet cover. However, for the datasets used in practice in this work, which consist of timestamped GPS coordinates in the form of latitudes and longitudes, the typical distributions discussed above are sufficient to illustrate the mapped distributions in the canonical Euclidean space, as discussed in section 8.
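The Rayleigh case can be sketched the same way. The Monte Carlo function below also illustrates the statement that the magnitude of a two-dimensional vector with independent $\mathcal{N}(0, \sigma^2)$ components is Rayleigh distributed; the function names, sample size and seed are arbitrary assumptions.

```python
import math
import random


def rayleigh_cdf(alpha: float, sigma: float) -> float:
    """Rayleigh CDF: F(alpha) = 1 - exp(-alpha^2 / (2 sigma^2)), alpha >= 0."""
    return 1.0 - math.exp(-alpha * alpha / (2.0 * sigma * sigma))


def alpha_rayleigh(beta: float, sigma: float) -> float:
    """Inverse CDF: sigma * sqrt(-2 ln(1 - beta)); log1p(-beta) = ln(1 - beta)."""
    return sigma * math.sqrt(-2.0 * math.log1p(-beta))


def fraction_below(beta: float, sigma: float, n: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo: share of |(g1, g2)|, g_k iid N(0, sigma^2), not exceeding
    alpha_rayleigh(beta, sigma); it should be close to beta."""
    rng = random.Random(seed)
    a = alpha_rayleigh(beta, sigma)
    hits = sum(math.hypot(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma)) <= a
               for _ in range(n))
    return hits / n


print(fraction_below(0.5, sigma=2.0))
```

Using `math.log1p` keeps the inverse accurate for small $\beta$, where $1 - \beta$ is close to 1.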

Second case
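In the second case, where $\mathcal{X}^{(i)}$ is not the canonical Euclidean space, we essentially have to perform the same estimate of $f_{z^{(i)}}$, and of its integral from 0 up to $\alpha$, as we did for $f_{z^{(j)}}$. The result is another function $G(\alpha)$:
$$G(\alpha) = \int_0^{\alpha} g^{(i)}(t)\,dt, \tag{16}$$
where $g^{(i)}$ denotes the normalized estimate of $f_{z^{(i)}}$. We then have to solve numerically for $\alpha$ in
$$G(\alpha) = \beta. \tag{17}$$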
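Because $G$ is non-decreasing, the equation $G(\alpha) = \beta$ can be solved with any one-dimensional root finder. A minimal sketch, assuming bisection (our choice; the text does not specify the solver) and a trapezoidal estimate of $G$, with $g^{(i)}$ given as a Python function. Names are hypothetical.

```python
import math
from typing import Callable


def cumulative(g: Callable[[float], float], alpha: float, n: int = 2000) -> float:
    """G(alpha) = integral of g over [0, alpha], composite trapezoidal rule."""
    if alpha <= 0.0:
        return 0.0
    h = alpha / n
    return h * (0.5 * (g(0.0) + g(alpha)) + sum(g(k * h) for k in range(1, n)))


def solve_alpha(g: Callable[[float], float], beta: float, upper: float,
                tol: float = 1e-10, max_iter: int = 200) -> float:
    """Bisection for G(alpha) = beta on [0, upper]. G is non-decreasing
    because g >= 0; requires G(upper) >= beta."""
    lo, hi = 0.0, upper
    if cumulative(g, hi) < beta:
        raise ValueError("G(upper) < beta: increase the upper bound")
    for _ in range(max_iter):
        if hi - lo < tol:
            break
        mid = 0.5 * (lo + hi)
        if cumulative(g, mid) < beta:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Check with a density whose answer is known in closed form (Rayleigh, sigma = 1):
# G(alpha) = 0.5 gives alpha = sqrt(2 ln 2).
rayleigh_pdf = lambda t: t * math.exp(-t * t / 2.0)
print(solve_alpha(rayleigh_pdf, 0.5, upper=10.0), math.sqrt(2.0 * math.log(2.0)))
```

In the second case, `g` would be the learned, normalized estimate of $f_{z^{(i)}}$; a known density is used here only so the result can be checked.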

Using ELM to learn the functional distribution
We propose to use Extreme Learning Machines (ELM) [5,6] as the mapping tool between distance functions. We chose this specific Machine Learning technique for its very good ratio of performance to computational time. The model is simple and involves a minimal amount of computation. Since a good estimate of the distribution $f_{z^{(j)}}(d)$ (in Eq. 2) requires a large number of records $N$, ELM is well suited: it can learn the mapping in reasonable time for large amounts of data, if such a need arises. ELM is a universal function approximator, which can approximate any continuous function to arbitrary accuracy given enough hidden neurons. The ELM algorithm was originally proposed by Guang-Bin Huang et al. in [6], further developed, e.g., in [12,9,8], and analysed in [2]. It uses the structure of a Single-Layer Feed-forward Neural Network (SLFN) [1]. The main concept behind the ELM approach is the random initialization of the hidden layer, instead of a computationally costly training procedure for it. The output weight matrix is then obtained by solving a linear system between the hidden representation of the inputs and the outputs.
It works as follows. Consider a set of $N$ distinct observations $(x_i, y_i)$, with $x_i \in \mathbb{R}^d$, $y_i \in \mathbb{R}^c$, and $i = 1, \dots, N$. If the SLFN with $n$ hidden neurons approximated the data perfectly, the errors between the estimated outputs $\hat{y}_i$ and the actual outputs $y_i$ would be zero: $\hat{y}_i = y_i$. The relation between inputs, weights and outputs is then
$$\sum_{j=1}^{n} \beta_j\, \phi\left(w_j \cdot x_i + b_j\right) = y_i, \quad i = 1, \dots, N, \tag{18}$$
where $\phi: \mathbb{R} \to \mathbb{R}$ is the activation function of the hidden neurons; $w_j \in \mathbb{R}^d$ are the input weights; $b_j$ are the biases; and $\beta_j \in \mathbb{R}^c$ are the output weights (not to be confused with the probability mass $\beta$ of Eq. 5). Eq. 18 can also be written compactly as
$$H\beta = Y, \tag{19}$$
with $H_{ij} = \phi(w_j \cdot x_i + b_j)$ the $N \times n$ hidden-layer matrix, $\beta = (\beta_1^T, \dots, \beta_n^T)^T$, and $Y = (y_1^T, \dots, y_N^T)^T$. The output weights $\beta$ can be solved from the hidden-layer representation of the inputs $H$ and the actual outputs $Y$:
$$\beta = H^{\dagger} Y, \tag{20}$$
where $H^{\dagger}$ is the Moore-Penrose generalised inverse [11] of the matrix $H$. ELM training does not require iterations, so the most computationally costly part is the calculation of the pseudo-inverse of $H$. This makes ELM an extremely fast Machine Learning method. Thus, we propose to use ELM to learn the distribution of the pairwise distances over non-Euclidean spaces, $f_{z^{(j)}}(t)$ or $F_{z^{(j)}}(t)$.
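Eq. 18 and its pseudo-inverse solution translate almost directly into NumPy. The following is a minimal sketch rather than the authors' implementation: it assumes NumPy is available, a `tanh` activation, standard normal input weights and biases, and hypothetical class and method names. The usage fits a one-dimensional CDF-shaped curve, the kind of target used for $F_{z^{(j)}}(t)$.

```python
import numpy as np


class ELM:
    """Minimal Extreme Learning Machine: random hidden layer, output
    weights from the Moore-Penrose pseudo-inverse."""

    def __init__(self, n_hidden: int = 50, seed: int = 0) -> None:
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X: np.ndarray) -> np.ndarray:
        # H[i, j] = phi(w_j . x_i + b_j), with phi = tanh (an assumed choice).
        return np.tanh(X @ self.W + self.b)

    def fit(self, X: np.ndarray, Y: np.ndarray) -> "ELM":
        """X has shape (N, d) and Y has shape (N, c)."""
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))  # w_j, random
        self.b = self.rng.normal(size=self.n_hidden)                # b_j, random
        H = self._hidden(X)                                         # N x n
        self.beta = np.linalg.pinv(H) @ Y                           # beta = H^+ Y
        return self

    def predict(self, X: np.ndarray) -> np.ndarray:
        return self._hidden(X) @ self.beta                          # H beta


# Usage: learn a CDF-shaped curve (here the Rayleigh CDF with sigma = 1).
t = np.linspace(0.0, 4.0, 200)[:, None]
y = 1.0 - np.exp(-t ** 2 / 2.0)
model = ELM(n_hidden=50, seed=0).fit(t, y)
print(float(np.max(np.abs(model.predict(t) - y))))
```

Training is a single pseudo-inverse (computed by `np.linalg.pinv` through a singular value decomposition of $H$), which is why refitting for a new dataset is cheap compared with iterative training.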

Implementation improvement
When the mapping solution is implemented straightforwardly as in section 5, the algorithm spends most of its CPU time on the numerical integration of $f_{z^{(j)}}(t)$ over the distances $z^{(j)}$, as in Eq. 4. This is because the number of pairwise distances $z^{(j)}$ is $N(N-1)/2$, which grows quadratically with the data size $N$. We therefore avoided the integration by using Machine Learning to learn the CDF $F_{z^{(j)}}$, instead of the PDF $f_{z^{(j)}}$, in step A.4. This yields a functional representation of $F_{z^{(j)}}(t)$, with the normalisation constant obtained directly from $F_{z^{(j)}}$. $\beta$ can then be obtained straight from $F_{z^{(j)}}(z)$.
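Learning the CDF instead of the PDF removes the integration step. A minimal sketch, assuming the learned model can be replaced for illustration by a piecewise-linear interpolation of the empirical CDF tabulated on a uniform grid (the text uses ELM here): sorting the $M = N(N-1)/2$ distances once costs $O(M \log M)$, after which each $\beta = F_{z^{(j)}}(z)$ is a direct evaluation. The function name and grid size are hypothetical.

```python
import bisect
from typing import Callable, Sequence


def cdf_representation(z_samples: Sequence[float], n_grid: int = 100) -> Callable[[float], float]:
    """Tabulate the empirical CDF of the distances on a uniform grid over
    [0, max(z)] and return a piecewise-linear function F with F(max z) = 1.
    The interpolation is a stand-in for the learned model of F_{z^(j)}."""
    zs = sorted(z_samples)                      # O(M log M), M = N(N-1)/2
    top = zs[-1]                                # assumed > 0
    grid = [top * k / n_grid for k in range(n_grid + 1)]
    values = [bisect.bisect_right(zs, t) / len(zs) for t in grid]  # already normalised

    def F(t: float) -> float:
        if t < 0.0:
            return 0.0
        if t >= top:
            return 1.0
        k = min(int(t / top * n_grid), n_grid - 1)
        w = (t - grid[k]) / (grid[k + 1] - grid[k])
        return values[k] + w * (values[k + 1] - values[k])

    return F


# beta for a given distance z is now a direct evaluation: beta = F(z).
F = cdf_representation([1.0, 2.0, 3.0, 4.0])
print(F(2.0), F(4.0))   # 0.5 1.0
```

Since an empirical CDF ends at 1 by construction, no separate normalising constant has to be integrated.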
The second most CPU-consuming step in this algorithm is finding the distribution $f_{z^{(i)}}(t)$ (or $F_{z^{(i)}}(t)$) in the Euclidean space $\mathcal{X}^{(i)}$ that best describes the data. To choose whether the non-Euclidean distribution should be mapped to a Normal, Uniform, Rayleigh, or other distribution, we have to fit $F_{z^{(j)}}(t)$ to each of these well-defined canonical Euclidean distance distributions, optimise their parameters, and select the distribution with the smallest error.
Again, if we used the pairwise distances $z^{(j)}$ in $F_{z^{(j)}}$ directly, the fitting would be very heavy, as we would be fitting $N(N-1)/2$ points. To make it tractable, we evaluate the functional representation of $F_{z^{(j)}}(t)$ at user-defined distances in a pre-defined domain, with the sole purpose of finding the best distribution and its parameters (i.e. the functional representation of $F_{z^{(i)}}(t)$). The mapped distance $\alpha$ can then be obtained from Eq. 6 with the calculated $\beta$ and the inverse of the functional representation of $F_{z^{(i)}}(t)$.
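The selection step can be sketched as a least-squares comparison on a user-defined grid. This sketch assumes a plain grid search over a single scale parameter per family (standing in for whatever optimiser is used in practice) and only the three families of the first case, using their closed-form CDFs; for the uniform case only the width $b - a$ matters. Names are hypothetical.

```python
import math
from typing import Callable, Dict, Sequence, Tuple


def uniform_cdf(t: float, span: float) -> float:
    """CDF of |x - y| for x, y iid U(a, b), with span = b - a."""
    if t <= 0.0:
        return 0.0
    if t >= span:
        return 1.0
    return (2.0 * span * t - t * t) / (span * span)


def halfnormal_cdf(t: float, sigma: float) -> float:
    """CDF of |x - y| for x, y iid N(mu, sigma^2): erf(t / (2 sigma))."""
    return math.erf(max(t, 0.0) / (2.0 * sigma))


def rayleigh_cdf(t: float, sigma: float) -> float:
    """Rayleigh CDF with scale sigma."""
    t = max(t, 0.0)
    return 1.0 - math.exp(-t * t / (2.0 * sigma * sigma))


CANDIDATES: Dict[str, Callable[[float, float], float]] = {
    "uniform": uniform_cdf, "normal": halfnormal_cdf, "rayleigh": rayleigh_cdf}


def best_fit(F_j: Callable[[float], float], grid: Sequence[float],
             params: Sequence[float]) -> Tuple[str, float, float]:
    """Return (family, parameter, squared error) of the candidate CDF closest
    to F_j on the user-defined grid, by exhaustive search over params."""
    target = [F_j(t) for t in grid]             # evaluated once, len(grid) points
    best = ("", math.nan, math.inf)
    for name, cdf in CANDIDATES.items():
        for p in params:
            sse = sum((cdf(t, p) - y) ** 2 for t, y in zip(grid, target))
            if sse < best[2]:
                best = (name, p, sse)
    return best


# Usage: a CDF that is in fact Rayleigh with sigma = 1.3 is recognised as such.
grid = [0.05 * k for k in range(1, 101)]
params = [0.1 * k for k in range(1, 51)]
print(best_fit(lambda t: rayleigh_cdf(t, 1.3), grid, params)[:2])
```

The cost depends only on the grid size and the number of candidate parameters, not on the $N(N-1)/2$ pairwise distances.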
In the following section 8, we present results over the typical data used for this work, GPS traces (latitude and longitude).

Experimental results
We have tested the proposed mapping algorithm by mapping the pairwise distances in a dataset of GPS coordinates (latitudes and longitudes), shown as the trajectory in Fig. 1. Assume we have a dataset $X = [x_1, \dots, x_N]^T$ describing the trajectory of one specific person, where the attributes of each record $x_i$ give the location, as latitude and longitude coordinates, at the corresponding time $t_i$.
Note that the metric space of the GPS coordinates $\mathcal{X}^{(gps)} = (X^{(gps)}, d^{(gps)})$ is non-Euclidean, because the distance $d^{(gps)}$ between two GPS coordinates (lat, lon) is the length of the shortest route between the two points on the Earth's surface, namely an arc of a great circle.
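The text does not specify how $d^{(gps)}$ is computed. One common choice, assumed here, is the haversine formula on a spherical Earth with a mean radius of 6371 km; an ellipsoidal model would give slightly different values. The function and constant names are hypothetical.

```python
import math
from typing import Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; assumes a spherical Earth


def great_circle_km(p: Tuple[float, float], q: Tuple[float, float]) -> float:
    """Haversine distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2.0 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))


# Equator to North Pole along a meridian: a quarter of a great circle.
print(great_circle_km((0.0, 0.0), (90.0, 0.0)))   # about 10007.5 km
```

The `min(1.0, h)` guard keeps `math.asin` in its domain when rounding pushes `h` slightly above 1 for nearly antipodal points.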
We first explore the limit condition on the number of records $N$ in Eq. 2: $N$ needs to be sufficiently large for the estimate of $f_{z^{(j)}}$ (or $F_{z^{(j)}}$) to be close enough to $f_{z^{(i)}}$ (or $F_{z^{(i)}}$). We test on experimental datasets with $N = 10, 30, 100, 1000$, in which each location record is randomly chosen along the trajectory shown in Fig. 1. Fig. 2 compares the CDF $F_{z^{(j)}}(d)$ of the pairwise distances obtained from $\mathcal{X}^{(gps)}$ with the CDF $F_{z^{(i)}}(d)$ of the mapped distances in the Euclidean space, for $N = 10, 30, 100, 1000$ in the four subplots respectively.
In this specific, simple case, with the small values $N = 10$ and $N = 30$, there are noticeable disagreements between the CDFs over $\mathcal{X}^{(gps)}$ and over the mapped Euclidean space. With the larger values $N = 100$ and $N = 1000$, however, the limit condition on $N$ is well satisfied: the non-Euclidean GPS metric behaves over its non-Euclidean space very closely to the mapped Euclidean metric over the mapped Euclidean space.
Thus, in this very simple case, $N = 100$ records are sufficient for the estimate of $f_{z^{(j)}}(d)$ to be close to $f_{z^{(i)}}(d)$. We also calculated the standard errors (SE) of the mapped distribution $f_{z^{(i)}}$, using a denser and broader range of $N$ values, from 5 to 5000, along the same route. Fig. 3 shows the SE of the mapped distribution as a function of $N$. As the latitude and longitude coordinates vary linearly in this simple case, the distances are mapped straightforwardly to a uniform distribution. The SE of the mapped $f_{z^{(i)}}(d)$ converges closely to 0 already at a small $N \simeq 50$.

Conclusion
We have developed and implemented a distance-mapping algorithm that maps the pairwise distances of a non-Euclidean space to a canonical Euclidean space, in such a way that the distributions of pairwise distances are approximately preserved. The mapping algorithm assumes that both spaces are actual metric spaces, and that the number of records $N$ is large enough for the estimate of the pairwise distance distribution $f_{z^{(j)}}$ (or $F_{z^{(j)}}$) to be close enough to $f_{z^{(i)}}$ (or $F_{z^{(i)}}$). We have tested our algorithm by illustrating the distance mapping of an experimental dataset of GPS coordinates. The condition on $N$ is discussed by comparing $F_{z^{(j)}}$ with its mapped $F_{z^{(i)}}$ for various values of $N$. The standard errors of the mapped distance distribution $F_{z^{(i)}}$ are also analyzed for various $N$.
Our distance-mapping algorithm currently handles the most common canonical Euclidean distance distributions. Less common distributions still need to be implemented, and might require specific adjustments. More diverse experimental examples are also needed for completeness.