Hyperdimensional Computing with Learnable Projection for User Adaptation Framework

Brain-inspired Hyperdimensional Computing (HDC), a machine learning (ML) model featuring high energy efficiency and fast adaptability, provides a promising solution to many real-world tasks on resource-limited devices. This paper introduces an HDC-based user adaptation framework, which requires efficient fine-tuning of HDC models to boost accuracy. Specifically, we propose two techniques for HDC: the learnable projection and the fusion mechanism for the Associative Memory (AM). Compared with the user adaptation framework based on the original HDC, our proposed framework shows accuracy improvements of 4.8% and 3.5% on two benchmark datasets, the ISOLET dataset and the UCIHAR dataset, respectively.


Introduction
With the emergence of the Internet of Things (IoT), vast amounts of data are generated by embedded devices [1]. Many IoT applications collect and analyze those data to train machine learning (ML) models in cloud servers and deploy the trained models back to devices for inference. However, the performance of deployed models is sensitive to misaligned data distributions caused by subject differences [2] and time-varying properties [3,4], as depicted in Fig. 1(a). To compensate for the potential performance degradation, a user adaptation framework is required to boost model performance by dynamically adapting models to user data [5-7]. The authors of [2] proposed a cloud-based model adaptation framework that requires edge clients to upload data for retraining. However, wirelessly transmitted data poses a threat to users' privacy [8] and introduces extra energy consumption. Therefore, an on-device adaptation framework is preferable [9,10], but it requires an ML model with high energy efficiency and fast adaptability due to the limited resources on embedded devices. Recently, brain-inspired Hyperdimensional Computing (HDC) has emerged as a lightweight alternative to high-complexity ML models. HDC emulates patterns of neural activities in human brains by projecting data into high-dimensional (HD) vectors, called Hypervectors (HVs) [11]. By exploiting the mathematical properties of HVs in the HD space, HDC has shown high energy efficiency and fast adaptability in a wide variety of real-world applications, such as image classification [12,13], speech recognition [14], and bio-signal processing [15]. These advantages make HDC suitable for an on-device user adaptation framework, as depicted in Fig. 1(b).
However, we argue that the original HDC algorithm ignores significant correlations between features during projection, leading to suboptimal accuracy. To close this gap, we enable HDC to learn the feature correlations of input data to improve its overall performance. In the cloud, we first transform the original processing flow of HDC into a learnable network, called learnable HDC (L-HDC). L-HDC explicitly emulates the fundamental operations of HDC and learns the feature correlations of input data by backpropagation. After training, we transform L-HDC back to the original HDC, while the weights of L-HDC are kept in the original HDC to perform feature-aware projection. Given its high energy efficiency and fast adaptability, the resulting HDC model can efficiently adapt to user data on resource-limited devices. We evaluate the effectiveness of our proposed framework on two benchmark datasets: the speech recognition dataset ISOLET [16] and the human activity recognition dataset UCIHAR [17]. Based on our simulation results under the settings of a user adaptation framework, our proposed HDC with learnable projection outperforms that with the original projection by 4.8% and 3.5% on ISOLET and UCIHAR, respectively. To the best of our knowledge, this paper is the first to apply HDC with learnable projection to a user adaptation framework to improve classification accuracy.
The rest of the paper is organized as follows. Section 2 describes the algorithm of HDC. Section 3 illustrates our proposed HDC with learnable projection for the user adaptation framework. Experimental settings and simulation results are shown in Section 4. Finally, we conclude this paper in Section 5.

Hyperdimensional Computing
HDC operates with randomly generated bipolar HVs whose components are -1 or 1 with equal probability. The processing flow of HDC is shown in Fig. 2 and includes the following steps.
Projection into HD Space. The first step of HDC is to project features into HVs. A feature comprises two parts, i.e., its feature identifier (ID) and its actual value, which are projected into D-dimensional HVs through an Item Memory (IM) and a Continuous Item Memory (CIM), respectively. Assume there are F features in one data sample; then IM = {ID_1, ID_2, ..., ID_F}, where each ID_f ∈ {+1, -1}^D is the projected HV for the f-th feature ID.
When D is large enough, every two HVs in the IM are nearly orthogonal [18], which means δ(ID_i, ID_j) ≈ 0.5 for i ≠ j, where δ(·,·) is the normalized Hamming distance between two vectors. In other words, the projection of the original HDC assumes that there is approximately no correlation among the features. The CIM is used as a look-up table to project the actual value into an HV. HDC generates the CIM by the following procedure. First, HDC finds the maximum value v_max and minimum value v_min of each feature. The range between v_min and v_max is then equally quantized into Q levels, denoted by {l_1, l_2, ..., l_Q}, with l_1 and l_Q corresponding to v_min and v_max. Each level in {l_1, l_2, ..., l_Q} is associated with an HV in CIM = {L_1, L_2, ..., L_Q}, where each L_q ∈ {+1, -1}^D; namely, L_q is the projected HV for l_q. Moreover, to preserve the spatial correlation of neighboring levels, if l_i and l_j are relatively close, L_i and L_j should have a relatively small Hamming distance. To achieve this, a randomly generated HV L_1 is first assigned to l_1. Then, D/Q randomly selected bits are flipped to generate the HV for the next level. This process repeats until all Q HVs in the CIM are generated. The bit-flipping approach ensures a high correlation between adjacent levels, while L_1 and L_Q are nearly orthogonal. For each feature, by looking up the nearest quantized level to its actual value, the HV for its value is selected and denoted as V_f, where V_f ∈ CIM, f = 1, 2, ..., F. After projecting each feature into HVs, a set of two-vector pairs P = {(ID_1, V_1), (ID_2, V_2), ..., (ID_F, V_F)} is generated for the following step.
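The IM and CIM construction above can be sketched in NumPy as follows; this is a minimal sketch with small illustrative sizes for D, F, and Q (the paper's experiments use much larger dimensionalities), and the D/Q bit-flipping step follows the description above:

```python
import numpy as np

rng = np.random.default_rng(0)

D = 1000   # HV dimensionality (illustrative; the experiments use 3,000-10,000)
F = 4      # number of features per sample
Q = 10     # number of quantization levels

# Item Memory: one random bipolar HV per feature ID.
# For large D, any two rows are nearly orthogonal (distance around 0.5).
IM = rng.choice([-1, 1], size=(F, D))

# Continuous Item Memory: start from a random HV for level 1, then flip
# D/Q randomly chosen bits per step so neighboring levels stay correlated.
CIM = np.empty((Q, D), dtype=int)
CIM[0] = rng.choice([-1, 1], size=D)
for q in range(1, Q):
    CIM[q] = CIM[q - 1]
    idx = rng.choice(D, size=D // Q, replace=False)
    CIM[q, idx] *= -1

def hamming(a, b):
    """Normalized Hamming distance between two bipolar HVs."""
    return float(np.mean(a != b))
```

Adjacent CIM levels differ in exactly D/Q positions, while distant levels accumulate flips and drift toward orthogonality.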
Encoding. In this stage, HDC encodes the set P into one representative HV. First, HDC performs the binding operation, an element-wise XOR operation (⊗) between two HVs, on each two-vector pair in the set P, generating F bound HVs. Then, HDC bundles those F HVs into an encoded HV H ∈ {+1, -1}^D by performing element-wise addition (+) followed by a sign function sgn[·], which is expressed as

H = sgn[ ID_1 ⊗ V_1 + ID_2 ⊗ V_2 + ... + ID_F ⊗ V_F ].   (1)

Training. After encoding, we denote H_i^k as the encoded HV of the i-th data sample of the k-th class. The HVs belonging to the k-th class are accumulated to form a class HV C_k ∈ ℤ^D, which is computed as the sum of H_i^k over i = 1, ..., N_k, where N_k is the number of data samples in the k-th class.
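The binding, bundling, and class-HV accumulation steps above can be sketched as follows; in the bipolar representation the XOR binding becomes an element-wise product, and the projected HVs here are random placeholders rather than outputs of a real IM/CIM lookup:

```python
import numpy as np

rng = np.random.default_rng(1)
D, F = 1000, 5

# Placeholder projected HVs for one sample: feature-ID HVs and value-level HVs.
ids = rng.choice([-1, 1], size=(F, D))
vals = rng.choice([-1, 1], size=(F, D))

def encode(ids, vals):
    """Bind each (ID, value) pair, then bundle the F bound HVs into one HV."""
    bound = ids * vals                    # element-wise product = XOR in bipolar form
    summed = bound.sum(axis=0)            # element-wise addition across the F pairs
    return np.where(summed >= 0, 1, -1)   # sign function (ties broken toward +1)

H = encode(ids, vals)

# Training: a class HV accumulates the encoded HVs of its samples.
train_hvs = [encode(rng.choice([-1, 1], size=(F, D)),
                    rng.choice([-1, 1], size=(F, D))) for _ in range(3)]
C_k = np.sum(train_hvs, axis=0)            # integer class HV
C_k_bipolar = np.where(C_k >= 0, 1, -1)    # bipolarized copy for inference
```

Note that binding is its own inverse in bipolar form: multiplying a bound HV by the value HV again recovers the ID HV.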
Note that we compute the Hamming distance with the bipolarized AM but update the integer-valued AM. After several iterations, the new bipolarized AM is obtained by bipolarizing the updated AM.

Proposed Hyperdimensional Computing with Learnable Projection for User Adaptation Framework
In this section, we introduce the two proposed techniques used in HDC for the user adaptation framework, as illustrated in Fig. 3. The first one is the learnable projection, which transforms the original HDC into a learnable network, L-HDC. The architecture of L-HDC is similar to that of a binarized neural network (BNN) [19] since both models compute with bipolar weights. When training in the cloud, L-HDC learns a better feature-aware projection by backpropagation compared with the original one described in Section 2. Then, the weights of the learnable projection in L-HDC are transformed back into the learned IM and learned CIM to utilize the efficient and fast adaptability of HDC on resource-limited devices. The second one is the AM fusion mechanism, which exploits the information learned by L-HDC to further improve the performance when we deploy HDC on devices. The details of these two techniques are elaborated as follows.
Training with Bipolar Weights. In L-HDC, the training method used in the second layer (an FC layer) is similar to that in BNN [19], which uses bipolar weights to compute gradients and accumulates the gradients on real-valued weights. However, the connections of the first layer, tailored for HDC, are not fully connected, so we cannot directly apply the training method of BNN. To solve this problem, we illustrate how to compute the gradients of ID_f and V_f and update them as follows.
Suppose the gradient with respect to the output of the first layer computed by the i-th training sample is g_i ∈ ℝ^D. Since the first layer binds ID_f with V_f element-wise, the gradient of ID_f, denoted as ∇ID_f, is computed as

∇ID_f = g_i ⊙ V_f,   (4)

where ⊙ is the element-wise product of two vectors. Similarly, the gradient of V_f, denoted as ∇V_f, is computed as

∇V_f = g_i ⊙ ID_f.   (5)

Finally, we update ID_f and V_f with the gradients obtained from (4) and (5), using the updating techniques mentioned in [19].
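As a rough illustration of the BNN-style update described above, the following sketch keeps real-valued latent weights, bipolarizes them for the forward pass, and accumulates gradients on the latents. The gradient here is a random stand-in for a backpropagated one, and the clipping range [-1, 1] is an assumption borrowed from common BNN training practice rather than a detail stated in the paper:

```python
import numpy as np

rng = np.random.default_rng(2)
F, D = 3, 8     # tiny illustrative sizes
lr = 0.001      # learning rate, matching the experimental setting

# Real-valued latent weights for the projection; the forward pass uses their
# bipolarized form, while updates accumulate on the real values (as in BNN).
W_real = rng.normal(size=(F, D))

def bipolarize(w):
    return np.where(w >= 0, 1.0, -1.0)

def update(w_real, grad, lr):
    """Apply a gradient (computed with the bipolar weights) to the latent
    weights, clipping them to [-1, 1] as is common in BNN training."""
    return np.clip(w_real - lr * grad, -1.0, 1.0)

grad = rng.normal(size=(F, D))      # stand-in for the backpropagated gradient
W_real = update(W_real, grad, lr)
W_bipolar = bipolarize(W_real)      # weights actually used in the forward pass
```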

AM Fusion Mechanism
In the training and adaptation process, L-HDC needs to iteratively compute gradients and update weights by gradient descent, which causes a heavy computational burden on edge devices. Therefore, after completing the training of L-HDC in the cloud, we transform the architecture of L-HDC back to that of the original HDC, which can efficiently adapt to the user's data by using (3). Note that the ID_f and V_f of L-HDC become the corresponding HVs in the learned IM and the learned CIM of HDC, which preserve the information of the feature correlations. However, since the second-layer weights of L-HDC are in the bipolar representation, while the adaptation of HDC relies on updating an AM whose components are integers, we cannot directly apply those weights to the original HDC.
Intuitively, based on the learned IM and the learned CIM, we can obtain a new AM, denoted as AM_hdc, by training HDC from scratch. Nevertheless, the information learned in the second layer of L-HDC can be further exploited to improve the performance. To inherit this information, we first transform the second-layer weights into an AM whose HVs correspond to those weights. Then, we multiply this AM by a scaling factor s to generate AM_L with integer HVs, add AM_hdc to AM_L, and divide the sum by 2. Lastly, the combined AM is deployed to edge devices for further user adaptation. In Section 4, we demonstrate that the AM fusion mechanism, which combines the two types of AMs, achieves higher accuracy than directly using AM_hdc.
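The fusion step above can be sketched as follows; AM_hdc and the bipolar second-layer weights are random placeholders here, and the scaling factor s is an illustrative value, not one reported in the paper:

```python
import numpy as np

rng = np.random.default_rng(3)
K, D = 6, 1000   # number of classes, HV dimensionality
s = 4            # scaling factor (illustrative value)

# AM_hdc: integer AM retrained from scratch with the learned IM/CIM.
AM_hdc = rng.integers(-20, 21, size=(K, D))
# Bipolar second-layer weights of L-HDC, arranged as one HV per class.
AM_bip = rng.choice([-1, 1], size=(K, D))

def fuse(AM_hdc, AM_bip, s):
    """Scale the bipolar AM to integers, then average it with AM_hdc."""
    AM_L = s * AM_bip
    return (AM_hdc + AM_L) // 2      # integer division keeps the AM integral

AM_fused = fuse(AM_hdc, AM_bip, s)
```

The fused AM remains integer-valued, so the standard HDC adaptation rule can keep updating it on the device.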

Experimental Settings and Simulation Results

Datasets and Experimental Setup
To evaluate the effectiveness of our proposed techniques for the user adaptation framework, we conduct experiments on two benchmark datasets: the speech recognition ISOLET dataset [16] and the human activity recognition UCIHAR dataset [17]. In ISOLET, 150 subjects speak the names of the 26 letters of the alphabet, and 617 features of the voice signals are extracted; each subject contributes 52 data samples. As for UCIHAR, it recognizes 6 human activities based on 3-axial linear acceleration and angular velocity sampled at a constant rate of 50 Hz. There are 30 subjects in UCIHAR, and each has 343 data samples on average. To simulate the scenario of user adaptation, we refer to the experimental settings in [5,7]. For both datasets, we first divide the data by subject into two parts, representing the public dataset in the cloud and the user dataset on the edge. In practice, we randomly divide the ISOLET dataset into 100 subjects and 50 subjects as the public dataset and the user dataset, respectively. Likewise, we randomly divide the UCIHAR dataset into 25 subjects and 5 subjects as the public dataset and the user dataset. Then, we further separate the public dataset into a public training dataset and a public validation dataset with a ratio of 3:1. The user dataset is also divided into a user adaptation dataset and a user testing dataset with a ratio of 1:1.
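The subject-wise division described above can be sketched as follows; for brevity this sketch applies the 3:1 and 1:1 splits at the subject level, whereas the paper applies them to the data samples within each part:

```python
import numpy as np

rng = np.random.default_rng(4)

# ISOLET: 150 subjects -> 100 public / 50 user (UCIHAR would use 25 / 5).
subjects = rng.permutation(150)
public, user = subjects[:100], subjects[100:]

# Public part: training/validation at 3:1; user part: adaptation/testing at 1:1.
pub_train, pub_val = public[:75], public[75:]
user_adapt, user_test = user[:25], user[25:]
```

Splitting by subject rather than by sample is what creates the distribution mismatch that the user adaptation stage must overcome.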
In the experiments, we first train L-HDC on the public training data with a learning rate of 0.001 and evaluate it on the public validation data to obtain the best model.
After training, we generate the learned IM, the learned CIM, and the new AM by the AM fusion mechanism for the HDC model on the edge. Then, the user adaptation data is used to adapt the HDC model. Finally, we use the user testing data to evaluate the accuracy of the HDC model. All experiments are conducted over 10 independent trials to obtain averaged simulation results. The experimental setups are summarized in Table 1.

Comparisons
We compare our proposed framework with two other frameworks for user adaptation: the original HDC framework and the L-HDC framework without the AM fusion mechanism. All frameworks train the model for 100 iterations in the cloud and adapt the model for 50 iterations to ensure performance saturation. The details of the two compared frameworks are described as follows.
Original HDC Framework. In contrast to our proposed framework, the original HDC framework trains the original HDC model in the cloud and directly deploys the trained HDC model on devices for user adaptation. As shown in the simulation results, our proposed framework with the learnable projection achieves higher accuracy than the original HDC framework, which ignores the feature correlations.
Our Proposed Framework without the AM Fusion Mechanism. After training L-HDC in the cloud, the learned IM and the learned CIM are deployed to the HDC model on the edge. In this comparison, however, we only transfer AM_hdc to the edge devices for user adaptation rather than the AM obtained by the AM fusion mechanism. The simulation results demonstrate that the AM fusion mechanism performs better after adaptation than using AM_hdc alone.

Analysis of Accuracy
Fig. 5 illustrates the accuracy curves on the user testing data of the different frameworks on ISOLET and UCIHAR. Compared with the original HDC framework with D = 3,000, our proposed framework with the same dimensionality improves accuracy by 4.9% and 11.3% on ISOLET and UCIHAR, respectively, before adaptation (training iterations = 100). Moreover, our proposed framework still provides 1.1% and 3.5% accuracy improvements compared to the original HDC framework with the higher dimensionality D = 10,000. Both results demonstrate that L-HDC with the learnable projection achieves better performance than HDC with the original projection method since the feature correlations are considered. Compared with our proposed framework without the AM fusion mechanism, the framework with the AM fusion mechanism achieves 1.5% and 1.6% higher accuracy on ISOLET and UCIHAR, respectively, after adaptation (training iterations = 150). These results show that the AM fusion mechanism effectively preserves the knowledge learned by L-HDC and thus improves the performance. Overall, the proposed framework outperforms the original HDC framework by 4.8% and 3.5% on ISOLET and UCIHAR, respectively. All simulation results are summarized in Table 2.


Conclusions
In this paper, we propose two techniques, the learnable projection and the AM fusion mechanism, to improve the performance of HDC in the user adaptation framework. We transform the architecture of the original HDC into L-HDC to learn the feature correlations of the data. Moreover, the AM fusion mechanism exploits the AM obtained from L-HDC to further improve the HDC model's performance.
Based on the simulation results, L-HDC gives each deployed model a better initial point than the original HDC, with accuracy improvements of 4.9% and 11.3% on ISOLET and UCIHAR, respectively. Moreover, the AM fusion mechanism avoids accuracy degradation and enhances the accuracy gains on edge devices. Overall, compared with the original HDC framework, our proposed framework improves accuracy by 4.8% and 3.5% on ISOLET and UCIHAR, respectively.

Fig. 1. (a) Without user adaptation, the accuracy of the deployed model easily suffers from misaligned data distributions and time-varying properties. (b) The characteristics of high energy efficiency and fast adaptability make HDC suitable for the user adaptation framework.

Fig. 3. Overview of the proposed HDC with the learnable projection for the user adaptation framework.

Fig. 4. (a) Demonstration of the weights in L-HDC corresponding to the memory tables in the original HDC. (b) The overall architecture of L-HDC.

Fig. 5. Accuracy analysis on the user testing data of different frameworks in (a) the ISOLET dataset and (b) the UCIHAR dataset.

Fig. 6 illustrates the histograms of the normalized Hamming distance between every two HVs in the IM of the original HDC and in the learned IM obtained from L-HDC, respectively. As mentioned in Section 2, the HVs in the IM of the original HDC are nearly orthogonal to each other. This indicates that the features are treated as uncorrelated, and thus most of the mutual normalized Hamming distances are around 0.5. For the learned IM, L-HDC can learn the feature correlations during the training process. Therefore, the histogram of the learned IM is more diverse, and the learned IM provides HDC with a better projection for higher performance, as shown in Section 4.3.

Fig. 6. Histogram of the normalized Hamming distance between every two HVs in the learned IM (orange) and the original IM (blue).

For a K-class classification task, there are K class HVs stored in an Associative Memory AM = {C_1, C_2, ..., C_K} ∈ ℤ^{K×D}. To perform efficient inference on devices, HDC bipolarizes each class HV in the AM to generate a corresponding bipolarized AM.
Inference. In the inference phase, a testing sample is first projected and then encoded by (1) into a query HV. Then, HDC computes the Hamming distance between the query HV and the class HVs in the bipolarized AM. The class with the minimum distance is output as the prediction.
Adaptation. To enhance the classification accuracy, HDC performs adaptation by iteratively validating the training data. If a training sample is correctly classified, no change happens. However, if the query HV H of a training sample is misclassified, then H is added to the correct class HV C_c in the AM and subtracted from the incorrectly predicted class HV C_p, which can be computed as

C_c ← C_c + H,  C_p ← C_p − H.   (3)
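The inference and adaptation rules above can be sketched as follows; note that, as stated in Section 2, the distance is computed with the bipolarized AM while the update modifies the integer AM:

```python
import numpy as np

def hamming(a, b):
    """Normalized Hamming distance between two bipolar HVs."""
    return float(np.mean(a != b))

def bipolarize(AM):
    return np.where(AM >= 0, 1, -1)

def predict(H, AM_bipolar):
    """Inference: pick the class whose bipolar HV is nearest to the query."""
    dists = [hamming(H, c) for c in AM_bipolar]
    return int(np.argmin(dists))

def adapt(H, label, AM):
    """One adaptation step: the distance uses the bipolarized AM, but the
    update is applied to the integer AM."""
    pred = predict(H, bipolarize(AM))
    if pred != label:
        AM[label] += H     # pull the correct class toward the query
        AM[pred] -= H      # push the wrong class away from it
    return AM
```

Because only misclassified samples trigger an update, repeated passes over the adaptation data converge once the class HVs separate the user's samples.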
To facilitate the subsequent training, we find the index of each feature value by looking up its nearest quantized level and record the index in a matrix M ∈ ℤ^{N×F}, where N is the number of training samples, each with F feature components. M[i][f] denotes the entry in the i-th row and f-th column of M.

Table 1. Experimental setups for the user adaptation framework in (a) the ISOLET dataset and (b) the UCIHAR dataset.

Table 2. Averaged accuracy of the different frameworks on the user testing data in (a) the ISOLET dataset and (b) the UCIHAR dataset.