A Comparison of PHY-Based Fingerprinting Methods Used to Enhance Network Access Control

Network complexity continues to evolve and more robust measures are required to ensure network integrity and mitigate unauthorized access. A physical-layer (PHY) augmentation to Medium Access Control (MAC) authentication is considered using PHY-based Distinct Native Attribute (DNA) features to form device fingerprints. Specifically, a comparison of waveform-based Radio Frequency DNA (RF-DNA) and Constellation-Based DNA (CB-DNA) fingerprinting methods is provided using unintentional Ethernet cable emissions for 10BASE-T signaling. For the first time, a direct comparison between the two methods is possible because the evaluation uses the same experimentally collected emissions to generate both RF-DNA and CB-DNA fingerprints. RF-DNA fingerprinting exploits device-dependent features derived from instantaneous preamble responses within communication bursts. For these same bursts, the CB-DNA approach uses device-dependent features derived from mapped symbol clusters within an adapted two-dimensional (2D) binary constellation. The evaluation uses 16 wired Ethernet devices from 4 different manufacturers, and both Cross-Model (manufacturer) Discrimination (CMD) and Like-Model (serial number) Discrimination (LMD) are addressed. Discrimination is assessed using a Multiple Discriminant Analysis, Maximum Likelihood (MDA/ML) classifier. Results show that both RF-DNA and CB-DNA approaches perform well for CMD, with average correct classification of %C = 90% achieved at Signal-to-Noise Ratios of SNR ≥ 12.0 dB. Consistent with prior related work, LMD is more challenging, with CB-DNA achieving %C = 90.0% at SNR = 22.0 dB and significantly outperforming RF-DNA, which only achieved %C = 56.0% at this same SNR.


Introduction
Network Access Control (NAC) solutions implement strategies that allow devices and/or users access to a given network. Many NAC solutions are available to a network administrator, including mapping Medium Access Control (MAC) addresses to specific ports, checking device credentials, and querying the hardware and software of a device. Each of these solutions suffers from weaknesses, including an attacker's ability to spoof specific device information or steal device credentials. Each year, technical capability expands and more devices are able to connect to a network. This expansion creates unique security challenges and increases the potential for unauthorized access. Physical-layer (PHY) augmentation of MAC-based authentication processes provides one means to improve security and network authentication reliability. The envisioned PHY-augmented authentication process uses a device's digital ID (e.g., MAC address) and PHY features extracted from the device's communication signal. Ideally, the device's fingerprint consists of unique PHY features that enable reliable discrimination. The final authentication decision, to allow or deny network access, is based on 1) presentation of an authorized MAC address and 2) a statistical match between the current Distinct Native Attribute (DNA) features of the device presenting the MAC address and the stored DNA for the actual device assigned the MAC address.
The majority of PHY-based fingerprinting methods are based on features generated from transient, invariant, or entire burst responses, as discussed in the review presented in [1]. It is concluded in [1] that many of the PHY fingerprinting techniques discussed lack proper performance evaluation. The goal of this work is to conduct a performance evaluation of the two most prevalent approaches in [1]. The contributions of this paper include: 1) a direct performance comparison of the waveform-based Radio Frequency DNA (RF-DNA) and Constellation-Based DNA (CB-DNA) approaches, 2) the use, for the first time, of the same collected emissions for both approaches, 3) an expansion of the CB-DNA approach, for the first time, to include conditional constellation point sub-clusters, and 4) an expansion of CB-DNA classification to include Like-Model Discrimination (LMD).
The paper is organized as follows. Section 2 provides background information and recent related work in device fingerprinting. Section 3 discusses the experimental setup and outlines the PHY-based RF-DNA and CB-DNA device fingerprinting approaches. This is followed by device discrimination results in Sect. 4 and a summary and conclusions in Sect. 5.

Background
Device hardware fingerprinting is possible due to variations in manufacturing processes and device components. These variations inherently induce PHY feature differences that vary across devices [2]. Amplifiers, capacitors, inductors and oscillators also possess slight imperfections that influence device fingerprints [2][3][4][5]. The resultant variation can cause deviation in communication symbol rate, center frequency, and induce AM/FM/PM conversion [2]. Thus, it is possible to exploit device imperfections even when the intrinsic components used are supposedly identical [1,6].
As noted previously, the review in [1] focused primarily on PHY-based fingerprinting techniques, with non-PHY-based approaches prior to 2009 only briefly addressed. Non-PHY-based fingerprinting techniques as in [7][8][9][10][11][12] are relevant and can be used to fingerprint devices by actively probing or passively monitoring network packet traffic. Fingerprinting is accomplished by exploiting clock skew via round-trip time and inter-arrival time estimation in the collected network traces. These non-PHY-based approaches are noted here for completeness, and a comparison of PHY-based and non-PHY-based approaches is the subject of subsequent research.
PHY-based device fingerprinting works in [6,15-19,21] generally rely on an invariant, non-data-modulated Region of Interest (ROI) within the burst (turn-on transient, preamble, midamble, etc.) to extract fingerprint features. Additional works [1][2][3][4][5] utilize the data-modulated burst response regions to extract their fingerprint features from device-dependent modulation errors. Transient-based approaches are generally avoided given 1) the limited duration of the transient response, and 2) the transient response being influenced by environmental conditions that affect the communication channel and limit its usefulness [4]. As noted in Sect. 3.2, the CB approaches require a signal constellation for calculating error statistics and thus are only applicable to CB communication applications. This is not a constraint of the RF-DNA approach presented in Sect. 3.1, which has been successfully used for both communication applications [15,16,18,19,21] and non-communication applications such as discriminating between device components and operational states [6,17,22,23].
A new approach to CB-DNA was first introduced in [20], which included development of a 2D binary signal constellation for unintentional wired Ethernet emissions using features from two binary composite constellation point clusters. Nearest Neighbor (NN) and Multiple Discriminant Analysis, Maximum Likelihood (MDA/ML) classifiers were used to assess device discrimination for Cross-Model (manufacturer) Discrimination (CMD), with the MDA/ML classifier outperforming NN. Results here extend this earlier work by 1) exploiting discriminating feature information in multiple conditional constellation point sub-clusters that form the binary composite clusters, and 2) assessing Like-Model (serial number) Discrimination (LMD) capability as required for the envisioned network device ID authentication process.

Experimental Methodology
This work varies from traditional PHY fingerprinting approaches in that it fingerprints wired network devices via the unintentional RF emissions given off by the Ethernet cable. The experimental methodology here was adopted from [24] and is summarized briefly for completeness. The emission collection setup included interconnecting two computers using 10BASE-T Ethernet signaling over a Category 6 Ethernet cable. A LeCroy WavePro 760Zi-A oscilloscope operating at a sample frequency of f_s = 250 MSamples/s (MSPS) and a high-sensitivity Riscure 205HS near-field RF probe were used to collect the unintentional RF emissions. An in-line baseband filter with a bandwidth of W_BB = 32 MHz was used to limit the collection bandwidth. The Ethernet cable and RF probe were placed in a test fixture to maintain relative cable-to-probe orientation while the Ethernet cards were swapped in and out for collection.
As shown in Table 1 [20], a total of 16 network cards were tested, with four cards each from D-Link (DL), Intel (IN), StarTech (ST), and TRENDnet (TN). The last four MAC address digits show that some devices vary only by a single digit and are likely from the same production run. Four unique LAN transformer markings are provided and used to analyze results. The LAN transformer is the last part that the signal passes through prior to reaching the RJ45 output jack [20].

RF-DNA Fingerprinting
The RF-DNA fingerprinting approach has been most widely used for intentional signal responses of wireless devices [15,16,18,19]. For this work, the RF-DNA approach adopts the technique introduced in [24] for collecting unintentional RF emissions from Ethernet cables and producing RF-DNA fingerprints on a burst-by-burst basis. Useful RF-DNA has historically been extracted from invariant signal amble regions [6,15,16], and thus the 10BASE-T preamble response was targeted here for initial assessment. RF-DNA features can be extracted from various ROI responses, a few of which include Time Domain (TD) [16], Spectral Domain (SD) [18], Fourier Transform (FT) [18], and Gabor Transform (GT) [15]. Instantaneous amplitude {a(k)}, phase {φ(k)}, and frequency {f(k)} are the TD sequences used for RF-DNA fingerprint generation with the preamble as the ROI; k denotes discrete time samples. Composite RF-DNA fingerprints are generated by 1) centering (mean removal) and normalizing {a(k)}, {φ(k)}, and {f(k)}, 2) dividing each TD sequence into N_R equal-length subregions as illustrated in Fig. 1, and 3) calculating the three statistical features of variance (σ²), skewness (γ), and kurtosis (κ) for each TD sequence to form the Regional Fingerprint F^{a,φ,f}_{R_i} as in (1) and (2) [17]. Statistical features across the entire ROI response are commonly included as well, hence the regional indexing in (2) to N_R + 1 total elements. The total number of RF-DNA features in (2) is a function of N_R, the number of TD responses, and the number of statistics. Varying N_R provides a means to investigate performance for various feature vector sizes. Fingerprints were generated over the preamble ROI using three TD responses ({a(k)}, {φ(k)}, {f(k)}) and three statistics (σ², γ, κ) per response for N_R = 16, 31, and 80 with (2), producing RF-DNA fingerprints having N_Feat = 144, 279, and 720 total features, respectively.
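The fingerprint generation steps above can be illustrated with a minimal Python sketch. It is not the implementation used in this work: the FFT-based analytic signal, the peak normalization, the use of standardized (non-excess) moments, and all function names are assumptions made for illustration. The reported feature counts (144, 279, 720) equal 3 responses × 3 statistics × N_R for N_R = 16, 31, and 80, so the sketch treats `n_regions` as the total number of statistic regions; the optional `include_full_roi` flag (also an assumption) appends the full-ROI statistics as one extra region.

```python
import numpy as np

def analytic_signal(x):
    """FFT-based analytic signal of a real sequence (Hilbert transform method)."""
    n = len(x)
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * weights)

def center_normalize(seq):
    """Remove the mean and scale to unit peak magnitude (assumed normalization)."""
    seq = seq - np.mean(seq)
    peak = np.max(np.abs(seq))
    return seq / peak if peak > 0 else seq

def region_stats(seq):
    """Variance, skewness, and kurtosis (standardized moments) of one region."""
    mu = np.mean(seq)
    var = np.mean((seq - mu) ** 2)
    if var == 0:
        return [0.0, 0.0, 0.0]
    z = (seq - mu) / np.sqrt(var)
    return [var, np.mean(z ** 3), np.mean(z ** 4)]

def rf_dna_fingerprint(roi, n_regions, include_full_roi=False):
    """Concatenate regional statistics of a(k), phi(k), and f(k) over the ROI."""
    z = analytic_signal(np.asarray(roi, dtype=float))
    amp = np.abs(z)
    phase = np.unwrap(np.angle(z))
    freq = np.gradient(phase) / (2 * np.pi)  # cycles per sample
    features = []
    for seq in (amp, phase, freq):
        seq = center_normalize(seq)
        regions = np.array_split(seq, n_regions)  # near-equal-length subregions
        if include_full_roi:
            regions.append(seq)
        for region in regions:
            features.extend(region_stats(region))
    return np.array(features)
```

Calling `rf_dna_fingerprint(roi, 16)` on a sampled preamble returns a 144-element vector, and 31 or 80 regions give 279 or 720 elements, matching the feature counts stated above.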

CB-DNA Fingerprinting
As with the RF-DNA fingerprinting approach, the majority of CB fingerprinting works utilize intentional RF emissions from wireless devices, with unique features derived from modulation errors in the constellation space, i.e., differences (error) between received projected symbol points and ideal transmitted constellation points [1][2][3]5]. The CB-DNA approach adopted here differs from previous approaches by utilizing statistical features from unconditional and conditional projected symbol clusters (not modulation errors) in a 2D constellation space. The CB-DNA fingerprinting process used here was adopted from [20] and is summarized here for completeness. CB-DNA fingerprints were generated from a single burst, with example constellations illustrated in Fig. 2 for the four card manufacturers and with blue circle and black square clusters representing Binary 0 and Binary 1, respectively. This research expands on [20] by utilizing for the first time the conditional sub-clusters illustrated in Fig. 3 for card manufacturer StarTech. The conditional sub-clusters are based not only on the current demodulated bit but on the preceding and succeeding bits as well. The eight distinct conditional sub-clusters correspond to the two possible values of the bit being estimated combined with the four possible combinations of preceding and succeeding bits, i.e., [0 X 0], [0 X 1], [1 X 0], and [1 X 1], where X denotes the bit being estimated. CB-DNA fingerprint generation begins by dividing constellation points into their respective unconditional and conditional cluster regions for a total of N_CR = 2 + 8 = 10. Statistical CB-DNA features are then calculated for each cluster region using the mean (µ), variance (σ²), skewness (γ), and kurtosis (κ) along the Z − G and Z + G dimensions shown in Fig. 3. Joint statistics in both the Z − G and Z + G directions are also considered and include covariance (cov), coskewness (β_{1×2}), and cokurtosis (δ_{1×3}), which provide an extra six features per region.
The resultant statistics form a Regional Cluster Fingerprint as in (4) [20]. The total number of CB-DNA features in (4) is a function of N_CR, the number of statistics, and the number of dimensions, i.e., Z − G and Z + G. Varying N_CR provides a means to investigate performance for various feature vector sizes. Fingerprints were generated using N_CR = 2, 8, and 10 with the statistics (µ, σ², γ, κ, cov, β_{1×2}, δ_{1×3}), i.e., 4 statistics from each of the Z − G and Z + G dimensions plus 6 joint statistics, producing CB-DNA fingerprints having N_Feat = 28, 112, and 140 total features, respectively.
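As an illustration of the feature counts above, the following Python sketch computes 14 statistics per cluster region (4 marginal statistics in each of two dimensions plus 1 covariance, 2 coskewness, and 3 cokurtosis terms) and concatenates them over the selected regions. It is a sketch under stated assumptions, not the implementation from [20]: the function names, the use of standardized moments, the ordering of features, and the choice to drop the first and last bits from the conditional sub-clusters (they lack a neighbor) are all assumptions.

```python
import numpy as np

def cluster_features(points):
    """14 statistics for one cluster of 2D constellation points, shape (n, 2)."""
    mu = points.mean(axis=0)
    centered = points - mu
    var = (centered ** 2).mean(axis=0)
    z = centered / np.sqrt(var)           # standardized coordinates
    skew = (z ** 3).mean(axis=0)
    kurt = (z ** 4).mean(axis=0)
    z1, z2 = z[:, 0], z[:, 1]
    cov = [(centered[:, 0] * centered[:, 1]).mean()]
    coskew = [(z1 ** 2 * z2).mean(), (z1 * z2 ** 2).mean()]
    cokurt = [(z1 ** 3 * z2).mean(), (z1 ** 2 * z2 ** 2).mean(),
              (z1 * z2 ** 3).mean()]
    return np.concatenate([mu, var, skew, kurt, cov, coskew, cokurt])

def cb_dna_fingerprint(points, bits, use_composite=True, use_conditional=True):
    """Concatenate cluster statistics over the selected cluster regions."""
    points = np.asarray(points, dtype=float)
    bits = np.asarray(bits, dtype=int)
    regions = []
    if use_composite:
        # Two unconditional clusters: Binary 0 and Binary 1.
        regions += [points[bits == b] for b in (0, 1)]
    if use_conditional:
        # Eight sub-clusters keyed by (preceding bit, current bit, succeeding bit).
        prev_bit, cur_bit, next_bit = bits[:-2], bits[1:-1], bits[2:]
        interior = points[1:-1]
        for p in (0, 1):
            for x in (0, 1):
                for s in (0, 1):
                    mask = (prev_bit == p) & (cur_bit == x) & (next_bit == s)
                    regions.append(interior[mask])
    return np.concatenate([cluster_features(r) for r in regions])
```

With only composite clusters the vector has 2 × 14 = 28 elements, with only conditional sub-clusters 8 × 14 = 112, and with both 10 × 14 = 140, which matches the N_Feat values reported for N_CR = 2, 8, and 10.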

Device Discrimination
The effect of varying SNR on discrimination performance was assessed to characterize the effect of varying channel conditions. This was done by adding independent, like-filtered Additive White Gaussian Noise (AWGN) realizations to each experimentally collected emission to achieve the desired SNR for Monte Carlo simulation. Given an average collected SNR = 30.0 dB, device discriminability was assessed for simulated SNR ∈ [12, 32] dB in 2 dB steps. For the Monte Carlo simulation results in Sect. 4, a total of N_Nz = 6 independent AWGN realizations were generated, filtered, power-scaled, and added to the collected signal responses to generate signals at the desired SNR. Given N_Nz = 6 AWGN realizations and N_S = 1000 collected signal responses per card, a total of N_F = N_S × N_Nz = 6000 independent fingerprints per card were available for discrimination assessment. Consistent with prior related work [6,15,16], device discriminability was assessed using an MDA/ML classification process. MDA/ML processing was implemented for N_C = 4 and 16 classes using an identical number of Training (N_Tng) and Testing (N_Tst) fingerprints for each class. A total of N_F = 24,000 (CMD) and N_F = 6,000 (LMD) fingerprints per class were generated for each N_C per Sect. 3.1 and Sect. 3.2 for the RF-DNA and CB-DNA methods, respectively. MDA/ML training was completed for each N_C using N_Tng = N_F/2 fingerprints and K-fold cross-validation with K = 5 to improve MDA/ML reliability. This involves: 1) dividing the training fingerprints into K equal-size disjoint blocks of N_Tng/5 fingerprints, 2) holding out one block and training on the remaining K-1 blocks to produce projection matrix W, and 3) using the holdout block and W for validation [25]. The W from the best training iteration is output and used for subsequent MDA/ML testing assessment. The process is repeated to generate an SNR-dependent W(SNR) for each analysis SNR.
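The noise scaling and K-fold training procedure described above can be sketched in Python as follows. This is a simplified illustration, not the processing used for the reported results: the noise is left unfiltered (the like-filtering step is omitted), MDA is implemented as a Fisher-style projection onto N_C - 1 dimensions, and the ML decision assumes Gaussian classes with equal priors and a pooled covariance, which reduces to a minimum Mahalanobis distance rule. All function names are hypothetical.

```python
import numpy as np

def add_awgn(signal, snr_db, rng):
    """Add white Gaussian noise power-scaled to give the requested SNR (no filtering)."""
    noise = rng.standard_normal(signal.shape)
    noise_power = np.mean(signal ** 2) / 10 ** (snr_db / 10)
    return signal + noise * np.sqrt(noise_power / np.mean(noise ** 2))

def fit_mda(X, y):
    """Fit projection matrix W and the projected class means and pooled covariance."""
    classes = np.unique(y)
    overall_mean = X.mean(axis=0)
    d = X.shape[1]
    Sw, Sb = np.zeros((d, d)), np.zeros((d, d))
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)               # within-class scatter
        diff = (mc - overall_mean)[:, None]
        Sb += len(Xc) * (diff @ diff.T)             # between-class scatter
    eigvals, eigvecs = np.linalg.eig(np.linalg.pinv(Sw) @ Sb)
    order = np.argsort(eigvals.real)[::-1][:len(classes) - 1]
    W = eigvecs[:, order].real
    P = X @ W
    means = np.array([P[y == c].mean(axis=0) for c in classes])
    resid = P - means[np.searchsorted(classes, y)]
    cov = resid.T @ resid / (len(y) - len(classes))
    return {"W": W, "classes": classes, "means": means,
            "cov_inv": np.linalg.pinv(cov)}

def predict_ml(model, X):
    """Assign each fingerprint to the class with the smallest Mahalanobis distance."""
    diff = (X @ model["W"])[:, None, :] - model["means"][None, :, :]
    dist = np.einsum("nck,kl,ncl->nc", diff, model["cov_inv"], diff)
    return model["classes"][np.argmin(dist, axis=1)]

def train_with_kfold(X, y, k=5, seed=0):
    """Train on K-1 blocks, validate on the holdout block, keep the best model."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    best_model, best_acc = None, -1.0
    for i in range(k):
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        model = fit_mda(X[train_idx], y[train_idx])
        acc = np.mean(predict_ml(model, X[folds[i]]) == y[folds[i]])
        if acc > best_acc:
            best_model, best_acc = model, acc
    return best_model, best_acc
```

In this sketch the training step would be repeated once per analysis SNR, after regenerating the noisy signals with `add_awgn`, to obtain an SNR-dependent projection matrix.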

Discrimination Results
The MDA/ML classification results are presented for CMD (manufacturer) and LMD (serial number) performance using the 16 devices in Table 1. Device fingerprints are generated from identical burst-by-burst emissions per the methods in Sect. 3.1 and Sect. 3.2, with RF-DNA using only the burst preamble and CB-DNA using the entire burst, including the preamble. A total of 1000 bursts are processed from each device, with three AWGN realizations added to each burst to create 3000 fingerprints per device for classification. Discrimination results are based on two classification models created per Sect. 3.3. The CMD results are based on N_Tst = 12,000 testing fingerprints and the LMD results are based on N_Tst = 3,000 fingerprints. An arbitrary performance benchmark of %C = 90% correct classification is used for comparative assessment, with summary analysis based on CI = 95% binomial confidence intervals. Given the large number of independent trials for all results in Sect. 4, the resultant CI = 95% confidence intervals are smaller than the vertical extent of the data markers in Fig. 4 through Fig. 6 and are therefore omitted for visual clarity.
Fig. 4 shows average RF-DNA results for CMD and LMD. The %C = 90% benchmark is achieved for CMD with all three N_R values at SNR ≥ 21 dB, with N_R = 80 performance starting out at %C = 92% at SNR = 12 dB and the N_R = 16 and N_R = 31 cases requiring an additional 6.0 dB and 10.0 dB gain in SNR, respectively, to achieve %C = 92%. At SNR ≈ 26 dB the markers for N_R = 80 and N_R = 31 begin to overlap, suggesting that those two MDA/ML models yield statistically equivalent performance at SNR = 26.0 dB and higher, with %C for N_R = 16 being slightly lower. The LMD results for RF-DNA in Fig. 4 never reach the %C = 90% benchmark. However, the N_R = 80 case outperforms the other two cases by approximately 15% and 20% at SNR = 30 dB. Thus, the RF-DNA model for N_R = 80 was chosen for comparison with the CB-DNA model.
Fig. 5 shows CMD and LMD results for CB-DNA fingerprinting while varying the use of composite clusters and sub-clusters. The CMD and LMD results using N_CR = 2 are about 5% and 25% worse, respectively, in correct classification than the N_CR = 10 results. For CMD, the N_CR = 10 model achieves 96% correct classification on average at 12 dB, whereas the N_CR = 2 model peaks at 94% at 32 dB, which shows that the N_CR = 10 model is superior. The CMD results with N_CR = 8 are similar to those with N_CR = 10. LMD results for N_CR = 8 are consistently a few percentage points lower than for N_CR = 10 and require an additional 4 dB gain to achieve the %C = 90% that N_CR = 10 achieves at 22 dB. LMD is a more complex classification problem, and N_CR = 10 reaches an average of %C = 90% across all 16 devices at SNR = 22.0 dB.
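To give a sense of why the 95% binomial confidence intervals are small enough to omit from Fig. 4 through Fig. 6, the short Python sketch below estimates the interval half-width at the %C = 90% benchmark for the stated testing set sizes. The normal-approximation (Wald) interval and the function name are assumptions; the exact interval method and the trial count used for the averaged results are not restated here.

```python
import math

def binomial_ci_half_width(p_correct, n_trials, z=1.96):
    """Half-width of the normal-approximation (Wald) 95% confidence interval."""
    return z * math.sqrt(p_correct * (1.0 - p_correct) / n_trials)

for label, n_trials in (("CMD", 12_000), ("LMD", 3_000)):
    half_width = binomial_ci_half_width(0.90, n_trials)
    print(f"{label}: 90% +/- {100 * half_width:.2f}% with {n_trials} trials")
# CMD: 90% +/- 0.54% with 12000 trials
# LMD: 90% +/- 1.07% with 3000 trials
```

Under these assumptions the intervals span roughly one percentage point or less on either side of the estimate.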
%C classification results for RF-DNA and CB-DNA fingerprinting are provided in Fig. 6 for CMD and LMD. The CMD comparison shows that CB-DNA reaches %C = 96% at SNR = 12 dB while RF-DNA reaches %C = 96% at SNR ≈ 16 dB (approximately 6.0 dB higher). The LMD comparison shows that CB-DNA consistently outperforms RF-DNA by at least 24% at all SNR levels.
The results in Fig. 6 enable direct comparison of RF-DNA and CB-DNA fingerprinting; however, average %C performance hides individual class interactions. Thus, MDA/ML confusion matrix results for SNR = 24.0 dB are introduced to highlight cross-class misclassification for CMD (Table 2) and LMD (Table 3); matrix rows represent the input class and matrix columns represent the called class.
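The confusion matrix convention described above (rows as input class, columns as called class, entries in percent) can be sketched in Python as follows. The function names and the integer class encoding are assumptions for illustration, and no values from Table 2 or Table 3 are reproduced.

```python
import numpy as np

def confusion_matrix_percent(input_class, called_class, n_classes):
    """Row-normalized confusion matrix: rows = input class, columns = called class."""
    counts = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(counts, (np.asarray(input_class), np.asarray(called_class)), 1)
    row_totals = counts.sum(axis=1, keepdims=True)
    return 100.0 * counts / np.maximum(row_totals, 1)

def average_percent_correct(conf_percent):
    """Average %C across classes, i.e., the mean of the diagonal entries."""
    return float(np.mean(np.diag(conf_percent)))
```

For example, with classes encoded as 0 to 3, a fingerprint from class 0 that is called class 3 increments row 0, column 3, so off-diagonal mass in a row shows which classes a given input class is confused with.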
The table entries are presented as %C CB-DNA / %C RF-DNA, with bold entries denoting best or equivalent performance. The CMD confusion matrix in Table 2 is nearly symmetric, with all misclassification occurring between DL and TN devices. This is attributable to DL and TN devices using identical LAN transformers, as indicated in Table 1. The diagonal entries show that CMD performance for CB-DNA is better than or equivalent to that of RF-DNA. The resultant CMD averages for CB-DNA (%C = 98.9%) and RF-DNA (%C = 98.21%) are consistent with Fig. 6.
Table 2. CMD confusion matrix for CB-DNA and RF-DNA fingerprinting at SNR = 24 dB and 12,000 trials per class. Entries presented as % CB-DNA / % RF-DNA with bold entries denoting best or equivalent performance.
The LMD confusion matrix results in Table 3 summarize misclassification of the complete 16-by-16 confusion matrix. Results are presented as individual manufacturer confusion matrices, with "Other" entries representing all misclassifications as devices from other manufacturers. The LMD results are consistent with the CMD results in Table 2, in that 1) the IN and ST devices are never misclassified as another manufacturer, and 2) nearly 100% of the DL "Other" misclassifications are TN devices, and vice versa. This confusion is again attributed to DL and TN devices using identical LAN transformers, as indicated in Table 1. Most notable in Table 3 are the bold diagonal entries showing that CB-DNA outperformed RF-DNA for all devices.

Summary and Conclusions
A PHY augmentation to MAC-based authentication is addressed using PHY-based Distinct Native Attribute (DNA) features to form device fingerprints. Specifically, a previous Radio Frequency (RF-DNA) fingerprinting approach and a new Constellation-Based (CB-DNA) fingerprinting approach that exploits 2D constellation statistics are considered. The two methods are compared using fingerprints generated from the same set of unintentional 10BASE-T Ethernet cable emissions. Prior to this preliminary investigation it was hypothesized that CB-DNA would outperform RF-DNA. The basis for this conjecture was the considerable difference in the amount of burst information being exploited, i.e., RF-DNA fingerprinting only exploits a fraction of the Ethernet burst (64 preamble symbols) while CB-DNA exploits the entire Ethernet burst (an average of 1,400 symbols here). When comparing RF-DNA results here to previous related work [15][16][17], it is noted that Cross-Model Discrimination (CMD) results are consistent but Like-Model Discrimination (LMD) results are poorer. Possible reasons for this are the more stringent signaling characteristics of the Ethernet standards and the fact that the devices here share similar LAN transformer markings.
As measured by average percentage of correct classification (%C), the final RF-DNA vs. CB-DNA outcome shows that CB-DNA outperforms RF-DNA for the 16 devices considered. For CMD there was only a marginal difference at SNR = 24 dB, with CB-DNA at %C = 98.9% and RF-DNA at %C = 98.21%. Of particular note for CMD is that 100% of the misclassification error occurred between DL and TN devices, which use the same LAN transformer. For LMD there was considerable improvement at SNR = 24 dB, with CB-DNA at %C = 91.5% and RF-DNA at %C = 59.9%. LMD is generally more challenging than CMD, and results show that both approaches suffer when classifying LMD. However, CB-DNA performance remained above the 90% threshold and only suffered a 6.2% degradation in %C, while RF-DNA dropped by more than 30%.
From a device authentication and network security perspective, LMD performance is most important. Results here show that CB-DNA outperformed RF-DNA by a considerable margin. LMD results at the collected SNR = 30.0 dB include like-model %C = 94% for CB-DNA and only %C = 69% for RF-DNA.
These CB-DNA results are encouraging and work continues to improve performance. This includes investigating alternatives such as the Generalized Relevance Learning Vector Quantization-Improved (GRLVQI) classifier, which provides a direct indication of feature relevance to the classifier decision [21,26]. Feature relevance enables dimensional reduction analysis, which in turn reduces processing complexity and enhances real-world applicability. Furthermore, the use of CB-DNA for device verification and for rogue detection and rejection remains under investigation.