Analysis of Privacy Policies to Enhance Informed Consent (Extended Version)

In this report, we present an approach to enhance informed consent for the processing of personal data. The approach relies on a privacy policy language used to express, compare and analyze privacy policies. We describe a tool that automatically reports the privacy risks associated with a given privacy policy in order to enhance data subjects' awareness and to allow them to make more informed choices. The risk analysis of privacy policies is illustrated with an IoT example.


Introduction
One of the most common arguments to legitimize the collection of personal data is that the persons concerned have provided their consent or have the possibility to object to the collection. Whether opt-out is considered an acceptable form of consent (as in the recent California Consumer Privacy Act) or opt-in is required (as in the European General Data Protection Regulation, GDPR), a number of conditions have to be met to ensure that the collection respects the true will of the person concerned. In fact, one may argue that this is seldom the case. In practice, internet users generally have to consent on the fly, when they want to use a service, which leads them to accept the conditions of the provider mechanically. Their consent is therefore generally not really informed, because they do not read the privacy policies of the service providers. In addition, these policies are often vague and ambiguous. This situation, which is already critical, will become even worse with the advent of the Internet of Things ("IoT"), which has the potential to extend to the "real world" the tracking already in place on the internet.
A way forward to address this issue is to allow users to define their own privacy policies, taking the time needed to reflect on them, possibly even with the help of experts or peers. These policies could then be applied automatically to decide upon the disclosure of their personal data and the precise conditions of such disclosures. The main benefit of this approach is to reduce the imbalance of power between individuals and the organizations collecting their personal data (hereafter, respectively, data subjects, or DSs, and data controllers, or DCs, following the GDPR terminology): each party can define its own policy, and these policies can then be compared to decide whether a given DC is authorized to collect the personal data of a DS. In practice, DSs obviously cannot foresee all possibilities when they define their initial policies, and they should have the opportunity to update them when they face, for example, new types of DCs or new types of purposes. Nevertheless, their privacy policies should be able to cope with most situations and, as time passes, their coverage would become ever larger.
However, a language to define privacy policies must meet a number of requirements to be able to express the consent of the DSs to the processing of their personal data. For example, under the GDPR, valid consent must be freely given, specific, informed and unambiguous. Therefore, the language must be endowed with a formal semantics in order to avoid any ambiguity about the meaning of a privacy policy. However, the mere existence of a semantics does not imply that DSs properly understand the meaning of a policy and its potential consequences. One way to enhance the understanding of the DSs is to provide them with information about the potential risks related to a privacy policy. This is in line with Recital 39 of the GDPR, which stipulates that data subjects should be "made aware of the risks, rules, safeguards and rights in relation to the processing of personal data and how to exercise their rights in relation to such processing". This approach can enhance the awareness of the DSs and allow them to adjust their privacy policies in a better informed way.
A number of languages and frameworks have been proposed in the literature to express privacy policies. However, as discussed in Section 6, none of them meets all the above requirements, especially the strong conditions for valid consent laid down by the GDPR. In this report, we define a language, called PILOT, meeting these requirements and show its benefits to define precise privacy policies and to highlight the associated privacy risks. Even though PILOT is not restricted to the IoT, the design of the language takes into account the results of previous studies about the expectations and privacy preferences of IoT users [12].
We introduce the language in Section 2 and its abstract execution model in Section 3. Then we show in Section 4 how it can be used to help DSs define their own privacy policies and understand the associated privacy risks. Because the language relies on a well-defined execution model, it is possible to reason about privacy risks and to produce (and prove) automatically answers to questions raised by the DSs. In Section 5, we discuss the benefits of PILOT in the context of the GDPR and also the aspects of the GDPR that cannot be covered by a privacy policy language. In Section 6, we compare PILOT with existing privacy policy languages, and we conclude the report with avenues for further research in Section 7.

The Privacy Policy Language PILOT
In this section we introduce PILOT, a privacy policy language meeting the objectives set forth in Section 1. The language is designed so that it can be used both by DCs (to define certain aspects of their privacy rules or general terms regarding data protection) and by DSs (to express their consent). DCs can also keep DS policies for later use for accountability purposes, i.e., to show that data has been treated in accordance with the choices of DSs.
DC devices must declare their privacy policies before they collect personal data. We refer to these policies as DC policies. Likewise, when a DS device sends data to a DC device, the DS device must always include a policy defining the restrictions imposed by the DS on the use of her data by the DC. We refer to these policies as DS policies.
In what follows we formally define the language PILOT. We start with definitions of the most basic elements of PILOT (Section 2.1), which are later used to define the abstract syntax of the language (Section 2.2). This syntax is then illustrated with a working example (Section 2.3).

Basic definitions
Devices and Entities. We start with a set D of devices. Concretely, we consider devices that are able to store, process and communicate data, for example a smartphone, a laptop, an access point or an autonomous car.
Let $E$ denote the set of entities, such as Google or Alphabet, and $\leq_E$ the associated partial order; e.g., since Google belongs to Alphabet, we have $\mathit{Google} \leq_E \mathit{Alphabet}$. Entities include DCs and DSs. Every device is associated with an entity; however, an entity may have many devices associated with it. The function $\mathit{entity} : D \to E$ defines the entity associated with a given device.
Data Items, datatypes and values. Let $I$ be a set of data items. Data items correspond to the pieces of information that devices communicate. Each data item has a datatype associated with it. Let $T$ be a set of datatypes and $\leq_T$ the associated partial order. We use the function $\mathit{type} : I \to T$ to define the datatype of each data item. Examples of datatypes are age, address, city and clinical records. Since city is one of the elements that the datatype address may be composed of, we have $\mathit{city} \leq_T \mathit{address}$. We use $V$ to denote the set of all values of data items, $V = \bigcup_{t \in T} V_t$, where $V_t$ is the set of values for data items of type $t$. We use a special element $\bot \in V$ to denote the undefined value. A data item may be undefined, for instance, if it has been deleted or has not been collected. The device where a data item is created (its source) is called the owner device of the data item. We use a function $\mathit{owner} : I \to D$ to denote the owner device of a given data item.
Purposes. We denote by $P$ the set of purposes and by $\leq_P$ the associated partial order. For instance, if newsletter is considered a specific type of advertisement, then we have $\mathit{newsletter} \leq_P \mathit{advertisement}$.
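The three partial orders $\leq_E$, $\leq_T$ and $\leq_P$ can be represented by listing, for each element, the elements directly above it and taking the reflexive-transitive closure. The Python sketch below assumes that the listed edges are acyclic (so that the closure is antisymmetric); the dictionaries and the function name `leq` are illustrative only and are not part of PILOT.

```python
from typing import Dict, Set

def leq(order: Dict[str, Set[str]], a: str, b: str) -> bool:
    """Return True if a <= b in the reflexive-transitive closure of `order`.

    `order` maps each element to the elements directly above it
    (assumed acyclic, so the closure is a partial order)."""
    seen: Set[str] = set()
    stack = [a]
    while stack:
        x = stack.pop()
        if x == b:
            return True
        if x in seen:
            continue
        seen.add(x)
        stack.extend(order.get(x, ()))
    return False

# Hypothetical "directly below" edges for the examples given in the text.
ENTITY_ORDER = {"Google": {"Alphabet"}}
TYPE_ORDER = {"city": {"address"}}
PURPOSE_ORDER = {"newsletter": {"advertisement"}}

print(leq(ENTITY_ORDER, "Google", "Alphabet"))        # True
print(leq(TYPE_ORDER, "address", "city"))             # False
print(leq(PURPOSE_ORDER, "newsletter", "newsletter")) # True (reflexive)
```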
Conditions. Privacy policies are contextual: they may depend on conditions on the information stored on the devices on which they are evaluated. For example, (1) "Only data from adults may be collected" and (2) "Only locations within the city of Lyon may be collected from my smartwatch" are policy conditions. In order to express conditions, we use a simple logical language. Let $F$ denote a set of functions; terms $t$ are built from applications $f(\vec{t})$, where $f \in F$ and $\vec{t}$ is a list of terms matching the arity of $f$. The syntax of the logical language is as follows:
$$\varphi ::= t_1 * t_2 \mid \neg\varphi \mid \varphi_1 \wedge \varphi_2 \mid \mathit{tt} \mid \mathit{ff}$$
where $*$ is an arbitrary binary predicate, $t_1, t_2$ are terms, and $\mathit{tt}$ and $\mathit{ff}$ represent true and false, respectively. For instance, $\mathit{age} \geq 18$ and $\mathit{smartwatch\_location} = \mathit{Lyon}$ model conditions (1) and (2), respectively. We denote the set of well-formed conditions as $C$. In order to compare conditions, we use a relation $\vdash\, \subseteq C \times C$. We write $\varphi_1 \vdash \varphi_2$ to denote that $\varphi_2$ is stronger than $\varphi_1$.
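To make the condition language concrete, the following Python sketch represents terms and conditions as small immutable syntax trees and evaluates a condition against the data items stored on a device. Two choices are assumptions of this sketch rather than part of PILOT: a data item is looked up by name in a dictionary standing for the device's values, and a predicate applied to an undefined value (⊥, modelled as `None`) evaluates to false. The comparison relation ⊢ is not implemented.

```python
import operator
from dataclasses import dataclass
from typing import Any, Callable, Dict, Tuple

@dataclass(frozen=True)
class Item:            # data item, looked up in the device's data
    name: str

@dataclass(frozen=True)
class Const:           # constant such as 18 or "Lyon"
    value: Any

@dataclass(frozen=True)
class App:             # f(t1, ..., tn) for a function f in F
    f: Callable[..., Any]
    args: Tuple[Any, ...]

@dataclass(frozen=True)
class Pred:            # t1 * t2 for a binary predicate *
    op: Callable[[Any, Any], bool]
    left: Any
    right: Any

@dataclass(frozen=True)
class Not:
    arg: Any

@dataclass(frozen=True)
class And:
    left: Any
    right: Any

TT, FF = True, False   # tt and ff

def eval_term(t, data: Dict[str, Any]):
    """Return the value of term t, or None (standing for ⊥) if undefined."""
    if isinstance(t, Item):
        return data.get(t.name)
    if isinstance(t, Const):
        return t.value
    args = [eval_term(a, data) for a in t.args]
    return None if any(a is None for a in args) else t.f(*args)

def holds(phi, data: Dict[str, Any]) -> bool:
    """Evaluate condition phi against a device's data items."""
    if isinstance(phi, bool):
        return phi
    if isinstance(phi, Pred):
        l, r = eval_term(phi.left, data), eval_term(phi.right, data)
        return l is not None and r is not None and phi.op(l, r)
    if isinstance(phi, Not):
        return not holds(phi.arg, data)
    if isinstance(phi, And):
        return holds(phi.left, data) and holds(phi.right, data)
    raise TypeError(f"not a condition: {phi!r}")

adult = Pred(operator.ge, Item("age"), Const(18))                        # (1)
in_lyon = Pred(operator.eq, Item("smartwatch_location"), Const("Lyon"))  # (2)
print(holds(adult, {"age": 30}))                                         # True
print(holds(And(adult, in_lyon), {"age": 30, "smartwatch_location": "Paris"}))  # False
print(holds(Not(adult), {}))  # True: age is undefined, so (1) is false
```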

Abstract syntax of PILOT privacy policies
In this section we introduce the abstract syntax of PILOT privacy policies, or, simply, PILOT policies. We emphasize the fact that this abstract syntax is not the syntax used to communicate with DSs or DCs. This abstract syntax can be associated with a concrete syntax in a restricted form of natural language. We do not describe this mapping here due to space constraints, but we provide some illustrative examples in Section 2.3 and describe a user-friendly interface to define PILOT policies in Section 4.3. The goal of PILOT policies is to express the conditions under which data can be communicated. We consider two different types of data communications: data collection and transfers. Data collection corresponds to the collection by a DC of information directly from a DS. A transfer is the event of sending previously collected data to third parties.
Definition 1 (PILOT Privacy Policies Syntax). Given $\mathit{Purposes} \in 2^P$, $\mathit{retention\_time} \in \mathbb{N}$, $\mathit{condition} \in C$, $\mathit{entity} \in E$ and $\mathit{datatype} \in T$, the syntax of PILOT policies is defined as follows:
$$\begin{aligned} \mathit{dur} &::= \langle \mathit{Purposes}, \mathit{retention\_time} \rangle \\ \mathit{dcr} &::= \langle \mathit{condition}, \mathit{entity}, \mathit{dur} \rangle \\ p &::= \langle \mathit{datatype}, \mathit{dcr}, \mathit{TR} \rangle \end{aligned}$$
We use $DUR$, $DCR$ and $PP$ to denote the sets of data usage rules, data communication rules and PILOT privacy policies, respectively. The set of transfer rules is defined as the set of sets of data communication rules, $\mathit{TR} \in 2^{DCR}$.

In what follows, we provide some intuition about this syntax and an example of application.

Data Usage Rules. The purpose of these rules is to define the operations that may be performed on the data. $\mathit{Purposes}$ is the set of allowed purposes and $\mathit{retention\_time}$ the deadline for erasing the data. As an example, consider the following data usage rule:
$$\mathit{dur}_1 = \langle \{\mathit{research}\}, 26/04/2019 \rangle$$
This rule states that the data may be used only for the purpose of research, and only until 26/04/2019.

Data Communication Rules. A data communication rule defines the conditions that must be met for the data to be collected by or communicated to an entity. The outer layer of a data communication rule (i.e., the condition and entity) should be checked by the sender, whereas the data usage rule is to be enforced by the receiver. The first element, $\mathit{condition}$, imposes constraints on the data item and the context (state of the DS device); $\mathit{entity}$ indicates the entity allowed to receive the data; $\mathit{dur}$ is a data usage rule stating how $\mathit{entity}$ may use the data. For example,
$$\langle \mathit{age} > 18, \mathit{AdsCom}, \mathit{dur}_1 \rangle$$
states that data may be communicated to the entity AdsCom, which may use it according to $\mathit{dur}_1$ (defined above). It also requires the data item age to be greater than 18. This data item may be the data item to be sent or part of the contextual information of the sender device.
Transfer Rules. These rules form a set of data communication rules specifying the entities to whom the data may be transferred.

PILOT Privacy Policies. DSs and DCs use PILOT policies to describe how data may be used, collected and transferred. The first element, datatype, indicates the type of data the policy applies to; dcr defines the collection conditions and TR the transfer rules. In some cases, several PILOT policies are necessary to fully capture the privacy choices for a given datatype. For instance, a DS may allow only her employer to collect her data when she is at work but, when she is in a museum, allow only the museum. In this example, the DS must define two policies, one for each location.
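A direct transcription of Definition 1 into Python types may help readers who want to manipulate policies programmatically. In this sketch, which is an assumption rather than PILOT's concrete syntax, conditions are kept as opaque strings, retention times are represented as calendar dates (matching the dates used in the examples), and the set of transfer rules TR is a frozenset of data communication rules. The museum policies use hypothetical datatype, purpose and date values.

```python
from dataclasses import dataclass
from datetime import date
from typing import FrozenSet

@dataclass(frozen=True)
class DUR:
    """Data usage rule <Purposes, retention_time>."""
    purposes: FrozenSet[str]
    retention_time: date

@dataclass(frozen=True)
class DCR:
    """Data communication rule <condition, entity, dur>."""
    condition: str          # kept as an opaque string in this sketch
    entity: str
    dur: DUR

@dataclass(frozen=True)
class Policy:
    """PILOT policy <datatype, dcr, TR>; TR is a set of DCRs (empty: no transfers)."""
    datatype: str
    dcr: DCR
    transfers: FrozenSet[DCR] = frozenset()

# The examples from the text: dur_1 and <age > 18, AdsCom, dur_1>.
dur1 = DUR(frozenset({"research"}), date(2019, 4, 26))
rule = DCR("age > 18", "AdsCom", dur1)

# Museum example: one datatype, two policies. The datatype, purposes and
# date below are hypothetical placeholders.
until = date(2019, 12, 31)
at_work = Policy("location", DCR("location = work", "Employer",
                                 DUR(frozenset({"attendance"}), until)))
at_museum = Policy("location", DCR("location = museum", "Museum",
                                   DUR(frozenset({"visitor_guidance"}), until)))
```

Because the classes are frozen, policies are hashable and can be stored in sets, which is convenient for the policy bases of Section 3.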

Example: Vehicle Tracking
In this section, we illustrate the syntax of PILOT with a concrete example that will be continued with the risk analysis in Section 4. The use of Automatic Number Plate Recognition (ANPR) [10] is becoming very popular for applications such as parking billing or pay-per-use roads. These systems consist of a set of cameras that automatically recognize plate numbers when vehicles cross the range covered by the cameras. Using this information, it is possible to determine how long a car has been in a parking place or how many times it has traveled on a highway, for example.
ANPR systems may collect large amounts of mobility data, which raises privacy concerns [11]. When data is collected for the purpose of billing, the consent of the customer is not needed since the legal ground for the data processing can be the performance of a contract. However, certain privacy regulations, such as the GDPR, require prior consent for the use of the data for other purposes, such as commercial offers or advertisement.
Consider a DC, Parket, which owns parking areas equipped with ANPR in France. Parket is interested in offering discounts to frequent customers. To this end, Parket uses the number plates recorded by the ANPR system to send commercial offers to a selection of customers. Additionally, Parket transfers some data to its sister company, ParketWW, which operates worldwide, with the goal of providing better offers to their customers. Using data for these purposes requires explicit consent from DSs. The PILOT policy below precisely captures the way in which Parket wants to collect and use number plates for these purposes:
$$\langle \mathit{number\_plate}, \langle \mathit{tt}, \mathit{Parket}, \langle \{\mathit{commercial\_offers}\}, 21/03/2019 \rangle \rangle, \{ \langle \mathit{tt}, \mathit{ParketWW}, \langle \{\mathit{commercial\_offers}\}, 26/04/2019 \rangle \rangle \} \rangle \qquad (1)$$
The condition (tt) in (1) means that Parket does not impose any condition on the number plates it collects or transfers to ParketWW. This policy can be mapped into the following natural language sentence: Parket may collect data of type number_plate and use it for commercial_offers purposes until 21/03/2019. This data may be transferred to ParketWW which may use it for commercial_offers purposes until 26/04/2019.
The parts of the policy in italic font correspond to the elements of PILOT's abstract syntax. These elements change based on the content of the policy. The remaining parts of the policy are common to all PILOT policies.
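The mapping from abstract syntax to the restricted natural language can be sketched as a template that fills in the policy-specific elements (the parts set in italics in the text). The Python below is a hypothetical renderer written for this example, not the interface described in Section 4.3; it defines minimal policy types of its own, treats conditions as strings with tt meaning "no condition", and prints dates in the DD/MM/YYYY form used above.

```python
from dataclasses import dataclass
from datetime import date
from typing import FrozenSet

@dataclass(frozen=True)
class DUR:
    purposes: FrozenSet[str]
    retention_time: date

@dataclass(frozen=True)
class DCR:
    condition: str
    entity: str
    dur: DUR

@dataclass(frozen=True)
class Policy:
    datatype: str
    dcr: DCR
    transfers: FrozenSet[DCR] = frozenset()

def _use(dur: DUR) -> str:
    purposes = ", ".join(sorted(dur.purposes))
    return f"for {purposes} purposes until {dur.retention_time:%d/%m/%Y}"

def _if(cond: str) -> str:
    return "" if cond == "tt" else f" if {cond}"

def to_text(p: Policy) -> str:
    """Render a policy with the sentence templates used in the text."""
    d = p.dcr
    parts = [f"{d.entity} may collect data of type {p.datatype}{_if(d.condition)} "
             f"and use it {_use(d.dur)}."]
    for t in sorted(p.transfers, key=lambda r: r.entity):
        parts.append(f"This data may be transferred to {t.entity}{_if(t.condition)} "
                     f"which may use it {_use(t.dur)}.")
    return " ".join(parts)

offers = frozenset({"commercial_offers"})
parket = Policy(
    "number_plate",
    DCR("tt", "Parket", DUR(offers, date(2019, 3, 21))),
    frozenset({DCR("tt", "ParketWW", DUR(offers, date(2019, 4, 26)))}),
)
print(to_text(parket))
```

Running it prints the sentence given above for Parket's policy; only the arguments of the templates change from one policy to another.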
To obtain DSs' consent, Parket uses a system which broadcasts the above PILOT policy to vehicles before they enter the ANPR area. The implementation of this broadcast process is outside the scope of this report; several solutions are presented in [8]. DSs are therefore informed about Parket's policy before data is collected. However, DSs may not agree to the processing of their data for these purposes. They can express their own privacy policy in PILOT to define the conditions of their consent (or denial of consent).
Consider a DS, Alice, who often parks in Parket's car parks. Alice wants to benefit from the offers that Parket provides in her city (Lyon) but does not want her information to be transferred to third parties. To this end, she uses the following PILOT policy:
$$\langle \mathit{number\_plate}, \langle \mathit{car\_location} = \mathit{Lyon}, \mathit{Parket}, \langle \{\mathit{commercial\_offers}\}, 21/03/2019 \rangle \rangle, \emptyset \rangle$$
In practice, she would actually express this policy as follows: Parket may collect data of type number_plate if car_location is Lyon and use it for commercial_offers purposes until 21/03/2019.
which is a natural language version of the above abstract syntax policy.
In contrast with Parket's policy, Alice's policy includes a condition on car_location, which is a data item containing the current location of Alice's car. In addition, the absence of a transfer statement means that Alice does not allow Parket to transfer her data. It is easy to see that Alice's policy is more restrictive than Parket's policy. Thus, after Alice's device receives Parket's policy, it can automatically send an answer to Parket indicating that Alice does not consent to the collection of her data under the conditions stated in Parket's policy. In practice, Alice's policy can also be sent back so that Parket can adjust its own policy to match Alice's requirements. Parket would then have the option to send a new DC policy consistent with Alice's policy, and Alice would send her consent in return. The new policy sent by Parket can be computed as a join of Parket's original policy and Alice's policy (see Appendix C for an example of policy join, which is proven to preserve the privacy preferences of the DS).
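The comparison performed by Alice's device can be approximated in code. The formal definition of ⊑ is given in Appendix A and is not reproduced here; the Python sketch below is a deliberately simplified stand-in that assumes entities and purposes are compared by equality and set inclusion (ignoring $\leq_E$ and $\leq_P$), and that a condition implies another only if they are syntactically equal or the second one is tt. Under these assumptions, `policy_leq(p1, p2)` holds when p1's collection rule is at least as strict as p2's and every transfer rule of p1 is covered by some transfer rule of p2. The revised DC policy at the end is hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import FrozenSet

@dataclass(frozen=True)
class DUR:
    purposes: FrozenSet[str]
    retention_time: date

@dataclass(frozen=True)
class DCR:
    condition: str
    entity: str
    dur: DUR

@dataclass(frozen=True)
class Policy:
    datatype: str
    dcr: DCR
    transfers: FrozenSet[DCR] = frozenset()

def cond_implies(c1: str, c2: str) -> bool:
    # Crude syntactic stand-in: every condition implies tt, and a condition implies itself.
    return c2 == "tt" or c1 == c2

def dcr_leq(r1: DCR, r2: DCR) -> bool:
    """r1 is at least as strict as r2."""
    return (cond_implies(r1.condition, r2.condition)
            and r1.entity == r2.entity
            and r1.dur.purposes <= r2.dur.purposes
            and r1.dur.retention_time <= r2.dur.retention_time)

def policy_leq(p1: Policy, p2: Policy) -> bool:
    """Simplified p1 ⊑ p2: p1 is at least as restrictive as p2."""
    return (p1.datatype == p2.datatype
            and dcr_leq(p1.dcr, p2.dcr)
            and all(any(dcr_leq(t1, t2) for t2 in p2.transfers) for t1 in p1.transfers))

offers = frozenset({"commercial_offers"})
parket = Policy("number_plate", DCR("tt", "Parket", DUR(offers, date(2019, 3, 21))),
                frozenset({DCR("tt", "ParketWW", DUR(offers, date(2019, 4, 26)))}))
alice = Policy("number_plate",
               DCR("car_location = Lyon", "Parket", DUR(offers, date(2019, 3, 21))))

print(policy_leq(parket, alice))   # False: Alice's device refuses Parket's original policy
revised = Policy("number_plate", alice.dcr)   # hypothetical revised DC policy
print(policy_leq(revised, alice))  # True: Alice can consent
```

Parket's original policy fails for two reasons in this sketch: its condition tt does not imply car_location = Lyon, and its transfer rule to ParketWW is not covered by any transfer rule in Alice's policy.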
This example is continued in Section 4 which illustrates the use of PILOT to enhance Alice's awareness by providing her information about the risks related to her choices of privacy policy.

Abstract execution model
In this section, we describe the abstract execution model of PILOT. The purpose of this abstract model is twofold: it is useful to define a precise semantics of the language and therefore to avoid any ambiguity about the meaning of privacy policies; also, it is used by the verification tool described in the next section to highlight privacy risks. The definition of the full semantics of the language, which is presented in a companion paper [21], is beyond the scope of this report. In the following, we focus on the two main components of the abstract model: the system state (Section 3.1) and the events (Section 3.2).

System state
We first present an abstract model of a system composed of devices that communicate information and use PILOT policies to express the privacy requirements of DSs and DCs. Every device has a set of associated policies. A policy is associated with a device if it was defined in the device or the device received it. Additionally, DS devices have a set of data associated with them. These data may represent, for instance, the MAC address of the device or workouts recorded by the device. Finally, we keep track of the data collected by DC devices together with their corresponding PILOT policies. The system state is formally defined as follows.
Definition 2 (System state). The system state is a triple $\langle \nu, \pi, \rho \rangle$ where:
• $\nu : D \times I \rightharpoonup V$ is a mapping from the data items of a device to their corresponding value in that device.
• $\pi : D \to 2^{D \times PP}$ is a function denoting the policy base of a device. The policy base contains the policies created by the owner of the device and the policies sent by other devices in order to state their collection requirements. A pair $(d, p)$ means that PILOT policy $p$ belongs to device $d$. We write $\pi_d$ to denote $\pi(d)$.
• $\rho : D \to 2^{D \times I \times PP}$ returns a set of triples $(s, i, p)$ indicating the data items and PILOT policies that a controller has received. If $(d', i, p) \in \rho(d)$, we say that device $d$ has received or collected data item $i$ from device $d'$ and policy $p$ describes how the data item must be used. We write $\rho_d$ to denote $\rho(d)$.
In Definition 2, $\nu$ returns the local value of a data item in the specified device. However, not all devices have values for all data items: when the value of a data item in a device is undefined, $\nu$ returns $\bot$. The policy base of a device $d$, $\pi_d$, contains the PILOT policies that the device has received or that have been defined locally. If $(d, p) \in \pi(d)$, the policy $p$ corresponds to a policy that $d$ has defined in the device itself. On the other hand, if $(d, p) \in \pi(e)$ where $d \neq e$, $p$ is a policy sent by device $d$ to device $e$. Policies stored in the policy base are used to compare the privacy policies of two devices before the data is communicated. The information that a device has received is recorded in $\rho$, together with the PILOT policy describing how the data must be used. The difference between policies in $\pi$ and $\rho$ is that policies in $\pi$ are used to determine whether data can be communicated, whereas policies in $\rho$ describe how a data item must be used by the receiver.

Example 1. Fig. 1 shows a state composed of two devices: Alice's car and Parket's ANPR system. The figure depicts the situation after Alice's car has entered the range covered by the ANPR camera and her data has been collected.
The database in Alice's state ($\nu_{Alice}$) contains a data item $\mathit{plate}_{Alice}$ of type number plate whose value is GD-042-PR. The policy base in Alice's device ($\pi_{Alice}$) contains two policies: $(\mathit{Alice}, p_{Alice})$, representing a policy that Alice defined, and $(\mathit{Parket}, p_{Parket})$, which represents a policy $p_{Parket}$ sent by Parket. We assume that $p_{Alice}$ and $p_{Parket}$ are the policies applying to data items of type number plate.
Parket's state contains the same components as Alice's state with, in addition, a set of received data ($\rho_{Parket}$). The latter contains the data item $\mathit{plate}_{Alice}$ collected from Alice and the PILOT policy $p_{Parket}$ that must be applied in order to handle the data. Note that $p_{Parket}$ was the PILOT policy originally defined by Parket. In order for Alice's privacy to be preserved, $p_{Parket}$ must be more restrictive than Alice's PILOT policy $p_{Alice}$, which is denoted by $p_{Parket} \sqsubseteq p_{Alice}$ (see Appendix A for the formal definition of $\sqsubseteq$). This condition can easily be enforced by comparing the policies before data is collected. The first element in $(\mathit{Alice}, \mathit{plate}_{Alice}, p_{Parket})$ indicates that the data comes from Alice's device. Finally, Parket's policy base has one policy: its own policy $p_{Parket}$, which was communicated to Alice for data collection.
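A minimal Python rendering of Definition 2 may make the three components easier to picture. The sketch below is our reading of Example 1 (Fig. 1 itself is not reproduced here): devices, data items and policies are plain strings, `None` stands for ⊥, and the helper names `value`, `own_policies` and `received_policies` are hypothetical. We also assume, following the send event of Section 3.2, that Parket's database holds the collected value of plate_Alice.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple

@dataclass
class State:
    # nu : D x I -> V (partial); a missing key means the value is undefined (⊥)
    nu: Dict[Tuple[str, str], str] = field(default_factory=dict)
    # pi : D -> 2^(D x PP); (d, p) in pi[e]: p defined by d (d == e) or sent by d to e
    pi: Dict[str, Set[Tuple[str, str]]] = field(default_factory=dict)
    # rho : D -> 2^(D x I x PP); (d, i, p) in rho[e]: e received i from d under policy p
    rho: Dict[str, Set[Tuple[str, str, str]]] = field(default_factory=dict)

    def value(self, dev: str, item: str) -> Optional[str]:
        return self.nu.get((dev, item))

    def own_policies(self, dev: str) -> Set[str]:
        return {p for src, p in self.pi.get(dev, set()) if src == dev}

    def received_policies(self, dev: str) -> Set[Tuple[str, str]]:
        return {(src, p) for src, p in self.pi.get(dev, set()) if src != dev}

fig1 = State(
    nu={("Alice", "plate_Alice"): "GD-042-PR", ("Parket", "plate_Alice"): "GD-042-PR"},
    pi={"Alice": {("Alice", "p_Alice"), ("Parket", "p_Parket")},
        "Parket": {("Parket", "p_Parket")}},
    rho={"Parket": {("Alice", "plate_Alice", "p_Parket")}},
)
print(fig1.received_policies("Alice"))   # {('Parket', 'p_Parket')}
print(fig1.value("Alice", "age"))        # None, i.e. undefined
```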

System events
In this section we describe the set of events $\mathcal{E}$ in our abstract execution model. We focus on events that ensure that the exchange of data items is done according to the PILOT policies of DSs and DCs.
Events. The set of events $\mathcal{E}$ consists of the following events: request, send, transfer and use.
Figure 1: Example System State

The events request, send and transfer model valid exchanges of policies and data among DCs and DSs. The event use models correct usage of the collected data by DCs. In what follows, we explain each event in detail.

request(sndr, rcv, t, p) models the request of data from DCs to DSs or other DCs. Thus, sndr is always a DC device, and rcv may be a DC or DS device. A request includes the type t of the data being requested and a PILOT policy p. As expected, the PILOT policy is required to refer to the datatype that is requested, i.e., p = (t, _, _). As a result of executing request, the pair (sndr, p) is added to $\pi_{rcv}$. Thus, rcv is informed of the conditions under which sndr will use the requested data.
send(sndr, rcv, i) represents the collection by the DC rcv of a data item i from the DS sndr. In order for send to be executed, the device sndr must check that $\pi_{sndr}$ contains: i) an active policy defined by sndr, $p_{sndr}$, indicating how sndr allows DCs to use her data, and ii) an active policy sent by rcv, $p_{rcv}$, indicating how rcv plans to use the data. A policy is active if it applies to the data item to be sent and to rcv's entity, its retention time has not yet been reached, and its condition holds. Data can only be sent if $p_{rcv}$ is more restrictive than $p_{sndr}$ (i.e., $p_{rcv} \sqsubseteq p_{sndr}$), which must be checked by sndr. We record the data exchange in $\rho_{rcv}$ by adding the triple $(sndr, i, p_{rcv})$: the sender, the data item and rcv's PILOT policy. We also update rcv's database with the value of i in sndr's state, $\nu(rcv, i) = \nu(sndr, i)$.
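The request and send events can be sketched as state updates guarded by the checks just described. The Python below is an illustrative simplification, not PILOT's formal semantics: policies are flattened (no transfer rules), devices stand for their entities, a condition is either absent (tt) or a required value for one data item, retention times are abstract integer time steps, and `more_restrictive` is a simplified stand-in for ⊑. The item-to-datatype map `TYPE` is hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Optional, Set, Tuple

TYPE = {"plate_Alice": "number_plate"}   # hypothetical type : I -> T

@dataclass(frozen=True)
class Policy:
    """Flattened policy: no transfer rules; devices stand for their entities."""
    datatype: str
    entity: str                                  # entity allowed to collect
    purposes: FrozenSet[str]
    retention: int                               # abstract time step
    condition: Optional[Tuple[str, str]] = None  # (item, required value); None = tt

@dataclass
class State:
    nu: Dict[Tuple[str, str], str] = field(default_factory=dict)
    pi: Dict[str, Set[Tuple[str, Policy]]] = field(default_factory=lambda: defaultdict(set))
    rho: Dict[str, Set[Tuple[str, str, Policy]]] = field(default_factory=lambda: defaultdict(set))

def active(s: State, p: Policy, dev: str, item: str, rcv: str, now: int) -> bool:
    cond_ok = p.condition is None or s.nu.get((dev, p.condition[0])) == p.condition[1]
    return p.datatype == TYPE[item] and p.entity == rcv and now < p.retention and cond_ok

def more_restrictive(p1: Policy, p2: Policy) -> bool:
    """Simplified stand-in for p1 ⊑ p2."""
    return (p1.datatype == p2.datatype and p1.entity == p2.entity
            and p1.purposes <= p2.purposes and p1.retention <= p2.retention
            and (p2.condition is None or p1.condition == p2.condition))

def request(s: State, sndr: str, rcv: str, t: str, p: Policy) -> None:
    if p.datatype != t:
        raise ValueError("the policy must refer to the requested datatype")
    s.pi[rcv].add((sndr, p))                     # rcv learns sndr's terms

def send(s: State, sndr: str, rcv: str, item: str, now: int) -> bool:
    base = s.pi[sndr]
    mine = [p for d, p in base if d == sndr and active(s, p, sndr, item, rcv, now)]
    theirs = [p for d, p in base if d == rcv and active(s, p, sndr, item, rcv, now)]
    for p_rcv in theirs:
        if any(more_restrictive(p_rcv, p_sndr) for p_sndr in mine):
            s.rho[rcv].add((sndr, item, p_rcv))  # record (sndr, i, p_rcv)
            s.nu[(rcv, item)] = s.nu[(sndr, item)]
            return True
    return False

offers = frozenset({"commercial_offers"})
alice_p = Policy("number_plate", "Parket", offers, 10, ("car_location", "Lyon"))
s = State(nu={("Alice", "plate_Alice"): "GD-042-PR", ("Alice", "car_location"): "Lyon"})
s.pi["Alice"].add(("Alice", alice_p))                    # Alice's own policy
request(s, "Parket", "Alice", "number_plate", alice_p)   # Parket requests on Alice's terms
print(send(s, "Alice", "Parket", "plate_Alice", now=0))  # True
print(s.nu[("Parket", "plate_Alice")])                   # GD-042-PR
```

A DC policy without the Lyon condition would not pass `more_restrictive` against Alice's policy, so `send` would return False and leave the state unchanged.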
transfer(sndr, rcv, i) is executed when a DC (sndr) transfers a data item i to another DC (rcv). First, sndr checks whether $\pi_{sndr}$ contains an active policy $p_{rcv}$ from rcv. Here we do not use a PILOT policy from sndr; instead, we use the PILOT policy p sent along with the data, i.e., defined by the owner of i. Thus, sndr must check whether there exists an active transfer rule tr in the set of transfer rules of the PILOT policy p. As before, sndr must check that the policy sent by rcv is more restrictive than the one originally sent by the owner of the data, i.e., $p_{rcv} \sqsubseteq p_{tr}$, where $p_{tr}$ is the policy p with the active transfer rule tr in place of its data communication rule and with the same set of transfer rules as p. Note that data items can be transferred more than once to the entities in the set of transfer rules as long as the retention time has not been reached. This is not an issue in terms of privacy, as data items are constant values. In the resulting state, we update $\rho_{rcv}$ with the sender, the data item and rcv's PILOT policy, $(sndr, i, p_{rcv})$. Note that, in this case, the owner of the data item is not sndr, since transfers always correspond to exchanges of previously collected data: $\mathit{owner}(i) \neq sndr$. The database of rcv is updated with the current value of i in $\nu_{sndr}$.
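The transfer check can also be sketched in a simplified style. The Python below is illustrative only: conditions are omitted (treated as tt), retention times are integer steps, ⊑ is approximated by equality of entities plus inclusion of purposes and an earlier or equal retention time, and the `OWNER` and `TYPE` maps are hypothetical. As described above, it builds p_tr by putting the active transfer rule in place of the data communication rule of the owner's policy.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Set, Tuple

OWNER = {"plate_Alice": "Alice"}         # hypothetical owner : I -> D
TYPE = {"plate_Alice": "number_plate"}   # hypothetical type : I -> T

@dataclass(frozen=True)
class Rule:
    """Simplified data communication rule: condition omitted (always tt)."""
    entity: str
    purposes: FrozenSet[str]
    retention: int

@dataclass(frozen=True)
class Policy:
    datatype: str
    dcr: Rule
    transfers: FrozenSet[Rule] = frozenset()

@dataclass
class State:
    nu: Dict[Tuple[str, str], str] = field(default_factory=dict)
    pi: Dict[str, Set[Tuple[str, Policy]]] = field(default_factory=lambda: defaultdict(set))
    rho: Dict[str, Set[Tuple[str, str, Policy]]] = field(default_factory=lambda: defaultdict(set))

def rule_leq(r1: Rule, r2: Rule) -> bool:
    return (r1.entity == r2.entity and r1.purposes <= r2.purposes
            and r1.retention <= r2.retention)

def policy_leq(p1: Policy, p2: Policy) -> bool:
    """Simplified stand-in for p1 ⊑ p2."""
    return (p1.datatype == p2.datatype and rule_leq(p1.dcr, p2.dcr)
            and all(any(rule_leq(a, b) for b in p2.transfers) for a in p1.transfers))

def transfer(s: State, sndr: str, rcv: str, item: str, now: int) -> bool:
    if OWNER[item] == sndr:
        return False                       # only previously collected data is transferred
    for _, i, p in list(s.rho[sndr]):      # p: policy sent along with the data
        if i != item:
            continue
        for tr in p.transfers:
            if tr.entity != rcv or now >= tr.retention:
                continue                   # transfer rule not active
            p_tr = Policy(p.datatype, tr, p.transfers)
            for d, p_rcv in s.pi[sndr]:
                active = (d == rcv and p_rcv.datatype == TYPE[item]
                          and p_rcv.dcr.entity == rcv and now < p_rcv.dcr.retention)
                if active and policy_leq(p_rcv, p_tr):
                    s.rho[rcv].add((sndr, item, p_rcv))
                    s.nu[(rcv, item)] = s.nu[(sndr, item)]
                    return True
    return False

offers = frozenset({"commercial_offers"})
p = Policy("number_plate", Rule("Parket", offers, 5), frozenset({Rule("ParketWW", offers, 8)}))
s = State(nu={("Parket", "plate_Alice"): "GD-042-PR"})
s.rho["Parket"].add(("Alice", "plate_Alice", p))            # collected earlier from Alice
p_ww = Policy("number_plate", Rule("ParketWW", offers, 8))
s.pi["Parket"].add(("ParketWW", p_ww))                       # effect of ParketWW's request
print(transfer(s, "Parket", "ParketWW", "plate_Alice", now=0))     # True
print(transfer(s, "ParketWW", "CarInsure", "plate_Alice", now=0))  # False: no transfer rule
```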
use(dev, i, pur) models the use of a data item i by a DC device dev for purpose pur. Usage conditions are specified in the data usage rule of the policy attached to the data item, denoted $p_i$, in the set of received data of dev, $\rho_{dev}$. Thus, in order to execute use, we require that: i) the purpose pur is allowed by $p_i$, and ii) the retention time in $p_i$ has not elapsed.
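The guard of use amounts to a lookup in $\rho_{dev}$. In the Python below, which is an illustration under stated assumptions, each received entry stores only the data usage rule of its policy, retention times are integer steps, and a purpose is taken to be allowed if it lies below an allowed purpose in $\leq_P$ (our reading, not stated explicitly in the text; the `PURPOSE_ORDER` edge is the newsletter example from Section 2.1). The misbehaviour event illegal_use used in the risk analysis of Section 4 differs only in skipping the purpose check.

```python
from dataclasses import dataclass
from typing import FrozenSet, Set, Tuple

PURPOSE_ORDER = {"newsletter": {"advertisement"}}   # hypothetical ≤_P edges

def purpose_leq(a: str, b: str) -> bool:
    """Reflexive-transitive closure of PURPOSE_ORDER."""
    stack, seen = [a], set()
    while stack:
        x = stack.pop()
        if x == b:
            return True
        if x not in seen:
            seen.add(x)
            stack.extend(PURPOSE_ORDER.get(x, ()))
    return False

@dataclass(frozen=True)
class UsageRule:
    purposes: FrozenSet[str]
    retention: int

Received = Set[Tuple[str, str, UsageRule]]   # (sender, item, usage rule) entries of rho_dev

def can_use(rho_dev: Received, item: str, pur: str, now: int) -> bool:
    """Guard of use(dev, item, pur): allowed purpose and retention time not elapsed."""
    return any(i == item and now < dur.retention
               and any(purpose_leq(pur, q) for q in dur.purposes)
               for _, i, dur in rho_dev)

def can_illegal_use(rho_dev: Received, item: str, now: int) -> bool:
    """Guard of illegal_use: same as use, but the purpose is disregarded."""
    return any(i == item and now < dur.retention for _, i, dur in rho_dev)

rho_parket = {("Alice", "plate_Alice", UsageRule(frozenset({"advertisement"}), 5))}
print(can_use(rho_parket, "plate_Alice", "newsletter", now=0))  # True
print(can_use(rho_parket, "plate_Alice", "profiling", now=0))   # False
print(can_illegal_use(rho_parket, "plate_Alice", now=0))        # True
```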

Risk Analysis
As described in the introduction, an effective way to enhance informed consent is to raise user awareness about the risks related to personal data collection. Privacy risks may result from different sorts of misbehavior such as the use of data beyond the allowed purpose or the transfer of data to unauthorized third parties [16].
In order to assess the risks related to a given privacy policy, we need to rely on assumptions about potential risk sources, such as:
• Entities $e_i$ that may have a strong interest in using data of type t for a given purpose pur.
• Entities $e_i$ that may have the means and the interest to transfer data of type t to other entities $e_j$.
In practice, some of these assumptions may be generic and could be obtained from databases populated by peers or NGOs based on the history of misconduct by companies or business sectors. Other risk assumptions can be specific to the DS (e.g., if she fears that a friend may be tempted to transfer certain information to another person). Based on these assumptions, a DS who is wondering whether she should add a policy p to her current set of policies can ask questions such as, "if I add this policy p:
• Is there a risk that my data of type t is used for purpose pur?
• Is there a risk that, at some stage, entity e gets my data of type t?"
In what follows, we first introduce our approach to answer the above questions (Section 4.1); then we illustrate it with the example introduced in Section 2.3 (Section 4.2), and we present a user-friendly interface to define and analyze privacy policies (Section 4.3).

Automatic Risk Analysis with SPIN
In order to automatically answer questions of the type described above, we use the verification tool SPIN [15]. SPIN belongs to the family of verification tools known as model-checkers. A model-checker takes as input a model of the system (i.e., an abstract description of the behavior of the system) and a set of properties (typically expressed in formal logic), and checks whether the model of the system satisfies the properties. In SPIN, the model is written in the modeling language PROMELA [15] and properties are encoded in Linear Temporal Logic (LTL) (e.g., [4]). We chose SPIN as it has successfully been used in a variety of contexts [20]. However, our methodology is not limited to SPIN and any other formal verification tool such as SMT solvers [5] or automated theorem provers [26] could be used instead.
Our approach consists in defining a PROMELA model for the PILOT events and privacy policies, and translating the risk analysis questions into LTL properties that can be automatically checked by SPIN. For example, the question "Is there a risk that Alice's data is used for the purpose of profiling by ParketWW?" is translated into the LTL property "ParketWW never uses Alice's data for profiling". Devices are modeled as processes that nondeterministically attempt to execute the events defined in Section 3.2.
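Both question templates of this section correspond to safety properties. Writing $\mathit{use}(e, i, pur)$ for a hypothetical atomic proposition that holds when device e executes use (or illegal_use) on item i for purpose pur, and $\mathit{has}(e, i)$ for a hypothetical proposition that holds when $\rho_e$ contains a triple with data item i, the example question and its "gets my data" counterpart can be written as follows ($\square$ reads "always"; in SPIN's LTL syntax it is written "[]" and negation "!"). A risk exists exactly when an execution violating the corresponding formula is found.

$$\varphi_{\mathit{use}} = \square\, \neg\, \mathit{use}(\mathit{ParketWW}, \mathit{plate}_{Alice}, \mathit{profiling}) \qquad \varphi_{\mathit{get}} = \square\, \neg\, \mathit{has}(\mathit{CarInsure}, \mathit{plate}_{Alice})$$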
In order to encode the misbehavior expressed in the assumptions, we add "illegal" events to the set of events that devices can execute. For instance, consider the assumption "use of data beyond the allowed purpose". To model this assumption, we introduce the event illegal_use, which behaves as use but disregards the purpose of the DS policy for the data.
SPIN explores all possible sequences of executions of events (including misbehavior events), trying to find a sequence that violates the LTL property. If no such sequence is found, the property cannot be violated in the model, which means that the risk corresponding to the property cannot occur under the stated assumptions. If a sequence is found, the risk corresponding to the property can occur, and SPIN returns the sequence of events that leads to the violation (a counterexample). This sequence of events can be used to further clarify the cause of the violation.
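The search described above can be illustrated with a small explicit-state reachability check. This Python sketch is not SPIN's algorithm (SPIN's default search is depth-first, and it checks general LTL properties by translating them into never claims); it only covers safety properties of the form □¬bad, searching breadth-first for a reachable bad state and returning the shortest event sequence leading to it. The toy model, a DC that may collect an item x, use it for an allowed purpose or, when misbehaviour is enabled, execute illegal_use for profiling, is hypothetical.

```python
from collections import deque
from typing import Callable, FrozenSet, Iterable, List, Optional, Tuple

State = FrozenSet[tuple]
Event = Tuple[str, ...]
Successors = Callable[[State], Iterable[Tuple[Event, State]]]

def find_violation(init: State, successors: Successors,
                   bad: Callable[[State], bool]) -> Optional[List[Event]]:
    """Return a shortest event sequence reaching a bad state, or None if none is reachable."""
    parent = {init: None}
    queue = deque([init])
    while queue:
        s = queue.popleft()
        if bad(s):
            trace = []
            while parent[s] is not None:    # walk back to the initial state
                s, ev = parent[s]
                trace.append(ev)
            return trace[::-1]
        for ev, t in successors(s):
            if t not in parent:
                parent[t] = (s, ev)
                queue.append(t)
    return None

def toy_model(with_illegal: bool) -> Successors:
    """A DC collects item x (allowed purpose: offers) and may then use it."""
    def succ(s: State):
        moves = [(("collect", "DC", "x"), ("has", "DC", "x"))]
        if ("has", "DC", "x") in s:
            moves.append((("use", "DC", "x", "offers"), ("used", "DC", "x", "offers")))
            if with_illegal:
                moves.append((("illegal_use", "DC", "x", "profiling"),
                              ("used", "DC", "x", "profiling")))
        for ev, fact in moves:
            if fact not in s:
                yield ev, s | {fact}
    return succ

profiled = lambda s: ("used", "DC", "x", "profiling") in s
print(find_violation(frozenset(), toy_model(False), profiled))  # None: the risk cannot occur
print(find_violation(frozenset(), toy_model(True), profiled))
# [('collect', 'DC', 'x'), ('illegal_use', 'DC', 'x', 'profiling')]
```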

Case Study: Vehicle Tracking
We illustrate our risk analysis technique with the vehicle tracking example introduced in Section 2.3. We first define the PROMELA model and the assumptions on the entities involved in this example. The code of the complete model is available in [25].
Promela Model. We define a model involving the three entities identified in Section 2.3 with, in addition, the car insurance company CarInsure, which is identified as a potential source of risk related to ParketWW, i.e., E = {Alice, Parket, ParketWW, CarInsure}. Each entity is associated with a single device: D = E and entity(x) = x for x ∈ {Alice, Parket, ParketWW, CarInsure}. We focus on one datatype, T = {number_plate}, with its set of values defined as $V_{number\_plate}$ = {GD-042-PR}. We consider a data item $\mathit{plate}_{Alice}$ of type number_plate, of which Alice is the owner. Finally, we consider a set of purposes P = {commercial_offers, profiling}.
Risk assumptions on entities. In this case study, we consider two risk assumptions:
1. ParketWW may transfer personal data to CarInsure disregarding the associated DS privacy policies.
2. CarInsure has strong interest in using personal data for profiling.
In practice, these assumptions, which are not specific to Alice, may for example be obtained automatically from databases populated by peers or NGOs.
Set of events. The set of events that we consider is derived from the risk assumptions on entities. On the one hand, we model events that behave correctly, i.e., as described in Section 3.2. In order to model the worst-case scenario in terms of risk analysis, we consider that the DCs in this case study (i.e., Parket, ParketWW and CarInsure) can request data from any entity (including Alice), the DCs can collect Alice's data, and the DCs can transfer data among them. On the other hand, the risk assumptions above are modeled as two events: ParketWW may transfer data to CarInsure disregarding Alice's policy, and CarInsure may use Alice's data for profiling even if it is not allowed by Alice's policy. Let DC, DC′ ∈ {Parket, ParketWW, CarInsure}; the following events may occur:
• request(DC, Alice, number_plate, p): a DC requests a number plate from Alice, and p is the PILOT policy of the DC.
• request(DC, DC′, number_plate, p): a DC requests data items of type number plate from another DC, and p is the PILOT policy of the requesting DC.
• send(Alice, DC, i): Alice sends her item i to a DC.
• transfer(DC, DC′, i): a DC transfers a previously received item i to another DC.
• illegal_transfer(ParketWW, CarInsure, i): ParketWW transfers a previously received item i to CarInsure, disregarding the associated PILOT policy defined by the owner of i.
• illegal_use(CarInsure, i, profiling): CarInsure uses data item i for profiling, disregarding the associated privacy policy defined by the owner of i.
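The case study can be mirrored in a toy explicit-state search. The Python below is a heavily abstracted, hypothetical model written for illustration, not the PROMELA model of [25]: the state only records which entities hold plate_Alice and which uses occurred, requests and policy comparisons are abstracted away, and we assume, for illustration only, one variant where Alice's policy lets Parket transfer the plate to ParketWW (`allow_transfer=True`) and one where it allows no transfer. The two illegal events follow the risk assumptions above.

```python
from collections import deque
from typing import Callable, FrozenSet, List, Optional, Tuple

State = FrozenSet[tuple]
Event = Tuple[str, ...]

def find_violation(init: State, successors, bad: Callable[[State], bool]) -> Optional[List[Event]]:
    """Breadth-first search for a shortest event sequence reaching a bad state."""
    parent = {init: None}
    queue = deque([init])
    while queue:
        s = queue.popleft()
        if bad(s):
            trace = []
            while parent[s] is not None:
                s, ev = parent[s]
                trace.append(ev)
            return trace[::-1]
        for ev, t in successors(s):
            if t not in parent:
                parent[t] = (s, ev)
                queue.append(t)
    return None

def case_study(allow_transfer: bool, with_illegal: bool):
    """Successor function of the abstracted ANPR case study."""
    def succ(s: State):
        holders = {f[1] for f in s if f[0] == "has"}
        moves = [(("send", "Alice", "Parket", "plate_Alice"), ("has", "Parket"))]
        if allow_transfer and "Parket" in holders:
            moves.append((("transfer", "Parket", "ParketWW", "plate_Alice"), ("has", "ParketWW")))
        if with_illegal and "ParketWW" in holders:
            moves.append((("illegal_transfer", "ParketWW", "CarInsure", "plate_Alice"),
                          ("has", "CarInsure")))
        if with_illegal and "CarInsure" in holders:
            moves.append((("illegal_use", "CarInsure", "plate_Alice", "profiling"),
                          ("used", "CarInsure", "profiling")))
        for ev, fact in moves:
            if fact not in s:
                yield ev, s | {fact}
    return succ

gets = lambda s: ("has", "CarInsure") in s
profiles = lambda s: ("used", "CarInsure", "profiling") in s

for allow in (True, False):
    for illegal in (False, True):
        trace = find_violation(frozenset(), case_study(allow, illegal), gets)
        print(f"transfer={allow} illegal={illegal}: CarInsure gets plate via {trace}")
```

In this abstraction, CarInsure can obtain the plate only when both the transfer to ParketWW and the illegal events are enabled, and the returned trace ends with illegal_transfer.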
Alice's policies. In order to illustrate the benefits of our risk analysis approach, we focus on two policies that Alice may consider, denoted p_trans_Alice and p_no_trans_Alice. Table 1 summarizes some of the results of applying our SPIN risk analyzer to this example. The questions in the first column have been translated into LTL properties checked by SPIN (see [25]). The output of SPIN appears in columns 2 to 5. Green boxes indicate that the output is in accordance with Alice's policy, while red boxes correspond to violations of her policy. Columns 2 and 3 correspond to executions of the system involving only correct events, considering p_trans_Alice and p_no_trans_Alice, respectively, as Alice's policy. As expected, all these executions respect Alice's policies.

Results of the Risk Analysis
Columns 4 and 5 consider executions that also involve illegal_transfer and illegal_use. These columns show the privacy risks taken by Alice under the above risk assumptions. Rows 3 and 6 show, respectively, that CarInsure may get Alice's data and use it for profiling. In addition, the counterexamples generated by SPIN, which are not shown in the table, show that this can happen only after ParketWW executes illegal_transfer.
From the results of this privacy risk analysis, Alice can make a better-informed decision about which policy to choose. In a nutshell, she has three options: 1. Disallow Parket from using her data for commercial offers, i.e., add neither p_trans_Alice nor p_no_trans_Alice to her set of policies (Parket will then use the data only for billing purposes, based on the contract). Therefore, if Alice wants to receive commercial offers but does not want to take the risk of being profiled by an insurance company, she should take option two.

Usability
In order to show the usability of the approach, we have developed a web application that makes it possible for users with no technical background to perform the risk analysis outlined in Section 4.2 for the ANPR system. The web application is available online at: http://pilot-risk-analysis.inrialpes.fr/. Fig. 2 shows the input forms of the web application. First, DSs have access to a user-friendly form to input PILOT policies; the figure shows an example for the policy p_trans_Alice. Then DSs can choose the appropriate risk assumptions from the list generated by the system. Finally, they can ask questions about the potential risks based on these assumptions. When the user clicks on "Verify!", the web application runs SPIN to verify the LTL property corresponding to the question, and the grey text "Not Analyzed" is replaced with "Yes" or "No" depending on the result. The figure shows the results of the first three questions with p_trans_Alice and no risk assumption selected (column 2 of Table 1).
The web application is tailored to the ANPR case study used throughout the report. The PROMELA model and the policies defined in Section 4.2 are implemented in the application. This prototype could be generalized in several directions, for example by allowing users to enter specific risk assumptions about third parties. The range of questions could also be extended to include questions such as "Can X use Y's data for a purpose other than pur?". The code of the web application is available at [25].

Benefits of PILOT for the implementation of the GDPR
In this section, we sketch the benefits of using PILOT in the context of the GDPR. First, we believe that the adoption of a language like PILOT would help reduce the imbalance of power between DCs and DSs without introducing prohibitive costs or unacceptable constraints for DCs. In addition, it can be used as a basis for a more effective consent mechanism, as advocated by the WP29 in its opinion on the IoT.⁷ As demonstrated in [8], DSs' consents expressed through PILOT privacy policies can be produced automatically by a privacy agent implementing the abstract execution model sketched in Section 3. This agent needs to interact with the DS only in cases not foreseen by the DS's privacy policy, for example a request from an unknown type of DC or for an unknown type of purpose. This reduces user fatigue while leaving DSs in control of their choices.
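The division of labor between the agent and the DS can be sketched in a few lines of Python. This is a deliberately reduced, hypothetical model: the agent described in [8] evaluates complete PILOT policies, whereas here the DS's preferences are just a table from (DC type, purpose) pairs to a consent decision, and the function names are illustrative only.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical representation of a DS's preferences: (DC type, purpose) -> consent.
Preferences = Dict[Tuple[str, str], bool]


def decide(prefs: Preferences, dc_type: str, purpose: str) -> Optional[bool]:
    """Answer automatically if the DS policy covers the request; None means unforeseen."""
    return prefs.get((dc_type, purpose))


def handle_request(prefs: Preferences, dc_type: str, purpose: str,
                   ask_ds: Callable[[str, str], bool]) -> bool:
    """Consult the DS only for unforeseen requests, then remember the answer."""
    decision = decide(prefs, dc_type, purpose)
    if decision is None:
        decision = ask_ds(dc_type, purpose)
        prefs[(dc_type, purpose)] = decision  # the policy's coverage grows over time
    return decision
```

Storing the DS's answer is one way to realize the idea that a DS policy becomes more complete over time, so that the same question is never asked twice.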
In addition to the risk analysis described in the previous section, PILOT can be associated with verification tools to detect certain forms of non-compliance of DC privacy policies with the GDPR. Examples of non-compliance include inconsistencies between the retention time and the purpose, or between the purpose and the type of data. This would require a database of standard purposes with their associated data types and retention times. In Europe, such a database could be provided, for example, by Data Protection Authorities or by the European Data Protection Board. It is also possible to detect privacy policies involving sensitive data⁸, for which a stronger form of consent is required [28].

⁷ "In practice, today, it seems that sensor devices are usually designed neither to provide information by themselves nor to provide a valid mechanism for getting the individual's consent. Yet, new ways of obtaining the user's valid consent should be considered by IoT stakeholders, including by implementing consent mechanisms through the devices themselves. Specific examples, like privacy proxies and sticky policies, are mentioned later in this document." [29]
Since PILOT is defined through a precise execution model, the enforcement of privacy policies can also be checked by a combination of means:
• A priori (or "static") verification of global properties such as "No collection of data can take place by a DC if the DS has not previously received the required information from this DC" [8]. This particular property expresses a GDPR requirement regarding informed consent.
• On-the-fly (or "dynamic") verification of properties such as "data are transferred to authorized third parties only".
• A posteriori verification of properties in the context of audits. This type of verification can be implemented as an analysis of the execution traces (or logs) of the DC to support accountability.
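As an illustration of the a posteriori case, the informed-consent property quoted above can be checked over a log with a single pass. The Python sketch below assumes a hypothetical log format that reuses the event names of the case study, and it treats a request carrying the DC's policy as the moment the DS is informed; this reading of "required information" is an assumption made for the example.

```python
from typing import Iterable, Optional, Tuple

# Hypothetical log format, reusing the event names of the case study:
#   ("request", dc, ds, datatype, dc_policy)  -- the DC informs the DS of its policy
#   ("send", ds, dc, item, datatype)          -- the DC collects the DS's item
Event = Tuple


def first_uninformed_send(trace: Iterable[Event]) -> Optional[Event]:
    """Return the first send not preceded by a matching request, or None."""
    informed = set()  # (dc, ds, datatype) triples for which a policy was received
    for event in trace:
        if event[0] == "request":
            _, dc, ds, datatype, _policy = event
            informed.add((dc, ds, datatype))
        elif event[0] == "send":
            _, ds, dc, _item, datatype = event
            if (dc, ds, datatype) not in informed:
                return event
    return None
```

Returning the offending event rather than a plain boolean gives an auditor a concrete witness of the violation, much like a counterexample returned by a model checker.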
It should be clear, however, that we do not claim that all GDPR requirements regarding information and consent can be checked, or even expressed, using a privacy policy language like PILOT. For example, the general rules about imbalance of power or detriment stated by the WP29 [28] are not amenable to formal definition or automatic verification: they need to be assessed by human beings. Similarly, whether the DCs authorized to receive the data and the allowed purposes are specific enough is a matter of appreciation rather than formal proof, even though an automatic verifier could rely on predefined databases of standard purposes and datatypes. The only claim that we make here is that most of what can be encapsulated into a machine-readable language and checked by a computer is included in PILOT. We compare PILOT more precisely with previous work in the next section.

Related Work
Several languages or frameworks dedicated to privacy policies have been proposed. A pioneering project in this area was the "Platform for Privacy Preferences" (P3P) [24]. P3P makes it possible to express notions such as purpose, retention time and conditions. However, P3P is not well suited to the IoT, as it was conceived as a policy language for websites. Also, P3P does not offer support for defining data transfers. Other languages close to P3P have been proposed, such as the "Enterprise Policy Authorization Language" (EPAL) [2] and "An Accountability Policy Language" (A-PPL) [3]. The lack of a precise execution model for these languages may also give rise to ambiguities and variations in their implementations.
Recently, Gerl et al. proposed LPL [13], which is inspired by GDPR requirements. However, the lack of conditions and its centralized architecture make LPL unsuitable for IoT environments, which are inherently distributed.
None of the above works includes tools to help users understand the privacy risks associated with a given policy, which is a major benefit of PILOT, as discussed in Section 4. In the same spirit, Joyee De et al. [17] have proposed a methodology in which DSs can visualize the privacy risks associated with their privacy settings. The authors use harm trees to determine the risks associated with privacy settings. The main difference with PILOT is that harm trees must be manually defined for a given application, whereas our analysis is fully automatic.⁹
Another line of work is that of formal privacy languages. Languages such as S4P [7] and SIMPL [18] unambiguously define the behavior of the system (and, consequently, the meaning of the policies) by means of trace semantics. The goal of this formal semantics is to make it possible to prove global correctness properties such as "DCs always use DS data according to their policies". While this semantics is well suited to its intended purpose, it cannot be directly used to develop policy enforcement mechanisms. In contrast, we provide a PROMELA model in Section 4, capturing the execution model of PILOT (cf. Section 3), that can be used as a reference to implement a system for the enforcement of PILOT policies. In addition, these languages, which were proposed before the adoption of the GDPR, were not conceived with its requirements in mind.
Other languages have been proposed to specify privacy regulations such as HIPAA, COPPA and GLBA. For instance, CI [6] is a dedicated linear temporal logic based on the notion of contextual integrity. CI has been used to model certain aspects of regulations such as HIPAA, COPPA and GLBA. Similarly, PrivacyAPI [20] is an extension of the access control matrix with operations such as notification and logging. The authors also use a PROMELA model of HIPAA to verify its "correctness" and to better understand the regulation. PrivacyLFP [9] uses first-order fixed-point logic to increase the expressiveness of previous approaches. Using PrivacyLFP, the authors formalize HIPAA and GLBA with a higher degree of coverage than previous approaches. The main difference between PILOT and these languages is their focus: PILOT focuses on modeling DS and DC privacy policies and on enhancing DSs' awareness, whereas these languages focus on modeling regulations.
Some access control languages, such as XACML [1] and RBAC [27], have been used to specify privacy policies. Typically, policies include the datatype to which they apply and a set of agents with privileges to perform certain actions, such as accessing the data. Some extensions, such as GeoXACML [19], include conditions depending on geolocation information. However, none of these languages captures concepts such as retention time, purpose or transfers.
Usage control (UCON) [22,23] appeared as an extension of access control to express how data may be used after being accessed. To this end, it introduces obligations, which are actions such as "do not transfer data item i". The Obligation Specification Language (OSL) [14] is an example of such a language, whose enforcement relies on digital rights management systems. However, UCON does not offer any support for comparing policies and does not differentiate between DS and DC policies, which is a critical feature in the context of privacy policies: for DSs to provide informed consent, they should know whether DC policies comply with their own policies.
Some work has also been done on privacy risk analysis [16], in particular to address the GDPR requirements regarding Privacy Impact Assessments. We should emphasize that the notion of risk analysis used in this report is different, in the sense that it applies to potential risks related to privacy policies rather than to systems or products. Hence, the risk assumptions considered here concern only the motivation, reputation and potential history of misbehavior of the parties (but not the vulnerabilities of the systems, which are beyond the reach and expertise of data subjects).

Conclusion
In this report, we have presented the privacy policy language PILOT and a novel approach to analyzing privacy policies, focused on enhancing informed consent. An advantage of a language like PILOT is that it can be used as a basis to implement "personal data managers", which enforce privacy policies automatically, or "personal data auditors", which check a posteriori that a DC has complied with the DS policies associated with all the personal data that it has processed. Another, orthogonal challenge in the context of the IoT is to ensure that DSs are always informed about the data collection taking place in their environment and can effectively communicate their consent (or objection) to the surrounding sensors. Different solutions to this problem have been proposed in [8], relying on PILOT as a privacy policy language used by DCs to communicate their policies and by DSs to provide their consent. These communications can take place either directly or indirectly (through registers in which privacy policies can be stored).
The work described in this report can be extended in several directions. First, the risk analysis model used here is simple and could be enriched in different ways, for example by taking into account risks of inferences between different types of data. The evaluation of these risks could be based on past experience and research such as the study conducted by Privacy International.¹⁰ The risk analysis could also take into account the history of the DS (personal data already collected by DCs in the past). On the formal side, our objective is to use a theorem prover to prove global properties of the model. This formal framework could also be used to implement tools to verify that a given enforcement system complies with PILOT policies.

A Policy subsumption
We formalize the notion of policy subsumption as a relation over PILOT policies. We start by defining subsumption of data usage and data communication rules, which is used to define PILOT policy subsumption.
Intuitively, a data usage rule is more restrictive than another if: i) the set of allowed purposes is smaller; and ii) the data can be used for a shorter period of time.
A data communication rule is more restrictive than another if: i) its conditions are stronger; ii) its entity is more specific, i.e., fewer entities are included; and iii) its usage rule is more restrictive. For instance, consider the data communication rules for Parket in policies (1) and (2) of Section 2.3, denoted dcr_1 and dcr_2, respectively. They have the same data usage rules and entity; they differ only in their conditions. The condition car_location = Lyon is clearly stronger than tt (denoted tt ⊢ car_location = Lyon), since we use tt to impose no conditions (i.e., the rule is always active) and car_location = Lyon to make the rule active only when the collecting camera is located in Lyon. Therefore, dcr_2 subsumes dcr_1, i.e., dcr_2 ⊑_DCR dcr_1.
A PILOT policy is more restrictive than another if: i) the datatype is more specific; ii) the data communication rule is more restrictive; and iii) the set of allowed transfers is smaller.
As an example, consider policies (1) and (2) of Section 2.3, denoted p_1 and p_2, respectively. It is easy to see that p_2 subsumes p_1, i.e., p_2 ⊑ p_1, since they apply to the same datatype, the data communication rule in p_2 is more restrictive than that of p_1, and p_2 allows no transfers whereas p_1 allows transferring data to ParketWW.
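The three levels of subsumption can be mirrored by three small predicates. The following Python sketch rests on simplifying assumptions that are not part of PILOT's formal definition: conditions are conjunctions of atoms (so a stronger condition has more atoms and tt is the empty set), purposes are compared by plain set inclusion, datatypes by equality, transfers are represented as sets of target entities, and the entity hierarchy is hypothetical. The purposes and retention time of policies (1) and (2) are placeholders.

```python
from dataclasses import dataclass

# Hypothetical entity hierarchy: child -> parent.
ENTITY_PARENT = {"Parket": "parking_operator", "ParketWW": "parking_operator"}


def leq_entity(e, f):
    """e ≤_E f: f is e itself or one of its ancestors."""
    while e is not None:
        if e == f:
            return True
        e = ENTITY_PARENT.get(e)
    return False


@dataclass(frozen=True)
class DUR:
    purposes: frozenset
    retention: int


@dataclass(frozen=True)
class DCR:
    condition: frozenset  # conjunction of atoms; frozenset() stands for tt
    entity: str
    usage: DUR


@dataclass(frozen=True)
class Policy:
    datatype: str
    dcr: DCR
    transfers: frozenset  # entities the data may be transferred to


def sub_dur(a, b):
    """Fewer allowed purposes and a shorter retention time."""
    return a.purposes <= b.purposes and a.retention <= b.retention


def sub_dcr(a, b):
    """Stronger condition (more conjuncts), more specific entity, more restrictive usage."""
    return (a.condition >= b.condition and leq_entity(a.entity, b.entity)
            and sub_dur(a.usage, b.usage))


def sub_policy(a, b):
    """Same datatype (a simplification), more restrictive rule, fewer transfers."""
    return a.datatype == b.datatype and sub_dcr(a.dcr, b.dcr) and a.transfers <= b.transfers


# Placeholder usage rule shared by both policies.
USAGE = DUR(frozenset({"commercial_offers"}), 30)
P1 = Policy("number_plate", DCR(frozenset(), "Parket", USAGE), frozenset({"ParketWW"}))
P2 = Policy("number_plate",
            DCR(frozenset({("car_location", "Lyon")}), "Parket", USAGE), frozenset())
```

Under these assumptions sub_policy(P2, P1) holds while sub_policy(P1, P2) does not, matching the example: the Lyon condition and the empty transfer set both make p_2 more restrictive.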

B Active Policies and Transfer rules
Here we formally define when PILOT policies and transfer rules are active. Let eval(ν, d, ϕ) denote an evaluation function for conditions. It takes as input a formula ϕ and is parametrized by the valuation function ν and the device d. This function returns a boolean value in {true, false}, indicating whether the condition holds, or the undefined value ⊥ when the information needed to evaluate the condition is missing from the local state of the device. eval(ν, d, ϕ) is defined as described in Table 2. We use a function time(e) : E → ℕ to assign a timestamp, represented as a natural number, to each event of a trace. We use ĉ, f̂ and ∗̂ to denote the interpretation of constants, functions and binary predicates, respectively. We assume that these interpretations are the same across all devices.
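Because Table 2 is not reproduced here, the following Python sketch only illustrates the general shape of a three-valued eval(ν, d, ϕ). It assumes Kleene's strong three-valued logic for the connectives (a false conjunct makes a conjunction false even if the other conjunct is ⊥) and a toy formula syntax with equality atoms only; None plays the role of ⊥. These choices are assumptions, not PILOT's definition.

```python
from typing import Optional

# Hypothetical formula syntax:
#   ("tt",)  ("eq", var, const)  ("not", f)  ("and", f, g)  ("or", f, g)
# nu maps each device to its local state (variable -> value).


def eval_cond(nu: dict, d: str, phi: tuple) -> Optional[bool]:
    """Return True, False, or None (standing for ⊥) if d lacks the needed information."""
    op = phi[0]
    if op == "tt":
        return True
    if op == "eq":
        _, var, const = phi
        local = nu.get(d, {})
        return None if var not in local else local[var] == const
    if op == "not":
        v = eval_cond(nu, d, phi[1])
        return None if v is None else not v
    if op in ("and", "or"):
        a, b = eval_cond(nu, d, phi[1]), eval_cond(nu, d, phi[2])
        dominant = op == "or"  # False decides an "and", True decides an "or"
        if a is dominant or b is dominant:
            return dominant
        if a is None or b is None:
            return None
        return not dominant
    raise ValueError(f"unknown operator: {op}")
```

For example, with ν = {"cam_lyon": {"car_location": "Lyon"}}, the condition ("eq", "car_location", "Lyon") evaluates to True on cam_lyon and to None (⊥) on a device whose local state has no car_location.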
Active policy. A PILOT policy stored in a particular device may be active depending on the state of the system and on the data item to be sent. In order to determine whether a policy is active, we use a boolean function activePolicy(p, send(sndr, rcv, i), st), which returns true if policy p is active when item i is sent from device sndr to device rcv in state st. Formally,
$$\mathit{activePolicy}(p, \mathit{send}(sndr, rcv, i), st) \triangleq \mathit{type}(i) \leq_T t \;\wedge\; \mathit{eval}(\nu, sndr, \phi) = \mathit{true} \;\wedge\; \mathit{time}(\mathit{send}(sndr, rcv, i)) < rt \;\wedge\; \mathit{entity}(rcv) \leq_E e$$
where p = (t, ϕ, e, _, rt, _) and st = ⟨ν, _, _⟩. Intuitively, given p = (t, ϕ, e, P, rt, TR), we check that: i) the type of the data to be sent corresponds to the type of data the policy is defined for (type(i) ≤_T t); ii) the condition of the policy evaluates to true (eval(ν, sndr, ϕ)); iii) the retention time for the receiver has not expired (time(send(sndr, rcv, i)) < rt); and iv) the entity associated with the receiver device is allowed by the policy (entity(rcv) ≤_E e).
Active transfer rule. Similarly, we use a boolean function activeTransfer(tr, p, transfer(sndr, rcv, i), st), which determines whether transfer rule tr from policy p is active when item i is transferred from device sndr to device rcv in state st. For a transfer rule to be active, the above checks are performed on the transfer rule tr and, additionally, the retention time for the sender must not have elapsed (time(transfer(sndr, rcv, i)) < rt).
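The four checks of activePolicy translate directly into a conjunction. In the following Python sketch, the type and entity orders are hypothetical parent hierarchies, the condition ϕ is a callable standing for eval(ν, d, ϕ) that returns True, False or None (⊥), and an undefined condition is treated as not satisfied. Only the fields of p used by activePolicy are modeled.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# Hypothetical hierarchies: child -> parent (x ≤ y iff y is x or one of its ancestors).
TYPE_PARENT = {"number_plate": "vehicle_data"}
ENTITY_PARENT = {"Parket": "parking_operator"}


def leq(parent: Dict[str, str], x: Optional[str], y: str) -> bool:
    while x is not None:
        if x == y:
            return True
        x = parent.get(x)
    return False


@dataclass(frozen=True)
class Policy:
    datatype: str                                     # t
    condition: Callable[[dict, str], Optional[bool]]  # ϕ, as eval(ν, d, ϕ)
    entity: str                                       # e
    retention: int                                    # rt, compared with timestamps


def active_policy(p: Policy, item_type: str, sndr: str, rcv: str,
                  send_time: int, nu: dict, entity_of: Dict[str, str]) -> bool:
    """The four checks of activePolicy for send(sndr, rcv, i) at time send_time."""
    return (leq(TYPE_PARENT, item_type, p.datatype)            # type(i) ≤_T t
            and p.condition(nu, sndr) is True                  # ⊥ does not activate
            and send_time < p.retention                        # time(send) < rt
            and leq(ENTITY_PARENT, entity_of[rcv], p.entity))  # entity(rcv) ≤_E e


def in_lyon(nu: dict, d: str) -> Optional[bool]:
    """Example condition car_location = Lyon, evaluated on device d's local state."""
    local = nu.get(d, {})
    return None if "car_location" not in local else local["car_location"] == "Lyon"
```

For instance, Policy("vehicle_data", in_lyon, "parking_operator", 100) is active for a number plate sent at time 10 from a device located in Lyon to a device of Parket, but not at time 100, and not when the sender's local state lacks car_location. activeTransfer would add one more comparison against the sender's retention time.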

C Policy join
We present a join operator for PILOT policies and prove that the resulting policy is more restrictive than the policies used to compute the join. We first define join operators for data usage rules and data communication rules, and use them to define the join operator for PILOT policies. Let
$$\min(e, e') = \begin{cases} e & \text{if } e \leq_X e' \\ e' & \text{otherwise} \end{cases}$$
be a function that, given two elements e, e′ ∈ X, returns the minimum in the corresponding partial order ≤_X. Let ⋓ denote the intersection that keeps the minimum of comparable elements in the partial order of purposes. Formally, given two sets of purposes P and P′, P ⋓ P′ ≜ (P ∩ P′) ∪ P′′, where P′′ = {p ∈ P | ∃p′ ∈ P′ s.t. p < p′}.
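Both auxiliary operators are easy to compute. The Python sketch below follows the two definitions literally; the purpose hierarchy (targeted_offers < commercial_offers < marketing) is hypothetical. As written, P′′ only collects refinements drawn from the first operand P, so ⋓ need not be symmetric, which the example makes visible.

```python
import operator
from typing import Callable, Dict, Set, TypeVar

X = TypeVar("X")

# Hypothetical purpose hierarchy: child -> parent, so child < parent.
PURPOSE_PARENT = {"targeted_offers": "commercial_offers",
                  "commercial_offers": "marketing"}


def lt_purpose(p: str, q: str, parent: Dict[str, str] = PURPOSE_PARENT) -> bool:
    """p < q: q is a strict ancestor of p in the purpose hierarchy."""
    p = parent.get(p)
    while p is not None:
        if p == q:
            return True
        p = parent.get(p)
    return False


def min_x(leq: Callable[[X, X], bool], e: X, e2: X) -> X:
    """min(e, e') = e if e ≤_X e', e' otherwise."""
    return e if leq(e, e2) else e2


def purpose_meet(P: Set[str], P2: Set[str]) -> Set[str]:
    """P ⋓ P' = (P ∩ P') ∪ {p ∈ P | ∃ p' ∈ P'. p < p'}."""
    return (P & P2) | {p for p in P if any(lt_purpose(p, q) for q in P2)}


# Usage: retention times are naturals ordered by ≤; purposes use the hierarchy above.
print(min_x(operator.le, 30, 90))  # 30
print(sorted(purpose_meet({"targeted_offers", "billing"}, {"marketing", "billing"})))
# ['billing', 'targeted_offers']
```

Swapping the operands of the last call yields only {"billing"}, because marketing is not below any purpose of the first set.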

C.1 Privacy Preserving Join
We say that a join operation is privacy preserving if the resulting policy is more restrictive than both operands. Formally:
Definition 9 (Privacy Preserving Join). We say that ⊔ is privacy preserving iff ∀p, q ∈ PP · (p ⊔ q) ⊑ p ∧ (p ⊔ q) ⊑ q.
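Definition 9 can be sanity-checked mechanically on a small domain. The Python sketch below does so for data usage rules only, under two assumptions not spelled out in this excerpt: that ⊔_DUR combines purposes with ⋓ and retention times with min, and that ⊑_DUR compares purposes through the (hypothetical) purpose hierarchy. It is an exhaustive test over a finite domain, not a substitute for the proof that follows.

```python
from itertools import chain, combinations, product

# Hypothetical purpose hierarchy (child -> parent) and finite test domain.
PARENT = {"targeted_offers": "commercial_offers", "commercial_offers": "marketing"}
PURPOSES = ["billing", "marketing", "commercial_offers", "targeted_offers"]
RETENTIONS = [0, 30, 365]


def lt(p, q):
    """p < q in the purpose hierarchy."""
    p = PARENT.get(p)
    while p is not None:
        if p == q:
            return True
        p = PARENT.get(p)
    return False


def join_dur(a, b):
    """Assumed ⊔_DUR: purposes combined with ⋓, minimum retention time."""
    (P, rt), (Q, rt2) = a, b
    return (P & Q) | {p for p in P if any(lt(p, q) for q in Q)}, min(rt, rt2)


def sub_dur(a, b):
    """Assumed ⊑_DUR: each purpose of a is at most some purpose of b; no longer retention."""
    (P, rt), (Q, rt2) = a, b
    return all(p in Q or any(lt(p, q) for q in Q) for p in P) and rt <= rt2


def all_durs():
    subsets = chain.from_iterable(combinations(PURPOSES, k) for k in range(len(PURPOSES) + 1))
    return [(frozenset(s), rt) for s in subsets for rt in RETENTIONS]


def is_privacy_preserving(join, sub):
    """Check Definition 9 for a join over every pair of the finite domain."""
    durs = all_durs()
    return all(sub(join(a, b), a) and sub(join(a, b), b)
               for a, b in product(durs, repeat=2))
```

is_privacy_preserving(join_dur, sub_dur) holds on this domain, whereas a naive join taking the union of purposes and the maximum retention time fails the check, since it can allow purposes that one of the operands never agreed to.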
Intuitively, this means that the join satisfies the preferences of both policies. If p and q correspond to the policies of the DS and the DC, respectively, the resulting PILOT policy is more restrictive than both, thus: i) the DC is not allowed to use the DS's data in any way not expressed in the DS policy; and ii) the DC will be able to enforce the policy: since it is more restrictive than the one she proposed, the policy will not contain anything that the DC has not foreseen.
In what follows, we prove that the operation ⊔ is privacy preserving (Lemma 3). First, we prove two lemmas required for the proof of Lemma 3.
Lemma 1. Given two data usage rules dur_1, dur_2 ∈ DUR, it holds that dur_1 ⊔_DUR dur_2 ⊑_DUR dur_1 and dur_1 ⊔_DUR dur_2 ⊑_DUR dur_2.
Proof. We split the proof into the two conjuncts of Lemma 1.
dur_1 ⊔_DUR dur_2 ⊑_DUR dur_1: We split the proof according to the elements of data usage rules, i.e., purposes and retention time.