Information Flow Tracking for Side-Effectful Libraries

Dynamic information flow control is a promising technique for ensuring confidentiality and integrity of applications that manipulate sensitive information. While much progress has been made on increasingly powerful programming languages ranging from low-level machine languages to high-level languages for distributed systems, surprisingly little attention has been devoted to libraries and APIs. The state of the art is largely an all-or-nothing choice: either a shallow or deep library modeling approach. Seeking to break out of this restrictive choice, we formalize a general mechanism that tracks information flow for a language that includes higher-order functions, structured data types and references. A key feature of our approach is the model heap, a part of the memory where security information is kept to enable the interaction between the labeled program and the unlabeled library. We provide a proof-of-concept implementation and report on experiments with a file system library. The system has been proved correct using Coq.


Introduction
While useful, access control is not enough: it is crucial what applications do with the data after access has been granted [25]. Information flow control tracks the propagation of data in programs, thus enforcing confidentiality and integrity policies. Due to the widespread use of highly dynamic languages, such as JavaScript, there has been a growing interest in dynamic information flow control. There are two basic kinds of flows to consider: explicit and implicit [5], related to the notions of data flow and control flow. Dynamic information flow is tracked at runtime by extending the data with security labels, which are propagated and checked against a security policy during execution. The detection of a potential security violation causes program execution to halt.
While much progress has been made on increasingly powerful programming languages ranging from low-level machine languages to high-level languages for distributed systems, surprisingly little attention has been devoted to libraries and APIs. The main challenge arises when the library is not written in the language itself, and is thus not compatible with the labeled semantics of the program. There are mainly two situations where this occurs: 1) when the library is part of the standard execution environment, and 2) when the library is brought into the language using some form of foreign function interface (FFI). In such cases, values passing between the program and the library must be translated. The process of translating values from one programming language to another is known as marshaling.
Marshaling of labeled values additionally entails that security labels must be removed from the values being passed from the program to the library, and reattached to the values returned from the library to the program. We refer to those steps as unlabeling and relabeling of the values, and to the description of how it should be done as a library model. The main difference between standard marshaling and marshaling of labeled values is that the latter removes information from the values passed to the library. To be able to correctly relabel values going from the library to the program, the labels removed during the unlabeling process must be used, since the returned value contains no security information. This means that the library models are inherently stateful: the removed labels are stored in a model state used when relabeling.
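The unlabel/relabel round trip can be illustrated with a minimal Python sketch. It is not the paper's formal model: the names `Labeled`, `unlabel`, `relabel` and `call_through_model`, and the choice of a plain dictionary as model state, are assumptions made for illustration only.

```python
from dataclasses import dataclass

L, H = "L", "H"  # two-point lattice: L (public) below H (secret)


def join(a, b):
    """Least upper bound of two labels."""
    return H if H in (a, b) else L


@dataclass(frozen=True)
class Labeled:
    value: object
    label: str


def unlabel(labeled, name, model_state):
    """Strip the label and remember it under `name` in the model state."""
    model_state[name] = labeled.label
    return labeled.value


def relabel(value, names, model_state):
    """Reattach a label computed as the join of the stored labels."""
    label = L
    for name in names:
        label = join(label, model_state[name])
    return Labeled(value, label)


def call_through_model(lib_fn, arg):
    """Hypothetical model: the result is as secret as the argument."""
    model_state = {}  # the state that makes the model stateful
    raw = unlabel(arg, "a", model_state)
    return relabel(lib_fn(raw), ["a"], model_state)


secret_len = call_through_model(len, Labeled("hunter2", H))
public_len = call_through_model(len, Labeled("abc", L))
```

The unlabeled library function (`len` here) never sees a label; the model state is the only place where the security information survives the call.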
Library models can be split into two categories: deep and shallow models [14]. Deep models track information flow inside the library, requiring precise modeling of the execution of the library, while shallow models are limited to the security labels on the boundary of the library. Often, deep models necessitate reimplementation of parts of the library functionality within the model, making them difficult to create and maintain. Shallow models, on the other hand, are significantly more lightweight, but possibly too imprecise. In this work, we are interested in the boundary between deep and shallow models.
The current state of the art in dynamic information-flow tracking does not fit this classification entirely, in part due to ad-hoc handling of libraries. To the extent that the addition of new libraries is supported, the models used tend towards shallow models. This is true for, e.g., FlowFox [13], and experimental extensions of JSFlow [15]. On the other hand, JSFlow and FlowFox both use deep models to provide fine-grained information-flow tracking for built-in libraries. JSFlow, e.g., implements the full ECMA-262 version 5 standard using what is best considered a deep approach.
In recent work, Hedin et al. [17] initiate a framework for tracking information flow in libraries. The setting is a labeled program and an unlabeled library that share the same core semantics (split semantics) in order to limit the marshaling to security labels only. Their work targets a focused functional language with higher-order functions (which allows for both callbacks and promises to exist), and structured data in terms of lists. It does not, however, handle side effects, which means that many libraries cannot be modeled in a satisfactory way. As an example, it is unavoidable for a standard file system library to maintain state to keep track of open files, stream positions and buffers. The success of a function read(path, success, fail) depends on both the file path and the state of the library, which must be reflected by the security models for the library.
The combination of state and higher-order functions significantly complicates the library models and the model state over the ones used by Hedin et al. If the state is first-class (i.e., it can be sent around as values, as in languages with mutable references, records or objects) the situation is further complicated. This is the setting we are interested in handling, as it captures the essence of many of the problems found when modeling real libraries. To this end we introduce a model heap, allowing library values to be tied to a mutable model state, which allows for secure modeling of the interaction between first-class state and higher-order functions. Consider the file system example, depicted in Figure 1. When the program calls the library function read, the library function is first lifted into the program using the corresponding function model defined by the library model, LModel. The lifting (illustrated by the dotted arrow in the figure) is done by means of wrapping and results in an unlabeled function that can be called by the program. When the wrapper is called with labeled arguments, a new call model state, CModel, is created and used to hold the labels of the arguments, since the underlying library function requires unlabeled values. As can be seen in the figure, the call model state is connected to the library model state and together they define the model state that the function model of read interacts with. Any other values, including higher-order functions and first-class state, defined in the library share the same library model state, which guarantees that they have the same view of the library state, even in the presence of mutability.
There are two main benefits of our work over ad-hoc modeling of libraries. First, it lowers the modeling effort significantly, and, second, given that the models properly describe the library, it guarantees noninterference. Both benefits stem from expressing the models in a simplified model language that controls the marshaling process, thus sidestepping the need to reimplement it repeatedly.
Considering the dimension of shallow and deep models, our work can be seen as exploring the boundary. Shallow models are expressed solely in terms of the boundary labels, while our work gains access to intermediate labels when models for lazy marshaling, higher-order functions and first-class state are triggered. In addition, it is relatively easy to extend our system to allow models to use the runtime values, allowing for dependent models [17]. Compared to fully deep models, our work is limited to the information passing between the program and the library at the point of passing. Thus, intermediate values and labels that do not participate in cross-boundary activity are out of reach. While deep models in theory have access to more information and therefore have the potential to be more precise, it is unclear if the added precision is significant in practice, in particular in the light of the added implementation cost.

Contributions
The main contributions of this paper are:
- We have created a language containing three cornerstones of library modeling: higher-order functions, first-class state, and structured values (the syntax and semantics are presented in Section 2 and Section 3, respectively, while Section 6 discusses correctness).
- We have implemented a prototype and used it to explore the interaction between the different features of the language (examples that illustrate our mechanism are reported in Section 4).
- We have conducted a case study on a file system library, inspired by the file system library in node.js [10], showing that our language is able to handle stateful libraries (the case study is reported in Section 5).
- We have formalized the language and its correctness proof in Coq [19].
The scope of the prototype is to experimentally verify applicability of models, not to assess performance in a full-scale implementation. The prototype serves as a complement to the formal proof to create a system that is both correct and useful. The full version of the paper, along with the formalization in Coq and the proof-of-concept prototype, can be found at [27].

Syntax
The language we present is a small functional language with split semantics and lazy marshaling. The syntax of the language is defined as follows, where $n$ denotes numbers and $x$ denotes identifiers. The syntax of the language is entirely standard apart from the x lib construction that lifts a library value to a program value, and upg $e$ $\ell$ that gives the result of the expression a given label, $\ell ::= L \mid H$. For simplicity, we identify sets with the meta variables ranging over them. Let $\overline{X}$ range over lists of $X$ for any set $X$, where $[\,]$ denotes the empty list and $\cdot$ denotes the cons operator.
An application in the language is a triple $(d_p, d_l, m)$, where the first component is the labeled program, the second component is the unlabeled library and the third component is the library model. Throughout the rest of this paper, we use program when referring to the labeled part, and library when referring to the unlabeled part. The top-level definitions, $d$, allow for named definitions of functions and values, $d ::= \mathrm{fun}\ f(x) = e \mid \mathrm{let}\ x = e$. The top-level model definitions, $m$, allow for named definitions of models and labels, $m ::= \mathrm{mod}\ x :: \gamma \mid \mathrm{lbl}\ x :: \kappa$, where $\gamma$ denotes relabel models and $\kappa$ denotes label terms. The label terms, $\kappa ::= \ell \mid \alpha \mid \kappa_1 \sqcup \kappa_2$, are terms that evaluate to labels in a given model state and consist of labels, $\ell$, label variables, $\alpha$, and the least upper bound of two label terms.
The relabel models, $\gamma$, used to relabel library values, are defined as follows where $\phi$ denotes unlabel models, used to unlabel program values, and $\zeta$ denotes effect constraints defined below. All values are given a label by a label term, and the relabeling of structured values follows the structure of the value. To relabel a function, we must know how to unlabel the argument, how to relabel the result, and how the function interacts with the model state. To relabel a reference we must know how to unlabel the values written and how to relabel the values read. The unlabel models, $\phi$, are defined as follows.
Unlabeling of values is performed by storing the label of the value in the corresponding label variable in the model state. As for relabeling, unlabeling of structured values follows the structure of the value. Unlabeling of functions and references introduces an abstract name, $\#\alpha$, used by library functions to tie any interaction to their model state in the effect constraints, $\zeta$.
In the order of definition: a library function that reads a labeled reference defines how to unlabel the read value, a library function that writes to a labeled reference defines the security context in which the write occurs and how to relabel the value to be written, a library function that calls a labeled function defines the security context in which the call occurs, how to relabel the parameter and how to unlabel the result, and finally, a library function that modifies the library state defines the security context of the update and how the security model changes.

Semantics
We define the semantics step-wise in three parts. The first part defines the labeled values and the execution environment. The second part defines the evaluation relation and how the function representations of the values are created and used in the semantics. Finally, the third part defines how values are marshaled between the program and the library. For space reasons, parts of the semantic definitions have been left out. We refer the reader to the full version of this paper [27] for the missing definitions.

Values
In order to differentiate between the labeled semantics and the unlabeled semantics, we use $\hat{X}$ to denote an entity in the labeled semantics corresponding to the entity $X$ in the unlabeled semantics. We only give the labeled values. The unlabeled values are defined analogously. The values in the language, $\hat{v}$, are integers $n$, tuples, higher-order functions $\hat{F}$, lists $(\hat{H}, \hat{T})$, references $(\hat{R}, \hat{W})$, and records $\hat{O}$, where higher-order functions, lists, references and records are represented as (pairs of) functions in order to simplify the marshaling.
The labels, $\ell$, form a two-point upper semi-lattice $L \sqsubseteq H$, where $L$ denotes low (public) and $H$ denotes high (private). Let $\ell_1 \sqcup \ell_2$ denote the least upper bound of $\ell_1$ and $\ell_2$, and let $\hat{v}^{\ell_2} = v^{\ell_1 \sqcup \ell_2}$ for $\hat{v} = v^{\ell_1}$. The execution environment is a triple $(\varsigma, \Gamma, \Sigma)$ of the security context, $\varsigma$, the stack, and the heap. The security context $\varsigma$ ranges over labels $\ell$. The stack $\Gamma$ is a triple of stacks $(\hat{\rho}, \rho, \ddot{\rho})$, containing pointers to the labeled frames, the unlabeled frames and the model frames, respectively. The heap $\Sigma$ is a triple of heaps, $(\hat{\sigma}, \sigma, \ddot{\sigma})$, consisting of the labeled heap, the unlabeled heap and the model heap. The labeled and unlabeled heaps can contain values (for implementing references) and frames, whereas the model heap only contains frames. The labeled and unlabeled frames, $\hat{\omega}$ and $\omega$, are maps from identifiers to values, and the model frames, $\ddot{\omega}$, are maps from identifiers to model items. Each frame represents a scope, and together with the corresponding stacks they form scope chains. The model items, $\ddot{\iota} ::= \ell \mid \gamma \mid \zeta$, consist of labels, relabel models and effect constraints.
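As a concrete reading of the two-point lattice and of raising a value's label, consider the following Python sketch. The class and function names (`Label`, `Labeled`, `raise_to`) are illustrative assumptions and do not come from the formalization.

```python
from dataclasses import dataclass
from enum import IntEnum


class Label(IntEnum):
    L = 0  # low (public)
    H = 1  # high (private)


def leq(a, b):
    """The ordering of the lattice: L is below H."""
    return a <= b


def join(a, b):
    """Least upper bound of two labels."""
    return max(a, b)


@dataclass(frozen=True)
class Labeled:
    value: object
    label: Label

    def raise_to(self, label):
        """Return the same value with its label joined with `label`."""
        return Labeled(self.value, join(self.label, label))


raised = Labeled(42, Label.L).raise_to(Label.H)
unchanged = Labeled(42, Label.H).raise_to(Label.L)
```

Raising never lowers a label: a secret value raised to L stays secret.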

Evaluation relations
The evaluation relation for program execution is of the form $\varsigma, \Gamma \models (\Sigma_1, e) \to (\Sigma_2, \hat{v})$, read "expression $e$ evaluates in the environment consisting of the security context, $\varsigma$, the stack, $\Gamma$, and the heap, $\Sigma_1$, resulting in the updated heap $\Sigma_2$ and value $\hat{v}$". Similarly, library execution is of the form $\varsigma, \Gamma \models (\Sigma_1, e) \rightsquigarrow (\Sigma_2, v)$, where the unlabeled semantics is parameterized over the security context to model that the context is global and always available to the marshaling functions.
Figure 2 contains a selection of the semantic rules of the program semantics related to the marshaling of values.
The rules of the core language are standard. Whenever an integer is created (int), it is always originally labeled $L$. Variables are retrieved from the labeled heap using lookupL (var). If-statements (if-true and if-false) evaluate the conditional expression and, based on the result, select which branch to take. The branch taken is evaluated in a security context of $\varsigma \sqcup \ell$ and the returned value is raised to $\ell$, where $\ell$ is the label of the result of the conditional expression.
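The treatment of conditionals can be sketched in Python as follows. This is an illustration only: branches are modeled as callables that receive the security context, and treating non-zero integers as true is an assumption not stated in the text.

```python
from dataclasses import dataclass

L, H = "L", "H"


def join(a, b):
    return H if H in (a, b) else L


@dataclass(frozen=True)
class Labeled:
    value: object
    label: str


def eval_int(n):
    """(int): a freshly created integer is labeled L."""
    return Labeled(n, L)


def eval_if(ctx, cond, then_branch, else_branch):
    """(if-true / if-false): run the taken branch in the context joined
    with the label of the condition, then raise the result to that label."""
    taken = then_branch if cond.value != 0 else else_branch  # assumption
    result = taken(join(ctx, cond.label))
    return Labeled(result.value, join(result.label, cond.label))


seen_contexts = []


def branch(n):
    def run(ctx):
        seen_contexts.append(ctx)  # record the context the branch ran in
        return eval_int(n)
    return run


out = eval_if(L, Labeled(1, H), branch(10), branch(20))
```

Although the branch returns a public integer, the result is secret because the choice of branch depended on a secret condition.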
Function closures are represented as functions, $\hat{F} : (\varsigma, \Gamma, \Sigma_1, \hat{v}) \to (\Sigma_2, \hat{v})$, created by lclos (fun) in the following way.
$$\mathit{lclos}(\hat{\rho}_1, x, e) = \lambda(\varsigma, (\hat{\rho}_2, \rho, \ddot{\rho}), (\hat{\sigma}_1, \sigma_1, \ddot{\sigma}_1), \hat{v}_1).\ (\Sigma, \hat{v}_2)$$
where $\hat{\sigma}_2 = \hat{\sigma}_1[\hat{\rho} \mapsto \{x \mapsto \hat{v}_1^{\varsigma}\}]$, $\hat{\rho}$ fresh, and $\varsigma, (\hat{\rho} \cdot \hat{\rho}_1, \rho, \ddot{\rho}) \models ((\hat{\sigma}_2, \sigma_1, \ddot{\sigma}_1), e) \to (\Sigma, \hat{v}_2)$.
The function closure will, when interacted with, create a new pointer to a labeled frame containing the mapping of the parameter name $x$ to the actual value $\hat{v}_1$, which is raised to the current security context. The function expression $e$ is then evaluated, using the newly created pointer along with the updated heap. When applying a function closure (app), the body of the function is executed in the program semantics, under the elevated context consisting of the current security context raised to the label of the function closure. Creation and application of library closures, $F : (\varsigma, \Gamma, \Sigma_1, v) \to (\Sigma_2, v)$, is analogous.
Fig. 2: Selected labeled semantics
Safe implementation of marshaling of references requires the ability to trap and modify reads and writes in order to marshal the values passed by the interaction. For this reason, references are represented as pairs of functions, one function for reading the reference, $\hat{R} : (\varsigma, \Gamma, \Sigma_1) \to (\Sigma_2, \hat{v})$, and one function for updating the reference, $\hat{W} : (\varsigma, \Gamma, \Sigma_1, \hat{v}) \to \Sigma_2$. This allows us to marshal references by wrapping the read and the write functions in functions that perform the marshaling of the values at the time of interaction, similar to lazy marshaling of lists [17]. Most languages do not support the creation of functions that are triggered on interaction with values such as references or objects, which means they cannot support marshaling of first-class mutable state. A notable exception to this is JavaScript, which allows methods to be tied to different aspects of object interaction via the use of Proxy objects [22].
Creation of references given a fresh pointer into the labeled heap is defined by lread and lwrite as follows.
Note that the value that the reference is referring to may be labeled differently, due to the distinction between the reference as a value and the value the reference is referring to. Dereferencing (deref) uses the read function of the reference to get the value to be read, while assignment (assign) uses the write function. Creation and use of library references, $R : (\varsigma, \Gamma, \Sigma_1) \to (\Sigma_2, v)$ and $W : (\varsigma, \Gamma, \Sigma_1, v) \to \Sigma_2$, is analogous.
It is worthwhile to point out the no-sensitive upgrade (NSU) check in lwrite, which demands that the context, which the label of the reference is a part of, is lower than or equal to the label of the referenced value, $\varsigma \sqsubseteq \ell$. Allowing labels of values to change freely leads to an unsound system, due to the possibility of implicit flows into the labels themselves [1,28].
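A reference represented as a pair of read and write functions, with the NSU check performed on write, might be sketched in Python as below. The helper `new_ref`, the dictionary heap and the `SecurityError` exception are assumptions for illustration; the sketch also leaves out the label of the reference itself and stores the written value raised to the context.

```python
from dataclasses import dataclass

L, H = "L", "H"


def leq(a, b):
    return a == L or b == H


def join(a, b):
    return H if H in (a, b) else L


@dataclass(frozen=True)
class Labeled:
    value: object
    label: str


class SecurityError(Exception):
    """Raised when the monitor halts execution."""


def new_ref(heap, pointer, initial):
    """Create a reference as a (read, write) pair of closures."""
    heap[pointer] = initial

    def read(ctx):
        return heap[pointer]

    def write(ctx, new_value):
        current = heap[pointer]
        # NSU: the context must be below the label of the referenced value.
        if not leq(ctx, current.label):
            raise SecurityError("no-sensitive upgrade violated")
        heap[pointer] = Labeled(new_value.value, join(new_value.label, ctx))

    return read, write


heap = {}
read, write = new_ref(heap, "p0", Labeled(0, L))
write(L, Labeled(1, L))          # public write to a public value: allowed
try:
    write(H, Labeled(2, L))      # write under secret control: halts
    blocked = False
except SecurityError:
    blocked = True
```

Because reads and writes go through closures, a marshaling layer can later wrap either one without changing how the program uses the reference.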
Disregarding the encoding of functions and references into functions, up to this point, the labeled and unlabeled semantics are equivalent to their standard formulations. The essence of this paper is in the marshaling of values between the program and the library, performed by the unlabeling and relabeling functions, defined in the following section.

Marshaling
All interaction between the program and the library is initiated by lifting named library values into the program. This is done (lib) by looking up the value and the corresponding relabel model used to relabel the value. Interaction with the relabeled value may cause further marshaling. Unlabeling of a value is done w.r.t. an unlabel model, $\phi$, which defines how to store the removed label(s) in the model state. Relabeling of a value is done w.r.t. a relabel model, $\gamma$, which defines how to compute the label in terms of the model state. Formally, unlabeling is a function of the form $\hat{v} \downarrow^{\varsigma,\Gamma,\Sigma_1} \phi = (\Sigma_2, v)$, taking a labeled value $\hat{v}$, an environment, $\varsigma, \Gamma, \Sigma_1$, and an unlabel model $\phi$, and returning an updated heap, $\Sigma_2$, and an unlabeled value $v$. Similarly, relabeling is a function of the form $v \uparrow^{\Gamma,\Sigma} \gamma = \hat{v}$, taking an unlabeled value, $v$, an environment, $\Gamma, \Sigma$, and a relabel model, $\gamma$, and returning a labeled value $\hat{v}$. The only modified part of the heap for both unlabeling and relabeling is the model heap.
There are six types of values: integers, tuples, lists, records, higher-order functions and references. In the rest of this section we describe how to evaluate label terms (used when relabeling) and how to marshal higher-order functions and references. We refer the reader to the full version of this paper [27] for the treatment of the other constructs.
Label terms. Evaluation of label terms is done w.r.t. a model state, where lookupM is used to traverse the model scope chain to find the first label corresponding to a given label variable.
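Evaluation of label terms over a scope chain can be sketched in Python as follows. The tuple encoding of terms and the representation of the model heap as a dictionary from frame pointers to frames are assumptions chosen for brevity.

```python
L, H = "L", "H"


def join(a, b):
    return H if H in (a, b) else L


def lookup_m(stack, model_heap, name):
    """Walk the model scope chain, innermost frame first."""
    for pointer in stack:
        frame = model_heap[pointer]
        if name in frame:
            return frame[name]
    raise KeyError(name)


def eval_label_term(term, stack, model_heap):
    """Evaluate a label term: a label, a label variable, or a join."""
    kind = term[0]
    if kind == "label":
        return term[1]
    if kind == "var":
        return lookup_m(stack, model_heap, term[1])
    if kind == "join":
        left = eval_label_term(term[1], stack, model_heap)
        right = eval_label_term(term[2], stack, model_heap)
        return join(left, right)
    raise ValueError(f"unknown label term: {term!r}")


# Hypothetical model heap with a call frame shadowing a library frame.
model_heap = {"call": {"a": H}, "lib": {"a": L, "l": L}}
term = ("join", ("var", "l"), ("var", "a"))
in_call = eval_label_term(term, ["call", "lib"], model_heap)
in_lib = eval_label_term(term, ["lib"], model_heap)
```

The same term evaluates differently depending on the scope chain, which is what lets one model be reused across calls with different argument labels.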
Higher-Order Functions. Marshaling of higher-order functions involves both marshaling the functions as values and ensuring that the parameter and return value are properly marshaled.
Unlabeling. Unlabeling a program closure removes and stores the label and returns a library closure created by wrapping the program closure. The library closure is tied to the abstract name, $\pi$, used by the wrapper to relabel the parameters before the call and unlabel the result after the call.
where $\kappa \vdash \gamma \to \phi = \mathit{lookupM}(\Gamma, \Sigma_1, \pi)$.
The translation of a program closure, $\hat{F}$, into a library closure is performed by u-lclos, which takes the program closure, the label of the program closure and the abstract name. When the library closure returned by u-lclos is applied, the following occurs. First, the function call model bound to the abstract name is fetched using lookupM. The function call model contains a label term representing the security context of the application, how to relabel the parameter and how to unlabel the return value. Second, the relabel model, $\gamma$, is used to relabel the parameter, $v_1$. Third, the program closure is called in the security context of the call raised to the label of the closure and the evaluation of the context label term, $\kappa$. The result of the call is a labeled value, $\hat{v}_2$. Finally, $\hat{v}_2$ is unlabeled, which gives the result, $v_2$, of the application of the unlabeled closure. Notice that all relabeling and unlabeling is done with respect to the model state of the caller.
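The four steps performed by the wrapper can be sketched in Python as follows. This is a simplification under stated assumptions: the model state is a single flat dictionary rather than a scope chain, label terms are tuples of atoms that are joined, and the names `u_lclos`, `eval_term` and `callback` are hypothetical.

```python
from dataclasses import dataclass

L, H = "L", "H"


def join(a, b):
    return H if H in (a, b) else L


@dataclass(frozen=True)
class Labeled:
    value: object
    label: str


def eval_term(atoms, model):
    """Join of literal labels and label variables found in the model."""
    label = L
    for atom in atoms:
        label = join(label, atom if atom in (L, H) else model[atom])
    return label


def u_lclos(program_closure, closure_label, abstract_name):
    """Wrap a labeled program closure as an unlabeled library closure."""
    def library_closure(ctx, model, raw_arg):
        # 1. Fetch the call model (context term, relabel term, unlabel var).
        kappa, gamma, phi = model[abstract_name]
        # 2. Relabel the parameter.
        arg = Labeled(raw_arg, eval_term(gamma, model))
        # 3. Call in the context raised to the closure label and kappa.
        call_ctx = join(join(ctx, closure_label), eval_term(kappa, model))
        result = program_closure(call_ctx, arg)
        # 4. Unlabel the result by storing its label in the model.
        model[phi] = result.label
        return result.value
    return library_closure


def callback(ctx, arg):
    """A hypothetical program function: increments its argument."""
    return Labeled(arg.value + 1, join(arg.label, ctx))


model = {"#cb": ((L,), ("a",), "b"), "a": H}
lib_cb = u_lclos(callback, L, "#cb")
raw = lib_cb(L, model, 41)
```

After the call the library holds only the raw result, while the label of the result is recorded under `b` in the caller's model state.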
Relabeling. Relabeling a library closure is done by labeling the program closure created by wrapping the library closure. The wrapper unlabels the arguments before the call and relabels the result of the call.
$$\mathit{l\text{-}uclos}(F, \ddot{\rho}_2, (\phi \to \gamma, \zeta)) = \lambda(\varsigma, (\hat{\rho}, \rho, \ddot{\rho}_1), (\hat{\sigma}, \sigma, \ddot{\sigma}), \hat{v}_1).\ (\Sigma_4, \hat{v}_2)$$
where $\Sigma_1 = (\hat{\sigma}, \sigma, \ddot{\sigma}[\ddot{\rho} \mapsto \emptyset])$, $\ddot{\rho}$ fresh, $(\Sigma_2, v_1) = \hat{v}_1 \downarrow^{\varsigma,(\hat{\rho},\rho,\ldots)} \ldots$
The translation of the library closure, $F$, into a program closure is performed by l-uclos, which takes the library closure, the current model frame stack, the unlabel model for the parameters, $\phi$, the relabel model for the return value, $\gamma$, and the effect constraints, $\zeta$. When called, the program closure produces a fresh frame pointer, pointing to a new model frame in the model heap. The parameter to the library function is unlabeled based on the unlabel model, $\phi$, and the effect constraints, $\zeta$, are evaluated to update the model state accordingly. After that, the library function is called with the unlabeled parameter in the security context, $\varsigma$, of the call. The result of the function call is relabeled with the relabel model, $\gamma$, and returned to the program. Note that all labeling and unlabeling is done w.r.t. the model frame stack of the unlabeled closure. Also note that the order is important; if the unlabeling of the parameter occurs after evaluating the effect constraints, the label of the parameter cannot be used when updating the model state with the side effects.
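The order of operations in the wrapper can be sketched in Python as below. Several simplifications are assumptions of the sketch rather than part of the formal definition: the unlabel model is a single label variable, label terms are tuples of atoms that are joined, effect constraints are limited to state updates, and the security context is not threaded into the library call.

```python
import itertools
from dataclasses import dataclass

L, H = "L", "H"
_frame_ids = itertools.count()


def join(a, b):
    return H if H in (a, b) else L


@dataclass(frozen=True)
class Labeled:
    value: object
    label: str


def lookup_m(stack, heap, name):
    for pointer in stack:
        if name in heap[pointer]:
            return heap[pointer][name]
    raise KeyError(name)


def update_m(stack, heap, name, label):
    for pointer in stack:
        if name in heap[pointer]:
            heap[pointer][name] = label
            return
    heap[stack[0]][name] = label  # assumption: insert into the top frame


def eval_term(atoms, stack, heap):
    label = L
    for atom in atoms:
        label = join(label, atom if atom in (L, H) else lookup_m(stack, heap, atom))
    return label


def l_uclos(lib_fn, captured_stack, phi, gamma, effects):
    """Wrap an unlabeled library function as a labeled program closure."""
    def program_closure(ctx, model_heap, arg):
        pointer = f"call{next(_frame_ids)}"        # fresh model frame
        model_heap[pointer] = {}
        stack = [pointer] + captured_stack
        model_heap[pointer][phi] = arg.label       # 1. unlabel the parameter
        for target, term in effects:               # 2. apply state effects
            update_m(stack, model_heap, target, eval_term(term, stack, model_heap))
        raw_result = lib_fn(arg.value)             # 3. call the library
        return Labeled(raw_result, eval_term(gamma, stack, model_heap))  # 4.
    return program_closure


model_heap = {"lib": {"l": L}}
wrapped = l_uclos(str.upper, ["lib"], "a", ("l", "a"), [("l", ("l", "a"))])
first = wrapped(L, model_heap, Labeled("x", H))
second = wrapped(L, model_heap, Labeled("y", L))
```

The second call receives a public argument but still yields a secret result, because the first call tainted the shared library frame.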
Effect constraints. Effect constraints define how a library function interacts with unlabeled program functions and references and how the library function changes the model state. State changes are effectuated on call to the library function, whereas effect constraints that define interaction with unlabeled program functions and references are stored in the model state. When a library function or reference is interacted with, the abstract name will tie the interaction to the corresponding effect constraint in the model state of the interaction. The meaning of the effect constraints is defined as follows, where defineM binds the name $\alpha$ to its corresponding model value in the top model frame if $\alpha$ is not defined in that model frame, updateM updates the label pointed to by $\alpha$ in the scope chain, or inserts it if it is not present, and lookupM returns the model value that is the first to match the name $\alpha$ in the scope chain.
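The three model-state operations described above can be sketched in Python as follows. The dictionary-based model heap is an assumption, as is the choice to insert a missing name into the top frame in `update_m`, since the text does not say which frame receives it.

```python
def define_m(stack, model_heap, name, item):
    """Bind `name` in the top model frame unless it is already bound there."""
    top = model_heap[stack[0]]
    if name not in top:
        top[name] = item


def update_m(stack, model_heap, name, label):
    """Update the first binding of `name` in the scope chain, or insert it."""
    for pointer in stack:
        if name in model_heap[pointer]:
            model_heap[pointer][name] = label
            return
    model_heap[stack[0]][name] = label  # assumption: insert into the top frame


def lookup_m(stack, model_heap, name):
    """Return the first model item bound to `name` in the scope chain."""
    for pointer in stack:
        if name in model_heap[pointer]:
            return model_heap[pointer][name]
    raise KeyError(name)


model_heap = {"call": {}, "lib": {"l": "L"}}
stack = ["call", "lib"]          # innermost frame first

update_m(stack, model_heap, "l", "H")    # found in the library frame
update_m(stack, model_heap, "a", "H")    # not found: inserted on top
define_m(stack, model_heap, "a", "L")    # already bound on top: no change
define_m(stack, model_heap, "#cb", ("L", "a", "b"))
```

Updating through the chain is what lets a call frame change a label that lives in the shared library frame.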
References. Marshaling of references shares some similarities with marshaling of higher-order functions. Calling a function passes the argument and the return value in opposite directions, similar to reading and writing to a reference.
Unlabeling. Unlabeling a program reference removes and stores the label, and the read and write functions are wrapped to create library counterparts.
$$\mathit{u\text{-}lread}(\hat{R}, \ell, \#\pi) = \lambda(\varsigma, \Gamma, \Sigma_1).\ (\Sigma_3, v)$$
where $\phi = \mathit{lookupM}(\Gamma, \Sigma_1, \pi)$, $(\Sigma_2, \hat{v}) = \hat{R}(\varsigma \sqcup \ell, \Gamma, \Sigma_1)$, and $(\Sigma_3, v) = \hat{v} \downarrow^{\varsigma \sqcup \ell,\Gamma,\Sigma_2} \phi$.
The program read function, $\hat{R}$, is translated by u-lread, which takes the read function, the label of the reference and the abstract name. When the resulting library read function is interacted with, the program read function is used to get the labeled value of the reference. This value must be unlabeled before being returned, which is done by looking up a program reference read model, $\phi$, in the model state of the interaction. It is the model of the caller, i.e., a library function model, that provides the read model for the references it reads.
The program write function, $\hat{W}$, is translated by u-lwrite, which takes the write function, the label of the reference and the abstract name. When the resulting library write function is used, the associated program reference write model, $\kappa \vdash \gamma$, is fetched in the current model state. This model defines both how to relabel the written unlabeled value and the context in which the write occurs. Then the unlabeled value, $v$, is relabeled before being written using the labeled write function in a context consisting of the current security context of the call raised to the reference label and the evaluation of the context label term, $\kappa$.
Relabeling. Relabeling a library reference is done by translating the read and write functions into program counterparts and relabeling the result.
$$(R, W) \uparrow^{\Sigma,(\hat{\rho},\rho,\ddot{\rho})} \mathrm{ref}(\phi, \gamma)^{\kappa} = (\mathit{l\text{-}uread}(R, \ddot{\rho}, \gamma),\ \mathit{l\text{-}uwrite}(W, \ddot{\rho}, \gamma, \phi))$$
The read and the write functions are translated independently w.r.t. the relabel model, $\mathrm{ref}(\phi, \gamma)^{\kappa}$.
$$\mathit{l\text{-}uread}(R, \ddot{\rho}_2, \gamma) = \lambda(\varsigma, (\hat{\rho}, \rho, \ddot{\rho}_1), \Sigma_1).\ (\Sigma_2, \hat{v})$$
where $(\Sigma_2, v) = R(\varsigma, (\hat{\rho}, \rho, \ldots$
The library read function, $R$, is translated by l-uread, which takes the read function, the current model frame stack, and the relabel model, $\gamma$. When the resulting program read function is interacted with, the unlabeled read function is used to fetch the unlabeled value of the reference. The result is relabeled using the relabel model in the model state of the reference and the result is returned.
$$\mathit{l\text{-}uwrite}(W, \ddot{\rho}_2, \gamma, \phi) = \lambda(\varsigma, (\hat{\rho}, \rho, \ddot{\rho}_1), \Sigma_1, \hat{v}).\ \Sigma_3$$
where $\ell = [\![\mathit{lblterm}(\gamma)]\!]_{(\hat{\rho},\rho,\ldots}$
The library write function, $W$, is translated by l-uwrite, which takes the write function, the current model frame stack, the relabel model, $\gamma$, and the unlabel model, $\phi$. The reason l-uwrite takes the relabel model in addition to the unlabel model is that it is used to calculate the label against which the NSU check is made. The label of the stored value is represented by the label term of the relabel model, extracted by the lblterm function, defined in the obvious way by pattern matching. If the write is allowed, the labeled value to be written to the library reference is raised to the context $\varsigma$, before being unlabeled using the unlabel model, $\phi$. Finally, the unlabeled value is written to the library reference, using the unlabeled write function.
Interaction with the model heap. To see how higher-order functions and references interact with the model heap, consider the code snippet below to the right. The program calls the library function f, which takes a parameter and creates a reference r initially set to the value of the parameter. f returns a pair, where the first element is a function that, given any argument, will dereference the reference, and the second element is the actual reference. This pair is stored as (g, r). Thereafter, r is assigned the value $15^H$, before g is called with the parameter $20^L$.
The following occurs w.r.t. relabeling and unlabeling in the program, where the initial setting can be seen in Figure 3.
Fig. 3: Initial structure
When f is lifted to the program, l-uclos is used to relabel the library closure, which will copy the model frame stack to the wrapped f and store the function model $x \to (y \to l, r)$. In the example, the resulting program closure is applied to $10^L$, which causes a new model frame to be allocated on the model heap, into which the argument is unlabeled, causing $L$ to be stored in the new model frame as the label for x, and the pointer to the new model frame is stored in the model frame stack. After this, the actual unlabeled function is called, which results in the returned pair being relabeled. The relabeling of the pair results in l-uclos being used to relabel \y.!r with the model $y \to l$, and l-uread and l-uwrite being used to relabel r with the reference model r. The key here is that the relabeling occurs in the same model state, which means that the produced program function and reference will be bound to the same model frame stack. This causes writes to the reference to modify the model frame shared with the function, ensuring that they have the same view of the model of the reference. The entire process is highlighted in Figure 4.
When the program writes to the reference (r := upg 15 H), the closure from l-uwrite is triggered, causing l in the shared model frame to be updated to $H$, which can be seen in Figure 5. Note that the pointer to the model frame created from the call to the wrapped f is removed from the model frame stack. This ensures that any subsequent calls to the wrapped f, as well as any created wrappers, will not be able to use that model frame, as it belongs only to the first call to the wrapped f and the created wrappers within the call.
Fig. 6: Calling g
When the function g is called, it will trigger its l-uclos wrapper and, as can be seen in Figure 6, the model $y \to l$ is used in the l-uclos wrapper for g, with l being used to relabel the result. Since l was modified by the write to the reference (Figure 5), the shared view of the library model state will make the function g return a secret value.
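The effect of the shared model frame can be imitated with a small Python simulation. It is a deliberately simplified sketch, not the mechanism itself: the wrappers are hand-written closures, the model frame is a dictionary, and the assumption that l starts out as the label of x is made for illustration only.

```python
from dataclasses import dataclass

L, H = "L", "H"


def leq(a, b):
    return a == L or b == H


def join(a, b):
    return H if H in (a, b) else L


@dataclass(frozen=True)
class Labeled:
    value: object
    label: str


class SecurityError(Exception):
    pass


def wrapped_f(ctx, arg):
    """Stands in for the lifted library function f."""
    # Model frame allocated for this call; shared by g and r below.
    frame = {"x": arg.label, "l": arg.label}  # assumption: l starts as x
    cell = [arg.value]                        # the unlabeled library reference

    def g(ctx, y):
        # Model y -> l: the result is relabeled with l from the shared frame.
        return Labeled(cell[0], frame["l"])

    def read_r(ctx):
        return Labeled(cell[0], frame["l"])

    def write_r(ctx, value):
        if not leq(ctx, frame["l"]):          # NSU check against l
            raise SecurityError("no-sensitive upgrade violated")
        frame["l"] = join(value.label, ctx)   # unlabel into the shared frame
        cell[0] = value.value

    return g, (read_r, write_r)


g, (read_r, write_r) = wrapped_f(L, Labeled(10, L))
before = g(L, Labeled(20, L))
write_r(L, Labeled(15, H))        # r := upg 15 H
after = g(L, Labeled(20, L))
```

Because g and the reference wrappers close over the same frame, the write through r changes the label that g later attaches to its result.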

Examples
In the following section we provide some examples to highlight how the language would interact with common programming techniques. The language used in this section is an extended version of the language of the paper. The major differences are the addition of records, functions with multiple arguments, a limited form of pattern matching, and optional unlabeling. The extensions are all present as experimental features in the implementation. In all examples, the code above %% is the program and the code below is the library.
Writebacks. Returning two or more results from a function can be done in two ways: 1) by tupling the result, or 2) by using writebacks. When using writebacks for, e.g., reading a file, the read function is provided a pointer to a place in the memory where the contents of the file should be stored instead of returning a pointer to the data. In our language, writebacks can be modeled by passing program references to the library as shown to the right. In the example, the program variable buf is a program reference. The reference is passed to the library function action that writes the result to the buffer. When interacting with a program reference, the reference is given an abstract name (b for buffer in this case) that the function interacting with the buffer uses to relabel the interaction.
In case the function used the writeback under secret control, represented by the model mod action :: #b -> L {| H |- #b <- H |}, the example would fail due to NSU. The reason is that the value the reference buf is pointing to is public, and is not allowed to change label under secret control. Modifying the declaration of buf to be let buf = ref (upg 0 H) solves this, as the reference will point to a secret value. from JavaScript RegExp [23].
The example to the right shows how state can be used to store error information. In the example, the function action may fail depending on the value of the parameter. The reason it failed is stored in the library reference errno, which is modeled by a security label used to relabel program reads and writes of the reference. Since the update of errno is conditional, the value of errno depends on the argument of the action function. To model this, the argument label is stored in the model variable a, which is used to update the security label of errno. Note that the update of the security label is independent of whether the operation fails or not. This is needed to ensure that the label of errno is independent of secrets. The label of errno indicates that the error code is public. Consider the case where an action sets the error code under secret control, represented by the following model: mod action :: a -> L {| H |- l <- a |}. If such an action were used, our system would halt execution, since the update of the error code would trigger NSU.
The one-place buffer. In the previous example, the library state is exposed to the program, which can freely read and write errno. Frequently it is good practice to hide the internal state of the library and only allow the program to access it indirectly via the functions of the library. We exemplify this by implementing a simple one-place buffer, seen to the right. While simple, the example captures the essence of, e.g., buffered file access.
Since there is no model for buf, it is not accessible from the program. Instead, the state of the library is modeled using the label l. This label is used by the operations that give the program access to the buffer contents. When setting the
mod rmdirSync :: a -> l + a {| l <- a |}
mod rmdir :: (a, #cb) -> L {| l <- a, #cb (l + a) -> b |}
We use the name a to represent the path and the abstract name cb to represent the callback. From a modeling standpoint, we need to ensure that the level of the path is propagated to the state, since removing the folder influences the file system state. We can see this in the effect constraint l <- a, where the label of the path is propagated to the label of the state. The success of the operation depends on the library state and the security label of the path, l + a. Whereas rmdirSync returns the result, rmdir communicates the result to the callback as an argument, #cb (l + a). The immediate return value of the latter is undefined, regardless of the outcome of the operation, and is hence labeled L.
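The two models above can be read operationally as functions over labels. The following Python sketch is one such reading under stated assumptions: the class FsModel and its method names are hypothetical, the effect l <- a is interpreted conservatively as joining a into l (the precise semantics of the effect is defined in the paper and not restated here), and the callback is invoked synchronously only to keep the sketch short.

```python
# Two-point lattice, encoded as integers (an assumption of this sketch).
L, H = 0, 1


def join(*labels):
    """Least upper bound in the two-point lattice."""
    return max(labels, default=L)


class FsModel:
    """Hypothetical model state: a single label l for the file system state."""

    def __init__(self):
        self.l = L

    def rmdir_sync(self, a):
        # mod rmdirSync :: a -> l + a {| l <- a |}
        self.l = join(self.l, a)   # effect: the path label flows into the state
        return join(self.l, a)     # the result is labeled l + a

    def rmdir(self, a, cb):
        # mod rmdir :: (a, #cb) -> L {| l <- a, #cb (l + a) -> b |}
        self.l = join(self.l, a)   # same effect on the state label
        cb(join(self.l, a))        # the outcome reaches the callback, labeled l + a
        return L                   # the immediate return value is undefined, hence L
```

Once a secret path has been removed, the state label stays secret in this reading, so a later rmdir_sync on a public path still yields a secret result.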
A more complex function in the API is createWriteStream, which returns a record. Calling createWriteStream with a path and an optional argument that defines options (e.g., the encoding) returns a WriteStream. The WriteStream has four parts: the fields path and bytesWritten, as well as the events open and close. In the model of the returned record, the property path is modeled by the argument a, which is the label of the path. The property bytesWritten, which corresponds to the number of bytes written so far, is modeled as the least upper bound of a, b and l, i.e., the path, the options and the current library state. The events are modeled as functions that accept (and store) callbacks, the event handlers, as modeled by the properties open and close. When the stream is opened or closed, the path, the options and the current library state all influence the parameter passed to those callbacks.
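The record model described above can be sketched in Python as a dictionary of label-producing functions. This is an illustration only: create_write_stream_model, the state callable that returns the current library state label, and the _fire helper that simulates the library raising an event are all hypothetical names, not part of the paper's language or of any real API.

```python
# Two-point lattice, encoded as integers (an assumption of this sketch).
L, H = 0, 1


def join(*labels):
    """Least upper bound in the two-point lattice."""
    return max(labels, default=L)


def create_write_stream_model(a, b, state):
    """Label-level model of the returned record.

    a: label of the path, b: label of the options,
    state: zero-argument callable giving the current library state label l.
    """
    handlers = {"open": [], "close": []}

    def on(event):
        # An event is modeled as a function that accepts and stores a callback.
        def register(cb):
            handlers[event].append(cb)
        return register

    def fire(event):
        # When the event occurs, path, options and the current state all
        # influence the parameter passed to the stored callbacks.
        for cb in handlers[event]:
            cb(join(a, b, state()))

    return {
        "path": lambda: a,
        "bytesWritten": lambda: join(a, b, state()),
        "open": on("open"),
        "close": on("close"),
        "_fire": fire,  # helper for this sketch only
    }
```

Reading the state label lazily, at the time a field is accessed or an event fires, reflects that bytesWritten and the callback parameters depend on the current library state rather than on the state at creation time.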
To contrast the case study with the examples, note that Section 4 assumes that the source code of the library is available (albeit not supporting the labeled semantics), whereas this section assumes that it is not. Both cases are common and can be modeled in our approach. If the source code is indeed available, an interesting line of future work is to look at the possibilities of automatically deducing models, e.g., using something similar to summary functions [26].

Correctness
The correctness of the language is complicated by the fact that it is parameterized over a library model that defines how to marshal values between the program and the library. Since we make no assumption about the implementation language of the library or the availability of its source code, we cannot reason about the correctness of the model w.r.t. the library. Instead, we assume the correctness of library models in terms of three hypotheses used in the noninterference proof. The low-equivalence definition, the model hypotheses and more information on the proof can be found in the full version of the paper [27].
We prove noninterference, formulated as the preservation of a low-equivalence relation under execution, assuming that the library model correctly models the library. Apart from covering a larger language, the proof improves over [17] in two important respects: 1) it significantly weakens the model hypothesis, and 2) the proof has been formalized in Coq [19].

Related work
Bielova and Rezk present a comprehensive taxonomy of information flow monitors [4]. Some monitors [16,15,14,3] and secure multi-execution [13,12,24,6,20] mechanisms have been integrated in a browser. Bichhawat et al. instrumented the WebKit JavaScript interpreter [3]. While their approach takes advantage of the current optimizations in the interpreter, it loses the differentiation between program and library execution. FlowFox [13], which implements secure multi-execution (SME) [6], modifies the SpiderMonkey engine in two ways: 1) augmenting the internal objects representing the JavaScript context with a current execution level, as well as a boolean indicating whether SME is active, and 2) augmenting the internal representation of JavaScript values with a security level. Unfortunately, API calls are only treated as I/O actions. JSFlow [16] is an information-flow aware JavaScript interpreter, augmented with security labels on the JavaScript values. In order to allow for libraries in JSFlow, deep hand-written models must be used, with reimplementation of the libraries as a result [15]. To allow for scaling, JSFlow attempts to automatically wrap libraries, albeit in an ad-hoc manner. While the correctness of simple examples is easy to see, the correctness and scalability when passing, e.g., functions to and from the library remain unclear. Bauer et al. [2] developed a light-weight coarse-grained run-time monitor for Chromium, using taint tracking, to help reasoning about information flow in a fully fledged browser. In that work, formal models of, e.g., cookies, history and the document object model (DOM) are defined, as well as event handlers, to model the browser internals and help prove noninterference.
Heule et al. [18] provided a theoretical foundation for a language-based approach to coarse-grained dynamic information flow control that can be applied to any programming language where external effects can be controlled. A first step toward handling libraries in environments where dynamic information flow control is not possible was taken by Hedin et al. [17], which falls short by not supporting references and thereby not allowing for first-class mutable state in combination with higher-order functions.
Findler and Felleisen's higher-order contracts [9] address the problem of checking contracts at the boundary between statically type-checked and dynamically type-checked code. This problem relates to the problem of interfacing with libraries where it is impossible to check dynamic information flow control. In particular, when considering function values crossing the boundary, the compliance of such function values with their respective contracts is undecidable. Findler and Felleisen proposed to wrap the function and check the contract at the point where the function is called. This is comparable to how we handle structured data, including references and function values. A question for future work is whether we can remove our abstract identifiers for function values and references, and instead inject the unlabeling/relabeling functionality using proxies, similar to how it is done in higher-order contract checking [8]. If a contract is violated, blame must be properly assigned [7,11]. In static information flow checking, the assignment of blame for information flow violations has been investigated by King et al. [21]. Although our work can be seen as an application of dynamic higher-order contract checking to information flow contracts, we do not consider assigning blame. Indeed, runtime detection of a library that does not obey the specified contract (i.e., the given model) is not possible in this work.

Conclusion
Based on the central idea of a model heap, we have developed a foundation for information flow tracking in the presence of libraries with side effects, in a language with higher-order functions, first-class state and lazy marshaling: three cornerstones of practical libraries. We have implemented a prototype to verify the examples and performed a larger case study showing that the language is able to model key parts of a real file system library. In addition, we have formalized the language and its correctness proof in Coq.
Future work includes support for model abstraction and application, and dependent models. Thanks to the three cornerstones, we believe that modeling JavaScript objects does not require the development of new theory, indicating that it is possible to use this technique in tools like JSFlow.