Diet Modelling: Combining Mathematical Programming Models with Data-Driven Methods

Mathematical programming has been the principal workhorse behind most diet models since the 1940s. As a predominantly hypothesis-driven modelling paradigm, its structure is mostly defined by a priori information, i


Introduction
Current societies are confronted with major challenges. The confluence of population, economic development, and environmental pressures resulting from globalization and industrialization reveals an increasingly resource-constrained world in which predictions point to the need to do more with less and in a more efficient way [1].
Although global food production of calories has kept pace with population growth, more than 820 million people have insufficient food and many more consume low-quality diets that cause micronutrient deficiencies and contribute to a substantial rise in the incidence of diet-related obesity and diet-related noncommunicable diseases (NCDs) [2], such as cardiovascular disease, type 2 diabetes, and various types of cancer. As a matter of fact, most NCDs have a root cause in an unhealthy diet [3,4], that is, a diet that does not fulfill energy and nutrient requirements for healthy growth and aging [5].
The concept of sustainable diets presents an opportunity to successfully advance commitments to sustainable development and the elimination of poverty, food and nutrition insecurity, and poor health outcomes [1]. Sustainable diets can be defined as those with low environmental impacts that contribute to food and nutrition security and to healthy life for present and future generations [8].
Designing and promoting sustainable diets is a complex task. Mathematical and computational models that can capture the complexity of the problem and devise a sustainable nutritional strategy are needed [9]. Moreover, such models can help to set priorities for interventions and policy measures, which might result in a more sustainable consumption pattern that would act as a driver of sustainable production, since current diet and production patterns are among the most important drivers of environmental pressure [10].
Capturing diet modelling complexity is a twofold challenge. Firstly, diet modellers gain their knowledge over the course of many years of education and professional experience, ultimately constructing an immensely convoluted mental model of which diets can be considered healthy, and under what circumstances. Translating such an intricate biological knowledge system into a set of computer instructions is a nontrivial challenge. Secondly, the goal of diet models is to change, or at least strive towards changing, consumers' dietary patterns, for which they need, at least to some extent, to appeal to consumers' preferences. Consumers rarely state their preferences explicitly, let alone explain the reasons behind them. However, in today's data-loaded world, their actions (e.g. recorded supermarket transactions) speak for themselves. The necessity of turning data into actionable insights bodes ill for current diet models, given their inability to do so.
Machine learning, a sub-field of artificial intelligence specialized in automated pattern recognition, feature extraction, and (lossy) data compression [11], is a likely candidate for providing algorithmic solutions from which current diet models could benefit, chiefly because of its ability to reshape vast amounts of data into useful information.
In this paper, we consider two machine learning paradigms, and three instances thereof, which we argue are valuable additions to current diet models. We focus on two aspects of diet modelling, namely nutrient importance weighting and consumer preferences. For nutrient importance weighting, where the goal is to assign an importance weight to every nutrient so as to estimate the overall diet health score, we consider a supervised learning approach (binary classification), and suggest two computational methods that could facilitate the translation of expert knowledge into a set of correlated importance weights: the Principle of maximum entropy and gradient boosted decision trees. For estimating consumer preferences, we consider the concept of recommendation systems, and suggest one possible instance thereof, namely the Top-N recommendation system based on mutual information and entropy weighting.

Current diet modelling paradigms
The concept of diet modelling dates back to at least the 1940s, when the American economist and Nobel laureate George Stigler utilized mathematical programming (mathematical optimization) to solve the "diet problem": the problem of finding the least costly combination of food items that satisfies all nutrient requirements [12].
Mathematical programming remains the principal workhorse behind the majority of today's diet models and can be characterized by three main components: decision variables, an objective function, and a set of constraints [13]. It also comes in many forms, depending on the types of decision variables (integers, reals) and the functional form of the objective function. Some of the most prevalent mathematical programming paradigms for diet modelling are linear programming, quadratic programming, mixed-integer programming, and goal programming [14].
During the optimization process, the mathematical programming model searches for values of the decision variables that optimize (maximize or minimize) the objective function, while adhering to the preset constraints. In the context of nutrition, diet models based on mathematical programming aim to find the set of food item quantities that optimizes a specific objective (e.g. total diet cost, an environmental objective, or total deviation from an observed diet), while satisfying constraints such as nutritional recommendations, total energy intake, etc. [14].
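For concreteness, the classical least-cost diet problem can be written as a linear program. The notation below is chosen here for illustration and does not come from the cited works: x_j is the quantity of food item j, c_j its cost per unit, a_ij the amount of nutrient i per unit of food item j, and b_i the minimum requirement for nutrient i.

```latex
\begin{aligned}
\min_{x \in \mathbb{R}^{n}} \quad & \sum_{j=1}^{n} c_j x_j \\
\text{subject to} \quad & \sum_{j=1}^{n} a_{ij} x_j \ge b_i, \qquad i = 1, \dots, m, \\
& x_j \ge 0, \qquad j = 1, \dots, n.
\end{aligned}
```

Here the x_j are the decision variables, the total cost is the objective function, and the nutrient requirements together with non-negativity form the constraints. Upper bounds on nutrients or an energy constraint would enter as further linear inequalities of the same shape.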

Nutrient importance and consumer preferences
Diet models based on mathematical programming can be classified as hypothesis-driven methods [15], meaning that most of their components are selected based on a priori information. That is, modellers try to translate their expert knowledge into an appropriate model structure.
For instance, when optimizing diet healthiness, where the goal is to minimize deviations between observed nutrient intake values and their respective recommended targets, each deviation is assigned a specific weight reflecting its relative importance. The total sum of weighted deviations then serves as the overall diet health score, i.e. diet healthiness.
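The weighted-deviation score described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the formulation of any particular diet model: deviations are taken as absolute relative deviations from the target, and the function name, nutrient names, and all numbers are hypothetical.

```python
def diet_health_score(intakes, targets, weights=None):
    """Weighted sum of relative deviations from nutrient targets (lower is better).

    Assumption: a deviation is |intake - target| / target; actual diet models
    may define deviations differently (e.g. one-sided or squared).
    """
    nutrients = list(targets)
    if weights is None:
        # Uniform weights, i.e. every deviation is considered equally important.
        weights = {n: 1.0 / len(nutrients) for n in nutrients}
    score = 0.0
    for n in nutrients:
        deviation = abs(intakes[n] - targets[n]) / targets[n]
        score += weights[n] * deviation
    return score


# Hypothetical targets and intakes, for illustration only.
targets = {"protein_g": 50.0, "fiber_g": 30.0, "sodium_mg": 2000.0}
intakes = {"protein_g": 60.0, "fiber_g": 15.0, "sodium_mg": 3000.0}

uniform = diet_health_score(intakes, targets)
custom = diet_health_score(
    intakes, targets, {"protein_g": 0.2, "fiber_g": 0.2, "sodium_mg": 0.6}
)
```

With these numbers the relative deviations are 0.2, 0.5, and 0.5, so the uniform score is 0.4, while shifting weight towards sodium raises it to 0.44. The same diet is therefore scored differently depending on which deviations the modeller deems important.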
Selecting a set of weights that accurately reflects reality is a challenging task, and almost all diet modellers resort to a uniform set of weights, regardless of the setting [16,17]. We argue that it would be beneficial for diet modellers to have a way of adjusting the weights when they disagree with the uniform set.
Furthermore, in order to generate acceptable optimized diets, diet models have to be able to model the concept of "acceptability", i.e. the preferences of each consumer. This is currently done by taking into account the food consumption distribution in the constraints, or by deriving the average observed diet if the aim is to stay as close as possible to current dietary habits [14]. The former approach generally takes a lot of hard-coding and rules of thumb, and has to be re-implemented manually every time the setting changes, whereas the latter might be "too personalized", in the sense that the diet model puts a lot of focus on the observed food items, while disregarding the rest. Moreover, it has been observed that in order to meet nutritional targets, observed individual diets generally have to be expanded with new food items, which tends to be done in an arbitrary fashion [14]. Having a more generic, possibly data-driven approach that can collect and process consumer data from multiple sources, and subsequently deliver personalized food item recommendations, could add a lot of value to existing diet models, and possibly bridge the gap between hypothesis-driven and data-driven methods.
Data-driven approaches to diet modelling

Inferring function weights
As described in the previous section, when modelling diets via deviations from the optimal diet, the objective function weights represent the "importance" of each deviation. Currently, most diet models resort to a uniform set of weights, regardless of the setting (e.g. different sub-populations and their health weights). The weights can of course be changed manually; however, given the multidimensional nature of the problem, it is arguably very hard for modellers to accurately translate their own beliefs into a set of correlated weights.
We consider two possible methods for dealing with this challenging task, namely the Principle of maximum entropy (MaxEnt) and gradient boosted decision trees. MaxEnt is relatively easy to implement, converges to an exact solution (the underlying optimization problem is convex), and is highly usable, because once implemented, it requires almost no prerequisite knowledge or technical skills apart from nutritional expertise. Gradient boosted decision trees, on the other hand, tend to be exceptionally successful on tabular data, achieving state-of-the-art results on many standard classification benchmarks [18]. Both methods are explained in the subsequent sections.
Principle of maximum entropy. MaxEnt is a general method for estimating probability distributions from data. The core principle behind MaxEnt is that when nothing is known, the distribution should be as uniform as possible, that is, have maximal information entropy. As bits of information become available, the distribution is updated in a way that adheres to the constraints imposed by the new information, while maximizing the entropy [19].
Essentially, where current models end, MaxEnt begins. That is, starting with a uniform set of weights, diet modellers are able to interact with MaxEnt via pairwise comparisons of diets, i.e. they are presented with diet pairs and are instructed to select the "better" (e.g. healthier, more sustainable, etc.) diet, according to their expert knowledge. Their classification results are translated into a set of inequalities that constrain MaxEnt's search space, in order to find a set of weights that is in line with the expert knowledge.
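One way to make this concrete is sketched below. The notation is assumed here for illustration and is not taken from the cited works: w_i is the weight of nutrient i, d_i(A) is the deviation of diet A from the target for nutrient i, and in the k-th comparison the expert judged diet A_k to be better than diet B_k. Under the additional assumption that "better" means a lower weighted sum of deviations, each judgement becomes a linear inequality on the weights.

```latex
\begin{aligned}
\max_{w} \quad & -\sum_{i=1}^{m} w_i \log w_i \\
\text{subject to} \quad & \sum_{i=1}^{m} w_i = 1, \qquad w_i \ge 0, \qquad i = 1, \dots, m, \\
& \sum_{i=1}^{m} w_i \, d_i(A_k) \le \sum_{i=1}^{m} w_i \, d_i(B_k), \qquad k = 1, \dots, K.
\end{aligned}
```

With no comparisons (K = 0) the maximizer is the uniform set of weights w_i = 1/m, which matches the starting point described above. Because the objective is concave and all constraints are linear, the problem is convex, so every additional comparison shrinks the feasible set without introducing local optima.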
Gradient boosted decision trees. Decision tree learning is a predictive modelling approach commonly used within the machine learning and statistics domains, both for regression and classification problems. In order to make predictions, decision trees stratify the predictor space into a number of simple regions, while the splitting rules used for stratification can be represented as a directed acyclic graph whose child nodes can only have a single parent, i.e. a tree, hence their name. In their basic form, they tend not to be as accurate as some other prediction models; however, their prediction accuracy can be increased significantly if coupled with other machine learning methods, such as gradient boosting [20,18]. Furthermore, decision trees tend to be highly interpretable (e.g. each decision can easily be inspected manually), robust (e.g. they do not require significant data preprocessing), scalable, and suitable for parallel and distributed computation [20].
Because of their interpretability and high predictive power, we consider a decision tree-based method, namely gradient boosted decision trees, as a viable option for inferring objective function weights. We cast the problem of inferring function weights as a binary classification problem, where the setting is the same as with the maximum entropy principle: diet modellers are presented with diet pairs and are instructed to select the one they prefer with respect to the objective function. By doing so, they generate input-output pairs on which gradient boosted decision trees are then trained and validated. Upon convergence, we select the best performing decision tree and rank the input features according to their importance, which in this case is synonymous with their total information gain [20]. We further normalize the results, so as to obtain a proper probability distribution.
Besides inferring objective function weights, the method can provide diet modellers with a variety of other information. For instance, features that appear together in a traversal path are interacting with one another (since the condition of a child node is predicated on the condition of the parent node), which gives modellers the ability to discover interactions among features [18].

Inferring consumer preferences
Diet acceptability modelling has been deemed interesting and important both as a constraint and as an objective, depending on the modelling approach. It is generally modelled as the total deviation between the optimal food item intakes and the food item intakes of the current diet. The assumption is that if two diets do not deviate significantly in terms of food items and their quantities, they can be considered similar, and almost equally acceptable. This definition of acceptability implicitly takes into account a variety of different factors that might contribute to the diversity of dietary patterns, for instance cultural and lifestyle differences [14].
However, modelling diet acceptability solely through total deviation leaves a lot to be desired. For instance, such an approach does not provide the option to search for likely preferable food items that have not been recorded in a consumer's diet, which puts a severe limitation on diet model flexibility. As mentioned before, in order to meet nutritional targets, observed individual diets generally have to be expanded with new food items, which, if done in an arbitrary fashion, could have detrimental effects on diet acceptability. The same applies when a modeller would like to diversify the observed diets by including new food items.
In the following section, we consider a data-driven modelling paradigm based on the concept of recommendation systems [21], namely the Top-N recommendation system based on mutual information and entropy weighting, which can analyze consumers' historical data and turn it into actionable insights that complement existing diet models. By doing so, we establish one of potentially many links between hypothesis-driven methods and more data-driven machine learning methods.
Top-N recommendation system based on mutual information and entropy weighting. In order to meet all nutritional constraints, consumers' diets often have to be expanded with new, previously unobserved food items. Most current diet models base their selection of new food items on some form of majority vote, that is, selecting those unobserved foods that are present in e.g. ≥ 50% of diets in the sample [14]. Although better than random selection, such an approach leaves room for improvement.
We consider a recommendation system that can process user data from potentially multiple sources, and leverage algorithmically derived information on consumers' routine behavioral patterns so as to subsequently deliver personalized product recommendations. The recommendation system consists of two computational steps: preprocessing and pairwise similarity computation.
Preprocessing of the input data can significantly facilitate the extraction of information [22]. It comes in many flavors, ranging from very simple procedures such as normalization and standardization, through various kernel functions, weighting functions [23], and (non)linear dimensionality reduction methods, to more sophisticated and automatic feature extractors such as deep neural networks [22]. The selection of one or more preprocessing steps depends on a variety of factors, including, but not limited to, the available computing power and the amount and type of data. Given that the majority of consumer behavior data is not publicly available, diet modellers predominantly leverage relatively scarce data sets obtained via questionnaires [14], which is why we consider a handcrafted weighting function based on the concept of information entropy, which has been shown to perform significantly better than most handcrafted weighting functions in some tasks [24].
As the name suggests, the similarity computation step computes pairwise similarities, either between food items or between consumers. Selecting an appropriate similarity measure is of crucial importance. We consider mutual information, a core information-theoretic quantity that acts as a general measure of dependence between two random variables [25], as opposed to other commonly employed correlation metrics (e.g. the Pearson correlation coefficient), which measure only linear dependence.
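As an illustration of the similarity step, the sketch below estimates the mutual information between two food items from discrete observations, here binary indicators of whether each consumer reported the item. It is a plain plug-in estimate from empirical frequencies; the function name and the toy data are hypothetical, and the entropy-weighting preprocessing step is not included.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of mutual information (in bits) between two
    equally long sequences of discrete observations."""
    if len(xs) != len(ys):
        raise ValueError("sequences must have the same length")
    n = len(xs)
    joint = Counter(zip(xs, ys))  # empirical joint counts
    count_x = Counter(xs)         # empirical marginal counts
    count_y = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with counts.
        mi += (c / n) * math.log2(c * n / (count_x[x] * count_y[y]))
    return mi


# Toy data: one entry per consumer, 1 if the item was reported, else 0.
bread = [1, 1, 0, 0]
butter = [1, 1, 0, 0]  # always reported together with bread
rice = [1, 0, 1, 0]    # unrelated to bread in this sample

mi_bread_butter = mutual_information(bread, butter)  # 1.0 bit
mi_bread_rice = mutual_information(bread, rice)      # 0.0 bits
```

Plug-in estimates of this kind are biased upwards on small samples, which is worth keeping in mind given the relatively scarce questionnaire data mentioned above.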
Similarity computation ultimately results in a fully connected similarity graph that can be queried in many ways. In the case of food item similarities, such a graph can for instance support predicting, for each consumer, the top N food items that the consumer is likely to prefer but has not reported consuming.
For each consumer m_i that has purchased a set U of food items, we compute the candidate set C by taking the union of the k most similar food items for each item n_i ∈ U. After that, we remove from C all food items that are already in U. Then, for each item c ∈ C, we compute its similarity to the set U by summing the similarities between c and all food items n_i ∈ U, using only the k most similar food items of each n_i. Lastly, the food items in C are sorted in non-increasing order of their similarity to U, and the first N food items are selected for recommendation. N and k are tunable parameters, and their selection can affect the speed and quality of recommendations [26].
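The procedure above can be sketched as follows. This is a minimal illustration rather than a reference implementation: the function name, the dictionary-based similarity structure, and the toy similarity values are assumptions, and ties are broken alphabetically purely to make the output deterministic.

```python
def top_n_recommendations(purchased, similarity, n_items, k):
    """Recommend up to n_items food items not yet in `purchased`.

    similarity: dict mapping each item to a dict {other_item: score};
    an item is assumed not to list itself as a neighbour.
    """
    def k_most_similar(item):
        neighbours = similarity.get(item, {})
        ranked = sorted(neighbours, key=lambda other: (-neighbours[other], other))
        return ranked[:k]

    scores = {}
    for item in purchased:                      # each n_i in U
        for candidate in k_most_similar(item):  # builds the candidate set C
            if candidate in purchased:
                continue                        # remove items already in U
            # Similarity of the candidate to U: sum over the items in U
            # that have it among their k most similar items.
            scores[candidate] = scores.get(candidate, 0.0) + similarity[item][candidate]

    ranked = sorted(scores, key=lambda c: (-scores[c], c))  # non-increasing order
    return ranked[:n_items]


# Hypothetical similarity values (e.g. mutual information), illustration only.
similarity = {
    "bread": {"butter": 0.9, "jam": 0.6, "rice": 0.1},
    "milk": {"cereal": 0.8, "butter": 0.5, "jam": 0.2},
}
recommended = top_n_recommendations({"bread", "milk"}, similarity, n_items=2, k=2)
```

In this toy example butter is among the two nearest neighbours of both bread and milk, so it accumulates the highest score and is recommended first, followed by cereal. Rice is never considered, because it is not among the k = 2 most similar items of any purchased item.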

Conclusion
Diet models based on mathematical programming have been used extensively for decades, and have stood the test of time. With a straightforward structure, fast execution, and high usability, mathematical programming is an obvious first choice for diet modelling. However, being mainly hypothesis-driven, such diet models often neglect important aspects of today's world: the abundance of user data, and the availability of algorithms that can turn data into actionable insights.
In this paper, we provide just a few examples of available data-driven methods that could greatly facilitate the diet modelling process. With MaxEnt and gradient boosted decision trees, we provide diet modellers with the means for interacting with their diet models, so as to translate their expert knowledge into a "machine-readable" format. With the Top-N recommendation system, we enrich existing diet models with algorithmic and much more information-rich "word-of-mouth" recommendations. Admittedly, data-driven algorithms come with a few caveats. For instance, current state-of-the-art methods are still rather data-inefficient, meaning that they need a significant amount of data to obtain high generalization power. Furthermore, they can also be energy-inefficient, in the sense that large amounts of data require significant computational power to be processed in a reasonable amount of time.

Fig. 1. The majority of diet models are based on some form of mathematical programming, with the two most common objectives: health and acceptability.