Knowledge-based zooming for metabolic models

Anna Zhukova; David James Sherman

Abstract

Genome-scale metabolic models for new organisms include thousands of reactions. In most cases these reactions are automatically inferred by methods that combine databases of reactions and pathways with genomic information and existing models for similar organisms [1]. Genomic data for the new organism are compared to those of the reference organism to find genomic evidence, such as the presence of catalysing enzymes, for the reactions conserved in the new organism. Starting from the inference of a draft model, the model refinement process includes several iterations of model analysis, error detection, and improvement [2]. The models produced at each iteration are intended for computer simulation, and so describe all the reactions thought to participate in the organism's metabolism. Although automatic model inference tools and genome comparison methods are becoming increasingly advanced, they may still leave gaps in the model or add erroneous reactions. Thus, model evaluation by human experts remains important at every iteration. However, because of their completeness, genome-scale models are too detailed and complicated to be easily understood by a human, and the abundance of reactions in the model may hide errors. For example, if the enzyme EC 2.3.1.16 were missing from a genome-scale model of the yeast Yarrowia lipolytica (MODEL1111190000 [3]), the whole group of Acyl-CoA:acetyl-CoA C-acyltransferase reactions participating in the Beta-oxidation of fatty acids pathway [4] would be eliminated: one for each of the six 3-oxoacyl-CoA species (3-oxodecanoyl-CoA, 3-oxohexacosanoyl-CoA, 3-oxolauroyl-CoA, 3-oxooctadecanoyl-CoA, 3-oxopalmitoyl-CoA, and 3-oxotetradecanoyl-CoA) present in the model. However, the absence of these six reactions would be hidden by the other 59 reactions in the constitutive peroxisome of Yarrowia lipolytica, and a human expert may have difficulty noticing the error.
To aid human understanding of these complete models, we developed a method for knowledge-based zooming that provides a higher-level view of a model while keeping its essential structure. The zooming process groups the chemical species present in the model into semantically equivalent classes, and merges each class into a generalized chemical species. The ubiquitous species, which participate in many reactions and are common to most models (e.g. water, ATP, oxygen), do not need to be generalized: each of them forms a trivial equivalence class. The other species are divided into non-trivial equivalence classes, based on their hierarchical relationships in the ChEBI ontology [5], and generalized accordingly. For example, 3-oxodecanoyl-CoA, 3-oxohexacosanoyl-CoA, and 3-oxolauroyl-CoA can all be generalized into 3-oxoacyl-CoA. Reactions that involve the same generalized chemical species are then factored together into a generalized reaction. The zooming process is illustrated in Fig. 1. By applying this process, we can build a simplified model that focuses on the high-level relationships. Our method obeys several consistency restrictions: it conserves the number of distinct species participating in each reaction (i.e. it preserves reaction stoichiometry), and it preserves connectivity, i.e. for every pair of reactions sharing a reactant/product in the initial model, the "zoomed out" reactions share the "zoomed out" reactant/product. We implemented our method in Python and applied it to several genome-scale metabolic models.
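The grouping step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `PARENT` table is a hand-written stand-in for a lookup in the ChEBI hierarchy, the set `UBIQUITOUS` and the two toy reactions are hypothetical, and all function names are chosen here for illustration only.

```python
from collections import defaultdict

# Hypothetical toy data. In a real setting the species-to-class mapping
# would be derived from the ChEBI ontology, not written by hand.
UBIQUITOUS = {"CoA", "acetyl-CoA"}  # assumed ubiquitous for this example
PARENT = {
    "3-oxodecanoyl-CoA": "3-oxoacyl-CoA",
    "3-oxolauroyl-CoA": "3-oxoacyl-CoA",
    "octanoyl-CoA": "acyl-CoA",
    "decanoyl-CoA": "acyl-CoA",
}

# Each reaction is a (reactants, products) pair of species names.
REACTIONS = {
    "r1": (("3-oxodecanoyl-CoA", "CoA"), ("octanoyl-CoA", "acetyl-CoA")),
    "r2": (("3-oxolauroyl-CoA", "CoA"), ("decanoyl-CoA", "acetyl-CoA")),
}


def generalize_species(species):
    """Map a species to its generalized class; ubiquitous species stay as-is."""
    if species in UBIQUITOUS:
        return species
    return PARENT.get(species, species)


def generalize_reaction(reaction):
    """Replace every participant of a reaction by its generalized species."""
    reactants, products = reaction
    return (
        frozenset(generalize_species(s) for s in reactants),
        frozenset(generalize_species(s) for s in products),
    )


def zoom_out(reactions):
    """Group reaction ids whose generalized reactants and products coincide."""
    groups = defaultdict(list)
    for reaction_id, reaction in reactions.items():
        groups[generalize_reaction(reaction)].append(reaction_id)
    return dict(groups)


def preserves_species_count(reaction):
    """Check that generalization does not merge two participants of one reaction."""
    reactants, products = reaction
    gen_reactants, gen_products = generalize_reaction(reaction)
    return (len(gen_reactants) == len(set(reactants))
            and len(gen_products) == len(set(products)))


groups = zoom_out(REACTIONS)
```

With this toy input, both reactions fall into a single generalized reaction (3-oxoacyl-CoA + CoA giving acyl-CoA + acetyl-CoA), and `preserves_species_count` holds for each of them, which mirrors the first consistency restriction mentioned in the abstract. The connectivity restriction is not checked in this sketch.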
