Abstract: We propose an original hybrid modeling process for urban scenes that represents 3-D models as a combination of mesh-based surfaces and geometric 3-D-primitives. Meshes describe details such as ornaments and statues, whereas 3-D-primitives encode regular shapes such as walls and columns. Starting from a 3-D-surface obtained by multiview stereo techniques, these primitives are detected and then inserted into the surface. This strategy allows the introduction of semantic knowledge, the simplification of the modeling, and even the correction of errors generated by the acquisition process. We design a hierarchical approach that explores different scales of an observed scene. Each level first segments the surface using a multilabel energy model optimized by α-expansion, and then fits 3-D-primitives such as planes, cylinders, or tori to the obtained partition where relevant. Experiments on real meshes, depth maps, and synthetic surfaces show good potential for the proposed approach.
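To make the primitive-fitting step concrete, the following is a minimal sketch of fitting one plane to the points of a single segmented region. It assumes a total-least-squares fit via SVD and an RMS-distance acceptance test; the abstract does not specify the fitting method, so the function names `fit_plane` and `is_planar` and the tolerance `tol` are hypothetical choices for illustration, not the authors' procedure.

```python
import numpy as np

def fit_plane(points):
    """Total-least-squares plane through an (N, 3) array of points, N >= 3.

    Returns (centroid, unit_normal, rms_distance).
    """
    pts = np.asarray(points, dtype=float)
    centroid = pts.mean(axis=0)
    # The right singular vector with the smallest singular value is the
    # direction of least variance, which serves as the plane normal.
    _, _, vt = np.linalg.svd(pts - centroid)
    normal = vt[-1]
    # Signed point-to-plane distances, summarized as an RMS residual.
    distances = (pts - centroid) @ normal
    rms = float(np.sqrt(np.mean(distances ** 2)))
    return centroid, normal, rms

def is_planar(points, tol=1e-2):
    """Accept the plane only if the RMS residual is below `tol`.

    `tol` is an assumed threshold in the units of the input coordinates.
    """
    return fit_plane(points)[2] <= tol
```

A region whose residual exceeds the tolerance would stay as mesh in this sketch, which mirrors the idea of inserting primitives only "where relevant"; cylinders and tori would need their own, nonlinear fits.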