L. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, Semantic image segmentation with deep convolutional nets and fully connected CRFs, ICLR, 2015.

URL : http://arxiv.org/abs/1412.7062

M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman, The Pascal Visual Object Classes (VOC) Challenge, International Journal of Computer Vision, vol.88, issue.2, pp.303-338, 2010.
DOI : 10.1007/s11263-009-0275-4

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.167.6629

S. Fidler, R. Mottaghi, A. Yuille, and R. Urtasun, Bottom-Up Segmentation for Top-Down Detection, 2013 IEEE Conference on Computer Vision and Pattern Recognition, 2013.
DOI : 10.1109/CVPR.2013.423

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.296.7948

C. Fu, W. Liu, A. Ranga, A. Tyagi, and A. C. Berg, DSSD: Deconvolutional single shot detector, arXiv preprint, 2017.

R. Girshick, J. Donahue, T. Darrell, and J. Malik, Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation, 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014.
DOI : 10.1109/CVPR.2014.81

URL : http://arxiv.org/abs/1311.2524

S. Gould, T. Gao, and D. Koller, Region-based segmentation and object detection, NIPS, 2009.

B. Hariharan, P. Arbeláez, L. Bourdev, S. Maji, and J. Malik, Semantic contours from inverse detectors, 2011 International Conference on Computer Vision, 2011.
DOI : 10.1109/ICCV.2011.6126343

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.221.801

K. He, X. Zhang, S. Ren, and J. Sun, Deep Residual Learning for Image Recognition, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
DOI : 10.1109/CVPR.2016.90

URL : http://arxiv.org/abs/1512.03385

D. Kingma and J. Ba, Adam: A method for stochastic optimization, ICLR, 2015.

I. Kokkinos, UberNet: Training a universal convolutional neural network for low-, mid-, and high-level vision using diverse datasets and limited memory, CVPR, 2017.

Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard et al., Backpropagation Applied to Handwritten Zip Code Recognition, Neural Computation, vol.1, issue.4, pp.541-551, 1989.
DOI : 10.1162/neco.1989.1.4.541

J. Dai, Y. Li, K. He, and J. Sun, R-FCN: Object detection via region-based fully convolutional networks, NIPS, 2016.

T. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan et al., Feature pyramid networks for object detection, CVPR, 2017.

W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed et al., SSD: Single Shot MultiBox Detector, ECCV, 2016.
DOI : 10.1007/978-3-319-46448-0_2

URL : http://arxiv.org/pdf/1512.02325

J. Long, E. Shelhamer, and T. Darrell, Fully convolutional networks for semantic segmentation, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
DOI : 10.1109/CVPR.2015.7298965

URL : http://arxiv.org/pdf/1411.4038

R. Mottaghi, X. Chen, X. Liu, N. Cho, S. Lee et al., The Role of Context for Object Detection and Semantic Segmentation in the Wild, 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014.
DOI : 10.1109/CVPR.2014.119

A. Newell, K. Yang, and J. Deng, Stacked Hourglass Networks for Human Pose Estimation, ECCV, 2016.
DOI : 10.1007/978-3-319-46484-8_29

URL : http://arxiv.org/abs/1603.06937

H. Noh, S. Hong, and B. Han, Learning Deconvolution Network for Semantic Segmentation, 2015 IEEE International Conference on Computer Vision (ICCV), 2015.
DOI : 10.1109/ICCV.2015.178

URL : http://arxiv.org/abs/1505.04366

G. Papandreou, L. Chen, K. P. Murphy, and A. L. Yuille, Weakly-and Semi-Supervised Learning of a Deep Convolutional Network for Semantic Image Segmentation, 2015 IEEE International Conference on Computer Vision (ICCV), 2015.
DOI : 10.1109/ICCV.2015.203

P. O. Pinheiro, T. Lin, R. Collobert, and P. Dollár, Learning to Refine Object Segments, ECCV, 2016.

URL : http://arxiv.org/abs/1603.08695

J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, You Only Look Once: Unified, Real-Time Object Detection, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
DOI : 10.1109/CVPR.2016.91

URL : http://arxiv.org/abs/1506.02640

J. Redmon and A. Farhadi, YOLO9000: Better, faster, stronger, arXiv preprint, 2016.

S. Ren, K. He, R. Girshick, and J. Sun, Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks, NIPS, 2015.
DOI : 10.1109/TPAMI.2016.2577031

URL : http://arxiv.org/abs/1506.01497

O. Ronneberger, P. Fischer, and T. Brox, U-Net: Convolutional Networks for Biomedical Image Segmentation, MICCAI, 2015.
DOI : 10.1007/978-3-319-24574-4_28

URL : http://arxiv.org/abs/1505.04597

K. Simonyan and A. Zisserman, Very deep convolutional networks for large-scale image recognition, ICLR, 2015.

M. Teichmann, M. Weber, M. Zoellner, R. Cipolla, and R. Urtasun, MultiNet: Real-time joint semantic reasoning for autonomous driving, arXiv preprint, 2016.

J. Yao, S. Fidler, and R. Urtasun, Describing the scene as a whole: Joint object detection, scene classification and semantic segmentation, CVPR, 2012.