A. Krizhevsky, I. Sutskever, and G. Hinton, ImageNet classification with deep convolutional neural networks, Advances in neural information processing systems, pp.1097-1105, 2012.
DOI : 10.1145/3065386
URL : http://dl.acm.org/ft_gateway.cfm?id=3065386&type=pdf

K. Simonyan and A. Zisserman, Very deep convolutional networks for large-scale image recognition, ICLR, 2015.

C. Szegedy, W. Liu, and Y. Jia, Going deeper with convolutions, Proceedings of the IEEE conference on computer vision and pattern recognition, pp.1-9, 2015.
DOI : 10.1109/cvpr.2015.7298594
URL : http://arxiv.org/pdf/1409.4842

K. He, X. Zhang, and S. Ren, Deep residual learning for image recognition, Proceedings of the IEEE conference on computer vision and pattern recognition, pp.770-778, 2016.
DOI : 10.1109/cvpr.2016.90
URL : http://arxiv.org/pdf/1512.03385

S. Ren, K. He, and R. Girshick, Faster R-CNN: Towards real-time object detection with region proposal networks, Advances in neural information processing systems, pp.91-99, 2015.
DOI : 10.1109/tpami.2016.2577031
URL : http://arxiv.org/pdf/1506.01497

T. Lin, P. Dollár, and R. Girshick, Feature pyramid networks for object detection, 2016.
DOI : 10.1109/cvpr.2017.106
URL : http://arxiv.org/pdf/1612.03144

A. Razavian, H. Azizpour, and J. Sullivan, CNN features off-the-shelf: an astounding baseline for recognition, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp.512-519, 2014.
DOI : 10.1109/cvprw.2014.131
URL : http://arxiv.org/pdf/1403.6382.pdf

A. Razavian, J. Sullivan, and S. Carlsson, Visual instance retrieval with deep convolutional networks, ITE Transactions on Media Technology and Applications, vol.4, issue.3, pp.251-258, 2016.
DOI : 10.3169/mta.4.251
URL : https://www.jstage.jst.go.jp/article/mta/4/3/4_251/_pdf

G. Tolias, R. Sicre, and H. Jégou, Particular object retrieval with integral max-pooling of CNN activations, ICLR, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01842218

J. Y.-H. Ng, F. Yang, and L. S. Davis, Exploiting local features from deep networks for image retrieval, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp.53-61, 2015.
DOI : 10.1109/cvprw.2015.7301272
URL : http://arxiv.org/pdf/1504.05133

Y. Kalantidis, C. Mellina, and S. Osindero, Cross-dimensional weighting for aggregated deep convolutional features, pp.685-701, 2016.
DOI : 10.1007/978-3-319-46604-0_48
URL : http://arxiv.org/pdf/1512.04065

A. Babenko and V. Lempitsky, Aggregating local deep features for image retrieval, Proceedings of the IEEE international conference on computer vision, pp.1269-1277, 2015.

T. Hoang, T. Do, and D. Tan, Selective Deep Convolutional Features for Image Retrieval, 2017.
DOI : 10.1145/3123266.3123417
URL : http://arxiv.org/pdf/1707.00809

A. Veit, M. Wilber, and S. Belongie, Residual networks behave like ensembles of relatively shallow networks, Advances in Neural Information Processing Systems, pp.550-558, 2016.

H. Liu, Y. Tian, and Y. Yang, Deep relative distance learning: Tell the difference between similar vehicles, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.2167-2175, 2016.
DOI : 10.1109/cvpr.2016.238

J. Long, E. Shelhamer, and T. Darrell, Fully convolutional networks for semantic segmentation, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.3431-3440, 2015.
DOI : 10.1109/cvpr.2015.7298965
URL : http://arxiv.org/pdf/1411.4038

B. Hariharan, P. Arbeláez, and R. Girshick, Hypercolumns for object segmentation and fine-grained localization, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.447-456, 2015.
DOI : 10.1109/cvpr.2015.7298642
URL : http://arxiv.org/pdf/1411.5752

X. Liu, W. Liu, and T. Mei, A deep learning-based approach to progressive vehicle re-identification for urban surveillance, pp.869-884, 2016.
DOI : 10.1007/978-3-319-46475-6_53

Y. Yuan, K. Yang, and C. Zhang, Hard-Aware Deeply Cascaded Embedding, 2017.
DOI : 10.1109/iccv.2017.94
URL : http://arxiv.org/pdf/1611.05720

Y. Bai, F. Gao, and Y. Lou, Incorporating Intra-Class Variance to Fine-Grained Visual Recognition, 2017.