
Image Semantic Description Based on Deep Learning with Multi-attention Mechanisms

Abstract : In the era of big data, cross-media and multi-modal data are expanding rapidly, and existing data processing methods fail to meet the corresponding functional requirements. To address the large expression gap between multi-modal data, this paper proposes a multimodal data fusion method based on deep learning, which combines the strengths of deep learning in image detection and text sequence prediction with a multi-attention mechanism. The BLEU algorithm is used to compute the similarity between the model's output descriptions and the image's reference descriptions at four levels. Training and testing were conducted on the Flickr8K dataset. Compared with traditional single-modal image description methods, experiments show that the multi-AM model achieves better results under the BLEU metric.
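The four-level BLEU comparison mentioned in the abstract (BLEU-1 through BLEU-4) can be sketched in a few lines. The following is a minimal, self-contained illustration of cumulative sentence-level BLEU against a single reference caption; it is not the authors' evaluation code, and details such as tokenization (whitespace splitting here) and the absence of smoothing are simplifying assumptions.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu_levels(reference, hypothesis, max_n=4):
    """Cumulative BLEU-1..BLEU-max_n of a hypothesis caption
    against a single reference caption (whitespace-tokenized)."""
    ref, hyp = reference.split(), hypothesis.split()
    # Modified n-gram precision for each order n.
    precisions = []
    for n in range(1, max_n + 1):
        hyp_ngrams, ref_ngrams = ngrams(hyp, n), ngrams(ref, n)
        overlap = sum((hyp_ngrams & ref_ngrams).values())  # clipped counts
        total = max(sum(hyp_ngrams.values()), 1)
        precisions.append(overlap / total)
    # Brevity penalty: penalize hypotheses shorter than the reference.
    bp = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / max(len(hyp), 1))
    # Cumulative score at level n = BP * geometric mean of precisions 1..n.
    scores = []
    for n in range(1, max_n + 1):
        ps = precisions[:n]
        geo = math.exp(sum(math.log(p) for p in ps) / n) if min(ps) > 0 else 0.0
        scores.append(bp * geo)
    return scores
```

A paraphrase that shares words but not longer phrases scores well at BLEU-1 and drops toward zero at BLEU-4, which is why image-captioning work typically reports all four levels.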
Document type :
Conference papers

Cited literature : 12 references

https://hal.inria.fr/hal-02197765
Contributor : Hal Ifip
Submitted on : Tuesday, July 30, 2019 - 5:00:19 PM
Last modification on : Tuesday, July 30, 2019 - 5:12:32 PM

File

 Restricted access
To satisfy the distribution rights of the publisher, the document is embargoed until 2021-01-01.


Licence


Distributed under a Creative Commons Attribution 4.0 International License


Citation

Jian Yang, Zuqiang Meng. Image Semantic Description Based on Deep Learning with Multi-attention Mechanisms. 10th International Conference on Intelligent Information Processing (IIP), Oct 2018, Nanning, China. pp.356-362, ⟨10.1007/978-3-030-00828-4_36⟩. ⟨hal-02197765⟩
