Review Spam Detection Using Word Embeddings and Deep Neural Networks

Aliaksandr Barushka; Petr Hajek

doi:10.1007/978-3-030-19823-7_28

Conference paper, Year: 2019


Aliaksandr Barushka

Role: Author
PersonId: 1033474

Institute of System Engineering and Informatics [University of Pardubice]

Petr Hajek

Role: Author
PersonId: 992409

Institute of System Engineering and Informatics [University of Pardubice]

Abstract

Review spam (fake review) detection is becoming increasingly important given the rapid growth of online purchasing, so sophisticated spam filters are needed to tackle the problem. Traditional machine learning algorithms use review content and other features to detect review spam. However, as related studies have demonstrated, the linguistic context of words may be particularly important for text categorization. To improve the performance of review spam detection, we propose a novel content-based approach that considers both bag-of-words and word context. More precisely, our approach uses n-grams and the skip-gram word embedding method to build a vector model, which produces a high-dimensional feature representation. In the second step, a deep feed-forward neural network is used to handle this representation and classify review spam accurately. To verify our approach, we use two hotel review datasets containing both positive and negative reviews. We show that the proposed detection system outperforms other popular review spam detection algorithms in terms of accuracy and area under the ROC curve (AUC). Importantly, the system performs in a balanced way on both classes, legitimate and spam, irrespective of review polarity.
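The abstract describes the vector model only at a high level. As a rough illustration, here is a minimal sketch in plain Python (standard library only) of three ingredients: counting word n-grams (bag-of-words), generating the (center, context) pairs that a skip-gram model is trained on, and combining n-gram counts with an averaged word embedding into a single review vector. The function names, the window size, the n-gram range, and the average-then-concatenate scheme are hypothetical choices for illustration, not the authors' exact method. The toy embeddings stand in for vectors that would, in practice, be learned by a skip-gram model such as word2vec.

```python
from collections import Counter


def word_ngrams(tokens, n_max=2):
    """Count word n-grams for n = 1..n_max (a bag-of-n-grams).

    Hypothetical helper; n_max=2 (unigrams + bigrams) is an assumed setting.
    """
    grams = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams[" ".join(tokens[i:i + n])] += 1
    return grams


def skipgram_pairs(tokens, window=2):
    """List the (center, context) training pairs of a skip-gram model.

    A skip-gram model learns embeddings by predicting each context word
    within `window` positions of the center word; window=2 is an assumption.
    """
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def review_vector(tokens, vocab, embeddings, dim):
    """Concatenate n-gram counts over a fixed vocabulary with the mean
    embedding of the review's words.

    Assumption: averaging word vectors and concatenating them with the
    n-gram counts is one simple way to combine the two views; words that
    have no embedding are skipped, and an all-zero mean is used if none match.
    """
    counts = word_ngrams(tokens)
    bow = [float(counts[g]) for g in vocab]  # Counter returns 0 for unseen n-grams
    known = [embeddings[t] for t in tokens if t in embeddings]
    if known:
        mean = [sum(v[k] for v in known) / len(known) for k in range(dim)]
    else:
        mean = [0.0] * dim
    return bow + mean
```

For example, `review_vector(["great", "hotel", "great", "staff"], ["great", "great hotel", "bad"], {"great": [1.0, 0.0], "hotel": [0.0, 1.0]}, 2)` returns `[2.0, 1.0, 0.0, 0.666..., 0.333...]`. In the setup the abstract describes, high-dimensional vectors of this kind are then passed to a deep feed-forward neural network for classification.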

Keywords

Review spam; Skip-gram; Word2vec; Word embedding; Neural network

Domains

Computer Science [cs]

Main file

483292_1_En_28_Chapter.pdf (400.24 KB)

Origin: Files produced by the author(s)


https://inria.hal.science/hal-02331287

Submitted on: Thursday, October 24, 2019, 12:49:31

Last modified on: Thursday, November 19, 2020, 13:04:16

Long-term archiving on: Saturday, January 25, 2020, 14:47:50

Dates and versions

hal-02331287, version 1 (24-10-2019)

License

Attribution

Identifiers

HAL Id: hal-02331287, version 1
DOI: 10.1007/978-3-030-19823-7_28

Cite

Aliaksandr Barushka, Petr Hajek. Review Spam Detection Using Word Embeddings and Deep Neural Networks. 15th IFIP International Conference on Artificial Intelligence Applications and Innovations (AIAI), May 2019, Hersonissos, Greece. pp.340-350, ⟨10.1007/978-3-030-19823-7_28⟩. ⟨hal-02331287⟩


Collections

IFIP IFIP-AICT IFIP-TC IFIP-WG IFIP-TC12 IFIP-AIAI IFIP-WG12-5 IFIP-AICT-559

