Conference papers

Sensitive Keyword Extraction Based on Cyber Keywords and LDA in Twitter to Avoid Regrets

Abstract : Twitter is among the most popular social platforms, where ordinary people share their personal, political and business views and thereby indirectly build an active online repository. The data users post on social networking sites often include sensitive or private information that carries a high potential for cyber threats. The most frequently posted sensitive private data are analyzed by collecting real-time tweets that match benchmarked cyber-keywords in the personal, professional and health categories. This research work aims to build a Topic Keyword Extractor by adapting the Automatic Acronym - Abbreviation Replacer, which was developed specifically for short social media texts. The feature space is modeled with Latent Dirichlet Allocation (LDA) to discover topics for each cyber-keyword. Replacing internet jargon and abbreviations preserves the user’s context and intentions. The originality of this work lies in identifying, through the novel Topic Keyword Extractor, sensitive keywords that reveal a tweeter’s Personally Identifiable Information. For the benchmarked cyber-keywords, the proposed qualitative topic-wise keyword distribution approach is used to discover the sensitive topics in which social media users frequently expose personal information and make unintended disclosures. The experiment analyzed cyber-keywords and the identified sensitive topic keywords as bi-grams to predict the most common sensitive information leaks on Twitter. The results showed that the most frequently discussed sensitive topic was ‘weight loss’, associated with the cyber-keyword ‘weight’ in the health tweet category.
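The abbreviation replacement and bi-gram analysis described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abbreviation dictionary, the cyber-keyword set, the stopword list and the sample tweets are all hypothetical, and plain co-occurrence counting stands in for the LDA topic discovery step.

```python
import re
from collections import Counter

# Hypothetical jargon dictionary; the paper's actual replacer is not reproduced here.
ABBREVIATIONS = {"wt": "weight", "dr": "doctor", "bday": "birthday"}

# Hypothetical cyber-keywords, chosen only for illustration.
CYBER_KEYWORDS = {"weight", "birthday", "boss"}

# Small illustrative stopword list (an assumption, not from the paper).
STOPWORDS = {"my", "the", "a", "is", "on", "to", "i", "and", "me", "about", "today"}


def expand_abbreviations(text):
    """Lowercase the text, tokenize it, and replace known abbreviations."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [ABBREVIATIONS.get(token, token) for token in tokens]


def keyword_bigrams(tweets):
    """Count bi-grams in which at least one word is a cyber-keyword.

    Stopwords are dropped before pairing, so a bi-gram may join words
    that were separated by a stopword in the original tweet.
    """
    counts = Counter()
    for tweet in tweets:
        tokens = [t for t in expand_abbreviations(tweet) if t not in STOPWORDS]
        for left, right in zip(tokens, tokens[1:]):
            if left in CYBER_KEYWORDS or right in CYBER_KEYWORDS:
                counts[(left, right)] += 1
    return counts


# Made-up sample tweets, for illustration only.
tweets = [
    "My wt loss plan is working",
    "Dr told me about wt loss today",
    "Bday party on friday",
]

bigram_counts = keyword_bigrams(tweets)
top_bigram, top_count = bigram_counts.most_common(1)[0]
print(top_bigram, top_count)
```

On these made-up tweets the most frequent keyword bi-gram is ("weight", "loss"), which only mirrors the shape of the reported result; in a fuller pipeline the candidate topic keywords would come from an LDA model fitted per cyber-keyword rather than from raw adjacency.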
Contributor : Hal Ifip
Submitted on : Thursday, November 18, 2021 - 2:20:09 PM
Last modification on : Thursday, November 18, 2021 - 2:32:17 PM
Long-term archiving on : Saturday, February 19, 2022 - 7:10:29 PM


 Restricted access
To satisfy the distribution rights of the publisher, the document is embargoed until : 2023-01-01



Distributed under a Creative Commons Attribution 4.0 International License



R. Geetha, S. Karthika. Sensitive Keyword Extraction Based on Cyber Keywords and LDA in Twitter to Avoid Regrets. 3rd International Conference on Computational Intelligence in Data Science (ICCIDS), Feb 2020, Chennai, India. pp.59-70, ⟨10.1007/978-3-030-63467-4_5⟩. ⟨hal-03434777⟩


