Optional realization of the French negative particule (ne) on Twitter: Can big data reveal new sociolinguistic patterns?

Abstract: From the outset, sociolinguistics has taken the question of data seriously (Labov, 1975). It is thus not surprising that the field recently joined the computational social science movement (Lazer et al., 2009), which stems from the ability to collect and model vast digital datasets on the behavior of individuals in collective contexts. The emerging field of computational sociolinguistics (Nguyen et al., 2016) works on data produced by sensors (proximity sensors, wearable recorders) or by digital communication, which permits automatic, ongoing and unsupervised recording through the collection of traces on the web, social media or mobile devices. This paper aims to illustrate how large datasets that include both language and social links reveal sociolinguistic patterns that could remain invisible in smaller samples. More precisely, the dataset includes 100 million tweets authored by 1 million users, combined with the follower links between them. The tweets are written in French and the sample represents 10% of the production in the GMT+1 time zone between June 2014 and July 2016. We examine (ne), a sociolinguistic variable of French: the optional realization of the first morpheme of the negation (Je fume pas vs. Je ne fume pas, "I do not smoke"). We chose it for three reasons: (ne) is a well-documented sociolinguistic marker of spoken French (Armstrong and Smith, 2002, inter alia); realization and omission of (ne) are both visible in written tweets; and (ne) is always realized in standard writing, which allows an assessment of users' adherence to the written norm. We will present the empirical procedures for extracting the tweets that include a negative construction and for constructing a social network based on the reciprocal mentions between users. We will then focus on three results: (1) the overall score of (ne) realization and its regional variation in France (approx. 16% in the North and 28% in the South); (2) a previously unobserved pattern showing a very regular variation of (ne) realization according to the time of day, on every day of the week (increase in the morning, decrease during the night); (3) the observation that users with high scores interact frequently with each other. The discussion focuses on the sociolinguistic meaning of the results, including a close examination of the risk of bias. Finally, we will argue that thick data should be combined with big data in order to explain such patterns (Wang, 2013).
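The abstract does not spell out the extraction procedures, so the following is only a minimal illustrative sketch of the kind of computation involved, not the authors' method. It assumes a crude rule: a tweet counts as a negative construction if it contains one of a small, hypothetical list of second negation morphemes (`SECOND_MORPHEMES`), and (ne) counts as realized if "ne" or "n'" occurs within the three preceding tokens. All function names, the token window and the input shapes (`(hour, text)` pairs, `(author, mentioned_user)` pairs) are assumptions made for illustration.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately short list of second negation morphemes.
SECOND_MORPHEMES = {"pas", "jamais", "rien"}
NE_FORMS = {"ne", "n'", "n\u2019"}
TOKEN = re.compile(r"[a-zàâçéèêëîïôûùüÿœ]+['\u2019]?")


def ne_realized(text):
    """True/False if a negation is found with/without (ne); None if no negation."""
    tokens = TOKEN.findall(text.lower())
    for i, tok in enumerate(tokens):
        if tok in SECOND_MORPHEMES:
            # Look for "ne" / "n'" in the three tokens before the negation word.
            window = tokens[max(0, i - 3):i]
            return any(t in NE_FORMS for t in window)
    return None


def realization_rate(texts):
    """Share of negative constructions in which (ne) is realized."""
    outcomes = [r for r in map(ne_realized, texts) if r is not None]
    return sum(outcomes) / len(outcomes) if outcomes else None


def rate_by_hour(tweets):
    """tweets: iterable of (hour, text) pairs; returns {hour: rate}."""
    by_hour = defaultdict(list)
    for hour, text in tweets:
        by_hour[hour].append(text)
    return {h: realization_rate(ts) for h, ts in sorted(by_hour.items())}


def reciprocal_mention_edges(mentions):
    """mentions: iterable of (author, mentioned_user) pairs.

    Keeps an undirected edge only when each user has mentioned the other.
    """
    directed = set(mentions)
    return {frozenset((a, b)) for a, b in directed
            if a != b and (b, a) in directed}
```

A rule this simple misclassifies many cases (for instance "pas" used as a noun), which is one concrete form of the risk of bias that the abstract says the discussion examines.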
Document type:
Conference paper
ICLAVE 9 2017 - International Conference on Language Variation in Europe, Jun 2017, Malaga, Spain. 〈http://www.iclave9.uma.site/〉

https://hal.inria.fr/hal-01456302
Contributor: Jean-Pierre Chevrot
Submitted on: Saturday, February 4, 2017 - 17:22:02
Last modified on: Thursday, June 15, 2017 - 09:08:42

Identifiers

  • HAL Id: hal-01456302, version 1

Citation

Paul Mangold, Yannick Léo, Jean-Pierre Chevrot, Eric Fleury, Màrton Karsai, et al. Optional realization of the French negative particule (ne) on Twitter: Can big data reveal new sociolinguistic patterns? ICLAVE 9 2017 - International Conference on Language Variation in Europe, Jun 2017, Malaga, Spain. 〈http://www.iclave9.uma.site/〉. 〈hal-01456302〉

Metrics

Record views

507