Master thesis

Gender Discrimination in Data Analysis: a Socio-Technical Approach

Riccardo Corona 1, 2 
1 VALDA - Value from Data
DI-ENS - Département d'informatique - ENS Paris, Inria de Paris
Abstract : Technology shapes and facilitates our daily lives, but its pervasive use can introduce or exacerbate social problems. Because of their intrinsic complexity, these problems need to be addressed from different but complementary perspectives, which two disciplines of very different nature provide: data science and sociology. Specifically, this thesis aims to bridge the technical field of data analysis and a specific category of social problems, namely discrimination and, in particular, gender discrimination. Within this context, we take data analysis as our starting point and use sociology both as a supporting instrument and as a source of requirements. We investigate in depth the sociological reasons behind gender discrimination in the society of interest, that of the United States, introducing and exploring what is commonly referred to as the ‘gender gap’. We then carry out several experiments on data about U.S. employees, focusing on the economic perspective (the gender pay gap) while taking into account the other facets of the problem. The main contributions of this thesis derive from applying preprocessing techniques and tools designed to detect bias in data. With these, we try to understand which design choices have the greatest impact on the so-called ‘fairness’ of the results, and we highlight the strengths and weaknesses of the techniques and tools. We emphasize the importance of a multidisciplinary approach to problems of this kind, which is essential for obtaining information about the complex context in which data are embedded.
Contributor : Pierre Senellart
Submitted on : Monday, October 11, 2021 - 9:58:54 PM
Last modification on : Friday, December 2, 2022 - 5:50:05 PM
Long-term archiving on : Wednesday, January 12, 2022 - 8:48:17 PM




  • HAL Id : hal-03374130, version 1



Riccardo Corona. Gender Discrimination in Data Analysis: a Socio-Technical Approach. Databases [cs.DB]. 2021. ⟨hal-03374130⟩


