
Impact of Data Cleansing for Urban Bus Commercial Speed Prediction

Gauthier Lyan 1, 2, David Gross-Amblard 3, Jean-Marc Jézéquel 1, Simon Malinowski 4
1 DiverSe - Diversity-centric Software Engineering
Inria Rennes – Bretagne Atlantique, IRISA-D4 - LANGAGE ET GÉNIE LOGICIEL
4 LinkMedia - Creating and exploiting explicit links between multimedia fragments
Inria Rennes – Bretagne Atlantique, IRISA-D6 - MEDIA ET INTERACTIONS
Abstract : Public Transportation Information Systems (PTIS) are widely used for public bus services in cities around the world. These systems gather information about trips, bus stops, bus speeds, ridership, etc. This massive volume of data is an inviting source of information for machine-learning predictive tools. However, it most often suffers from quality deficiencies caused by multiple data sets with differing structures, different infrastructures using incompatible technologies, human errors, or hardware failures. In this paper, we consider the impact of data cleansing on a classical machine-learning task: predicting urban bus commercial speed. We show that simple, transport-specific business and quality rules can drastically enhance data quality, whereas more sophisticated rules may offer little improvement despite a high computational cost.
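
To make the idea of a simple transport-specific quality rule concrete, here is a minimal sketch in Python. It is not taken from the paper: the `SegmentRecord` schema, the function names, and the 90 km/h cap are all hypothetical choices made for illustration. It only assumes the usual definition of commercial speed as distance covered divided by total elapsed time, stops included.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SegmentRecord:
    """One bus run between two consecutive stops (hypothetical schema)."""
    distance_m: float  # distance between the two stops, in metres
    duration_s: float  # elapsed time including dwell time, in seconds


def commercial_speed_kmh(rec: SegmentRecord) -> float:
    """Commercial speed: distance divided by total elapsed time, in km/h."""
    return (rec.distance_m / rec.duration_s) * 3.6


def is_plausible(rec: SegmentRecord, max_kmh: float = 90.0) -> bool:
    """Illustrative quality rule; the 90 km/h cap is an assumed threshold."""
    # Reject physically impossible records before computing a speed.
    if rec.distance_m <= 0 or rec.duration_s <= 0:
        return False
    return commercial_speed_kmh(rec) <= max_kmh


def cleanse(records: List[SegmentRecord]) -> List[SegmentRecord]:
    """Keep only the records that pass the plausibility rule."""
    return [r for r in records if is_plausible(r)]


raw = [
    SegmentRecord(400.0, 72.0),  # about 20 km/h, kept
    SegmentRecord(400.0, 0.0),   # zero duration, dropped
    SegmentRecord(400.0, 4.0),   # 360 km/h, dropped
    SegmentRecord(-50.0, 60.0),  # negative distance, dropped
]
clean = cleanse(raw)
```

A rule of this kind costs one pass over the data, which is the sort of cheap filter the abstract contrasts with more sophisticated, computationally expensive rules.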
Document type : Journal articles

https://hal.inria.fr/hal-03220449
Contributor : Gauthier Lyan
Submitted on : Friday, May 7, 2021 - 11:13:47 AM
Last modification on : Friday, April 8, 2022 - 4:08:03 PM
Long-term archiving on : Sunday, August 8, 2021 - 6:27:08 PM

File

Impact_of_data_cleaning_on_bus...
Files produced by the author(s)

Citation

Gauthier Lyan, David Gross-Amblard, Jean-Marc Jézéquel, Simon Malinowski. Impact of Data Cleansing for Urban Bus Commercial Speed Prediction. SN Computer Science, Springer, 2021, pp.1-11. ⟨10.1007/s42979-021-00966-1⟩. ⟨hal-03220449⟩
