Mon 14 - Fri 18 November 2022 Singapore
Thu 17 Nov 2022 11:30 - 11:50 at ERC SR 11 - Session 2 Chair(s): Beatriz Bretones Cassoli, Nicolas Jourdan

Detecting poor-quality data is crucial for improving the quality of data-driven systems. Although data validation has been studied extensively, potential data quality issues remain underexplored. Such latent issues, known as data smells, can stay undetected for long periods but may lead to poor and error-prone future performance of data-intensive systems. Detecting data smells is not trivial and requires knowledge of their causes and consequences. In this paper, we present preliminary findings on the causes and severity of data smells, based on a study of a real-world business travel data set and the data processing pipeline behind it. The results show that data smells exist in this data set and cause severe problems. Moreover, although many data smells already occur in the raw data, some are created during the transformation and enrichment stages of the data processing pipeline. These findings indicate the importance of the data pipeline itself for future research on data smells. Thus, this article proposes potential future work in this area.

Thu 17 Nov

Displayed time zone: Beijing, Chongqing, Hong Kong, Urumqi

11:00 - 12:30
Session 2 (SEA4DQ) at ERC SR 11
Chair(s): Beatriz Bretones Cassoli TU Darmstadt, Nicolas Jourdan Technical University of Darmstadt
