Preliminary Findings on the Occurrence and Causes of Data Smells in a Real-World Business Travel Data Processing Pipeline
Detecting poor-quality data is crucial for improving the quality of data-driven systems. Although data validation has been studied extensively, potential data quality issues remain underexplored. Such latent issues, called data smells, can stay undetected for long periods but may later make data-intensive systems perform poorly and become error-prone. Detecting data smells is not trivial and requires knowledge about their causes and consequences. In this paper, we present preliminary findings on the causes and severity of data smells, based on a study of a real-world business travel data set and the data processing pipeline behind it. The results show that data smells exist in this data set and cause severe problems. Moreover, although many data smells already occur in the raw data, some smells are created during the transformation and enrichment stages of the data processing pipeline. These findings indicate that the data pipeline itself deserves attention in future research on data smells. Accordingly, this article proposes potential directions for future work in this area.
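The observation that some smells are created inside the pipeline rather than inherited from the raw data can be illustrated with a minimal sketch. Everything in it is hypothetical and is not the pipeline or detection method studied in the paper: the booking records, the exchange-rate table, and the functions `enrich_with_eur_price` and `dummy_value_smells` are illustrative assumptions. The smell checked is assumed to be a dummy value, i.e. a placeholder such as 0.0 standing in for a value that is actually missing.

```python
"""Hypothetical sketch: a data smell introduced by a pipeline enrichment step.

Assumptions (not from the paper): records are plain dicts, the smell checked is a
"dummy value" (a placeholder such as 0.0 standing in for a missing value), and the
enrichment step converts prices to EUR using a small, incomplete rate table.
"""

from typing import Any, Dict, List, Sequence

Record = Dict[str, Any]

# Hypothetical raw booking records; every price is present and plausible.
RAW: List[Record] = [
    {"booking_id": "B1", "price": 120.0, "currency": "USD"},
    {"booking_id": "B2", "price": 95.0, "currency": "EUR"},
    {"booking_id": "B3", "price": 300.0, "currency": "CHF"},
]

# Hypothetical exchange-rate table; CHF is deliberately missing.
RATES_TO_EUR: Dict[str, float] = {"EUR": 1.0, "USD": 0.9}


def enrich_with_eur_price(records: List[Record]) -> List[Record]:
    """Enrichment step: add price_eur, defaulting the rate to 0.0 if unknown.

    The 0.0 default silently turns a missing rate into a dummy price of 0.0.
    """
    enriched = []
    for rec in records:
        rate = RATES_TO_EUR.get(rec["currency"], 0.0)  # smell is introduced here
        enriched.append({**rec, "price_eur": rec["price"] * rate})
    return enriched


def dummy_value_smells(
    records: List[Record], field: str, dummies: Sequence[float] = (0.0,)
) -> List[str]:
    """Return the booking_ids whose `field` holds a placeholder value."""
    return [r["booking_id"] for r in records if r.get(field) in dummies]


# Run the same check before and after the enrichment stage.
raw_smells = dummy_value_smells(RAW, "price")
enriched_smells = dummy_value_smells(enrich_with_eur_price(RAW), "price_eur")
print(raw_smells)       # []
print(enriched_smells)  # ['B3']
```

Checking only the raw data reports no smell here; the smell appears only after enrichment, which is why the sketch runs the same check at both pipeline stages.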
Thu 17 Nov (displayed time zone: Beijing, Chongqing, Hong Kong, Urumqi)
11:00 - 12:30 | Session 2 | SEA4DQ at ERC SR 11 | Chair(s): Beatriz Bretones Cassoli (TU Darmstadt), Nicolas Jourdan (Technical University of Darmstadt)
11:00 | 30 min | Long paper | Data Quality as a Microservice - an ontology and rule based approach for quality assurance of sensor data in manufacturing machines | SEA4DQ
11:30 | 20 min | Short paper | Preliminary Findings on the Occurrence and Causes of Data Smells in a Real-World Business Travel Data Processing Pipeline | SEA4DQ | Valentina Golendukhina (University of Innsbruck), Harald Foidl (University of Innsbruck), Michael Felderer (University of Innsbruck), Rudolf Ramler (Software Competence Center Hagenberg)
11:50 | 30 min | Long paper | Effect of Time Patterns in Mining Process Invariants for Industrial Control Systems: An Experimental Study | SEA4DQ | Muhammad Azmi Umer (Codex LLC, Karachi), Aditya Mathur (Singapore University of Technology and Design), Muhammad Taha Jilani (PAF Karachi Institute of Economics and Technology)