About SEA4DQ

Cyber-physical systems (CPS) have been developed in many industrial sectors and application domains in which the quality of the data acquired and used for decision support is a common factor. Data quality can deteriorate due to factors such as sensor faults and failures caused by operation in harsh and uncertain environments.

How can software engineering and artificial intelligence (AI) help manage and tame data quality issues in CPS?

This is the question the SEA4DQ workshop aims to investigate. Emerging trends in software engineering need to take data quality management seriously, as CPS are increasingly data-centric in their approach to acquiring and processing data along the edge-fog-cloud continuum. This workshop will provide researchers and practitioners with a forum for exchanging ideas, experiences, understanding of the problems, visions for the future, and promising solutions to data quality problems in CPS.

Topics of Interest

  • Software/hardware co-design and architectures and frameworks for data quality management in CPS
  • Software engineering and AI to detect anomalies in CPS data
  • Software engineering and AI to repair erroneous CPS data
  • Software tools for data quality management, testing, and profiling
  • Public sensor datasets from CPS (manufacturing, digital health, energy,...)
  • Distributed ledger and blockchain technologies for quality tracking
  • Quantification of data quality hallmarks and uncertainty in data repair
  • Sensor data fusion techniques for improving data quality and prediction
  • Augmented data quality
  • Case studies that evaluate an existing technique or tool for managing data quality on real cyber-physical systems in different sectors, not only on toy problems
  • Certification and standardization of data quality in CPS
  • Approaches for secure and trusted data sharing, especially for data quality management and governance in CPS
  • Trade-offs between data quality and data security in CPS

Schedule

24th of August 2021 - All times are in CET

Start - End | Duration | Topic | Presenters
08:50 - 09:05 | 0:15 | Virtual Meeting Setup | Phu Nguyen
09:05 - 09:15 | 0:10 | Welcome, Objectives and Agenda | Sagar Sen and Per Myrseth
09:15 - 10:15 | 1:00 | Keynote: Data quality focus as a competitive advantage | Per Myrseth
10:15 - 10:45 | 0:30 | The Development of Data Quality Management System for Ship IoT Data - Perspective of Ship Owner and Operator | Putu Hangga and Shogo Yamada
10:45 - 11:00 | 0:15 | Short Break
11:00 - 11:30 | 0:30 | Representative Sampling | Frank Westad and Torbjørn Pedersen
11:30 - 12:00 | 0:30 | The Preliminary Results of A Systematic Review of Data Quality for CPS, IoT, or Industry 4.0 Applications | Phu Nguyen, Sagar Sen, Enrique Garcia-Ceja, Arda Goknil, Karl John Pedersen, Abdillah Suyuthi, Dimitra Politaki, Harris Niavis and Amina Ziegenbein
12:00 - 12:30 | 0:30 | Anomaly Detection in Manufacturing Time Series Data | Dimitra Politaki, Panos Protopapas and Ibad Kureshi
12:30 - 13:30 | 1:00 | Long Break
13:30 - 14:00 | 0:30 | Injection molding supervision and plastic part quality assurance | Ronan Le Goff and Nils Marchal
14:00 - 14:30 | 0:30 | How the International Data Spaces’ approach for secure and trusted data sharing contributes to ensuring data quality | Sonia Jimenez
14:30 - 15:30 | 1:00 | Panel Discussion | Sagar Sen, Frank Westad, Per Myrseth and Dimitra Politaki
15:30 - 15:35 | 0:05 | Closing Statement | Sagar Sen and Per Myrseth

Registration can be done at the FSE conference website. Details for logging in to the online SEA4DQ workshop can be found in the ESEC/FSE 2021 Program.

Keynotes

...

Per Myrseth

Service Lead: Data Management, Data Science and Assurance of Digital Assets,
DNV AS, Norway

Title: "Data Quality Focus as a Competitive Advantage"

To improve production processes, data are created, refined, merged and shared to meet the needs of business processes and automation. Businesses therefore depend on the existence of data and its ability to contribute to operational excellence and to the control of costs and risks. This keynote will highlight:
  • How Return On Investment (ROI) models can be used to demonstrate the contribution of data quality and data management to business processes and automation
  • How different stakeholders can collaborate and communicate to agree on roles and responsibilities in a data maturity journey
  • How multiple companies, including subcontractors, can set up a common data quality roadmap, architecture and culture.


...

Dr. Andreas Metzger*

*is not able to attend for personal reasons.

Head of Adaptive Systems and Big Data Applications,
University of Duisburg-Essen, Germany

Title: "Online Reinforcement Learning for Self-adaptive Systems"

A self-adaptive system can modify its own structure and behaviour at runtime based on its perception of the environment, of itself and of its requirements. By adapting itself at runtime, the system can maintain its requirements in the presence of dynamic environment changes. Examples are elastic cloud systems, intelligent IoT systems as well as proactive process management systems. To develop a self-adaptive system, software engineers must encode when and how the system should adapt itself. However, in doing so, software engineers face the challenge of design time uncertainty. Among other concerns, this requires anticipating the potential environment situations the system may encounter at runtime to define when the system should adapt itself. Yet, anticipating all potential environment situations is in most cases infeasible due to incomplete information at design time. As a further concern, the precise effect of an adaptation action may not be known at design time and thus accurately determining how the system should adapt itself is difficult. This talk will explore the opportunities but also challenges that modern machine learning algorithms offer in building self-adaptive systems in the presence of design time uncertainty. It will focus on online reinforcement learning as an emerging approach to realize self-adaptive systems. Online reinforcement learning means that during operation the system learns from interactions with its environment, thereby effectively leveraging data only available at run time.
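
To make the idea of online reinforcement learning concrete, here is a minimal sketch (our illustration, not material from the talk) of a tabular Q-learning loop in which a self-adaptive system learns, during operation, which adaptation action to take in each observed environment state. The states, actions and reward function below are hypothetical stand-ins for live monitoring signals and SLA measurements.

```python
import random
from collections import defaultdict

# Hypothetical adaptation actions for an elastic cloud system.
ACTIONS = ["add_replica", "remove_replica", "no_op"]

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1   # learning rate, discount, exploration rate
q_table = defaultdict(float)            # (state, action) -> estimated return

def choose_action(state):
    """Epsilon-greedy: explore occasionally, otherwise exploit the best-known action."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q_table[(state, a)])

def observe_environment():
    """Stub: a real system would discretize live monitoring data into a state."""
    return random.choice(["low_load", "normal_load", "high_load"])

def execute_and_measure(state, action):
    """Stub: apply the adaptation and measure a reward (e.g., SLA compliance)."""
    good = {("high_load", "add_replica"), ("low_load", "remove_replica"),
            ("normal_load", "no_op")}
    return (1.0 if (state, action) in good else -1.0), observe_environment()

state = observe_environment()
for _ in range(10_000):                 # learning happens one interaction at a time
    action = choose_action(state)
    reward, next_state = execute_and_measure(state, action)
    best_next = max(q_table[(next_state, a)] for a in ACTIONS)
    # Standard Q-learning update, driven by data only available at run time.
    q_table[(state, action)] += ALPHA * (reward + GAMMA * best_next
                                         - q_table[(state, action)])
    state = next_state
```

The property the sketch illustrates is that no environment model is required at design time: the effect of each adaptation action is learned from run-time interactions.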

Accepted Talks

  • Anomaly Detection in Manufacturing Time Series Data
    Dimitra Politaki, Panos Protopapas and Ibad Kureshi |
    The importance of anomaly detection in manufacturing, resulting in good-quality end-products while reducing process downtime and consequently increasing efficiency, is undeniable. In this spirit, we compare the predictive power of a bouquet of (anomaly-detecting) artificial intelligence models and propose their combined use in order to (i) make use of their combined predictive power while (ii) minimizing the chance that an individual model leaves some anomaly types undetected (a minimal sketch of such a combination appears after this list). Specifically, we study the in-parallel performance of ARIMA, LSTM- and Dense-based autoencoders, as well as GAN models on the virtual CNC milling machine in the System-level Manufacturing and Automation Research Testbed (SMART), comparing their performance in detecting tool-wear anomalies using both qualitative and quantitative criteria. This work is in progress; in the future we aim to apply these models to actual manufacturing data provided by two large industry actors, focusing on specific use cases and accounting for well-defined data quality issues in manufacturing. Moreover, these techniques will be extended to multivariate approaches in order to serve the complex requirements of the manufacturing industry.
  • How the International Data Spaces’ approach for secure and trusted data sharing contributes to ensuring data quality
    Sonia Jimenez |

    The International Data Spaces Association (IDSA) is a coalition of more than 130 member companies that share a vision of a world where all companies self-determine usage rules and realize the full value of their data in secure, trusted, equal partnerships; and we are making that vision a reality.

    Our goal is nothing less than a global standard for international data spaces (IDS) and interfaces, as well as fostering the related technologies and business models that will drive the data economy of the future across industries.

    Data is rapidly becoming business’s most valuable asset — but it can only deliver on its full value when you put it into use and share it in ways that you, as the data provider, determine and control. International data spaces (IDS) are where this kind of trustworthy, self-determined exchange can happen, and our Reference Architecture Model (IDS-RAM) sets the standard for building data-driven ecosystems, products and services.

    Because of the correlation between good data quality and maximizing the value of data as an economic good, the International Data Spaces explicitly addresses the aspect of data quality. As described in the IDS Reference Architecture Model, several mechanisms contribute to ensuring that the data being shared in an IDS ecosystem meets the expected quality requirements.

  • Injection molding supervision and plastic part quality assurance
    Ronan Le Goff and Nils Marchal |
    Geometrical and appearance quality requirements set the limits of current industrial performance in injection molding. Although injection molding is a fairly stable process, quality requirements are increasing, and the introduction of recycled or bio-based materials requires adjusting the process settings to material variability. The presentation describes different strategies for supervising the injection molding process. The dataset comes from inline industrial measurements using sensors embedded in the process and from product quality control. Thermographic images captured right after production are of particular interest; their relevance for predicting final part geometry is demonstrated using Generative Adversarial Networks. The presentation then compares the prediction performance of diverse regression algorithms with that of a neural network, discusses the pros and cons of the approaches used, and closes with perspectives.
  • Representative Sampling
    Frank Westad and Torbjørn Pedersen |

    Representative sampling is an important basis for acquiring high-quality data. Depending on the actual type of process and the objectives of the subsequent data analysis, sampling has many facets. Design of Experiments (DoE) plays an important role in planning the most effective sampling scheme, where the optimal balance between time, effort and representative data is pursued.

    In the case of providing quantitative predictions in a system, one main challenge is to acquire representative samples both in time and across the given physical dimensions of raw materials, intermediate products from various steps in the process, and final products. The quality of the reference data is imperative for deploying precise and accurate models in such environments.

    For condition monitoring and predictive maintenance, one of the key issues is how to represent future (more or less known) out-of-control situations from a subset of the vast amount of data. Other important aspects are strategies for sampling in continuous, semi-batch and batch processes, respectively, including the optimal sampling frequency and the alignment of data in the case of sensor fusion and multiblock models.

    Sampling is also the basis for setting up correct schemes for model validation. The often-used approach of dividing the data randomly into a training and a test set has been shown to be suboptimal in most practical applications. The main reason is that, for all real processes, there will be reasons to stratify the samples by time, batches of raw material, sensors, production lines and differences in the operation of the process (the human factor); a minimal illustration of batch-stratified versus random splitting is sketched after this list. Thus, the validation scheme must be conservative to avoid being too optimistic in the training phase. Unfortunately, the true performance of the model can only be assessed after it has been deployed for a representative period of time. To rephrase a well-known proverb: “The proof of the model lies in the prediction”. Typical examples are prediction error and classification accuracy for quantitative and qualitative models, respectively. Also at this stage, representative sampling is critical for a correct assessment of the model’s performance.

    The presentation will give examples of the many challenges pertaining to sampling for various types of processes and the subsequent deployment of models for prediction.

  • The Development of Data Quality Management System for Ship IoT Data - Perspective of Ship Owner and Operator
    Putu Hangga and Shogo Yamada |
    Future ship owners and operators will use more sensor data collected from the Vessel Performance Management System to measure and enhance their operations. These vessels' IoT data will be extracted, loaded, transformed, and ultimately made available to end-users via a data-serving pipeline. As NYK collects this data, it understands that data quality and reliability are essential, as many applications have been or will be built using this data, both internally and for customers. As data quality issues appear frequently, a data quality management system becomes a necessity. Therefore, NYK and its group company MTI have developed a prototype of the NYK-DQMS (Data Quality Management System), which incorporates the business objective into a system that internal stakeholders can use to assess, monitor, analyze and communicate in order to solve the issues, while referring to various international standards and practical guidelines. The DQMS is positioned to replace ad-hoc data quality monitoring activities on 200 SIMS-equipped ships and is the backbone of NYK's digital quality assurance vision, already physically realized in the form of a remote diagnostic center. By introducing the DQMS to our land-based system, we hope to quickly identify anomalies in data quality indicators and shorten the response time to prevent problems from recurring. The system prototype being trialed has already identified some interesting issues and is a compelling tool for raising awareness of the importance of IoT data quality management at all stakeholder levels.
  • The Preliminary Results of A Systematic Review of Data Quality for CPS, IoT, or Industry 4.0 Applications
    Phu Nguyen, Sagar Sen, Enrique Garcia-Ceja, Arda Goknil, Karl John Pedersen, Abdillah Suyuthi, Dimitra Politaki, Harris Niavis and Amina Ziegenbein |
    Context: IoT and cyber-physical systems (CPS) have been developed in many industrial sectors and application domains in which the quality of the data acquired and used for decision support is a common factor. Data quality can deteriorate due to factors such as sensor faults and failures caused by operation in harsh and uncertain environments. Data quality management must be taken seriously because IoT and CPS are increasingly data-centric in their approach to acquiring and processing data along the edge-fog-cloud continuum. Goals: We aim to assess the existing efforts to address the data quality issues of data-centric IoT and CPS industrial applications. Method: We conducted a systematic literature review (SLR) to identify 50 primary studies from thousands of relevant publications for in-depth analysis. Results: We extracted and synthesized data from the primary studies to answer our predefined research questions. Some highlights from the results of our SLR show that data quality is often neglected even though it is very important, e.g., in data preparation for the analysis or utilisation of data sets in a machine learning framework. Many of the reviewed papers propose methods that handle data quality only indirectly; they generally cover project results and do not necessarily reflect industry best practice in a wider sense. Data quality requirements were discussed in only a few papers and were either predefined or based on reference data, whilst quite a few of the papers applied statistical monitoring of data to discover data quality issues. The majority of the identified use cases and research were conducted in a controlled research environment rather than implemented and validated in a real industrial setting. Conclusions: Based on the results, we suggest some potential research directions to address the gaps found.
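
As a companion to the anomaly detection talk above, here is a minimal illustrative sketch (our addition, not the authors' code) of the combination idea: a point is flagged if any individual detector flags it, so one model's blind spot can be covered by another. Simple stand-in detectors (a rolling z-score and scikit-learn's IsolationForest) replace the ARIMA, autoencoder and GAN models discussed in the talk.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Synthetic univariate series with two injected anomalies.
rng = np.random.default_rng(0)
series = np.sin(np.linspace(0, 20 * np.pi, 2000)) + rng.normal(0, 0.1, 2000)
series[500] += 3.0        # point anomaly
series[1200:1220] += 1.5  # level-shift anomaly

def zscore_flags(x, window=50, threshold=4.0):
    """Flag points far from the local rolling mean, in units of rolling std."""
    flags = np.zeros(len(x), dtype=bool)
    for i in range(window, len(x)):
        w = x[i - window:i]
        flags[i] = abs(x[i] - w.mean()) > threshold * (w.std() + 1e-9)
    return flags

def iforest_flags(x, window=20):
    """Flag sliding windows that IsolationForest scores as outliers."""
    X = np.lib.stride_tricks.sliding_window_view(x, window)
    pred = IsolationForest(contamination=0.01, random_state=0).fit_predict(X)
    flags = np.zeros(len(x), dtype=bool)
    flags[window - 1:] = pred == -1     # -1 marks an outlier window
    return flags

# OR-combination: the union of flags from all detectors.
combined = zscore_flags(series) | iforest_flags(series)
print(f"{combined.sum()} points flagged by at least one detector")
```

An AND-combination or majority vote would trade recall for precision; the OR-combination maximizes the chance that each anomaly type is caught by at least one model.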

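For the representative sampling talk, the following sketch (our illustration, on synthetic data) contrasts a random K-fold split with a batch-stratified split. The batch offsets stand in for raw-material batches, sensors or production lines; because a random split places samples from the same batch in both training and test sets, it reports an optimistic error compared with the batch-wise split.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GroupKFold, KFold

# Synthetic regression data with batch-level structure.
rng = np.random.default_rng(0)
n_batches, per_batch = 20, 50
batch_offsets = rng.normal(0, 2.0, n_batches)        # batch-to-batch variation
groups = np.repeat(np.arange(n_batches), per_batch)  # batch label per sample
X = rng.normal(size=(n_batches * per_batch, 5))
y = X @ rng.normal(size=5) + batch_offsets[groups] + rng.normal(0, 0.5, len(groups))

def cv_mse(splitter, split_args):
    """Mean cross-validated MSE of a Ridge model under a given splitting scheme."""
    errs = []
    for train, test in splitter.split(X, y, **split_args):
        model = Ridge().fit(X[train], y[train])
        errs.append(mean_squared_error(y[test], model.predict(X[test])))
    return np.mean(errs)

# The random split looks better only because each test fold shares batches
# with the training data; the batch-wise split is the honest estimate.
print("random K-fold MSE:", round(cv_mse(KFold(5, shuffle=True, random_state=0), {}), 3))
print("batch-wise MSE:   ", round(cv_mse(GroupKFold(5), {"groups": groups}), 3))
```
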
Important Dates

  • Presentation Abstract Submission: 20 July 2021
  • Notification of Acceptance: 26 July 2021
  • Workshop: 24 August 2021

Organization Committee

...

Phu Nguyen (Main Contact)

Publicity Chair
SINTEF, Norway
phu.nguyen@sintef.no
...

Sagar Sen (Main Contact)

Co-Program Chair
SINTEF, Norway
sagar.sen@sintef.no

...
Mikel Armendia
Co-General Chair
Tekniker, Spain
...
Odd Myklebust
Co-General Chair
SINTEF, Norway
...
Per Myrseth
Co-Program Chair
DNV, Norway
...
Beatriz Cassoli
Co-Web Chair
TU Darmstadt, Germany
...
Nicolas Jourdan
Co-Web Chair
TU Darmstadt, Germany


Program Committee*

  • Andreas Metzger, University of Duisburg-Essen, Germany
  • Donghwan Shin, University of Luxembourg, Luxembourg
  • David Lo, Singapore Management University, Singapore
  • Jean-Yves Tigli, Université Côte d’Azur, France
  • Frank Alexander Kraemer, NTNU, Norway
  • Hong-Linh Truong, Aalto University, Finland
  • Dumitru Roman, SINTEF / University of Oslo, Norway
  • Enrique Garcia-Ceja, SINTEF, Norway
  • Felix Mannhardt, KIT-AR, Germany
  • Dimitra Politaki, INLECOM, Greece
  • Amina Ziegenbein, Technische Universität Darmstadt, Germany
  • Flavien Peysson, PREDICT, France
  • Karl John Pedersen, DNV AS, Norway
  • Helge Spieker, Simula Research Laboratory, Norway
  • Dusica Marijan, Simula Research Laboratory, Norway
  • Marc Roper, University of Strathclyde, UK
  • Jan Nygård, Cancer Registry of Norway, Norway
  • Freddy Munoz, Compass Inc., USA
  • Stefano Borgia, Holonix, Italy
  • Hugo Bruneliere, IMT-Atlantique, France
  • Katinka Wolter, Free University of Berlin, Germany
  • Sudipto Ghosh, Colorado State University, USA
  • Luke Todhunter, University of Nottingham, UK
  • Debmalya Biswas, Darwin Digital, Switzerland
* The list of PC members is in arbitrary order.

Call for Presentation Abstracts

SEA4DQ 2021 accepts the following types of contributions:

  • Presentation abstracts of around 250-500 words (and, if possible, presentation files/short papers). Authors will give presentations at the workshop about (i) research results that are either already published or early research results not yet published; and (ii) industrial talks. This new track aims at stimulating the participation of industrial practitioners - who will be able to present the practices used in their contexts - as well as researchers - who may be interested in receiving feedback from the research community on early ideas.

Please submit the abstracts (and if possible, papers/presentation files in PDF format) on EasyChair.

Abstracts will be reviewed by the program committee for relevance only and will not be included in the SEA4DQ proceedings, but the abstracts (and presentations) will be made available on the workshop website.

The SEA4DQ 2021 Workshop is sponsored by the research projects InterQ and DAT4.Zero that are funded by the European Union’s Horizon 2020 Research and Innovation programme.