SEA4DQ Workshop 2024

About SEA4DQ

Modern software systems are centered on data, using data on an increasing scale and in novel and intelligent ways. Key drivers for increased data availability include the Internet of Things (IoT), data-sharing platforms, and open data portals. Data quality is crucial, as the data acquired and used by modern software systems strongly impacts the reliability, robustness, efficiency, and trustworthiness of these systems.

How can software engineering and artificial intelligence (AI) help manage and tame data quality issues?

This is the question we aim to investigate in the workshop SEA4DQ. The SEA4DQ 2024 workshop is the fourth workshop of the series and provides a venue for researchers and practitioners to exchange and discuss trending views, ideas, state-of-the-art, work in progress, and scientific results highlighting aspects of software engineering and AI to address the problem of data quality in modern systems.

Topics of Interest

Advancing Requirements Engineering for Optimal Data Quality
Architectural Frameworks in Software for Enhanced Data Quality Management
AI/LLM and Software Strategies for Data Ingestion and Acquisition
AI/LLM-Driven Approaches for Data Pre-processing and Cleaning
Software Tools for Data Quality Testing and Profiling
Quantitative Measures of Data Quality
Evaluating Data Quality Techniques: Case Studies on Real-World Systems
Balancing Data Quality and Security: Understanding the Trade-offs
Secure Data Sharing: Methods for Trust and Integrity
Standardization and Certification Processes in Data Quality
Data Engineering for AI/LLM-based Systems

Call for Papers

The SEA4DQ 2024 workshop has the following tracks calling for contributions:

Research Track: (max 10 pages) Papers describing original research, including novel approaches, tools, datasets, and studies related to the workshop topics.
Ideas, Visions and Reflections Track: (max 4 pages) Papers that aim to disrupt the status quo in our discipline with radical, innovative, thought-provoking new ideas and research directions, as well as lessons learned from the past.
Industry Track: (max 4 pages) Papers that discuss industry challenges and lessons learned from practice.

For all contributions, the SEA4DQ 2024 workshop will accompany the presentation of the paper with a 10- or 20-minute discussion session. The discussion session will actively involve the audience and will be moderated by the session chair. The session chair will prepare a set of guiding questions for the presenters and help the audience engage in the discussion. The goal of the discussion session is to foster the exchange of ideas and provide feedback to the authors on their work.

Contribution Type	Paper [page limit]	Presentation [minutes]	Discussion [minutes]
Research Track	10	20	10
Ideas, Visions and Reflections Track	4	10	20
Industry Track	4	15	10

All submissions must be in English and in PDF format. Submission format: follow the "How to Submit" guidelines on the FSE 2024 website. Note that SEA4DQ employs a single-anonymous peer review process; authors are therefore not required to conceal their identity in the submission.

Each submission will be reviewed by at least three members of the program committee. After the reviews are submitted, there will be an online discussion period. The PC chairs together with the general chair will make the final decision on acceptance. Submitted papers must not have been previously published and must not be under review or submitted for review elsewhere at the time of submission.

The accepted papers will be published in the workshop's proceedings, which will be proposed for publication in the ACM Digital Library. As a published ACM author, you and your co-authors are subject to all ACM Publications Policies, including ACM's new Publications Policy on Research Involving Human Participants and Subjects. At least one author of each accepted paper must register and present the paper in person at SEA4DQ 2024 to have the paper appear in the FSE companion proceedings.

Special Issue

All papers accepted at SEA4DQ 2024 will be invited to be revised and extended for consideration in a special issue of the Springer Empirical Software Engineering Journal (EMSE). Please note that the extended papers will be subject to a new review process. The official call for papers for the special issue is available at https://emsejournal.github.io/special_issues/2024_SI_SEA4DQ.html.

Keynote

Prof. Denys Poshyvanyk

Chancellor Professor, IEEE Fellow
William & Mary, Williamsburg, VA, USA

Towards an Interpretable Science of Deep Learning for Software Engineering: A Causal Inference View

Neural Code Models (NCMs) are rapidly progressing from research prototypes to commercial developer tools. As such, understanding the capabilities and limitations of such models is becoming critical. However, the abilities of these models are typically measured using automated metrics that often only reveal a portion of their real-world performance. While, in general, the performance of NCMs appears promising, currently much is unknown about how such models arrive at decisions or whether practitioners trust NCMs' outcomes. In this talk, I will introduce doCode, a post hoc interpretability framework specific to NCMs that can explain model predictions. doCode is based upon causal inference to enable programming language-oriented explanations. While the theoretical underpinnings of doCode are extensible to exploring different model properties, we provide a concrete instantiation that aims to mitigate the impact of spurious correlations by grounding explanations of model behavior in properties of programming languages. doCode can generate causal explanations based on Abstract Syntax Tree information and software engineering-based interventions. To demonstrate the practical benefit of doCode, I will present empirical results of using doCode for detecting confounding bias in NCMs.

Denys Poshyvanyk is a Chancellor Professor and a Graduate Director in the Computer Science Department at William & Mary. He currently serves as a Guest Editor-in-Chief of the AI-SE Continuous Special Section at the ACM Transactions on Software Engineering and Methodology (TOSEM) and a Program Co-Chair for FSE'25. He is a recipient of multiple ACM SIGSOFT Distinguished Paper Awards and the NSF CAREER award (2013). He is an IEEE Fellow and an ACM Distinguished Member.

Dr. Qinghua Lu

Principal Research Scientist
CSIRO's Data61, Australia

Responsible AI Engineering from A Data Perspective

The rapid advancements in AI, particularly with the emergence of large language models (LLMs) and their diverse applications, have attracted huge global interest and raised significant concerns about responsible AI and AI safety. While LLMs are impressive examples of AI models, it is the compound AI systems, which integrate these models with other key components for functionality and quality/risk control, that are ultimately deployed and have real-world impact. These AI systems, especially autonomous LLM agents and those involving multi-agent interaction, require system-level engineering to ensure responsible AI and AI safety. At the same time, data is the lifeblood of AI systems, cutting across their different components. There are various challenges associated with the data collected, used, and generated by AI systems, as well as their engineering processes. In this talk, I will introduce a responsible AI engineering approach to address system-level responsible AI challenges. This includes engineering/governance methods, practices, tools, and platforms to ensure responsible AI and AI safety. Specifically, I will focus on how the responsible AI engineering approach tackles data challenges within the context of responsible AI.

Dr. Qinghua Lu is a principal research scientist and leads the Responsible AI science team at CSIRO's Data61. She is the winner of the 2023 APAC Women in AI Trailblazer Award and is part of the OECD.AI’s trustworthy AI metrics project team. She received her PhD from the University of New South Wales in 2013. Her current research interests include responsible AI, software engineering for AI, and software architecture. She has published 150+ papers in premier international journals and conferences. Her recent paper titled "Towards a Roadmap on Software Engineering for Responsible AI" received the ACM Distinguished Paper Award. Her new book, "Responsible AI: Best Practices for Creating Trustworthy AI Systems", was published by Pearson Addison-Wesley in December 2023.

Schedule

July 15, 2024 - All times are in Porto de Galinhas local time (GMT-3).
SEA4DQ 2024 will be held in parallel with the PROMISE 2024 workshop.

Start - End	Topic	Presenters
09:00 - 09:05	Opening	Organization Committee
09:05 - 10:00	SEA4DQ 2024 Keynote: Towards an Interpretable Science of Deep Learning for Software Engineering: A Causal Inference View	Prof. Denys Poshyvanyk
10:00 - 10:15	PROMISE 2024 Paper Presentation: Graph Neural Network vs. Large Language Model: A Comparative Analysis for Bug Report Priority and Severity Prediction
10:15 - 10:30	SEA4DQ 2024 Paper Presentation: A Hitchhiker's Guide to Jailbreaking ChatGPT via Prompt Engineering
10:30 - 11:00	Break
11:00 - 12:00	PROMISE 2024 Keynote: The Ever-Evolving Promises of Data in Software Ecosystems: Models, AI, and Analytics	Prof. Raula Gaikovina Kula
12:00 - 12:15	PROMISE 2024 Paper Presentation: Smarter Project Selection For Software Engineering Research
12:15 - 12:30	SEA4DQ 2024 Paper Presentation: Evaluating the Quality of Open Source Ansible Playbooks: An Executability Perspective
12:30 - 14:00	Lunch Break
14:00 - 15:00	SEA4DQ 2024 Keynote: Responsible AI Engineering from A Data Perspective	Dr. Qinghua Lu
15:00 - 15:15	PROMISE 2024 Paper Presentation: Sociotechnical Dynamics in Open Source Smart Contract Repositories: An Exploratory Data Analysis of Curated High Market Value Projects
15:15 - 15:30	PROMISE 2024 Paper Presentation: A Curated Solidity Smart Contracts Repository of Metrics and Vulnerability
15:30 - 16:00	Break
16:00 - 16:15	PROMISE 2024 Paper Presentation: MoreFixes: A Large-Scale Dataset of CVE Fix Commits Mined through Enhanced Repository Discovery
16:15 - 16:30	SEA4DQ 2024 Paper Presentation: A Pilot Study in Surveying Data Challenges of Automatic Software Engineering Tasks
16:30 - 16:45	PROMISE 2024 Paper Presentation: Prioritising GitHub Priority Labels
16:45 - 17:00	PROMISE 2024 Paper Presentation: Predicting Fairness of ML Software Configurations
17:00 - 17:05	Closing	Organization Committee

Organization Committee

Tim Menzies (Main Contact)
General Co-Chair
North Carolina State University, USA
tjmenzie@ncsu.edu

Bowen Xu (Main Contact)
General Co-Chair
North Carolina State University, USA
bxu22@ncsu.edu

Hong Jin Kang
Program Co-Chair
University of California, Los Angeles, USA

Jie M. Zhang
Program Co-Chair
King's College London, UK

Jiri Gesi
Industrial Co-Chair
Amazon Science, USA

Sagar Sen
Industrial Presentation Co-Chair
SINTEF, Norway

Beatriz Cassoli
Industrial Presentation Co-Chair
TU Darmstadt, Germany

Nicolas Jourdan
Web Co-Chair
TU Darmstadt, Germany

Jieke Shi
Web Co-Chair
Singapore Management University, Singapore

Phu Nguyen
Publicity Co-Chair
SINTEF, Norway

Valentina Golendukhina
Publicity Co-Chair
University of Innsbruck, Austria

Program Committee*

Ezequiel Scott, University of Tartu, Estonia
Fabian Gilson, University of Canterbury, New Zealand
Helena Holmström Olsson, Malmö University, Sweden
Heng Li, Polytechnique Montréal, Canada
Jan Nygård, Cancer Registry of Norway, NIPH, Norway
Katinka Wolter, Freie Universität Berlin, Germany
Lei Ma, University of Tokyo, Japan & University of Alberta, Canada
Mehdi Mirakhorli, University of Hawaii at Manoa, USA
Mohamed Soliman, Paderborn University, Germany
Sami Hyrynsalmi, LUT University, Finland
Sudipto Ghosh, Colorado State University, USA

* The PC member list is in alphabetical order.
