About SEA4DQ

Modern software systems are centered on data, using data on an increasing scale and in novel and intelligent ways. Key drivers for increased data availability include the Internet of Things (IoT), data-sharing platforms, and open data portals. Data quality is crucial, as the data acquired and used by modern software systems strongly impacts the reliability, robustness, efficiency, and trustworthiness of these systems.

How can software engineering and artificial intelligence (AI) help manage and tame data quality issues?

This is the question we aim to investigate in the workshop SEA4DQ. The SEA4DQ 2024 workshop is the fourth workshop of the series and provides a venue for researchers and practitioners to exchange and discuss trending views, ideas, state-of-the-art, work in progress, and scientific results highlighting aspects of software engineering and AI to address the problem of data quality in modern systems.

Topics of Interest

  • Advancing Requirements Engineering for Optimal Data Quality
  • Architectural Frameworks in Software for Enhanced Data Quality Management
  • AI/LLM and Software Strategies for Data Ingestion and Acquisition
  • AI/LLM-Driven Approaches for Data Pre-processing and Cleaning
  • Software Tools for Data Quality Testing and Profiling
  • Quantitative Measures of Data Quality
  • Evaluating Data Quality Techniques: Case Studies on Real-World Systems
  • Balancing Data Quality and Security: Understanding the Trade-offs
  • Secure Data Sharing: Methods for Trust and Integrity
  • Standardization and Certification Processes in Data Quality
  • Data Engineering for AI/LLM-based Systems

Call for Papers

The SEA4DQ 2024 workshop has the following tracks calling for contributions:

  • Research Track: (max 10 pages) Papers describing original research, including novel approaches, tools, datasets, and studies related to the workshop topics.
  • Ideas, Visions and Reflections Track: (max 4 pages) Papers that aim to disrupt the status quo in our discipline with radical, innovative, thought-provoking new ideas, and research directions, as well as lessons learned from the past.
  • Industry Track: (max 4 pages) Papers that discuss industry challenges and lessons learned from practice.

For all contributions, the SEA4DQ 2024 workshop will accompany the presentation of the paper with a 10- or 20-minute discussion session. The discussion session will actively involve the audience and will be moderated by the session chair. The session chair will prepare a set of guiding questions for the presenters and help the audience engage in the discussion. The goal of the discussion session is to foster the exchange of ideas and provide feedback to the authors on their work.

Contribution Type | Paper [page limit] | Presentation [minutes] | Discussion [minutes]
Research Track | 10 | 20 | 10
Ideas, Visions and Reflections Track | 4 | 10 | 20
Industry Track | 4 | 15 | 10

All submissions must be in English and in PDF format. Submission format: follow the guidelines under "How to Submit" on the FSE 2024 website. Note that SEA4DQ employs a single-anonymous peer review process; authors are therefore not required to conceal their identity in the submission.

Each submission will be reviewed by at least three members of the program committee. After the reviews are submitted, there will be an online discussion period. The PC chairs together with the general chair will make the final decision on acceptance. Submitted papers must not have been previously published and must not be under review or submitted for review elsewhere at the time of submission.

The accepted papers will be published in the workshop's proceedings, which will be proposed for publication in the ACM Digital Library. As a published ACM author, you and your co-authors are subject to all ACM Publications Policies, including ACM's new Publications Policy on Research Involving Human Participants and Subjects. At least one author of each accepted paper must register and present the paper in person at SEA4DQ 2024 to have the paper appear in the FSE companion proceedings.

Special Issue

All papers accepted at SEA4DQ 2024 will be invited to be revised and extended for consideration in a special issue of the Springer Empirical Software Engineering Journal (EMSE). Please note that the extended papers will be subject to a new review process. The official call for papers for the special issue is available at https://emsejournal.github.io/special_issues/2024_SI_SEA4DQ.html.

Keynote



...

Chancellor Professor, IEEE Fellow
William & Mary, Williamsburg, VA, USA

Towards an Interpretable Science of Deep Learning for Software Engineering: A Causal Inference View

Neural Code Models (NCMs) are rapidly progressing from research prototypes to commercial developer tools. As such, understanding the capabilities and limitations of such models is becoming critical. However, the abilities of these models are typically measured using automated metrics that often only reveal a portion of their real-world performance. While, in general, the performance of NCMs appears promising, currently much is unknown about how such models arrive at decisions or whether practitioners trust NCMs' outcomes. In this talk, I will introduce doCode, a post hoc interpretability framework specific to NCMs that can explain model predictions. doCode is based upon causal inference to enable programming language-oriented explanations. While the theoretical underpinnings of doCode are extensible to exploring different model properties, we provide a concrete instantiation that aims to mitigate the impact of spurious correlations by grounding explanations of model behavior in properties of programming languages. doCode can generate causal explanations based on Abstract Syntax Tree information and software engineering-based interventions. To demonstrate the practical benefit of doCode, I will present empirical results of using doCode for detecting confounding bias in NCMs.

Denys Poshyvanyk is a Chancellor Professor and a Graduate Director in the Computer Science Department at William & Mary. He currently serves as a Guest Editor-in-Chief of the AI-SE Continuous Special Section at the ACM Transactions on Software Engineering and Methodology (TOSEM) and a Program Co-Chair for FSE'25. He is a recipient of multiple ACM SIGSOFT Distinguished Paper Awards and the NSF CAREER award (2013). He is an IEEE Fellow and an ACM Distinguished Member.



...

Principal Research Scientist
CSIRO's Data61, Australia

Responsible AI Engineering from A Data Perspective

The rapid advancements in AI, particularly with the emergence of large language models (LLMs) and their diverse applications, have attracted huge global interest and raised significant concerns about responsible AI and AI safety. While LLMs are impressive examples of AI models, it is the compound AI systems, which integrate these models with other key components for functionality and quality/risk control, that are ultimately deployed and have real-world impact. These AI systems, especially autonomous LLM agents and those involving multi-agent interaction, require system-level engineering to ensure responsible AI and AI safety. At the same time, data is the lifeblood of AI systems, cutting across their different components. There are various challenges associated with the data collected, used, and generated by AI systems, as well as their engineering processes. In this talk, I will introduce a responsible AI engineering approach to address system-level responsible AI challenges. This includes engineering/governance methods, practices, tools, and platforms to ensure responsible AI and AI safety. Specifically, I will focus on how the responsible AI engineering approach tackles data challenges within the context of responsible AI.

Dr. Qinghua Lu is a principal research scientist and leads the Responsible AI science team at CSIRO's Data61. She is the winner of the 2023 APAC Women in AI Trailblazer Award and is part of the OECD.AI’s trustworthy AI metrics project team. She received her PhD from the University of New South Wales in 2013. Her current research interests include responsible AI, software engineering for AI, and software architecture. She has published 150+ papers in premier international journals and conferences. Her recent paper titled "Towards a Roadmap on Software Engineering for Responsible AI" received the ACM Distinguished Paper Award. Her new book, “Responsible AI: Best Practices for Creating Trustworthy AI Systems”, was published by Pearson Addison-Wesley in December 2023.

Schedule

July 15, 2024 - All times are in Porto de Galinhas local time (GMT-3).
SEA4DQ 2024 will be held in parallel with the PROMISE 2024 workshop.

Start - End | Topic | Presenters
09:00 - 09:05 | Opening | Organization Committee
09:05 - 10:00 | SEA4DQ 2024 Keynote: Towards an Interpretable Science of Deep Learning for Software Engineering: A Causal Inference View | Prof. Denys Poshyvanyk
10:00 - 10:15 | PROMISE 2024 Paper Presentation: Graph Neural Network vs. Large Language Model: A Comparative Analysis for Bug Report Priority and Severity Prediction
10:15 - 10:30 | SEA4DQ 2024 Paper Presentation: A Hitchhiker's Guide to Jailbreaking ChatGPT via Prompt Engineering
10:30 - 11:00 | Break
11:00 - 12:00 | PROMISE 2024 Keynote: The Ever-Evolving Promises of Data in Software Ecosystems: Models, AI, and Analytics | Prof. Raula Gaikovina Kula
12:00 - 12:15 | PROMISE 2024 Paper Presentation: Smarter Project Selection For Software Engineering Research
12:15 - 12:30 | SEA4DQ 2024 Paper Presentation: Evaluating the Quality of Open Source Ansible Playbooks: An Executability Perspective
12:30 - 14:00 | Lunch Break
14:00 - 15:00 | SEA4DQ 2024 Keynote: Responsible AI Engineering from A Data Perspective | Dr. Qinghua Lu
15:00 - 15:15 | PROMISE 2024 Paper Presentation: Sociotechnical Dynamics in Open Source Smart Contract Repositories: An Exploratory Data Analysis of Curated High Market Value Projects
15:15 - 15:30 | PROMISE 2024 Paper Presentation: A Curated Solidity Smart Contracts Repository of Metrics and Vulnerability
15:30 - 16:00 | Break
16:00 - 16:15 | PROMISE 2024 Paper Presentation: MoreFixes: A Large-Scale Dataset of CVE Fix Commits Mined through Enhanced Repository Discovery
16:15 - 16:30 | SEA4DQ 2024 Paper Presentation: A Pilot Study in Surveying Data Challenges of Automatic Software Engineering Tasks
16:30 - 16:45 | PROMISE 2024 Paper Presentation: Prioritising GitHub Priority Labels
16:45 - 17:00 | PROMISE 2024 Paper Presentation: Predicting Fairness of ML Software Configurations
17:00 - 17:05 | Closing | Organization Committee

Organization Committee

Tim Menzies (Main Contact)
General Co-Chair
North Carolina State University, USA
tjmenzie@ncsu.edu

Bowen Xu (Main Contact)
General Co-Chair
North Carolina State University, USA
bxu22@ncsu.edu

Hong Jin Kang
Program Co-Chair
University of California, Los Angeles, USA

Jie M. Zhang
Program Co-Chair
King's College London, UK

Jiri Gesi
Industrial Co-Chair
Amazon Science, USA

Sagar Sen
Industrial Presentation Co-Chair
SINTEF, Norway

Beatriz Cassoli
Industrial Presentation Co-Chair
TU Darmstadt, Germany

Nicolas Jourdan
Web Co-Chair
TU Darmstadt, Germany

Jieke Shi
Web Co-Chair
Singapore Management University, Singapore

Phu Nguyen
Publicity Co-Chair
SINTEF, Norway

Valentina Golendukhina
Publicity Co-Chair
University of Innsbruck, Austria


Program Committee*

  • Ezequiel Scott, University of Tartu, Estonia
  • Fabian Gilson, University of Canterbury, New Zealand
  • Helena Holmström Olsson, Malmö University, Sweden
  • Heng Li, Polytechnique Montréal, Canada
  • Jan Nygård, Cancer Registry of Norway, NIPH, Norway
  • Katinka Wolter, Freie Universität Berlin, Germany
  • Lei Ma, University of Tokyo, Japan & University of Alberta, Canada
  • Mehdi Mirakhorli, University of Hawaii at Manoa, USA
  • Mohamed Soliman, Paderborn University, Germany
  • Sami Hyrynsalmi, LUT University, Finland
  • Sudipto Ghosh, Colorado State University, USA
* The PC member list is in alphabetical order.