-
Workshop description:
The bottleneck of modern computational approaches to visual image analysis is the inability of many techniques to independently detect when they are making incorrect or overconfident predictions. This issue, which falls under the general heading of Reliable AI, has been addressed by various approaches that either model the uncertainties affecting the problem (e.g., Bayesian Neural Networks, Deep Gaussian Processes) or provide tools to gain insight into the model’s decision-making process (e.g., Grad-CAM, explainable AI, prototypical networks).
Although many of these techniques are Bayesian, others of a non-probabilistic nature, such as topological uncertainty, have recently emerged because of their intuitiveness or ease of use.
Uncertainty-aware techniques allow users not only to classify a prediction as reliable or unreliable (i.e., requiring careful handling and/or human intervention), but also to detect so-called out-of-distribution elements (inputs that do not belong to the estimated distribution of the training set and therefore lead to unreliable predictions).
Given the importance and timeliness of uncertainty-aware techniques, the workshop will focus on recent advances and modern applications of reliable AI in image analysis and high-stakes domains.
MAIN CONFERENCE:
ICIAP 2025
https://sites.google.com/view/iciap25
WHERE:
Department of Computer Science, Sapienza University of Rome (Viale Regina Elena 295, 00185 Rome)
WHEN:
16 September 2025 (9:00-13:00)
REGISTRATION:
https://sites.google.com/view/iciap25/registration
CONTACT:
claudia.caudai@isti.cnr.it
giulio.delcorso@isti.cnr.it
-
Workshop Program:
The RelAI workshop has been coupled with TRUEGEN-VIS 2025 (2nd International Workshop on Trustworthy Generative AI for Image and Video Synthesis: Challenges, Ethics, and Applications; https://sites.google.com/view/truegen-vis-2025/home). The following program describes contributions both from RelAI and from TRUEGEN-VIS.
[9.05-9.35] Prof. Ercan E. Kuruoglu (Invited)
Uncertainty Quantification with Noise Injection in Neural Networks
ABSTRACT: Model uncertainty quantification involves measuring and evaluating the uncertainty linked to a model’s predictions, helping assess their reliability and confidence. Noise injection is a technique used to enhance the robustness of neural networks by introducing randomness. We establish a connection between noise injection and uncertainty quantification from a Bayesian standpoint. We demonstrate via analytical derivation that injecting noise into the weights of a neural network is equivalent to Bayesian inference on a deep Gaussian process. Consequently, we introduce a Monte Carlo Noise Injection (MCNI) method, which involves injecting noise into the parameters during training and performing multiple forward propagations during inference to estimate the uncertainty of the prediction. Through simulation and experiments on regression and classification tasks, our method demonstrates superior performance compared to MC Dropout and other baseline methods.
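The following is a minimal sketch of the Monte Carlo noise-injection idea described in the abstract, not the speaker's implementation: Gaussian noise (with an illustrative standard deviation sigma) is added to the weights of a generic PyTorch classifier before each of several forward passes, and the spread of the resulting predictions serves as an uncertainty estimate.

```python
import torch

@torch.no_grad()
def mc_noise_injection(model, x, n_samples=30, sigma=0.05):
    """Estimate predictive mean and uncertainty via weight perturbation.

    Hypothetical helper, not the MCNI reference code: Gaussian noise with
    standard deviation `sigma` is added to every parameter, the perturbed
    model is evaluated, and the original weights are restored afterwards.
    Assumes `model` is a classifier returning logits.
    """
    originals = {name: p.clone() for name, p in model.named_parameters()}
    outputs = []
    for _ in range(n_samples):
        for _, p in model.named_parameters():
            p.add_(torch.randn_like(p) * sigma)      # inject noise into the weights
        outputs.append(torch.softmax(model(x), dim=-1))
        for name, p in model.named_parameters():     # restore the original weights
            p.copy_(originals[name])
    stacked = torch.stack(outputs)                    # (n_samples, batch, classes)
    return stacked.mean(dim=0), stacked.std(dim=0)    # prediction, uncertainty
```

Repeated perturbed forward passes play the same role as the multiple stochastic passes of MC Dropout; the standard deviation across samples can then be thresholded to flag unreliable predictions.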
[9.35-10.00] Oscar Papini (RelAI)
Remind me of something? Zero-shot learning for trustworthy image comparison in rolling stock
ABSTRACT: This paper discusses the need for trustworthy AI in urban mobility, focusing on high-stakes security applications such as anomaly detection in public transportation. Because the accuracy required to identify potentially dangerous objects often surpasses the capabilities of current models, there is an unavoidable incidence of false positives. We suggest a “learning to defer” approach as a solution. Our technique uses the deep features and label relative importance of a pre-trained classifier (DenseNet/ImageNet-1k) to create a unique item “fingerprint”. We then employ a zero-shot meta-learning approach to calibrate the system, enabling it to distinguish between normal background items and genuine anomalies by assigning a similarity score. This method significantly reduces the false “new object” alarms that would otherwise overwhelm human operators. Our proof-of-concept demonstrates that the system is computationally light and can be easily adapted to specific environments and integrated into existing classification modules.
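As a rough illustration of the feature-based “fingerprinting” the abstract describes (not the authors' actual pipeline, and without their label-importance weighting or zero-shot calibration), the sketch below extracts deep features from a pre-trained torchvision DenseNet-121 and compares them to stored reference fingerprints with a cosine-similarity threshold; the pooling choice and the threshold value are assumptions.

```python
import torch
from torchvision import models, transforms

# Pre-trained DenseNet-121 used as a frozen feature extractor.
backbone = models.densenet121(weights=models.DenseNet121_Weights.IMAGENET1K_V1)
backbone.classifier = torch.nn.Identity()   # keep only the deep features
backbone.eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def fingerprint(image):
    """Deep-feature 'fingerprint' of a PIL image (illustrative pooling only)."""
    feats = backbone(preprocess(image).unsqueeze(0))
    return torch.nn.functional.normalize(feats, dim=-1)

def is_known_object(image, reference_fingerprints, threshold=0.8):
    """Defer to a human only when the item matches no stored reference.

    `threshold` is a placeholder; a real system would calibrate it.
    """
    scores = [torch.cosine_similarity(fingerprint(image), ref).item()
              for ref in reference_fingerprints]
    return max(scores) >= threshold
```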
[10.00-10.25] Giada Anastasi (RelAI)
Trustworthy Segmentation in Digital Breast Tomosynthesis: A Preliminary Study on Uncertainty-aware Attention UNet Ensembles
ABSTRACT: Breast cancer is the leading cause of cancer-related mortality in women worldwide (2.3 million new cases and over 600,000 deaths in 2022). While accurate segmentation of radiological images is crucial for early diagnosis, real-world deployment also requires knowing when a model’s prediction can be trusted. This preliminary study explores the integration of trustworthiness into lesion segmentation for 3D Digital Breast Tomosynthesis (DBT), using an ensemble of Attention U-Net models to estimate pixel-wise reliability and generate interpretable confidence maps. An unprecedented dataset of annotated DBT (81 women, 86 lesions, 2970 images) is used to train an attention-based U-Net model on 2D slices, using 5-fold cross-validation and stratified patient splits. To model predictive uncertainty, an ensemble of five independently trained networks is introduced, aggregating predictions through the pixel-wise median and computing the standard deviation as a proxy for reliability. This enables the segmentation to be partitioned into high- and low-confidence zones. The Attention U-Net achieves good performance (74.1% Dice score) and a high degree of precision (85.1%). Reliability maps reveal structured uncertainty, primarily at lesion boundaries, enabling confidence-based filtering. Notably, segmentation accuracy remains stable even for small lesions. Attention-based models can be a valuable addition to the semi-supervised segmentation process in DBT. This work presents a proof of concept for incorporating reliability into deep learning segmentation pipelines. Ensemble-based confidence estimation improves interpretability and allows clinicians to identify both accurate and uncertain regions. These insights are crucial for the clinical translation of AI tools in breast imaging.
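A minimal sketch of the ensemble aggregation step described in the abstract, assuming five trained segmentation networks that each return per-pixel lesion logits for a 2D slice; the std threshold used to mark low-confidence pixels is illustrative, not the value used in the study.

```python
import torch

@torch.no_grad()
def ensemble_segmentation(models, image_slice, std_threshold=0.1):
    """Aggregate an ensemble of segmentation networks on a single 2D slice.

    Returns the pixel-wise median probability map (consensus segmentation),
    the pixel-wise standard deviation used as a reliability proxy, and a
    boolean mask of low-confidence pixels (illustrative threshold).
    """
    probs = torch.stack([torch.sigmoid(m(image_slice)) for m in models])
    median_map = probs.median(dim=0).values   # consensus segmentation
    reliability = probs.std(dim=0)            # high std = low confidence
    low_confidence = reliability > std_threshold
    return median_map, reliability, low_confidence
```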
[10.25-10.50] Hang Sun (RelAI)
Bayesian Attention Fusion for Multimodal Multiple Sclerosis Prediction
ABSTRACT: Multimodal deep learning is a powerful tool for predicting disability progression in Multiple Sclerosis (MS), but mainstream fusion layers still assign deterministic attention weights, failing to account for modality-specific noise, artefacts or outright absence. We introduce Bayesian Attention Fusion (BAF), a lightweight variational layer that places a Gaussian distribution over inter-modality logits, learns quality-aware priors, and propagates uncertainty to the output via Monte Carlo sampling. The design unifies the key topics of our Bayesian Learning course (variational Bayes, KL-annealing, contextual priors and predictive risk decomposition) within a clinically meaningful application. Evaluated on a real-world registry of 300 patients and 4,208 visits, BAF improves the AUROC from 0.84 to 0.87 (+3 pp), boosts AUPRC from 0.80 to 0.83, and halves the Expected Calibration Error (0.048 → 0.028) relative to the deterministic multimodal baseline of Zhang et al. Robustness tests show only a 0.02 AUROC drop under heavy MRI noise, versus 0.07 for the baseline, and an OOD-AUROC of 0.86 on patch-shuffled scans. Extensive ablations, reliability diagrams and variance heat-maps demonstrate that every architectural choice (quality-aware prior, temperature scaling, KL schedule) contributes to better calibrated and more interpretable predictions.
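The sketch below is a simplified illustration of a variational fusion layer of the kind described above, not the BAF architecture itself: a Gaussian is placed over per-modality attention logits, sampled with the reparameterization trick, and a KL term against a standard normal prior (rather than BAF's quality-aware priors) is returned for the training loss. All layer names and shapes are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VariationalAttentionFusion(nn.Module):
    """Illustrative fusion layer with a Gaussian over inter-modality logits."""

    def __init__(self, embed_dim):
        super().__init__()
        self.mu_head = nn.Linear(embed_dim, 1)       # mean of each modality logit
        self.logvar_head = nn.Linear(embed_dim, 1)   # log-variance of each logit

    def forward(self, modality_embeddings):
        # modality_embeddings: (batch, n_modalities, embed_dim)
        mu = self.mu_head(modality_embeddings).squeeze(-1)         # (batch, M)
        logvar = self.logvar_head(modality_embeddings).squeeze(-1)
        eps = torch.randn_like(mu)
        logits = mu + eps * torch.exp(0.5 * logvar)                # reparameterization
        weights = F.softmax(logits, dim=-1)                        # stochastic attention
        fused = (weights.unsqueeze(-1) * modality_embeddings).sum(dim=1)
        # KL divergence to a standard normal prior, used as a regularizer.
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(dim=-1).mean()
        return fused, kl
```

At inference, repeated forward passes through such a layer yield a distribution of fused predictions (Monte Carlo sampling); during training the KL term would typically be added to the task loss with an annealing schedule.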
[11.15-11.40] Giuseppe Mazzola (TRUEGEN-VIS)
Benchmarking Multiclass Attribution of AI-Generated Images
ABSTRACT: The proliferation of AI-generated images has raised the need for reliable attribution methods capable of identifying not only whether an image is synthetic, but also which generative model produced it. This paper explores whether standard CNN architectures, without model-specific adaptations, can address the multiclass attribution task using only image-level content. We propose a benchmark including four popular neural architectures (ResNet-50, InceptionV3, DenseNet121, EfficientNetB0), tested on AGIQA-3K, a challenging dataset originally designed for image quality assessment. Two experimental setups are considered: a fine-grained 10-class configuration and a simplified 6-class version. We also examine how perceived quality affects attribution, showing that stylistic consistency plays a key role. Our results establish a simple, scalable benchmark to support future forensic research in AI-generated image attribution.
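The paper's exact training protocol is not reproduced here; the snippet below only illustrates, under assumed settings, how one of the benchmarked torchvision backbones (ResNet-50) can be re-headed for a 10-class source-attribution task before standard fine-tuning.

```python
import torch.nn as nn
from torchvision import models

def make_attribution_model(n_classes=10):
    """ResNet-50 with its final layer replaced for n-way source attribution.

    Illustrative setup only; training hyperparameters are not specified here.
    """
    model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
    model.fc = nn.Linear(model.fc.in_features, n_classes)
    return model
```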
[11.40-12.05] Erica Perseghin (TRUEGEN-VIS)
GenAI Paradox: Balancing Creative Innovation with Authenticity, Ethics and Legal Compliance
ABSTRACT: Despite ongoing debate about the potential effects of the adoption of generative artificial intelligence (GenAI) in modern societies, there is still a lack of clarity on issues such as copyright and authorship of AI-generated content. This paper analyzes the implications of GenAI with particular attention to authenticity, ethics, and legal compliance, and outlines directions for future research. Although the use of GenAI as a collaborative tool is already widespread, it requires humans to develop appropriate skills for responsible use. The study also examines ethical and regulatory concerns related to visual content protection and presents an end-to-end lifecycle model as a framework for managing AI-generated materials in a trustworthy manner.
[12.05-12.30] Salvatore Capuozzo (TRUEGEN-VIS)
Stable Diffusion for Imbalanced Detection Datasets: a Trustworthy Approach to Generate Guided Synthetic Biomedical Image Samples
ABSTRACT: The emergence of generative models has enabled the creation of synthetic data to augment existing datasets. Although this practice is generally considered safe, it can pose significant risks in high-stakes domains such as biomedicine. Artificial Intelligence (AI) systems trained on synthetic datasets, particularly those generated without rigorous safeguards, can inadvertently contribute to incorrect diagnoses or clinical decisions involving patients and animals. To mitigate such risks, the European Union (EU) has introduced the Ethics Guidelines for Trustworthy AI, emphasizing that trustworthy AI must be lawful, ethical, and robust. Consequently, the datasets used to train such models must also be reliable and well-validated. In this context, we propose a standardized framework for the generation and validation of synthetic datasets for object detection in the biomedical domain. The proposed methodology is structured into two primary pipelines: one dedicated to data generation using Stable Diffusion (SD) and harmonization models, and the other focused on validation through likelihood scoring, detection models, and structured checklists. Based on experiments conducted within a microscopy-specific use case, our results support the effectiveness of this approach as a reliable solution for augmenting imbalanced datasets in accordance with EU regulatory principles.
Presentations will be allocated in 25-minute slots:
– 18 minutes for the talk (in English)
– 7 minutes for Q&A
Technical Program Committee
- Claudia Caudai
- Sara Colantonio
- Giulio Del Corso
- Davide Moroni
- Fabrizio Ruggeri
- Giacomo Ignesti
- Oscar Papini
Workshop Topics:
- Reliable AI in visual tasks:
The reliability of ML models is often conveyed through a variety of additional techniques that associate each prediction with an estimate of confidence and robustness, or provide insight into the mechanism governing the model’s decision process. These estimates, when properly integrated, expand the possibilities for use in high-stakes domains (such as autonomous driving or biomedical signal analysis) in which only small margins of error are acceptable.
- Out-of-distribution detection:
ML and DL models are typically unable to detect when predictions are made outside the conditions covered by their training data. Detecting out-of-(training)-distribution elements is therefore a critical tool in high-stakes applications to increase confidence in ML models and to determine when human intervention is required (a minimal scoring example is sketched after this list).
- Uncertainty quantification methods and applications:
Many modern DL techniques are based on classical uncertainty quantification methods that aim to integrate uncertainties directly into the model (intrusive methods) or to produce probabilistic/non-probabilistic estimates of model reliability. Some classical techniques have already been adapted to modern image analysis DL frameworks (such as Deep Gaussian Processes), but many others still lack such a counterpart, opening the way to interesting research topics.
- Efficient Reliability Estimates:
The computational cost and implementation complexity of many uncertainty-aware AI techniques hinder their adoption. Particularly in visual analysis, the high computational cost makes post-hoc and surrogate techniques, which require reduced implementation effort and allow efficient computation of these scores, an increasingly relevant topic.
- Open source reliability software:
Modern techniques for reliability estimation and out-of-distribution detection, particularly for image analysis, are often not available as easy-to-use open source software, limiting their potential scientific impact. Contributions that provide software that is easily applicable and transferable to other application domains are particularly welcome.
- Learning to defer (L2D) in imaging:
One of the main applications of reliable AI is the ability to correctly identify unreliable results and defer the decision to a secondary model or to human intervention. Both Bayesian approaches and non-probabilistic methods share the ability to provide multiple alternatives to the experimenter and can be adapted to different image processing domains.
- Application to biomedical imaging:
Among the high-stakes domains, biomedical imaging requires models capable of correctly detecting OOD elements and identifying those cases that require medical intervention.
- Semantic Segmentation Reliability:
Uncertainty-aware methods can also be useful for segmentation tasks with blurred and/or low-contrast contours. In addition, many approaches can integrate multiple labels to produce a segmentation that incorporates a reliability score for the more complex zones.
- Object Detection Under Uncertainty:
In many applications of image processing for anomaly detection, it is critical that a detection is reported only when a certain level of confidence has been reached. Examples include the use of machine learning methods for video surveillance, where human intervention is required only when the method is actually confident of what has been detected.
- Uncertainty in Domain Adaptation and Generalization:
When a network is used outside the domains for which it was originally trained, as in the case of domain adaptation, the presence of reliability scores makes it possible to assess the extent to which the domains differ and whether any fine-tuning carried out has produced satisfactory results.
- Adversarial Robustness:
One of the by-products of reliable AI is the ability to withstand data perturbations, or at least to identify when minimal changes can alter the outcome in unpredictable ways. This applies in particular to adversarial methods, which aim to identify such network vulnerabilities in order to provide more stable solutions.
- Variational approaches to handle Imbalanced Datasets:
Variational and uncertainty-aware methods offer several strategies for dealing with small and/or unbalanced data sets. This is particularly relevant for certain application domains (such as medical imaging) that are characterized by a high degree of imbalance and heterogeneity.
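As referenced in the out-of-distribution detection topic above, the sketch below shows one of the simplest reliability scores in this family, the maximum softmax probability; the model, the input batch and the threshold are placeholders, and in practice the threshold would be calibrated on held-out in-distribution data.

```python
import torch

@torch.no_grad()
def flag_out_of_distribution(model, x, confidence_threshold=0.7):
    """Baseline OOD flag: a low maximum softmax probability means low confidence.

    Illustrative only; `model` is assumed to be a classifier returning logits
    and `confidence_threshold` is a placeholder value.
    """
    probs = torch.softmax(model(x), dim=-1)
    confidence, prediction = probs.max(dim=-1)
    return prediction, confidence, confidence < confidence_threshold
```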