SyntIA - Efficient Utility and Privacy Estimations for Synthetic Data
In the era of big data and artificial intelligence, the demand for high-quality data has never been greater. However, obtaining relevant real-world data often presents significant challenges, including privacy concerns, data scarcity, and high acquisition costs. Synthetic data emerges as a potentially powerful solution to these issues.
The research project SyntIA aims to develop technological solutions for assessing the utility and privacy protection of synthetic data before it is used in complex and expensive processes such as big data analytics or the training of AI models. To this end, we will develop accurate and efficiently computable estimation measures for both utility and privacy that take into account the specific characteristics of the dataset under assessment, enabling quantitative evaluation of both properties shortly after data generation. Furthermore, we will develop techniques that use these estimation measures to optimize the generation of synthetic data, reducing the cost and sustainability impact of the generation process itself.
The results of SyntIA are expected to make the generation of high-quality synthetic data faster, more efficient, and more sustainable. They will therefore benefit businesses and organizations that can use synthetic data to accelerate their adoption of data-driven technologies and, in turn, to innovate more quickly with AI- and ML-based solutions.
The project is carried out in cooperation with CGI and ICA Gruppen.