Automating Data Governance with Generative AI
Organizations now manage complex data systems, and meeting privacy laws such as the GDPR has become both vital and expensive. We examined how large language models (LLMs) can support data governance by generating warnings about data access decisions in decentralized data systems.
In the past, data governance relied on checklists, spreadsheets, or domain-specific languages for access control. With advances in natural language processing, data marketplaces within decentralized systems can now help decision-makers manage data sharing more effectively.
Laws such as the EU General Data Protection Regulation, the California Consumer Privacy Act, and the EU AI Act require strict control over personal and sensitive data. Organizations must balance innovation with compliance under these rules.
Data Marketplaces and AI-Assisted Data Governance
In data mesh architectures, domain teams manage their own data products within a federated environment. Each product defines its usage rules and guarantees through a data contract, which states what data can be shared and under what conditions. To coordinate these contracts, an enterprise data marketplace records, reviews, and approves or rejects access requests between data products.
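The relationship between data contracts, access requests, and the marketplace can be sketched as a small data model. This is a minimal illustration, not the actual marketplace implementation: the class names (DataContract, AccessRequest, Marketplace), their fields, and the example values are all assumptions made for this sketch.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass(frozen=True)
class DataContract:
    """What a data product allows; field names are illustrative."""
    product: str
    owner_team: str
    shareable_fields: frozenset
    allowed_purposes: frozenset


@dataclass
class AccessRequest:
    consumer_team: str
    product: str
    requested_fields: frozenset
    purpose: str
    status: Status = Status.PENDING


@dataclass
class Marketplace:
    """Records contracts and the access requests made against them."""
    contracts: dict = field(default_factory=dict)
    requests: list = field(default_factory=list)

    def publish(self, contract: DataContract) -> None:
        self.contracts[contract.product] = contract

    def submit(self, request: AccessRequest) -> AccessRequest:
        if request.product not in self.contracts:
            raise KeyError(f"No data contract published for {request.product!r}")
        self.requests.append(request)
        return request

    def decide(self, request: AccessRequest, approve: bool) -> AccessRequest:
        # The decision comes from a human reviewer and is only recorded here.
        request.status = Status.APPROVED if approve else Status.REJECTED
        return request


# Hypothetical example: marketing asks for purchase data owned by sales.
market = Marketplace()
market.publish(DataContract(
    product="customer_purchases",
    owner_team="sales",
    shareable_fields=frozenset({"order_id", "category", "amount"}),
    allowed_purposes=frozenset({"campaign optimization"}),
))
request = market.submit(AccessRequest(
    consumer_team="marketing",
    product="customer_purchases",
    requested_fields=frozenset({"category", "amount"}),
    purpose="campaign optimization",
))
market.decide(request, approve=True)
print(request.status.value)
```

Keeping the contract immutable (frozen) while the request carries a mutable status mirrors the split described above: the provider states its terms once, and the marketplace tracks each request through review.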
At the center of this marketplace is Governance AI, an LLM-powered tool that checks whether a data access request complies with the provider’s data contract, company policies, and legal requirements such as the GDPR. Governance AI does not make final decisions. Instead, it issues structured warnings and suggestions for correction to guide human experts.
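The shape of such output, warnings paired with suggestions and no approve or reject verdict, might look like the sketch below. Note the loud assumption: the review_request function here is a deterministic, rule-based stand-in for the LLM call, used only so the example runs on its own. The GovernanceWarning type, its fields, and the dictionary keys are hypothetical and do not describe the real Governance AI interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GovernanceWarning:
    source: str      # hypothetical labels: "contract", "policy", or "law"
    message: str     # what looks non-compliant
    suggestion: str  # how the requester could correct it


def review_request(request: dict, contract: dict) -> list:
    """Rule-based stand-in for the LLM: returns warnings, never a decision."""
    warnings = []

    # Fields the consumer asked for that the contract does not cover.
    extra = sorted(set(request["fields"]) - set(contract["shareable_fields"]))
    if extra:
        warnings.append(GovernanceWarning(
            source="contract",
            message="Fields not covered by the data contract: " + ", ".join(extra),
            suggestion="Remove these fields or ask the provider to amend the contract.",
        ))

    # A stated purpose the contract does not list.
    if request["purpose"] not in contract["allowed_purposes"]:
        warnings.append(GovernanceWarning(
            source="contract",
            message=f"Purpose {request['purpose']!r} is not listed in the contract.",
            suggestion="Restate the purpose or request an extension of the contract.",
        ))
    return warnings


# Hypothetical contract and request.
contract = {
    "shareable_fields": ["order_id", "category", "amount"],
    "allowed_purposes": ["campaign optimization"],
}
request = {"fields": ["category", "email"], "purpose": "campaign optimization"}

for warning in review_request(request, contract):
    print(f"[{warning.source}] {warning.message} -> {warning.suggestion}")
```

An empty list means the check found nothing to flag. It does not mean the request is approved, since that decision stays with the human expert.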
Generating Realistic Testing Datasets
Testing compliance systems is difficult. Real-world datasets contain confidential information, and manually creating synthetic test cases takes time. To evaluate the system, we used LLMs to generate realistic data access requests, building on the work of Herdel et al. (2024). Each test case included metadata about data products, privacy policies, and data contracts to create plausible access scenarios, for example a marketing team requesting customer purchase data for campaign optimization.
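One way to assemble such a test case is to combine the three kinds of metadata into a generation prompt and then keep only well-formed candidates from the model's reply. The sketch below makes no actual LLM call. The prompt wording, the required keys, the function names, and the sample reply are assumptions for illustration and are not taken from the study.

```python
import json

# Hypothetical minimum shape of one generated access request.
REQUIRED_KEYS = {"consumer_team", "requested_fields", "purpose"}


def build_generation_prompt(product: dict, privacy_policy: str,
                            contract: dict, n: int = 3) -> str:
    """Combine product metadata, policy text, and contract into one prompt."""
    return (
        f"Generate {n} realistic data access requests as a JSON list.\n"
        f"Each item needs the keys: {', '.join(sorted(REQUIRED_KEYS))}.\n\n"
        f"Data product metadata:\n{json.dumps(product, indent=2)}\n\n"
        f"Privacy policy:\n{privacy_policy}\n\n"
        f"Data contract:\n{json.dumps(contract, indent=2)}\n"
    )


def parse_generated_requests(raw: str) -> list:
    """Keep only candidates that are JSON objects with all required keys."""
    try:
        items = json.loads(raw)
    except json.JSONDecodeError:
        return []
    if not isinstance(items, list):
        return []
    return [item for item in items
            if isinstance(item, dict) and REQUIRED_KEYS.issubset(item)]


prompt = build_generation_prompt(
    product={"name": "customer_purchases", "owner": "sales"},
    privacy_policy="Personal data may only be used for the stated purpose.",
    contract={"shareable_fields": ["category", "amount"]},
)

# A made-up model reply: one complete candidate and one incomplete one.
sample_reply = json.dumps([
    {"consumer_team": "marketing", "requested_fields": ["category"],
     "purpose": "campaign optimization"},
    {"purpose": "incomplete candidate"},
])
candidates = parse_generated_requests(sample_reply)
print(len(candidates))
```

Structural filtering of this kind only removes malformed output. Judging whether a request is realistic still requires review by domain experts.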
Domain experts from the insurance and e-commerce sectors reviewed the generated requests. They judged most of them realistic, with some closely matching real-world use cases, and discarded the rest.
Evaluating Computational Governance
In the evaluation, we compared Governance AI with domain experts, who assessed 110 access requests across the insurance and e-commerce sectors.
Governance AI issued 3.6 times more warnings than human experts. It did not miss any case where experts raised a compliance concern. After a secondary review, experts judged 80% of the AI’s warnings to be correct.
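These three figures correspond to three simple measures: a ratio of warning counts, the share of expert-flagged cases the AI also flagged, and the share of AI warnings that experts confirmed. The sketch below assumes these common definitions, since the exact formulas are not given here. The function names are hypothetical and all numbers are toy values chosen for readability, not data from the study.

```python
def warning_ratio(ai_warnings: int, expert_warnings: int) -> float:
    """How many warnings the AI issued per expert warning."""
    if expert_warnings == 0:
        raise ValueError("expert_warnings must be positive")
    return ai_warnings / expert_warnings


def concern_recall(ai_flagged: set, expert_flagged: set) -> float:
    """Share of expert-flagged requests that the AI also flagged."""
    if not expert_flagged:
        raise ValueError("expert_flagged must not be empty")
    return len(ai_flagged & expert_flagged) / len(expert_flagged)


def warning_precision(confirmed: int, ai_warnings: int) -> float:
    """Share of AI warnings that experts later confirmed as correct."""
    if ai_warnings == 0:
        raise ValueError("ai_warnings must be positive")
    return confirmed / ai_warnings


# Toy values only (not the study's data).
ai_flagged = {"r1", "r2", "r3", "r5"}   # requests with at least one AI warning
expert_flagged = {"r1", "r3"}           # requests with an expert concern

print(warning_ratio(ai_warnings=10, expert_warnings=4))   # 2.5
print(concern_recall(ai_flagged, expert_flagged))         # 1.0
print(warning_precision(confirmed=8, ai_warnings=10))     # 0.8
```

A recall of 1.0 corresponds to the AI missing no case that experts flagged. A precision below 1.0 reflects the cost of caution, namely warnings that experts did not confirm on review.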
This cautious approach may slow some data-sharing workflows, but it supports a key compliance principle: prevent breaches first, optimize later. The AI also offered actionable suggestions, which proved more useful in e-commerce than in highly regulated fields such as insurance.
Findings and Implications
The study highlights several insights for organizations pursuing AI-assisted data governance. Governance AI can handle complex policy reasoning without missing critical cases, showing that automated assistance is both feasible and effective. Synthetic access requests generated by LLMs can realistically simulate real-world governance scenarios, providing a practical way to test systems at scale. Despite the AI’s cautious approach, human oversight remains essential to ensure contextual and legal accuracy. In privacy-sensitive contexts, a stricter approach is often safer, as over-warning carries less risk than under-warning. Differences across sectors also matter: e-commerce scenarios allowed flexible mitigation, such as anonymization, while insurance required precise legal compliance. Beyond compliance, we explored the potential for continuous governance, where AI systems dynamically test and monitor policy adherence as data landscapes evolve.
This project is a collaboration with Arif Wider of HTW Berlin and Simon Harrer, who co-founded Entropy Data and built the data marketplace.
Linus W. Dietz, Arif Wider, & Simon Harrer. Automating Data Governance with Generative AI. AAAI / ACM Conference on Artificial Intelligence, Ethics, and Society 2025.