Automating Data Governance with Generative AI


Organizations now manage complex data systems, and meeting privacy laws such as the GDPR has become both vital and expensive. We examined how large language models (LLMs) can support data governance by generating warnings about data access decisions in decentralized data systems.

In the past, data governance relied on checklists, spreadsheets, or domain-specific languages for access control. With advances in natural language processing, data marketplaces within decentralized systems can now help decision-makers manage data sharing more effectively.

Laws such as the EU General Data Protection Regulation (GDPR), the California Consumer Privacy Act, and the EU AI Act require strict control over personal and sensitive data. Under these rules, organizations must balance innovation with compliance.

Data Marketplaces and AI-Assisted Data Governance

In data mesh architectures, domain teams manage their own data products within a federated environment. Each product defines its usage rules and guarantees through a data contract, which states what data can be shared and under what conditions. To coordinate these contracts, an enterprise data marketplace records, reviews, and approves or rejects access requests between data products.
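To make these roles concrete, here is a minimal Python sketch of how a marketplace might record data contracts and access requests. All names and fields (`DataContract`, `AccessRequest`, `Marketplace`, `allowed_purposes`) are hypothetical illustrations, not the schema of the marketplace used in the study. The sketch shows only three things: the marketplace stores each request against the provider's contract, requests start as pending, and a human records the final approval or rejection.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass(frozen=True)
class DataContract:
    """Usage rules a provider's data product guarantees (hypothetical fields)."""
    product: str
    allowed_purposes: frozenset[str]
    contains_personal_data: bool


@dataclass
class AccessRequest:
    """A consumer's request to use a provider's data product for a stated purpose."""
    consumer: str
    provider: str
    purpose: str
    status: Status = Status.PENDING


@dataclass
class Marketplace:
    """Records contracts and access requests; decisions are made by humans."""
    contracts: dict[str, DataContract] = field(default_factory=dict)
    requests: list[AccessRequest] = field(default_factory=list)

    def register(self, contract: DataContract) -> None:
        self.contracts[contract.product] = contract

    def submit(self, request: AccessRequest) -> AccessRequest:
        # A request is only accepted if the provider has published a contract.
        if request.provider not in self.contracts:
            raise KeyError(f"no contract for {request.provider!r}")
        self.requests.append(request)
        return request

    def decide(self, request: AccessRequest, approve: bool) -> None:
        # The final decision is recorded by a human reviewer, not by the AI.
        request.status = Status.APPROVED if approve else Status.REJECTED


# Usage with illustrative values: a marketing team asks for purchase data.
market = Marketplace()
market.register(DataContract("customer-purchases", frozenset({"analytics"}), True))
req = market.submit(AccessRequest("marketing", "customer-purchases", "campaign optimization"))
market.decide(req, approve=False)
```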

At the center of this marketplace is Governance AI, an LLM-powered tool that checks whether a data access request complies with the provider’s data contract, company policies, and legal requirements such as the GDPR. Governance AI does not make final decisions. Instead, it issues structured warnings and suggestions for correction to guide human experts.
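Structured warnings of this kind could be handled roughly as in the following sketch. It assumes, hypothetically, that the model returns a JSON array of objects with `source`, `message`, and `suggestion` keys. `GovernanceWarning`, `parse_warnings`, and `review_summary` are illustrative names, not part of Governance AI. In keeping with the design described above, nothing in the code approves or rejects a request. It only formats warnings for a human reviewer.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class GovernanceWarning:
    source: str      # what the warning refers to, e.g. "data contract" or "GDPR" (hypothetical labels)
    message: str     # the suspected compliance issue
    suggestion: str  # a proposed correction for the human reviewer


def parse_warnings(llm_output: str) -> list[GovernanceWarning]:
    """Parse a JSON array of warnings from a model response (assumed schema)."""
    items = json.loads(llm_output)
    return [
        GovernanceWarning(item["source"], item["message"], item.get("suggestion", ""))
        for item in items
    ]


def review_summary(warnings: list[GovernanceWarning]) -> str:
    # Warnings only inform the reviewer; an empty list is not an approval.
    if not warnings:
        return "No warnings raised; human review still required."
    return "\n".join(f"[{w.source}] {w.message} -> {w.suggestion}" for w in warnings)


# Usage with an illustrative model response.
raw = '[{"source": "GDPR", "message": "Purpose not covered by consent", "suggestion": "Anonymize customer IDs"}]'
warnings = parse_warnings(raw)
print(review_summary(warnings))
# [GDPR] Purpose not covered by consent -> Anonymize customer IDs
```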

Generating Realistic Testing Datasets

Testing compliance systems is difficult. Real-world datasets contain confidential information, and manually creating synthetic test cases takes time. To evaluate the system, we used LLMs to generate realistic data access requests, building on the work of Herdel et al. (2024). Each test case included metadata about data products, privacy policies, and data contracts to create plausible access scenarios. One example is a marketing team requesting customer purchase data for campaign optimization.
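A generation step of this kind might assemble a prompt from the scenario metadata, as in the sketch below. `ScenarioSeed` and `build_generation_prompt` are hypothetical names. The prompt wording is an assumption, and the actual model call is left out. The sketch does not reproduce the study's prompts or the procedure of Herdel et al. (2024).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScenarioSeed:
    """Metadata that grounds one synthetic access request (hypothetical fields)."""
    consumer_team: str
    provider_product: str
    privacy_policy: str
    contract_terms: str


def build_generation_prompt(seed: ScenarioSeed) -> str:
    """Assemble an LLM prompt asking for one plausible access request."""
    return (
        "Write a realistic data access request from the "
        f"{seed.consumer_team} team for the data product '{seed.provider_product}'.\n"
        f"Privacy policy: {seed.privacy_policy}\n"
        f"Data contract terms: {seed.contract_terms}\n"
        "State the purpose of use and which data is requested."
    )


# Usage with illustrative metadata for the marketing example.
seed = ScenarioSeed(
    consumer_team="marketing",
    provider_product="customer-purchases",
    privacy_policy="Purchase history may be used for service improvement.",
    contract_terms="Internal analytics only; no export of personal identifiers.",
)
prompt = build_generation_prompt(seed)
```

Keeping the metadata in a separate structure makes it easy to vary one factor at a time, such as the contract terms, when generating many test cases.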

Domain experts from the insurance and e-commerce sectors reviewed the generated requests. Most were judged realistic, and some closely matched real-world use cases. Requests judged unrealistic were discarded.

Evaluating Computational Governance

In the evaluation, we compared Governance AI with domain experts on 110 access requests drawn from both sectors.

Governance AI issued 3.6 times more warnings than human experts. It did not miss any case where experts raised a compliance concern. After a secondary review, experts judged 80% of the AI’s warnings to be correct.
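As a rough back-of-the-envelope reading of these figures, let E be the number of warnings the experts issued. Read "3.6 times more" as 3.6 times as many, and assume both sides count warnings the same way. The split of the AI's warnings then follows directly:

```latex
W_{\text{AI}} = 3.6\,E, \qquad
W_{\text{AI}}^{\text{correct}} = 0.8 \times 3.6\,E = 2.88\,E, \qquad
W_{\text{AI}}^{\text{incorrect}} = 0.2 \times 3.6\,E = 0.72\,E
```

Under these assumptions, each expert warning corresponds to roughly 0.72 incorrect AI warnings that reviewers must dismiss. That is the review overhead traded for not missing any case the experts flagged.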

This cautious approach may slow some data-sharing workflows, but it supports a key compliance principle: prevent breaches first, optimize later. The AI also offered actionable suggestions, which proved more useful in e-commerce than in highly regulated fields such as insurance.

Findings and Implications

The study highlights several insights for organizations pursuing AI-assisted data governance. In our evaluation, Governance AI handled complex policy reasoning without missing any case the experts flagged, suggesting that automated assistance is both feasible and effective. Synthetic access requests generated by LLMs can realistically simulate real-world governance scenarios, providing a practical way to test systems at scale. Even with the AI’s cautious approach, human oversight remains essential to ensure contextual and legal accuracy. In privacy-sensitive contexts, a stricter approach is often safer, because over-warning carries less risk than under-warning. Sector differences also matter: e-commerce scenarios allowed flexible mitigation, such as anonymization, while insurance required precise legal compliance. Beyond compliance, we explored the potential for continuous governance, where AI systems dynamically test and monitor policy adherence as data landscapes evolve.

This project is a collaboration with Arif Wider of HTW Berlin and Simon Harrer, who co-founded Entropy Data and built the data marketplace.

Linus W. Dietz, Arif Wider, and Simon Harrer. Automating Data Governance with Generative AI. AAAI/ACM Conference on Artificial Intelligence, Ethics, and Society 2025.
