Data‑Centric AI Governance for Ethical and Transparent Welfare Systems

Authors

  • Samuel D. Penbrook, University of Auckland, New Zealand

Keywords:

AI Governance, Data‑Centric AI, Transparency, Bias Mitigation

Abstract

The rapid proliferation of artificial intelligence (AI) systems across governance, welfare, and public-sector decision-making has generated a pressing need for frameworks that ensure transparency, accountability, bias control, and policy compliance. Traditional model-centric AI governance paradigms have emphasized algorithmic performance metrics, often at the expense of data integrity, representativeness, and socio-ethical alignment. This article presents a comprehensive examination of data-centric AI governance models, arguing for a paradigm shift that places data quality, documentation standards, metadata transparency, and institutional accountability at the core of ethical AI governance in welfare management. Drawing on multidisciplinary literature from AI ethics, dataset documentation, bias mitigation, and public policy, we construct an integrated theoretical framework that foregrounds trustworthy data practices as foundational to governance outcomes. Key constructs such as dataset nutrition labels (Holland et al., 2018), datasheets for datasets (Gebru et al., 2021), and ML-ready metadata formats (Akhtar et al., 2024) are critically analyzed as tools to operationalize data-centric governance. We synthesize insights on the risks associated with large-scale language models and foundation models (Bender et al., 2021; Bommasani et al., 2021), discussing the implications for welfare systems where decisions directly affect vulnerable populations. Methods for bias control, transparency enforcement, and continuous compliance monitoring are elaborated, with reference to policy instruments such as California's generative AI training-data transparency legislation, AB 2013 (Irwin, 2024), and the NIST AI Risk Management Framework (NIST AI RMF, 2024). Finally, we explore the limitations, challenges, and future directions of data-centric governance research in ensuring that AI systems serve equitable welfare outcomes.

References

Navigli, R., Conia, S., & Ross, B., 2023. Biases in large language models: origins, inventory, and discussion. ACM Journal of Data and Information Quality, 15(2), pp.1-21.

Holland, S., Hosny, A., Newman, S., Joseph, J., & Chmielinski, K., 2018. The Dataset Nutrition Label: A Framework To Drive Higher Data Quality Standards. arXiv:1805.03677.

Gebru, T., Morgenstern, J., Vecchione, B., Vaughan, J.W., Wallach, H., Daumé III, H., & Crawford, K., 2021. Datasheets for datasets. Communications of the ACM, 64(12), pp. 86-92.

Buchanan, B., 2020. The AI triad and what it means for national security strategy. Center for Security and Emerging Technology.

Akhtar, M., Benjelloun, O., Conforti, C., Giner-Miguelez, J., Jain, N., Kuchnik, M., Lhoest, Q., Marcenac, P., Maskey, M., Mattson, P., Oala, L., Ruyssen, P., Shinde, R., Simperl, E., Thomas, G., Tykhonov, S., Vanschoren, J., Vogler, S., & Wu, C.-J., 2024. Croissant: A Metadata Format for ML-Ready Datasets. In Proceedings of the Eighth Workshop on Data Management for End-to-End Machine Learning, DEEM ’24.

Bender, E.M., Gebru, T., McMillan-Major, A., & Shmitchell, S., 2021, March. On the dangers of stochastic parrots: Can language models be too big? In Proceedings of the 2021 ACM conference on fairness, accountability, and transparency, pp. 610-623.

Li, N., Pan, A., Gopal, A., Yue, S., Berrios, D., Gatti, A., Li, J.D., Dombrowski, A.K., Goel, S., Phan, L., & Mukobi, G., 2024. The WMDP benchmark: Measuring and reducing malicious use with unlearning. arXiv preprint arXiv:2403.03218.

Deshpande, A., Murahari, V., Rajpurohit, T., Kalyan, A., & Narasimhan, K., 2023. Toxicity in ChatGPT: Analyzing persona-assigned language models. arXiv preprint arXiv:2304.05335.

Fu, X., Li, S., Wang, Z., Liu, Y., Gupta, R.K., Berg-Kirkpatrick, T., & Fernandes, E., 2024. Imprompter: Tricking LLM Agents into Improper Tool Use. arXiv preprint arXiv:2410.14923.

Gupta, R., Walker, L., Corona, R., Fu, S., Petryk, S., Napolitano, J., Darrell, T., & Reddie, A.W., 2024. Data-Centric AI Governance: Addressing the Limitations of Model-Focused Policies. arXiv preprint arXiv:2409.17216.

Priyadarshi Uddandarao, D., Sravanthi Valiveti, S. S., Varanasi, S. R., Rahman, H., & Chakraborty, P., 2026. Data-Centric Governance Models Using Trustworthy AI: Strengthening Transparency, Bias Control, and Policy Compliance in Welfare Management. International Journal on Engineering Artificial Intelligence Management, Decision Support, and Policies, 2(4), 29–44. https://doi.org/10.63503/j.ijaimd.2025.200

Irwin, J., 2024. AB 2013, Generative artificial intelligence: training data transparency. California Legislature.

Bommasani, R., Hudson, D.A., Adeli, E., Altman, R., Arora, S., von Arx, S., Bernstein, M.S., Bohg, J., Bosselut, A., Brunskill, E., & Brynjolfsson, E., 2021. On the opportunities and risks of foundation models. arXiv preprint arXiv:2108.07258.

NIST AI RMF, 2024. Artificial Intelligence Risk Management Framework: Generative Artificial Intelligence Profile (NIST AI 600-1), initial public draft. National Institute of Standards and Technology. https://airc.nist.gov/docs/NIST.AI.600-1.GenAIProfile.ipd.pdf

Longpre, S., Mahari, R., Lee, A.N., Lund, C.S., Oderinwale, H., Brannon, W., Saxena, N., Obeng-Marnu, N., South, T., Hunter, C.J., & Klyman, K., 2024. Consent in crisis: the rapid decline of the AI data commons. In The Thirty-eighth Conference on Neural Information Processing Systems Datasets and Benchmarks Track.

Rawal, A., McCoy, J., Rawat, D.B., Sadler, B.M., & Amant, R.S., 2022. Recent advances in trustworthy explainable artificial intelligence: Status, challenges, and perspectives. IEEE Trans. Artif. Intell., 3(6), pp. 852–866.

Jain, N., Akhtar, M., Giner-Miguelez, J., Shinde, R., Vanschoren, J., Vogler, S., Goswami, S., Rao, Y., Santos, T., Oala, L., & Karamousadakis, M., 2024. A Standardized Machine-readable Dataset Documentation Format for Responsible AI. arXiv preprint arXiv:2407.16883.

Phuong, M., Aitchison, M., Catt, E., Cogan, S., Kaskasoli, A., Krakovna, V., Lindner, D., Rahtz, M., Assael, Y., Hodkinson, S., & Howard, H., 2024. Evaluating frontier models for dangerous capabilities. arXiv preprint arXiv:2403.13793.

Bommasani, R., Klyman, K., Longpre, S., Kapoor, S., Maslej, N., Xiong, B., Zhang, D., & Liang, P., 2023. The foundation model transparency index. arXiv preprint arXiv:2310.12941.

He, J., Baxter, S.L., Xu, J., Xu, J., Zhou, X., & Zhang, K., 2019. The practical implementation of artificial intelligence technologies in medicine. Nature Med., 25(1), pp. 30–36.

Zhang, C., & Lu, Y., 2021. Study on artificial intelligence: The state of the art and future prospects. J. Ind. Inf. Integr., 23, Art. no. 100224.

Ré, C., Niu, F., Gudipati, P., & Srisuwananukorn, C., 2020. Overton: A data system for monitoring and improving machine-learned products. In Proc. 10th Conf. Innov. Data Syst. Res., Amsterdam, The Netherlands.

Batty, M., 2022. Planning data. Environ. Planning B, Urban Anal. City Sci., 49, pp. 1588–1592.

Hegde, C., 2022. Anomaly detection in time series data using data-centric AI. In Proc. IEEE Int. Conf. Electron., Comput. Commun. Technol. (CONECCT), pp. 1–6.

Published

2026-01-31

How to Cite

Samuel D. Penbrook. (2026). Data‑Centric AI Governance for Ethical and Transparent Welfare Systems. Research Index Library of Eijmr, 13(1), 1222–1229. Retrieved from https://eijmr.net/index.php/rileijmr/article/view/96

Section

Articles