Final Year Data Science Projects 2025/26
This academic year, I am offering a range of final-year projects for MSc Data Science students at King’s College London. The topics are motivated by current research in urban computing and data governance.
Quantifying Urban Greenspace Globally
Recent research has highlighted the role of specific spaces within urban parks in supporting physical and mental wellbeing [1]. Using the Lexicon of Health-Related Park Features [2], the study evaluated 23,477 parks across 35 cities on their potential to promote health-related activities.
This project offers students the opportunity to apply and extend their data science skills to a global-scale environmental challenge. Building upon an existing data processing pipeline for urban park analysis [3], the aim is to expand the scope to include all cities worldwide and incorporate a broader range of urban greenspaces—such as community gardens, green corridors, and other forms of urban vegetation. The analysis will be complemented with additional environmental indicators, including urban heat stress [4], air pollution, and vegetation indices such as NDVI, primarily derived from satellite observations.
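To make the vegetation indicator concrete: NDVI is computed per pixel from the near-infrared and red reflectance as (NIR - Red) / (NIR + Red). The following is a minimal sketch in plain Python, not part of the existing pipeline [3]. The function names, the flat lists of reflectance values, and the choice to return 0.0 for pixels with no signal are assumptions made for illustration only.

```python
from typing import Sequence


def ndvi(nir: float, red: float) -> float:
    """Normalised Difference Vegetation Index for a single pixel."""
    total = nir + red
    if total == 0:
        # Assumption: pixels with no signal in either band are treated as 0.
        return 0.0
    return (nir - red) / total


def mean_ndvi(nir_band: Sequence[float], red_band: Sequence[float]) -> float:
    """Average NDVI over the pixels that fall inside one greenspace."""
    if len(nir_band) != len(red_band):
        raise ValueError("bands must have the same number of pixels")
    if not nir_band:
        raise ValueError("at least one pixel is required")
    values = [ndvi(n, r) for n, r in zip(nir_band, red_band)]
    return sum(values) / len(values)
```

In a real analysis the reflectance values would come from satellite rasters clipped to each greenspace polygon, and a raster library would replace the plain lists used here.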
An important aspect of this work is understanding the urban context of greenspaces: their spatial location, accessibility, and surrounding built environment characteristics such as building density and land-use type. Developing such contextual indices provides valuable insights into how greenspaces are used and helps explain their broader ecological and social roles within cities [5,6].
- [1] Health-promoting Potential of Parks in 35 Cities Worldwide. Nature Cities 2025
- [2] Lexicon of Health-Related Park Features (OpenStreetMap)
- [3] Park Analysis Pipeline
- [4] Parks, S. A., Dillon, G. K., & Miller, C. (2014). A new metric for quantifying burn severity: the relativized burn ratio. Remote Sensing, 6(3), 1827-1844.
- [5] Larson, K. L., Brown, J. A., Lee, K. J., & Pearsall, H. (2022). Park equity: why subjective measures matter. Urban Forestry & Urban Greening, 76, 127733.
- [6] Kaczynski, A. T., Schipperijn, J., Hipp, J. A., Besenyi, G. M., Stanis, S. A. W., Hughey, S. M., & Wilcox, S. (2016). ParkIndex: Development of a standardized metric of park access for research and planning. Preventive medicine, 87, 110-114.
Analyzing and Visualizing the Inequalities of London’s Built Environment and Its Effect on Health
This project aims to develop a comprehensive analytical and visualization framework to quantify and understand inequalities across London’s built environment and their impact on public health. By integrating diverse spatial and socioeconomic datasets, the project will construct multi-dimensional indices of inequality that capture environmental, infrastructural, and health-related disparities across London boroughs. The framework will provide a quantifiable and scalable tool for assessing how variations in the built environment contribute to health inequities, supporting evidence-based urban policy and planning.
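One common way to build such a multi-dimensional index is to rescale each indicator to a common range and combine the results as a weighted average. The sketch below illustrates that idea under stated assumptions. The indicator names, the example values, the min-max scaling, and the equal default weights are all hypothetical, and the project itself may settle on a different construction.

```python
from typing import Dict, List, Optional


def min_max(values: List[float]) -> List[float]:
    """Rescale values to [0, 1]; a constant indicator maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def composite_index(
    indicators: Dict[str, List[float]],
    weights: Optional[Dict[str, float]] = None,
) -> List[float]:
    """Weighted mean of min-max scaled indicators, one score per area.

    Assumes every indicator lists the areas in the same order and that
    higher raw values mean worse conditions.
    """
    names = list(indicators)
    if weights is None:
        weights = {name: 1.0 for name in names}  # assumption: equal weights
    total_weight = sum(weights[name] for name in names)
    scaled = {name: min_max(indicators[name]) for name in names}
    n_areas = len(scaled[names[0]])
    return [
        sum(weights[name] * scaled[name][i] for name in names) / total_weight
        for i in range(n_areas)
    ]


# Hypothetical values for three areas (not real London data).
scores = composite_index({"no2": [10, 20, 30], "noise": [50, 70, 60]})
```

Equal weighting is the simplest choice. Justifying the weights, and handling indicators where a higher value is better, would be part of the project work.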
The built environment component investigates how the characteristics of places where people live influence their daily wellbeing. This includes examining air quality and pollution levels, noise exposure, access to green spaces and tree coverage, and neighbourhood resilience to extreme weather events such as flooding and heatwaves. Special attention is given to the vulnerabilities of specific population groups and the cumulative effects of environmental stressors. By aggregating and analysing these factors, the project explores how urban form and environmental conditions interact to shape comfort, safety, and overall quality of life across London.
The health equity component examines spatial patterns of health outcomes and access to healthcare. It analyses indicators such as obesity prevalence, GP prescription rates, and accessibility of healthcare services, alongside official measures of health deprivation. This analysis reveals where and why health inequalities persist and how they relate to environmental and infrastructural factors. Together, these insights aim to illuminate the links between the built environment and health outcomes, providing actionable intelligence for local authorities, urban planners, and public health agencies.
- Sanja Šćepanović, Ivica Obadic, Sagar Joglekar, Laura Giustarini, Cristiano Nattero, Daniele Quercia, and Xiao Xiang Zhu. 2024. MEDSAT: a public health dataset for England featuring medical prescriptions and satellite imagery. In Proceedings of the 37th International Conference on Neural Information Processing Systems (NIPS ’23)
- Schipperijn, J., Madsen, C.D., Toftager, M. et al. The role of playgrounds in promoting children’s health – a scoping review. Int J Behav Nutr Phys Act 21, 72 (2024). https://doi.org/10.1186/s12966-024-01618-2
- Bach, B., Freeman, E., Abdul-Rahman, A., Turkay, C., Khan, S., Fan, Y., & Chen, M. (2022). Dashboard design patterns. IEEE Transactions on Visualization and Computer Graphics, 29(1), 342-352. https://dashboarddesignpatterns.github.io
- https://data.london.gov.uk/
- https://www.gov.uk/government/collections/english-indices-of-deprivation
AI for Data Governance
The exchange of data within and between organizations is governed by company policies and data protection laws. As policies and data flows change over time, maintaining compliance in data exchange poses a complex challenge. Computational data governance refers to automating governance tasks with AI tools that help the organization and its employees comply with policies while still enabling the data sharing that the business needs. This is currently a disruptive field in enterprise software with many emerging solutions, because current and future AI systems need sustained access to data, which is only possible with highly automated and sophisticated data governance.
The project uses popular open-source software [1, 1b] and open standards [2,3] from the Linux Foundation to develop and evaluate solutions that facilitate data governance within organizations. Building on seminal work [4], it will explore further aspects of data governance, including data quality assurance, metadata tagging, query generation, query validation, AI explanations, and scaling to Big Data problems. An initial dataset of annotated data access requests is provided [5], which is to be adapted to other use cases.
Successful candidates will learn how to employ Large Language Models on a critical problem the IT industry currently faces. Gartner predicts: “By 2027, for example, 60% of organizations will fail to realize the anticipated value of their AI use cases due to incohesive data governance frameworks” [6]. This is your chance to prove them wrong and become an expert in a highly sought-after field.
- [1] Data Product MCP https://github.com/entropy-data/dataproduct-mcp
- [1b] Data Contract CLI https://github.com/datacontract/datacontract-cli
- [2] Bitol, Open Data Product Standard (ODPS). Linux Foundation AI & Data, 2025. Version 1.0.0.
- [3] Bitol, Open Data Contract Standard (ODCS). Linux Foundation AI & Data, 2025. Version 3.0.1.
- [4] Dietz, L. W., Wider, A., & Harrer, S. (2025). Automating Data Governance with Generative AI. Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, 8(1), 760-771. https://doi.org/10.1609/aies.v8i1.36587
- [5] Supplementary Material for “Automating Data Governance with Generative AI” https://github.com/LinusDietz/Automating-Data-Governance
- [6] https://www.gartner.com/en/data-analytics/topics/data-governance