Microsoft Building Footprints (MBF)

11.5. Microsoft Building Footprints (MBF)#

This is a worldwide building footprints derived from satellite imagery. Microsoft has detected 1.4B buildings from Bing Maps imagery between 2014 and 2024 including Maxar, Airbus, and IGN France imagery. The data is freely available for download and use under ODbL.

11.5.1. Data coverage#

The below map shows the data coverage across the world.

../../_images/data_cov_mcsft.png — Fig. 11.5 Microsoft Building data coverage. Source.#

This dataset offers an estimation of the building height as well. However, the height attribute has a lower coverage than the footprint dataset. See below.

../../_images/building_height_cov.png — Fig. 11.6 Microsoft Building height data coverage. Source.#

11.5.2. How data was created#

The building extraction is done in two stages:

Semantic Segmentation – Recognizing building pixels on an aerial image using deep neural networks (DNNs).
Polygonization – Converting building pixel detections into polygons.

Stage 1: Semantic Segmentation

../../_images/semantic_segmentation.png — Fig. 11.7 Semantic Segmentation step.#

Stage 2: Polygonization

../../_images/polygonization.png — Fig. 11.8 Polygonization step.#

11.5.3. Confidence Score#

Latest releases of this dataset incorporated a confidence score between 0 and 1 that can be read as a percentage confidence.

11.5.4. Evaluation Metrics#

The project provides the Precision/Recall metric for every region.

Region	Precision	Recall
Africa	94.4%	70.9%
Caribbean	92.2%	76.8%
Central Asia	97.17%	79.47%
Europe	94.3%	85.9%
Middle East	95.7%	85.4%
South America	95.4%	78.0%
South Asia	94.8%	76.7%

The above information and more details about the project can be found in the Project GitHub repository.

11.5.5. Download the data#

The below code was created following the example on the Project Github repository. The goal is to download the building footprints from the area of interest, the 50 km buffer around the earthquake epicenter.

Step 1 - Determine which tiles intersect the AOI

MBF stores the data in tiles, for each tile there is a downloading link for retrieving the data. So, the first step is to find which tiles intersect with the AOI. This step can be done with the code below or using a What the tile? tool.

import pandas as pd
import mercantile
import geopandas as gpd
from shapely import geometry, Point, Polygon
import mercantile
from tqdm import tqdm
import os
import tempfile
import random
import folium

downloading_links = pd.read_csv('https://minedbuildings.z5.web.core.windows.net/global-buildings/dataset-links.csv', dtype=str)

east, south, west, north = (36.62768265193316, 37.561544985221865, 37.76386851796805, 38.460405921314695)

quad_keys = set()
for tile in list(mercantile.tiles(east, south, west, north, zooms=9)):
    quad_keys.add(mercantile.quadkey(tile))
quad_keys = list(quad_keys)
print(f"The input area spans {len(quad_keys)} tiles: {quad_keys}")

The input area spans 6 tiles: ['122110321', '122110302', '122110303', '122110300', '122110301', '122110320']

Step 2 - Download the data

Using the tiles from above, all footprints are downloaded. After that, the entire dataset is filtered to keep only buildings that lie in the AOI.

# Obtain the Polygon for the Area of Interest
d = {'name': ['epicenter'], 'geometry': [Point(37.196, 38.011)]}
gdf = gpd.GeoDataFrame(d, crs="EPSG:4326")
gdf = gdf.to_crs('EPSG:23036')
gdf['buffer'] = gdf.geometry.apply(lambda x: x.buffer(50000))
gdf = gdf.set_geometry('buffer')
gdf = gdf.to_crs('EPSG:4326')
bounds = gdf.bounds.loc[0]
north, south, east, west = bounds.maxy, bounds.miny, bounds.minx, bounds.maxx
gdf.loc[1, 'name'] = 'epicenter_rectangle'
gdf.loc[1, 'buffer'] = Polygon(((east,north), (west, north), (west, south), (east, south)))

idx = 0
combined_gdf = gpd.GeoDataFrame()
with tempfile.TemporaryDirectory() as tmpdir:
    # Download the GeoJSON files for each tile that intersects the input geometry
    tmp_fns = []
    for quad_key in tqdm(quad_keys):
        print(quad_key)
        rows = downloading_links[downloading_links["QuadKey"] == quad_key]
        if rows.shape[0] == 1:
            url = rows.iloc[0]["Url"]

            df2 = pd.read_json(url, lines=True)
            df2["geometry"] = df2["geometry"].apply(geometry.shape)

            gdf_tile = gpd.GeoDataFrame(df2, crs=4326)
            fn = os.path.join(tmpdir, f"{quad_key}.geojson")
            tmp_fns.append(fn)
            if not os.path.exists(fn):
                gdf_tile.to_file(fn, driver="GeoJSON")
        elif rows.shape[0] > 1:
            raise ValueError(f"Multiple rows found for QuadKey: {quad_key}")
        else:
            raise ValueError(f"QuadKey not found in dataset: {quad_key}")
    # import pdb; pdb.set_trace()
    # Merge the GeoJSON files into a single file
    for fn in tmp_fns:
        gdf_tile = gpd.read_file(fn)  # Read each file into a GeoDataFrame
        gdf_tile = gdf_tile[gdf_tile.geometry.within(gdf.loc[1].buffer)]  # Filter geometries within the AOI
        gdf_tile['id'] = range(idx, idx + len(gdf_tile))  # Update 'id' based on idx
        idx += len(gdf_tile)
        combined_gdf = pd.concat([combined_gdf,gdf_tile],ignore_index=True)

Step 3 - Explore the results

It can be observed that all the records from this area do not have a confidence score, this means they are from a vintage dataset. Also, these datapoints do not have a value for the height.

The map below shows a sample of the results to consume less computation resources.

print('{} building footprints were retrieved'.format(len(combined_gdf)))

194264 building footprints were retrieved

import json # Import json library to read a string into a dictionary

combined_gdf['properties'] = combined_gdf['properties'].apply(lambda x: json.loads(x))
combined_gdf['confidence'] = combined_gdf['properties'].apply(lambda x: x['confidence'])
combined_gdf['height'] = combined_gdf['properties'].apply(lambda x: x['height'])

combined_gdf.confidence.value_counts()

confidence
-1.0    194264
Name: count, dtype: int64

combined_gdf.height.value_counts()

height
-1.0    194264
Name: count, dtype: int64

idx = list(combined_gdf.index)
idx_shuffled= random.shuffle(idx) # Shuffle indices to avoid spatial oversampling
m = combined_gdf.loc[idx[:1000]].explore(name = 'MBF')
gdf.loc[[1]].explore(m = m, color = 'pink', name = 'AOI')
folium.LayerControl().add_to(m)
m

Make this Notebook Trusted to load map: File -> Trust Notebook