Eurostat European Big Data Hackathon 2025

Eurostat is organizing the fifth round of the European Big Data Hackathon from 6 to 11 March 2025 (including the presentation by the winners at the NTTS) in Brussels.

The purpose of the 2025 hackathon is to foster expertise in integrating Earth Observation data with official statistics, in order to generate innovative ideas for statistical products and tools relevant to EU policies.

The European Big Data Hackathon takes place every two years and gathers teams from all over Europe to compete for the best solution to a statistical challenge. The teams develop innovative approaches, applications and data products combining official statistics and big data that can help to answer pressing EU policy and/or statistical questions.

Source: https://cros.ec.europa.eu/2025EuropeanBigDataHackathon

How to download, visualise and run some basic statistics on ERA5 data

Written by William Ray for the participants of the 5th European Big Data Hackathon 2025.

This notebook shows how to:

  • Download ERA5 data from the Climate Data Store using its API.

  • Open the downloaded data with xarray and visualise it on a map.

  • Run some basic statistics such as the average temperature over a year.

  • Visualise the average temperature difference between 1950 and 2020.

Before you get started you will need to create an account at the Climate Data Store.

# Python Standard Libraries
import os
import datetime as dt

# Data Manipulation Libraries
import numpy as np
import xarray as xr

# Visualization Libraries
import matplotlib.pyplot as plt
import cartopy
import cartopy.crs as ccrs


# Climate Data Store API for retrieving climate data
import cdsapi

Downloading the Data

First, we’ll download ERA5 data, including the land-sea mask, from the Climate Data Store (CDS) using the cdsapi package. To do this, save your CDS API key in the $HOME/.cdsapirc file. You must also have accepted the dataset's Terms of Use in the CDS portal.

New to CDS? Consider reading the CDS tutorial for a detailed guide.
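
For reference, a .cdsapirc file normally holds two entries, `url:` and `key:`. The following is a minimal sketch, not part of the original notebook, that checks a file's text for both entries before a request is attempted. The helper names `parse_cdsapirc` and `missing_entries` and the sample values are illustrative assumptions; copy the real URL and key from your CDS profile page.

```python
from pathlib import Path


def parse_cdsapirc(text):
    """Parse 'name: value' lines into a dict (hypothetical helper)."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        name, value = line.split(":", 1)  # split once, because URLs contain ':'
        settings[name.strip()] = value.strip()
    return settings


def missing_entries(settings):
    """Return the required entries that are absent or empty."""
    return [name for name in ("url", "key") if not settings.get(name)]


# Placeholder content for illustration only, not real credentials
sample = "url: https://example.invalid/api\nkey: <your-personal-access-token>\n"
print(missing_entries(parse_cdsapirc(sample)))  # → []

# Check the real file if it exists (only the names of missing entries are printed)
rc_file = Path.home() / ".cdsapirc"
if rc_file.exists():
    print(missing_entries(parse_cdsapirc(rc_file.read_text())))
```

An empty list means both entries are present; it does not confirm that the key itself is valid.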

file_name = {}  # dictionary containing [data source : file name]

# Add the data sources and file names
file_name.update({"era5": "temperature_era5.nc"})

# Create the paths to the files
path_to = {
    source: os.path.join(f"data/{source}/", file) for source, file in file_name.items()
}

# Create necessary directories if they do not exist
for path in path_to.values():
    os.makedirs(
        os.path.dirname(path), exist_ok=True
    )  # create the folder if not available

path_to
{'era5': 'data/era5/temperature_era5.nc'}
%%time 

client = cdsapi.Client()

dataset = 'reanalysis-era5-single-levels-monthly-means'
request = {
  'product_type': ['monthly_averaged_reanalysis'],
  'variable': ['2m_temperature', 'land_sea_mask'],
  'year': ['1950', '2020'],
  'month': list(range(1, 13)),
  'time': '00:00',
  'data_format': 'netcdf',
  'download_format': 'unarchived' 
}
target = path_to['era5']

client.retrieve(dataset, request, target)
2025-02-13 07:33:55,201 INFO [2024-09-26T00:00:00] Watch our [Forum](https://forum.ecmwf.int/) for Announcements, news and other discussed topics.
2025-02-13 07:33:55,202 WARNING [2024-06-16T00:00:00] CDS API syntax is changed and some keys or parameter names may have also changed. To avoid requests failing, please use the "Show API request code" tool on the dataset Download Form to check you are using the correct syntax for your API request.
2025-02-13 07:33:55,407 INFO Request ID is 06878979-6513-4d60-b490-559df4414b76
2025-02-13 07:33:55,475 INFO status has been updated to accepted
2025-02-13 07:34:03,843 INFO status has been updated to running
2025-02-13 07:34:09,065 INFO status has been updated to successful
CPU times: user 151 ms, sys: 70.2 ms, total: 222 ms
Wall time: 15.5 s
'data/era5/temperature_era5.nc'
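
Requests to the CDS can sit in a queue for a while, so it can be worth skipping the download when the file is already on disk. A minimal sketch of that pattern follows. `download_if_missing` and its `fetch` argument are hypothetical names; in this notebook `fetch` could be something like `lambda p: client.retrieve(dataset, request, p)`.

```python
import os
import tempfile


def download_if_missing(target, fetch):
    """Call fetch(target) only when target does not exist yet.

    Returns True if a download was triggered, False if the file was reused.
    """
    if os.path.exists(target):
        return False
    os.makedirs(os.path.dirname(target) or ".", exist_ok=True)
    fetch(target)
    return True


# Stand-in for the real download, so the sketch runs without a CDS account
def fake_fetch(path):
    with open(path, "wb") as handle:
        handle.write(b"placeholder")


with tempfile.TemporaryDirectory() as tmp:
    demo_target = os.path.join(tmp, "era5", "temperature_era5.nc")
    first = download_if_missing(demo_target, fake_fetch)   # file is created
    second = download_if_missing(demo_target, fake_fetch)  # file is reused

print(first, second)  # → True False
```

Note that this only checks for the file's existence: if the request changes (other years or variables), delete the old file or use a different file name.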

Opening the dataset

Now that the file has been downloaded, we can open the dataset and inspect it using xarray.

data = xr.open_dataset('data/era5/temperature_era5.nc')
data
<xarray.Dataset> Size: 199MB
Dimensions:     (valid_time: 24, latitude: 721, longitude: 1440)
Coordinates:
    number      int64 8B ...
  * valid_time  (valid_time) datetime64[ns] 192B 1950-01-01 ... 2020-12-01
  * latitude    (latitude) float64 6kB 90.0 89.75 89.5 ... -89.5 -89.75 -90.0
  * longitude   (longitude) float64 12kB 0.0 0.25 0.5 0.75 ... 359.2 359.5 359.8
    expver      (valid_time) <U4 384B ...
Data variables:
    t2m         (valid_time, latitude, longitude) float32 100MB ...
    lsm         (valid_time, latitude, longitude) float32 100MB ...
Attributes:
    GRIB_centre:             ecmf
    GRIB_centreDescription:  European Centre for Medium-Range Weather Forecasts
    GRIB_subCentre:          0
    Conventions:             CF-1.7
    institution:             European Centre for Medium-Range Weather Forecasts
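# ERA5 longitudes run from 0 to 360; wrap them to -180..180 and re-sort
# so that each value stays attached to its true longitude
data = data.assign_coords(longitude=((data.longitude + 180) % 360) - 180)
data = data.sortby("longitude")

lat = data.latitude
lon = data.longitude
longitude = lon  # name used by the plotting cells below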

temp_2m = data.t2m[0,:,:]
plt.figure(figsize=(9, 9))

ax = plt.axes(projection=ccrs.PlateCarree())
ax.add_feature(cartopy.feature.BORDERS, linestyle='-', alpha=1)
ax.coastlines(resolution='10m')
ax.gridlines(draw_labels=True)
ax.set_extent((-20, 50, 30, 80), ccrs.PlateCarree())

cf = plt.contourf(longitude, lat, temp_2m, cmap='jet')
cb = plt.colorbar(cf, fraction=0.0235, pad=0.02)
cb.set_label('K', fontsize=15, rotation=0)  # kelvin takes no degree sign

plt.show()
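
ERA5 stores longitudes from 0 to 360 degrees, whereas the map extent above is given in the -180 to 180 convention. When switching between the two, the data values have to be moved together with their coordinates. The sketch below illustrates this on a made-up four-point NumPy array (not ERA5 data).

```python
import numpy as np

# Toy example: four values on a 0..360 longitude axis, 90 degrees apart
lon_0_360 = np.array([0.0, 90.0, 180.0, 270.0])
values = np.array([10.0, 20.0, 30.0, 40.0])

# Wrap to the -180..180 convention: 270 becomes -90, 180 becomes -180
lon_wrapped = ((lon_0_360 + 180.0) % 360.0) - 180.0

# Re-order coordinates and values with the same index,
# so that longitude increases from west to east
order = np.argsort(lon_wrapped)
lon_sorted = lon_wrapped[order]
values_sorted = values[order]

print(lon_sorted.tolist())     # → [-180.0, -90.0, 0.0, 90.0]
print(values_sorted.tolist())  # → [30.0, 40.0, 10.0, 20.0]
```

The value measured at 0 degrees (10.0) is still at 0 degrees after the conversion. Simply subtracting 180 from the coordinate would instead move it to -180 degrees.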

Calculate the yearly average for 1950

start_date = "1950-01-01"
end_date = "1950-12-01"

temp_1950 = data.sel(valid_time=slice(start_date, end_date))
temp_1950
<xarray.Dataset> Size: 100MB
Dimensions:     (valid_time: 12, latitude: 721, longitude: 1440)
Coordinates:
    number      int64 8B ...
  * valid_time  (valid_time) datetime64[ns] 96B 1950-01-01 ... 1950-12-01
  * latitude    (latitude) float64 6kB 90.0 89.75 89.5 ... -89.5 -89.75 -90.0
  * longitude   (longitude) float64 12kB 0.0 0.25 0.5 0.75 ... 359.2 359.5 359.8
    expver      (valid_time) <U4 192B ...
Data variables:
    t2m         (valid_time, latitude, longitude) float32 50MB ...
    lsm         (valid_time, latitude, longitude) float32 50MB ...
Attributes:
    GRIB_centre:             ecmf
    GRIB_centreDescription:  European Centre for Medium-Range Weather Forecasts
    GRIB_subCentre:          0
    Conventions:             CF-1.7
    institution:             European Centre for Medium-Range Weather Forecasts
# Average the 12 monthly means (each month weighted equally)
mean_1950 = temp_1950.t2m.mean(dim="valid_time")
mean_1950
<xarray.DataArray 't2m' (latitude: 721, longitude: 1440)> Size: 4MB
array([[255.91235, 255.91235, 255.91235, ..., 255.91235, 255.91235,
        255.91235],
       [255.92   , 255.92017, 255.92033, ..., 255.91936, 255.91985,
        255.92   ],
       [255.97095, 255.97112, 255.97144, ..., 255.96916, 255.96948,
        255.96997],
       ...,
       [227.4872 , 227.48817, 227.48964, ..., 227.48444, 227.4859 ,
        227.48705],
       [227.27171, 227.27284, 227.27399, ..., 227.26927, 227.27025,
        227.27106],
       [226.43236, 226.43236, 226.43236, ..., 226.43236, 226.43236,
        226.43236]], dtype=float32)
Coordinates:
    number     int64 8B ...
  * latitude   (latitude) float64 6kB 90.0 89.75 89.5 ... -89.5 -89.75 -90.0
  * longitude  (longitude) float64 12kB 0.0 0.25 0.5 0.75 ... 359.2 359.5 359.8
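
The yearly average above gives every month the same weight, although February is shorter than July. Where that matters, the monthly means can be weighted by the number of days in each month. The sketch below shows the idea with NumPy and the standard `calendar` module, using made-up monthly values for a single grid cell (not ERA5 data).

```python
import calendar

import numpy as np

year = 1950

# Made-up monthly mean temperatures in kelvin for one grid cell (illustrative only)
monthly_k = np.array([270.0, 271.0, 275.0, 280.0, 285.0, 290.0,
                      292.0, 291.0, 287.0, 281.0, 276.0, 272.0])

# Number of days in each month of the chosen year
days = np.array([calendar.monthrange(year, month)[1] for month in range(1, 13)])

simple_mean = monthly_k.mean()                       # every month counts equally
weighted_mean = np.average(monthly_k, weights=days)  # months count by their length

print(int(days.sum()))  # → 365
print(round(float(simple_mean), 3), round(float(weighted_mean), 3))
```

For these values the two means differ by only a few hundredths of a kelvin, so the simple mean is usually adequate for a first look.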
plt.figure(figsize=(9, 9))

ax = plt.axes(projection=ccrs.PlateCarree())
ax.add_feature(cartopy.feature.BORDERS, linestyle='-', alpha=1)
ax.coastlines(resolution='10m')
ax.gridlines(draw_labels=True)
ax.set_extent((-20, 50, 30, 80), ccrs.PlateCarree())

cf = plt.contourf(longitude, lat, mean_1950, cmap='jet')
cb = plt.colorbar(cf, fraction=0.0235, pad=0.02)
cb.set_label('K', fontsize=15, rotation=0)  # kelvin takes no degree sign

plt.show()

Calculate the yearly average for 2020 and then visualise the difference between the average temperature in 1950 and 2020

start_date = "2020-01-01"
end_date = "2020-12-01"

temp_2020 = data.sel(valid_time=slice(start_date, end_date))
temp_2020
<xarray.Dataset> Size: 100MB
Dimensions:     (valid_time: 12, latitude: 721, longitude: 1440)
Coordinates:
    number      int64 8B ...
  * valid_time  (valid_time) datetime64[ns] 96B 2020-01-01 ... 2020-12-01
  * latitude    (latitude) float64 6kB 90.0 89.75 89.5 ... -89.5 -89.75 -90.0
  * longitude   (longitude) float64 12kB 0.0 0.25 0.5 0.75 ... 359.2 359.5 359.8
    expver      (valid_time) <U4 192B ...
Data variables:
    t2m         (valid_time, latitude, longitude) float32 50MB ...
    lsm         (valid_time, latitude, longitude) float32 50MB ...
Attributes:
    GRIB_centre:             ecmf
    GRIB_centreDescription:  European Centre for Medium-Range Weather Forecasts
    GRIB_subCentre:          0
    Conventions:             CF-1.7
    institution:             European Centre for Medium-Range Weather Forecasts
# Average the 12 monthly means (each month weighted equally)
mean_2020 = temp_2020.t2m.mean(dim="valid_time")
mean_2020
<xarray.DataArray 't2m' (latitude: 721, longitude: 1440)> Size: 4MB
array([[260.71088, 260.71088, 260.71088, ..., 260.71088, 260.71088,
        260.71088],
       [260.65814, 260.65912, 260.65976, ..., 260.65652, 260.65717,
        260.65765],
       [260.65555, 260.65732, 260.65878, ..., 260.65115, 260.65292,
        260.6544 ],
       ...,
       [228.85541, 228.85655, 228.85802, ..., 228.85265, 228.85443,
        228.85509],
       [228.80707, 228.80772, 228.80902, ..., 228.80446, 228.80528,
        228.80592],
       [228.39447, 228.39447, 228.39447, ..., 228.39447, 228.39447,
        228.39447]], dtype=float32)
Coordinates:
    number     int64 8B ...
  * latitude   (latitude) float64 6kB 90.0 89.75 89.5 ... -89.5 -89.75 -90.0
  * longitude  (longitude) float64 12kB 0.0 0.25 0.5 0.75 ... 359.2 359.5 359.8
plt.figure(figsize=(9, 9))

ax = plt.axes(projection=ccrs.PlateCarree())
ax.add_feature(cartopy.feature.BORDERS, linestyle='-', alpha=1)
ax.coastlines(resolution='10m')
ax.gridlines(draw_labels=True)
ax.set_extent((-20, 50, 30, 80), ccrs.PlateCarree())

cf = plt.contourf(longitude, lat, mean_2020, cmap='jet')
cb = plt.colorbar(cf, fraction=0.0235, pad=0.02)
cb.set_label('K', fontsize=15, rotation=0)  # kelvin takes no degree sign

plt.show()
plt.figure(figsize=(9, 9))

ax = plt.axes(projection=ccrs.PlateCarree())
ax.add_feature(cartopy.feature.BORDERS, linestyle='-', alpha=1)
ax.coastlines(resolution='10m')
ax.gridlines(draw_labels=True)
ax.set_extent((-20, 50, 30, 80), ccrs.PlateCarree())

# 1950 minus 2020: negative values mark places that were warmer in 2020
cf = plt.contourf(longitude, lat, mean_1950 - mean_2020, cmap='jet')
cb = plt.colorbar(cf, fraction=0.0235, pad=0.02)
cb.set_label('K', fontsize=15, rotation=0)  # kelvin takes no degree sign

plt.show()
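
A difference map is often easier to compare when it comes with a single summary number. One common choice is a mean weighted by the cosine of latitude, because cells on a regular latitude-longitude grid cover less area towards the poles. The land-sea mask downloaded earlier can restrict the mean to land. The sketch below applies both steps to a tiny made-up grid; the 0.5 land threshold and all values are assumptions for illustration.

```python
import numpy as np

# Tiny made-up grid: 3 latitudes x 2 longitudes (illustrative values only)
lat_deg = np.array([60.0, 30.0, 0.0])
diff_k = np.array([[2.0, 2.0],
                   [1.0, 1.0],
                   [0.5, 0.5]])     # temperature difference in kelvin
land_frac = np.array([[1.0, 0.0],
                      [1.0, 1.0],
                      [0.0, 0.2]])  # land-sea mask: fraction of land per cell

# Weight per cell: cos(latitude), repeated along the longitude axis
weights = np.cos(np.deg2rad(lat_deg))[:, np.newaxis] * np.ones_like(diff_k)

# Keep only cells that are mostly land (the threshold is an assumption)
is_land = land_frac > 0.5

area_mean_all = np.average(diff_k, weights=weights)
area_mean_land = np.average(diff_k[is_land], weights=weights[is_land])

print(round(float(area_mean_all), 3), round(float(area_mean_land), 3))
```

With the notebook's data, `diff_k` would correspond to the difference between the two yearly means and `land_frac` to one time step of the `lsm` variable.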