Europe Reanalysis Super Resolution

The aim of the project is to create a machine learning (ML) model that can generate high-resolution regional reanalysis data (similar to that produced by CERRA) by downscaling global reanalysis data from ERA5.

This will be accomplished by using state-of-the-art Deep Learning (DL) techniques like U-Net, conditional GAN, and diffusion models (among others). Additionally, an ingestion module will be implemented to assess the possible benefit of using CERRA pseudo-observations as extra predictors. Once the model is designed and trained, a detailed validation framework will be applied.

It combines classical deterministic error metrics with in-depth validations, including time series, maps, spatio-temporal correlations, and computer vision metrics, disaggregated by months, seasons, and geographical regions, to evaluate the effectiveness of the model in reducing errors and representing physical processes. This level of granularity allows for a more comprehensive and accurate assessment, which is critical for ensuring that the model is effective in practice.

Moreover, tools for interpretability of DL models can be used to understand the inner workings and decision-making processes of these complex structures by analyzing the activations of different neurons and the importance of different features in the input data.

This work is funded by the Code for Earth 2023 initiative.

The denoise model is released under the Apache 2.0 license, which permits use, modification and redistribution subject to the license terms.

Model Details

This model corresponds to a Denoise Neural Network trained with instance normalization over bicubic interpolated inputs.

We have implemented a diffusers.UNet2DModel for a Denoising Diffusion Probabilistic Model, with different schedulers: DDPMScheduler, DDIMScheduler and LMSDiscreteScheduler.

Diagram DDPM

Model Description

We present the results of using Diffusion models (DM) for downscaling (from 0.25º to 0.05º) regional reanalysis grids in the Mediterranean area.

Denoise Network

For the Denoise network, we have only explored one architecture, diffusers.UNet2DModel, with different model sizes, ranging from 3 blocks of 64, 128 and 192 output channels to the default configuration of 4 blocks of 224, 448, 672 and 896 output channels.

This network always takes:

Noise Scheduler

Different schedulers have been considered.
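All of these schedulers build on the same forward (noising) process of a Denoising Diffusion Probabilistic Model. The sketch below illustrates that process with NumPy only. The linear beta schedule and its bounds (1e-4 to 0.02 over 1000 timesteps) are assumptions chosen for illustration; the values actually used by this model are those stored in scheduler_config.json. The function name add_noise and the random stand-in field are hypothetical.

```python
import numpy as np

# Assumed schedule: linear betas over 1000 training timesteps.
# The real values for this model live in scheduler_config.json.
NUM_TRAIN_TIMESTEPS = 1000
betas = np.linspace(1e-4, 0.02, NUM_TRAIN_TIMESTEPS)
alphas_cumprod = np.cumprod(1.0 - betas)


def add_noise(x0, noise, t):
    """Sample x_t from q(x_t | x_0) = N(sqrt(a_t) * x0, (1 - a_t) * I)."""
    a_t = alphas_cumprod[t]
    return np.sqrt(a_t) * x0 + np.sqrt(1.0 - a_t) * noise


rng = np.random.default_rng(0)
x0 = rng.standard_normal((240, 160))  # stand-in for a normalized target field
noise = rng.standard_normal(x0.shape)

x_early = add_noise(x0, noise, t=0)                        # close to x0
x_late = add_noise(x0, noise, t=NUM_TRAIN_TIMESTEPS - 1)   # close to pure noise
```

The denoise network is trained to predict the noise that was added at a randomly drawn timestep; the schedulers differ mainly in how they invert this process at sampling time.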

Training Data

The dataset used is a composition of the ERA5 and CERRA reanalysis.

The spatial coverage of the input grids (ERA5) is defined below, and corresponds to a 2D array of dimensions (60, 42):

      longitude: [-8.35, 6.6]
      latitude: [46.45, 35.50]

On the other hand, the target high-resolution grid (CERRA) corresponds to a 2D matrix of dimensions (240, 160):

      longitude: [-6.85, 5.1]
      latitude: [44.95, 37]

The data samples used for training correspond to the period from 1981 to 2013 (both included), and those from 2014 to 2017 are used for per-epoch validation.
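A minimal sketch of this temporal split, assuming a hypothetical daily time axis (the real dataset has its own temporal resolution and loading code):

```python
import numpy as np

# Hypothetical daily time axis covering the training and validation periods.
times = np.arange("1981-01-01", "2018-01-01", dtype="datetime64[D]")
years = times.astype("datetime64[Y]").astype(int) + 1970

train_mask = (years >= 1981) & (years <= 2013)  # both years included
val_mask = (years >= 2014) & (years <= 2017)    # per-epoch validation

train_times = times[train_mask]
val_times = times[val_mask]
```

Splitting by whole years keeps the validation period strictly after the training period, so no validation sample falls between two training samples in time.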

Normalization techniques

All of these normalization techniques have been explored during and after ECMWF Code 4 Earth.

With monthly climatologies. This corresponds to computing the historical climatologies over the training period for each region (pixel or domain), and normalizing with respect to them. In our case, monthly climatologies are considered, but they could also be disaggregated by time of day, for example.
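The pixel-wise variant of this normalization can be sketched as follows. This is an illustration on synthetic data, not the project's implementation: the function names, array layout (time, lat, lon) and the eps guard are assumptions.

```python
import numpy as np


def monthly_climatology(fields, months):
    """Per-month, per-pixel mean and std over the training period.

    fields: array (time, lat, lon); months: array (time,) with values 1..12.
    Returns two arrays of shape (12, lat, lon).
    """
    mean = np.stack([fields[months == m].mean(axis=0) for m in range(1, 13)])
    std = np.stack([fields[months == m].std(axis=0) for m in range(1, 13)])
    return mean, std


def normalize(fields, months, mean, std, eps=1e-8):
    """Standardize each sample with the climatology of its calendar month."""
    return (fields - mean[months - 1]) / (std[months - 1] + eps)


# Synthetic stand-in data: 10 "years" of monthly samples on a tiny grid.
rng = np.random.default_rng(0)
months = np.tile(np.arange(1, 13), 10)
seasonal = 10.0 * np.cos(2 * np.pi * (months - 7) / 12)[:, None, None]
fields = 285.0 + seasonal + rng.standard_normal((months.size, 4, 5))

clim_mean, clim_std = monthly_climatology(fields, months)
anomalies = normalize(fields, months, clim_mean, clim_std)
```

For the domain-wise variant, the mean and standard deviation would also be taken over the two spatial axes, giving one pair of statistics per month instead of one per pixel.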

The dependent approach is not feasible for the pixel-wise schema, because there is no direct correspondence between the input and output patch pixels. If we were interested in doing so, it would be possible to compute the statistics over the bicubic downsampled ERA5 and use them to normalize CERRA.

Without past information. This corresponds to normalizing each sample independently by the mean and standard deviation of the ERA5 field. This is known in the ML community as instance normalization. Here, we have to use only the distribution statistics from the inputs as the outputs will not be available during inference, but 2 different variations are possible in our use case:

The difference between these two approaches is not about calculating the statistics on the downscaled or source ERA5. The difference is that the input patch encompasses a larger area, and therefore has a more dissimilar distribution. Thus, the second approach seems more appropriate, as the distribution over the downscaled area will be more similar to the output distribution.
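Instance normalization as described above can be sketched like this. The function names and the random stand-in fields are hypothetical. Which ERA5 field is passed in (the full input patch, or the interpolated field restricted to the target area) selects the variant.

```python
import numpy as np


def instance_normalize(era5, cerra=None, eps=1e-8):
    """Standardize one sample with the mean/std of its own ERA5 field.

    The same ERA5 statistics are applied to the CERRA target, because the
    target is not available at inference time.
    """
    mean, std = era5.mean(), era5.std()
    era5_norm = (era5 - mean) / (std + eps)
    cerra_norm = None if cerra is None else (cerra - mean) / (std + eps)
    return era5_norm, cerra_norm, (mean, std)


def denormalize(pred_norm, stats, eps=1e-8):
    """Map a normalized prediction back to physical units."""
    mean, std = stats
    return pred_norm * (std + eps) + mean


rng = np.random.default_rng(0)
era5 = 280.0 + 5.0 * rng.standard_normal((60, 42))     # hypothetical input
cerra = 280.0 + 5.0 * rng.standard_normal((240, 160))  # hypothetical target

era5_norm, cerra_norm, stats = instance_normalize(era5, cerra)
restored = denormalize(cerra_norm, stats)
```

Keeping the per-sample statistics alongside the normalized fields is what allows a prediction to be mapped back to physical units after sampling.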


The results of this model are <ins>NOT</ins> considered <ins>ACCEPTABLE</ins>, since they are not comparable with those of bicubic interpolation, a simple method whose output is also used as input to the model. Therefore, although more complex tests were performed, such as including other covariates (e.g. time of day), they are not detailed here because their real effect on the performance of the model cannot be determined.

In this repository, we present the best-performing Diffusion Model, which is trained with the scheduler specified in scheduler_config.json, the parameters shown in config.json, and instance normalization over the downsampled ERA5 inputs.

Below, the sample prediction of the 64M-parameter Diffusion Model, with 1000 inference timesteps, at 00H on January 1, 2018 is compared with the CERRA reanalysis.




There is no significant difference between schedulers in training time or sampling quality (at maximum capabilities). The difference may arise during inference, when DDIM or LMSDiscrete may produce higher-quality samples with fewer inference steps, and consequently at lower computational cost.

As satisfactory performance is not reached at maximum capabilities (inference steps = number of training timesteps), the efficiency of the schedulers during sampling has not been investigated. According to the scientific literature, 40 inference steps (1/25 of the current inference timesteps) may be sufficient.
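To illustrate what a reduced number of inference steps means, the sketch below picks an evenly spaced subset of the 1000 training timesteps. This is only one common spacing convention, written in plain NumPy; individual schedulers may choose the subset differently, and the function name is hypothetical.

```python
import numpy as np


def evenly_spaced_timesteps(num_train_timesteps, num_inference_steps):
    """Pick a descending, evenly spaced subset of the training timesteps."""
    step = num_train_timesteps // num_inference_steps
    return (np.arange(num_inference_steps) * step)[::-1]


full = evenly_spaced_timesteps(1000, 1000)   # current setting: every timestep
reduced = evenly_spaced_timesteps(1000, 40)  # 1/25 of the denoising passes
```

Each selected timestep costs one forward pass of the denoise network, so going from 1000 to 40 steps would cut the sampling cost by a factor of 25, provided the sample quality holds.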

Model sizes

Model size is strongly related to training time, not only because of the time it takes to run the forward and backward passes of the network, but also because the limited memory available to load the samples forces the use of more (smaller) batches to complete each epoch.

With the limited computational resources available, and the dataset considered, the tests carried out indicate that there is an improvement when going from tens of output channels to a few hundred, giving networks of between 20 and 100 million parameters, but that it is not possible to reach the default size due to failures during training (e.g. gradient explosion).

Next steps

As these factors (model size, normalization and noise schedulers) have been extensively explored, research efforts should move to other aspects, such as the following:

Based on the scientific literature for other problems such as Super Resolution in Computer Vision, which work with larger samples (3 channels rather than 1, and more pixels), better results should be achievable with this architecture type and DM flavour.

To tackle the most limiting factor, we think the best options are to explore options 1 and 2.

Compute Infrastructure

The use of GPUs in deep learning projects significantly accelerates model training and inference, leading to substantial reductions in computation time and making it feasible to tackle complex tasks and large datasets with efficiency.

The generosity and collaboration of our partners are instrumental to the success of this project, significantly contributing to our research and development endeavors.


For our project, we have deployed two virtual machines (VMs), each featuring a dedicated Graphics Processing Unit (GPU). One VM is equipped with a 16GB GPU, while the other boasts a more substantial 20GB GPU. This resource configuration allows us to efficiently manage a wide range of computing tasks, from data processing to model training, and ultimately drives the successful execution of our project.


The code used to train and evaluate this model is freely available through its GitHub Repository ECMWFCode4Earth/DeepR hosted in the ECMWF Code 4 Earth organization.


