Model description

An autoencoder model trained to compress information from Sentinel-2 satellite images, using a Vision Transformer (ViT) as the encoder backbone to extract features. The model's latent space has 1024 neurons, which can be used to generate embeddings from Sentinel-2 satellite images.

The model was trained using Sentinel-2 bands 2, 3, and 4 (Blue, Green, and Red, i.e. the RGB bands) and imagery from the 81 municipalities of Colombia with the most dengue cases.

The input shape of the model is (224, 224, 3). To extract features, remove the last layer.

The model can be loaded as follows (example in a Jupyter notebook):

!git lfs install
!git clone https://huggingface.co/MITCriticalData/Sentinel-2_ViT_Autoencoder_RGB_full_Colombia_Dataset
    
import tensorflow as tf
from transformers import TFViTModel


model = tf.keras.models.load_model(
    'Sentinel-2_ViT_Autoencoder_RGB_full_Colombia_Dataset',
    custom_objects={"TFViTModel": TFViTModel},
)

You can extract embeddings by removing the last layer:

import tensorflow as tf

# Copy every layer except the last (the reconstruction head)
# into a new model that outputs the 1024-dim embedding.
embedding_model = tf.keras.Sequential()
for layer in model.layers[:-1]:
    embedding_model.add(layer)
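As a sketch of how an input patch might be prepared for the truncated model (the preprocessing below is an assumption; the card only specifies the (224, 224, 3) input shape):

```python
import numpy as np

def preprocess(patch):
    """Cast a single 224x224 RGB patch to float32 and add the batch
    dimension the model expects. Resizing and value scaling are assumed
    to be handled upstream."""
    assert patch.shape == (224, 224, 3), "model expects 224x224 RGB input"
    return patch.astype(np.float32)[np.newaxis, ...]  # shape (1, 224, 224, 3)

patch = np.random.rand(224, 224, 3)  # stand-in for a real Sentinel-2 patch
batch = preprocess(patch)
# Feeding `batch` to the truncated model's predict() yields a 1024-dim embedding.
```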

Intended uses & limitations

The model was trained with images of 81 different municipalities in Colombia; it may therefore require fine-tuning or retraining to generalize to other contexts, such as other countries or continents.
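A minimal sketch of such fine-tuning, assuming a Keras autoencoder like the one on this card (a tiny dense stand-in is built here so the snippet runs on its own; the layer sizes, which layers to freeze, and the 1e-4 learning rate are all illustrative assumptions):

```python
import tensorflow as tf

# Tiny stand-in for the loaded autoencoder so the snippet is self-contained;
# the real model takes (224, 224, 3) input and has a 1024-neuron latent space.
autoencoder = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(8, 8, 3)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(16, name="latent"),       # stand-in latent layer
    tf.keras.layers.Dense(8 * 8 * 3, name="head"),  # stand-in decoder head
])

# Freeze all but the last two layers, then recompile with a lower learning
# rate than the original 0.001 so the pretrained weights adapt gently.
for layer in autoencoder.layers[:-2]:
    layer.trainable = False
autoencoder.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
                    loss="mse")
```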

Training and evaluation data

The model was trained, using an asymmetric autoencoder, on satellite images of the 81 municipalities in Colombia with the most dengue cases, extracted from Sentinel-2 using the RGB bands. Images likely to introduce noise into the data, such as fully black images, were filtered out prior to training.
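The card does not specify the exact filtering rule; a plausible sketch of such a black-image filter, with illustrative thresholds, might look like:

```python
import numpy as np

def is_mostly_black(img, darkness=0.01, max_fraction=0.5):
    """Return True when more than `max_fraction` of pixels are near zero
    across all bands. Both thresholds are illustrative assumptions."""
    dark_pixels = img.max(axis=-1) < darkness
    return bool(dark_pixels.mean() > max_fraction)

black_tile = np.zeros((224, 224, 3))
normal_tile = np.full((224, 224, 3), 0.4)
# Keep only tiles that are not mostly black:
# clean_tiles = [t for t in tiles if not is_mostly_black(t)]
```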

The dataset was split into train and test sets, with 80% used for training and 20% for testing.
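A sketch of that 80/20 split with a seeded shuffle (the seed and file names are illustrative, not from the card):

```python
import numpy as np

tile_paths = [f"tile_{i:04d}.png" for i in range(100)]  # hypothetical file list

rng = np.random.default_rng(seed=0)   # seed is illustrative
order = rng.permutation(len(tile_paths))
cut = int(0.8 * len(tile_paths))      # 80% train, 20% test
train_paths = [tile_paths[i] for i in order[:cut]]
test_paths = [tile_paths[i] for i in order[cut:]]
```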

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

Hyperparameter      Value
name                Adam
learning_rate       0.001
decay               0.0
beta_1              0.9
beta_2              0.999
epsilon             1e-07
amsgrad             False
training_precision  float32
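These values match Keras's default Adam configuration; a sketch of constructing that optimizer (`decay=0.0` is already the default and is omitted, since recent Keras releases dropped that argument):

```python
import tensorflow as tf

# Adam optimizer matching the hyperparameters listed above.
optimizer = tf.keras.optimizers.Adam(
    learning_rate=0.001,
    beta_1=0.9,
    beta_2=0.999,
    epsilon=1e-07,
    amsgrad=False,
)
```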