fill-mask

This model is derived from the bert-base-uncased checkpoint by replacing the GELU activation function with ReLU. It was then further pre-trained for several iterations so that the weights could adapt to the new activation function.
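To illustrate what the swap changes numerically, the sketch below compares the two activation functions using only the Python standard library. It assumes the exact erf-based form of GELU; the function names `gelu` and `relu` are illustrative and are not taken from this model's code.

```python
import math


def gelu(x: float) -> float:
    # Exact (erf-based) GELU: x * Phi(x), where Phi is the standard normal CDF.
    # Assumption: this is the GELU variant being replaced.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def relu(x: float) -> float:
    # ReLU: passes positive inputs through unchanged, zeroes out the rest.
    return max(0.0, x)


# Print both activations side by side for a few sample inputs.
for x in (-3.0, -1.0, 0.0, 1.0, 3.0):
    print(f"x={x:+.1f}  gelu={gelu(x):+.4f}  relu={relu(x):+.4f}")
```

The two functions nearly coincide for large positive or large negative inputs and differ most around zero, where GELU is smooth and slightly negative while ReLU is exactly zero. That difference is why a checkpoint trained with GELU would plausibly need further pre-training after the swap.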