Fixed-roberta-base
This is roberta-base with a resized embedding matrix and an extra dimension in the token type embedding matrix, for better sharding/partitioning.
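As a rough illustration of why resizing can help with sharding, the sketch below pads the stock roberta-base sizes (vocab_size 50265, type_vocab_size 1) so they divide more evenly. The multiple of 128, the helper name round_up_to_multiple, and the reading of "extra dim" as one extra row are assumptions made for illustration; the sizes this model actually uses are not stated here.

```python
def round_up_to_multiple(n: int, multiple: int) -> int:
    """Return the smallest multiple of `multiple` that is >= n."""
    if multiple <= 0:
        raise ValueError("multiple must be positive")
    return ((n + multiple - 1) // multiple) * multiple


# Sizes from the stock roberta-base configuration.
ROBERTA_BASE_VOCAB_SIZE = 50265
ROBERTA_BASE_TYPE_VOCAB_SIZE = 1

# Assumption: a hypothetical padding multiple, not taken from this model.
SHARD_MULTIPLE = 128

# Pad the vocabulary so the embedding rows split evenly across shards.
padded_vocab = round_up_to_multiple(ROBERTA_BASE_VOCAB_SIZE, SHARD_MULTIPLE)

# Assumption: "an extra dim" is read here as one additional row (1 -> 2).
padded_type_vocab = ROBERTA_BASE_TYPE_VOCAB_SIZE + 1

print(padded_vocab, padded_type_vocab)
```

A vocabulary of 50265 rows is odd, so it cannot be split evenly across any even number of devices, whereas the padded size in this sketch can. With Hugging Face Transformers, the word embedding matrix of a loaded model can be resized with `model.resize_token_embeddings(new_size)`; whether this model was produced that way is not stated.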