Fixed-roberta-base

This is roberta-base with a resized word embedding matrix and an extra dimension in the token type embedding matrix, both changed for better sharding/partitioning.
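
The sketch below illustrates why resizing can help with partitioning: a matrix splits cleanly across shards only when its row count is divisible by the shard count. The stock roberta-base config has `vocab_size = 50265` and `type_vocab_size = 1`, neither of which divides evenly. The shard count, the helper names, and the zero-filled padding rows are assumptions made for illustration; the exact sizes used by this checkpoint are not stated here, and "extra dimension" is read as one additional row.

```python
# Values from the stock roberta-base config.
ROBERTA_BASE_VOCAB_SIZE = 50265
ROBERTA_BASE_TYPE_VOCAB_SIZE = 1


def round_up_to_multiple(size: int, multiple: int) -> int:
    """Return the smallest multiple of `multiple` that is >= `size`."""
    if multiple <= 0:
        raise ValueError("multiple must be positive")
    return -(-size // multiple) * multiple  # ceiling division, then scale back


def pad_rows(matrix, new_rows):
    """Return a copy of `matrix` with zero rows appended up to `new_rows` rows.

    Zero-filling is an illustrative choice only; a real resize may
    initialize the new rows differently.
    """
    if new_rows < len(matrix):
        raise ValueError("new_rows must not be smaller than the current row count")
    width = len(matrix[0])
    padding = [[0.0] * width for _ in range(new_rows - len(matrix))]
    return [list(row) for row in matrix] + padding


# Hypothetical shard count, not taken from this model.
NUM_SHARDS = 8

padded_vocab_size = round_up_to_multiple(ROBERTA_BASE_VOCAB_SIZE, NUM_SHARDS)
# Assumption: "an extra dim" means one extra row in the token type matrix.
padded_type_vocab_size = ROBERTA_BASE_TYPE_VOCAB_SIZE + 1

# Toy token type matrix (1 row, 4 columns) standing in for the real one.
toy_token_type_embeddings = [[0.1, 0.2, 0.3, 0.4]]
padded_toy = pad_rows(toy_token_type_embeddings, padded_type_vocab_size)

print(padded_vocab_size, padded_vocab_size % NUM_SHARDS)
print(len(padded_toy), len(padded_toy) % 2)
```

With these assumed numbers, both padded row counts are divisible by the shard count used to split them, so every shard receives the same number of rows. The padded rows correspond to token ids that the tokenizer never produces, so they should not affect the model's outputs on normal input.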