Model card for Japanese T5 v1.1

This is a T5 v1.1 model, pre-trained on a Japanese corpus.

Model details

T5 is a Transformer-based encoder-decoder model. Version 1.1 makes the following improvements over the original T5:

- GEGLU activation in the feed-forward hidden layers instead of ReLU.
- Dropout turned off during pre-training (it should be re-enabled for fine-tuning).
- Pre-training on unlabeled data only, without mixing in downstream tasks.
- No parameter sharing between the embedding and classifier layers.
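One of T5 v1.1's changes is replacing the feed-forward ReLU with a gated GELU (GEGLU). A minimal scalar sketch, where the function names are illustrative and the two arguments stand in for the layer's two learned projections of the same input:

```python
import math

def gelu(x: float) -> float:
    # Exact GELU via the Gaussian CDF: x * Phi(x).
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def geglu(a: float, b: float) -> float:
    # GEGLU gating as used in T5 v1.1's feed-forward block:
    # GELU of one projection multiplies (gates) the other.
    return gelu(a) * b
```

In the actual model this is applied elementwise to the outputs of two parallel linear layers before the final down-projection.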

This model is based on T5 v1.1 and was pre-trained on a Japanese corpus consisting of Japanese Wikipedia and mC4/ja (the Japanese portion of the multilingual C4 corpus).

Training Details

We used T5X (https://github.com/google-research/t5x) to train this model, and the checkpoint was then converted to the Hugging Face Transformers format.
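Since the checkpoint is distributed in the Hugging Face Transformers format, it can be loaded with the standard T5 classes. A minimal sketch, where the default model ID is a placeholder rather than the actual repository name:

```python
def load_japanese_t5(model_id: str = "your-org/japanese-t5-v1.1"):
    """Load the tokenizer and model for this checkpoint.

    The default model_id is a placeholder; replace it with the
    actual repository name on the Hugging Face Hub.
    """
    # Imported lazily so the sketch can be read without transformers installed.
    from transformers import AutoTokenizer, T5ForConditionalGeneration

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = T5ForConditionalGeneration.from_pretrained(model_id)
    return tokenizer, model
```

After loading, `model.generate` on tokenized inputs produces output token IDs that `tokenizer.decode` turns back into text; note that a pre-trained-only checkpoint like this one is intended for fine-tuning rather than direct use.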

Training Data

The training data used is Japanese Wikipedia and mC4/ja.

Preprocessing

The following filtering was applied to the training data.

Training Hyperparameters

Speeds, Sizes, Times

We trained the model for 2,097,152 (2^21) steps.

Technical Specifications

Model Architecture and Objective

The model follows the T5 v1.1 encoder-decoder Transformer architecture and was pre-trained with T5's span-corruption (denoising) objective.
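T5's span-corruption objective replaces contiguous spans of the input with sentinel tokens and trains the decoder to reconstruct the removed text. A self-contained sketch of the input/target format, where the helper name is illustrative and spans are given explicitly (real pre-training samples them randomly):

```python
def make_denoising_example(text, spans):
    """Build a T5 span-corruption (input, target) pair.

    Each (start, end) character span is replaced by a sentinel token
    <extra_id_i> in the input; the target lists each sentinel followed
    by the removed text, ending with one final sentinel.
    """
    spans = sorted(spans)
    input_parts, target_parts = [], []
    cursor = 0
    for i, (start, end) in enumerate(spans):
        sentinel = f"<extra_id_{i}>"
        input_parts.append(text[cursor:start])   # keep unmasked text
        input_parts.append(sentinel)             # mask the span
        target_parts.append(sentinel)
        target_parts.append(text[start:end])     # decoder must produce this
        cursor = end
    input_parts.append(text[cursor:])
    target_parts.append(f"<extra_id_{len(spans)}>")  # closing sentinel
    return "".join(input_parts), "".join(target_parts)
```

For example, `make_denoising_example("ABCDEF", [(1, 3)])` returns `("A<extra_id_0>DEF", "<extra_id_0>BC<extra_id_1>")`.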

Compute Infrastructure

The model was trained on a Google Cloud TPU v3-32.

Software

T5X (https://github.com/google-research/t5x) was used for training; the released checkpoint is in the Hugging Face Transformers format.

More Information

https://note.com/retrieva/n/n7b4186dc5ada (in Japanese)

Model Card Authors

Jiro Nishitoba

Model Card Contact

pr@retrieva.jp