Perceiver IO masked language model

This model is a Perceiver IO model pretrained on the masked language modeling (MLM) task using a text corpus created from C4 and English Wikipedia. It is weight-equivalent to the deepmind/language-perceiver model but is built on the implementation classes of the perceiver-io library. It can be created from the deepmind/language-perceiver model with a library-specific conversion utility. Both models produce identical output for the same input.

The content of the deepmind/language-perceiver model card also applies to this model, except for the usage examples. Refer to the linked card for further model and training details.

Model description

The model is specified in Section 4 (Table 1) and Appendix F (Table 11) of the Perceiver IO paper (UTF-8 byte tokenization, vocabulary size of 262, 201M parameters).
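
To make the byte-level vocabulary concrete, here is a minimal sketch of UTF-8 byte tokenization in plain Python. It is not the tokenizer shipped with this model. The split of the 262 entries into 256 byte values plus 6 special tokens, the order of those special tokens, and the `encode`/`decode` helper names are assumptions made for illustration.

```python
# Illustrative sketch only; not the perceiver-io or transformers tokenizer.
# Assumption: 6 special tokens occupy ids 0..5, followed by the 256 byte values.
SPECIAL_TOKENS = ["[PAD]", "[BOS]", "[EOS]", "[MASK]", "[CLS]", "[SEP]"]
NUM_SPECIAL = len(SPECIAL_TOKENS)
VOCAB_SIZE = 256 + NUM_SPECIAL  # 262, matching the vocabulary size stated above


def encode(text):
    """Map text to ids: [CLS], then one id per UTF-8 byte (shifted), then [SEP]."""
    cls_id = SPECIAL_TOKENS.index("[CLS]")
    sep_id = SPECIAL_TOKENS.index("[SEP]")
    byte_ids = [b + NUM_SPECIAL for b in text.encode("utf-8")]
    return [cls_id] + byte_ids + [sep_id]


def decode(ids):
    """Drop special-token ids and turn the remaining ids back into text."""
    data = bytes(i - NUM_SPECIAL for i in ids if i >= NUM_SPECIAL)
    return data.decode("utf-8", errors="replace")


ids = encode("héllo")
print(VOCAB_SIZE)   # 262
print(len(ids))     # 8: six bytes ("é" takes two) plus [CLS] and [SEP]
print(decode(ids))  # héllo
```

Because every id stands for a single byte, one masked position corresponds to one byte rather than one word, which is why the usage examples below mask several consecutive positions to hide a single word.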

Intended use

Although the raw model can be used directly for masked language modeling, its main use case is fine-tuning. This can be fine-tuning with masked language modeling and whole word masking on an unlabeled dataset (example), or fine-tuning on a labeled dataset using the pretrained encoder of this model for weight initialization (example).
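
As a rough illustration of what whole word masking means for a byte-level model, the sketch below replaces every byte of a selected word with a mask placeholder, so a word is either fully visible or fully hidden. It is a simplified stand-in, not the masking code of the perceiver-io library. The function name `whole_word_mask`, its parameters, and the whitespace-based notion of a word are assumptions.

```python
import random
import re

MASK = "[MASK]"  # placeholder emitted once per masked byte


def whole_word_mask(text, mask_prob=0.15, seed=0):
    """Mask whole whitespace-delimited words, one [MASK] per UTF-8 byte.

    Hypothetical helper for illustration; whitespace is never masked here.
    """
    rng = random.Random(seed)
    pieces = []
    # The capturing group keeps the whitespace separators in the split result.
    for piece in re.split(r"(\s+)", text):
        is_word = bool(piece) and not piece.isspace()
        if is_word and rng.random() < mask_prob:
            pieces.append(MASK * len(piece.encode("utf-8")))
        else:
            pieces.append(piece)
    return "".join(pieces)


# Which words get masked depends on the seed and on mask_prob.
print(whole_word_mask("some words are missing", mask_prob=0.5, seed=1))
```

Masking at the word level rather than at independent byte positions prevents the model from trivially completing a word from its remaining visible bytes.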

Usage examples

To use this model, first install the perceiver-io library with the text extension.

pip install "perceiver-io[text]"

Then the model can be used with PyTorch. Either use the model and tokenizer directly:

from transformers import AutoModelForMaskedLM, AutoTokenizer
from perceiver.model.text import mlm  # auto-class registration

repo_id = "krasserm/perceiver-io-mlm"

model = AutoModelForMaskedLM.from_pretrained(repo_id)
tokenizer = AutoTokenizer.from_pretrained(repo_id)

masked_text = "This is an incomplete sentence where some words are" \
              "[MASK][MASK][MASK][MASK][MASK][MASK][MASK][MASK][MASK]"

encoding = tokenizer(masked_text, return_tensors="pt")
outputs = model(**encoding)

# get predictions for 9 [MASK] tokens (exclude [SEP] token at the end)
masked_token_predictions = outputs.logits[0, -10:-1].argmax(dim=-1)
print(tokenizer.decode(masked_token_predictions))

or use a fill-mask pipeline:

from transformers import pipeline
from perceiver.model.text import mlm  # auto-class registration

repo_id = "krasserm/perceiver-io-mlm"

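masked_text = "This is an incomplete sentence where some words are" \
              "[MASK][MASK][MASK][MASK][MASK][MASK][MASK][MASK][MASK]"

# one list of candidate predictions is returned per [MASK] token
filler_pipeline = pipeline("fill-mask", model=repo_id)
masked_token_predictions = filler_pipeline(masked_text)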
print("".join([pred[0]["token_str"] for pred in masked_token_predictions]))

Model conversion

The krasserm/perceiver-io-mlm model has been created from the source deepmind/language-perceiver model with:

from perceiver.model.text.mlm import convert_model



Citation

@article{jaegle2021perceiver,
  title={Perceiver IO: A General Architecture for Structured Inputs \& Outputs},
  author={Jaegle, Andrew and Borgeaud, Sebastian and Alayrac, Jean-Baptiste and Doersch, Carl and Ionescu, Catalin and Ding, David and Koppula, Skanda and Zoran, Daniel and Brock, Andrew and Shelhamer, Evan and others},
  journal={arXiv preprint arXiv:2107.14795},
  year={2021}
}