Model Card for yolochess_mlm_azure-cloud-35

This 66M-parameter model was pre-trained from scratch with masked language modeling on chess positions in FEN format.
It is intended for downstream fine-tuning, e.g. text classification for human moves.

Model Details

Model Description

Developed by: Jonathan Rahn
Model type: DistilBERT
Language(s) (NLP): Chess FEN
License: MIT

Uses

Direct Use

The model was pre-trained from scratch with masked language modeling on chess positions in FEN format, so it can be used directly to fill in masked tokens of a FEN string.

Downstream Use

The model is intended for downstream fine-tuning, e.g. text classification for human moves.

Out-of-Scope Use

Anything other than Chess Positions in standard FEN format.
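
To make "standard FEN format" concrete, the following is a minimal sketch of a structural check for the six space-separated FEN fields. The function name looks_like_fen is hypothetical and not part of this model or of any library; the check covers only the syntax of the string, not whether the position is legal.

```python
def looks_like_fen(fen: str) -> bool:
    """Rough structural check for a standard six-field FEN string."""
    fields = fen.split(" ")
    if len(fields) != 6:
        return False
    placement, side, castling, en_passant, halfmove, fullmove = fields

    # Piece placement: 8 ranks, each describing exactly 8 squares.
    ranks = placement.split("/")
    if len(ranks) != 8:
        return False
    for rank in ranks:
        squares = 0
        for ch in rank:
            if ch in "12345678":
                squares += int(ch)      # run of empty squares
            elif ch in "pnbrqkPNBRQK":
                squares += 1            # one piece
            else:
                return False
        if squares != 8:
            return False

    # Side to move.
    if side not in ("w", "b"):
        return False

    # Castling rights: "-" or a non-empty, duplicate-free subset of "KQkq".
    if castling != "-":
        if not castling or set(castling) - set("KQkq"):
            return False
        if len(set(castling)) != len(castling):
            return False

    # En passant target: "-" or a square on rank 3 or 6.
    if en_passant != "-":
        if len(en_passant) != 2:
            return False
        if en_passant[0] not in "abcdefgh" or en_passant[1] not in "36":
            return False

    # Halfmove clock and fullmove number are plain decimal integers.
    for counter in (halfmove, fullmove):
        if not (counter.isascii() and counter.isdigit()):
            return False
    return int(fullmove) >= 1


start = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
print(looks_like_fen(start))
print(looks_like_fen("not a chess position"))
```

A string that still contains a [MASK] placeholder fails this check by design, so it is meant for raw inputs rather than for masked prompts.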

Bias, Risks, and Limitations

n/a

Recommendations

n/a

How to Get Started with the Model

Use the code below to get started with the model.

# Load the tokenizer and the model with its masked-language-modeling head.
from transformers import AutoModelForMaskedLM, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("jrahn/yolochess_mlm_azure-cloud-35")
model = AutoModelForMaskedLM.from_pretrained("jrahn/yolochess_mlm_azure-cloud-35")

# Alternatively, use the fill-mask pipeline to predict the masked token of a FEN string.
from transformers import pipeline
pipe = pipeline("fill-mask", model="jrahn/yolochess_mlm_azure-cloud-35")
pipe("6k1/8/8/1pB3[MASK]P/1P3P2/8/8/8 w - - 1 74")

Training Details

Training Data

Lichess-Elite 22-11 Dataset

Training Procedure

Masked Language Modeling objective with 15% masked token ratio.
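
As an illustration of what a 15% masking ratio means, here is a simplified sketch in plain Python. It is not the training code: the function mask_tokens and the IGNORE marker are hypothetical, the tokens are single characters for readability, and every selected token is replaced by [MASK]. The actual training setup may have applied additional replacement rules.

```python
import random

MASK_TOKEN = "[MASK]"
IGNORE = None  # hypothetical marker: position does not contribute to the loss


def mask_tokens(tokens, mask_prob=0.15, rng=None):
    """Independently mask each token with probability mask_prob.

    Returns (inputs, labels): masked positions carry the original token as
    label, all other positions carry IGNORE.
    """
    rng = rng or random.Random()
    inputs, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            inputs.append(MASK_TOKEN)
            labels.append(tok)
        else:
            inputs.append(tok)
            labels.append(IGNORE)
    return inputs, labels


# Character-level tokens of the starting position, for illustration only.
tokens = list("rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1")
inputs, labels = mask_tokens(tokens, mask_prob=0.15, rng=random.Random(0))
print("".join(inputs))
```

The model is then trained to predict the label at each masked position from the surrounding, unmasked context.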

Preprocessing

The FEN strings in data["train"]["fen"] were tokenized with the default distilbert-base-cased tokenizer, using max-length padding to 200 tokens. This setup is inefficient in two ways. First, most of the vocabulary never occurs in FEN, so the corresponding embedding parameters are wasted. Second, the sequence length used in preprocessing and the position-embedding size of the model are much larger than needed, which leads to a lot of padding and further wasted parameters: FENs should be shorter than 90 characters. Experiments with a reduced max-length in tokenization show performance gains.
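
The vocabulary overhead can be made tangible with a rough back-of-the-envelope sketch. It rests on assumptions that are not stated in this card: a vocabulary size of 28,996 and a hidden size of 768 for distilbert-base-cased, and a hypothetical character-level vocabulary as the point of comparison (special tokens are ignored).

```python
# Characters that can appear in a standard six-field FEN string.
FEN_CHARS = set(
    "pnbrqkPNBRQK"  # pieces; also covers castling rights and side "b"
    "0123456789"    # empty-square counts, move counters, en passant rank
    "abcdefgh"      # en passant file
    "/ -w"          # rank separator, field separator, "none" marker, side "w"
)

# Assumptions: values for distilbert-base-cased, not taken from this card.
VOCAB_SIZE = 28_996
HIDDEN_SIZE = 768

# Token-embedding parameters with the full vocabulary versus a hypothetical
# character-level FEN vocabulary of the same hidden size.
embedding_params = VOCAB_SIZE * HIDDEN_SIZE
char_level_params = len(FEN_CHARS) * HIDDEN_SIZE

print(len(FEN_CHARS))      # 33
print(embedding_params)    # 22268928
print(char_level_params)   # 25344
```

Under these assumptions, roughly 22M of the 66M parameters sit in the token-embedding matrix, most of it in rows that FEN input never touches.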

Speeds, Sizes, Times

Training for 172,500 steps at batch size 128 (about 22M examples, 1 epoch) took ~10 hrs on 1x RTX 4090, using 20 GB VRAM, with a final MLM loss of 0.2567.
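
The figures above can be cross-checked with a few lines of arithmetic. This is only a sketch: the 10-hour duration is approximate, so the derived throughput values are rough estimates.

```python
steps = 172_500
batch_size = 128
hours = 10  # approximate wall-clock time reported above

examples = steps * batch_size          # examples seen in one epoch
seconds = hours * 3600

steps_per_second = steps / seconds
examples_per_second = examples / seconds

print(examples)                        # 22080000
print(round(steps_per_second, 2))      # 4.79
print(round(examples_per_second))      # 613
```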

Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).

Hardware Type: 1x RTX 4090
Hours used: 10
Cloud Provider: local
Compute Region: local
Carbon Emitted: 1.5 kg

Technical Specifications

Model Architecture and Objective

DistilBERT, masked language modeling