cyrilzhang/gpt2-numfix

GPT-2 Tokenizer with unmerged digits

A fork of the GPT-2 tokenizer that removes multi-digit tokens, so each digit is encoded as its own token:

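from transformers import AutoTokenizer

# Fork with multi-digit tokens removed
tokenizer = AutoTokenizer.from_pretrained('cyrilzhang/gpt2-numfix')
# Original GPT-2 tokenizer, loaded for comparison
gpt2_tokenizer = AutoTokenizer.from_pretrained('gpt2')

tokenizer('123.45')['input_ids']  # [16, 17, 18, 13, 19, 20]
gpt2_tokenizer('123.45')['input_ids']  # [10163, 13, 2231]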
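
To make the effect concrete without downloading anything, here is a minimal sketch of digit-level splitting. It is not how the fork is implemented (it changes the tokenizer's vocabulary, not the input text); `split_digits` is a hypothetical helper that only mimics the resulting segmentation of numbers.

```python
import re
from typing import List

# Hypothetical helper, not part of the tokenizer: emit one piece per ASCII digit
# and keep every run of non-digit characters together as a single piece.
_DIGIT_OR_RUN = re.compile(r"[0-9]|[^0-9]+")


def split_digits(text: str) -> List[str]:
    """Split text so that every digit stands alone."""
    return _DIGIT_OR_RUN.findall(text)


pieces = split_digits("123.45")
print(pieces)
```

With this segmentation the number of pieces in a number grows with its digit count, so digits in the same position line up across operands, which is the property I care about for arithmetic.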

Backward-compatible:

tokenizer.decode([10163, 46387])  # '<unused123> pigeon'
gpt2_tokenizer.decode([10163, 46387])  # '123 pigeon'
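
One way to get this kind of backward compatibility is to keep every token ID in place and only rename the multi-digit entries to placeholders. The sketch below does that on a toy vocabulary. It is an assumption about the general approach, not the fork's actual code: `retire_multi_digit_tokens` is a hypothetical function, and the placeholder numbering (a simple counter) is made up, since the scheme behind `<unused123>` is not stated here.

```python
import re
from typing import Dict

# A token made of two or more ASCII digits, optionally preceded by "Ġ",
# the marker GPT-2's byte-level BPE uses for a leading space.
_MULTI_DIGIT = re.compile(r"Ġ?[0-9]{2,}")


def retire_multi_digit_tokens(id_to_token: Dict[int, str]) -> Dict[int, str]:
    """Return a copy of the vocabulary with multi-digit tokens renamed.

    IDs are never added, removed, or reordered, so the vocabulary size
    (and any embedding matrix indexed by ID) keeps its shape.
    """
    patched: Dict[int, str] = {}
    counter = 0  # assumed numbering scheme, for illustration only
    for token_id in sorted(id_to_token):
        token = id_to_token[token_id]
        if _MULTI_DIGIT.fullmatch(token):
            patched[token_id] = f"<unused{counter}>"
            counter += 1
        else:
            patched[token_id] = token
    return patched


# Toy vocabulary using the IDs that appear in the examples above.
toy_vocab = {16: "1", 17: "2", 18: "3", 10163: "123", 46387: "Ġpigeon"}
patched_vocab = retire_multi_digit_tokens(toy_vocab)
print(patched_vocab)
```

Single digits and ordinary words keep their strings, and the retired ID still decodes to something instead of raising an error.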

This is for my investigations into the arithmetic capabilities of large language models. There is no model here, only a tokenizer.
PaLM also splits numbers into individual digit tokens, which I think is very reasonable.
Many models (most famously GPT-3) do not, because they use the GPT-2 tokenizer.
