conan1024hao/cjkbert-small

Model description

This model was trained on the Chinese (ZH), Japanese (JA), and Korean (KO) Wikipedia for 5 epochs.

How to use

from transformers import AutoTokenizer, AutoModelForMaskedLM
# Load the tokenizer and the masked-language-model weights from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("conan1024hao/cjkbert-small")
model = AutoModelForMaskedLM.from_pretrained("conan1024hao/cjkbert-small")

No text segmentation is required before fine-tuning on downstream tasks.
(You may, however, obtain better results by applying morphological analysis to the data before fine-tuning.)

Morphological analysis tools

ZH: For Chinese, we use LTP.
JA: For Japanese, we use Juman++.
KO: For Korean, we use KoNLPy (Kkma class).

Tokenization

We use character-based tokenization with a whole-word-masking strategy.

Model size

vocab_size: 15015
num_hidden_layers: 4
hidden_size: 512
num_attention_heads: 8
param_num: 25M
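
As a rough sanity check on these numbers, the sketch below estimates the parameter count of a standard BERT encoder from its configuration. It assumes the usual BERT layout; `intermediate_size`, `max_position_embeddings`, and `type_vocab_size` are not given above, so the values used here are assumptions (4 × hidden_size is a common convention, not a confirmed setting of this model). The helper name `estimate_bert_params` is hypothetical, and the pooler and masked-LM head are not counted.

```python
def estimate_bert_params(vocab_size, hidden_size, num_hidden_layers,
                         intermediate_size, max_position_embeddings=512,
                         type_vocab_size=2):
    """Rough parameter count for a standard BERT encoder (no pooler, no MLM head).

    num_attention_heads is not an argument: splitting the hidden size across
    heads does not change the number of weights.
    """
    h = hidden_size
    # Token, position and segment embeddings, plus one LayerNorm (weight + bias).
    embeddings = (vocab_size + max_position_embeddings + type_vocab_size) * h + 2 * h
    # Self-attention: query, key, value and output projections with bias, plus LayerNorm.
    attention = 4 * (h * h + h) + 2 * h
    # Feed-forward: two linear layers with bias, plus LayerNorm.
    feed_forward = (h * intermediate_size + intermediate_size) \
        + (intermediate_size * h + h) + 2 * h
    return embeddings + num_hidden_layers * (attention + feed_forward)


if __name__ == "__main__":
    # intermediate_size is an assumed value (4 x hidden_size), not taken from the card.
    total = estimate_bert_params(vocab_size=15015, hidden_size=512,
                                 num_hidden_layers=4, intermediate_size=4 * 512)
    print(f"{total / 1e6:.1f}M parameters (encoder only, assumed config)")
```

The estimate depends strongly on the assumed intermediate size, so it should be read as an order-of-magnitude check rather than a reproduction of the reported figure.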
