intro

A 1.38 GB corpus of Chinese chat logs from private QQ groups
14 million tokens
Trained for 17 hours on a single 3060 GPU

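This is my first attempt at training an AI model, made while learning how to train GPT-2. It is for reference only.

Generated output is for reference only; this model makes no guarantee that its results are lawful or reasonable.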

Link

Training a causal language model from scratch

infer code


from transformers import GPT2LMHeadModel, AutoTokenizer

# Hub model id; a local checkpoint directory such as "checkpoint-16000" also works
model_name_or_path = "isaachong127/gpt2_chinese_with_personal_qqchat_data"
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)

# add the EOS token as PAD token to avoid warnings
model = GPT2LMHeadModel.from_pretrained(model_name_or_path, pad_token_id=tokenizer.eos_token_id)

txt = """\
今天
"""
# encode context the generation is conditioned on
input_ids = tokenizer.encode(txt, return_tensors='pt')
# beam search with 5 beams; no_repeat_ngram_size=2 prevents any 2-gram from repeating
beam_output = model.generate(
    input_ids, 
    max_length=100, 
    num_beams=5, 
    no_repeat_ngram_size=2, 
    early_stopping=True
)

print("Output:\n" + 100 * '-')
print(tokenizer.decode(beam_output[0], skip_special_tokens=True))

Output:
----------------------------------------------------------------------------------------------------
今天 已 经 是 你 的 第 667 次 签 到 啦 ～ 纱 雾 酱 对 乃 的 好 感 度 [ + 10 ] 2021 年 ， 要 加 油 哦 ~ ','签 到 ','@ \ u202e
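
The decoded text above has a space between every token, which is how the tokenizer's `decode` joins Chinese characters. As a minimal sketch (not part of the original model card), the spaces between non-ASCII characters can be removed with a regular expression while keeping the spaces around ASCII runs such as numbers. The helper name `join_cjk_tokens` is hypothetical, and treating every non-ASCII character as CJK is a simplifying assumption.

```python
import re

# Whitespace that sits between two non-ASCII characters (assumed here to be CJK).
# Lookarounds leave the neighbouring characters untouched, so runs of
# single-character tokens are joined in one pass.
_CJK_GAP = re.compile(r"(?<=[^\x00-\x7f])\s+(?=[^\x00-\x7f])")


def join_cjk_tokens(text: str) -> str:
    """Remove spaces between adjacent non-ASCII characters; keep all other spacing."""
    return _CJK_GAP.sub("", text)


sample = "今天 已 经 是 你 的 第 667 次 签 到 啦"
print(join_cjk_tokens(sample))  # 今天已经是你的第 667 次签到啦
```

To use it with the inference code, pass the result of `tokenizer.decode(...)` through `join_cjk_tokens` before printing.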
