Tabula Rasa

Trained on a 7900 XTX.

SFT - 2404 samples, 3 Epochs: ~1.36 hours.
DPO - 583 samples, 3 Epochs: ~0.42 hours.

Total: ~1.78 hours.

Dataset Info:

Dataset is based on LIMA.
I started by removing a large portion of the low-quality samples and refusals from it, then expanded it using only samples I felt were high enough quality.
More details will be included in a future release.

Prompt Format:

The model should handle instructions well and be able to do Chat/RP decently.
It should be used with unbantokens so that it can generate EOS tokens.

Regular Instruction Formatting:

<|USER|>My instruction<|MODEL|>The model's output
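
A minimal sketch of building this prompt in Python. The helper name `format_instruction` is hypothetical and not part of any released tooling; it only concatenates the two tags exactly as shown above.

```python
USER_TAG = "<|USER|>"
MODEL_TAG = "<|MODEL|>"


def format_instruction(instruction: str, output: str = "") -> str:
    """Build one instruction turn (hypothetical helper).

    Leave `output` empty to get a prompt that ends at <|MODEL|>,
    ready for the model to continue.
    """
    return f"{USER_TAG}{instruction}{MODEL_TAG}{output}"


# Prompt to send for generation: ends right after the model tag.
prompt = format_instruction("My instruction")
print(prompt)
```

Passing the second argument reproduces a full training-style sample, which can be handy for few-shot prompts.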

If you are doing RP, put your character card inside the instruction, then keep the chat itself in the output.

RP/Chat Formatting:

<|USER|>Generate a roleplay scenario between Bob and Joe.

Joe is x, Bob is y.<|MODEL|>
Bob: Hello there.
Joe: How are you?
Bob: I'm okay.
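
The layout above can be assembled along these lines. This is a sketch only: `format_rp_prompt` and its parameters are made-up names, and the newline after `<|MODEL|>` and the `Name: text` line format are simply copied from the example.

```python
from typing import List, Tuple


def format_rp_prompt(scenario: str, card: str, turns: List[Tuple[str, str]]) -> str:
    """Build an RP/chat prompt (hypothetical helper).

    The scenario request and character card go in the instruction,
    separated by a blank line; the chat log goes in the output.
    """
    header = f"<|USER|>{scenario}\n\n{card}<|MODEL|>\n"
    chat = "\n".join(f"{speaker}: {text}" for speaker, text in turns)
    return header + chat


rp_prompt = format_rp_prompt(
    "Generate a roleplay scenario between Bob and Joe.",
    "Joe is x, Bob is y.",
    [("Bob", "Hello there."), ("Joe", "How are you?"), ("Bob", "I'm okay.")],
)
print(rp_prompt)
```

To continue the chat, append new lines to the end of the output and let the model generate from there.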

A small number of system prompts are in the training data, but I'm unsure whether there were enough to have any effect.

System Prompt Instruction Formatting:

<|USER|><<SYS>>You only speak in all capitals.<</SYS>>

How do you get to The Moon?<|MODEL|>TO GET TO THE MOON, YOU NEED A ROCKET.<|USER|>Why do I need a rocket?<|MODEL|>TO REACH THE MOON'S ORBIT, YOU NEED A POWERFUL ROCKET THAT CAN OVERCOME EARTH'S GRAVITY AND ACCELERATE TO HIGH SPEEDS.
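
A sketch of assembling a multi-turn prompt with a system prompt, following the example above. `format_chat` is a hypothetical helper. It assumes the system block appears only in the first user turn and that turns are joined with no separator, since that is all the example shows.

```python
from typing import List, Optional, Tuple


def format_chat(system: Optional[str], turns: List[Tuple[str, Optional[str]]]) -> str:
    """Build a multi-turn prompt (hypothetical helper).

    `turns` is a list of (user_message, model_reply) pairs. Use None as
    the last model_reply to leave the prompt open for generation.
    """
    parts = []
    for i, (user_msg, model_msg) in enumerate(turns):
        if i == 0 and system:
            # Assumption: the system prompt is wrapped in <<SYS>> tags and
            # placed before the first user message, followed by a blank line.
            user_msg = f"<<SYS>>{system}<</SYS>>\n\n{user_msg}"
        parts.append(f"<|USER|>{user_msg}<|MODEL|>{model_msg or ''}")
    return "".join(parts)


chat_prompt = format_chat(
    "You only speak in all capitals.",
    [
        ("How do you get to The Moon?", "TO GET TO THE MOON, YOU NEED A ROCKET."),
        ("Why do I need a rocket?", None),  # open turn for the model to answer
    ],
)
print(chat_prompt)
```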

LLM Leaderboard Results:

Average	ARC	HellaSwag	MMLU	TruthfulQA
64.34	62.20	84.10	64.14	46.94
