Tabula Rasa

Trained on a 7900 XTX.

SFT - 2404 samples, 3 Epochs: ~1.36 hours.
DPO - 583 samples, 3 Epochs: ~0.42 hours.

Total: ~1.78 hours.

Dataset Info:

Prompt Format:

Regular Instruction Formatting:

<|USER|>My instruction<|MODEL|>The model's output
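
As a minimal sketch of the template above, the prompt can be assembled with plain string formatting. The helper name `format_instruction` is hypothetical (not part of any released code), and the sketch assumes the `<|USER|>` and `<|MODEL|>` markers are passed to the tokenizer as literal text:

```python
def format_instruction(instruction: str, output: str = "") -> str:
    """Build a single-turn prompt in the <|USER|>...<|MODEL|>... layout.

    Leave `output` empty to get a prompt that ends at <|MODEL|>,
    ready for the model to continue. (Hypothetical helper.)
    """
    return f"<|USER|>{instruction}<|MODEL|>{output}"


# A full training-style example, and a generation prompt with no output yet.
example = format_instruction("My instruction", "The model's output")
generation_prompt = format_instruction("My instruction")
```

Ending the string at `<|MODEL|>` is what lets the model fill in the output itself at inference time.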

For RP, put the character card inside the instruction, then keep the chat itself in the model's output.

RP/Chat Formatting:

<|USER|>Generate a roleplay scenario between Bob and Joe.

Joe is x, Bob is y.<|MODEL|>
Bob: Hello there.
Joe: How are you?
Bob: I'm okay.
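
The RP layout above could be built along these lines. This is only a sketch: `build_rp_prompt` and its parameters are hypothetical names, and the blank line between the scenario and the character card, plus the newline after `<|MODEL|>`, are copied from the example rather than confirmed as requirements:

```python
def build_rp_prompt(scenario: str, character_card: str,
                    turns: list[tuple[str, str]]) -> str:
    """Put the scenario and character card in the instruction and the chat
    so far in the model output, one "Speaker: text" line per turn.
    (Hypothetical helper; spacing mirrors the example above.)
    """
    chat = "\n".join(f"{speaker}: {text}" for speaker, text in turns)
    return f"<|USER|>{scenario}\n\n{character_card}<|MODEL|>\n{chat}"


rp_prompt = build_rp_prompt(
    "Generate a roleplay scenario between Bob and Joe.",
    "Joe is x, Bob is y.",
    [("Bob", "Hello there."), ("Joe", "How are you?"), ("Bob", "I'm okay.")],
)
```

Because the whole chat lives in the output, continuing the conversation means appending new lines to the end of the string and letting the model generate from there.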

A small number of system prompts are in the training data, but I'm unsure whether there were enough to have any effect.

System Prompt Instruction Formatting:

<|USER|><<SYS>>You only speak in all capitals.<</SYS>>

How do you get to The Moon?<|MODEL|>TO GET TO THE MOON, YOU NEED A ROCKET.<|USER|>Why do I need a rocket?<|MODEL|>TO REACH THE MOON'S ORBIT, YOU NEED A POWERFUL ROCKET THAT CAN OVERCOME EARTH'S GRAVITY AND ACCELERATE TO HIGH SPEEDS.
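
A multi-turn prompt with an optional system prompt could be assembled as sketched below. `build_chat_prompt` is a hypothetical helper; placing the `<<SYS>>` block only in the first user turn, followed by a blank line, is an assumption taken from the example above:

```python
from typing import Optional


def build_chat_prompt(turns: list[tuple[str, str]],
                      system: Optional[str] = None) -> str:
    """Join (user, model) turns into one prompt string.

    The system prompt, if given, is wrapped in <<SYS>>...<</SYS>> and placed
    at the start of the first user turn (assumption based on the example).
    Use an empty model string in the last turn to request a new reply.
    """
    parts = []
    for index, (user, model) in enumerate(turns):
        prefix = f"<<SYS>>{system}<</SYS>>\n\n" if index == 0 and system else ""
        parts.append(f"<|USER|>{prefix}{user}<|MODEL|>{model}")
    return "".join(parts)


chat_prompt = build_chat_prompt(
    [
        ("How do you get to The Moon?", "TO GET TO THE MOON, YOU NEED A ROCKET."),
        ("Why do I need a rocket?", ""),  # empty output: the model answers next
    ],
    system="You only speak in all capitals.",
)
```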

LLM Leaderboard Results:

Average: 64.34
ARC: 62.20
HellaSwag: 84.10
MMLU: 64.14
TruthfulQA: 46.94