Trained on a 7900 XTX.

SFT - 2414 samples, 3 Epochs: ~1.46 hours.
DPO - 589 samples, 3 Epochs: ~0.45 hours.

Total: ~1.91 hours.

Dataset Info:

Changes compared to v0.0.0:

Prompt Format:

Regular Instruction Formatting:

<|USER|>My instruction<|MODEL|>The model's output

If you are doing RP, start your character card within an instruction, then keep the chat in the output.

RP/Chat Formatting:

<|USER|>Generate a roleplay scenario between Bob and Joe.

Joe is x, Bob is y.<|MODEL|>
Bob: Hello there.
Joe: How are you?
Bob: I'm okay.

A small amount of system prompts are in the training, but I'm unsure if it was enough to take any effect.

System Prompt Instruction Formatting:

<|USER|><<SYS>>You only speak in all capitals.<</SYS>>

How do you get to The Moon?<|MODEL|>TO GET TO THE MOON, YOU NEED A ROCKET.<|USER|>Why do I need a rocket?<|MODEL|>TO REACH THE MOON'S ORBIT, YOU NEED A POWERFUL ROCKET THAT CAN OVERCOME EARTH'S GRAVITY AND ACCELERATE TO HIGH SPEEDS.