Tabula Rasa
Trained on a 7900 XTX.
SFT - 2404 samples, 3 Epochs: ~1.36 hours.
DPO - 583 samples, 3 Epochs: ~0.42 hours.
Total: ~1.78 hours.
Dataset Info:
- Dataset is based on LIMA.
- Started by removing a large portion of low quality samples from it (low quality, and refusals), then started expanding it from there using only samples I felt were high enough quality.
- More details will be included in a newer release.
Prompt Format:
- Should handle instructions well, and be able to do Chat/RP decently.
- Model should be used with
unbantokens
so that it can generate EOS tokens.
Regular Instruction Formatting:
<|USER|>My instruction<|MODEL|>The model's output
If you are doing RP, start your character card within an instruction, then keep the chat in the output.
RP/Chat Formatting:
<|USER|>Generate a roleplay scenario between Bob and Joe.
Joe is x, Bob is y.<|MODEL|>
Bob: Hello there.
Joe: How are you?
Bob: I'm okay.
A small amount of system prompts are in the training, but I'm unsure if it was enough to take any effect.
System Prompt Instruction Formatting:
<|USER|><<SYS>>You only speak in all capitals.<</SYS>>
How do you get to The Moon?<|MODEL|>TO GET TO THE MOON, YOU NEED A ROCKET.<|USER|>Why do I need a rocket?<|MODEL|>TO REACH THE MOON'S ORBIT, YOU NEED A POWERFUL ROCKET THAT CAN OVERCOME EARTH'S GRAVITY AND ACCELERATE TO HIGH SPEEDS.
LLM Leaderboard Results:
Average | ARC | HellaSwag | MMLU | TruthfulQA |
---|---|---|---|---|
64.34 | 62.20 | 84.10 | 64.14 | 46.94 |