Trained on a 7900 XTX.
SFT - 2414 samples, 3 Epochs: ~1.46 hours.
DPO - 589 samples, 3 Epochs: ~0.45 hours.
Total: ~1.91 hours.
Dataset Info:
- Dataset is based on LIMA.
- Started by removing a large portion of low-quality samples and refusals from it, then expanded it from there using only samples I felt were high enough quality.
Changes compared to v0.0.0:
- Not a large change, just some fixes and cleaning.
- Added more ChatGPT Code SFT samples.
- Added more DPO samples received through Google Forms.
- Added Earl Sweatshirt Lyrics SFT samples.
- Added some ShareGPT Hyperclean SFT samples.
- Added an Intel Open Orca RLHF DPO sample.
- Fix having too many samples that end with "in conclusion" or "in summary".
- Fix some samples not having the prompt tags.
- Fix some new lines being over- or under-escaped.
- Remove empty output samples.
- Change up some DPO fixing samples so that they have a better chosen response and a more up-to-date, realistic rejected response.
- Reduce size of some markdown charts for consistency.
- Replace some Fix Mistakes DPO samples.
Prompt Format:
- Should handle instructions well and be able to do Chat/RP decently.
- Model should be used with unbantokens so that it can generate EOS tokens.
Regular Instruction Formatting:
<|USER|>My instruction<|MODEL|>The model's output
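For reference, a minimal Python sketch of building this prompt string (the format_instruction helper is hypothetical; only the <|USER|> and <|MODEL|> tags come from this card):

```python
def format_instruction(instruction: str) -> str:
    # Wrap a single instruction in the prompt tags documented above.
    return f"<|USER|>{instruction}<|MODEL|>"

# Send this string to your text-completion backend and let the model
# generate until it emits an EOS token.
prompt = format_instruction("My instruction")
```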
If you are doing RP, start your character card within an instruction, then keep the chat in the output.
RP/Chat Formatting:
<|USER|>Generate a roleplay scenario between Bob and Joe.
Joe is x, Bob is y.<|MODEL|>
Bob: Hello there.
Joe: How are you?
Bob: I'm okay.
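A rough sketch of driving a chat with this layout (build_rp_prompt and the variable names are illustrative, not part of any particular API):

```python
def build_rp_prompt(scenario: str, chat_lines: list[str]) -> str:
    # Character card / scenario goes in the instruction; the ongoing chat
    # stays in the model's output, as described above.
    return f"<|USER|>{scenario}<|MODEL|>\n" + "\n".join(chat_lines) + "\n"

scenario = "Generate a roleplay scenario between Bob and Joe.\nJoe is x, Bob is y."
history = ["Bob: Hello there.", "Joe: How are you?", "Bob: I'm okay."]

# Ask the backend to continue from here; append its reply to `history`
# before building the next prompt.
prompt = build_rp_prompt(scenario, history)
```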
A small number of system prompts are in the training data, but I'm unsure if it was enough to have any effect.
System Prompt Instruction Formatting:
<|USER|><<SYS>>You only speak in all capitals.<</SYS>>
How do you get to The Moon?<|MODEL|>TO GET TO THE MOON, YOU NEED A ROCKET.<|USER|>Why do I need a rocket?<|MODEL|>TO REACH THE MOON'S ORBIT, YOU NEED A POWERFUL ROCKET THAT CAN OVERCOME EARTH'S GRAVITY AND ACCELERATE TO HIGH SPEEDS.
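For multi-turn use with a system prompt, a sketch along these lines should reproduce the layout above (build_prompt is a made-up helper; the <<SYS>> wrapping and alternating tags are from this card, everything else is an assumption):

```python
def build_prompt(system: str, turns: list[tuple[str, str | None]]) -> str:
    # `turns` is a list of (user_message, model_reply) pairs; leave the last
    # reply as None so the model completes it. The system prompt is wrapped
    # in <<SYS>> tags inside the first user turn, as shown above.
    prompt = ""
    for i, (user, reply) in enumerate(turns):
        if i == 0 and system:
            user = f"<<SYS>>{system}<</SYS>>\n{user}"
        prompt += f"<|USER|>{user}<|MODEL|>"
        if reply is not None:
            prompt += reply
    return prompt

prompt = build_prompt(
    "You only speak in all capitals.",
    [
        ("How do you get to The Moon?", "TO GET TO THE MOON, YOU NEED A ROCKET."),
        ("Why do I need a rocket?", None),  # model completes this turn
    ],
)
```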