text-classification

Hyperion 🏹

Hyperion is an extremely lightweight (435M parameters) RoBERTa-based binary classifier that detects jailbreak/prompt injection attempts with 88% accuracy based on test cases.

We are continously releasing open-source models created during our research on prompt injection & model alignment. These models are not state of the art, but a very limited preview of our current capabilities. Hyperion was one of our very early tests in our process to build infrastructure for real time jailbreak detection. Smaller models and lightweight infrastructure are cheaper and faster to provide rapid responses to the emerging cat and mouse game for adversarial prompting. To learn more about us, visit our website!

Intended Use

Training Data

Validation Metrics

Considerations

Caveats and Recommendations