Overview

Our model, "Is it Phish?", is designed to identify malicious prompts that can be used to generate phishing websites and emails with popular commercial LLMs such as ChatGPT, Bard, and Claude. It was obtained by fine-tuning a pre-trained RoBERTa model on a dataset encompassing multiple sets of malicious prompts, as detailed in our corresponding arXiv paper.

Try out "Is it Phish?" using the Inference API. The model assigns "Label 1" to prompts it identifies as phishing attempts and "Label 0" to prompts it considers safe and non-malicious.

Dataset Details

The dataset used to train this model was created using malicious prompts generated by GPT-4. Due to ethical concerns, the dataset is currently available only upon request.

Training Details

The model was loaded with RobertaForSequenceClassification.from_pretrained, using the RoBERTa-base model and its matching tokenizer. We trained it for 10 epochs with a learning rate of 2e-5 and the AdamW optimizer.
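
The setup described above could be sketched roughly as follows. This is a minimal illustration, not the authors' training script: the `FinetuneConfig` and `finetune` names, the batch size, the tokenization settings, and the absence of a learning-rate schedule or validation loop are all assumptions. Only the base model, the 10 epochs, the 2e-5 learning rate, and the AdamW optimizer come from the description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FinetuneConfig:
    """Hyperparameters; only the first four values are stated in this card."""
    base_model: str = "roberta-base"
    epochs: int = 10
    learning_rate: float = 2e-5
    optimizer: str = "AdamW"
    num_labels: int = 2      # assumption: binary task (Label 0 / Label 1)
    batch_size: int = 16     # assumption: not stated in this card


def finetune(train_texts, train_labels, cfg=FinetuneConfig()):
    """Hypothetical helper: fine-tune RoBERTa-base on (text, label) pairs."""
    # Heavy dependencies are imported lazily so the config can be inspected
    # without torch/transformers installed.
    import torch
    from torch.utils.data import DataLoader, TensorDataset
    from transformers import (
        RobertaForSequenceClassification,
        RobertaTokenizerFast,
    )

    tokenizer = RobertaTokenizerFast.from_pretrained(cfg.base_model)
    model = RobertaForSequenceClassification.from_pretrained(
        cfg.base_model, num_labels=cfg.num_labels
    )
    optimizer = torch.optim.AdamW(model.parameters(), lr=cfg.learning_rate)

    # Tokenize everything up front; fine for a small in-memory dataset.
    enc = tokenizer(
        list(train_texts), truncation=True, padding=True, return_tensors="pt"
    )
    labels = torch.tensor(list(train_labels))
    dataset = TensorDataset(enc["input_ids"], enc["attention_mask"], labels)
    loader = DataLoader(dataset, batch_size=cfg.batch_size, shuffle=True)

    model.train()
    for _ in range(cfg.epochs):
        for input_ids, attention_mask, y in loader:
            optimizer.zero_grad()
            out = model(
                input_ids=input_ids, attention_mask=attention_mask, labels=y
            )
            out.loss.backward()  # cross-entropy loss computed by the model
            optimizer.step()
    return model, tokenizer
```

Calling `finetune(texts, labels)` with a list of prompts and a matching list of 0/1 labels would return the fine-tuned model and tokenizer. The dataset itself is not bundled here, so the inputs are left to the reader.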

Inference

There are multiple ways to use this model. The simplest is the "text-classification" pipeline:

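from transformers import pipeline

# top_k=None returns the scores for every label rather than only the top one.
classifier = pipeline(task="text-classification", model="phishbot/Isitphish", top_k=None)
prompt = ["Your Sample Sentence or Prompt...."]
model_outputs = classifier(prompt)
# model_outputs holds one result per input prompt; print the result for the first prompt.
print(model_outputs[0])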
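
With `top_k=None`, each entry of `model_outputs` should be a list of `{"label": ..., "score": ...}` dictionaries, one per label. The sketch below shows one way to turn such a list into a yes/no decision. The `is_phishing` helper, the 0.5 threshold, and the label string `"LABEL_1"` are assumptions (the last is the usual default naming in transformers when a model config defines no custom label names), so check the actual pipeline output before relying on them.

```python
def is_phishing(label_scores, threshold=0.5, phishing_label="LABEL_1"):
    """Return True if the phishing label's score reaches the threshold.

    label_scores: list of {"label": str, "score": float} dictionaries,
    in the shape produced by a text-classification pipeline with top_k=None.
    """
    for entry in label_scores:
        if entry["label"] == phishing_label:
            return entry["score"] >= threshold
    raise KeyError(f"label {phishing_label!r} not found in pipeline output")


# Hand-written scores for illustration only; these are not real model outputs.
example_scores = [
    {"label": "LABEL_1", "score": 0.9},
    {"label": "LABEL_0", "score": 0.1},
]
flagged = is_phishing(example_scores)
```

In practice this would be called as `is_phishing(model_outputs[0])`. Raising the threshold trades recall for fewer false alarms on benign prompts.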

Results

The model achieves an accuracy of 96% and an F1-score of 0.96 on the test set. The test set distribution is explained in the paper.

Citation

If you find Isitphish to be useful, please cite it with:

@misc{roy2023chatbots,
      title={From Chatbots to PhishBots? -- Preventing Phishing scams created using ChatGPT, Google Bard and Claude}, 
      author={Sayak Saha Roy and Poojitha Thota and Krishna Vamsi Naragam and Shirin Nilizadeh},
      year={2023},
      eprint={2310.19181},
      archivePrefix={arXiv},
      primaryClass={cs.CR}
}