v3.1 may not have been the world's most accurate Safesearch model, but the v5.0 series will significantly outperform Google Safesearch and perhaps Bing Safesearch (at least on the benchmark). I will make sure that anyone can reproduce the results in the v5.0 series reports. Please look here for updates: https://huggingface.co/datasets/aistrova/cmad.
AIstrova Safesearch v3.1 is an ultra-precise & efficient multi-class image classifier that accurately detects sexually suggestive or gory images & videos with near-zero false positives.
For website classification, please run the demo on your local computer and change `interactive=False` to `interactive=True`. On the HuggingFace Space, fetching pages with the `requests` library produced incorrect output: the same code works fine on my laptop (in Canada), but on the Space adult websites come back looking like this: <img src="./image.png" alt="blocked" width="20%">
Click here for an official demo of this state-of-the-art model that can be customized based on personal preferences & sensitivity.
Statistics | |
---|---|
Epochs | 4 |
Optimizer | AdamW<br>lr=1e-5<br>weight_decay=1e-2 |
Training Images and GIFs | 4,210,000+ |
Architecture | efficientnet_b1_pruned |
Training Accuracy | 99.0% |
Validation Accuracy | 98.5% |
Training F1-score | 96.8% |
Training CrossEntropyLoss | 0.104 |
Human Evaluation | The training accuracy and F1-score underestimate this model's performance on the dataset.<br>When evaluating the model on the test set, we found that incorrect predictions were mostly a result of mislabeled data.<br>In other words, the model did not overfit to the mislabeled data it saw during training. Instead, it predicted what is actually correct. |
Drastic Improvements from v2 | 1. a more balanced training dataset (before data augmentation: ≈326,000 for `nsfw_suggestive`, ≈345,000 for `safe`, and ≈3,600 for `nsfw_gore`) <br>2. a more robust optimization algorithm<br>3. a careful selection of the base model<br>4. the use of ≈5,000 handpicked AI-generated images for training |
<br>
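The table above reports a training accuracy (99.0%) that is noticeably higher than the training F1-score (96.8%). One plausible reason, given the class counts listed, is that `nsfw_gore` is roughly a hundred times rarer than the other two classes, so mistakes on it barely move accuracy but weigh heavily on a macro-averaged F1. The sketch below illustrates that effect with an invented confusion matrix. The error counts are hypothetical, and this model card does not state which F1 averaging was actually used.

```python
# Hypothetical confusion matrix: rows are true classes, columns are predictions.
# Row totals roughly follow the pre-augmentation class counts in the table;
# the individual error counts are invented purely for illustration.
CLASSES = ["nsfw_suggestive", "safe", "nsfw_gore"]
CONFUSION = [
    [323000, 2900, 100],   # true nsfw_suggestive
    [3000, 341800, 200],   # true safe
    [150, 250, 3200],      # true nsfw_gore
]


def accuracy(cm):
    """Fraction of all samples that sit on the diagonal."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


def per_class_f1(cm):
    """F1 for each class, computed as 2*TP / (2*TP + FP + FN)."""
    scores = []
    for i in range(len(cm)):
        tp = cm[i][i]
        fn = sum(cm[i]) - tp
        fp = sum(row[i] for row in cm) - tp
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


def macro_f1(cm):
    """Unweighted mean of the per-class F1 scores."""
    scores = per_class_f1(cm)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    print(f"accuracy : {accuracy(CONFUSION):.4f}")
    for name, score in zip(CLASSES, per_class_f1(CONFUSION)):
        print(f"F1 {name:<16}: {score:.4f}")
    print(f"macro F1 : {macro_f1(CONFUSION):.4f}")
```

With these made-up numbers the rare class has a visibly lower F1 than the two large classes, which pulls the macro average below the overall accuracy even though almost every sample is classified correctly.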
Data Augmentation
I applied this `transforms` pipeline to the training set to improve the model's ability to generalize on NSFW patterns, particularly for rotated images/GIFs.
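from torchvision import transforms

transform_train = transforms.Compose([
    transforms.Resize((299, 299)),
    # RandomChoice applies exactly one of the entries below to each image.
    # Note: RandomRotation(d) samples an angle uniformly from [-d, d] degrees.
    transforms.RandomChoice([
        transforms.RandomRotation(90),
        transforms.RandomRotation(180),
        transforms.RandomRotation(270),
        transforms.RandomHorizontalFlip(p=0.1),
        # Identity entries leave a share of the images unaugmented.
        transforms.Lambda(lambda x: x),
        transforms.Lambda(lambda x: x),
        transforms.Lambda(lambda x: x)
    ]),
    transforms.ToTensor(),
    # Maps each channel from [0, 1] to [-1, 1].
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])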
<br>
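To make the effect of `RandomChoice` concrete: when no weights are passed, torchvision picks one of the listed transforms uniformly at random, so each of the seven entries in the pipeline has a 1/7 chance of being selected. The arithmetic below is a small sketch of how often a training image ends up rotated, flipped, or left unchanged. It assumes the list exactly as written in the pipeline, and the outcome labels are only for illustration.

```python
from fractions import Fraction

# One entry per transform in the RandomChoice list:
# (outcome if the branch alters the image, probability that it alters the image).
# A rotation whose sampled angle happens to be exactly 0 is ignored as negligible.
BRANCHES = [
    ("rotated", Fraction(1)),        # RandomRotation(90)
    ("rotated", Fraction(1)),        # RandomRotation(180)
    ("rotated", Fraction(1)),        # RandomRotation(270)
    ("flipped", Fraction(1, 10)),    # RandomHorizontalFlip(p=0.1)
    ("unchanged", Fraction(0)),      # identity Lambda
    ("unchanged", Fraction(0)),      # identity Lambda
    ("unchanged", Fraction(0)),      # identity Lambda
]


def outcome_probabilities(branches):
    """Probability of each outcome when one branch is picked uniformly."""
    pick = Fraction(1, len(branches))
    probs = {}
    for outcome, p_alter in branches:
        probs[outcome] = probs.get(outcome, Fraction(0)) + pick * p_alter
        probs["unchanged"] = probs.get("unchanged", Fraction(0)) + pick * (1 - p_alter)
    return probs


if __name__ == "__main__":
    for outcome, p in sorted(outcome_probabilities(BRANCHES).items()):
        print(f"{outcome:<10} {p}  (about {float(p):.3f})")
```

Under these assumptions a little over half of the training images pass through without augmentation, and horizontal flips are rare compared with rotations.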
PyTorch
To download the model, you must first request access: sign in with a HuggingFace account and agree to share your contact information with us using the form at the top of this page.
pip install timm huggingface_hub
import timm
from huggingface_hub import login
HUGGINGFACE_TOKEN = "" # https://huggingface.co/settings/tokens
login(HUGGINGFACE_TOKEN)
model = timm.create_model("hf_hub:aistrova/safesearch-v3.1", pretrained=True)
<br>
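Once the model is loaded, its raw outputs (logits) still have to be turned into a decision. Here is a minimal sketch of that post-processing step, written in plain Python so it runs without the gated weights: a softmax over three logits followed by per-class thresholds, which is one way the sensitivity could be tuned. The label order, the threshold values, and the helper names `softmax` and `classify` are assumptions made for illustration. They are not taken from the model's configuration, so check the model's config for the real label order before relying on it.

```python
import math

# ASSUMPTION: hypothetical label order; the real order comes from the model config.
LABELS = ["nsfw_gore", "nsfw_suggestive", "safe"]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits, thresholds, fallback="safe"):
    """Return (label, probabilities).

    A label listed in `thresholds` is flagged when its probability reaches
    its threshold; the most probable flagged label wins. If nothing is
    flagged, `fallback` is returned.
    """
    probs = dict(zip(LABELS, softmax(logits)))
    flagged = [
        (p, label)
        for label, p in probs.items()
        if label in thresholds and p >= thresholds[label]
    ]
    if flagged:
        return max(flagged)[1], probs
    return fallback, probs


if __name__ == "__main__":
    example_logits = [0.0, 2.0, 1.0]  # made-up values, not real model output
    strict = {"nsfw_gore": 0.9, "nsfw_suggestive": 0.9}
    sensitive = {"nsfw_gore": 0.5, "nsfw_suggestive": 0.5}
    print(classify(example_logits, strict)[0])
    print(classify(example_logits, sensitive)[0])
```

Lowering a threshold makes the filter more sensitive for that class at the cost of more false positives, while raising it does the opposite. With a real model, the logits would come from calling the model on a preprocessed image batch.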
TensorFlow
Here's how to convert a `timm` model to TensorFlow's SavedModel format.
pip install onnx onnx_tf
import timm
import torch
import tensorflow as tf
import onnx
from onnx_tf.backend import prepare
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = timm.create_model("hf_hub:aistrova/safesearch-v3.1", pretrained=True)
model.to(device)
# Export the model to ONNX
batch_size = 1
img_size = 299
sample_input = torch.rand((batch_size, 3, img_size, img_size)).to(device)
onnx_model_path = 'model.onnx'
torch.onnx.export(
    model,
    sample_input,
    onnx_model_path,
    verbose=False,
    input_names=['input'],
    output_names=['output'],
    opset_version=12
)
# Convert the ONNX model to TensorFlow 2
tf_model_path = 'model_tf'
onnx_model = onnx.load(onnx_model_path)
tf_rep = prepare(onnx_model)
tf_rep.export_graph(tf_model_path)
# Load the converted model
model_tf = tf.saved_model.load(tf_model_path)
<br>
For a less accurate deep learning model with a more permissive license, please see v2.
Attribution
- For personal or research projects, please cite `aistrova/safesearch-v3.1` or "AIstrova Technologies Inc." in your `README.md` or project description.
Attribution-NonCommercial-ShareAlike (cc-by-nc-sa-4.0) lets others remix, adapt, and build upon your work non-commercially, as long as they credit you and license their new creations under the identical terms.
License
- The license is currently strictly `cc-by-nc-sa-4.0`, but may change to `apache-2.0` in the future.
- When the license is changed, the `apache-2.0` tag will replace `cc-by-nc-sa-4.0` at the top of the page, below the model name `aistrova/safesearch-v3.1`.