v3.1 may not have been the world's most accurate Safesearch model, but the v5.0 series will significantly outperform Google Safesearch and perhaps Bing Safesearch (at least on the benchmark). I will make sure that anyone can reproduce the results in the v5.0 series reports. Please look here for updates: https://huggingface.co/datasets/aistrova/cmad.


AIstrova Safesearch v3.1 is an ultra-precise & efficient multi-class image classifier that accurately detects sexually suggestive or gory images & videos with near-zero false positives.

For website classification, please run the demo on your local computer and change `interactive=False` to `interactive=True`. On the demo hosted on HuggingFace Space, fetching pages with the `requests` library produced incorrect output. In other words, website classification works perfectly fine on my laptop (in Canada) but not on the hosted Space, because adult websites look like this when requested from the Space: <img src="./image.png" alt="blocked" width="20%">

An official demo of this state-of-the-art model is available and can be customized based on personal preferences and sensitivity.
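
As a rough illustration of what sensitivity customization could look like, the sketch below turns per-class probabilities into a decision using user-chosen thresholds. The class names (`safe`, `nsfw_suggestive`, `nsfw_gore`) are the ones this model card lists for the training data; the function name `apply_sensitivity`, the threshold values, and the thresholding rule itself are assumptions made for illustration, not the demo's actual logic.

```python
from typing import Dict

# Hypothetical per-class thresholds: a lower value means stricter filtering.
DEFAULT_THRESHOLDS: Dict[str, float] = {
    "nsfw_suggestive": 0.80,
    "nsfw_gore": 0.80,
}


def apply_sensitivity(probs: Dict[str, float],
                      thresholds: Dict[str, float] = DEFAULT_THRESHOLDS) -> str:
    """Return the flagged NSFW class, or "safe" if no threshold is reached."""
    # Keep only the NSFW classes whose probability reaches their threshold.
    flagged = {
        label: p for label, p in probs.items()
        if label in thresholds and p >= thresholds[label]
    }
    if not flagged:
        return "safe"
    # If several classes are flagged, report the most probable one.
    return max(flagged, key=flagged.get)


probs = {"safe": 0.25, "nsfw_suggestive": 0.70, "nsfw_gore": 0.05}
print(apply_sensitivity(probs))  # safe (0.70 is below the default 0.80)
print(apply_sensitivity(probs, {"nsfw_suggestive": 0.5, "nsfw_gore": 0.5}))  # nsfw_suggestive
```

Lowering a threshold makes the filter stricter for that class, so a user who wants fewer false positives would raise it instead.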

Statistics

| Statistic | Value |
| --- | --- |
| Epochs | 4 |
| Optimizer | AdamW<br>lr=1e-5<br>weight_decay=1e-2 |
| Training Images and GIFs | 4,210,000+ |
| Architecture | efficientnet_b1_pruned |
| Training Accuracy | 99.0% |
| Validation Accuracy | 98.5% |
| Training F1-score | 96.8% |
| Training CrossEntropyLoss | 0.104 |
| Human Evaluation | The training accuracy and F1-score underestimate this model's performance on the dataset.<br>When evaluating the model on the test set, we found that incorrect predictions were mostly the result of mislabeled data.<br>In other words, the model identified mislabeled data during training and did not overfit. Instead, it predicted what is actually correct. |
| Drastic Improvements from v2 | 1. A more balanced training dataset (before data augmentation: ≈326,000 for nsfw_suggestive, ≈345,000 for safe, and ≈3,600 for nsfw_gore)<br>2. A more robust optimization algorithm<br>3. A careful selection of the base model<br>4. The use of ≈5,000 handpicked AI-generated images for training |
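
The optimizer settings listed in the statistics above map onto PyTorch roughly as follows. This is a minimal sketch, not the author's training script: the helper name `build_training_setup` and the `nn.Linear` stand-in model are hypothetical, and only the AdamW hyperparameters and the cross-entropy loss come from the statistics.

```python
import torch
from torch import nn


def build_training_setup(model: nn.Module):
    """Create the loss and optimizer using the hyperparameters listed above."""
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.AdamW(
        model.parameters(),
        lr=1e-5,            # learning rate from the statistics
        weight_decay=1e-2,  # weight decay from the statistics
    )
    return criterion, optimizer


# Stand-in model with 3 outputs (one per class); the real model is an
# efficientnet_b1_pruned, which is not constructed here.
stand_in = nn.Linear(8, 3)
criterion, optimizer = build_training_setup(stand_in)

# One illustrative optimization step on random data.
inputs = torch.randn(4, 8)
targets = torch.tensor([0, 1, 2, 1])
loss = criterion(stand_in(inputs), targets)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```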

<br>

Data Augmentation

I applied the following transform pipeline to the training set to improve the model's ability to generalize on NSFW patterns, particularly for rotated images/GIFs.

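from torchvision import transforms

transform_train = transforms.Compose([
    transforms.Resize((299, 299)),
    # Apply exactly one of the options below to each sample.
    transforms.RandomChoice([
        # RandomRotation(d) samples an angle uniformly from [-d, d] degrees.
        transforms.RandomRotation(90),
        transforms.RandomRotation(180),
        transforms.RandomRotation(270),
        transforms.RandomHorizontalFlip(p=0.1),
        # Identity options: leave a share of the samples unaugmented.
        transforms.Lambda(lambda x: x),
        transforms.Lambda(lambda x: x),
        transforms.Lambda(lambda x: x)
    ]),
    transforms.ToTensor(),
    # Map pixel values from [0, 1] to [-1, 1].
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])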

<br>

PyTorch

You must request access first: sign up for a HuggingFace account and agree to share your contact information with us using the form at the top of the page.

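# Install the dependencies first: pip install timm huggingface_hub
import timm
from huggingface_hub import login

HUGGINGFACE_TOKEN = ""  # create one at https://huggingface.co/settings/tokens

login(HUGGINGFACE_TOKEN)  # authenticate so the gated weights can be downloaded
model = timm.create_model("hf_hub:aistrova/safesearch-v3.1", pretrained=True)
model.eval()  # switch to inference mode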

<br>

TensorFlow

Here's how to convert a timm model to TensorFlow's SavedModel format.

# Install the conversion dependencies first: pip install onnx onnx_tf
import timm
import torch
import tensorflow as tf
import onnx
from onnx_tf.backend import prepare

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = timm.create_model("hf_hub:aistrova/safesearch-v3.1", pretrained=True)
model.to(device)
model.eval()  # export in inference mode (fixed batch-norm statistics, no dropout)

# Export the model to ONNX
batch_size = 1
img_size = 299
sample_input = torch.rand((batch_size, 3, img_size, img_size)).to(device)
onnx_model_path = 'model.onnx'
torch.onnx.export(
    model,
    sample_input,
    onnx_model_path,
    verbose=False,
    input_names=['input'],
    output_names=['output'],
    opset_version=12
)

# Convert the ONNX model to TensorFlow 2
tf_model_path = 'model_tf'
onnx_model = onnx.load(onnx_model_path)
tf_rep = prepare(onnx_model)
tf_rep.export_graph(tf_model_path)

# Load the converted model
model_tf = tf.saved_model.load(tf_model_path)

<br>

For a less accurate deep learning model with a more permissive license, please see v2.

Attribution

Attribution-NonCommercial-ShareAlike (cc-by-nc-sa-4.0) lets others remix, adapt, and build upon this work non-commercially, as long as they credit the author and license their new creations under identical terms.

License