
Model Details

Model Description

git-base-One-Piece is a fine-tuned variant of Microsoft's git-base model, trained to generate descriptive captions for images from the One-Piece-anime-captions dataset.

The dataset consists of 856 {image: caption} pairs, a small, domain-specific training corpus.

The model is conditioned on both CLIP image tokens and text tokens and is trained with teacher forcing: it predicts the next text token given the image tokens and the previous text tokens.
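As an illustration of that training objective, the minimal sketch below runs a single teacher-forcing step with the Hugging Face GIT classes; the image file, caption text, and learning rate are placeholder values, not the actual fine-tuning configuration.

import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForCausalLM

# Start from the base checkpoint that git-base-One-Piece was fine-tuned from
processor = AutoProcessor.from_pretrained("microsoft/git-base")
model = AutoModelForCausalLM.from_pretrained("microsoft/git-base")
model.train()

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)  # placeholder learning rate

# One {image: caption} pair (placeholder file name and caption)
image = Image.open("example_frame.jpg").convert("RGB")
caption = "Monkey D. Luffy grins on the deck of the Going Merry."

# Encode the image into pixel values and the caption into token ids
inputs = processor(images=image, text=caption, return_tensors="pt")

# Teacher forcing: the caption tokens serve as both decoder input and labels, so
# the model learns to predict each next text token from the image tokens plus
# the preceding text tokens
outputs = model(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    pixel_values=inputs.pixel_values,
    labels=inputs.input_ids,
)

outputs.loss.backward()
optimizer.step()
optimizer.zero_grad()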

Limitations

The model is fine-tuned on a small, domain-specific dataset, so it is best suited to One Piece imagery; captions for images outside this domain may be inaccurate or generic.

Usage

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-to-text", model="ayoubkirouane/git-base-One-Piece")
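
The pipeline can then be called on an image directly; the file name below is a placeholder:

# Caption an image (replace the placeholder path with your own file)
result = pipe("your_one_piece_image.jpg")
print(result[0]["generated_text"])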

or

# Load model directly
from transformers import AutoProcessor, AutoModelForCausalLM

processor = AutoProcessor.from_pretrained("ayoubkirouane/git-base-One-Piece")
model = AutoModelForCausalLM.from_pretrained("ayoubkirouane/git-base-One-Piece")
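
Continuing from the snippet above, a caption can be generated roughly as follows; the image path and max_length value are illustrative placeholders:

from PIL import Image

# Placeholder path: replace with your own image
image = Image.open("your_one_piece_image.jpg").convert("RGB")

# Preprocess the image into the pixel values expected by GIT
pixel_values = processor(images=image, return_tensors="pt").pixel_values

# Autoregressively generate a caption and decode it back to text
generated_ids = model.generate(pixel_values=pixel_values, max_length=64)
caption = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(caption)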