<style> .title-container { display: flex; justify-content: center; align-items: center; height: 100vh; /* Adjust this value to position the title vertically */ } .title { font-size: 3em; text-align: center; color: #333; font-family: Arial, sans-serif; text-transform: uppercase; letter-spacing: 0.05em; padding: 0.5em 0; box-shadow: 0px 0px 20px 0px rgba(0,0,0,0.15); background: transparent; } .title span { background: -webkit-linear-gradient(45deg, #fe6b8b 30%, #ff8e53 90%); -webkit-background-clip: text; -webkit-text-fill-color: transparent; } .image-grid { display: grid; grid-template-columns: repeat(3, 1fr); gap: 0.5em; } .image-item { box-shadow: 0px 0px 10px 0px rgba(0,0,0,0.15); padding: 10px; } .image-item img { width: 100%; height: 100%; object-fit: cover; border-radius: 10px; transition: transform .2s; } .image-item img:hover { transform: scale(1.1); } .custom-table { table-layout: fixed; width: 100%; border-collapse: collapse; } .custom-table td { width: 50%; vertical-align: top; padding: 10px; box-shadow: 0px 0px 10px 0px rgba(0,0,0,0.15); } .custom-image { width: 100%; height: 100%; object-fit: cover; border-radius: 10px; transition: transform .2s; } .custom-image:hover { transform: scale(1.1); } </style>
<h1 class="title"><span>Hermitage XL</span></h1>
<div class="image-grid"> <div class="image-item"> <a href="https://huggingface.co/Linaqruf/hermitage-xl/blob/main/sample_images/sample1.png"> <img src="https://huggingface.co/Linaqruf/hermitage-xl/resolve/main/sample_images/sample1.png"> </a> </div> <div class="image-item"> <a href="https://huggingface.co/Linaqruf/hermitage-xl/blob/main/sample_images/sample2.png"> <img src="https://huggingface.co/Linaqruf/hermitage-xl/resolve/main/sample_images/sample2.png"> </a> </div> <div class="image-item"> <a href="https://huggingface.co/Linaqruf/hermitage-xl/blob/main/sample_images/sample3.png"> <img src="https://huggingface.co/Linaqruf/hermitage-xl/resolve/main/sample_images/sample3.png"> </a> </div> <div class="image-item"> <a href="https://huggingface.co/Linaqruf/hermitage-xl/blob/main/sample_images/sample4.png"> <img src="https://huggingface.co/Linaqruf/hermitage-xl/resolve/main/sample_images/sample4.png"> </a> </div> <div class="image-item"> <a href="https://huggingface.co/Linaqruf/hermitage-xl/blob/main/sample_images/sample5.png"> <img src="https://huggingface.co/Linaqruf/hermitage-xl/resolve/main/sample_images/sample5.png"> </a> </div> <div class="image-item"> <a href="https://huggingface.co/Linaqruf/hermitage-xl/blob/main/sample_images/sample6.png"> <img src="https://huggingface.co/Linaqruf/hermitage-xl/resolve/main/sample_images/sample6.png"> </a> </div> </div>
<hr>
## Overview
Hermitage XL is a high-resolution, latent text-to-image diffusion model. It was fine-tuned from Stable Diffusion XL 1.0 with a learning rate of 4e-7 for 5,000 steps at a batch size of 16 on a curated dataset of high-quality anime-style images.
e.g. `1girl, white hair, golden eyes, beautiful eyes, detail, flower meadow, cumulonimbus clouds, lighting, detailed sky, garden`
- Use it with the Stable Diffusion Webui
- Use it with 🧨 diffusers
- Use it with ComfyUI
<hr>
## Features
- High-Resolution Images: The model was trained at a base resolution of 1024x1024 using the NovelAI Aspect Ratio Bucketing Tool, so it can also be trained on non-square resolutions (a rough sketch of how such buckets are chosen follows this list).
- Anime-Styled Generation: The model produces high-quality anime-style images from text prompts.
- Fine-Tuned Diffusion Process: The model uses a fine-tuned diffusion process to ensure high-quality and unique image output.
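For intuition, here is a rough sketch of how aspect-ratio buckets are typically chosen: dimensions in multiples of 64 whose pixel count stays close to the 1024x1024 budget. The exact bucket set used during training depends on the bucketing tool's configuration, so treat this as an illustration rather than the training recipe.

```python
# Rough illustration of NovelAI-style aspect-ratio bucketing: enumerate
# (width, height) pairs in multiples of 64 whose pixel count stays at or
# below the 1024x1024 budget. The actual buckets used for training depend
# on the tool's configuration; this is only a sketch.
base_area = 1024 * 1024
step = 64

buckets = []
for width in range(576, 2048 + 1, step):
    height = (base_area // width) // step * step  # largest multiple of 64 within the budget
    if 576 <= height <= 2048:
        buckets.append((width, height))

print(buckets[:5])  # [(576, 1792), (640, 1600), (704, 1472), (768, 1344), (832, 1216)]
print((1024, 1024) in buckets)  # True: the square base resolution is included
```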
<hr>
## Model Details
- Developed by: Linaqruf
- Model type: Diffusion-based text-to-image generative model
- Model Description: This is a model that can be used to generate and modify anime-themed images based on text prompts.
- License: CreativeML Open RAIL++-M License
- Finetuned from model: Stable Diffusion XL 1.0

<hr>
## How to Use
- Download Hermitage XL from this repository; the model is in `.safetensors` format.
- Use Danbooru-style tags as the prompt instead of natural language; otherwise you will get realistic results instead of anime.
- You can use any generic negative prompt, or use the following suggested negative prompt to guide the model towards high-aesthetic generations:
  `lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry`
- The following should also be prepended to prompts to get high-aesthetic results (a small sketch of assembling a full prompt follows this list):
  `masterpiece, best quality, illustration, beautiful detailed, finely detailed, dramatic light, intricate details`
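Putting these together, here is a minimal sketch of how a full prompt might be assembled; the tag lists are the ones suggested above, while the `subject` string is just a hypothetical example:

```python
# Prepend the suggested quality tags to a Danbooru-style subject prompt and
# pair it with the suggested negative prompt. `subject` is a hypothetical example.
quality_tags = (
    "masterpiece, best quality, illustration, beautiful detailed, "
    "finely detailed, dramatic light, intricate details"
)
negative_prompt = (
    "lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, "
    "fewer digits, cropped, worst quality, low quality, normal quality, "
    "jpeg artifacts, signature, watermark, username, blurry"
)

subject = "1girl, white hair, golden eyes, flower meadow, detailed sky"
prompt = f"{quality_tags}, {subject}"
```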
<hr>
## 🧨 Diffusers
Make sure to upgrade diffusers to >= 0.18.2:

```bash
pip install diffusers --upgrade
```

In addition, make sure to install `transformers`, `safetensors`, `accelerate`, as well as the invisible watermark:

```bash
pip install invisible_watermark transformers accelerate safetensors
```
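If you want to double-check the environment, a quick optional sanity check:

```python
# Confirm the installed diffusers version supports SDXL pipelines (>= 0.18.2).
import diffusers

print(diffusers.__version__)
```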
Running the pipeline (if you don't swap the scheduler, it will run with the default EulerDiscreteScheduler; in this example we swap it to EulerAncestralDiscreteScheduler):

```python
import torch
from diffusers import StableDiffusionXLPipeline, EulerAncestralDiscreteScheduler
from diffusers.models import AutoencoderKL

model = "Linaqruf/hermitage-xl"

# Load the SDXL VAE separately and pass it to the pipeline
vae = AutoencoderKL.from_pretrained("stabilityai/sdxl-vae")

pipe = StableDiffusionXLPipeline.from_pretrained(
    model,
    torch_dtype=torch.float16,
    use_safetensors=True,
    variant="fp16",
    vae=vae,
)

# Swap the default EulerDiscreteScheduler for EulerAncestralDiscreteScheduler
pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)
pipe.to("cuda")

prompt = "masterpiece, best quality, 1girl, green hair, sweater, looking at viewer, upper body, beanie, outdoors, watercolor, night, turtleneck"
negative_prompt = "lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry"

image = pipe(
    prompt,
    negative_prompt=negative_prompt,
    width=1024,
    height=1024,
    guidance_scale=12,
    target_size=(1024, 1024),
    original_size=(4096, 4096),
    num_inference_steps=50,
).images[0]

image.save("anime_girl.png")
```
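Because the model was trained with aspect-ratio bucketing (see Features above), non-square resolutions also work. A minimal sketch reusing the `pipe`, `prompt`, and `negative_prompt` from the example above; the 832x1216 portrait size is an assumed example, not a value taken from this card:

```python
# Generate a portrait-orientation image; 832x1216 is an assumed example size,
# chosen to keep the pixel count close to 1024x1024.
portrait = pipe(
    prompt,
    negative_prompt=negative_prompt,
    width=832,
    height=1216,
    guidance_scale=12,
    num_inference_steps=50,
).images[0]

portrait.save("anime_girl_portrait.png")
```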
<hr>
## Limitations
- This model inherits the limitations of Stable Diffusion XL 1.0.
- The model is overfitted and does not follow prompts well, because it was fine-tuned for only 5,000 steps on a small dataset.
- It is only a preview model, intended to find good hyperparameters and a training config for Stable Diffusion XL 1.0.
<hr>
## Example
Here are some cherry-picked samples and comparisons between available models:
<table class="custom-table"> <tr> <td> <a href="https://huggingface.co/Linaqruf/hermitage-xl/blob/main/sample_images/image1.png"> <img class="custom-image" src="https://huggingface.co/Linaqruf/hermitage-xl/resolve/main/sample_images/image1.png" alt="sample1"> </a> <a href="https://huggingface.co/Linaqruf/hermitage-xl/blob/main/sample_images/image3.png"> <img class="custom-image" src="https://huggingface.co/Linaqruf/hermitage-xl/resolve/main/sample_images/image3.png" alt="sample3"> </a> </td> <td> <a href="https://huggingface.co/Linaqruf/hermitage-xl/blob/main/sample_images/image2.png"> <img class="custom-image" src="https://huggingface.co/Linaqruf/hermitage-xl/resolve/main/sample_images/image2.png" alt="sample2"> </a> <a href="https://huggingface.co/Linaqruf/hermitage-xl/blob/main/sample_images/image4.png"> <img class="custom-image" src="https://huggingface.co/Linaqruf/hermitage-xl/resolve/main/sample_images/image4.png" alt="sample4"> </a> </td> </tr> </table>