Lumina-Yume-v0.1

This model is based on Lumina-Image-2.0, a 2-billion-parameter flow-based diffusion transformer (DiT). For more information, visit here.

Overview

This model was trained with the goal of not only generating realistic human images but also producing high-quality anime-style images. Despite being fine-tuned on a specific dataset, it retains a significant amount of knowledge from the base model.

Key Features:

Anime Support via Danbooru Tags: Easily generate anime-style images using familiar tagging systems.
Improved Spatial Accuracy: Enhanced ability to place objects and characters correctly based on detailed prompts.
Preserved General Knowledge: Maintains a broad understanding from the base model, ensuring flexibility across domains.

Limitations:

Text generation inside images is still inaccurate.
Output image quality is currently moderate and may vary depending on prompts.
Understanding of specific character prompts via Danbooru tags is limited.

Notes:

This is an experimental model and not the final release. I plan to update it with improved versions in the future.
I fine-tuned this model to suit my personal preferences. Since I worked on it on my own, any feedback or suggestions for improvement would be highly appreciated and will help me improve future versions. Thank you for your support!
The file LumiYume_v0.1_bf16.safetensors is an all-in-one file that contains the necessary weights for the VAE, text encoder, and image backbone to be used with ComfyUI.
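
To check what the all-in-one file bundles, you can list its tensor names without loading any weights. The sketch below is only an illustration: the helper names `count_key_prefixes` and `summarize_checkpoint` are hypothetical, and grouping by the first dot-separated part of each tensor name is an assumption about how the file is laid out, not something this card documents.

```python
from collections import Counter


def count_key_prefixes(keys):
    """Count tensor names by their first dot-separated component."""
    return Counter(key.split(".", 1)[0] for key in keys)


def summarize_checkpoint(path):
    """Return the top-level key groups of a .safetensors file.

    Only tensor names are read; no weights are loaded into memory.
    """
    # Imported lazily so count_key_prefixes works without safetensors installed.
    from safetensors import safe_open

    with safe_open(path, framework="pt") as f:
        return count_key_prefixes(f.keys())


if __name__ == "__main__":
    # Assumes the file has been downloaded to the working directory.
    for prefix, n in summarize_checkpoint("LumiYume_v0.1_bf16.safetensors").items():
        print(f"{prefix}: {n} tensors")
```

If the groups do not map cleanly onto VAE, text encoder, and image backbone, try a deeper split of the names (for example the first two components).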

Model Components & Training Details

Text Encoder: Pre-trained Gemma-2-2b
Variational Autoencoder: Pre-trained Flux.1 dev's VAE
Image Backbone: Fine-tuned Lumina image backbone

The model was trained on a dataset containing approximately 30 million images. This dataset includes:

Anime-style images labeled with Danbooru tags
Real human images collected from the internet
Images containing text (primarily short text snippets)
Images annotated with detailed instance location information to enhance spatial understanding

Usage

import torch
from diffusers import Lumina2Pipeline

pipe = Lumina2Pipeline.from_pretrained("duongve/Lumina-Yume-v0.1", torch_dtype=torch.bfloat16)
pipe.enable_model_cpu_offload()  # Offload idle submodules to the CPU to save VRAM; remove this line if your GPU has enough memory

prompt = "There are three girls. The girl on the left side has long black hair and blue eyes. The girl on the right side has pink hair and yellow eyes. The girl in the middle has short blonde hair and orange eyes. They are in the park"
image = pipe(
    prompt,
    height=1216,
    width=832,
    guidance_scale=4.0,
    num_inference_steps=50,
    cfg_trunc_ratio=0.25,
    cfg_normalization=True,
    generator=torch.Generator("cpu").manual_seed(0),
    system_prompt="You are an assistant designed to generate superior anime images with the superior degree of image-text alignment based on textual prompts or user prompts.",
).images[0]
image.save("luminayume_demo.png")

Suggestion

System Prompt: A system prompt helps the model understand and align with your prompt, making it easier to generate the images you want.

For anime-style images using Danbooru tags:

 You are an advanced assistant designed to generate high-quality images from user prompts, utilizing danbooru tags to accurately guide the image creation process. 

 You are an assistant designed to generate high-quality images based on user prompts and  danbooru tags.

For general use:

 You are an assistant designed to generate superior images with the superior degree of image-text alignment based on textual prompts or user prompts.

 You are an assistant designed to generate high-quality images with the highest degree of image-text alignment based on textual prompts.
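
One way to keep these system prompts handy is a small lookup that returns the right one for the kind of prompt you are writing. This is a minimal sketch: the `SYSTEM_PROMPTS` dictionary, the style keys `"danbooru"` and `"general"`, and the `pick_system_prompt` helper are hypothetical names introduced here, and only one of the two suggested prompts per style is included.

```python
# System prompts copied from the suggestions above; the keys are illustrative.
SYSTEM_PROMPTS = {
    "danbooru": (
        "You are an advanced assistant designed to generate high-quality images "
        "from user prompts, utilizing danbooru tags to accurately guide the "
        "image creation process."
    ),
    "general": (
        "You are an assistant designed to generate high-quality images with the "
        "highest degree of image-text alignment based on textual prompts."
    ),
}


def pick_system_prompt(style="general"):
    """Return the system prompt for a style, or raise ValueError if unknown."""
    try:
        return SYSTEM_PROMPTS[style]
    except KeyError:
        known = ", ".join(sorted(SYSTEM_PROMPTS))
        raise ValueError(f"unknown style {style!r}; expected one of: {known}") from None
```

The returned string can be passed as the `system_prompt` argument in the Usage example, e.g. `pipe(prompt, system_prompt=pick_system_prompt("danbooru"), ...)`.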

Recommended Settings

CFG: 3–6
Sampling Steps: 40–50
Sampler: Euler a
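
When scripting generation, the recommended ranges can be enforced with a small helper before calling the pipeline. The sketch below assumes that "CFG" corresponds to the `guidance_scale` argument and "Sampling Steps" to `num_inference_steps` from the Usage example; the `RECOMMENDED` table and `clamp_to_recommended` function are hypothetical names. The sampler is left out because sampler names depend on the interface being used.

```python
# Recommended ranges from this card, keyed by the pipeline argument names
# used in the Usage example (the mapping itself is an assumption).
RECOMMENDED = {
    "guidance_scale": (3.0, 6.0),
    "num_inference_steps": (40, 50),
}


def clamp_to_recommended(guidance_scale, num_inference_steps):
    """Clamp both settings into the recommended ranges and return them as kwargs."""
    g_lo, g_hi = RECOMMENDED["guidance_scale"]
    s_lo, s_hi = RECOMMENDED["num_inference_steps"]
    return {
        "guidance_scale": min(max(float(guidance_scale), g_lo), g_hi),
        "num_inference_steps": min(max(int(num_inference_steps), s_lo), s_hi),
    }
```

The result can be unpacked into the pipeline call, e.g. `pipe(prompt, **clamp_to_recommended(4.0, 50), ...)`. Values outside the ranges may still work; the clamp only keeps scripted runs within what is recommended here.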

Acknowledgments

narugo1992 – for the invaluable Danbooru dataset
Alpha-VLLM – for creating a wonderful model!
AngelBottomless and his team – for openly sharing their Lumina-Illustrious training experiments, which provided helpful insights during development.
