Lumina-Yume-v0.1

Lumina-Yume-v0.1 is based on Lumina-Image-2.0, a flow-based diffusion transformer (DiT) with 2 billion parameters. For more information, visit here.
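
"Flow-based" means the model learns a velocity field that carries noise to an image, and sampling integrates that field as an ODE. The toy sketch below only illustrates the idea. It assumes the common straight-line (rectified-flow) path x_t = (1 - t) * x_data + t * noise, with t = 1 as noise and t = 0 as data. This is not Lumina-Image-2.0's actual sampler, and Lumina's own timestep convention may differ. The helper names are hypothetical, and the "velocity" here is computed from a known target, whereas a trained model has to predict it.

```python
from typing import Callable, List

Vector = List[float]


def euler_sample(
    velocity: Callable[[Vector, float], Vector],
    x_noise: Vector,
    num_steps: int,
) -> Vector:
    """Integrate dx/dt = velocity(x, t) from t=1 (noise) down to t=0 (data)
    using fixed-size Euler steps."""
    x = list(x_noise)
    h = 1.0 / num_steps
    for i in range(num_steps):
        t = 1.0 - i * h  # current time; always >= h > 0 inside the loop
        v = velocity(x, t)
        x = [xi - h * vi for xi, vi in zip(x, v)]  # step backwards in time
    return x


def straight_path_velocity(x_data: Vector) -> Callable[[Vector, float], Vector]:
    """Exact velocity of the straight path x_t = (1 - t) * x_data + t * noise,
    written as a field over the current point: v(x, t) = (x - x_data) / t.
    Hypothetical helper: it cheats by knowing x_data."""
    def v(x: Vector, t: float) -> Vector:
        return [(xi - di) / t for xi, di in zip(x, x_data)]
    return v


if __name__ == "__main__":
    target = [0.5, -1.0, 2.0]
    noise = [1.3, 0.2, -0.7]
    for steps in (1, 4, 50):
        out = euler_sample(straight_path_velocity(target), noise, steps)
        print(steps, [round(o, 6) for o in out])
```

Because every Euler step along a perfectly straight path stays on that path, any step count reaches the target here. Real learned flows are only approximately straight, which is why the card still recommends 40 to 50 steps.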


  1. Overview

This model was trained to generate not only realistic images of people but also high-quality anime-style images. Although it was fine-tuned on a specific dataset, it retains much of the base model's knowledge.

Key Features:

  • Anime Support via Danbooru Tags: Easily generate anime-style images using the familiar Danbooru tagging system.
  • Improved Spatial Accuracy: Enhanced ability to place objects and characters correctly based on detailed prompts.
  • Preserved General Knowledge: Maintains a broad understanding from the base model, ensuring flexibility across domains.

Limitations:

  • Text generation inside images is still inaccurate.
  • Output image quality is currently moderate and may vary depending on prompts.
  • Understanding of specific character prompts via Danbooru tags is limited.

Notes:

  • This is an experimental model and not the final release. I plan to update it with improved versions in the future.

  • I fine-tuned this model to suit my personal preferences. Since I developed it on my own, any feedback or suggestions for improvement would be highly appreciated and will help me improve future versions. Thank you for your support!

  • The file LumiYume_v0.1_bf16.safetensors is an all-in-one checkpoint that contains the weights for the VAE, text encoder, and image backbone, for use with ComfyUI.
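
To check which components an all-in-one checkpoint bundles, one can list its tensor names and count them by prefix. This is a minimal sketch, assuming the `safetensors` and `torch` packages are installed. The helper names are hypothetical, and the prefixes it prints are whatever the file happens to use; this card does not document them.

```python
from collections import Counter
from typing import Dict, Iterable, List


def count_by_prefix(keys: Iterable[str], depth: int = 1) -> Dict[str, int]:
    """Count tensor names by their first `depth` dot-separated components."""
    counts: Counter = Counter()
    for key in keys:
        prefix = ".".join(key.split(".")[:depth])
        counts[prefix] += 1
    return dict(counts)


def list_safetensors_keys(path: str) -> List[str]:
    """Return the tensor names stored in a .safetensors file.
    Only the header is consulted; no tensor data is read."""
    from safetensors import safe_open  # assumed installed: pip install safetensors torch

    with safe_open(path, framework="pt") as f:
        return list(f.keys())


if __name__ == "__main__":
    import sys

    # Usage: python inspect_ckpt.py LumiYume_v0.1_bf16.safetensors
    if len(sys.argv) > 1:
        names = list_safetensors_keys(sys.argv[1])
        for prefix, n in sorted(count_by_prefix(names, depth=2).items()):
            print(f"{prefix}: {n} tensors")
```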


  2. Model Components & Training Details
  • Text Encoder: Pre-trained Gemma-2-2b
  • Variational Autoencoder: Pre-trained VAE from Flux.1 dev
  • Image Backbone: Fine-tuned Lumina image backbone

The model was trained on a dataset containing approximately 30 million images. This dataset includes:

  • Anime-style images labeled with Danbooru tags
  • Real human images collected from the internet
  • Images containing text (primarily short text snippets)
  • Images annotated with detailed instance-location information to improve spatial understanding
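
The card does not describe how the location annotations were written into captions. The following hypothetical sketch shows one way normalized bounding boxes could be turned into positional captions like the Usage prompt. The box format, the equal-thirds split, and the function names are all illustrative assumptions.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), normalized to [0, 1]

_PHRASES = {"left": "on the left", "middle": "in the middle", "right": "on the right"}


def horizontal_position(box: Box) -> str:
    """Classify a box as left/middle/right by its horizontal center,
    splitting the image into equal thirds (an arbitrary choice)."""
    x_center = (box[0] + box[2]) / 2
    if x_center < 1 / 3:
        return "left"
    if x_center <= 2 / 3:
        return "middle"
    return "right"


def spatial_caption(instances: List[Tuple[str, str, Box]]) -> str:
    """instances: (subject, attributes, box). Emits one sentence per
    instance, ordered from left to right by box center."""
    ordered = sorted(instances, key=lambda inst: inst[2][0] + inst[2][2])
    return " ".join(
        f"The {subject} {_PHRASES[horizontal_position(box)]} has {attrs}."
        for subject, attrs, box in ordered
    )


if __name__ == "__main__":
    print(spatial_caption([
        ("girl", "short blonde hair and orange eyes", (0.40, 0.1, 0.60, 0.9)),
        ("girl", "long black hair and blue eyes", (0.05, 0.1, 0.30, 0.9)),
        ("girl", "pink hair and yellow eyes", (0.70, 0.1, 0.95, 0.9)),
    ]))
```

For the three boxes above, this prints "The girl on the left has long black hair and blue eyes. The girl in the middle has short blonde hair and orange eyes. The girl on the right has pink hair and yellow eyes."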

  3. Usage
import torch
from diffusers import Lumina2Pipeline

pipe = Lumina2Pipeline.from_pretrained("duongve/Lumina-Yume-v0.1", torch_dtype=torch.bfloat16)
pipe.enable_model_cpu_offload()  # Save VRAM by offloading model components to the CPU; remove this line if your GPU has enough memory

prompt = "There are three girls. The girl on the left has long black hair and blue eyes. The girl on the right has pink hair and yellow eyes. The girl in the middle has short blonde hair and orange eyes. They are in the park."
image = pipe(
    prompt,
    height=1216,
    width=832,
    guidance_scale=4.0,
    num_inference_steps=50,
    cfg_trunc_ratio=0.25,
    cfg_normalization=True,
    generator=torch.Generator("cpu").manual_seed(0),
    system_prompt="You are an assistant designed to generate superior anime images with the superior degree of image-text alignment based on textual prompts or user prompts.",
).images[0]
image.save("luminayume_demo.png")

  4. Suggestions

System prompt: A suitable system prompt helps the model understand and align with your prompt, making it easier to get the images you want.

For anime-style images using Danbooru tags:

 You are an advanced assistant designed to generate high-quality images from user prompts, utilizing danbooru tags to accurately guide the image creation process. 

 You are an assistant designed to generate high-quality images based on user prompts and  danbooru tags.

For general use:

 You are an assistant designed to generate superior images with the superior degree of image-text alignment based on textual prompts or user prompts.

 You are an assistant designed to generate high-quality images with the highest degree of image-text alignment based on textual prompts.
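
A small helper can keep the system prompts above in one place and pair them with a user prompt. This is a minimal sketch: the dictionary copies two of the suggested prompts verbatim, while the tag-joining convention (comma-separated, underscores replaced by spaces) is an assumption, since this card does not say how tags were formatted during training. The function names are hypothetical.

```python
from typing import Dict, List

# Two of the suggested system prompts, copied verbatim from this card.
SYSTEM_PROMPTS: Dict[str, str] = {
    "anime": (
        "You are an advanced assistant designed to generate high-quality images "
        "from user prompts, utilizing danbooru tags to accurately guide the image "
        "creation process."
    ),
    "general": (
        "You are an assistant designed to generate high-quality images with the "
        "highest degree of image-text alignment based on textual prompts."
    ),
}


def tags_to_prompt(tags: List[str]) -> str:
    """Join Danbooru tags into one comma-separated prompt, dropping empty
    entries and replacing underscores with spaces (assumed convention)."""
    cleaned = [t.strip().replace("_", " ") for t in tags if t.strip()]
    return ", ".join(cleaned)


def build_request(style: str, prompt: str) -> Dict[str, str]:
    """Return the `prompt` and `system_prompt` keyword arguments for the pipeline call."""
    if style not in SYSTEM_PROMPTS:
        raise ValueError(f"unknown style {style!r}; expected one of {sorted(SYSTEM_PROMPTS)}")
    return {"prompt": prompt, "system_prompt": SYSTEM_PROMPTS[style]}


if __name__ == "__main__":
    req = build_request("anime", tags_to_prompt(["1girl", "long_hair", "blue_eyes", "outdoors"]))
    print(req["prompt"])  # 1girl, long hair, blue eyes, outdoors
```

The resulting dictionary can be unpacked into the call from the Usage section, for example `pipe(**req, height=1216, width=832, guidance_scale=4.0, num_inference_steps=50)`.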

Recommended Settings

  • CFG: 3–6
  • Sampling Steps: 40–50
  • Sampler: Euler a
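
The CFG value corresponds to `guidance_scale` in the diffusers call in the Usage section. To show what that number does, here is classifier-free guidance on plain Python lists: the guided prediction is uncond + scale * (cond - uncond). The optional rescaling to the conditional prediction's norm is an assumed reading of the `cfg_normalization` flag, not a confirmed description of the library's code. `cfg_trunc_ratio` is not modeled. The function name is hypothetical.

```python
import math
from typing import List

Vector = List[float]


def _norm(v: Vector) -> float:
    return math.sqrt(sum(x * x for x in v))


def guided_prediction(cond: Vector, uncond: Vector, scale: float, normalize: bool = False) -> Vector:
    """Classifier-free guidance: push the prediction away from the
    unconditional one, towards (and past) the conditional one.

    If normalize=True, rescale the guided vector so its norm matches the
    conditional prediction's norm (assumption about `cfg_normalization`)."""
    guided = [u + scale * (c - u) for c, u in zip(cond, uncond)]
    if normalize:
        g = _norm(guided)
        if g > 0:
            factor = _norm(cond) / g
            guided = [x * factor for x in guided]
    return guided


if __name__ == "__main__":
    cond, uncond = [3.0, 4.0], [0.0, 0.0]
    print(guided_prediction(cond, uncond, 4.0))                  # [12.0, 16.0]
    print(guided_prediction(cond, uncond, 4.0, normalize=True))  # [3.0, 4.0]
```

A scale of 1.0 returns the conditional prediction unchanged. Larger values follow the prompt more strongly, and without any rescaling they also inflate the prediction's magnitude, which is one motivation for staying in the moderate 3–6 range.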

  5. Acknowledgments
  • narugo1992 – for the invaluable Danbooru dataset
  • Alpha-VLLM – for creating such a wonderful model!
  • AngelBottomless and his team – for openly sharing their Lumina-Illustrious training experiments, which provided helpful insights during development.