---
base_model:
- PixArt-alpha/PixArt-XL-2-1024-MS
language:
- en
license: apache-2.0
pipeline_tag: image-to-image
library_name: diffusers
---

# Cobra

**Efficient Line Art COlorization with BRoAder References**

**Authors:** Junhao Zhuang, Lingen Li, Xuan Ju, Zhaoyang Zhang, Chun Yuan† and Ying Shan†

<a href='https://zhuang2002.github.io/Cobra/'><img src='https://img.shields.io/badge/Project-Page-Green'></a>
<a href="https://github.com/Zhuang2002/Cobra"><img src="https://img.shields.io/badge/GitHub-Repository-black?logo=github"></a>
<a href='https://huggingface.co/spaces/JunhaoZhuang/Cobra'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Demo-blue'></a>
<a href="https://arxiv.org/abs/2504.12240"><img src="https://img.shields.io/badge/arXiv-2504.12240-b31b1b.svg"></a>
<a href="https://huggingface.co/JunhaoZhuang/Cobra"><img src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Model-blue"></a>

**Your star means a lot to us as we develop this project!** :star:

<img src='https://zhuang2002.github.io/Cobra/fig/teaser.png'/>

### Abstract

The comic production industry requires reference-based line art colorization with high accuracy, efficiency, contextual consistency, and flexible control.
A comic page often involves diverse characters, objects, and backgrounds, which complicates the coloring process.
Despite advancements in diffusion models for image generation, their application in line art colorization remains limited, facing challenges related to handling extensive reference images, time-consuming inference, and flexible control.
We investigate the necessity of extensive contextual image guidance on the quality of line art colorization. To address these challenges, we introduce **Cobra**, an efficient and versatile method that supports color hints and utilizes **over 200 reference images** while maintaining low latency.
Central to Cobra is a Causal Sparse DiT architecture, which leverages specially designed positional encodings, causal sparse attention, and Key-Value Cache to effectively manage long-context references and ensure color identity consistency.
Results demonstrate that Cobra achieves accurate line art colorization through extensive contextual reference, significantly enhancing inference speed and interactivity, thereby meeting critical industrial demands.

### News

- **Release Date:** April 17, 2025 - The inference code and model weights have been released!

### TODO

- ✅ Release inference code and model weights
- ⬜️ Release training code

### Getting Started

Follow these steps to set up and run Cobra on your local machine:

- **Clone the Repository**

  Download the code from our GitHub repository:
  ```bash
  git clone https://github.com/zhuang2002/Cobra
  cd Cobra
  ```

- **Set Up the Python Environment**

  Ensure you have Anaconda or Miniconda installed, then create and activate a Python environment and install the required dependencies:
  ```bash
  conda create -n cobra python=3.11.11
  conda activate cobra
  pip install -r requirements.txt
  ```

- **Run the Application**

  You can launch the Gradio interface for Cobra by running the following command:
  ```bash
  python app.py
  ```

- **Access Cobra in Your Browser**

  Open your browser and go to `http://localhost:7860`. If you're running the app on a remote server, replace `localhost` with your server's IP address or domain name. To use a custom port, update the `server_port` parameter in the `demo.launch()` function of `app.py`.
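
The port and remote-access settings described in the last step can be sketched as a small helper. This is only an illustration, not code from the repository: it assumes `app.py` exposes a Gradio object named `demo`, and the environment variable names `COBRA_PORT` and `COBRA_PUBLIC` are hypothetical. `server_name` and `server_port` are existing keyword arguments of Gradio's `launch()`.

```python
import os


def launch_kwargs(env=None, default_port=7860):
    """Build keyword arguments for Gradio's launch() from environment variables.

    COBRA_PORT and COBRA_PUBLIC are hypothetical names used only for this
    illustration; the Cobra repository does not define them.
    """
    env = os.environ if env is None else env
    port = int(env.get("COBRA_PORT", default_port))
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    # 0.0.0.0 listens on all network interfaces (needed for remote access);
    # 127.0.0.1 accepts connections from the local machine only.
    host = "0.0.0.0" if env.get("COBRA_PUBLIC") == "1" else "127.0.0.1"
    return {"server_name": host, "server_port": port}


# Assumed usage inside app.py:
#     demo.launch(**launch_kwargs())
```

Gradio binds to `127.0.0.1` unless told otherwise, so on a remote server the app may need `server_name="0.0.0.0"` before it becomes reachable by IP address or domain name. Check how `app.py` calls `demo.launch()` before changing it.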

### Demo

You can [try the demo](https://huggingface.co/spaces/JunhaoZhuang/Cobra) of Cobra on Hugging Face Spaces.

### Method

Overview of Cobra.
The figure below depicts the Cobra framework, which uses a large collection of retrieved reference images to guide the colorization of comic line art. The framework handles an arbitrary number of contextual image references through localized reusable positional encoding, which preserves appropriate aspect ratios and resolutions. The causal sparse DiT architecture processes the long contextual references, improving identity preservation and color accuracy while reducing computational complexity. Optional color hints give users additional control, yielding high-quality coloring suited to industrial applications.

<img src="https://zhuang2002.github.io/Cobra/fig/flowchart.png" width="1000">
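
To make the complexity argument concrete, the toy sketch below builds an attention mask under one plausible reading of "causal sparse attention". The assumptions are that each reference image attends only to its own tokens, and that the tokens of the image being colorized attend to every reference and to themselves. The function names and token layout are hypothetical; this is not the repository's implementation.

```python
def causal_sparse_mask(ref_lengths, target_length):
    """Return mask[q][k] == True where query token q may attend to key token k.

    Token order (an assumption of this sketch): all tokens of reference 0,
    then reference 1, and so on, followed by the tokens of the target image.
    """
    segment = []  # segment id per token; -1 marks the target image
    for ref_id, length in enumerate(ref_lengths):
        segment.extend([ref_id] * length)
    segment.extend([-1] * target_length)

    def allowed(q, k):
        if segment[q] == -1:
            return True  # target tokens see every reference and themselves
        return segment[q] == segment[k]  # a reference sees only itself

    n = len(segment)
    return [[allowed(q, k) for k in range(n)] for q in range(n)]


def attended_pairs(mask):
    """Count the query-key pairs that would actually be computed."""
    return sum(sum(row) for row in mask)


mask = causal_sparse_mask(ref_lengths=[2, 2], target_length=2)
print(attended_pairs(mask), "of", len(mask) ** 2)  # prints "20 of 36"
```

Under these assumptions the reference rows never depend on the target tokens. Their keys and values could therefore be computed once and reused across denoising steps, which is the kind of saving a Key-Value Cache provides.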

We welcome your feedback, questions, or collaboration opportunities. Thank you for trying Cobra!

### Acknowledgments

We would like to acknowledge the following open-source projects that have inspired and contributed to the development of Cobra:

- **MangaLineExtraction_PyTorch**: https://github.com/ljsabc/MangaLineExtraction_PyTorch

We are grateful for the valuable resources and insights provided by these projects.

### Contact

- **Junhao Zhuang**
  Email: [[email protected]](mailto:[email protected])

### Citation

```
@misc{zhuang2025cobraefficientlineart,
      title={Cobra: Efficient Line Art COlorization with BRoAder References},
      author={Junhao Zhuang and Lingen Li and Xuan Ju and Zhaoyang Zhang and Chun Yuan and Ying Shan},
      year={2025},
      eprint={2504.12240},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2504.12240},
}
```