The Transformers Library: standardizing model definitions

by lysandre - opened 1 day ago

Discussion

lysandre

Transformers Community org 1 day ago

edited 1 day ago

Hey 👋

Yesterday, we publicly posted about the new strategic direction that we wanted for transformers: "Going forward, we're aiming for Transformers to be the pivot across frameworks: if a model architecture is supported by transformers, you can expect it to be supported in the rest of the ecosystem."

This is the culmination of months of work with many, many open-source entities. In a way, it isn't so much a new direction as writing down the direction that we have been aiming for over the past few months. What this means is that going forward we'll aim for even further community engagement, more third-party integrations, and for transformers to become an even better, more customizable, more malleable building block across the ecosystem.

We discuss it in the following blogpost: The Transformers Library: standardizing model definitions.

We open this discussion today to gather comments, updates, potential improvements, and more generally to discuss this direction with you all.

lysandre

Transformers Community org 1 day ago

The first step towards this is to significantly simplify model contributions. I'm sharing here a few PRs/long-term objectives that we have worked on over the past few months that go in this direction. The goal is to go from model PRs that modify 20 files and add between 2k and 10k new LoCs, to PRs that modify 5 files and add ~400-600 new LoCs for a new model.

First of all, this line of thinking introduced the Modular approach that we detail in the following documentation article.

This may seem to go against what we just mentioned, as it adds another file to the PRs; however, it removes a significant barrier to contribution and maintenance, as that modular file is now the only modeling file that needs to be checked on these PRs.

A good example of this is the Qwen3 PR: at first glance, it adds a new +1200 LoC modeling file for Qwen3 and a +1400 LoC modeling file for Qwen3Moe. However, the modular files (200 LoCs for Qwen3 and 400 LoCs for Qwen3Moe) are the only files actually being written by the contributors, and the only ones being reviewed by maintainers.

This approach isn't perfect yet and modular still has some rough edge-cases; @ArthurZ and @cyrilvallez are constantly improving that code with the help of model contributors.
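To make the idea above more concrete, here is a toy sketch of the pattern: a new model's components are declared by inheriting from an existing model's components and overriding only what differs. Every class and attribute name below is a hypothetical stand-in written for illustration; none of this is real transformers code, and it doesn't show how the modular tooling itself works.

```python
# Toy stand-ins for an existing, already-reviewed model.
# These are NOT real transformers classes.
class BaseMLP:
    def __init__(self, hidden_size, intermediate_size):
        self.hidden_size = hidden_size
        self.intermediate_size = intermediate_size

    def describe(self):
        # Uses the runtime class name, so subclasses describe themselves.
        return (
            f"{type(self).__name__}"
            f"({self.hidden_size} -> {self.intermediate_size} -> {self.hidden_size})"
        )


class BaseAttention:
    def __init__(self, hidden_size, num_heads):
        self.num_heads = num_heads
        self.head_dim = hidden_size // num_heads
        self.use_extra_norm = False  # hypothetical flag, for illustration only


# --- What a contributor would write in a "modular"-style file ---

class NewModelMLP(BaseMLP):
    # Identical to the base block: nothing new to write or review.
    pass


class NewModelAttention(BaseAttention):
    def __init__(self, hidden_size, num_heads):
        super().__init__(hidden_size, num_heads)
        # The single (hypothetical) difference from the base model.
        self.use_extra_norm = True


if __name__ == "__main__":
    print(NewModelMLP(8, 32).describe())
    print(NewModelAttention(8, 2).use_extra_norm)
```

In this sketch, a reviewer only has to read the two short subclasses to see how the new model differs from the one it builds on.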

lysandre

Transformers Community org 1 day ago

Following the above, we've also been reworking several core components of model contributions to make them simpler. The following is a non-exhaustive list of some of the refactors/simplifications made over the past few months:

Refactoring the attentions to offer a standardized interface: #35235, by @ArthurZ
Reworking the docstrings by abstracting/factorizing the docstring addition: #33771, by @yonigozlan
Abstracting away the return_dict to keep the modeling code simple: #36794, by @qubvel-hf
Simplifying the gradient checkpointing system: #37223, by @qubvel-hf
Rewrite of the cache system to be much more intuitive: #37866, by @cyrilvallez
Uniformizing model processors/Unpack: #31368, amongst others, by @molbap, @yonigozlan
Simplifying the __init__.py files while keeping lazy loading, as well as cleaner addition to exportable objects: #31329, #35167, #35170, #35238, #37653
