The Transformers Library: standardizing model definitions

#7
by lysandre - opened
Transformers Community org
edited 1 day ago

Hey 👋

Yesterday, we publicly posted about the new strategic direction that we wanted for transformers: "Going forward, we're aiming for Transformers to be the pivot across frameworks: if a model architecture is supported by transformers, you can expect it to be supported in the rest of the ecosystem."

This is the culmination of months of work with many, many open-source entities. In a way, it isn't so much a new direction as a written statement of the direction we have been aiming for over the past few months. What this means is that, going forward, we'll aim for even more community engagement and third-party integrations, and for transformers to become an even better, more customizable, more malleable building block across the ecosystem.

We discuss it in the following blogpost: The Transformers Library: standardizing model definitions.

We open this discussion today to gather comments, updates, potential improvements, and more generally to discuss this direction with you all.



The first step towards this is to significantly simplify model contributions. I'm sharing here a few PRs/long-term objectives that we have worked on over the past few months that go in this direction. The goal is to go from model PRs that modify 20 files and add between 2k and 10k new LoCs to PRs that modify 5 files and add ~400-600 new LoCs for a new model.

First of all, this line of thinking introduced the Modular approach that we detail in the following documentation article.

This seems to go against what we just mentioned as it adds another file in the PRs; but this removes a significant barrier to contribution and maintenance, as that modular file is now the only modeling file that needs to be checked on these PRs.

A good example of this is the Qwen3 PR: at first glance, it adds a new +1200 LoC modeling file for Qwen3 and a +1400 LoC modeling file for Qwen3Moe. However, the modular files (200 LoCs for Qwen3 and 400 LoCs for Qwen3Moe) are the only files actually written by the contributors, and the only ones reviewed by maintainers.
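To put those numbers in perspective, here is a quick back-of-the-envelope calculation of how much of the added modeling code actually needs human review. It only uses the approximate line counts quoted above; the variable names are made up for illustration.

```python
# Approximate line counts quoted above for the Qwen3 PR.
# Keys and variable names are illustrative, not taken from the PR itself.
generated_loc = {"Qwen3": 1200, "Qwen3Moe": 1400}  # full modeling files
modular_loc = {"Qwen3": 200, "Qwen3Moe": 400}      # modular files (hand-written)

# Share of each modeling file that is written and reviewed by humans.
review_share = {
    name: modular_loc[name] / generated_loc[name] for name in generated_loc
}

# Same ratio over both models combined.
total_share = sum(modular_loc.values()) / sum(generated_loc.values())

for name, share in review_share.items():
    print(f"{name}: {share:.0%} of the modeling file is reviewed")
print(f"Overall: {total_share:.0%}")
```

With these rough figures, less than a quarter of the added modeling code has to be read line by line.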

This approach isn't perfect yet and modular still has some rough edges; @ArthurZ and @cyrilvallez are constantly improving that code with the help of model contributors.
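For readers who haven't seen a modular file, here is a toy sketch of the general idea: a new model is described only by how it differs from an existing one, using plain inheritance. This is not code from transformers. Every class and attribute name below is hypothetical, and the "qk_norm" tweak is just a placeholder for whatever the new model changes. In the real workflow, the full modeling file is produced from the modular one rather than written by hand; see the documentation for how that works.

```python
# Toy illustration only: plain Python classes standing in for model components.
# None of these names come from the transformers codebase.


class BaseAttention:
    """Stand-in for an attention block that an existing model already defines."""

    def __init__(self, hidden_size: int, num_heads: int):
        self.hidden_size = hidden_size
        self.num_heads = num_heads
        self.head_dim = hidden_size // num_heads

    def describe(self) -> str:
        return f"attention(heads={self.num_heads}, head_dim={self.head_dim})"


class BaseDecoderLayer:
    """Stand-in for an existing decoder layer; builds its attention from a class hook."""

    attention_cls = BaseAttention

    def __init__(self, hidden_size: int, num_heads: int):
        self.attention = self.attention_cls(hidden_size, num_heads)

    def describe(self) -> str:
        return f"layer[{self.attention.describe()}]"


# --- Everything below is what a contributor would write in the "modular" file ---


class NewAttention(BaseAttention):
    """Only the difference from the base: a hypothetical extra normalization flag."""

    def __init__(self, hidden_size: int, num_heads: int):
        super().__init__(hidden_size, num_heads)
        self.use_qk_norm = True  # hypothetical tweak, for illustration

    def describe(self) -> str:
        return super().describe() + "+qk_norm"


class NewDecoderLayer(BaseDecoderLayer):
    """Reuses the base layer unchanged, swapping in the new attention class."""

    attention_cls = NewAttention


layer = NewDecoderLayer(hidden_size=64, num_heads=8)
print(layer.describe())
```

The point of the sketch is the size of the last section: the reviewer only needs to read the two small subclasses, since everything else is inherited from code that has already been reviewed.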


Following the above, we've also been reworking several core components of model contributions to make them simpler. The following is a non-exhaustive list of the refactors/simplifications done over the past few months:
