---
title: README
emoji: 🚀
colorFrom: gray
colorTo: purple
sdk: static
pinned: false
---
|
# Multi-Domain Expert Learning (M*DEL)**
|
## How to increase knowledge without breaking the bank?
|
|
|
[Ontocord.AI](https://huggingface.co/ontocord) and the open science community.
|
|
|
M*DEL is an open science community for creating better mixtures of experts, with volunteers from:
|
Bedrock AI, TurkuNLP, ETH, Redmond.AI, Incite, MICS CentraleSupelec, Centro de Excelência em Inteligência Artificial, VietAI, Technion - Israel Institute of Technology, Nous Research, University of Western Australia, KoboldAI Community, LAION.AI, Mila, Luleå University of Technology, Juelich Supercomputing Center, Tokyo Tech, RIKEN, Together
|
|
|
- [Try out our current proof of concept](https://huggingface.co/Multi-Domain-Expert-Layers/meow_1b/)
|
|
|
OSS AI models can lead to increased innovation, accessibility, transparency, and community building. However, we need a mechanism to train more capable models in an efficient and modular way.
|
|
|
The proposed method, which we call Multi-Domain Expert Learning (MDEL), involves branching from a base model, training each branch independently on a specific domain for specific layers or other adapters, and merging the trained models at the end. The domain-specific layers or adapters are kept as experts, with a classifier used as a router to activate them during inference. This approach makes it easy to increase the expertise of a model, to independently train more "adapters", and to reuse previously trained experts and models without retraining, resulting in a modular and efficient system.
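The sketch below is a rough PyTorch illustration of this routing idea, not our actual training code; the class names, the mean-pooled router input, and the soft mixing of expert outputs are all simplifying assumptions.

```python
import torch
import torch.nn as nn

class ExpertFFN(nn.Module):
    """One domain-specific feed-forward block, branched and trained independently."""
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model)
        )

    def forward(self, x):
        return self.net(x)

class RoutedExpertLayer(nn.Module):
    """A layer whose FFN is chosen among domain experts by a classifier (the router)."""
    def __init__(self, d_model: int, d_hidden: int, num_domains: int):
        super().__init__()
        self.experts = nn.ModuleList(
            [ExpertFFN(d_model, d_hidden) for _ in range(num_domains)]
        )
        self.router = nn.Linear(d_model, num_domains)  # the domain classifier

    def forward(self, x):  # x: (batch, seq, d_model)
        pooled = x.mean(dim=1)                                   # crude sequence summary for routing
        weights = self.router(pooled).softmax(dim=-1)            # (batch, num_domains)
        expert_out = torch.stack([e(x) for e in self.experts])   # (num_domains, batch, seq, d_model)
        return torch.einsum("bn,nbsd->bsd", weights, expert_out)

# Route a batch of hidden states through two domain experts.
layer = RoutedExpertLayer(d_model=64, d_hidden=256, num_domains=2)
hidden = torch.randn(4, 16, 64)
print(layer(hidden).shape)  # torch.Size([4, 16, 64])
```

In the full system, each expert corresponds to layers or adapters trained on a separate domain corpus and merged back into the shared backbone.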
|
|
|
In this effort, we seek international labs, open-science-aligned researchers, and companies in various countries to each train a set of domain experts of their choosing, enabling international participation and knowledge sharing. This will also lower training costs and environmental impact through reuse and lower energy usage. Currently, we have volunteers from four continents and are looking for more.
|
|
|
We will be using a variant of the c-BTM method (https://arxiv.org/pdf/2303.14177v1.pdf) and will be focusing on models ranging from 7B to 70B parameters.
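For intuition only, here is a minimal sketch of the c-BTM-style ensembling step, assuming k-means clusters over document embeddings and distance-based mixture weights; the published method and our variant differ in details such as the embedding model and the exact weighting scheme.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
doc_embeddings = rng.normal(size=(1000, 32))   # stand-in for corpus embeddings

# 1) Cluster the corpus; each cluster defines the training data for one expert model.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(doc_embeddings)

# 2) At inference, weight each expert by the query's proximity to its cluster centroid.
def expert_weights(query_embedding: np.ndarray, temperature: float = 0.1) -> np.ndarray:
    dists = np.linalg.norm(kmeans.cluster_centers_ - query_embedding, axis=1)
    logits = -dists / temperature
    logits -= logits.max()                      # numerical stability
    w = np.exp(logits)
    return w / w.sum()

query = rng.normal(size=32)
print(expert_weights(query))                    # mixture weights over the 4 experts
```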
|
|
|
## In some of our models, we will also be adding multilingual and multimodal abilities for both understanding and generation, with context lengths of 8K-35K tokens.
|
|
|
Languages will include Hindi (hi), Vietnamese (vi), English (en), Japanese (ja), and Finnish (fi). We may add others if compute is available.
|
|
|
If you are interested in contributing to this project, please reach out to us at [email protected] to learn more about how you can get involved.
|
|
|
Let's work together to create open-source models that benefit everyone! 🤝 #AI #MDEL #Supercomputers #Summit #OpenSource #Innovation #VolunteersNeeded #OpenScience #DemocratizeAI
|
|
|
## Requirements for joining this HF repo

By joining this HF repo, you agree that you will not disclose any data we are gathering or ideas we present in our community channels until after a paper has been written.
|
This protects the intellectual freedom of researchers and their right to publish and benefit from their work.
|
|
|
** Why did we change the term "Layer" to "Learning"? Because, in addition to layerwise experts, we are also exploring different adapters and architectures such as Flamingo (https://arxiv.org/abs/2204.14198), EMU (https://arxiv.org/abs/2307.05222), and a novel multi-node architecture for training LoRAs that we call lora-x, which will allow us to swap out different component experts to improve the performance of the model.
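As a purely illustrative sketch of that adapter-swapping idea (lora-x itself is not described here, and the class below is hypothetical), a frozen linear layer can keep one low-rank delta per domain and switch between them:

```python
import torch
import torch.nn as nn

class SwappableLoRALinear(nn.Module):
    """A frozen linear layer whose low-rank (LoRA-style) delta can be swapped per domain.

    Adapters live in a plain dict for brevity; a real implementation would register
    them as parameters so they are trained and saved with the model.
    """
    def __init__(self, d_in: int, d_out: int, rank: int = 8):
        super().__init__()
        self.base = nn.Linear(d_in, d_out)
        self.base.weight.requires_grad_(False)   # backbone stays frozen
        self.rank = rank
        self.adapters = {}                        # domain name -> (A, B) low-rank factors
        self.active = None

    def add_adapter(self, name: str):
        d_in, d_out = self.base.in_features, self.base.out_features
        A = torch.randn(self.rank, d_in) * 0.01   # down-projection
        B = torch.zeros(d_out, self.rank)         # up-projection (zero => no initial change)
        self.adapters[name] = (A, B)

    def set_adapter(self, name: str):
        self.active = name

    def forward(self, x):
        y = self.base(x)
        if self.active is not None:
            A, B = self.adapters[self.active]
            y = y + x @ A.t() @ B.t()             # apply the active domain's low-rank update
        return y

# Swap domain experts without touching the frozen backbone weights.
layer = SwappableLoRALinear(16, 16)
layer.add_adapter("medical")
layer.add_adapter("legal")
layer.set_adapter("medical")
print(layer(torch.randn(2, 16)).shape)            # torch.Size([2, 16])
```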