Command Line Interfaces (CLIs)
You can use TRL to fine-tune your language model with Supervised Fine-Tuning (SFT) or Direct Preference Optimization (DPO), or even chat with your model, using the TRL CLIs.
Currently supported CLIs are:
- trl sft: fine-tune an LLM on a text/instruction dataset
- trl dpo: fine-tune an LLM with DPO on a preference dataset
- trl chat: quickly spin up an LLM fine-tuned for chatting
Fine-tuning with the CLI
Before getting started, pick a language model from the Hugging Face Hub. Supported models can be found with the “text-generation” filter on the models page. Also make sure to pick a relevant dataset for your task.
Before using the sft or dpo commands, make sure to run:
accelerate config
and select the right configuration for your training setup (single / multi-GPU, DeepSpeed, etc.). Make sure to complete all steps of accelerate config before running any CLI command.
We also recommend passing a YAML config file to configure your training protocol. Below is a simple example of a YAML file you can use for training your models with the trl sft command.
model_name_or_path: HuggingFaceM4/tiny-random-LlamaForCausalLM
dataset_name: imdb
dataset_text_field: text
report_to: none
learning_rate: 0.0001
lr_scheduler_type: cosine
Save that config in a .yaml file and get started right away! Note that you can override the arguments from the config file by explicitly passing them to the CLI, e.g.:
trl sft --config example_config.yaml --output_dir test-trl-cli --lr_scheduler_type cosine_with_restarts
This will force the use of cosine_with_restarts for lr_scheduler_type.
Supported Arguments
We support all arguments from transformers.TrainingArguments. For loading your model, we support all arguments from trl.ModelConfig:
class trl.ModelConfig(
    model_name_or_path: Optional = None,
    model_revision: str = 'main',
    torch_dtype: Optional = None,
    trust_remote_code: bool = False,
    attn_implementation: Optional = None,
    use_peft: bool = False,
    lora_r: Optional = 16,
    lora_alpha: Optional = 32,
    lora_dropout: Optional = 0.05,
    lora_target_modules: Optional = None,
    lora_modules_to_save: Optional = None,
    lora_task_type: str = 'CAUSAL_LM',
    load_in_8bit: bool = False,
    load_in_4bit: bool = False,
    bnb_4bit_quant_type: Optional = 'nf4',
    use_bnb_nested_quant: bool = False,
)

Arguments which define the model and tokenizer to load.
You can pass any of these arguments either to the CLI or the YAML file.
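For example, here is a minimal sketch of a YAML config that also enables PEFT (LoRA) and 4-bit loading through the trl.ModelConfig arguments listed above (the model and dataset names are only illustrations):

model_name_or_path: facebook/opt-125m
dataset_name: imdb
dataset_text_field: text
use_peft: true
lora_r: 16
lora_alpha: 32
load_in_4bit: true
bnb_4bit_quant_type: nf4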
Supervised Fine-tuning (SFT)
Follow the basic instructions above and run trl sft --output_dir <output_dir> <*args>:
trl sft --model_name_or_path facebook/opt-125m --dataset_name imdb --output_dir opt-sft-imdb
The SFT CLI is based on the examples/scripts/sft.py script.
Direct Preference Optimization (DPO)
First, follow the basic instructions above and run trl dpo --output_dir <output_dir> <*args>. Make sure to process your DPO dataset in the TRL format as follows:
1- Make sure to pre-tokenize the dataset using chat templates:
python examples/datasets/tokenize_ds.py --model gpt2 --dataset yourdataset
You might need to adapt examples/datasets/tokenize_ds.py to use your chat template.
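The key step in that adaptation is applying the tokenizer's chat template to each example. A minimal sketch, assuming your dataset stores conversations in a messages column (the model and dataset names are placeholders):

from datasets import load_dataset
from transformers import AutoTokenizer

# Placeholder model; the tokenizer must define a chat template for this to work
tokenizer = AutoTokenizer.from_pretrained("gpt2")

def apply_template(example):
    # "messages" is an assumed column name; adapt it to your dataset's schema
    example["text"] = tokenizer.apply_chat_template(example["messages"], tokenize=False)
    return example

dataset = load_dataset("yourdataset", split="train")  # placeholder dataset
dataset = dataset.map(apply_template)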
2- Format the dataset into the TRL format (you can adapt examples/datasets/anthropic_hh.py):
python examples/datasets/anthropic_hh.py --push_to_hub --hf_entity your-hf-org
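For reference, DPO training expects a preference dataset with a prompt plus a chosen and a rejected completion per row. A sketch of what one row might look like (the values are made up):

# One illustrative row of a preference dataset, using the
# "prompt" / "chosen" / "rejected" columns expected for DPO:
{
    "prompt": "What is the capital of France?",
    "chosen": "The capital of France is Paris.",
    "rejected": "I don't know, maybe Lyon?",
}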
Once your dataset is pushed, run the dpo CLI as follows:
trl dpo --model_name_or_path facebook/opt-125m --dataset_name trl-internal-testing/Anthropic-hh-rlhf-processed --output_dir opt-sft-hh-rlhf
The DPO CLI is based on the examples/scripts/dpo.py script.
Chat interface
The chat CLI lets you quickly load the model and talk to it. Simply run the following:
trl chat --model_name_or_path Qwen/Qwen1.5-0.5B-Chat
Note that the chat interface relies on the chat template of the tokenizer to format the inputs for the model. Make sure your tokenizer has a chat template defined.
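To check whether your tokenizer defines one, you can inspect its chat_template attribute. A quick sketch (the model name is just the example used above):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen1.5-0.5B-Chat")
# Tokenizers without a chat template have chat_template set to None
print(tokenizer.chat_template is not None)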
Besides talking to the model, there are a few commands you can use:
- clear: clears the current conversation and starts a new one
- example {NAME}: loads the example named {NAME} from the config and uses it as the user input
- set {SETTING_NAME}={SETTING_VALUE};: changes the system prompt or generation settings (multiple settings are separated by a ';')
- reset: same as clear, but also resets the generation configs to defaults if they have been changed by set
- save {SAVE_NAME} (optional): saves the current chat and settings to file, by default to ./chat_history/{MODEL_NAME}/chat_{DATETIME}.yaml, or to {SAVE_NAME} if provided
- exit: closes the interface
The default examples are defined in examples/scripts/config/default_chat_config.yaml, but you can pass your own with --config CONFIG_FILE, where you can also specify the default generation parameters.
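As a rough sketch, a custom chat config could look like the following; the exact schema is defined by default_chat_config.yaml, and the key names below are assumptions:

# Named examples that can be loaded in the chat with `example {NAME}`
examples:
  capital:
    text: "What is the capital of France?"
# Assumed section for default generation parameters
generation:
  max_new_tokens: 256
  temperature: 0.7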