Command Line Interfaces (CLIs)

TRL provides a powerful command-line interface (CLI) to fine-tune large language models (LLMs) using methods like Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), and more. The CLI abstracts away much of the boilerplate, letting you launch training jobs quickly and reproducibly.

Currently supported commands are:

Training Commands

  • trl dpo: fine-tune an LLM with DPO
  • trl grpo: fine-tune an LLM with GRPO
  • trl kto: fine-tune an LLM with KTO
  • trl sft: fine-tune an LLM with SFT

Other Commands

  • trl env: get the system information
  • trl vllm-serve: serve a model with vLLM
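
For example, a vLLM inference server can be started as follows (a minimal sketch; run trl vllm-serve --help to see the options available in your version):

trl vllm-serve --model Qwen/Qwen2.5-7B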

Fine-Tuning with the TRL CLI

Basic Usage

You can launch training directly from the CLI by specifying required arguments like the model and dataset:

trl sft \
  --model_name_or_path Qwen/Qwen2.5-0.5B \
  --dataset_name stanfordnlp/imdb
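
The other training commands follow the same pattern. For example, a DPO run swaps the subcommand and uses a preference-formatted dataset (trl-lib/ultrafeedback_binarized is an illustrative choice here; any DPO-compatible dataset works):

trl dpo \
  --model_name_or_path Qwen/Qwen2.5-0.5B \
  --dataset_name trl-lib/ultrafeedback_binarized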

Using Configuration Files

To keep your CLI commands clean and reproducible, you can define all training arguments in a YAML configuration file:

# sft_config.yaml
model_name_or_path: Qwen/Qwen2.5-0.5B
dataset_name: stanfordnlp/imdb

Launch with:

trl sft --config sft_config.yaml
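
Any other training argument accepted by the subcommand can go in the same file. As a slightly fuller sketch (learning_rate and output_dir are standard trainer arguments; the values here are purely illustrative):

# sft_config.yaml
model_name_or_path: Qwen/Qwen2.5-0.5B
dataset_name: stanfordnlp/imdb
learning_rate: 2.0e-5
output_dir: Qwen2.5-0.5B-SFT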

Scaling Up with Accelerate

The TRL CLI natively supports 🤗 Accelerate, making it easy to scale training across multiple GPUs or machines, or to use advanced setups like DeepSpeed, all from the same CLI.

You can pass any accelerate launch arguments directly to trl, such as --num_processes. For more information, see Using accelerate launch.

trl sft \
  --model_name_or_path Qwen/Qwen2.5-0.5B \
  --dataset_name stanfordnlp/imdb \
  --num_processes 4
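
Launch arguments such as num_processes can also be set in a config file alongside the training arguments; a sketch under that assumption:

# sft_config.yaml
model_name_or_path: Qwen/Qwen2.5-0.5B
dataset_name: stanfordnlp/imdb
num_processes: 4

Launch with:

trl sft --config sft_config.yaml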

Using --accelerate_config for Accelerate Configuration

The --accelerate_config flag lets you easily configure distributed training with 🤗 Accelerate. This flag accepts either:

  • the name of a predefined config profile (built into TRL), or
  • a path to a custom Accelerate YAML config file.

Predefined Config Profiles

TRL provides several ready-to-use Accelerate configs to simplify common training setups:

Name         Description
fsdp1        Fully Sharded Data Parallel Stage 1
fsdp2        Fully Sharded Data Parallel Stage 2
zero1        DeepSpeed ZeRO Stage 1
zero2        DeepSpeed ZeRO Stage 2
zero3        DeepSpeed ZeRO Stage 3
multi_gpu    Multi-GPU training
single_gpu   Single-GPU training

To use one of these, just pass the name to --accelerate_config. TRL will automatically load the corresponding config file from trl/accelerate_config/.

Example Usage

trl sft \
  --model_name_or_path Qwen/Qwen2.5-0.5B \
  --dataset_name stanfordnlp/imdb \
  --accelerate_config zero2  # or path/to/my/accelerate/config.yaml
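
If you pass a path instead of a profile name, the file should be a standard Accelerate config. A minimal sketch (the keys mirror those in the accelerate config printed by trl env below; adjust the values to your hardware):

# path/to/my/accelerate/config.yaml
compute_environment: LOCAL_MACHINE
distributed_type: MULTI_GPU
mixed_precision: bf16
num_machines: 1
num_processes: 8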

Chat Interface

The chat interface is deprecated and will be removed in TRL 0.19. Use transformers-cli chat instead. For more information, see the Transformers documentation on chatting with text generation models.

The chat CLI lets you quickly load the model and talk to it. Simply run the following:

$ trl chat --model_name_or_path Qwen/Qwen1.5-0.5B-Chat 
<quentin_gallouedec>:
What is the best programming language?

<Qwen/Qwen1.5-0.5B-Chat>:
There isn't a "best" programming language, as everyone has different style preferences, needs, and preferences. However, some people commonly use   
languages like Python, Java, C++, and JavaScript, which are popular among developers for a variety of reasons, including readability, flexibility,  
and scalability. Ultimately, it depends on personal preference, needs, and goals.

Note that the chat interface relies on the tokenizer’s chat template to format the inputs for the model. Make sure your tokenizer has a chat template defined.

Besides talking to the model, there are a few commands you can use:

  • clear: clears the current conversation and starts a new one
  • example {NAME}: loads the example named {NAME} from the config and uses it as the user input
  • set {SETTING_NAME}={SETTING_VALUE};: changes the system prompt or generation settings (multiple settings are separated by a ;), as shown in the example after this list
  • reset: same as clear, but also resets the generation config to defaults if it has been changed by set
  • save or save {SAVE_NAME}: saves the current chat and settings to a file, by default to ./chat_history/{MODEL_NAME}/chat_{DATETIME}.yaml, or to {SAVE_NAME} if provided
  • exit: closes the interface
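
For example, to adjust generation settings mid-conversation (temperature and max_new_tokens are common generation settings; the exact names supported depend on your TRL version):

<quentin_gallouedec>:
set temperature=0.7; max_new_tokens=256;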

Getting the System Information

You can get the system information by running the following command:

trl env

This will print out the system information, including the GPU information, the CUDA version, the PyTorch version, the Transformers version, the TRL version, and any optional dependencies that are installed.

Copy-paste the following information when reporting an issue:

- Platform: Linux-5.15.0-1048-aws-x86_64-with-glibc2.31
- Python version: 3.11.9
- PyTorch version: 2.4.1
- CUDA device: NVIDIA H100 80GB HBM3
- Transformers version: 4.45.0.dev0
- Accelerate version: 0.34.2
- Accelerate config: 
  - compute_environment: LOCAL_MACHINE
  - distributed_type: DEEPSPEED
  - mixed_precision: no
  - use_cpu: False
  - debug: False
  - num_processes: 4
  - machine_rank: 0
  - num_machines: 1
  - rdzv_backend: static
  - same_network: True
  - main_training_function: main
  - enable_cpu_affinity: False
  - deepspeed_config: {'gradient_accumulation_steps': 4, 'offload_optimizer_device': 'none', 'offload_param_device': 'none', 'zero3_init_flag': False, 'zero_stage': 2}
  - downcast_bf16: no
  - tpu_use_cluster: False
  - tpu_use_sudo: False
  - tpu_env: []
- Datasets version: 3.0.0
- HF Hub version: 0.24.7
- TRL version: 0.12.0.dev0+acb4d70
- bitsandbytes version: 0.41.1
- DeepSpeed version: 0.15.1
- Diffusers version: 0.30.3
- Liger-Kernel version: 0.3.0
- LLM-Blender version: 0.0.2
- OpenAI version: 1.46.0
- PEFT version: 0.12.0

This information is required when reporting an issue.
