File size: 330 Bytes
5fa1a76 |
1 2 3 4 |
An additional level of debug is to add NCCL_DEBUG=INFO environment variable as follows: NCCL_DEBUG=INFO python -m torch.distributed.run --nproc_per_node 2 --nnodes 1 torch-distributed-gpu-test.py This will dump a lot of NCCL-related debug information, which you can then search online if you find that some problems are reported. |