Fix optimizer CPU offload for megatron-fsdp dtensor param by wplf · Pull Request #4623 · NVIDIA/Megatron-LM

wplf · 2026-05-05T04:48:58Z

Summary

Handle DTensor parameters and gradients by operating on local shards before optimizer CPU offload copies.
Avoid dispatching pin_memory/is_pinned through DTensor and respect pin_cpu_params.
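
The two points above can be illustrated with a minimal sketch. It is not the code from this PR: the helper names `local_shard` and `make_cpu_copy` are hypothetical, and the duck-typed `to_local` check is an assumption made so the snippet has no imports (real code would more likely use `isinstance` against `DTensor`). Only `to_local()`, `detach()`, `cpu()`, `is_pinned()`, `pin_memory()`, and the `pin_cpu_params` flag come from PyTorch or the description above.

```python
def local_shard(tensor):
    """Return the local shard of a DTensor-like object, or the input unchanged.

    Hypothetical helper: plain torch.Tensor objects have no `to_local`
    method, while DTensor does, so this check separates the two cases.
    """
    to_local = getattr(tensor, "to_local", None)
    return to_local() if callable(to_local) else tensor


def make_cpu_copy(param, pin_cpu_params=True):
    """Build the CPU-side copy of a parameter for optimizer offload.

    Hypothetical helper: unwrap to the local shard first, so that
    is_pinned()/pin_memory() are called on a plain tensor and never
    dispatched through the DTensor wrapper.
    """
    cpu_tensor = local_shard(param).detach().cpu()
    # Pin only when requested; pinning needs an accelerator-enabled build.
    if pin_cpu_params and not cpu_tensor.is_pinned():
        cpu_tensor = cpu_tensor.pin_memory()
    return cpu_tensor
```

Gradients would presumably go through the same unwrapping (for example `local_shard(param.grad)`) before being copied into the CPU buffer.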

Tests

uv run isort megatron/core/optimizer/cpu_offloading/hybrid_optimizer.py
PYTHONDONTWRITEBYTECODE=1 PYTHONPATH=. python -m pytest tests/unit_tests/test_optimizer_cpu_offloading.py -q

test result

Handle Megatron-FSDP DTensor parameters and gradients by operating on local shards before CPU optimizer offload copies. This avoids dispatching pin_memory/is_pinned through DTensor and lets pin_cpu_params control CPU parameter pinning.

copy-pr-bot · 2026-05-05T04:49:01Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

wplf · 2026-05-05T04:55:21Z

/ok to test

copy-pr-bot · 2026-05-05T04:55:24Z

/ok to test

@wplf, there was an error processing your request: E1

See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/1/

wplf · 2026-05-05T06:02:24Z

/ok to test 5da7d67

Fix optimizer CPU offload for DTensor params

5da7d67


wplf changed the title ~~Fix optimizer CPU offload for DTensor params~~ Fix optimizer CPU offload for megatron-fsdp dtensor param May 5, 2026

wplf added the module: megatron-fsdp label May 5, 2026

wplf self-assigned this May 5, 2026

wplf marked this pull request as ready for review May 5, 2026 04:50

wplf requested review from a team as code owners May 5, 2026 04:50

svcnvidia-nemo-ci added the complexity: low label May 5, 2026

copy-pr-bot Bot temporarily deployed to test May 5, 2026 06:03 Inactive

yaox12 requested a review from shjwudp May 6, 2026 01:38

wplf wants to merge 1 commit into NVIDIA:dev from wplf:jinliang/fix-mfsdp-optimizer-offload
