Fix optimizer CPU offload for megatron-fsdp dtensor param #4623

Open
wplf wants to merge 1 commit into NVIDIA:dev from wplf:jinliang/fix-mfsdp-optimizer-offload

Conversation

@wplf
Member

@wplf wplf commented May 5, 2026

Summary

  • Handle DTensor parameters and gradients by operating on local shards before optimizer CPU offload copies.
  • Avoid dispatching pin_memory/is_pinned through DTensor and respect pin_cpu_params.
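
The two points above can be illustrated with a minimal sketch. This is not the code from this PR: `local_shard` and `make_cpu_copy` are hypothetical helper names, and only `pin_cpu_params` comes from the description. The sketch assumes that unwrapping a DTensor with `to_local()` before allocating the CPU buffer is enough to keep pinning off the DTensor dispatch path.

```python
import torch

try:
    # Public location in recent PyTorch releases.
    from torch.distributed.tensor import DTensor
except Exception:
    try:
        # Older releases exposed DTensor under a private module.
        from torch.distributed._tensor import DTensor
    except Exception:
        DTensor = None  # build without DTensor support


def local_shard(t: torch.Tensor) -> torch.Tensor:
    """Return the local shard of a DTensor; pass plain tensors through.

    Hypothetical helper, not taken from the PR.
    """
    if DTensor is not None and isinstance(t, DTensor):
        return t.to_local()
    return t


def make_cpu_copy(param: torch.Tensor, pin_cpu_params: bool = True) -> torch.Tensor:
    """Copy a parameter's local shard to CPU, pinning only if requested.

    Hypothetical helper, not taken from the PR.
    """
    src = local_shard(param.detach())
    # Pinning needs a CUDA runtime; fall back to pageable memory otherwise.
    pin = pin_cpu_params and torch.cuda.is_available()
    # The buffer is a plain tensor, so pin_memory never goes through DTensor.
    cpu = torch.empty(src.shape, dtype=src.dtype, device="cpu", pin_memory=pin)
    cpu.copy_(src)
    return cpu
```

The same unwrapping would apply to gradients before they are copied to the CPU side. Allocating the buffer as pinned up front, rather than calling `pin_memory()` on a wrapped tensor afterwards, is one way to keep the pinning decision tied to `pin_cpu_params`.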

Tests

  • uv run isort megatron/core/optimizer/cpu_offloading/hybrid_optimizer.py
  • PYTHONDONTWRITEBYTECODE=1 PYTHONPATH=. python -m pytest tests/unit_tests/test_optimizer_cpu_offloading.py -q

Test result: (image)

Handle Megatron-FSDP DTensor parameters and gradients by operating on local shards before CPU optimizer offload copies. This avoids dispatching pin_memory/is_pinned through DTensor and lets pin_cpu_params control CPU parameter pinning.
@copy-pr-bot

copy-pr-bot Bot commented May 5, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@wplf wplf changed the title from "Fix optimizer CPU offload for DTensor params" to "Fix optimizer CPU offload for megatron-fsdp dtensor param" May 5, 2026
@wplf wplf self-assigned this May 5, 2026
@wplf wplf marked this pull request as ready for review May 5, 2026 04:50
@wplf wplf requested review from a team as code owners May 5, 2026 04:50
@wplf
Member Author

wplf commented May 5, 2026

/ok to test

@copy-pr-bot

copy-pr-bot Bot commented May 5, 2026

/ok to test

@wplf, there was an error processing your request: E1

See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/1/

@wplf
Member Author

wplf commented May 5, 2026

/ok to test 5da7d67

@yaox12 yaox12 requested a review from shjwudp May 6, 2026 01:38
2 participants