Pull requests: NVIDIA/Megatron-LM
Pull requests list
Fix DistributedOptimizer OOM when resuming from checkpoint
complexity: low
Final Review
PR is in the "final review" stage
#4638
opened May 6, 2026 by
cuichenx
Contributor
Support GEMM + SwiGLU fused MLP (rebased from #3971)
complexity: high
Run tests
#4636
opened May 5, 2026 by
Connor-XY
5 tasks
chore: nightly sync main into dev (05_05_2026)
Run functional tests
Run MBridge tests
Attach this for testing this PR against MBridge main
#4635
opened May 5, 2026 by
svcnvidia-nemo-ci
•
Draft
Inference: Cache input + position ID views
complexity: low
#4634
opened May 5, 2026 by
mathemakitten
Contributor
5 tasks
Refactor TE fused ops integration into mixin
#4630
opened May 5, 2026 by
CarlosGomes98
Contributor
•
Draft
5 tasks
Disable MSC by default; opt in via --enable-msc
complexity: low
#4629
opened May 5, 2026 by
asolergi-nv
Contributor
5 tasks done
dist-ckpt: add local-replica mode to eliminate cross-rank reads of replicated params on torch_dist load
complexity: medium
#4628
opened May 5, 2026 by
asolergi-nv
Contributor
5 tasks done
ci: introduce L-tier scope vocabulary via parser
#4625
opened May 5, 2026 by
balasaajay
Contributor
•
Draft
5 tasks
[dev]: faster implementation of mHC fused kernels
#4624
opened May 5, 2026 by
jingqiny-99
•
Draft
5 tasks
Fix optimizer CPU offload for megatron-fsdp dtensor param
complexity: low
module: megatron-fsdp
#4623
opened May 5, 2026 by
wplf
Member
[Dev] Fix single grouped weight when enabling MXFP8 primary weight
#4621
opened May 5, 2026 by
zhongbozhu
Contributor
•
Draft
5 tasks
chore: Update Docker image version to 26.04-py3
Run functional tests
#4611
opened May 4, 2026 by
balasaajay
Contributor
•
Draft
5 tasks
Fix gradient corruption with layerwise param all-gather overlap
Approved
All necessary approvals have been made
complexity: low
fix tokenizers in respect to newer transformers
complexity: low
Expert Review
[deprecated] Apply this label to indicate that your PR is ready for expert review.
Run functional tests
#4608
opened May 4, 2026 by
dimapihtar
Contributor
5 tasks
ci: Bump GHA versions
Approved
All necessary approvals have been made
complexity: low
#4606
opened May 4, 2026 by
chtruong814
Contributor
5 tasks
Enable async scheduling for decode-only inference
Run tests
#4604
opened May 4, 2026 by
lmcafee-nvidia
Contributor
•
Draft
Inference: Speed up the moe_sum_kernel by capping number of blocks
Approved
All necessary approvals have been made
complexity: low
#4603
opened May 4, 2026 by
sidsingh-nvidia
Contributor
5 tasks