alphaXiv

Let ViT Speak: Generative Language-Image Pre-training
01 May 2026
ByteDance · Beijing Jiaotong University
Yan Fang
Mengcheng Lan
Zilong Huang

The GenLIP framework introduces a minimalist generative pre-training approach for Vision Transformers, enabling them to directly predict language tokens from visual inputs. It achieves competitive or superior performance on 14 diverse multimodal benchmarks with 8 billion pretraining samples, outperforming baselines that use up to 40 billion samples.

#computer-science #computer-vision-and-pattern-recognition #generative-models
MolmoAct2: Action Reasoning Models for Real-world Deployment
04 May 2026
University of Washington · National University of Singapore
Haoquan Fang
Jiafei Duan
Donovan Clay

MolmoAct2, developed by the Allen Institute for AI and the University of Washington, is a fully open-source action reasoning model designed for real-world robot deployment, enhancing generalizability and efficiency. It incorporates an embodied reasoning VLM backbone and a continuous action expert, achieving up to 87.1% success on real-world DROID tasks with unseen objects and a 2.42x speedup in control rate compared to unoptimized inference.

#computer-science #robotics
Thinking with Visual Primitives
30 Apr 2026
Tsinghua University · Peking University
Ruijie Lu
Yiyang Ma
Xiaokang Chen

DeepSeek-AI researchers introduced "Thinking with Visual Primitives," a framework that integrates points and bounding boxes as fundamental units of thought into Multimodal Large Language Models (MLLMs) to address the "Reference Gap" in complex visual reasoning. This approach improves performance on tasks like counting, spatial deduction, and topological navigation while significantly enhancing visual token efficiency.

Mamoda2.5: Enhancing Unified Multimodal Model with DiT-MoE
04 May 2026
ByteDance
Yangming Shi
Shixiang Zhu
Tao Shen

ByteDance's Mamoda2.5 unifies multimodal understanding, image generation, and video generation/editing within a single Autoregressive–Diffusion framework, leveraging a fine-grained Mixture-of-Experts Diffusion Transformer (DiT-MoE) for computational efficiency. The model demonstrates competitive performance on benchmarks like VBench 2.0 (61.64) and OpenVE-Bench (3.86), while its distilled version achieves up to 95.9 times faster video editing inference.

#computer-science #computer-vision-and-pattern-recognition #efficient-transformers
Model Spec Midtraining: Improving How Alignment Training Generalizes
03 May 2026
Anthropic
Chloe Li
Sara Price
Samuel Marks

Model Spec Midtraining (MSM) enhances the generalization of large language models by integrating an intermediate training phase that instills a deep understanding of a Model Spec's principles and values. This approach reduces agentic misalignment in out-of-distribution scenarios and improves the compute efficiency of alignment fine-tuning by up to 60x in low-sample regimes.

#agents #computer-science #artificial-intelligence
On-Policy Distillation
05 May 2026
Thinking Machines
Kevin Lu

This work introduces on-policy distillation, a post-training method for large language models that combines the on-policy relevance of reinforcement learning with dense, token-level feedback from a teacher model. The approach achieved 70% on the AIME'24 mathematical reasoning benchmark with Qwen3-8B and demonstrated a 30x cost reduction compared to off-policy distillation, while also recovering instruction-following abilities in personalized models.
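
The mechanism the summary describes can be illustrated in a few lines: tokens are sampled from the student policy itself (the on-policy part), and every position receives a dense score from the per-token reverse KL between the student's and teacher's next-token distributions. A minimal sketch with invented toy vocabularies and distributions, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(logits):
    z = np.exp(logits - logits.max(-1, keepdims=True))
    return z / z.sum(-1, keepdims=True)

# Toy next-token distributions over a 5-token vocabulary at 4 positions
# (positions treated independently here, a simplification of a real rollout).
student_logits = rng.normal(size=(4, 5))
teacher_logits = student_logits + rng.normal(scale=0.5, size=(4, 5))
p_student = softmax(student_logits)
p_teacher = softmax(teacher_logits)

# On-policy: the sequence is sampled from the *student* distribution...
sampled = np.array([rng.choice(5, p=p) for p in p_student])

# ...and each position gets dense feedback: the per-token reverse KL
# KL(student || teacher) = sum_v p_s(v) * log(p_s(v) / p_t(v)).
per_token_kl = (p_student * np.log(p_student / p_teacher)).sum(-1)
print(per_token_kl.shape)  # one scalar of teacher feedback per position
```

The contrast with pure RL is that the reward here is dense (one signal per token) rather than a single scalar per completed rollout.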

RLDX-1 Technical Report
05 May 2026
Dongyoung Kim
Huiwon Jang
Myungkyu Koo

Researchers from RLWRLD and KAIST developed RLDX-1, a general-purpose robotic policy that integrates motion awareness, long-term memory, and physical sensing into a Vision-Language-Action (VLA) model for dexterous manipulation. The system consistently outperformed state-of-the-art VLAs in both simulation and real-world tasks, exhibiting superior performance in dynamic and contact-rich environments.

#agents #computer-science #artificial-intelligence
A Theory of Generalization in Deep Learning
02 May 2026
Stanford University
Elon Litman
Gabe Guo

Researchers at Stanford University developed a generalization theory for deep neural networks that operates in the full feature-learning regime, showing how the empirical Neural Tangent Kernel (eNTK) partitions output space into a signal channel and a test-invisible reservoir. This framework leads to an optimization method that accelerates generalization by suppressing noise, achieving, for instance, a 5-fold speedup in grokking and improving reward accuracy in LLM fine-tuning under noisy preferences.

#computer-science #machine-learning #fine-tuning
HeavySkill: Heavy Thinking as the Inner Skill in Agentic Harness
04 May 2026
Jianing Wang
Linsen Guo
Zhengyu Chen

HEAVYSKILL formalizes a two-phase 'heavy thinking' process as an intrinsic LLM skill, combining parallel reasoning with sequential deliberation. This framework consistently improves performance on complex reasoning tasks, outperforming traditional strategies like Best-of-N and demonstrating that deliberation can synthesize correct solutions not present in individual initial attempts.

#agentic-frameworks #agents #chain-of-thought
T²PO: Uncertainty-Guided Exploration Control for Stable Multi-Turn Agentic Reinforcement Learning
04 May 2026
UCLA · Amazon
Haixin Wang
Hejie Cui
Chenwei Zhang

T2PO (Token- and Turn-level Policy Optimization) introduces an uncertainty-guided exploration control framework for multi-turn agentic reinforcement learning, achieving improved training stability and task performance by adaptively mitigating inefficient token-level thinking and turn-level repetition. The method demonstrated higher success rates and reduced token consumption and interaction turns across diverse interactive environments like WebShop and ALFWorld.

#agents #computer-science #conversational-ai
Representation Fréchet Loss for Visual Generation
30 Apr 2026
CUHK · OpenAI
Jiawei Yang
Zhengyang Geng
Xuan Ju

This research introduces FD-loss, a method that directly optimizes Fréchet Distance as a training objective for generative models by decoupling population statistics from batch-level gradient computation. This approach enhances the visual quality of one-step generators and repurposes multi-step models for efficient single-step generation, while also proposing FDr_k, a new multi-representation metric for comprehensive evaluation.
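
For reference, the Fréchet Distance between Gaussians fitted to feature statistics is ||μ1 − μ2||² + Tr(Σ1 + Σ2 − 2(Σ1Σ2)^{1/2}). A minimal numeric sketch of that quantity, not the paper's decoupled FD-loss machinery:

```python
import numpy as np

def sqrtm_psd(a):
    # Eigendecomposition-based matrix square root; adequate for the
    # well-conditioned PSD products used in this toy example
    # (scipy.linalg.sqrtm is the robust general-purpose choice).
    w, v = np.linalg.eig(a)
    w, v = w.real, v.real
    return (v * np.sqrt(np.clip(w, 0.0, None))) @ np.linalg.inv(v)

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Fréchet distance between N(mu1, sigma1) and N(mu2, sigma2)."""
    mean_term = float(((mu1 - mu2) ** 2).sum())
    cov_term = float(np.trace(sigma1 + sigma2 - 2.0 * sqrtm_psd(sigma1 @ sigma2)))
    return mean_term + cov_term

mu, cov = np.zeros(3), np.eye(3)
assert abs(frechet_distance(mu, cov, mu, cov)) < 1e-8  # identical Gaussians -> 0
```

The quantity depends on population means and covariances, which is why (per the summary) using it as a training loss requires decoupling those statistics from the per-batch gradient computation.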

#computer-science #computer-vision-and-pattern-recognition #generative-models
World Model for Robot Learning: A Comprehensive Survey
30 Apr 2026
ETH Zurich · Harvard University
Bohan Hou
Gen Li
Jindou Jia

This survey systematically reviews world models in robot learning, offering a robotics-centric definition and categorizing existing approaches by architectural coupling, functional roles, and application domains. It synthesizes current research, identifies key challenges, and outlines future directions for integrating predictive modeling into embodied AI.

#agent-based-systems #autonomous-vehicles #computer-science
OpenSeeker-v2: Pushing the Limits of Search Agents with Informative and High-Difficulty Trajectories
05 May 2026
Yuwen Du
Rui Ye
Shuo Tang

Deep search capabilities have become an indispensable competency for frontier Large Language Model (LLM) agents, yet their development remains dominated by industrial giants. The typical industry recipe involves a highly resource-intensive pipeline spanning pre-training, continual pre-training (CPT), supervised fine-tuning (SFT), and reinforcement learning (RL). In this report, we show that when fueled with informative and high-difficulty trajectories, a simple SFT approach could be surprisingly powerful for training frontier search agents.

By introducing three simple data synthesis modifications: scaling knowledge graph size for richer exploration, expanding the tool set size for broader functionality, and strict low-step filtering, we establish a stronger baseline. Trained on merely 10.6k data points, our OpenSeeker-v2 achieves state-of-the-art performance across 4 benchmarks (30B-sized agents with ReAct paradigm): 46.0% on BrowseComp, 58.1% on BrowseComp-ZH, 34.6% on Humanity's Last Exam, and 78.0% on xbench, surpassing even Tongyi DeepResearch trained with heavy CPT+SFT+RL pipeline, which achieves 43.4%, 46.7%, 32.9%, and 75.0%, respectively.

Notably, OpenSeeker-v2 represents the first state-of-the-art search agent within its model scale and paradigm to be developed by a purely academic team using only SFT. We are excited to open-source the OpenSeeker-v2 model weights and share our simple yet effective findings to make frontier search agent research more accessible to the community.
#agents #computer-science #artificial-intelligence
From Qubit to Qubit: A Graduate Course in Quantum Mechanics
02 May 2026
Jeremy Levy

Jeremy Levy's textbook "From Qubit to Qubit" introduces a graduate quantum mechanics curriculum by first developing foundational concepts using finite-dimensional spin-1/2 systems (qubits) before progressing to continuous quantum mechanics. This approach aims to provide a more intuitive and unified understanding, integrating quantum information, condensed matter, and atomic physics.

#physics #quantum-physics
Posterior Augmented Flow Matching
01 May 2026
University of Washington · Hugging Face
George Stoica
Sayak Paul
Matthew Wallingford

Posterior Augmented Flow Matching (PAFM) reforms the Flow Matching objective to address sparse supervision in continuous-time generative models, incorporating a multi-target supervision signal during training. This method achieved up to 3.4 FID50K improvement on ImageNet-1K and 0.92 FID5K improvement on CC12M, while introducing negligible computational overhead.
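
For context, the standard Flow Matching objective that PAFM builds on regresses a velocity network onto a single target per noise draw: with x_t = (1 − t)·x0 + t·x1, the target is x1 − x0. A toy sketch of that baseline loss (the posterior multi-target augmentation itself is not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(0)

def fm_loss(v_theta, x1, rng):
    """One Monte Carlo estimate of the linear-path Flow Matching loss."""
    x0 = rng.normal(size=x1.shape)           # source noise sample
    t = rng.uniform(size=(x1.shape[0], 1))   # per-example time in [0, 1]
    xt = (1 - t) * x0 + t * x1               # point on the straight path
    target = x1 - x0                         # constant velocity of that path
    return float(np.mean((v_theta(xt, t) - target) ** 2))

x1 = rng.normal(size=(8, 2))                      # toy "data" batch
zero_model = lambda xt, t: np.zeros_like(xt)      # untrained stand-in network
loss = fm_loss(zero_model, x1, rng)
```

Note that each (x_t, t) pair is supervised by exactly one target drawn per step; that sparsity is the issue the summary says PAFM's multi-target signal addresses.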

#computer-science #computer-vision-and-pattern-recognition #generative-models
Persistent Visual Memory: Sustaining Perception for Deep Generation in LVLMs
01 May 2026
Siyuan Huang
Xiaoye Qu
Yafu Li

Researchers from Shanghai AI Laboratory and Shanghai Jiao Tong University developed Persistent Visual Memory (PVM), a module designed to counteract visual signal dilution in autoregressive Large Vision-Language Models (LVLMs) during deep generation tasks. Integrating PVM led to an absolute improvement of 4.8% in average accuracy on eight multimodal benchmarks, reaching 71.5% with a Qwen3-VL-8B-Instruct backbone, and showed a 27.3% relative performance boost for long output sequences.

#attention-mechanisms #computer-science #artificial-intelligence
Recursive Multi-Agent Systems
28 Apr 2026
University of Illinois at Urbana-Champaign · Stanford University
Xiyuan Yang
Jiaru Zou
Rui Pan

RecursiveMAS introduces a framework that integrates recursive computation into multi-agent systems, enabling agents to refine collaborative reasoning through iterative latent-space interactions rather than explicit text. This approach leads to average accuracy improvements of up to 20.2% and inference speedups of up to 2.4x compared to text-based recursive multi-agent systems, significantly reducing token usage.

#agentic-frameworks #agents #chain-of-thought
Linearizing Vision Transformer with Test-Time Training
04 May 2026
Yining Li
Dongchen Han
Zeyu Liu

Tsinghua University researchers developed T5 (Transformer To Test-Time Training), a method for linearizing pre-trained Softmax Vision Transformers into linear-complexity architectures with minimal fine-tuning. This approach, which includes architectural and representational alignments, enables near-full performance recovery in classification and generation tasks while significantly accelerating inference, such as a 1.47x speedup for Stable Diffusion at 2048x2048 resolution.
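
The linear-complexity form that linearization methods like this target rests on a standard reassociation: with a positive feature map φ, attention computed as a row-normalized φ(Q)φ(K)ᵀ product (O(N²d)) equals the reassociated φ(Q)(φ(K)ᵀV) form (O(Nd²)). A generic sketch of that equivalence, not T5's alignment procedure:

```python
import numpy as np

rng = np.random.default_rng(0)
N, d = 6, 4                        # sequence length, head dimension
Q, K, V = (rng.normal(size=(N, d)) for _ in range(3))

def phi(x):
    # A positive feature map (elu(x) + 1 is a common choice in linear attention).
    return np.where(x > 0, x + 1.0, np.exp(x))

# Quadratic order: materialize the N x N attention matrix, then normalize rows.
A = phi(Q) @ phi(K).T
quad = (A / A.sum(-1, keepdims=True)) @ V

# Linear order: reassociate so only d x d / d-sized summaries are formed.
kv = phi(K).T @ V                  # (d, d) summary of keys and values
z = phi(K).sum(0)                  # (d,) normalizer
lin = (phi(Q) @ kv) / (phi(Q) @ z)[:, None]

assert np.allclose(quad, lin)      # identical outputs, different cost
```

Because φ(K)ᵀV is a fixed d×d summary regardless of N, the cost no longer grows quadratically with sequence length, which is what yields inference speedups at high resolutions.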

#computer-science #computer-vision-and-pattern-recognition #efficient-transformers
Towards Efficient and Expressive Offline RL via Flow-Anchored Noise-conditioned Q-Learning
03 May 2026
Sungyoung Lee
Dohyeong Kim
Eshan Balachandar

The University of Texas at Austin and independent researchers developed Flow-Anchored Noise-conditioned Q-Learning (FAN), an offline reinforcement learning algorithm that leverages expressive flow policies and distributional critics while significantly improving computational efficiency. This method achieved competitive or superior task performance across D4RL and OGBench benchmarks, exhibiting 5-14 times faster training runtime and competitive inference speed compared to previous distributional approaches.

#computer-science #machine-learning #robotics
On Training Large Language Models for Long-Horizon Tasks: An Empirical Study of Horizon Length
04 May 2026
Sunghwan Kim
Junhee Cho
Beong-woo Kwak

This empirical study identifies intrinsic task horizon length as a fundamental bottleneck in training large language model agents, demonstrating that longer horizons cause severe training instability and performance collapse. The research shows that applying horizon reduction techniques, such as macro actions and subgoal decomposition, effectively stabilizes training and improves performance, enabling generalization to longer, previously unseen tasks.

#agents #computer-science #artificial-intelligence