Yining Li

I am a research scientist at Shanghai Artificial Intelligence Laboratory and an adjunct Ph.D. supervisor at Shanghai Jiao Tong University. I received my Ph.D. from the Multimedia Lab (MMLab) at The Chinese University of Hong Kong in 2019, advised by Prof. Chen Change Loy and Prof. Xiaoou Tang. Before that, I earned my B.S. degree from Tsinghua University in 2014. Prior to joining Shanghai AI Lab, I was a Senior Research Scientist at SenseTime from 2019 to 2021.

I work on large language models and agentic systems. I am a member of the Intern Large Models team, where we develop foundation models and agentic systems to accelerate scientific discovery. With a background in computer vision, I was a core member of OpenMMLab and led the development of MMPose.

Hiring: We have openings for research interns and full-time researchers in LLM training and infrastructure. I am also looking for Ph.D. students (in collaboration with SJTU). Feel free to email me if you are interested.

News

Apr 15, 2026	TREX technical report is available. We introduce TREX, a tree-search enhanced agentic system that automates the challenging process of LLM fine-tuning, along with a new benchmark, FT-Bench, comprising 10 fine-tuning tasks to evaluate automatic research systems. Check out the project page for more information.
Apr 07, 2026	RouteMoA is accepted to ACL 2026.
Mar 30, 2026	Kernel-Smith technical report is online.
Dec 12, 2025	MG-LLaVA is accepted to TCSVT.
Feb 26, 2025	Auto Cherry-Picker is accepted to CVPR 2025.
Feb 16, 2025	MIG is accepted to ACL 2025 Findings.
Jan 23, 2025	RMP-SAM is accepted to ICLR 2025 as an oral presentation.
Jan 15, 2025	InternLM3-8B-Instruct is released, supporting both a normal response mode for general-purpose use and a deep thinking mode for solving complicated reasoning tasks via long CoT.
Sep 26, 2024	5 papers accepted to NeurIPS 2024: 3 in the main track (MotionBooth, ADC, XComposer2-4KHD) and 2 in the Datasets and Benchmarks track (GTA, MMBench-Video).
Jul 11, 2024	We released RTMW, the newest addition to the RTMPose series, which specializes in predicting whole-body 2D and 3D keypoints simultaneously in real time.
Jul 01, 2024	Open-Vocabulary SAM is accepted to ECCV 2024.
May 26, 2024	InternLM2 technical report is online.
Feb 27, 2024	3 papers accepted to CVPR 2024: RTMO, OMG-Seg and ROVI.
Dec 08, 2023	We introduce AgentLego, a modular tool library to equip LLM agents with composable, multi-modal capabilities through standardized tool interfaces.

Selected Publications

LLM/VLM

GTA-2: Benchmarking General Tool Agents from Atomic Tool-Use to Open-Ended Workflows

Jize Wang, Xuanxuan Liu, Yining Li, Songyang Zhang, Yijun Wang, and 5 more authors

arXiv, 2026

arXiv Code
Agent

RouteMoA: Dynamic Routing without Pre-Inference Boosts Efficient Mixture-of-Agents

Jize Wang, Han Wu, Zhiyuan You, Yiming Song, Yijun Wang, and 7 more authors

ACL, 2026

arXiv HTML Code
Agent

TREX: Automating LLM Fine-tuning via Agent-Driven Tree-based Exploration

Zerun Ma, Guoqiang Wang, Xinchen Xie, Yicheng Chen, He Du, and 5 more authors

arXiv, 2026

arXiv HTML Website
LLM/VLM

DataChef: Cooking Up Optimal Data Recipes for LLM Adaptation via Reinforcement Learning

Yicheng Chen, Zerun Ma, Xinchen Xie, Yining Li, and Kai Chen

arXiv, 2026

arXiv HTML
LLM/VLM

MIG: Automatic Data Selection for Instruction Tuning by Maximizing Information Gain in Semantic Space

Yicheng Chen, Yining Li, Kai Hu, Zerun Ma, Haochen Ye, and 1 more author

In Findings of ACL, 2025

DOI arXiv HTML Code Website
Vision & Multimodality

Auto Cherry-Picker: Learning from High-quality Generative Data Driven by Language

Yicheng Chen, Xiangtai Li, Yining Li, Yanhong Zeng, Jianzong Wu, and 2 more authors

In CVPR, 2025

arXiv HTML Video Code Website
Vision & Multimodality

MotionBooth: Motion-Aware Customized Text-to-Video Generation

Jianzong Wu, Xiangtai Li, Yanhong Zeng, Jiangning Zhang, Qianyu Zhou, and 3 more authors

In NeurIPS Spotlight, 2024

arXiv HTML Video Code Website
LLM/VLM

InternLM2 Technical Report

Zhaowei Cai, Ming Cao, Hao Chen, Kai Chen, Kaibo Chen, and 4 more authors

arXiv, 2024

arXiv HTML Code Website
LLM/VLM

InternLM-XComposer2: Mastering Free-Form Text-Image Composition and Comprehension in Vision-Language Large Model

Xiaoyi Dong, Pan Zhang, Yuhang Zang, Yan Cao, Boxiao Wang, and 4 more authors

arXiv, 2024

arXiv HTML Code Website
Agent

GTA: A Benchmark for General Tool Agents

Jize Wang, Zerun Ma, Yining Li, Songyang Zhang, Cailian Chen, and 2 more authors

In NeurIPS Datasets and Benchmarks Track, 2024

arXiv HTML Code Website
Vision & Multimodality

RTMO: Towards High-Performance One-Stage Real-Time Multi-Person Pose Estimation

Peng Lu, Tao Jiang, Yining Li, Xiangtai Li, Kai Chen, and 1 more author

In CVPR, 2024

arXiv HTML Code