🙋♂️ I am building a deterministic agentic AI ecosystem at Alibaba. I was the chief scientist at a startup (raised more than $50M), previously worked at JD Explore Academy and Tencent AI Lab, and held an adjunct researcher position at ZJU.
🔭 I work on the whole LLM R&D pipeline and its human-centric applications, including efficient and sufficient training, alignment, evaluation, compression, multilinguality, multimodality, agentic applications, and much more.
💪 I'm keen on bodybuilding (5+ years) and marathon running: I completed my first half marathon (126 min) in Beijing in 2016 and my most recent one (86 min) in Sydney in 2019 😅. I will resume training in 2024 💪🏻.
🥗 I (once 😅) enjoyed cooking.
🐈 I like to spend Sundays with my cats (two from 2020 to 2023, one since 2023).
🔥 Recent open-source projects on agentic AI (data, evaluation, context) and LLM alignment / policy optimization:
- 🔄 AgentHER: Hindsight relabeling of failed trajectories for training.
- 🧬 AgentSynth: Synthetic agent data from scratch with execution validation.
- 📏 AdaRubric: Dynamic rubric evaluation for trajectory quality.
- 🗜️ trajectory_tokenization: ReAct with compressed history for long-horizon context.
- 📡 SigFibPO: SNR-calibrated trust regions and causal fiber residuals for multi-domain RLVR (research code + verl hook).
