Huihan Li
Hi, my name is Huihan Li. I’m a third-year PhD student working on Natural Language Processing at the University of Southern California, where I’m part of the INK Lab, advised by Xiang Ren. I received my M.S.E. in Computer Science from Princeton University, where I was part of the Princeton Natural Language Processing Group, advised by Danqi Chen. Before that, I studied Computer Science and Cognitive & Linguistic Sciences at Wellesley College, working with Christine Bassem on human crowdsensing.
I am passionate about Natural Language Processing, Computational Linguistics, and everything about languages. In high school, I competed in the International Linguistics Olympiad representing China, winning an Honorable Mention in Sofia, Bulgaria (2015) and a Bronze Medal in Mysore, India (2016).
Outside of research, I enjoy all kinds of sports, cooking/baking, and reading. I played water polo in college, which remains one of my fondest memories. Currently, I am learning tennis.
** I am actively looking for Summer 2025 research internships. If my background seems like a good fit for your team, please reach out! **
Research
My research focuses on training and evaluating language models for robust generalization in unfamiliar situations. While many paths lead to Rome, I believe in creating comprehensive data and effective methods to learn from that data. Currently, I am especially interested in attributing LM performance to pretraining data. My broad research goal involves two key objectives: (1) understanding the impact of pretraining data on model learning, especially how models behave in situations that are rare or non-existent in pretraining, i.e., the long-tail distribution; (2) developing methods for detecting and generating long-tail data to enhance model performance across diverse scenarios.
Previously, I have worked on conversation models, constrained decoding, cultural bias, and long-tail data generation. A full list of my publications is available at this link.
News
- September 2024. Our paper, “In Search of the Long-Tail: Systematic Generation of Long-Tail Inferential Knowledge via Logical Rule Guided Search”, is accepted to EMNLP 2024 Main Conference. See you in Miami in November!
- August 2024. I was awarded the Amazon ML PhD Fellowship for 2024-2025. This fellowship will support my work on Secure and Trusted Machine Learning.
- July 2024. Our paper, “CULTURE-GEN: Revealing Global Cultural Perception in Language Models through Natural Language Prompting”, is accepted to COLM 2024. See you in Philly in October!
- March 2023. I will be joining AI2 Mosaic Team as a summer research intern starting May 2023, working with Nouha Dziri and Yejin Choi!
Teaching
- Introduction to Programming Systems (COS217). Princeton University
- Data Structures (CS230). Wellesley College
Experience
- INK Lab, USC. PhD Student. Sept. 2022 - Present
- Research in Natural Language Processing
- Advisor: Xiang Ren
- AI2, Mosaic. Research Intern. May 2023 - Present
  - Research on multicultural biases in LMs
  - Mentors: Nouha Dziri, Yejin Choi
- Apple. AI/ML Intern. May 2022 - Aug. 2022
- Individual NLP research/engineering project, Siri Information Intelligence, Answers and Web Ranking Team
  - Mentors & Supervisors: Michael Tu, Nikhil Ramesh, Chris Dubois
- Princeton NLP Group. M.S.E. Student. Sept. 2020 - May 2022
- Research in Natural Language Processing
- Advisor: Danqi Chen
- Wellesley College. Research Assistant. Sept. 2018 - July 2020
- Research in Mobile Crowdsensing
- Advisor: Christine Bassem
- Google. SWE Intern. May 2019 - Aug. 2019
- Individual engineering project, Shopping Assistant, Natural Language Team
- Supervisors: Jesse Welch, John Karro
Honors and Awards
- Amazon Fellow. University of Southern California. Aug. 2024
- Siebel Scholar. Princeton University. Sept. 2021
- Sigma Xi Scientific Research Honor Society. Wellesley College. May 2020
- Durant Scholar, magna cum laude. Wellesley College. May 2020
Service and Leadership
- Reviewer. COLING 2024, ACL 2024, EMNLP 2024, ARR 2024, COLING 2025, ICLR 2025
- Student Representative on Board of Admission. Wellesley College. Oct. 2019 - May 2020