Huihan Li
Hi, my name is Huihan Li. I’m a third-year PhD student working on Natural Language Processing at the University of Southern California, where I’m part of the INK Lab, advised by Xiang Ren. I received my M.S.E. in Computer Science from Princeton University, where I was part of the Princeton Natural Language Processing Group, advised by Danqi Chen. Before that, I studied Computer Science and Cognitive & Linguistic Sciences at Wellesley College, working with Christine Bassem on human crowdsensing.
I am passionate about Natural Language Processing, Computational Linguistics, and everything about languages. In high school, I competed in the International Linguistics Olympiad representing China, winning an Honorable Mention in Sofia, Bulgaria (2015) and a Bronze Medal in Mysore, India (2016).
Outside of research, I enjoy all kinds of sports, cooking/baking, and reading. I played water polo in college, which remains one of my fondest memories. Currently, I am learning tennis.
** I am actively looking for Summer 2025 research internships. If my background seems like a good fit for your team, please reach out! **
Research
My research focuses on training and evaluating language models for robust generalization in unfamiliar situations. While many paths lead to Rome, I believe in creating comprehensive data and effective methods to learn from that data. Currently, I am especially interested in attributing LM performance to pretraining data. My broad research goal involves two key objectives: (1) understanding the impact of pretraining data on model learning, especially how models behave in situations that are rare or non-existent in pretraining, i.e., the long-tail distribution; (2) developing methods for detecting and generating long-tail data to enhance model performance across diverse scenarios.
Previously, I have worked on conversation models, constrained decoding, cultural bias, and long-tail data generation. A full list of my publications is available at this link.
News
- September 2024. Our paper, “In Search of the Long-Tail: Systematic Generation of Long-Tail Inferential Knowledge via Logical Rule Guided Search”, is accepted to EMNLP 2024 Main Conference. See you in Miami in November!
- August 2024. I was awarded the Amazon ML PhD Fellowship for 2024-2025. This fellowship will support my work on Secure and Trusted Machine Learning.
- July 2024. Our paper, “CULTURE-GEN: Revealing Global Cultural Perception in Language Models through Natural Language Prompting”, is accepted to COLM 2024. See you in Philly in October!
- March 2023. I will be joining AI2 Mosaic Team as a summer research intern starting May 2023, working with Nouha Dziri and Yejin Choi!
Teaching
- Introduction to Programming Systems (COS217). Princeton University
- Data Structures (CS230). Wellesley College
Experience
- INK Lab, USC. PhD Student. Sept. 2022 - Present
- Research in Natural Language Processing
- Advisor: Xiang Ren
- AI2, Mosaic. Research Intern. May 2023 - Present
  - Research on multicultural biases in LMs
  - Mentors: Nouha Dziri, Yejin Choi
- Apple. AI/ML Intern. May 2022 - Aug. 2022
- Individual NLP research/engineering project, Siri Information Intelligence, Answers and Web Ranking Team
  - Mentors & Supervisors: Michael Tu, Nikhil Ramesh, Chris Dubois
- Princeton NLP Group. M.S.E. Student. Sept. 2020 - May 2022
- Research in Natural Language Processing
- Advisor: Danqi Chen
- Wellesley College. Research Assistant. Sept. 2018 - July 2020
- Research in Mobile Crowdsensing
- Advisor: Christine Bassem
- Google. SWE Intern. May 2019 - Aug. 2019
- Individual engineering project, Shopping Assistant, Natural Language Team
- Supervisors: Jesse Welch, John Karro
Honors and Awards
- Amazon Fellow. University of Southern California. Aug. 2024
- Siebel Scholar. Princeton University. Sept. 2021
- Sigma Xi Scientific Research Honor Society. Wellesley College. May 2020
- Durant Scholar, magna cum laude. Wellesley College. May 2020
Service and Leadership
- Reviewer. COLING 2024, ACL 2024, EMNLP 2024, ARR 2024, COLING 2025, ICLR 2025
- Student Representative on Board of Admission. Wellesley College. Oct. 2019 - May 2020