Huihan Li

Hi, my name is Huihan Li. I’m a third-year PhD student working on Natural Language Processing at the University of Southern California. I’m part of the INK Lab, advised by Xiang Ren. I received my M.S.E in Computer Science from Princeton University, where I was part of the Princeton Natural Language Processing Group and advised by Danqi Chen. I studied Computer Science and Cognitive & Linguistic Sciences at Wellesley College, where I worked with Christine Bassem on human crowdsensing.

I am passionate about Natural Language Processing, Computational Linguistics, and everything about languages. In high school, I competed in the International Linguistics Olympiad representing China, winning an Honorable Mention in Sofia, Bulgaria (2015) and a Bronze Medal in Mysore, India (2016).

Outside of research, I enjoy all kinds of sports, cooking/baking, and reading. I played water polo in college, and it remains one of my best memories. Currently, I am learning tennis.

Research

My research focuses on training and evaluating language models for robust generalization. Currently, I am especially interested in attributing LM performance to pretraining data. My broad research goal involves two key objectives: (1) understanding the impact of data frequency on model learning, especially in the low-confidence domain or distribution, i.e., the long-tail distribution; (2) developing methods for detecting and generating long-tail data to enhance model performance across diverse scenarios. A full list of my publications is available at this link.

News

August 2024. I was awarded the Amazon ML PhD Fellowship for 2024-2025. This fellowship will support my work on Secure and Trusted Machine Learning.
July 2024. Our paper, “CULTURE-GEN: Revealing Global Cultural Perception in Language Models through Natural Language Prompting”, was accepted to COLM 2024. See you in Philly in October!
March 2023. I will be joining the AI2 Mosaic Team as a summer research intern starting May 2023, working with Nouha Dziri and Yejin Choi!

Teaching

Introduction to Programming Systems (COS217). Princeton University
Data Structures (CS230). Wellesley College

Experience

INK Lab, USC. PhD Student. Sept. 2022 - Present
- Research in Natural Language Processing
- Advisor: Xiang Ren
AI2, Mosaic. Research Intern. May 2023 - Present
- Work on multicultural biases in LMs
- Mentors: Nouha Dziri, Yejin Choi
Apple. AI/ML Intern. May 2022 - Aug. 2022
- Individual NLP research/engineering project, Siri Information Intelligence, Answers and Web Ranking Team
- Mentors & Supervisors: Michael Tu, Nihkil Ramesh, Chris Dubois
Princeton NLP Group. M.S.E Student. Sept. 2020 - May 2022
- Research in Natural Language Processing
- Advisor: Danqi Chen
Wellesley College. Research Assistant. Sept. 2018 - July 2020
- Research in Mobile Crowdsensing
- Advisor: Christine Bassem
Google. SWE Intern. May 2019 - Aug. 2019
- Individual engineering project, Shopping Assistant, Natural Language Team
- Supervisors: Jesse Welch, John Karro

Honors and Awards

Amazon Fellow. University of Southern California. Aug. 2024
Siebel Scholars. Princeton University. Sept. 2021
Sigma Xi Scientific Research Honor Society. Wellesley College. May 2020
Durant Scholars magna cum laude. Wellesley College. May 2020

Service and Leadership

Reviewer. COLING 2024, ACL 2024, EMNLP 2024, ARR Aug 2024, COLING 2025, ICLR 2025
Student Representative on Board of Admission. Wellesley College. Oct. 2019 - May 2020