Huihan Li

Hi, my name is Huihan Li. I’m a third-year PhD student working on Natural Language Processing at the University of Southern California. I’m part of the INK Lab, advised by Xiang Ren. I received my M.S.E in Computer Science from Princeton University, where I was part of the Princeton Natural Language Processing Group and advised by Danqi Chen. I studied Computer Science and Cognitive & Linguistic Sciences at Wellesley College, where I worked with Christine Bassem on human crowdsensing.

I am passionate about Natural Language Processing, Computational Linguistics, and everything about languages. In high school, I represented China in the International Linguistics Olympiad, winning an Honorable Mention in Sofia, Bulgaria (2015) and a Bronze Medal in Mysore, India (2016).

Outside of research, I enjoy all kinds of sports, cooking/baking, and reading. I played water polo in college, and it remains one of my best memories. Currently, I am learning tennis.

** I am actively looking for Summer 2025 research internships. If my background seems like a good fit for your team, please reach out! **

Research

My research focuses on training and evaluating language models for robust generalization in unfamiliar situations. While many paths lead to Rome, I believe in creating comprehensive data and effective methods to learn from the data. As of now, I am especially interested in attributing LM performance to pretraining data. My broad research goal involves two key objectives: (1) understanding the impact of pretraining data on model learning, especially how models behave in situations that are rare or non-existent in pretraining, i.e. the long-tail distribution; (2) developing methods for detecting and generating long-tail data to enhance model performance across diverse scenarios.

Previously, I have worked on conversation models, constrained decoding, cultural bias, and long-tail data generation. A full list of my publications is available at this link.

News

  • September 2024. Our paper, “In Search of the Long-Tail: Systematic Generation of Long-Tail Inferential Knowledge via Logical Rule Guided Search”, was accepted to the EMNLP 2024 Main Conference. See you in Miami in November!
  • August 2024. I was awarded the Amazon ML PhD Fellowship for 2024-2025. This fellowship will support my work on Secure and Trusted Machine Learning.
  • July 2024. Our paper, “CULTURE-GEN: Revealing Global Cultural Perception in Language Models through Natural Language Prompting”, was accepted to COLM 2024. See you in Philly in October!
  • March 2023. I will be joining the AI2 Mosaic Team as a summer research intern starting May 2023, working with Nouha Dziri and Yejin Choi!

Teaching

  • Introduction to Programming Systems (COS217). Princeton University
  • Data Structures (CS230). Wellesley College

Experience

  • INK Lab, USC. PhD Student. Sept. 2022 - Present
    • Research in Natural Language Processing
    • Advisor: Xiang Ren
  • AI2, Mosaic. Research Intern. May 2023 - Present
    • Work on multicultural biases in LMs
    • Mentors: Nouha Dziri, Yejin Choi
  • Apple. AI/ML Intern. May 2022 - Aug. 2022
    • Individual NLP research/engineering project, Siri Information Intelligence, Answers and Web Ranking Team
    • Mentors & Supervisors: Michael Tu, Nihkil Ramesh, Chris Dubois
  • Princeton NLP Group. M.S.E Student. Sept. 2020 - May 2022
    • Research in Natural Language Processing
    • Advisor: Danqi Chen
  • Wellesley College. Research Assistant. Sept. 2018 - July 2020
    • Research in Mobile Crowdsensing
    • Advisor: Christine Bassem
  • Google. SWE Intern. May 2019 - Aug. 2019
    • Individual engineering project, Shopping Assistant, Natural Language Team
    • Supervisors: Jesse Welch, John Karro

Honors and Awards

  • Amazon Fellow. University of Southern California. Aug. 2024
  • Siebel Scholar. Princeton University. Sept. 2021
  • Sigma Xi Scientific Research Honor Society. Wellesley College. May 2020
  • Durant Scholar, magna cum laude. Wellesley College. May 2020

Service and Leadership

  • Reviewer. COLING 2024, ACL 2024, EMNLP 2024, ARR 2024, COLING 2025, ICLR 2025
  • Student Representative on Board of Admission. Wellesley College. Oct. 2019 - May 2020