Dan Liebling
Dan Liebling (he/him) creates experiences that advance scientific research by integrating AI and academic knowledge. He joined the Science AI team in 2022 after five years of building and leading a research and engineering team focused on speech-to-speech translation experiences. His work at Google Research brings a human-computer interaction (HCI) lens to language-focused disciplines such as academic writing, speech recognition, and machine translation research. Before joining Google, he worked on information retrieval and HCI research at Microsoft Research.
MS, Computer Science and Engineering, University of Washington
BS, Engineering and Applied Science, Caltech
See other publications via Google Scholar or Semantic Scholar
Authored Publications
Expert evaluation of LLM world models: A high-Tc superconductivity case study
Haoyu Guo
Maria Tikhanovskaya
Paul Raccuglia
Alexey Vlaskin
Chris Co
Scott Ellsworth
Matthew Abraham
Lizzie Dorfman
Peter Armitage
Chunhan Feng
Antoine Georges
Olivier Gingras
Dominik Kiese
Steve Kivelson
Vadim Oganesyan
Brad Ramshaw
Subir Sachdev
Senthil Todadri
John Tranquada
Eun-Ah Kim
Proceedings of the National Academy of Sciences (2026)
Large Language Models (LLMs) show great promise as a powerful tool for scientific literature exploration. However, their effectiveness in providing scientifically accurate and comprehensive answers to complex questions within specialized domains remains an active area of research. This work evaluates the performance of six different LLM-based systems for answering scientific literature questions, including commercially available closed models and a custom retrieval-augmented generation (RAG) system capable of retrieving images alongside text. We conduct a rigorous expert evaluation of the systems in the domain of high-temperature cuprate superconductors, a research area that involves material science, experimental physics, computation, and theoretical physics. We use an expert-curated database of 1726 scientific papers and a set of 67 expert-formulated questions. The evaluation employs a multi-faceted rubric assessing balanced perspectives, factual comprehensiveness, succinctness, evidentiary support, and image relevance. Our results demonstrate that RAG-based systems, powered by curated data and multimodal retrieval, outperform existing closed models across key metrics, particularly in providing comprehensive and well-supported answers, and in retrieving relevant visual information. This study provides valuable insights into designing and evaluating specialized scientific literature understanding systems, particularly with expert involvement, while also highlighting the importance of rich, domain-specific data in such systems.
An AI system to help scientists write expert-level empirical software
Johan Kartiwa
Matthew Abraham
Qian-Ze Zhu
Zahra Shamsi
Shibl Mourad
Julie Wang
Anastasiya Belyaeva
Scott Ellsworth
Yuchen Zhou
Jackson Cui
Grace Joseph
Malcolm Kane
Paul Raccuglia
Ryan Krueger
Jeffrey Cardille
Erica Brand
Renee Johnston
James Thompson
Chris Co
James Manyika
Anna Bulanova
David Smalling
Eser Aygün
Kat Chou
Gheorghe Comanici
arXiv (2025)
The cycle of scientific discovery is frequently bottlenecked by the slow, manual creation of software to support computational experiments. To address this, we present an AI system that creates expert-level scientific software whose goal is to maximize a quality metric. The system uses a Large Language Model (LLM) and Tree Search (TS) to systematically improve the quality metric and intelligently navigate the large space of possible solutions. The system achieves expert-level results when it explores and integrates complex research ideas from external sources. The effectiveness of tree search is demonstrated across a wide range of benchmarks. In bioinformatics, it discovered 40 novel methods for single-cell data analysis that outperformed the top human-developed methods on a public leaderboard. In epidemiology, it generated 14 models that outperformed the CDC ensemble and all other individual models for forecasting COVID-19 hospitalizations. Our method also produced state-of-the-art software for geospatial analysis, neural activity prediction in zebrafish, time series forecasting and numerical solution of integrals. By devising and implementing novel solutions to diverse tasks, the system represents a significant step towards accelerating scientific progress.
Keywords: Tree Search, Generative AI, Scorable Scientific Tasks, Empirical Software
Towards AI-assisted academic writing
Malcolm Kane
Madeleine Grunde-McLaughlin
Ian Lang
Proceedings of the 1st Workshop on AI and Scientific Discovery: Directions and Opportunities, Association for Computational Linguistics (2025), pp. 31-45
We present components of an AI-assisted academic writing system including citation recommendation and introduction writing. The system recommends citations by considering the user’s current document context to provide relevant suggestions. It generates introductions in a structured fashion, situating the contributions of the research relative to prior work. We demonstrate the effectiveness of the components through quantitative evaluations. Finally, the paper presents qualitative research exploring how researchers incorporate citations into their writing workflows. Our findings indicate that there is demand for precise AI-assisted writing systems and simple, effective methods for meeting those needs.
Three Directions for the Design of Human-Centered Machine Translation
Samantha Robertson
Wesley Deng
Timnit Gebru
Margaret Mitchell
Samy Bengio
Niloufar Salehi
(2021)
As people all over the world adopt machine translation (MT) to communicate across languages, there is increased need for affordances that aid users in understanding when to rely on automated translations. Identifying the information and interactions that will most help users meet their translation needs is an open area of research at the intersection of Human-Computer Interaction (HCI) and Natural Language Processing (NLP). This paper advances work in this area by drawing on a survey of users' strategies in assessing translations. We identify three directions for the design of translation systems that support more reliable and effective use of machine translation: helping users craft good inputs, helping users understand translations, and expanding interactivity and adaptivity. We describe how these can be introduced in current MT systems and highlight open questions for HCI and NLP research.
Unmet Needs and Opportunities for Mobile Translation AI
Abigail Evans
Aaron Michael Donsbach
Boris Smus
Jess Scon Holbrook
Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems (CHI ’20), ACM, Honolulu, Hawaii, USA
Translation apps and devices are often presented in the context of providing assistance while traveling abroad. However, the spectrum of needs for cross-language communication is much wider. To investigate these needs, we conducted three studies with populations spanning socioeconomic status and geographic regions: (1) United States-based travelers, (2) migrant workers in India, and (3) immigrant populations in the United States. We compare frequent travelers' perception and actual translation needs with those of the two migrant communities. The latter two, with low language proficiency, have the greatest translation needs to navigate their daily lives. However, current mobile translation apps do not meet these needs. Our findings provide new insights on the usage practices and limitations of mobile translation tools. Finally, we propose design implications to help apps better serve these unmet needs.