Beyond one-on-one: Authoring, simulating, and testing dynamic human-AI group conversations

Conversational AI has fundamentally reshaped how we interact with technology. While one-on-one interactions with large language models (LLMs) have seen significant advances, they rarely capture the full complexity of human communication. Many real-world dialogues, including team meetings, family dinners, or classroom lessons, are inherently multi-party. These interactions involve fluid turn-taking, shifting roles, and dynamic interruptions.

For designers and developers, simulating natural and engaging multi-party conversations has historically required a trade-off: settle for the rigidity of scripted interaction or accept the unpredictability of purely generative models. To bridge this gap, we need tools that blend the structural predictability of a script with the spontaneous, improvisational nature of human conversation.

To address this need, we introduce DialogLab, presented at ACM UIST 2025, an open-source prototyping framework designed to author, simulate, and test dynamic human-AI group conversations. DialogLab provides a unified interface for managing the complexity of multi-party dialogue, handling everything from defining agent personas to orchestrating complex turn-taking dynamics. By integrating real-time improvisation with structured scripting, the framework enables developers to test conversations ranging from a structured Q&A session to a free-flowing creative brainstorm. Our evaluations with 14 end users and domain experts indicate that DialogLab supports efficient iteration and realistic, adaptable multi-party design for training and research.

DialogLab is a research prototype that supports authoring, simulating, and testing dynamic human–AI group conversations. Designers can 1) configure group, party, and snippet characteristics, 2) test with simulation and live interaction, and 3) gain insights with a timeline view and post-hoc analytics.

A framework for dynamic conversation

DialogLab decouples a conversation’s social setup — such as participants, roles, subgroups, and relationships — from its temporal progression. This separation enables creators to author complex dynamics via a streamlined three-stage workflow: author, test, verify.

At its core, the DialogLab framework defines conversations along two dimensions:

Group dynamics: This covers the social setup of the interaction.
- A group is the top-level container (e.g., a conference social event).
- Parties are sub-groups that have distinct roles (e.g., "presenters" and "audience").
- Elements are the individual participants (human or AI) and any shared content, like a presentation slide.
Conversation flow dynamics: This describes how the dialogue unfolds over time.
- The flow is broken down into snippets, which represent distinct phases of the conversation. Each snippet has a defined set of participants, a sequence of conversational turns, and specific interaction styles (e.g., collaborative or argumentative). Creators can also define rules for interruptions and backchanneling to make the dialogue more realistic.
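To make the separation between social setup and temporal flow concrete, here is a minimal Python sketch of how the two dimensions might be represented as data, using the opening/debate/consensus example. The class and field names (`Element`, `Party`, `Group`, `Snippet`, `max_turns`, `backchannel_prob`, `unknown_participants`) are hypothetical illustrations, not DialogLab's actual schema or API.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Element:
    """A participant (human or AI) or a piece of shared content."""
    name: str
    kind: str  # "human", "ai", or "content"
    persona: Optional[str] = None

@dataclass
class Party:
    """A sub-group with a distinct role, e.g. "presenters" or "audience"."""
    role: str
    members: List[Element] = field(default_factory=list)

@dataclass
class Group:
    """Top-level container for the social setup."""
    name: str
    parties: List[Party] = field(default_factory=list)

@dataclass
class Snippet:
    """One phase of the conversation flow (hypothetical fields)."""
    name: str
    participants: List[str]        # names of Elements taking part in this phase
    style: str                     # e.g. "collaborative" or "argumentative"
    max_turns: int
    allow_interruptions: bool = False
    backchannel_prob: float = 0.0  # chance a listener backchannels ("mm-hm")

def unknown_participants(group: Group, flow: List[Snippet]) -> List[str]:
    """Names used in the flow that are not human/AI participants of the group."""
    known = {m.name for p in group.parties for m in p.members if m.kind != "content"}
    return [n for s in flow for n in s.participants if n not in known]

# Social setup, authored independently of the temporal flow.
group = Group("Conference demo session", [
    Party("presenters", [Element("Ava", "ai", persona="Demo presenter"),
                         Element("Slide 1", "content")]),
    Party("audience", [Element("Ben", "ai", persona="Skeptical researcher"),
                       Element("You", "human")]),
])

# Temporal flow: three snippets that refer to participants by name only.
flow = [
    Snippet("opening", ["Ava"], "collaborative", max_turns=2),
    Snippet("debate", ["Ava", "Ben", "You"], "argumentative", max_turns=8,
            allow_interruptions=True, backchannel_prob=0.2),
    Snippet("consensus", ["Ava", "Ben", "You"], "collaborative", max_turns=4),
]

print(unknown_participants(group, flow))  # [] : every name resolves
```

Because snippets refer to participants only by name, the same flow can be reused with a different group, and a simple check like `unknown_participants` catches dangling references when the social setup changes.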

Our framework separates the social setup (roles, parties) from the temporal flow (snippets, turn-taking rules), allowing for modular conversation design. Left: An example of authoring group dynamics for demo presenters and Q&A audiences; Right: An example of authoring conversation dynamics in three stages: opening, debate, consensus.

The “author-test-verify” workflow for dynamic conversation

DialogLab guides creators through a structured author-test-verify workflow, supported by a visual interface designed for rapid iteration.

Authoring with visual tools: The interface features a drag-and-drop canvas where users position avatars and content from libraries to build scenes. Inspector panels allow for granular configuration, from an avatar’s persona to the interaction patterns within a specific snippet. To accelerate the design process, DialogLab offers auto-generated conversation prompts that can be fine-tuned to meet specific narrative goals.
Simulation with human-in-the-loop: Testing is critical for multi-party interactions. DialogLab includes a live preview panel that displays the conversation transcript, along with a "human control" mode in which an audit panel suggests potential AI responses. The designer can edit, accept, or dismiss these suggestions, gaining fine-grained control over the AI's contributions and enabling rapid iteration.
Verification and analytics: To validate the interaction, the verification dashboard serves as a diagnostic tool. It visualizes conversation dynamics, allowing creators to quickly analyze turn-taking distributions and sentiment flows without parsing through lengthy raw transcripts.
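The human-in-the-loop review and a basic turn-taking analysis can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions: the `Turn` type, the "accept"/"edit"/"dismiss" decision labels, and the `review` and `turn_distribution` functions are hypothetical and are not DialogLab's API. The sketch also covers only turn-taking shares, not sentiment flow.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class Turn:
    speaker: str
    text: str

def review(suggestion: Turn, decision: str,
           edited_text: Optional[str] = None) -> Optional[Turn]:
    """Apply the designer's audit decision to one AI-suggested turn.

    Returns the turn to append to the transcript, or None if dismissed.
    """
    if decision == "accept":
        return suggestion
    if decision == "edit":
        if edited_text is None:
            raise ValueError("an edit needs replacement text")
        return Turn(suggestion.speaker, edited_text)
    if decision == "dismiss":
        return None
    raise ValueError(f"unknown decision: {decision!r}")

def turn_distribution(transcript: List[Turn]) -> Dict[str, float]:
    """Fraction of all turns taken by each speaker."""
    counts = Counter(turn.speaker for turn in transcript)
    total = sum(counts.values())
    return {speaker: n / total for speaker, n in counts.items()}

# Simulate: the designer reviews three suggestions from the audit panel.
transcript = [Turn("Ava", "Welcome to our demo.")]
pending = [
    (Turn("Ben", "How does this handle interruptions?"), "accept", None),
    (Turn("Ava", "Great question!"), "edit", "Let me show you on the timeline."),
    (Turn("Ben", "This is pointless."), "dismiss", None),
]
for suggestion, decision, edit in pending:
    turn = review(suggestion, decision, edit)
    if turn is not None:
        transcript.append(turn)

# Verify: Ava took 2 of 3 turns and Ben took 1 of 3.
print(turn_distribution(transcript))
```

A distribution heavily skewed toward one speaker is the kind of pattern a dashboard can surface at a glance, without anyone reading the raw transcript.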

Demonstration of the DialogLab prototype, which supports the authoring, simulating, and testing of dynamic human-AI group conversations.

Prototype evaluation

We evaluated DialogLab with 14 participants across game design, education, and social science research. Participants completed two tasks in DialogLab: designing an academic social event, and testing a group discussion with AI under three conditions:

Human control: When testing a conversation, the user can instruct agents to “shift topic” or to generate a “new perspective”, a “probe question”, or an “emotional response”.
Autonomous: The simulated agents proactively participate in the conversation according to a predefined order (random or one-by-one), generating emotional responses and topic shifts automatically.
Reactive: The simulated human agent responds only when directly mentioned by other agents, simulating traditional human-AI turn-taking behavior.
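The three conditions differ mainly in who decides the next turn. The Python sketch below illustrates those policies under stated assumptions; it is not DialogLab's implementation. The "@Name" mention convention, the rule that a random order avoids repeating the last speaker, and the names `autonomous_next`, `reactive_next`, and `human_control_request` are all hypothetical.

```python
import random
import re
from typing import List, Optional

HUMAN_CONTROL_ACTIONS = ("shift topic", "new perspective",
                         "probe question", "emotional response")

def human_control_request(agent: str, action: str) -> str:
    """Human control: the tester chooses what a specific agent does next."""
    if action not in HUMAN_CONTROL_ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")
    return f"[{agent}] next turn: {action}"

def autonomous_next(agents: List[str], last: Optional[str], order: str,
                    rng: random.Random) -> str:
    """Autonomous: agents speak proactively in a predefined order."""
    if order == "one-by-one":
        if last not in agents:
            return agents[0]
        return agents[(agents.index(last) + 1) % len(agents)]
    if order == "random":
        # Prefer someone other than the last speaker when possible.
        candidates = [a for a in agents if a != last] or agents
        return rng.choice(candidates)
    raise ValueError(f"unknown order: {order!r}")

def reactive_next(agents: List[str], message: str, speaker: str) -> List[str]:
    """Reactive: an agent replies only when the message mentions it as @Name."""
    return [a for a in agents
            if a != speaker and re.search(rf"@{re.escape(a)}\b", message)]

agents = ["Ava", "Ben", "Cara"]
print(autonomous_next(agents, "Cara", "one-by-one", random.Random(0)))  # Ava
print(reactive_next(agents, "@Ben, what do you think?", speaker="Ava"))  # ['Ben']
print(human_control_request("Cara", "probe question"))
```

The word-boundary check in `reactive_next` keeps a mention of "@Bennett" from triggering "Ben", a small detail that matters once agents have similar names.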

Participants rated each condition on a 5-point Likert scale. They found the human control mode to be significantly more engaging, and generally more effective and realistic for simulating real-world conversations.

Bar chart comparing the Human Control, Autonomous, and Reactive conditions across Ease of Use, Engagement, Effectiveness, and Realism.

Participants’ feedback further highlighted the system's ability to balance automation with control:

Intuitive and engaging: Most participants found DialogLab easy to use and the visual, drag-and-drop interface for setting up scenes and roles to be fun and efficient.
Flexible and controllable: Users appreciated the balance between auto-generated prompts and the ability to fine-tune conversation details. The system's ability to model different moderation strategies was also highlighted as a key strength.
Realistic simulation: The human control mode was the clear favorite for testing, with users reporting that it gave them a greater sense of agency and immersion. It was rated as more engaging, effective, and realistic for simulating human behavior compared to fully autonomous or purely reactive agents.
Powerful verification: The verification dashboard was seen as a valuable diagnostic tool for quickly analyzing conversation dynamics without having to read through lengthy transcripts.

Future directions

DialogLab is more than just a research prototype; it is a step toward a future where human-AI collaboration is richer and more nuanced. Potential applications include:

Education and skill development: Students could practice public speaking in front of a simulated audience, or professionals could rehearse difficult conversations and interviews.
Game design and storytelling: Writers and game developers can create more believable and dynamic non-player characters (NPCs) that interact with each other and the player in more natural ways.
Social science research: DialogLab can be used as a controlled environment to study group dynamics, allowing researchers to test hypotheses about social interaction without the logistical challenges of recruiting large groups of people.

Example applications of DialogLab, including practicing conference Q&A sessions, simulating debates, and creating game dialog design.

Moving forward, we envision that richer multimodal behaviors, such as non-verbal gestures and facial expressions, could be integrated into this framework. We could also explore photorealistic avatars and 3D environments, such as those in ChatDirector, to create even more immersive and realistic simulations in our open-source XR Blocks framework. We hope this research will inspire continued innovation in the exciting and emerging field of human-AI group conversation dynamics.

See the video demonstration of DialogLab to learn more.

Acknowledgements

Key contributors to the project include Erzhen Hu, Yanhe Chen, Mingyi Li, Vrushank Phadnis, Pingmei Xu, Xun Qian, Alex Olwal, David Kim, Seongkook Heo, and Ruofei Du. We would like to thank Adarsh Kowdle for providing feedback and assistance with the manuscript and the blog post. This project is partly sponsored by the Google PhD Fellowship.
