Publications

Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.

people standing in front of a screen with images and a chipboard

Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.

Sort By
  • Title
  • Title, descending
  • Year
  • Year, descending
1 - 15 of 11252 publications
    Preview abstract As artificial intelligence (AI) transitions from experimental pilot programs to mission-critical enterprise operations, traditional software-based security frameworks are proving insufficient against sophisticated infrastructure-level threats. This article introduces the concept of Silicon-Level Sovereignty, a first-principles approach to digital trust that anchors security in the physical hardware rather than the software stack. We examine the technical architecture of Hardware Root of Trust (RoT), specifically focusing on the roles of Trusted Platform Modules (TPMs) and Secure Enclaves in modern AI accelerators such as GPUs and TPUs. By leveraging cryptographic remote attestation, organizations can move from a model of assumed software integrity to one of verifiable hardware-level proof. The discussion provides a comparative analysis of industry-leading implementations, including NVIDIA’s Hopper architecture [1, 2], Google’s Titan-backed TPU v5p [3, 4], and Microsoft’s Azure Boost Cerberus system [5, 6], alongside the cluster-scale trust challenges presented by ultra-large systems like xAI’s Colossus [7]. The article concludes that Silicon-Level Sovereignty is no longer an optional security feature but a foundational requirement for establishing the integrity, privacy, and multi-tenant isolation necessary for high-stakes AI workloads. View details
    Preview abstract The management of a hybrid workforce comprising human and autonomous computational agents may be challenged by the use of separate systems for human capital and software assets, which can create a governance gap. A system can provide a unified framework for managing a hybrid workforce. For example, the system may utilize a labor service mesh to analyze and route tasks to either a human intent tier or an agentic execution tier. A potential principle of the system is structural symmetry, where computational agents can be assigned digital identities and managed through a lifecycle process that may parallel human resource functions, such as onboarding, performance evaluation, and structured offboarding. This integrated approach can facilitate a unified system of record and governance model for an organization's intelligence capacity. View details
    A Framework for Interactive Machine Learning and Enhanced Conversational Systems
    Jerry Young
    Richard Abisla
    Sanjay Batra
    Mikki Phan
    Nature, Springer-Verlag (2026)
    Preview abstract Conversational systems are increasingly prevalent, yet current versions often fail to support the full range of human speech, including variations in speed, rhythm, syntax, grammar, articulation, and resonance. This reduces their utility for individuals with dysarthria, apraxia, dysphonia, and other language and speech-related disabilities. Building on research that emphasizes the need for specialized datasets and model training tools, our study uses a scaffolded approach to understand the ideal model training and voice recording process. Our findings highlight two distinct user flows for improving model training and provide six guidelines for future conversational system-related co-design frameworks. This study offers important insights on creating more effective conversational systems by emphasizing the need to integrate interactive machine learning into training strategies. View details
    MoXaRt: Audio-Visual Object-Guided Sound Interaction for XR
    Sieun Kim
    Qianhui Zheng
    Ruoyu Xu
    Ravi Tejasvi
    Anuva Kulkarni
    Junyi Zhu
    2026
    Preview abstract In Extended Reality (XR), complex acoustic environments often overwhelm users, compromising both scene awareness and social engagement due to entangled sound sources. We introduce MoXaRt, a real-time XR system that uses audio-visual cues to separate these sources and enable fine-grained sound interaction. MoXaRt's core is a cascaded architecture that performs coarse, audio-only separation in parallel with visual detection of sources (e.g. faces, instruments). These visual anchors then guide refinement networks to isolate individual sources, separating complex mixes of up to five concurrent sources (e.g. two voices + three instruments) with ca. 2 second processing latency. We validate MoXaRt through a technical evaluation on a new, complex dataset we collected, and a 22-participant user study. Our results demonstrate that MoXaRt significantly improves communication clarity—boosting listening comprehension in noisy conditions by 33.2% (p=0.0058)—and significantly reduces cognitive load (M=7.50 vs. M=3.36, p<0.001), paving the way for more perceptive and socially adept XR experiences. View details
    Preview abstract Global shared service centers are critical to modern enterprise operations but struggle to provide consistent, timely support across linguistic boundaries. This paper introduces the Glossary-Grounded Universal Queue (GGUQ), a socio-technical framework designed to bridge the gap between the operational goal of a unified global service queue and the reality of a multilingual workforce. The GGUQ is a real-time, workflow-embedded communication architecture that leverages Large Language Models (LLMs) to provide high-fidelity, two-way translation directly within an agent's enterprise platform. The framework's key innovation is a "glossary-grounded" approach, where translation prompts are programmatically injected with a curated repository of enterprise-specific terminology. This ensures a level of contextual and terminological integrity unachievable by generic machine translation tools. By detailing the GGUQ's three-pillar architecture—Dynamic Translation, Glossary-Grounded Integrity, and Resilient Operations—we propose a new model for computer-mediated communication in global enterprises. This framework aims to move beyond federated, language-siloed support models to enable a true "follow-the-sun" operational capability, promoting both organizational efficiency and a more inclusive employee experience. View details
    VISTA: A Test-Time Self-Improving Video Generation Agent
    Xuan Long Do
    Hootan Nakhost
    The IEEE/CVF Conference on Computer Vision and Pattern Recognition (to appear) (2026)
    Preview abstract Despite rapid advances in text-to-video (T2V) synthesis, generated video quality remains critically dependent on precise user prompts. Existing test-time optimization methods, successful in other domains, struggle with the multi-faceted nature of video. To address this, we introduce VISTA, a novel multi-agent system that autonomously refines prompts to improve video generation. VISTA operates in an iterative loop, first decomposing a user's idea into a structured temporal plan. After generation, the best video is identified through a robust pairwise tournament. This winning video is then critiqued by a trio of specialized agents focusing on visual, audio, and contextual fidelity. Finally, a reasoning agent synthesizes this feedback to introspectively rewrite and enhance the prompt for the next generation cycle. To rigorously evaluate our proposed approach, we introduce MovieGen-Bench, a new benchmark of diverse single- and multi-scene video generation tasks. Experiments show that while prior methods yield inconsistent gains, VISTA consistently improves video quality, achieving up to 60% pairwise win rate against state-of-the-art baselines. Human evaluators concur, preferring VISTA's outputs in 68% of comparisons. View details
    Preview abstract Enterprise service centers, particularly in domains like People Operations, are critical hubs of organizational knowledge work. They face a persistent difficulty in disseminating the tacit, case-specific expertise of senior agents, which can lead to inconsistent service and slower onboarding for new hires. While existing Knowledge Management (KM) and Case-Based Reasoning (CBR) systems have improved the retrieval of historically similar cases, they inadvertently shift the cognitive burden of synthesizing this information to the time-constrained agent. This paper introduces the Dynamic Case Precedent (DCP) architecture, a novel socio-technical framework designed to address this gap. The DCP architecture moves beyond simple precedent recommendation to automated precedent synthesis. It achieves this by integrating a semantic retrieval model with the large-context reasoning capabilities of a generative Large Language Model (LLM). We propose a three-pillar framework—(1) Contextual Similarity Indexing, (2) Generative Insight Synthesis, and (3) Human-in-the-Loop Refinement. By analyzing multiple relevant historical cases to generate a concise summary of resolution patterns, the DCP architecture aims to reduce agent cognitive load, accelerate proficiency, and improve service consistency. This conceptual framework offers a new model for human-AI collaboration, framing the AI not as a mere information tool, but as an active partner in sensemaking. View details
    Reasoning-Driven Synthetic Data Generation and Evaluation
    Tim R. Davidson
    Benoit Seguin
    Transactions on Machine Learning Research (2026)
    Preview abstract Although many AI applications of interest require specialized multi-modal models, relevant data to train such models is inherently scarce or inaccessible. Filling these gaps with human annotators is prohibitively expensive, error-prone, and time-consuming, leading model builders to increasingly consider synthetic data as a scalable alternative. However, existing synthetic data generation methods often rely on manual prompts, evolutionary algorithms, or extensive seed data from the target distribution — limiting their scalability, explainability, and control. In this paper, we introduce Simula: a novel reasoning-driven framework for data generation and evaluation. It employs a seedless, agentic approach to generate synthetic datasets at scale, allowing users to define desired dataset characteristics through an explainable and controllable process that enables fine-grained resource allocation. We show the efficacy of our approach on a variety of datasets, rigorously testing both intrinsic and downstream properties. Our work (1) offers guidelines for synthetic data mechanism design, (2) provides insights into generating and evaluating synthetic data at scale, and (3) unlocks new opportunities for developing and deploying AI in domains where data scarcity or privacy concerns are paramount. View details
    Preview abstract When managing complex, unpredictable (non-deterministic) AI agents using simple, fixed control systems (like finite state machines), operational failures and accountability issues often arise. This document introduces a probabilistic governance and telemetry framework to resolve these problems. Instead of following a rigid sequence of steps, this framework defines a multi-dimensional operational boundary, a 'behavioral volume', and assigns the agent a goal. This allows the agent to use its own reasoning to achieve the goal while remaining within the defined boundaries. A separate telemetry layer monitors the agent's actions by calculating metrics, such as alignment scores and drift velocity, to measure how much the agent deviates from its intended behavior. This system provides a method for guiding, monitoring, and securing autonomous agents, effectively managing the performance and security of an unpredictable AI workforce in complex environments. View details
    Preview abstract This disclosure describes systems and methods for a multi-agent framework that can automate and scale cognitive work. The framework can, for example, use a cognitive assembly line of specialized computational agents to perform tasks such as research and drafting. A beneficial component could be an adversarial review panel (ARP), which is a multi-agent review system where distinct agent personas critique a generated draft from varied perspectives. The structured feedback from the ARP can be used to automatically iterate on and refine the work product. This approach can improve the intellectual rigor of generated content and reduce the time required for production, which may allow human operators to focus on activities such as strategic oversight and final validation. View details
    On-the-Fly OVD Adaptation with FLAME: Few-shot Localization via Active Marginal-Samples Exploration
    Yehonathan Refael
    Amit Aides
    Aviad Barzilai
    Vered Silverman
    Bolous Jaber
    Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) Workshops (2026), pp. 886-894
    Preview abstract Open-vocabulary object detection (OVD) models offer remarkable flexibility applications by enabling object detection from arbitrary text queries. Still, the zero-shot performance of the pre-trained models is hampered by the inherent semantic ambiguity of natural language, result to low precision, leading to insufficient crucial downstream applications. For instance, in the remote sensing (RS) domain, a query for "ship" can yield varied and contextually irrelevant results. To address this, for real time applications, we propose a novel cascaded architecture that synergizes the broad capabilities of a large, pre-trained OVD model with a lightweight, few-shot classifier. Our approach utilizes the frozen weights of the zero-shot model to generate initial, high-recall object-embedding proposals, which are then refined by a compact classifier trained in real-time on a handful of user-annotated examples. The core of our contribution is an efficient one step active learning strategy for selecting the most informative samples for user annotation. Our method identifies (extremely) small amount of an uncertain candidates near the theoretical decision boundary using density estimation and then applies clustering to ensure a diverse training set. This targeted sampling enables our cascaded system to elevate performance on standard remote sensing benchmarks. Our work thus presents a practical and resource-efficient framework for adapting foundational models to specific user needs, drastically reducing annotation overhead while achieving high accuracy without costly full-model fine-tuning. View details
    DeduBB: Binary Code Size Reduction via Post-Link Basic Block De-duplication
    Chaitanya Mamatha Ananda
    Rajiv Gupta
    Mahbod Afarin
    Han Shen
    LCTES (Languages, Compilers, Tools and Theory of Embedded Systems) (2026) (to appear)
    Preview abstract Binary sizes of newer versions of software applications tend to be larger, primarily due to feature bloat. This poses various challenges, particularly for mobile applications. It affects upgrade rates directly impacting revenues, increases maintenance costs of supporting multiple versions, and prevents some users from getting critical security fixes. Code bloat also poses a problem for large warehouse-scale applications. Such applications experience performance degradation when their code size exceeds what smaller and more efficient code models can handle. In this paper, we introduce a post-link optimization tech nique called DeduBB, which deduplicates basic blocks of an application across procedure boundaries. While prior tech- niques used function outlining to de-duplicate redundant code sequences, it missed out on many opportunities as it cannot handle code that manipulates the program stack. In addition, previous techniques were either limited to the scope of a module or lacked scalable implementations required to handle large warehouse-scale applications. Our technique, DeduBB, handles all types of code duplication as we use a novel save-and-jump code pattern to execute de-duplicated code blocks. In addition, DeduBB has been designed to work on scalable post-link optimizers and can even be applied to large warehouse-scale datacenter applications. Finally, DeduBB is profile-guided and can be applied selectively to infrequently executed cold basic blocks to not affect application performance. In fact, in several cases, the performance of the smaller application binary improves due to reductions in its hot working set size. We have implemented our technique on the state-of-the-art post link optimizers, BOLT and Propeller. Experiments show that we can significantly reduce the code size of several benchmarks by 1.55% to 18.63%, on both Arm and x86 platforms, and on binaries that have already been heavily optimized for size using existing code size reduction features. Furthermore, aided by profiles, our technique can retain more than 80% of the maximal code size savings without affecting performance. View details
    Preview abstract In modern Kubernetes environments, eBPF (Extended Berkeley Packet Filter) has become the de facto standard for high-performance dataplane enforcement. However, this architecture introduces a complex distributed state problem: the asynchronous synchronization between the Kubernetes control plane (Intent) and the kernel-space BPF maps (Reality). A critical failure mode, termed “Silent Divergence,” occurs when the control plane believes a network policy or identity is applied, but the underlying kernel state is missing or corrupted. In this “Gray Failure” state, standard observability tools—including logs, liveness probes, and agent status checks—report health, while the network silently drops traffic. This paper introduces eBPF-Auditor, a specialized consistency verification framework. Unlike standard agents that rely on event-based reconciliation, eBPF-Auditor performs a periodic “Two-Way State Audit” that mathematically verifies the intersection of Kubernetes Intent and BPF Reality. We demonstrate through fault injection and benchmarks on 5,000 pods that this approach successfully detects state drift with 100% accuracy and negligible sub-millisecond overhead (ms), making it a viable solution for high-frequency runtime verification in production hyperscale clusters. View details
    Preview abstract Object-Counting for remote-sensing (RS) imagery is raising increasing research interest due to its crucial role in a wide and diverse set of applications. While several promising methods for RS object-counting have been proposed, existing methods focus on a closed, pre-defined set of object classes. This limitation necessitates costly re-annotation and model re-training to adapt current approaches for counting of novel objects that have not been seen during training, and severely inhibits their application in dynamic, real-world monitoring scenarios. To address this gap, in this work we propose RS-OVC - an adaptation of existing work for Open Vocabulary Counting (OVC) approach from general computer vision to the RS domain. We show that our model is capable of accurate counting of novel object classes, that are unseen during training, based solely on textual and/or visual conditioning. View details
    Preview abstract Online financial scams represent a long-standing and serious threat for which people seek help. We present a study to understand people’s in situ motivations for engaging with scams and the help needs they express before, during, and after encountering a scam. We identify the main emotions scammers exploited (e.g., fear, hope) and characterize how they did so. We examine factors—such as financial insecurity and legal precarity—which elevate people’s risk of engaging with specific scams and experiencing harm. We indicate when people sought help and describe their help-seeking needs and emotions at different stages of the scam. We discuss how these needs could be met through the design of contextually-specific prevention, diagnostic, mitigation, and recovery interventions. View details
    ×