From MAS to MARS: Coordination Failures and Reasoning Trade-offs in Hierarchical Multi-Agent Robotic Systems within a Healthcare Scenario

1 Cornell University, 2 University of Maryland, College Park, 3 Imperial College London, 4 New York University
Study Design Pipeline

We designed a controlled test case in a healthcare setting that simulates real-world complexity, serving as a testbed to examine how hierarchical MARS operate under high-stakes conditions. Our exploration goes beyond surfacing coordination patterns by analyzing how three factors shape system-level performance: contextual knowledge, communication structures, and model reasoning. κ = 1 indicates the inclusion of contextual and procedural knowledge, while κ = 0 corresponds to its absence. σ = 1 denotes an enhanced communication structure, and σ = 0 reflects its absence. ω specifies the underlying model, either GPT-4o-2024-08-06 or o3-2025-04-16.
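As a minimal sketch of the factor notation above, the three factors can be written as a small record type. The names `Condition`, `MODELS`, and `all_conditions` are hypothetical and not taken from the paper's code. The full factorial grid is enumerated purely for illustration; the two studies do not necessarily cover every cell.

```python
from dataclasses import dataclass
from itertools import product

# Model identifiers as written in the caption above.
MODELS = ("GPT-4o-2024-08-06", "o3-2025-04-16")


@dataclass(frozen=True)
class Condition:
    """One experimental condition (hypothetical representation)."""
    kappa: int  # 1 = contextual and procedural knowledge included, 0 = absent
    sigma: int  # 1 = enhanced communication structure, 0 = absent
    omega: str  # underlying model

    def label(self) -> str:
        return f"kappa={self.kappa}, sigma={self.sigma}, omega={self.omega}"


def all_conditions():
    """Enumerate every (kappa, sigma, omega) combination.

    Assumption: a full factorial grid, used here only to illustrate
    the notation, not to describe the studies' actual coverage.
    """
    return [Condition(k, s, m) for k, s, m in product((0, 1), (0, 1), MODELS)]


if __name__ == "__main__":
    for condition in all_conditions():
        print(condition.label())
```

Freezing the dataclass makes each condition hashable, so it could serve as a dictionary key when grouping traces by condition.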

Abstract

Multi-agent robotic systems (MARS) build upon multi-agent systems by integrating physical and task-related constraints, increasing the complexity of action execution and agent coordination. However, despite the availability of advanced multi-agent frameworks, their real-world deployment on robots remains limited, hindering the advancement of MARS research in practice. To bridge this gap, we conducted two studies to investigate performance trade-offs of hierarchical multi-agent frameworks in a simulated real-world multi-robot healthcare scenario. In Study 1, using CrewAI, we iteratively refine the system’s knowledge base to systematically identify and categorize coordination failures (e.g., tool access violations, lack of timely handling of failure reports) that are not resolvable by providing contextual knowledge alone. In Study 2, using AutoGen, we evaluate a redesigned bidirectional communication structure and further measure the trade-offs between reasoning and non-reasoning models operating within the same robotic team setting. Drawing from our empirical findings, we emphasize the tension between autonomy and stability and the importance of edge-case testing to improve system reliability and safety for future real-world deployment.

Study 1: Evaluation
Study 1: Contextual Knowledge

We developed a knowledge base (KB) containing contextual and procedural knowledge as a shared resource, analogous to organizational documentation, to ground MARS team behavior and decision-making. We evaluated the effectiveness of this contextual knowledge on MARS performance at both the manager and subordinate levels across seven dimensions. Our analysis shows that five critical failure modes persist even with a detailed KB. Annotated example traces are provided in the section "Coordination Failure Modes" below. These findings indicate that while sufficient contextual knowledge is necessary, system structure remains the primary bottleneck for achieving robust coordination.

Study 2: Structure and Reasoning
Study 2: Reasoning Behavior Cards

We identify four major themes in MARS coordination patterns, each comprising several sub-themes. To contextualize these sub-themes, we annotate each with ‘✓’ or ‘✗’ to indicate whether its implications are positive or negative within our test scenario. We also report the frequency of each sub-theme across 20 traces for both GPT-4o and o3. We find distinct behavioral profiles which underscore trade-offs between reasoning and non-reasoning models. For each sub-theme, we provide representative examples along with accompanying comments (green box: [What went well:], red box: [What went wrong:]) in section "Reasoning Behavior Cards" below.
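The per-model frequency counts described above can be sketched as a simple tally. This is an illustration under stated assumptions, not the authors' analysis code: the trace format (a dict with `model` and `subthemes` keys), the function name `subtheme_frequency`, and the placeholder sub-theme labels are all hypothetical. It also assumes that "frequency" means the number of traces in which a sub-theme appears at least once.

```python
from collections import Counter


def subtheme_frequency(traces):
    """Count, per model, how many traces contain each sub-theme.

    Each trace is assumed to be a dict with a "model" name and a list
    of annotated "subthemes" (hypothetical format). A sub-theme that
    is annotated several times in one trace is counted once for it.
    """
    counts = {}
    for trace in traces:
        model_counts = counts.setdefault(trace["model"], Counter())
        model_counts.update(set(trace["subthemes"]))
    return counts


if __name__ == "__main__":
    # Placeholder annotations, not data from the study.
    example_traces = [
        {"model": "GPT-4o", "subthemes": ["subtheme_a", "subtheme_a"]},
        {"model": "GPT-4o", "subthemes": ["subtheme_b"]},
        {"model": "o3", "subthemes": ["subtheme_a", "subtheme_b"]},
    ]
    for model, freq in subtheme_frequency(example_traces).items():
        print(model, dict(freq))
```

Counting per trace rather than per annotation keeps each frequency bounded by the number of traces per model, which matches reporting counts "across 20 traces"; a per-annotation count would need `update(trace["subthemes"])` instead.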

Coordination Failure Modes

Reasoning Behavior Cards

BibTeX

@misc{bai2025masmarscoordinationfailures,
  title={From MAS to MARS: Coordination Failures and Reasoning Trade-offs in Hierarchical Multi-Agent Robotic Systems within a Healthcare Scenario},
  author={Yuanchen Bai and Zijian Ding and Shaoyue Wen and Xiang Chang and Angelique Taylor},
  year={2025},
  eprint={2508.04691},
  archivePrefix={arXiv},
  primaryClass={cs.RO},
  url={https://arxiv.org/abs/2508.04691},
}