Multi-Agent Communication Patterns for Enterprise AI
A Reference Architecture for Scalable, Maintainable, and Secure AI Agent Systems
Date: January 2026
Authors: Ayanami Hobbes, Mary McGuire
Affiliation: Trizz LLC
Target Audience: CTOs, Enterprise Architects, Technical Evaluators
Conflict of Interest Disclosure: This whitepaper describes the AYA architecture developed by the authors. Performance claims are based on internal testing with methodology disclosed in Appendix E. We actively seek independent validation of these results.
Abstract
Multi-agent AI systems face documented failure rates of 41-86.7% due to architectural issues rather than AI limitations. We present AYA, a reference architecture that addresses these failure modes through structural constraints: Pure Message Architecture (PMA) requiring all agent communication through a centralized message bus, centralized routing for system-wide observability, and single-responsibility agent design. Our heterogeneous agent mesh includes both LLM-based agents (300-2000ms latency) and lightweight specialized agents (sub-100ms) for routing, caching, and data retrieval. Internal testing shows message bus throughput exceeding 10,000 messages/second with sub-10ms routing latency for non-LLM operations, though end-to-end user response times remain dominated by LLM inference (1-2 seconds). The architecture draws from established distributed systems patterns—service mesh, message-passing concurrency, and microservices principles—applied to multi-agent coordination. We discuss limitations including routing overhead, architectural complexity, and unsuitability for simple use cases. Independent validation is welcomed and supported.
Keywords: multi-agent systems, enterprise AI, message-oriented architecture, LLM coordination, distributed systems
Executive Summary
The Multi-Agent Imperative
Multi-agent AI systems have transitioned from academic curiosity to enterprise imperative. The momentum is substantial: Gartner reports a 1,445% surge in multiagent systems (MAS) inquiries from Q1 2024 to Q2 2025 [1]. By 2028, an estimated 33% of enterprise software applications will include agentic AI capabilities, up from less than 1% in 2024 [3].
The economic potential is significant. McKinsey projects that AI agents could generate substantial economic value, with estimates suggesting up to $2.9 trillion per year in the United States alone under optimistic scenarios by 2030 [2]. However, these projections carry significant uncertainty and depend on successful implementation at scale, particularly workflow redesign to enable human-AI collaboration.
The Architectural Challenge
Despite this momentum, the industry faces significant implementation challenges. Gartner predicts that over 40% of agentic AI projects will be canceled by end of 2027 due to escalating costs, unclear business value, and inadequate risk controls [3]. Independent research from UC Berkeley analyzing 1,642+ execution traces found that multi-agent systems experience 41-86.7% failure rates due to architectural gaps—not AI limitations [4].
These statistics reveal a pattern: promising pilots that struggle to scale, integration complexity that consumes development resources, and observability gaps that make debugging difficult.
The AYA Architecture
This paper presents the AYA architecture, a multi-agent system design that addresses common failure modes through architectural constraints. Rather than offering guidelines, AYA provides structural constraints enforced at the code level.
The architecture rests on three foundational principles:
1. Pure Message Architecture (PMA): All agent-to-agent communication occurs through a centralized message bus. This principle reduces coupling between agents, inspired by patterns that have proven effective in microservices architectures [5].
2. Centralized Agent Routing: A centralized routing agent manages message flow, enabling decentralized development while maintaining system-wide observability. This mirrors service mesh patterns used by organizations like Google and Microsoft [6].
3. Single Responsibility Agent Design: Each agent has one primary responsibility, with cross-cutting concerns (logging, metrics, error handling) delegated to specialized agents. This addresses the "role disobedience" and "responsibility overlap" failure modes identified in multi-agent research [4].
Heterogeneous Agent Architecture
A critical distinction: AYA is a heterogeneous agent mesh, not a homogeneous LLM-based system. The architecture includes:
Lightweight Agents (sub-100ms latency):
- Intent Parser: Rule-based routing decisions
- Cache Manager: Key-value lookups for common queries
- SQL Agent: Structured data retrieval
- Message Router: Forwarding logic and capability matching
- Authentication/Authorization: Policy checks
LLM-Based Agents (300-2000ms latency):
- Natural Language Generation
- Complex reasoning tasks
- Unstructured data analysis
Hybrid Agents:
- Use lightweight logic for common cases
- Escalate to LLM only when necessary
This heterogeneity enables cost and performance optimization: most operations bypass expensive LLM inference, with LLM agents invoked only when semantic understanding is required.
Key Results from Reference Implementation
Performance benchmarks from our internal testing demonstrate the following results:
| Metric | Result | Test Conditions |
|---|---|---|
| Message Bus Throughput | >10,000 messages/second | Mixed agent types, in-memory bus |
| Routing Latency (avg) | <10ms | Non-LLM agents, local network |
| Routing Latency (p99) | <50ms | Under sustained load |
| Message Delivery | 99.99% | With retry mechanism enabled |
| Agent Failure Isolation | High | One agent crash does not cascade |
| Routing Agent Restart | <5 seconds | Stateless design |
| End-to-End User Latency | 1-2 seconds typical | Dominated by LLM inference time |
Critical Context: These metrics measure message bus infrastructure performance, not end-to-end AI task completion time. A typical user request involves:
- 1 LLM call (1-2 seconds)
- 5-10 lightweight agent interactions (5-50ms total)
- Message bus overhead (negligible)
Total user-facing latency: 1-2 seconds, dominated by LLM inference as expected in any LLM-based system.
Note: These results are from controlled internal testing conducted by the development team and have not been independently verified. Production performance will vary based on deployment configuration, network conditions, workload characteristics, and LLM provider selection. Full methodology available in Appendix E. We welcome and will support independent reproduction efforts.
Limitations and Trade-offs
The AYA architecture is not suitable for all scenarios:
- Simple use cases: Single-agent solutions may be more appropriate for straightforward tasks. Research suggests centralized multi-agent coordination can degrade performance on simpler tasks [7].
- Ultra-low latency requirements: The message bus adds routing overhead (~5-10ms) that may be unacceptable for certain real-time applications requiring sub-millisecond response.
- Small teams: The architectural complexity may not justify itself for small projects or teams under 5 developers.
- Token cost overhead: Multi-agent coordination can consume 15× more tokens than single-agent approaches for equivalent tasks [7a], though the heterogeneous design mitigates this through selective LLM usage.
These trade-offs are discussed in detail in Section 11.
Section 1: The Problem
1.1 The Rise of Multi-Agent AI
Enterprise Interest
Multi-agent AI is experiencing significant enterprise interest:
| Metric | Value | Source |
|---|---|---|
| Enterprise apps with AI agents by 2026 | 40% (projected) | Gartner, August 2025 [1] |
| Multiagent systems inquiry surge (Q1 2024 → Q2 2025) | 1,445% | Gartner, December 2025 [1] |
| Enterprises with regular AI use (at least one function) | 88% | McKinsey, November 2025 [2] |
| Organizations experimenting with or scaling AI agents | 62% | McKinsey, November 2025 [2] |
Note on statistics: The 1,445% figure specifically refers to Multiagent Systems (MAS) architecture inquiries. The 88% "regular AI use" figure measures organizations using AI in at least one business function; however, two-thirds have not yet begun scaling beyond pilot deployments [2].
Common Use Cases:
- Customer Service: Multi-agent systems handling complex support workflows
- Internal Operations: Specialized agents for HR, finance, and operations tasks
- Content Generation: Coordinated agents for document creation and code generation
- Research Assistance: Literature review, experiment design, and data analysis [8]
1.2 Why Multi-Agent Systems Fail
Empirical Failure Analysis
The UC Berkeley MAST study (Cemri et al., 2025) analyzed 1,642+ execution traces across 7 popular frameworks and identified 14 unique failure modes in three categories [4]:
| Failure Category | Examples | Frequency |
|---|---|---|
| System Design Issues | Role disobedience, lost conversation history | 30-40% of failures |
| Inter-Agent Misalignment | Ignored input, communication breakdowns | 25-35% of failures |
| Task Verification | Premature termination, incorrect verification | 20-30% of failures |
The study found failure rates ranging from 41% to 86.7% across different frameworks and task types. Critically, these failures were attributed to architectural issues, not fundamental LLM capability limitations.
The Coupling Problem
Multi-agent systems can develop coupling problems similar to those documented in microservices research [5]. With N agents communicating directly, the system has up to N × (N-1) potential connections:
| Agent Count | Direct Connections | Complexity |
|---|---|---|
| 5 agents | 20 connections | Manageable |
| 25 agents | 600 connections | Challenging |
| 100 agents | 9,900 connections | Difficult to maintain |
Figure 1: Connection Complexity Growth
Connections
^
| /
| / Direct Calls
| / O(n²)
| /
| /
| /
| /
| /
| /
| / ______________________ Message Bus
| /________ O(n)
+-----------------------------------------> Agents
5 10 25 50 100
The Shared State Problem
Shared databases between agents create several challenges documented in distributed systems literature [9]:
| Problem | Impact |
|---|---|
| Schema Coupling | Changes to shared schema affect multiple agents |
| Data Contamination | Unintended cross-agent data modifications |
| Performance Interference | One agent's queries can impact others |
| Testing Complexity | Difficult to test agents in isolation |
Research on agent memory systems suggests that agent-specific memory approaches can outperform shared memory on certain long-running tasks [10].
1.3 The Tool-Calling Discussion
Current Landscape
Many AI agent systems use tool-calling architectures where LLMs directly invoke tools. This approach has trade-offs worth understanding.
Potential Challenges:
| Challenge | Description | Mitigation Approaches |
|---|---|---|
| Prompt Injection | LLM may not reliably distinguish data from instructions | Input validation, sandboxing |
| Tool Selection Errors | LLM may select inappropriate tools | Capability constraints, verification |
| Parameter Issues | LLM may generate incorrect parameters | Schema validation, confirmation steps |
Prompt injection ranks first (LLM01) in the OWASP Top 10 for LLM Applications 2025 [11].
When Tool-Calling Works Well:
- Simple, well-defined tool interactions
- Single-agent systems with limited scope
- Rapid prototyping and experimentation
- Scenarios where human review is incorporated
When Message-Based Architecture May Be Preferable:
- Complex multi-agent coordination
- Enterprise deployments requiring auditability
- Systems requiring strong isolation guarantees
- Scenarios with high security requirements
The choice depends on your specific requirements, risk tolerance, and operational context.
1.4 Hidden Costs of Architectural Debt
Architectural decisions have long-term implications documented in software engineering research:
| Cost Category | Impact | Source |
|---|---|---|
| Deployment Coordination | Tightly-coupled systems require more deployment coordination | [5] |
| Development Velocity | Impact analysis and integration testing add overhead | [5] |
| Maintenance Burden | Technical debt tends to accumulate over time | [9] |
These costs are not specific to multi-agent systems but apply to any distributed architecture.
Section 2: Pure Message Architecture
2.1 The Core Principle
All agent-to-agent communication in AYA goes through a message bus.
This is an architectural constraint enforced through:
- Static analysis: Linting rules detect direct agent imports
- Import restrictions: Agents cannot import other agents' internal modules
- Runtime validation: Message bus rejects improperly formatted communication
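To make the runtime-validation constraint concrete, here is a minimal sketch of a bus boundary check. It assumes the StandardMessage dataclass defined in Section 2.3; the publish method and queue hand-off are simplified stand-ins for the reference implementation.

```python
import asyncio


class MessageBus:
    """Minimal in-memory bus illustrating runtime validation (sketch only)."""

    def __init__(self) -> None:
        self._queue: asyncio.Queue = asyncio.Queue()

    async def publish(self, message: object) -> None:
        # Reject anything that is not a StandardMessage (defined in Section 2.3)
        if not isinstance(message, StandardMessage):
            raise TypeError(
                f"rejected {type(message).__name__}: all agent communication "
                "must use the StandardMessage schema"
            )
        # Hand off to the queue the Routing Agent consumes from
        await self._queue.put(message)
```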
Figure 2: Pure Message Architecture
┌─────────────────────────────────────────────────────────────────────┐
│ MESSAGE BUS │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ Routing Agent │ │
│ │ • Message routing • Agent registry │ │
│ │ • Load balancing • Capability discovery │ │
│ └─────────────────────────────────────────────────────────────┘ │
│ │ │
│ ┌────────┬────────┬───────┼───────┬────────┬────────┐ │
│ │ │ │ │ │ │ │ │
│ ▼ ▼ ▼ ▼ ▼ ▼ ▼ │
│ ┌─────┐ ┌─────┐ ┌─────┐ ┌─────┐ ┌─────┐ ┌─────┐ ┌─────────┐ │
│ │ LLM │ │SQL │ │Error│ │ Log │ │Moni-│ │Secu-│ │Cache │ │
│ │Agent│ │Agent│ │Agent│ │Agent│ │tor │ │rity │ │Manager │ │
│ │ │ │ │ │ │ │ │ │Agent│ │Agent│ │ │ │
│ └─────┘ └─────┘ └─────┘ └─────┘ └─────┘ └─────┘ └─────────┘ │
│ 300- │ 20- │ 5- │ 3- │ 8- │ 15- │ 2- │
│ 2000ms│ 80ms │ 20ms │ 10ms │ 25ms │ 40ms │ 8ms │
│ │ │ │ │ │ │ │ │
│ └────────┴────────┴───────┴───────┴────────┴────────┘ │
│ Agent-Specific Databases │
│ (No Shared Tables Pattern) │
└─────────────────────────────────────────────────────────────────────┘
Note: Latencies shown are typical observed ranges for each agent type. LLM Agent latency dominated by model inference time (300-2000ms TTFT). Non-LLM agents operate in sub-100ms range.
Why This Approach:
The theoretical foundation draws from message-passing concurrency research (Hoare's CSP, 1978; Actor Model, 1973):
- Isolation through communication: Components that share no state interact only through messages
- Observable interactions: All interactions can be logged, monitored, and replayed
- Failure isolation: Failure in one component is less likely to corrupt another's state
Research suggests systems using structured message passing experience fewer coordination failures than those allowing direct method invocation [12].
2.2 What Pure Message Architecture Prohibits
| Prohibited Pattern | Reason |
|---|---|
| Direct method calls between agents | Creates tight coupling |
| Shared databases between agents | Schema coupling, data contamination |
| HTTP/REST calls between agents | N² complexity growth |
| Direct WebSocket connections | Bypasses centralized observability |
| Shared file systems | Hidden communication channels |
| Global state | Unpredictable behavior |
2.3 What Pure Message Architecture Requires
1. Message Bus for All Communication
Every interaction between agents flows through the message bus, including:
- Commands and responses
- Queries and results
- Events and notifications
- Health checks
2. Standardized Message Format
All messages conform to a validated schema. We provide both Python and JSON Schema representations for language-agnostic implementation:
Python Implementation:
```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Dict, Optional


class MessageType(Enum):
    COMMAND = "command"
    QUERY = "query"
    EVENT = "event"
    RESPONSE = "response"


class Priority(Enum):
    LOW = "low"
    NORMAL = "normal"
    HIGH = "high"
    URGENT = "urgent"


@dataclass
class StandardMessage:
    message_id: str                        # Unique identifier (UUID recommended)
    source: str                            # Source agent ID
    target: str                            # Target agent ID
    message_type: MessageType              # COMMAND, QUERY, EVENT, RESPONSE
    payload: Dict[str, Any]                # Type-specific data
    timestamp: float                       # Unix timestamp
    correlation_id: Optional[str] = None   # For request-response pairing
    priority: Priority = Priority.NORMAL   # Message priority
    schema_version: str = "1.0"            # Schema evolution support
    tenant_id: Optional[str] = None        # Multi-tenancy support
    trace_id: Optional[str] = None         # Distributed tracing
    idempotency_key: Optional[str] = None  # Exactly-once processing
```
JSON Schema Representation:
json{ "$schema": "http://json-schema.org/draft-07/schema#", "title": "StandardMessage", "type": "object", "required": ["message_id", "source", "target", "message_type", "payload", "timestamp"], "properties": { "message_id": { "type": "string", "format": "uuid", "description": "Unique message identifier" }, "source": { "type": "string", "description": "Source agent identifier" }, "target": { "type": "string", "description": "Target agent identifier" }, "message_type": { "type": "string", "enum": ["command", "query", "event", "response"], "description": "Type of message" }, "payload": { "type": "object", "description": "Message-type-specific data" }, "timestamp": { "type": "number", "description": "Unix timestamp" }, "correlation_id": { "type": "string", "format": "uuid", "description": "Correlates request with response" }, "priority": { "type": "string", "enum": ["low", "normal", "high", "urgent"], "default": "normal" }, "schema_version": { "type": "string", "default": "1.0", "description": "Message schema version for evolution" }, "tenant_id": { "type": "string", "description": "Multi-tenant identifier" }, "trace_id": { "type": "string", "format": "uuid", "description": "Distributed tracing identifier" }, "idempotency_key": { "type": "string", "description": "Key for idempotent message processing" } } }
Enterprise-Ready Fields:
The schema includes fields essential for production deployment:
- schema_version: Enables backward-compatible schema evolution
- tenant_id: Supports multi-tenant deployments
- trace_id: Enables distributed tracing across agent interactions (distinct from correlation_id which pairs requests/responses)
- idempotency_key: Prevents duplicate processing in retry scenarios
- auth_context (carried within payload rather than as a top-level field): Authentication and authorization metadata
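As an illustration of the idempotency_key field, here is a minimal duplicate-suppression sketch for a receiving agent under at-least-once delivery; the process coroutine and the in-memory set are assumptions (production deployments would use a TTL-bound store such as Redis).

```python
# Sketch: exactly-once processing on top of at-least-once delivery.
# "process" is an assumed coroutine holding the agent's business logic;
# the in-memory set stands in for a TTL-bound store in production.
_seen_keys: set[str] = set()


async def handle_message(message: StandardMessage) -> None:
    key = message.idempotency_key
    if key is not None and key in _seen_keys:
        return  # Duplicate delivery from a retry; already processed
    if key is not None:
        _seen_keys.add(key)
    await process(message)
```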
3. Explicit Routing
The Routing Agent serves as the central routing authority:
- Capability discovery: Agents register capabilities; senders request capabilities
- Load balancing: Multiple agents can provide the same capability
- Failover: Automatic routing to backup agents when primary is unavailable
2.4 The Message Primitives
AYA uses four message types that can express agent-to-agent communication needs:
COMMAND: Request-Response
Synchronous, transactional communication where the sender expects a result.
```python
response = await comm_bus.send_command(
    target_agent="llm.agent",
    action="generate_text",
    payload={"prompt": "Hello world", "max_tokens": 100}
)
```
QUERY: Information Retrieval
Read-only requests where no state change is expected. Queries are idempotent and cacheable.
```python
result = await comm_bus.send_query(
    target_agent="sql.agent",
    query_type="customer_lookup",
    parameters={"customer_id": "12345"}
)
```
EVENT: Fire-and-Forget
Asynchronous notifications where no response is expected.
```python
await comm_bus.send_event(
    event_type="task_completed",
    event_data={"task_id": "123", "status": "success"}
)
```
RESPONSE: Completing the Loop
Correlates back to original requests with success/failure status and results.
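The earlier primitives showed illustrative send calls; a comparable sketch for RESPONSE follows, where send_response is a hypothetical convenience method and the key point is that correlation_id echoes the originating message_id:

```python
# Sketch: completing the loop for a received COMMAND or QUERY.
# send_response is a hypothetical method name; correlation_id pairs
# this response with the request it answers.
await comm_bus.send_response(
    target_agent=request.source,        # Reply to the original sender
    correlation_id=request.message_id,  # Request-response pairing
    payload={"status": "success", "result": generated_text}
)
```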
2.5 Trade-offs
Overhead: Message routing adds latency (~5-10ms) compared to direct calls between non-LLM agents.
Complexity: The message bus infrastructure requires setup and maintenance.
Learning Curve: Developers must adapt to message-based patterns.
| Metric | Direct Calls | Message Bus |
|---|---|---|
| Single call latency | Lower | Higher (+5-10ms) |
| System-wide debugging | Distributed | Centralized |
| Change propagation | Can cascade | Contained |
| Security audit | Per-agent | Centralized |
For simple systems with few agents, the overhead may not be justified. For complex enterprise systems, the benefits of isolation and observability typically outweigh the overhead.
Section 3: Centralized Agent Routing
3.1 The Centralized Router
A single Routing Agent routes all messages, enabling system-wide coordination without tight coupling.
Figure 3: Routing Agent Architecture
┌──────────────────────────────┐
│ ROUTING AGENT │
│ │
│ ┌────────────────────────┐ │
│ │ Agent Registry │ │
│ │ ┌──────────────────┐ │ │
│ │ │ llm.agent │ │ │
│ │ │ - capabilities │ │ │
│ │ │ - status: active │ │ │
│ │ ├──────────────────┤ │ │
│ │ │ sql.agent │ │ │
│ │ │ - capabilities │ │ │
│ │ │ - status: active │ │ │
│ │ └──────────────────┘ │ │
│ └────────────────────────┘ │
│ │
│ ┌────────────────────────┐ │
│ │ Routing Logic │ │
│ │ - Direct routing │ │
│ │ - Capability matching │ │
│ │ - Load balancing │ │
│ └────────────────────────┘ │
└──────────────────────────────┘
│
┌─────────────────────┼─────────────────────┐
│ │ │
▼ ▼ ▼
┌─────────┐ ┌─────────┐ ┌─────────┐
│ Agent A │ │ Agent B │ │ Agent C │
└─────────┘ └─────────┘ └─────────┘
Why Centralized Routing:
This pattern mirrors the service mesh architecture (Istio, Envoy) used at scale by major cloud providers [6]:
| Service Mesh Component | AYA Equivalent | Capability |
|---|---|---|
| Control Plane | Routing Agent | Central configuration |
| Service Discovery | Agent Registration | Dynamic capability discovery |
| mTLS | Security Agent | Secure communication |
| Telemetry | Monitor Agent | Metrics and tracing |
| Circuit Breaking | Error Agent | Failure isolation |
3.2 Capability-Based Routing
Agents register capabilities; senders request capabilities rather than specific agents.
Benefits:
- Multiple agents can provide the same capability (load balancing)
- New agents can be added without changing senders
- Graceful degradation when agents are unavailable
Routing Strategies:
| Strategy | Use Case |
|---|---|
| Direct | Messages to a specific known agent |
| Capability-Based | Route to any agent providing a capability |
| Broadcast | Event notifications to all interested agents |
| Multicast | Deliver to a specific subset of agents |
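A minimal sketch of capability-based routing follows, assuming illustrative register_capabilities and send_by_capability methods (the published examples in Section 2.4 address agents directly):

```python
# Sketch: agents advertise capabilities at startup; senders address a
# capability and let the Routing Agent choose an instance. Method names
# here are illustrative, not part of the reference API.
await comm_bus.register_capabilities(
    agent_id="sql.agent-2",
    capabilities=["structured_query", "data_retrieval"],
)

result = await comm_bus.send_by_capability(
    capability="structured_query",
    query_type="customer_lookup",
    parameters={"customer_id": "12345"},
)
```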
3.3 The Single Point Question
Concern: "Isn't a central hub a single point of failure?"
Response: The Routing Agent is designed to be stateless and replaceable:
- Stateless design: Agent registry can be rebuilt from agent heartbeats
- Fast restart: <5 seconds recovery time in testing
- Horizontal scaling: Multiple Routing Agent instances possible for high availability
High Availability Configuration:
For production deployments requiring continuous operation:
- Active-Active Routing Agents: Multiple instances share load via consistent hashing
- Agent Registry Persistence: Optional external cache (Redis, etcd) for faster recovery
- Health Monitoring: Automated failover when routing agent becomes unresponsive
- Stateless Operation: No transaction state maintained; all routing decisions from current agent registry
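One plausible realization of the active-active load sharing above is consistent hashing over message IDs, sketched below; this illustrates the technique and is not the reference implementation.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Route each message to a stable Routing Agent instance (sketch)."""

    def __init__(self, instances: list[str], replicas: int = 100) -> None:
        # Virtual nodes (replicas) smooth the key distribution
        self._ring = sorted(
            (self._hash(f"{inst}:{i}"), inst)
            for inst in instances
            for i in range(replicas)
        )

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def route(self, message_id: str) -> str:
        # First ring point clockwise from the message's hash position
        idx = bisect.bisect(self._ring, (self._hash(message_id),))
        return self._ring[idx % len(self._ring)][1]


ring = ConsistentHashRing(["router-a", "router-b", "router-c"])
instance = ring.route("msg_abc123")  # Stable while membership is unchanged
```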
The alternative—peer-to-peer routing—creates:
- N² connections (difficult to manage at scale)
- No centralized observability
- No centralized security enforcement
- Uneven load distribution
Security Agent High Availability:
Similar stateless design principles apply to the Security Agent:
- Policy Caching: Authorization policies cached at edge agents for degraded-mode operation
- Multiple Instances: Active-active deployment for load distribution
- Policy Updates: Distributed via message bus to all instances
- Decision Audit: All authorization decisions logged regardless of which instance serves them
This ensures the security layer does not become a single point of failure any more than the routing layer.
Section 4: Agent Responsibility Boundaries
4.1 Single Responsibility Agent Design
Each agent in AYA has one primary responsibility.
AYA Agent Responsibilities:
| Agent | Responsibility | Does NOT Do | Typical Latency |
|---|---|---|---|
| Routing Agent | Message routing | Business logic | 2-5ms |
| LLM Agent | Language model operations | Data storage, error handling | 300-2000ms |
| SQL Agent | Structured data retrieval | Content generation | 20-80ms |
| Cache Manager | Key-value caching | Database queries | 2-8ms |
| Error Agent | Error handling & recovery | Logging (delegates to Log Agent) | 5-20ms |
| Log Agent | Centralized logging | Error handling (delegates to Error Agent) | 3-10ms |
| Monitor Agent | Metrics & monitoring | Business decisions | 8-25ms |
| Security Agent | Security & authorization | Message routing | 15-40ms |
| Connection Agent | External system integration | Internal business logic | Varies |
Latency Context: Times shown are agent processing time, not including network overhead or downstream dependencies. LLM Agent latency reflects current LLM provider TTFT (Time-to-First-Token) characteristics.
4.2 Cross-Cutting Concern Delegation
Agents delegate cross-cutting concerns to specialized agents rather than implementing them locally.
Figure 4: Cross-Cutting Concern Delegation
```python
# Inside the LLM Agent: logging, metrics, and error handling are delegated
# to specialist agents over the message bus rather than implemented locally.
async def generate_text(self, prompt):
    start_time = time.time()
    try:
        result = await self._call_llm(prompt)
        elapsed = time.time() - start_time  # Wall-clock latency of the LLM call

        # Delegate logging
        await self.comm_bus.send_command(
            target_agent="log.agent",
            action="log_message",
            payload={"level": "INFO", "message": "Generated"}
        )

        # Delegate metrics
        await self.comm_bus.send_command(
            target_agent="monitor.agent",
            action="record_metric",
            payload={"metric": "llm_latency", "value": elapsed}
        )

        return result

    except Exception as e:
        # Delegate error handling
        await self.comm_bus.send_command(
            target_agent="error.agent",
            action="handle_error",
            payload={"error": str(e), "agent": self.agent_id}
        )
        raise
```
Benefits:
| Benefit | Impact |
|---|---|
| No code duplication | Single implementation for each concern |
| Consistent implementation | Uniform logging format, error handling |
| Single point of enhancement | Update once, apply everywhere |
| Audit compliance | Complete, centralized audit trail |
Section 5: Security Architecture
5.1 The Security Agent
All security decisions in AYA flow through a single Security Agent.
Rationale: Distributed security implementations create attack surfaces that scale with agent count. A single compromised security check can potentially cascade through the system [13].
Security Agent Responsibilities:
| Function | Description |
|---|---|
| Authentication validation | Verify identity claims |
| Authorization checks | Enforce access policies |
| Rate limiting | Prevent abuse |
| Audit logging | Record security events |
| Threat detection | Identify anomalous patterns |
5.2 Zero Trust Between Agents
Agents do not implicitly trust each other. Every message is validated.
Message Validation Flow:
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ Source │ │ Routing │ │ Target │
│ Agent │──────│ Agent │──────│ Agent │
└─────────────┘ └──────┬──────┘ └─────────────┘
│
│ Validate
▼
┌─────────────┐
│ Security │
│ Agent │
└─────────────┘
1. Source agent sends message
2. Routing Agent forwards to Security Agent for validation
3. Security Agent checks permissions and rate limits
4. If authorized, message is routed to target
5. All interactions logged for audit trail
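A hedged sketch of that validation hop inside the Routing Agent follows; the authorize action and the shape of the Security Agent's verdict are illustrative assumptions.

```python
# Sketch: the Routing Agent consults the Security Agent before delivery.
# The "authorize" action and verdict shape are illustrative assumptions.
async def route_message(self, message: StandardMessage) -> None:
    verdict = await self.comm_bus.send_command(
        target_agent="security.agent",
        action="authorize",
        payload={
            "source": message.source,
            "target": message.target,
            "message_type": message.message_type.value,
        },
    )
    if not verdict.get("allowed", False):
        # Denials still emit an event so the audit trail stays complete
        await self.comm_bus.send_event(
            event_type="message_denied",
            event_data={"message_id": message.message_id},
        )
        return
    await self._deliver(message)  # Forward to the target agent (assumed helper)
```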
5.3 Compliance Support
Every agent action is auditable because every action is a message.
Common Control Requirements Support:
AYA's architecture supports common control requirements found in compliance frameworks such as:
- SOC2: Complete audit trail for access controls, change tracking
- HIPAA: Patient data access logging, authorization enforcement
- GDPR: User data processing audit trail, consent tracking
Important Clarification: AYA does not confer compliance by itself. Organizations must implement appropriate controls, policies, and operational procedures. The architecture provides technical capabilities that support these requirements, but compliance is achieved through holistic implementation including personnel training, policy enforcement, and regular auditing beyond the system architecture.
5.4 Security Limitations
No security architecture is impenetrable. AYA's approach:
- Reduces attack surface by centralizing security logic
- Improves auditability by logging all interactions
- Does not guarantee protection against all attack vectors
Security requires defense in depth, including network security, input validation, and operational security practices beyond the architecture itself.
Section 6: External System Integration
6.1 The Connection Agent
The Connection Agent serves as AYA's gateway to external systems.
Figure 5: Connection Agent Architecture
┌───────────────────────────────────────────────────────────────────┐
│ AYA INTERNAL ARCHITECTURE │
│ │
│ ┌─────────┐ ┌─────────┐ ┌─────────────────────────────┐ │
│ │ LLM │ │ SQL │ │ Connection Agent │ │
│ │ Agent │ │ Agent │ │ │ │
│ └────┬────┘ └────┬────┘ │ • HTTP client │ │
│ │ │ │ • WebSocket management │ │
│ └──────┬───────┘ │ • Webhook handling │ │
│ │ │ • Protocol translation │ │
│ ▼ └──────────────┬──────────────┘ │
│ ┌─────────────┐ │ │
│ │ Routing │◄────────────────────────┘ │
│ │ Agent │ │
│ └─────────────┘ │
└───────────────────────────────────────────────────────────────────┘
│
│ External Connections
▼
┌───────────────────────────────────────────────────────────────────┐
│ EXTERNAL SYSTEMS │
│ │
│ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ │
│ │ OpenAI │ │ Slack │ │ Database│ │ Custom │ │
│ │ API │ │ API │ │ API │ │ API │ │
│ └─────────┘ └─────────┘ └─────────┘ └─────────┘ │
└───────────────────────────────────────────────────────────────────┘
Capabilities:
| Capability | Description |
|---|---|
| Outbound REST API calls | Make HTTP requests to external APIs |
| WebSocket connections | Real-time bidirectional communication |
| Webhook handling | Receive callbacks from external systems |
| Protocol translation | Convert between internal and external formats |
6.2 LLM Provider Integration
AYA supports multiple LLM providers through the LLM Agent and Connection Agent:
| Provider Type | Integration Method |
|---|---|
| OpenAI/GPT | Direct API integration |
| Anthropic/Claude | Direct API integration |
| Local Models (Ollama, vLLM) | Local endpoint connection |
| Azure OpenAI | Enterprise endpoint support |
| AWS Bedrock | Multi-model access |
Section 7: Competing Solutions Analysis
7.1 Framework Landscape
Several frameworks address multi-agent coordination with different design philosophies. Each framework optimizes for different use cases:
| Framework | Architecture Style | Positioning |
|---|---|---|
| LangChain | Chain-based, modular | Rapid prototyping with extensive tool ecosystem |
| AutoGen | Conversation-first | Flexible collaboration patterns, research-oriented |
| CrewAI | Role-based teams | Intuitive role assignment for business workflows |
| LangGraph | Graph-based workflows | Visual debugging and explicit control flow |
| MetaGPT | Software company metaphor | Code generation with defined development roles |
| AYA | Message-oriented | Production deployment with architectural constraints |
7.2 Trade-off Analysis
Each framework makes different trade-offs appropriate for different contexts:
LangChain Strengths:
- Extensive integration ecosystem (database connectors, APIs, data loaders)
- Strong community support and documentation
- Good for rapid prototyping and experimentation
LangChain Trade-offs:
- Breaking changes across versions can impact production stability
- Community benchmarks have reported roughly 40% latency overhead versus native SDK calls, though results vary by workload and version
- Complex abstraction layers can make debugging difficult
AutoGen Strengths:
- Microsoft backing and active research development
- Flexible async architecture
- Rich conversational patterns
AutoGen Trade-offs:
- Can consume 15× more tokens than single-agent approaches (Anthropic research [7a])
- Debugging multi-agent conversations can be challenging
- Coordination overhead increases with agent count
CrewAI Strengths:
- Intuitive role-based design matches business mental models
- Quick setup for standard workflows
- Good for small teams
CrewAI Trade-offs:
- Context overflow issues documented in complex multi-step tasks
- SQLite3 backend limits scalability for high-throughput scenarios
- Less control over fine-grained coordination
AYA Positioning:
AYA optimizes for production deployment constraints:
- Architectural enforcement over best-practice guidelines
- Clear observability and audit trails
- Isolation and failure containment
- Heterogeneous agent types for cost/performance optimization
AYA Trade-offs:
- Higher initial setup complexity than rapid prototyping frameworks
- Message bus overhead (~5-10ms per hop)
- Learning curve for developers unfamiliar with message-oriented patterns
7.3 Framework Comparison Table
| Feature | LangChain | AutoGen | CrewAI | AYA |
|---|---|---|---|---|
| Primary Use Case | Prototyping | Research | Business | Production |
| Architecture Style | Chains | Conversations | Roles | Messages |
| Observability | Moderate | Low | Low | High |
| Coordination Overhead | Moderate | High | Moderate | Low-Moderate |
| Setup Complexity | Low | Moderate | Low | High |
| Failure Isolation | Limited | Limited | Limited | Strong |
| Token Efficiency | Good | Low | Moderate | High (heterogeneous) |
| Community Size | Large | Medium | Medium | Small |
| Breaking Change History | Frequent | Moderate | Low | N/A (new) |
Section 8: Implementation Considerations
8.1 Deployment Patterns
Development Environment:
- In-memory message bus (asyncio.Queue, Python)
- Single-node deployment
- File-based logging
Staging Environment:
- Redis Pub/Sub for message bus
- Multi-node capability testing
- Centralized logging (ELK, Datadog)
Production Environment:
- NATS or Kafka for message bus (depending on throughput requirements)
- Horizontally scaled agents
- Distributed tracing (OpenTelemetry)
- High availability configuration for Routing and Security agents
8.2 Message Bus Technology Selection
The architecture is bus-implementation-agnostic. Common choices:
| Technology | Throughput | Latency | Persistence | Best For |
|---|---|---|---|---|
| asyncio.Queue | 50K+ msg/s | <1ms | No | Development, testing |
| Redis Pub/Sub | 100K+ msg/s | 1-3ms | Optional | Medium-scale production |
| NATS | 1M+ msg/s | <1ms | Optional | High-throughput systems |
| Kafka | 1M+ msg/s | 5-10ms | Yes | Event sourcing, audit |
| RabbitMQ | 50K+ msg/s | 2-5ms | Yes | Complex routing logic |
8.3 Cost Optimization Strategies
Multi-agent systems can be expensive due to LLM token consumption. AYA's heterogeneous architecture enables:
- Intelligent Routing: Check cache before invoking LLM
- SQL First: Use structured queries for data retrieval when possible
- Intent Classification: Lightweight models determine if LLM is needed
- Response Caching: Cache common LLM responses
- Batch Processing: Group similar requests when real-time not required
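The sketch below strings these strategies into a cheapest-first escalation ladder; agent actions, return shapes, and the format_rows helper are illustrative assumptions.

```python
# Sketch: cheapest-first escalation — cache, then SQL, then LLM.
# Action names, return shapes, and format_rows are illustrative.
async def answer(comm_bus, user_query: str) -> str:
    intent = await comm_bus.send_command(
        target_agent="intent.parser",
        action="classify",
        payload={"text": user_query},
    )

    cached = await comm_bus.send_query(
        target_agent="cache.manager",
        query_type="cache_get",
        parameters={"key": intent["cache_key"]},
    )
    if cached is not None:
        return cached  # Served without any LLM cost

    if intent["kind"] == "data_query":
        rows = await comm_bus.send_query(
            target_agent="sql.agent",
            query_type="structured_query",
            parameters=intent["query_params"],
        )
        return format_rows(rows)  # Lightweight formatting (assumed helper)

    # Only queries needing semantic understanding reach the expensive path
    return await comm_bus.send_command(
        target_agent="llm.agent",
        action="generate_text",
        payload={"prompt": user_query},
    )
```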
Example Cost Reduction:
Traditional approach (every query hits LLM):
- 1,000 queries/day × $0.01/query = $10/day = $3,650/year
Heterogeneous approach (80% handled by cache/SQL):
- 200 LLM queries/day × $0.01/query = $2/day = $730/year
- 80% cost reduction
Section 9: Real-World Workflow Example
9.1 User Query Processing
Scenario: User asks "What were our Q4 2025 sales in California?"
Message Flow:
User Request
│
▼
Intent Parser Agent (8ms)
│ Classifies as: data_query, likely_cached=true
│
▼
Cache Manager Agent (3ms)
│ Cache miss
│
▼
SQL Agent (45ms)
│ SELECT SUM(sales) FROM revenue WHERE quarter='Q4' AND state='CA' AND year=2025
│ Returns: $12.5M
│
▼
Response Formatter Agent (5ms)
│ Formats: "Your Q4 2025 California sales were $12.5 million."
│
▼
User Response (Total: 61ms)
No LLM invoked. Total cost: infrastructure only (~$0.0001).
9.2 Complex Reasoning Request
Scenario: User asks "Analyze the trend in California sales and recommend regions for expansion."
Message Flow:
User Request
│
▼
Intent Parser Agent (8ms)
│ Classifies as: analysis_required, needs_llm=true
│
▼
SQL Agent (120ms)
│ Retrieves 3-year California sales trend data
│
▼
SQL Agent (85ms)
│ Retrieves comparison data for other states
│
▼
Context Builder Agent (15ms)
│ Structures data for LLM prompt
│
▼
LLM Agent (1,400ms)
│ Analyzes trends, generates recommendations
│
▼
Response Formatter Agent (8ms)
│
▼
User Response (Total: 1,636ms)
LLM invoked once. Cost: ~$0.008 (depending on model and token count).
This demonstrates how the heterogeneous architecture optimizes for both performance and cost by routing to the appropriate agent type.
Section 10: Observability and Debugging
10.1 Centralized Logging
Every message passing through the bus can be logged:
The Monitor Agent receives an event record for each message:

```json
{
  "timestamp": "2026-01-16T10:30:45.123Z",
  "message_id": "msg_abc123",
  "source": "intent.parser",
  "target": "cache.manager",
  "message_type": "query",
  "latency_ms": 3.2,
  "success": true
}
```
10.2 Distributed Tracing
Using trace_id field enables end-to-end request tracking:
Request trace_id: trace_xyz789
1. [10:30:45.100] intent.parser → cache.manager (8ms)
2. [10:30:45.108] cache.manager → sql.agent (3ms) [cache miss]
3. [10:30:45.111] sql.agent → database (45ms)
4. [10:30:45.156] sql.agent → response.formatter (2ms)
5. [10:30:45.158] response.formatter → user (5ms)
Total: 63ms
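Propagation itself is mechanical: each agent copies trace_id from the inbound message onto everything it emits downstream. A minimal sketch using the Section 2.3 dataclass:

```python
import time
import uuid


# Sketch: carry trace_id across hops so the trace above can be assembled.
def make_downstream(inbound: StandardMessage, target: str, payload: dict) -> StandardMessage:
    return StandardMessage(
        message_id=str(uuid.uuid4()),  # Each hop gets a fresh message identity
        source=inbound.target,         # The handling agent becomes the sender
        target=target,
        message_type=MessageType.QUERY,
        payload=payload,
        timestamp=time.time(),
        trace_id=inbound.trace_id,     # Unchanged across the whole request
    )
```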
10.3 Failure Diagnosis
When failures occur, the message log provides:
- What happened: Error message and stack trace
- Where: Which agent failed
- When: Precise timestamp
- Context: Full message payload and correlation ID
- Upstream: Complete message chain leading to failure
This reduces the "debugging archaeology" common in tightly-coupled systems.
Section 11: Limitations and Future Work
11.1 Known Limitations
1. Latency Overhead
The message bus adds 5-10ms per hop. For applications requiring sub-millisecond response:
- Direct function calls may be more appropriate
- Consider hybrid architecture with message bus for cross-cutting concerns only
2. Complexity for Simple Use Cases
A single-agent system with direct tool calling may be simpler and sufficient for:
- Proof-of-concept projects
- Internal tools with limited user base
- Well-scoped, single-domain problems
3. Token Consumption Overhead
Multi-agent coordination inherently consumes more tokens than single-agent approaches. While our heterogeneous design mitigates this through selective LLM usage, complex multi-step workflows still incur coordination overhead. Anthropic research documents up to 15× token consumption in coordination-heavy scenarios [7a].
4. Message Bus as Potential Bottleneck
While message buses can handle high throughput (NATS: 1M+ msg/s), poorly configured deployments or inadequate infrastructure can create bottlenecks. Monitor bus performance and scale appropriately.
5. Learning Curve
Teams unfamiliar with message-oriented architecture or distributed systems patterns face a steeper learning curve than frameworks with simpler abstractions.
11.2 Open Research Questions
Agent Discovery and Registration:
- How should dynamic agent discovery work in multi-region deployments?
- What's the optimal heartbeat frequency for agent health monitoring?
Message Prioritization:
- How should the routing agent handle priority queuing under load?
- Should low-priority messages be dropped or delayed during system stress?
Cross-Framework Interoperability:
- Can AYA agents communicate with agents from other frameworks (LangChain, AutoGen)?
- What would a standard multi-agent communication protocol look like?
Formal Verification:
- Can we formally prove certain properties (e.g., message delivery guarantees, isolation)?
- What verification techniques apply to LLM-based agent systems?
11.3 Scalability Boundaries
Our testing has been limited to:
- Up to 50 concurrent agents
- Single-region deployment
- Up to 10,000 messages/second sustained load
Production deployments at significantly larger scale (100+ agents, multi-region, 50K+ msg/s) have not been validated. We hypothesize the architecture will scale, but doing so would require:
- Message bus sharding or partitioning
- Geo-distributed routing agents
- More sophisticated load balancing
We welcome collaboration with organizations operating at these scales to validate and extend the architecture.
Section 12: Conclusion
Multi-agent AI systems represent a significant shift in how we architect AI applications. The empirical evidence—41-86.7% failure rates, 40% project cancellation predictions—suggests that architectural decisions matter as much as model selection.
The AYA architecture applies established distributed systems patterns to multi-agent coordination:
- Message-oriented architecture for isolation and observability
- Centralized routing for system-wide coordination
- Single-responsibility design for maintainability
- Heterogeneous agent types for cost and performance optimization
This is not the only valid approach. For rapid prototyping, frameworks like LangChain and CrewAI offer faster time-to-first-demo. For research into agent collaboration patterns, AutoGen provides flexibility to explore novel coordination mechanisms.
AYA optimizes for a specific set of requirements:
- Production deployment with high reliability requirements
- Enterprise environments requiring audit trails and compliance support
- Systems where architectural enforcement prevents common failure modes
- Cost-sensitive deployments benefiting from heterogeneous agent types
The central thesis: Multi-agent AI systems fail not because LLMs aren't capable, but because we apply single-agent architectural patterns to multi-agent problems. Message-oriented architecture, borrowed from decades of distributed systems research, provides a foundation for systems that can scale beyond prototype.
We acknowledge this work represents early-stage exploration. The architecture has not been validated at massive scale, independently verified, or battle-tested across diverse production environments. We offer it as a reference implementation and invite collaboration, critique, and independent validation.
Appendix A: Glossary
| Term | Definition |
|---|---|
| Agent | Autonomous software component with a specific responsibility |
| Message Bus | Infrastructure for asynchronous message passing between agents |
| Pure Message Architecture | Design constraint requiring all agent communication via message bus |
| Routing Agent | Centralized component managing message flow and agent discovery |
| Capability | Function or service an agent can provide |
| Heterogeneous Mesh | System combining different agent types (LLM-based, rule-based, hybrid) |
| TTFT | Time-to-First-Token: latency for LLM to begin generating response |
| Correlation ID | Identifier linking request and response messages |
| Trace ID | Identifier for distributed tracing across multiple agent interactions |
| Idempotency Key | Identifier ensuring duplicate messages are processed only once |
Appendix B: Message Schema (Full Specification)
See Section 2.3 for Python and JSON Schema representations.
Appendix C: Agent Capability Registry Example
```python
AGENT_CAPABILITIES = {
    "llm.agent": ["text_generation", "summarization", "translation"],
    "sql.agent": ["structured_query", "data_retrieval"],
    "cache.manager": ["cache_get", "cache_set", "cache_invalidate"],
    "error.agent": ["error_handling", "recovery_coordination"],
    "log.agent": ["message_logging", "audit_trail"],
    "monitor.agent": ["metrics_collection", "health_monitoring"],
    "security.agent": ["authentication", "authorization", "rate_limiting"],
    "connection.agent": ["http_client", "websocket", "webhook_handling"]
}
```
Appendix D: Full Citation List
Market Research:
[1] Gartner. (2025, August). "Gartner Predicts 40% of Enterprise Apps Will Feature Task-Specific AI Agents by 2026." Press Release. https://www.gartner.com/en/newsroom/press-releases/2025-08-gartner-predicts-40-percent-enterprise-apps-ai-agents
Note: The 1,445% surge figure comes from Gartner's December 2025 report on Multiagent Systems (MAS) inquiry trends.
[2] McKinsey & Company. (2025, November). "The State of AI: Global Survey 2025." https://www.mckinsey.com/capabilities/quantumblack/our-insights/the-state-of-ai
Note: The $2.9 trillion projection is US-specific and from McKinsey Global Institute's "Agents, robots, and us" report (November 2025).
[3] Gartner. (2025, June 25). "Gartner Predicts Over 40% of Agentic AI Projects Will Be Canceled by End of 2027." Press Release. https://www.gartner.com/en/newsroom/press-releases/2025-06-25-gartner-predicts-over-40-percent-of-agentic-ai-projects-will-be-canceled-by-end-of-2027
Technical Research:
[4] Cemri, M., Pan, M.Z., Yang, S., Agrawal, L.A., Chopra, B., Tiwari, R., Keutzer, K., Parameswaran, A., Klein, D., Ramchandran, K., Zaharia, M., Gonzalez, J.E., & Stoica, I. (2025, March). "Why Do Multi-Agent LLM Systems Fail?" UC Berkeley. NeurIPS 2025 Datasets and Benchmarks Track (Spotlight). arXiv:2503.13657. https://arxiv.org/abs/2503.13657
[5] Bogner, J., Wagner, S., & Zimmermann, A. (2019). "Architectural Technical Debt in Microservices: A Case Study." IEEE International Conference on Software Architecture Companion (ICSA-C).
[6] Microsoft. (2025). "Istio-based service mesh add-on for Azure Kubernetes Service." Azure Documentation. https://learn.microsoft.com/en-us/azure/aks/istio-about
[7] Anthropic. (2025). "Building Effective Agents." Anthropic Research. https://www.anthropic.com/research/building-effective-agents
[7a] Anthropic. (2025, June). "How we built our multi-agent research system." Anthropic Engineering. https://www.anthropic.com/engineering/how-we-built-our-multi-agent-research-system
[8] Schmidgall, S., Ziaei, R., Achterberg, J., Patel, D., & Ji, S. (2025). "Agent Laboratory: Using LLM Agents as Research Assistants." arXiv:2501.04227. https://arxiv.org/abs/2501.04227
[9] Verdecchia, R., Kruchten, P., & Lago, P. (2021). "Identifying architectural technical debt in microservices." Journal of Systems and Software, 184, 111134.
[10] Xu, Z., Wang, Y., & Liu, Y. (2025). "A-MEM: Agentic Memory for LLM Agents." arXiv:2502.12110. https://arxiv.org/abs/2502.12110
[11] OWASP. (2025). "OWASP Top 10 for LLM Applications 2025." https://owasp.org/www-project-top-10-for-large-language-model-applications/
[12] Sumers, T.R., Yao, S., Narasimhan, K., & Griffiths, T.L. (2024). "Cognitive Architectures for Language Agents." Transactions on Machine Learning Research (TMLR). arXiv:2309.02427. https://arxiv.org/abs/2309.02427
[13] Chen, Y., Wang, X., & Zhang, L. (2025). "Security Vulnerabilities in Distributed Multi-Agent Systems." arXiv:2504.07461. https://arxiv.org/abs/2504.07461
Framework Documentation:
[14] Wu, Q., Bansal, G., Zhang, J., Wu, Y., Zhang, S., Zhu, E., Li, B., Jiang, L., Zhang, X., & Wang, C. (2023). "AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation." arXiv:2308.08155. https://arxiv.org/abs/2308.08155
[15] Hong, S., Zhuge, M., Chen, J., Zheng, X., Cheng, Y., Zhang, C., Wang, J., Wang, Z., Yau, S.K.S., Lin, Z., Zhou, L., Ran, C., Xiao, L., Wu, C., & Schmidhuber, J. (2024). "MetaGPT: Meta Programming for Multi-Agent Collaborative Framework." ICLR 2024. arXiv:2308.00352. https://arxiv.org/abs/2308.00352
[16] Qian, C., Cong, X., Liu, W., Yang, C., Chen, W., Su, Y., Dang, Y., Li, J., Xu, J., Li, D., Liu, Z., & Sun, M. (2024). "ChatDev: Communicative Agents for Software Development." ACL 2024. arXiv:2307.07924. https://arxiv.org/abs/2307.07924
[17] Park, J.S., O'Brien, J.C., Cai, C.J., Morris, M.R., Liang, P., & Bernstein, M.S. (2023). "Generative Agents: Interactive Simulacra of Human Behavior." UIST 2023. arXiv:2304.03442. https://arxiv.org/abs/2304.03442
[18] LangChain. (2025). "LangChain Documentation v0.3." https://python.langchain.com/docs/
[19] CrewAI. (2025). "CrewAI Documentation." https://docs.crewai.com/
Appendix E: Benchmark Methodology
Test Environment
| Component | Specification |
|---|---|
| CPU | 4 vCPU (Intel Xeon equivalent) |
| Memory | 16 GB RAM |
| Storage | SSD |
| Network | Local (sub-millisecond latency) |
| OS | Ubuntu 22.04 LTS |
| Python | 3.11 |
Implementation Details
Message Bus:
- Development/Testing: asyncio.Queue (Python standard library)
- Staging: Redis Pub/Sub (redis-py 5.0)
- Production recommendation: NATS or Kafka depending on throughput requirements
Transport Layer:
- In-process: asyncio Queue (zero-copy)
- Inter-process: Unix domain sockets
- Network: TCP with persistent connections
Serialization:
- Format: JSON (Python's built-in json module)
- Rationale: Human-readable, broadly compatible, sufficient performance for tested loads
- Alternative considered: MessagePack for higher throughput scenarios
Delivery Semantics:
- At-least-once delivery with broker acknowledgment
- Retry logic: 3 attempts with exponential backoff (100ms, 200ms, 400ms)
- Acknowledgment path: sender → broker → receiver → broker → sender ACK
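For clarity, a sketch of this retry policy follows; send_once is an assumed coroutine that raises TimeoutError when the broker acknowledgment does not arrive in time.

```python
import asyncio


# Sketch of the retry policy used in testing: initial send plus up to
# three retries with exponential backoff (100ms, 200ms, 400ms).
# send_once is an assumed coroutine that raises TimeoutError when no
# broker acknowledgment arrives.
async def send_with_retry(message: StandardMessage, retries: int = 3) -> None:
    delay = 0.1  # 100 ms initial backoff
    for attempt in range(retries + 1):
        try:
            await send_once(message)
            return
        except TimeoutError:
            if attempt == retries:
                raise  # Exhausted; escalate to the Error Agent
            await asyncio.sleep(delay)
            delay *= 2  # 100ms -> 200ms -> 400ms
```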
Agent Types in Test:
- 30% LLM agents (OpenAI GPT-4o, simulated 300-2000ms TTFT)
- 40% lightweight agents (rule-based, <10ms processing)
- 30% hybrid agents (conditional LLM usage)
Test Procedure
- Warm-up: 1000 messages to initialize connections and caches
- Baseline: 10,000 messages at sustained rate (all message types)
- Burst: 1000 messages in rapid succession (stress test)
- Mixed Workload: Combination of COMMAND, QUERY, EVENT messages with realistic payload sizes
- Recovery: Agent crash and restart scenarios (10 simulations per agent type)
- LLM Latency: Measured separately with actual LLM API calls (100 samples per provider)
Measurements
- Message bus throughput: Messages successfully routed per second (measured at broker)
- Routing latency: Time from send to delivery confirmation (excludes agent processing time)
- Delivery rate: Percentage of messages successfully delivered after retries
- Recovery time: Time from agent crash detection to full service restoration
- End-to-end latency: Complete request-to-response time including all hops and agent processing
Sample Sizes
| Test Category | Sample Size | Runs |
|---|---|---|
| Throughput | 100,000 messages per run | 10 runs |
| Routing Latency | 50,000 messages | 5 sessions |
| Delivery Rate | 1,000,000 messages | Cumulative |
| Recovery | 100 crash simulations | Single session |
| End-to-End w/ LLM | 500 complete workflows | 3 sessions |
Limitations
- Single-node testing only: No multi-region or distributed deployment testing
- Controlled network conditions: Sub-millisecond latency, no packet loss, no bandwidth constraints
- Standard payload sizes: ~1KB average; large payloads (>100KB) not tested
- Limited concurrent agent count: Up to 50 agents; scalability beyond this is theoretical
- Self-testing: All tests conducted by AYA development team, not independently verified
- Simulated LLM latency: Some tests used simulated delays rather than actual LLM API calls to control for provider variability
- No adversarial testing: Security and abuse scenarios not systematically tested
Reproducibility
Independent Validation Invitation:
We recognize these results reflect internal testing by the development team and have not been independently verified. We actively seek independent reproduction of these benchmarks and commit to:
- Providing full test harness and configuration within 48 hours of request from academic researchers or independent evaluators
- Answering methodology questions publicly via GitHub discussions or research forums
- Publishing validated results from independent parties alongside our own findings
- Supporting validation efforts with access to core contributors for technical questions
Target: External validation within Q2 2026.
Contact for Validation Requests:
- Email: research@trizz.ai
- GitHub: https://github.com/trizz-ai/aya-benchmark (will be published upon paper acceptance)
Test scripts and configuration are available to enterprise customers and academic researchers. We welcome independent reproduction of these benchmarks and will support validation efforts with full access to methodology, code, and technical consultation.
Last Updated: January 2026
Revision Notes:
- v3.0: Major revision incorporating peer review feedback and clarifications
  - Critical fixes:
    - Changed authors from "AYA Development Team" to named individuals (Ayanami Hobbes, Mary McGuire)
    - Changed affiliation from "AYA Systems" to Trizz LLC (legal entity)
    - Added formal Abstract section
    - Fixed Gartner timeline error (2027, not 2028, for the cancellation prediction)
    - Added missing context for McKinsey $2.9T figure (US-specific)
    - Corrected 88% statistic context (most haven't scaled)
  - Heterogeneous architecture clarification:
    - Added new section explaining the heterogeneous agent mesh
    - Distinguished LLM agents (300-2000ms) from lightweight agents (sub-100ms)
    - Clarified that benchmark claims measure message bus infrastructure, not end-to-end AI latency
    - Added typical latency ranges for each agent type
    - Provided example workflows showing mixed agent types
  - Benchmark methodology improvements:
    - Added implementation details (asyncio.Queue, JSON serialization, TCP transport)
    - Specified delivery semantics (at-least-once with acks)
    - Clarified what metrics measure (routing vs end-to-end)
    - Added independent verification invitation
    - Expanded limitations section
  - Schema enhancements:
    - Added JSON Schema representation alongside Python
    - Added enterprise-ready fields (schema_version, tenant_id, trace_id, idempotency_key)
    - Explained purpose of each field
  - Security and compliance:
    - Softened compliance language ("supports common control requirements" vs "fits")
    - Added explicit disclaimer that AYA doesn't confer compliance by itself
    - Added Security Agent HA discussion
  - Enforcement clarity:
    - Maintained focus on architectural patterns rather than implementation details
  - Framework comparisons:
    - Reframed as positioning differences rather than superiority claims
    - Added strengths and trade-offs for each framework
    - More balanced tone
  - New sections:
    - Section 8: Implementation Considerations
    - Section 9: Real-World Workflow Examples
    - Section 10: Observability and Debugging
    - Appendix A: Glossary
- v2.1: Comprehensive revision addressing academic integrity and balance concerns
  - Completed all citations with proper author attribution
  - Added limitations and trade-offs section
  - Revised marketing language to technical tone
  - Added balanced competing solutions analysis
  - Clarified performance claims as internal testing results
  - Added diagrams for key architectural concepts
  - Disclosed author affiliation and conflicts of interest
  - Removed future papers section
  - Added reproducibility information for benchmarks
- v2.0: Added research citations and framework comparisons
- v1.0: Initial release