Introduction

As AI Agent technology moves towards widespread application, the traditional point-to-point communication model faces challenges in distributed scenarios, such as difficulties in service discovery, high coupling, and poor reliability. To address these issues, we have developed an innovative A2A over MQTT solution based on Tencent Cloud’s TDMQ MQTT version. This solution combines the lightweight, asynchronous, and loosely coupled characteristics of the MQTT protocol with standardized agent interaction protocols, providing a more flexible, reliable, and scalable communication foundation for multi-agent collaboration.

This solution has been tested and validated in various real-world scenarios. It was also selected as a typical case by the China Academy of Information and Communications Technology under the project “AI Cloud Middleware Enables the Scenarios-Based and Engineering-Based Implementation of Large Models”. Next, we will explore in detail how A2A over MQTT overcomes traditional limitations and helps enterprises build next-generation distributed agent platforms.

Introduction to MQTT Protocol

MQTT (Message Queuing Telemetry Transport) is a lightweight messaging protocol designed for use in low-bandwidth, unreliable network environments. It uses a publish-subscribe model, allowing devices to exchange messages through topics. MQTT features lowoverhead, is easy to implement, and is cross-platform. It is widely used in applications such as the Internet of Things, mobile communications, and real-time data transmission.

Introduction to Agent-to-Agent (A2A) Protocol

To address the interoperability challenges posed by AI agents, the Agent-to-Agent (A2A) protocol provides a standardized framework that enables AI agents from different vendors to communicate and collaborate in a unified manner. Each agent using A2A exposes standardized metadata and a set of common public methods, allowing other agents to interact with it for tasks like delegating work, requesting updates, or coordinating workflows.

How the A2A protocol works

Let’s use a code-generation agent as an example to illustrate the basic workflow of the A2A protocol:

Service discovery: Find an AI Agent

Before delegating a task, it’s necessary to identify and select a suitable Agent. The A2A protocol defines a metadata format that describes an Agent’s capabilities and characteristics, known as AgentCard. AgentCard is a JSON file, which, per the protocol, should be placed in the following path on the Agent server:

https://{agent-server-domain}/.well-known/agent-card.json

Therefore, users must know the address of the Agent server in advance in order to obtain the AgentCard.

Execute task: Stream-based status updates

Once a suitable Agent is found, users can interact with it using the standard methods defined by the A2A protocol. By default, the A2A protocol uses HTTP-based JSON-RPC for communication.

We can call the sendMessage method to send the task request to the Agent. The Agent will process the request and continuously send status updates via streaming responses until the task is completed.

Taking the task of “rewriting the thread pool as coroutines” as an example, the Agent first returns an initial response Task, indicating that the task has been accepted.

{  "id": "12345", 
 "final": false, 
 "status": {  
  "state": "submitted",  
  "message": "Task has been accepted and is being processed." 
 }
}

Next, the Agent will plan the various steps for executing the code generation task, and return the task list and the generated code through the TaskArtifactUpdateEvent:

{  "taskId": "12345", 
 "artifact": {  
  "id": "artifact-1",  
  "name": "TODO List",  
  "parts": [   
   {    
     "kind": "text",    
     "text": "1. Analyze the existing thread pool implementation.\n2. Design the coroutine-based equivalent.\n3. Implement the coroutine version step by step.\n4. Test the new implementation."   
    }  
    ] 
   }
 }

As the task progresses, the Agent will continuously send status updates using TaskStatusUpdateEvent until the task is completed:

{  "id": "12345",
 "final": true, 
 "status": {  
  "state": "completed",  
  "message": "Task has been completed successfully." 
 }
}

Limitations of the A2A Protocol

Although the A2A protocol provides a standardized framework for the interoperability of AI agents, it primarily focuses on defining the interaction processes between agents, ignoring the many challenges associated with complex distributed applications. By relying on HTTP or gRPC, A2A inherits all the limitations of a point-to-point architecture. This approach may work well in small, isolated systems, but it’s insufficient for building large-scale agent service platforms.

Difficulty in service discovery

A2A allows agents to discover each other using AgentCards, but that’s about it. Currently, the A2A protocol only defines how to obtain AgentCards through static URLs; there’s no mechanism for dynamic service discovery. When tasks need to be delegated, it’s necessary to know the addresses of all required agents in advance, which increases the complexity for users and makes maintenance more difficult.

Strong upstream and downstream coupling

The A2A protocol assumes a direct communication path between the caller and the callee, but this isn’t always possible in distributed environments. Issues such as network partitions, firewalls, and NAT can prevent direct communication, leading to failed task delegation.

In addition, A2A tightly couples the agents that communicate with each other through direct HTTP connections. This not only requires each agent to configure the addresses of other agents in the system, but also creates a single point of failure. If one agent becomes unavailable, the entire task process can be interrupted, affecting the system’s reliability and availability.

Lack of state management

The A2A protocol does not have a built-in status management mechanism. Therefore, the caller must track the task’s status and progress manually. Additionally, HTTP’s stateless nature fundamentally conflicts with the need for status management, increasing the complexity of implementation. This is especially true in distributed systems, where a single agent may be served by multiple backend nodes. Since the URL configured by the caller usually refers to a load balancer, it becomes even more difficult to continuously track the task’s status.

Security issues

The A2A protocol only provides out-of-band authentication and lacks mechanisms for service authentication. Functions such as flow control and permission management are not available either. All of these require the agent service provider to develop custom solutions, increasing implementation complexity and the cost associated with integrating with other systems.

A2A Integration Guide via MQTT

To overcome the limitations of the A2A protocol in distributed environments, we proposed the A2A over MQTT solution. By combining the A2A protocol with the MQTT protocol, a more flexible, reliable, and scalable agent interoperation mechanism can be achieved. Compared to the traditional A2A protocol, A2A over MQTT has the following advantages:

  • Dynamic service discovery:By utilizing MQTT’s retained messages, persistent messages, and Request-Response mechanism, agents can dynamically register with and discover other agents. Unlike HTTP, where the target URL must be configured in advance, MQTT allows agents to automatically broadcast their identity when they come online and clean it up automatically when they go offline. This greatly simplifies operational configuration.
  • Server-side load balancing:By utilizing MQTT’s shared subscription mechanism, it’s easy to deploy multiple Agent instances in a cluster. The Broker automatically distributes tasks to available Agents, eliminating the need for additional load balancers like Nginx or LVS. This approach inherently provides high availability.
  • Loosely coupled architecture:Through MQTT’s publish-subscribe model, the caller and executor are completely decoupled, eliminating the need for point-to-point network connections between agents. Agents can be deployed within a private network without worrying about network connectivity issues; they simply need to be able to connect to the MQTT broker to provide external services. This significantly reduces deployment complexity and improves the system’s flexibility and scalability.
  • Built-in status management:By utilizing MQTT’s persistent messaging and QoS mechanisms, messages are not lost even if the agent is temporarily offline. Once the agent is back online, it can automatically retrieve pending tasks and resume processing from where it left off. Additionally, the hierarchical structure of topics facilitates detailed tracking and tracing of task statuses.
  • Enhanced security:It leverages MQTT’s mature TLS encryption, client certificate authentication, and ACL-based permission control mechanisms. There’s no need to reinvent authentication at the application layer like with HTTP. This allows for precise control over which agents can publish tasks and which ones can receive instructions.

请在此添加图片描述

We currently provide the following SDKs and examples to help developers quickly integrate A2A over MQTT: Python SDK Java SDK Golang SDK Dify Plugin Below, we use the Python SDK as an example to demonstrate how to send task requests using A2A over MQTT:

  1. AgentCard detected Unlike traditional A2A protocols, which require knowing the Agent server address in advance, A2A over MQTT allows us to dynamically discover AgentCards through the MQTT service:
# MQTT broker URL
mqtt_broker_url = "mqtt://user0:secret0@1.1.1.1:1883/default-org"
# Resolve agent card via agent name
resolver = AgentCardResolver(mqtt_broker_url)
agent_card = await resolver.discover_agent("code-generator-agent")
# Or fetch all registered agent cards
agent_card_list = await resolver.discover_agents()
  1. Create an A2A over MQTT client and send a task request:
# Create client config
config = ClientConfig()
config.supported_transports = ['MQTT']
# Create ClientFactory and register MQTT transport
factory = ClientFactory(config)
factory.register('MQTT',         
         lambda card, url, config, interceptors: MqttTransport(                      mqtt_broker_url,                      card,                  ))
# Create A2A client using the factory
client = factory.create(agent_card)
# Send message
message = Message(       
     message_id=str(uuid.uuid4()),      
       role=Role.user,      
       parts=[TextPart(text="将线程池改写成协程")],   
      )
response = client.send_message(message)

Those familiar with the A2A protocol will notice that the steps outlined above are almost identical to those used when working with open-source A2A SDKs. This is because the A2A protocol itself allows for modifications to the transport layer. We simply need to specify the MQTT transport layer (MqttTransport) when creating the client.

A2A implementation example using TDMQ MQTT version

Tencent Cloud’s TDMQ for MQTT was officially launched commercially in January 2025. To date, it has served nearly a hundred customers across industries such as transportation, education, and finance

请在此添加图片描述

Based on the TDMQ MQTT version, the A2A over MQTT solution has been applied in numerous practical projects and was recognized by the China Academy of Information and Communications Technology“AI Cloud Middleware Facilitates the Practical Application and Engineering Implementation of Large Models”Here are two typical examples:

请在此添加图片描述

Serve a single Agent: Message Queue Problem Diagnosis Assistant

As a cloud-based message queue service provider, we aim to provide operations personnel with AI capabilities to help them diagnose and resolve various issues related to message queues. Through A2A over MQTT, we can create a message queue troubleshooting agent that can be integrated into existing monitoring, alerting, and issue-diagnosis processes.

请在此添加图片描述

When the Agent service starts, it publishes its AgentCard as a retained message to the Discovery Topic. This allows the operations pipeline to continuously receive the latest AgentCard information from the system by subscribing to the Discovery Topic. Additionally, the Agent sets up a “will message” that ensures any remaining AgentCard is deleted in case of a connection failure. This enables other components to promptly detect the Agent’s unavailable status and trigger service discovery processes again.

In addition, since this diagnostic agent operates within the VPC of the production environment, it cannot usually be directly accessed by external pipelines. By leveraging MQTT’s persistent connection feature, the agent only needs to establish a one-way connection to the broker in order to receive instructions from external pipelines. This significantly reduces the deployment requirements.

After the Agent registers its AgentCard, it subscribes to its own Task Topic and waits to receive task requests. The operations pipeline uses the AgentCard to obtain the Agent’s Task Topic, and then publishes diagnostic tasks to that Topic. Once the Agent receives a task request, it executes the diagnostic logic and returns the diagnostic results and status updates via MQTT in a streaming manner.

Diagnostic tasks usually take a long time to complete. Traditional HTTP requests are prone to timeout and disconnection issues. MQTT’s asynchronous communication mechanism is ideal for such scenarios: once a task is submitted via the pipeline, it can be paused. The system waits for the Agent to complete the task and then receives the result through a callback. This eliminates the need for synchronous waiting, thereby significantly improving system stability and resource utilization.

Distributed Agent Platform: Cloud Mate AI Agent Platform

Cloud Mate, an intelligent operations and maintenance expert, is an AI Agent platform developed by Tencent Cloud for intelligent operations and maintenance scenarios. Its core capability lies in accurate problem diagnosis, enabling comprehensive coverage from infrastructure to business systems. It creates a complete operations and maintenance cycle that spans from issue detection to resolution. The Agent services provided by the Cloud Mate platform use the A2A protocol. Unlike platforms that offer a fixed set of Agent services, Cloud Mate has the following advantages:

  1. The number of Agents and the capabilities of each Agent can be dynamically expanded. Users can add, modify, or remove Agents as needed at any time.
  2. The Cloud Mate backend nodes consist of a distributed system made up of multiple servers. Each node can serve any agent, so load balancing and high availability are essential features.

请在此添加图片描述

During the service discovery phase, in addition to supporting AgentCard registration, we also utilized MQTT’s shared subscription feature to enable load balancing among multiple backend nodes for a single agent. Each Cloud Mate node subscribes to the Discovery Topic using shared subscription upon startup. When a new user attempts to obtain an AgentCard for a particular agent, the MQTT service routes the request to one of the nodes. The node that responds to the request then returns the AgentCard to the user. Subsequent task requests are sent to this same node based on the Task Topic specified in the AgentCard, ensuring sticky routing of requests.

This architecture greatly simplifies system design. In traditional microservice architectures, Etcd/Consul are typically used for service registration and discovery, while Nginx/Gateway is used for routing. In contrast, with the A2A over MQTT approach, a single MQTT Broker handles service registration, load balancing, message routing, and communication tasks, significantly reducing infrastructure maintenance costs and system complexity.

Each Cloud Mate node can handle task requests from any agent, and there are numerous agents in the system. To avoid having each node subscribe to the Task Topics of all agents, we utilize MQTT’s topic wildcard feature. This way, each server only needs to subscribe to one wildcard Task Topic to serve all agents. This is a significant advantage over the native A2A protocol, where each agent would need to provide its own HTTP endpoint.

In addition, thanks to MQTT’s QoS mechanism and persistent session capabilities, even if there are network fluctuations or backend service restarts during message publication, the messages are not lost. Instead, they are temporarily stored in the broker and delivered automatically once the service resumes. This out-of-the-box reliability ensures that we don’t need to implement complex retry and compensation logic at the application layer, allowing us to build highly reliable distributed agent systems.

Future Prospects

A2A over MQTT combines the advantages of both the A2A protocol and the MQTT protocol, offering a flexible, reliable, and scalable new model for agent interoperability. In the future, we plan to continue improving the functionality of A2A over MQTT to further enhance its performance in distributed environments:

  1. Optimization for large file transfers:For common scenarios involving image and audio transmission in multi-modal agent interactions, we explore a hybrid transmission approach that combines object storage with MQTT to improve transfer efficiency.
  2. Enhanced observability:A distributed tracing system based on MQTT messaging is built, allowing for visualization of call topologies and timing between agents, thereby helping developers quickly identify performance bottlenecks.
  3. Ecosystem integration:Promote A2A over MQTT to become the standard transmission layer adapter for mainstream agent frameworks like LangChain and AutoGen, thereby reducing developers’ integration costs.

In the future, we will work to make MQTT Transport part of the A2A standard protocol, enabling its wider adoption and implementation. We will also explore additional use cases to promote the development and widespread use of AI Agent technology.

文章来源于腾讯云开发者社区,点击查看原文