The Service Catalog Pattern for AI Agents

written by Stefan Christoph

May 25, 2026 - 14 minutes read

TL;DR: AI agents face the same discovery problem microservices solved with service registries a decade ago, except agents need semantic search, governance workflows, and dynamic capability updates. AWS Agent Registry (preview) implements this pattern as a managed service. I tested it end-to-end: create a registry, register skills, approve them, and discover via natural language queries. The code and gotchas are below.

The Problem We Already Solved Once

In 2012, Netflix open-sourced Eureka [1]. The problem it solved was simple to state and painful to live with: in a microservices architecture, you can’t hardcode where services live. Instances spin up and down. IP addresses change. New capabilities appear. Old ones get deprecated. Without a registry, every service needs to know the exact location of every other service it depends on. That doesn’t scale.

Eureka’s answer was a service registry. Services register themselves on startup. Clients query the registry to discover available instances. Health checks remove dead instances automatically. The registry becomes the single source of truth for “what exists and where to find it.”
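To make the register / discover / evict cycle concrete, here is a minimal in-memory sketch. It is not Eureka's code or API: the class name, the method names, and the heartbeat-with-TTL eviction rule are all assumptions chosen to illustrate the idea in a few lines.

```python
import time


class ServiceRegistry:
    """Toy registry (hypothetical): instances stay discoverable while they heartbeat."""

    def __init__(self, ttl_seconds: float = 30.0) -> None:
        self.ttl_seconds = ttl_seconds
        # service name -> {instance address -> time of last heartbeat}
        self._last_seen: dict[str, dict[str, float]] = {}

    def register(self, service: str, address: str, now=None) -> None:
        now = time.monotonic() if now is None else now
        self._last_seen.setdefault(service, {})[address] = now

    def heartbeat(self, service: str, address: str, now=None) -> None:
        # A heartbeat simply refreshes the registration timestamp.
        self.register(service, address, now)

    def discover(self, service: str, now=None) -> list[str]:
        now = time.monotonic() if now is None else now
        instances = self._last_seen.get(service, {})
        # Evict instances whose last heartbeat is older than the TTL.
        expired = [a for a, seen in instances.items() if now - seen > self.ttl_seconds]
        for address in expired:
            del instances[address]
        return sorted(instances)
```

The `now` parameter exists only so the behavior can be exercised without waiting: register two instances at time 0, heartbeat one of them at time 25, and a `discover` call at time 40 returns only the instance that kept renewing.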

Eureka’s primary motivation was runtime reliability: routing requests to healthy instances. But the foundation it established was more general. Before you can route, load-balance, or govern anything, you need to know what exists. That visibility layer is the pattern that transfers.

More than a decade later, we have the same problem with AI agents. Except it’s worse.

Why Agents Have It Harder

A microservice has a fixed API. It does one thing. Its contract is versioned and documented. When you query a service registry, you know exactly what you’ll get back.

AI agents are different along every dimension that matters:

Dimension	Microservices	AI Agents
Capabilities	Fixed at deploy time	Dynamic — skills added/removed without redeployment
Discovery	By service name or endpoint	By capability description (semantic)
Versioning	API version in URL	Skill version, model version, prompt version — all independent
Trust	mTLS, service mesh	Who approved this agent? What can it access?
Composition	REST/gRPC calls	Tool use, delegation, multi-agent orchestration

In a microservices world, you ask: “Where is the payment service?” In an agent world, you ask: “Which agent can process refunds in German for enterprise customers?” The query is semantic, not structural. The answer might change tomorrow when someone publishes a new skill.

I wrote about this gap in The Agent Security Stack Nobody Is Building [2]: shadow AI is the core accelerant. Teams spin up agents with no tickets, no approvals, no paper trail. You can’t secure what you don’t know exists. You can’t govern what you can’t discover. And you can’t reuse what you can’t find.

The Service Catalog Pattern

The solution is the same architectural pattern, adapted for the new medium. A service catalog for AI agents needs four capabilities that go beyond what Eureka provided:

1. Registration with Metadata

Agents, tools, skills, and MCP servers register themselves with rich metadata: what they do, what inputs they expect, what permissions they need, who built them, and what stage they’re in (development, staging, production).

In the microservices world, registration was mostly about network location. In the agent world, registration is about capability description. The metadata IS the discovery surface.
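As a rough illustration of "the metadata is the discovery surface", the sketch below models a catalog record and a pre-registration check. The field names, the stage list, and the five-word minimum for descriptions are my own assumptions for this example, not a schema from any particular registry.

```python
from dataclasses import dataclass, field

# Hypothetical lifecycle stages, taken from the prose above.
STAGES = ("development", "staging", "production")


@dataclass(frozen=True)
class CatalogRecord:
    name: str
    description: str                      # what it does; the text that search runs against
    owner: str                            # who built it
    stage: str                            # one of STAGES
    inputs: dict = field(default_factory=dict)   # input name -> expected type
    permissions: tuple = ()               # permissions the resource needs


def validation_errors(record: CatalogRecord) -> list[str]:
    """Return the reasons a record is not fit to register (empty list means OK)."""
    errors = []
    if len(record.description.split()) < 5:
        errors.append("description too short to be a useful discovery surface")
    if not record.owner:
        errors.append("owner is required")
    if record.stage not in STAGES:
        errors.append(f"unknown stage: {record.stage!r}")
    return errors
```

The point of the check is that a record with a one-word description is technically registered but practically undiscoverable, so it is worth rejecting at the door.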

2. Semantic Discovery

Keyword search isn’t enough. When a developer needs “something that can summarize legal documents in compliance with GDPR,” they need semantic search that understands intent, not just string matching against resource names.

This is the fundamental difference from Eureka or Consul. Those registries answered “where is service X?” The agent catalog answers “what can do Y?” The query language shifts from structural to natural.
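The shift from "where is X?" to "what can do Y?" can be sketched as ranking descriptions against a free-text query. A real catalog would use an embedding model for this; to keep the example self-contained I use bag-of-words cosine similarity as a stand-in, which only captures word overlap and not meaning. The function names and the sample catalog are hypothetical.

```python
import math
import re
from collections import Counter


def tokens(text: str) -> Counter:
    """Lowercased word counts; a crude stand-in for an embedding vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank(query: str, catalog: dict[str, str], top_k: int = 3) -> list[tuple[str, float]]:
    """Rank catalog entries (name -> description) by similarity to the query."""
    q = tokens(query)
    scored = sorted(((cosine(q, tokens(desc)), name) for name, desc in catalog.items()), reverse=True)
    return [(name, score) for score, name in scored[:top_k] if score > 0]


catalog = {
    "legal-summarizer": "Summarize legal documents and contracts with GDPR compliance checks",
    "pricing-lookup": "Look up AWS service pricing by region and instance type",
    "weather": "Real-time weather data for any location",
}
matches = rank("summarize legal documents in compliance with GDPR", catalog)
```

Note that the query never mentions a resource name. Swapping `tokens` and `cosine` for an embedding model and a vector similarity is what turns this from keyword overlap into actual semantic search.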

3. Governance and Approval

In microservices, any team could deploy a service and register it. The blast radius was limited: a bad service affected its own consumers. In the agent world, a poorly governed agent can access data, make decisions, and take actions across organizational boundaries.

The catalog needs an approval workflow. Someone publishes a new skill. A curator reviews it, checking permissions, data access patterns, and compliance requirements. Only approved resources become discoverable. This is the organizational immune system that prevents shadow AI from metastasizing.
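One way to picture the approval gate is as a small state machine where only one state is visible to discovery. The sketch below is vendor-neutral; the status names `PENDING_APPROVAL` and `REJECTED`, the transition table, and the class itself are assumptions made for illustration.

```python
from enum import Enum


class Status(Enum):
    DRAFT = "DRAFT"
    PENDING_APPROVAL = "PENDING_APPROVAL"   # hypothetical intermediate state
    APPROVED = "APPROVED"
    REJECTED = "REJECTED"                   # hypothetical


# Which transitions a record may make; anything else is refused.
ALLOWED = {
    Status.DRAFT: {Status.PENDING_APPROVAL},
    Status.PENDING_APPROVAL: {Status.APPROVED, Status.REJECTED},
    Status.REJECTED: {Status.DRAFT},
    Status.APPROVED: set(),
}


class GovernedCatalog:
    def __init__(self) -> None:
        self._status: dict[str, Status] = {}

    def publish(self, name: str) -> None:
        self._status[name] = Status.DRAFT

    def transition(self, name: str, new: Status) -> None:
        current = self._status[name]
        if new not in ALLOWED[current]:
            raise ValueError(f"{name}: {current.value} -> {new.value} is not allowed")
        self._status[name] = new

    def discoverable(self) -> list[str]:
        # Only approved records are ever returned to searchers.
        return sorted(n for n, s in self._status.items() if s is Status.APPROVED)
```

Because `DRAFT` cannot jump straight to `APPROVED`, a publisher cannot make their own skill discoverable without passing through the review step.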

4. Live Discovery

Static catalogs go stale. The most valuable pattern is URL-based discovery: point the registry at a live MCP server endpoint, and it automatically retrieves the current tool schemas, capability descriptions, and metadata. The catalog stays fresh because it reads from the source of truth, the running system itself.

This is analogous to how Consul combined service discovery with health checking. A registered service that fails health checks gets removed from the discovery pool. A registered MCP server whose endpoint returns different tools than what’s cataloged triggers a re-sync.
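The re-sync trigger boils down to comparing what the catalog recorded with what the endpoint reports now. A minimal sketch, assuming tools are represented as a mapping from tool name to a schema dictionary (that representation and both function names are hypothetical):

```python
def diff_tools(cataloged: dict[str, dict], live: dict[str, dict]) -> dict[str, list[str]]:
    """Compare cataloged tool schemas with those reported by the live endpoint."""
    shared = cataloged.keys() & live.keys()
    return {
        "added": sorted(live.keys() - cataloged.keys()),
        "removed": sorted(cataloged.keys() - live.keys()),
        "changed": sorted(name for name in shared if cataloged[name] != live[name]),
    }


def needs_resync(diff: dict[str, list[str]]) -> bool:
    # Any difference at all means the catalog entry is stale.
    return any(diff.values())
```

In a governed catalog you would probably treat the three buckets differently: a changed schema might go back through review, while a removed tool can be dropped from discovery immediately.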

From Pattern to Implementation

AWS Agent Registry [3], available through Amazon Bedrock AgentCore, implements this pattern. It’s in preview as of April 2026, and it maps directly to the four capabilities above:

Pattern Capability	Agent Registry Implementation
Registration with metadata	Manual registration via console/API, or URL-based auto-discovery from live MCP servers
Semantic discovery	Combined semantic + keyword search; natural language queries against capability descriptions
Governance and approval	Built-in approval workflow; records must be approved before becoming discoverable
Live discovery	URL-based discovery auto-retrieves metadata from running endpoints

The registry supports four resource types: MCP servers, agents, agent skills, and custom resources. It’s accessible three ways: the AgentCore Console UI, APIs (CLI/SDK), or — and this is the interesting one — as an MCP server itself. That means your coding agent can query the registry directly from your IDE to discover available tools without leaving the development flow.

Access control uses both IAM and OAuth (Custom JWT), and CloudTrail provides audit trails of all registry access and administrative actions [3].

The Killer Feature: Skill Updates Without Redeployment

Here’s where the pattern diverges most from traditional service registries. In microservices, updating a service means deploying a new version. In the agent world, updating a skill can mean changing a prompt, adjusting a tool schema, or adding a new capability, all without touching the agent’s deployment.

Danny Teller’s team demonstrated this concretely [4]: they registered 16 skills in under an hour, and the critical insight was that skill updates propagate to all consuming agents without redeployment. Change a skill’s behavior, update the registry record, and every agent that discovers that skill gets the new version at their next session start.

This is hot-swapping for agent capabilities. The closest microservices analogy is feature flags — but instead of toggling behavior within a single service, you’re toggling capabilities across an entire fleet of agents.

The trade-off is latency. Teller’s team measured approximately 1.2 seconds at session start for the catalog fetch [4]. That’s the cost of dynamic discovery versus hardcoded tool lists. For most use cases, it’s invisible. For latency-critical paths, you’d cache the catalog and accept eventual consistency — exactly the same trade-off Eureka made with its client-side caching. That cache also provides resilience: if the registry is temporarily unreachable, agents fall back to their last-known catalog rather than starting with no capabilities.
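The cache-plus-fallback behavior described above can be sketched in a few lines. This is my own illustration, not Teller's implementation or anything shipped with the registry: the class name, the five-minute default TTL, and the injected `fetch` callable are all assumptions.

```python
import time
from typing import Callable, Optional


class CachedCatalog:
    """Serve a cached catalog; refresh after the TTL; fall back to stale data on failure."""

    def __init__(self, fetch: Callable[[], list], ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch          # hypothetical: whatever call retrieves the catalog
        self._ttl = ttl_seconds
        self._clock = clock
        self._value: Optional[list] = None
        self._fetched_at = 0.0

    def get(self) -> list:
        is_fresh = self._value is not None and self._clock() - self._fetched_at < self._ttl
        if is_fresh:
            return self._value
        try:
            self._value = self._fetch()
            self._fetched_at = self._clock()
        except Exception:
            if self._value is None:
                raise  # first fetch failed: there is no last-known catalog to fall back to
            # Otherwise keep serving the stale catalog and retry on the next call.
        return self._value
```

The design choice worth noting is that a failed refresh does not reset the timer, so the next `get()` retries the registry instead of serving stale data for another full TTL.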

The Governance Layer We Were Missing

In From Cloud-Native to AI-Native [5], I described the governance gap: everything practitioners were building worked because it was personal and sandboxed. One user, their own data, mistakes that affect only them. The enterprise version needs identity, access control, audit trails, and cost management.

The service catalog pattern fills a specific part of that gap: the visibility layer. Before you can secure agents, authorize them, or audit them, you need to know they exist. Before you can prevent duplicate work, you need to know what’s already been built. Before you can enforce standards, you need a place where standards are discoverable.

[Figure: Agent development with and without a registry. Without a registry, teams rebuild capabilities independently. With one, they discover and reuse.]

This connects directly to the shadow AI problem I described in the security stack article [2]. IBM’s research found that you can’t secure what you don’t know exists. The registry is the answer to “what exists?” — the prerequisite for every other governance control.

A word of caution: centralization creates coupling. Not every skill should be shared; some duplication preserves team velocity and autonomy. The registry should catalog what’s worth sharing, not mandate sharing everything. The microservices world learned this lesson with shared libraries: too much sharing creates coordination bottlenecks. The same applies here.

The MCP Connection

The registry’s most architecturally interesting property is that it’s itself an MCP server. This creates a recursive pattern: agents use MCP to discover other MCP servers through the registry.

In From Chaos to Control [6], I described the progression from ad-hoc agent code to governed tools. The registry is the infrastructure that makes that progression manageable at scale. Instead of each team maintaining their own list of available tools, there’s a single, governed, searchable catalog that every agent can query.

The MCP-native access pattern also means the registry integrates naturally with the emerging agent development workflow. You don’t need a separate console or CLI to discover available capabilities. Your agent’s existing MCP client can query the registry the same way it queries any other tool server.

What This Means for Architects

If you’re designing multi-agent systems, the service catalog pattern changes your architecture in three ways:

1. Decouple capability from deployment. Skills and tools become independently versionable resources, not hardcoded dependencies. Your agent’s capabilities are defined by what it discovers at runtime, not what was compiled in at build time.

2. Centralize governance, distribute execution. The registry is the governance chokepoint: approval workflows, audit trails, access control. But execution remains distributed. Agents run wherever they need to, discovering capabilities dynamically.

3. Enable composition without coordination. When Team A publishes a new skill, Team B’s agents can discover and use it without any direct coordination between the teams. The registry mediates. This is the same loose coupling that service meshes provided for microservices.

For multi-account AWS organizations, the current pattern is a registry in a shared-services account with cross-account IAM policies. Cross-registry federation (connecting multiple registries and searching as one) is on the roadmap but not yet available.

The Maturity Curve

Not every organization needs a full registry on day one. The maturity progression looks like this:

Stage	What You Have	What You Need
Exploring	A few agents, hardcoded tool lists	Nothing — just build
Scaling	10+ agents, duplicate capabilities, no visibility	A catalog — even a spreadsheet helps
Governing	Production agents handling real data	Approval workflows, audit trails, access control
Optimizing	Large fleet, dynamic capabilities	Semantic discovery, live sync, automated governance

Most organizations I work with are between “Scaling” and “Governing.” They’ve moved past the experimentation phase but haven’t yet built the infrastructure to manage agents at scale. The service catalog pattern is the bridge.

If you have fewer than 10 agents today, you don’t need a managed registry yet. But the organizations that built service catalogs early in their microservices journey avoided years of technical debt from service sprawl. The same will be true here. The question isn’t whether you’ll need this — it’s whether you’ll build it proactively or reactively.

Rolling Your Own: Agent Registry on AWS

Enough theory. Here’s how to implement the service catalog pattern using AWS Agent Registry. I tested this end-to-end in eu-west-1 — the code below is what actually worked, including the gotchas I hit along the way.

Prerequisites

An AWS account with AgentCore access (available in us-east-1, us-west-2, ap-northeast-1, ap-southeast-2, eu-west-1 — note: eu-central-1/Frankfurt is not yet available)
boto3 >= 1.42.87 (earlier versions don’t have the Registry APIs)
IAM permissions for bedrock-agentcore-control:* and bedrock-agentcore:*

pip install --upgrade "boto3>=1.42.87"

Step 1: Create a Registry

The registry is your catalog. You can organize by team, environment, or resource type — whatever maps to your governance model.

import boto3
import time

REGION = 'eu-west-1'
control_client = boto3.client('bedrock-agentcore-control', region_name=REGION)

# Create the registry
response = control_client.create_registry(
    name='my-agent-registry',
    description='Central catalog for agent skills and MCP servers'
)

registry_id = response['registryId']
print(f"Registry ID: {registry_id}")
print(f"Status: {response['status']}")  # CREATING

# Wait for ACTIVE status (~90 seconds)
while True:
    status = control_client.get_registry(registryId=registry_id)['status']
    if status == 'ACTIVE':
        break
    print(f"  Waiting... status={status}")
    time.sleep(10)

print("Registry is ACTIVE")

Gotcha #1: Registry creation takes about 90 seconds. Don’t poll too aggressively — a 10-second interval works well.
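The loop above never gives up, which is fine in a notebook and less fine in a pipeline. A bounded version might look like the sketch below. `wait_for_status` is a hypothetical helper of my own, not part of boto3; it takes any zero-argument callable that returns the current status string.

```python
import time
from typing import Callable


def wait_for_status(get_status: Callable[[], str], target: str,
                    timeout_seconds: float = 300.0, interval_seconds: float = 10.0,
                    sleep: Callable[[float], None] = time.sleep,
                    clock: Callable[[], float] = time.monotonic) -> str:
    """Poll get_status() until it returns target, or raise TimeoutError."""
    deadline = clock() + timeout_seconds
    while True:
        status = get_status()
        if status == target:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"status is still {status!r} after {timeout_seconds}s, expected {target!r}")
        sleep(interval_seconds)
```

With the client from Step 1, usage would be `wait_for_status(lambda: control_client.get_registry(registryId=registry_id)['status'], 'ACTIVE')`. The `sleep` and `clock` parameters are there so the helper can be tested without real waiting.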

Step 2: Register a Resource

Each record represents something discoverable — an MCP server, an agent, a skill, or a custom resource. The CUSTOM type is the easiest to start with because it doesn’t require strict schema validation.

# Register a skill (using CUSTOM type for flexibility)
record_response = control_client.create_registry_record(
    registryId=registry_id,
    name='pricing-lookup-skill',
    description='Looks up AWS service pricing by region, instance type, and usage pattern. Returns hourly and monthly cost estimates.',
    recordType='CUSTOM',
    descriptorType='CUSTOM',
    descriptor={
        'customDescriptor': {
            'description': 'AWS pricing lookup tool for Solutions Architects',
            'metadata': {
                'owner': 'platform-team',
                'stage': 'production',
                'capabilities': 'pricing, cost-estimation, region-comparison',
                'invocation': 'MCP tool via bedrock-agentcore gateway'
            }
        }
    }
)

record_id = record_response['recordId']
print(f"Record ID: {record_id}")
print(f"Status: {record_response['status']}")  # CREATING → DRAFT

Gotcha #2: The descriptorType must be one of: A2A, CUSTOM, MCP, AGENT_SKILLS. If you use MCP, the descriptor must comply exactly with the MCP registry schema version 2025-12-11 — which is strict and not well-documented yet. Start with CUSTOM for quick iteration.

Step 3: Submit for Approval

Records aren’t discoverable until approved. This is the governance gate.

# Wait for record to reach DRAFT status
time.sleep(5)

# Submit for approval
control_client.submit_registry_record_for_approval(
    registryId=registry_id,
    recordId=record_id
)
print("Submitted for approval")

If your registry has auto-approval enabled, the record transitions directly to APPROVED. Otherwise, a curator must explicitly approve it:

# Curator approves the record
control_client.approve_registry_record(
    registryId=registry_id,
    recordId=record_id
)
print("Record approved — now discoverable")

Step 4: Discover via Semantic Search

This is where it gets interesting. The search isn’t just keyword matching — it understands intent.

# Search uses the data plane client (not control plane!)
data_client = boto3.client('bedrock-agentcore', region_name=REGION)

# Semantic search — no exact keyword match needed
results = data_client.search_registry_records(
    registryIds=[registry_id],
    searchQuery='I need help estimating cloud costs for a customer'
)

for record in results.get('records', []):
    print(f"  Found: {record['name']}")
    print(f"  Description: {record['description']}")
    print(f"  Score: {record.get('score', 'N/A')}")

Gotcha #3: Search is on the data plane client (bedrock-agentcore), not the control plane (bedrock-agentcore-control). The parameter is searchQuery (not searchTerm) and registryIds takes a list.

In my testing, querying “I need help finding AWS pricing information” matched a record described as “Research and documentation tools for SA workflows. Includes AWS doc search and pricing lookup.” The only words the query and the description share are “AWS” and “pricing”; the rest of the match comes from semantic similarity rather than exact phrasing. This is what makes the catalog usable for enterprise adoption, though I should note this was tested with a small number of records. How well semantic ranking holds up with thousands of records across an enterprise is an open question in preview.
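One way to turn that open question into something measurable is a small labeled query set and a recall-at-k check. The helper below is a hypothetical sketch: it knows nothing about the registry and only expects a `search` callable that maps a query string to a ranked list of record names.

```python
from typing import Callable


def recall_at_k(search: Callable[[str], list], labeled: dict[str, str], k: int = 3) -> float:
    """Fraction of queries whose expected record name appears in the top-k results.

    labeled maps a natural-language query to the record name it should find.
    """
    if not labeled:
        raise ValueError("need at least one labeled query")
    hits = sum(1 for query, expected in labeled.items() if expected in search(query)[:k])
    return hits / len(labeled)
```

To point it at the registry, the adapter would reuse the call from Step 4, for example `lambda q: [r['name'] for r in data_client.search_registry_records(registryIds=[registry_id], searchQuery=q).get('records', [])]`. Re-running the same labeled set as the catalog grows shows whether ranking quality degrades with scale.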

Step 5: Access as an MCP Server

The registry itself is an MCP server. This means your agents can discover other agents’ capabilities without leaving the MCP protocol:

{
  "mcpServers": {
    "agent-registry": {
      "command": "aws",
      "args": [
        "bedrock-agentcore", "start-mcp-server",
        "--registry-id", "YOUR_REGISTRY_ID",
        "--region", "eu-west-1"
      ]
    }
  }
}

Add this to your agent’s MCP configuration (Kiro, Claude Code, or any MCP-compatible client), and the agent gains a search_registry_records tool it can call at runtime to discover available capabilities dynamically.

Registering an MCP Server (Advanced)

For teams ready to register actual MCP servers with full schema validation:

# URL-based discovery — registry pulls metadata from a live endpoint
record_response = control_client.create_registry_record(
    registryId=registry_id,
    name='weather-mcp-server',
    description='Real-time weather data for any location worldwide',
    recordType='MCP',
    descriptorType='MCP',
    discoveryUrl='https://your-mcp-server.example.com/.well-known/mcp.json'
)

URL-based discovery automatically retrieves tool schemas and capability descriptions from the live endpoint. The registry stays fresh because it reads from the source of truth — the running server itself.

The Full Lifecycle

The full lifecycle: publish → approve → discover. EventBridge notifications keep curators in the loop.

Authorization Models

Choose based on your organization’s identity setup:

Model	Best For	Setup Complexity
IAM	AWS-native teams, CLI/SDK access	Low — standard IAM policies
JWT (Cognito quick-create)	Quick start, AWS-managed IdP	Low — auto-configured
JWT (Bring your own IdP)	Enterprise with Okta/Entra/Auth0	Medium — requires Discovery URL + audience config

Gotcha #4: Auth type cannot be changed after registry creation. A registry supports only ONE auth type. Choose carefully upfront.

Official Resources

The AgentCore samples repository on GitHub [7] has 15+ tutorials covering:

Admin approval flows
Semantic and filter-based search
OAuth-based flows
Pull and push-based synchronization
Calling registry search at runtime from Strands agents
Integration with Kiro using Dynamic Client Registration
Publishing agents/MCP tools using Kiro

The Zero to Registry in 10 Minutes notebook [7] is the fastest path to a working setup.

For the full API reference, see the Agent Registry documentation [8].

Scope and Freshness

This post reflects the state of AWS Agent Registry as of its preview launch on April 9, 2026 [3]. The service is in preview — features, APIs, and regional availability will evolve. Versioning and cross-registry federation are on the roadmap but not yet available. Check the official documentation [8] for the latest.

The architectural pattern itself — centralized catalog, semantic discovery, governance workflows — is stable and vendor-neutral. Whether you implement it with Agent Registry, build your own, or use a different platform, the pattern applies.

What’s your current approach to agent discovery? Are your teams still maintaining hardcoded tool lists, or have you started building toward a catalog? I’d love to hear what’s working at scale.

Sources

[1] Netflix — “Netflix Eureka” (open-sourced 2012): https://github.com/Netflix/eureka

[2] Christoph, S. — “The Agent Security Stack Nobody Is Building” (May 2026): https://schristoph.online/blog/agent-security-stack/

[3] AWS — “AWS Agent Registry for centralized agent discovery and governance is now available in Preview” (April 9, 2026): https://aws.amazon.com/about-aws/whats-new/2026/04/aws-agent-registry-in-agentcore-preview/

[4] Teller, D. — “AgentCore Registry: 16 Skills, 1 Hour, Zero Downtime” (customer blog, 2026): https://dev.to/aws-builders/agentcore-registry-16-skills-1-hour-zero-downtime-4ne7

[5] Christoph, S. — “From Cloud-Native to AI-Native: What Actually Changes” (April 2026): https://schristoph.online/blog/from-cloud-native-to-ai-native/

[6] Christoph, S. — “From Chaos to Control: Building Predictable AI Agents That Get Smarter Over Time” (January 2026): https://schristoph.online/blog/from-chaos-to-control-building-predictable-ai-agents-that-ge/

[7] AWS — “AgentCore Samples: Agent Registry Tutorials” (GitHub): https://github.com/awslabs/agentcore-samples/tree/main/01-tutorials/10-Agent-Registry

[8] AWS — “Agent Registry Documentation”: https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/registry.html

About the Author

Stefan Christoph is a Principal Solutions Architect at AWS, focused on agentic AI, media & entertainment, and helping builders move from demo to production. He writes about AI architecture, developer productivity, and the future of software.

This is a personal blog. Opinions expressed here are my own and do not represent the views or positions of my employer.

Cross-posted to LinkedIn

❤️ Created with the support of AI (Kiro)