Amazon Bedrock

Configure Amazon Bedrock as an LLM provider in agentgateway.

Agentgateway accepts OpenAI-formatted requests (such as the /v1/chat/completions request body shape) and returns OpenAI-formatted responses, regardless of the route path that you configure. Agentgateway translates between OpenAI and Bedrock formats internally. Bedrock-native request and response shapes, such as those of the Converse API, are not supported. Usage fields in responses follow the OpenAI shape (prompt_tokens, completion_tokens, total_tokens), not the Bedrock shape (inputTokens, outputTokens, totalTokens).

Before you begin

Set up an agentgateway proxy.
Make sure that your AWS credentials have access to the Bedrock models that you want to use. Alternatively, you can use an Amazon Bedrock API key.
Optional: Configure AWS IAM Identity Center so that single sign-on (SSO) credentials can authenticate to Amazon Bedrock. Make sure that you have access to Amazon Bedrock and that your AWS profile is set up to use SSO, such as through the aws CLI. Also make sure that the workload can use that profile (for example, with AWS_PROFILE). Later, when you create the AgentgatewayBackend, omit policies.auth so that the proxy uses implicit AWS SSO credentials.

Set up access to Amazon Bedrock

AWS credentials:

Store your credentials to access the AWS Bedrock API in environment variables.

export AWS_ACCESS_KEY_ID="<aws-access-key-id>"
export AWS_SECRET_ACCESS_KEY="<aws-secret-access-key>"
export AWS_SESSION_TOKEN="<aws-session-token>"

Create a secret with your AWS credentials. The session token is optional; include it only if you use temporary credentials.

kubectl create secret generic bedrock-secret \
  -n agentgateway-system \
  --from-literal=accessKey="$AWS_ACCESS_KEY_ID" \
  --from-literal=secretKey="$AWS_SECRET_ACCESS_KEY" \
  --from-literal=sessionToken="$AWS_SESSION_TOKEN" \
  --type=Opaque \
  --dry-run=client -o yaml | kubectl apply -f -

Bedrock API key:

Save the API key in an environment variable.

export BEDROCK_API_KEY="<insert your API key>"

Create a Kubernetes secret to store your Amazon Bedrock API key.

kubectl apply -f- <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: bedrock-secret
  namespace: agentgateway-system
type: Opaque
stringData:
  Authorization: $BEDROCK_API_KEY
EOF

Create an AgentgatewayBackend resource to configure your LLM provider. Make sure to reference the secret that holds your credentials to access the LLM.

kubectl apply -f- <<EOF
apiVersion: agentgateway.dev/v1alpha1
kind: AgentgatewayBackend
metadata:
  name: bedrock
  namespace: agentgateway-system
spec:
  ai:
    provider:
      bedrock:
        model: "amazon.nova-micro-v1:0"
        region: "us-east-1"
  policies:
    auth:
      aws:
        secretRef:
          name: bedrock-secret
EOF

Review the following table to understand this configuration. For more information, see the API reference.

Setting	Description
`ai.provider.bedrock`	Define the LLM provider that you want to use. The example uses Amazon Bedrock.
`bedrock.model`	The model to use to generate responses. In this example, you use the `amazon.nova-micro-v1:0` model. Some models support cross-region inference. The IDs of these models begin with a geographic prefix, such as `us.` in `us.anthropic.claude-sonnet-4-20250514-v1:0`. For more models, see the AWS Bedrock docs.
`bedrock.region`	The AWS region where your Bedrock model is deployed. Multiple regions are not supported.
`policies.auth`	Provide the credentials to use to access the Amazon Bedrock API. The example refers to the secret that you previously created. To use implicit credentials from the workload or environment instead (for example, IRSA or AWS IAM Identity Center (SSO) profiles), omit the `auth` settings.
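
For the implicit-credentials case described in the `policies.auth` row, the backend might look like the following. This is a sketch derived from the example above by dropping the policies.auth block and nothing else. It assumes that the proxy workload already has AWS credentials available from its environment, which is not shown here. Apply it with kubectl apply in the same way as the other manifests on this page.

```yaml
# Same backend as in the example above, but without policies.auth, so the proxy
# relies on implicit credentials from its environment (for example IRSA or an
# AWS IAM Identity Center profile). Assumption: those credentials already exist.
apiVersion: agentgateway.dev/v1alpha1
kind: AgentgatewayBackend
metadata:
  name: bedrock
  namespace: agentgateway-system
spec:
  ai:
    provider:
      bedrock:
        model: "amazon.nova-micro-v1:0"
        region: "us-east-1"
```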

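Create an HTTPRoute resource to route requests through your agentgateway proxy to the Bedrock AgentgatewayBackend. Apply only one of the following routes, because both use the name bedrock.

/v1/chat/completions path: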

kubectl apply -f- <<EOF
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: bedrock
  namespace: agentgateway-system
spec:
  parentRefs:
    - name: agentgateway-proxy
      namespace: agentgateway-system
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /v1/chat/completions
    backendRefs:
    - name: bedrock
      namespace: agentgateway-system
      group: agentgateway.dev
      kind: AgentgatewayBackend
EOF

/bedrock path:

kubectl apply -f- <<EOF
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: bedrock
  namespace: agentgateway-system
spec:
  parentRefs:
    - name: agentgateway-proxy
      namespace: agentgateway-system
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /bedrock
    backendRefs:
    - name: bedrock
      namespace: agentgateway-system
      group: agentgateway.dev
      kind: AgentgatewayBackend
EOF

Send a request to the LLM provider API along the route that you previously created, such as /bedrock or /v1/chat/completions depending on your route configuration. The request body must be in OpenAI chat-completions format. Verify that the request succeeds and that you get back a response from the chat completion API.

Cloud Provider LoadBalancer, /v1/chat/completions path:

curl "$INGRESS_GW_ADDRESS/v1/chat/completions" -H content-type:application/json -d '{
    "model": "",
    "messages": [
      {
        "role": "user",
        "content": "You are a cloud native solutions architect, skilled in explaining complex technical concepts such as API Gateway, microservices, LLM operations, kubernetes, and advanced networking patterns. Write me a 20-word pitch on why I should use an AI gateway in my Kubernetes cluster."
      }
    ]
  }' | jq

Localhost, /v1/chat/completions path:

curl "localhost:8080/v1/chat/completions" -H content-type:application/json -d '{
    "model": "",
    "messages": [
      {
        "role": "user",
        "content": "You are a cloud native solutions architect, skilled in explaining complex technical concepts such as API Gateway, microservices, LLM operations, kubernetes, and advanced networking patterns. Write me a 20-word pitch on why I should use an AI gateway in my Kubernetes cluster."
      }
    ]
  }' | jq

Cloud Provider LoadBalancer, /bedrock path:

curl "$INGRESS_GW_ADDRESS/bedrock" -H content-type:application/json -d '{
    "model": "",
    "messages": [
      {
        "role": "user",
        "content": "You are a cloud native solutions architect, skilled in explaining complex technical concepts such as API Gateway, microservices, LLM operations, kubernetes, and advanced networking patterns. Write me a 20-word pitch on why I should use an AI gateway in my Kubernetes cluster."
      }
    ]
  }' | jq

Localhost, /bedrock path:

curl "localhost:8080/bedrock" -H content-type:application/json -d '{
    "model": "",
    "messages": [
      {
        "role": "user",
        "content": "You are a cloud native solutions architect, skilled in explaining complex technical concepts such as API Gateway, microservices, LLM operations, kubernetes, and advanced networking patterns. Write me a 20-word pitch on why I should use an AI gateway in my Kubernetes cluster."
      }
    ]
  }' | jq

Example output. Note that agentgateway returns OpenAI-shaped responses, including OpenAI-style usage fields (prompt_tokens, completion_tokens, total_tokens), even though the upstream provider is Bedrock.

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1730000000,
  "model": "amazon.nova-micro-v1:0",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "An AI gateway in your Kubernetes cluster can enhance performance, scalability, and security while simplifying complex operations. It provides a centralized entry point for AI workloads, automates deployment and management, and ensures high availability."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 60,
    "completion_tokens": 47,
    "total_tokens": 107
  }
}
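
To read only the reply text and the token counters from such a response, a jq filter along these lines can help. This is a minimal sketch that assumes you saved the response body to a local file. The file name response.json and the shortened content string are hypothetical; the usage numbers mirror the example output above.

```shell
# Hypothetical file name. The body is a trimmed stand-in for the example output.
cat > response.json <<'EOF'
{
  "object": "chat.completion",
  "model": "amazon.nova-micro-v1:0",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "Sample reply text." },
      "finish_reason": "stop"
    }
  ],
  "usage": { "prompt_tokens": 60, "completion_tokens": 47, "total_tokens": 107 }
}
EOF

# Print the assistant's reply text without JSON quoting.
jq -r '.choices[0].message.content' response.json

# Print the OpenAI-style usage counters on one line.
jq -r '.usage | "prompt=\(.prompt_tokens) completion=\(.completion_tokens) total=\(.total_tokens)"' response.json
```

Because the usage fields follow the OpenAI shape, the same filter should work unchanged if you later route the request to a different provider behind agentgateway.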

Prompt caching

Prompt caching is a performance and cost-optimization feature that allows the model to “remember” frequently used parts of your prompt, such as long system instructions, reference documents, or tool definitions. This way, the model does not need to reprocess these parts every time you send a new prompt.

For example, assume that you have a 50-page manual and you want to ask your model different questions about it. Instead of re-reading the manual for each question, the model can read it once and save it in its internal cache. Then, the model can answer subsequent questions more quickly and at lower cost.

Prompt caching is configured by using the backend.ai.promptCaching fields in the AgentgatewayPolicy resource.

Prompt caching is supported for Bedrock Claude 3+ and Nova models.

Create an AgentgatewayPolicy resource with your prompt cache settings. The following example enables caching for system prompts and conversation messages, but disables it for tool definitions. The minTokens field sets the minimum token count after which caching is enabled. By default, Bedrock requires at least 1,024 tokens for caching to take effect. This minimum is also referred to as a caching checkpoint. For more information, see the API reference.

kubectl apply -f- <<EOF
apiVersion: agentgateway.dev/v1alpha1
kind: AgentgatewayPolicy
metadata:
  name: bedrock-caching-policy
  namespace: agentgateway-system
spec:
  targetRefs:
    - group: gateway.networking.k8s.io
      kind: HTTPRoute
      name: bedrock
  backend:
    ai:
      promptCaching:
        cacheSystem: true
        cacheMessages: true
        cacheTools: false
        minTokens: 1024
EOF

Port-forward the agentgateway proxy on port 15000.

kubectl port-forward deploy/agentgateway-proxy -n agentgateway-system 15000

Get the caching configuration and verify that you see the cache settings.

curl -s http://localhost:15000/config_dump | jq '.policies[] |
  select(.name.name == "bedrock-caching-policy" and
         .policy.backend.aI.promptCaching != null)'

Example output:

{
   "key": "backend/agentgateway-system/bedrock-caching-policy:ai:agentgateway-system/bedrock",
   "name": {
     "kind": "AgentgatewayPolicy",
     "name": "bedrock-caching-policy",
     "namespace": "agentgateway-system"
   },
   "target": {
     "route": {
       "name": "bedrock",
       "namespace": "agentgateway-system",
       "kind": "HTTPRoute"
     }
   },
   "policy": {
     "backend": {
       "aI": {
         "defaults": {},
         "overrides": {},
         "promptCaching": {
           "cacheSystem": true,
           "cacheMessages": true,
           "cacheTools": false,
           "minTokens": 1024
         }
       }
     }
   }
}

Extended thinking and reasoning

Extended thinking and reasoning lets models reason through complex problems before generating a response. You can opt in to extended thinking and reasoning by adding the OpenAI reasoning_effort field to your request. Agentgateway translates this to Bedrock’s native thinking budget automatically.

Note: Extended thinking and reasoning requires a Claude model that supports it, such as us.anthropic.claude-opus-4-20250514-v1:0.

Use the reasoning_effort field to control how much reasoning the model applies. The value is automatically mapped to a thinking budget.

`reasoning_effort` value	Thinking budget
`minimal` or `low`	1,024 tokens
`medium`	2,048 tokens
`high` or `xhigh`	4,096 tokens

Cloud Provider LoadBalancer:

curl "$INGRESS_GW_ADDRESS/v1/chat/completions" -H content-type:application/json -d '{
  "model": "",
  "max_tokens": 6000,
  "reasoning_effort": "high",
  "messages": [
    {
      "role": "user",
      "content": "Explain the trade-offs between consistency and availability in distributed systems."
    }
  ]
}' | jq

Localhost:

curl "localhost:8080/v1/chat/completions" -H content-type:application/json -d '{
  "model": "",
  "max_tokens": 6000,
  "reasoning_effort": "high",
  "messages": [
    {
      "role": "user",
      "content": "Explain the trade-offs between consistency and availability in distributed systems."
    }
  ]
}' | jq
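
The mapping in the table above can be sketched as a small shell helper, which is handy for checking that max_tokens leaves room for the visible answer. The function name thinking_budget_for is hypothetical and is not part of agentgateway; the proxy performs the actual translation itself. The remaining-token arithmetic assumes that thinking tokens count toward max_tokens, as they do in Anthropic's own API, so treat it as an estimate.

```shell
# Hypothetical helper that mirrors the reasoning_effort table on this page.
# It only restates the documented values; agentgateway does the real mapping.
thinking_budget_for() {
  case "$1" in
    minimal|low) echo 1024 ;;
    medium)      echo 2048 ;;
    high|xhigh)  echo 4096 ;;
    *)           echo "unknown reasoning_effort: $1" >&2; return 1 ;;
  esac
}

# Estimate how many tokens remain for the answer after the thinking budget.
# Assumption: thinking tokens are counted against max_tokens.
max_tokens=6000
budget=$(thinking_budget_for high)
echo "thinking budget: $budget, remaining for the answer: $((max_tokens - budget))"
```

With the values from the example request (max_tokens of 6000 and reasoning_effort of high), this leaves 1,904 tokens for the answer under the stated assumption.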

Structured outputs

Structured outputs constrain the model to respond with JSON that matches a specific schema. Provide the schema definition in the OpenAI response_format field of your request. Agentgateway translates this to Bedrock’s native format automatically.

Cloud Provider LoadBalancer:

curl "$INGRESS_GW_ADDRESS/v1/chat/completions" -H content-type:application/json -d '{
  "model": "",
  "max_tokens": 256,
  "response_format": {
    "type": "json_schema",
    "json_schema": {
      "name": "answer_schema",
      "schema": {
        "type": "object",
        "properties": {
          "answer": { "type": "string" },
          "confidence": { "type": "number" }
        },
        "required": ["answer", "confidence"],
        "additionalProperties": false
      }
    }
  },
  "messages": [
    {
      "role": "user",
      "content": "Is the sky blue? Respond with your answer and a confidence score between 0 and 1."
    }
  ]
}' | jq

Localhost:

curl "localhost:8080/v1/chat/completions" -H content-type:application/json -d '{
  "model": "",
  "max_tokens": 256,
  "response_format": {
    "type": "json_schema",
    "json_schema": {
      "name": "answer_schema",
      "schema": {
        "type": "object",
        "properties": {
          "answer": { "type": "string" },
          "confidence": { "type": "number" }
        },
        "required": ["answer", "confidence"],
        "additionalProperties": false
      }
    }
  },
  "messages": [
    {
      "role": "user",
      "content": "Is the sky blue? Respond with your answer and a confidence score between 0 and 1."
    }
  ]
}' | jq
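
In the OpenAI chat-completions shape, the structured answer typically arrives as a JSON-encoded string in choices[0].message.content, so it needs a second decoding step before you can read its fields. The following sketch assumes that this holds for your response and that you saved the response body to a file. The file name structured.json and the sample values are hypothetical.

```shell
# Hypothetical file name and sample values. Only the fields used below are
# included; a real response also carries id, model, usage, and so on.
cat > structured.json <<'EOF'
{
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "{\"answer\": \"Yes\", \"confidence\": 0.9}"
      },
      "finish_reason": "stop"
    }
  ]
}
EOF

# The content field is a string that contains JSON, so decode it with fromjson
# before selecting a field.
jq -r '.choices[0].message.content | fromjson | .answer' structured.json

# Exit non-zero if a field that the schema marks as required is missing.
jq -e '.choices[0].message.content | fromjson | has("answer") and has("confidence")' structured.json
```

The jq -e form is useful in scripts, because the exit code reflects whether the decoded object contains both required fields.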

Next steps

Multiple endpoints: Set up other API endpoints such as embeddings or models.
Prompt guards: Set up prompt guards for your LLM traffic.
LLM observability: View metrics and logs for LLM traffic.
