ADP
API Design Principle (BETA)

[ADP-410] Third-Party Service Error Abstraction

Overview

When your API depends on third-party services (cloud providers, external APIs, databases), you MUST abstract internal implementation details and avoid exposing them to API consumers. This ADP defines how to translate third-party failures into appropriate API responses that maintain security and architectural opacity.

Guidance

Error Abstraction Requirements

  • API responses MUST NOT reveal third-party service names, vendors, or implementation details.
  • API responses MUST NOT expose third-party error codes or messages.
  • API responses MUST NOT reveal internal service topology or dependencies.
  • Error responses MUST use generic problem types as defined in ADP-403.
  • Error responses MUST use the Problem Details for HTTP APIs format (RFC 9457), per ADP-401.
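Below is a minimal Python sketch of a response builder that satisfies these requirements. The `Problem` class and its `to_response` method are hypothetical names chosen for illustration. They are not APIs defined by ADP-401 or ADP-403. The builder accepts only generic, caller-chosen strings. A third-party exception object and its message therefore have no path into the response body.

```python
import json
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Problem:
    """RFC 9457 problem details built only from generic, caller-supplied values."""
    problem_type: str  # generic type, e.g. "/problems/service-unavailable"
    title: str
    status: int
    detail: str
    instance: Optional[str] = None

    def to_response(self, retry_after: Optional[int] = None) -> Tuple[int, Dict[str, str], str]:
        """Return (status, headers, body) for an application/problem+json response."""
        body = {
            "type": self.problem_type,
            "title": self.title,
            "status": self.status,
            "detail": self.detail,
        }
        if self.instance is not None:
            body["instance"] = self.instance
        headers = {"Content-Type": "application/problem+json"}
        if retry_after is not None:
            headers["Retry-After"] = str(retry_after)  # delay-seconds
        return self.status, headers, json.dumps(body)
```

Consider `Problem("/problems/service-unavailable", "Service Temporarily Unavailable", 503, "The service is temporarily unable to process your request.").to_response(retry_after=60)`. It yields a 503 response with `Retry-After: 60` and a body that names no dependency.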

Status Code Mapping

When translating third-party errors, APIs MUST map them to appropriate HTTP status codes:

503 vs 504: Understanding the Difference

The choice between 503 and 504 depends on your API's architectural role:

504 Gateway Timeout - Use when:

  • Your API acts as a gateway or proxy forwarding requests to the third-party service
  • The third-party service is the primary upstream dependency for fulfilling the request
  • Your API's role is primarily to route, aggregate, or transform responses from the third-party
  • Example: API gateway aggregating data from multiple microservices
  • Example: BFF (Backend for Frontend) forwarding requests to backend APIs

503 Service Unavailable - Use when:

  • The third-party is one of many internal dependencies supporting your service
  • Your API provides its own business logic and the third-party is a supporting component
  • The failure represents a temporary condition affecting your service's availability
  • Example: Payment service where payment gateway is one dependency among many (database, cache, etc.)
  • Example: User service that temporarily cannot send notification emails

Semantic Accuracy

504 explicitly signals "I'm acting as a gateway, and my upstream did not respond in time." 503 signals "I'm temporarily unable to serve you." Choose based on whether forwarding or proxying is your primary role, or whether the third-party is just one supporting dependency.
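The decision comes down to a single question about your own architectural role. The sketch below assumes a hypothetical `Role` enum that each service configures once. It does not decide per request.

```python
from enum import Enum
from typing import Tuple


class Role(Enum):
    """Hypothetical label describing how this API relates to its dependencies."""
    GATEWAY = "gateway"  # primarily routes, aggregates, or transforms upstream responses
    SERVICE = "service"  # owns business logic; the third-party is a supporting dependency


def upstream_timeout(role: Role) -> Tuple[int, str]:
    """Return (status, problem type) for an upstream timeout, based on architectural role."""
    if role is Role.GATEWAY:
        # 504: acting as a gateway, the upstream did not respond in time
        return 504, "/problems/gateway-timeout"
    # 503: this service is temporarily unable to serve the request
    return 503, "/problems/service-unavailable"
```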

Error Mapping Table

  • Third-party service timeout → 503 or 504 (see above), MUST include Retry-After header
  • Third-party service unavailable → 503 Service Unavailable with Retry-After header
  • Third-party rate limit exceeded → 503 Service Unavailable (do not expose that it's a third-party limit)
  • Backend authentication/authorization failed → 500 Internal Server Error (never expose auth details)
  • Invalid backend configuration → 500 Internal Server Error
  • Temporary network error → 503 or 504 (depending on your architectural role)
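The table can be encoded as one translation function so that every call site maps failures the same way. This is a sketch only. The `Failure` categories, the `PublicError` tuple, and the 30-second default retry hint are assumptions, not values this ADP prescribes.

```python
from enum import Enum, auto
from typing import NamedTuple, Optional


class Failure(Enum):
    """Hypothetical internal classification of third-party failures."""
    TIMEOUT = auto()
    UNAVAILABLE = auto()
    RATE_LIMITED = auto()
    BACKEND_AUTH_FAILED = auto()
    BAD_CONFIGURATION = auto()
    NETWORK_ERROR = auto()


class PublicError(NamedTuple):
    """The only information the client is allowed to see."""
    status: int
    problem_type: str
    retry_after: Optional[int]  # seconds; None means omit the Retry-After header


def map_failure(failure: Failure, is_gateway: bool, retry_after: int = 30) -> PublicError:
    """Translate an internal failure into a public status, problem type, and retry hint."""
    if failure in (Failure.TIMEOUT, Failure.NETWORK_ERROR):
        # 503 or 504 depending on architectural role (see "503 vs 504" above).
        if is_gateway:
            return PublicError(504, "/problems/gateway-timeout", retry_after)
        return PublicError(503, "/problems/service-unavailable", retry_after)
    if failure in (Failure.UNAVAILABLE, Failure.RATE_LIMITED):
        # A vendor rate limit is reported as plain unavailability, never as 429,
        # so the client cannot infer that a third-party quota exists.
        return PublicError(503, "/problems/service-unavailable", retry_after)
    # Backend credential and configuration failures are server-side faults;
    # no auth or configuration details are exposed.
    return PublicError(500, "/problems/internal-error", None)
```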

Problem Type Design

APIs SHOULD use generic problem types that do not expose implementation:

Acceptable problem types:

  • /problems/service-unavailable - Generic unavailability (most abstract)
  • /problems/gateway-timeout - Gateway/proxy timeout
  • /problems/storage-unavailable - Storage capability affected (functional category)
  • /problems/payment-service-unavailable - Payment capability affected
  • /problems/notification-service-unavailable - Notification capability affected

Prohibited problem types (expose implementation):

  • /problems/s3-bucket-error - Exposes AWS S3 vendor
  • /problems/stripe-error - Exposes Stripe vendor
  • /problems/firebase-auth-timeout - Exposes Firebase vendor
  • /problems/sendgrid-unavailable - Exposes SendGrid vendor

Functional Categories

You MAY use functional categories (storage, payment, notification) in problem types to indicate what capability is affected, but you MUST NOT expose specific vendor names. This helps clients understand the degraded functionality without revealing your infrastructure choices.

Examples of proper abstraction:

  • storage-unavailable instead of ❌ s3-error
  • payment-service-timeout instead of ❌ stripe-timeout
  • authentication-unavailable instead of ❌ auth0-error
  • notification-failed instead of ❌ sendgrid-503
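One way to enforce this rule has two parts. First, keep the vendor-to-category mapping on the server side. Second, check outgoing problem types against a vendor denylist, for example in a unit test. The registry contents and the vendor list below are illustrative assumptions. A real deployment would list its own integrations.

```python
import re

# Hypothetical server-side registry: vendor integration key -> public problem type.
# Vendor keys never leave the server; only the functional category is published.
DEPENDENCY_PROBLEM_TYPES = {
    "s3": "/problems/storage-unavailable",
    "azure-blob": "/problems/storage-unavailable",
    "stripe": "/problems/payment-service-unavailable",
    "sendgrid": "/problems/notification-service-unavailable",
    "auth0": "/problems/authentication-unavailable",
}

# Illustrative denylist of vendor terms that must never appear in a problem type.
VENDOR_TERMS = {"aws", "s3", "azure", "stripe", "sendgrid", "firebase", "auth0"}


def problem_type_for(dependency: str) -> str:
    """Map an internal dependency key to a functional-category problem type."""
    # Unknown dependencies fall back to the most abstract type.
    return DEPENDENCY_PROBLEM_TYPES.get(dependency, "/problems/service-unavailable")


def leaks_vendor(problem_type: str) -> bool:
    """Return True if any path or hyphen-separated token names a known vendor."""
    tokens = re.split(r"[/_.-]", problem_type.lower())
    return any(token in VENDOR_TERMS for token in tokens)
```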

Response Headers

  • APIs SHOULD include a Retry-After header for temporary failures (503 status).
  • APIs MAY include Cache-Control header when serving stale cached data.
  • APIs MAY include custom headers like X-Cache-Status: STALE to indicate degraded mode.
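Below is a sketch of how these headers might be assembled. It assumes the service knows roughly when the dependency is expected back, for example from a circuit breaker's reset time. The function name is hypothetical. The `max-age=0, must-revalidate` directive for stale responses is one reasonable choice, not something this ADP mandates.

```python
import math
from typing import Dict, Optional


def degraded_headers(
    now: float,
    retry_at: Optional[float] = None,
    serving_stale: bool = False,
) -> Dict[str, str]:
    """Build headers for a degraded response.

    `now` and `retry_at` are timestamps in seconds on the same clock;
    `retry_at` is when the dependency is expected to recover.
    """
    headers: Dict[str, str] = {}
    if retry_at is not None:
        # Retry-After as delay-seconds: a non-negative integer, rounded up
        # so clients do not retry before the expected recovery time.
        headers["Retry-After"] = str(max(0, math.ceil(retry_at - now)))
    if serving_stale:
        # Assumed policy: stop downstream caches from reusing the stale copy.
        headers["Cache-Control"] = "max-age=0, must-revalidate"
        # Custom header signalling degraded mode without naming the dependency.
        headers["X-Cache-Status"] = "STALE"
    return headers
```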

Resilience Patterns

While the following are backend implementation details, APIs SHOULD implement resilience patterns to improve user experience:

  • Circuit Breaker: Temporarily stop calling failing services to prevent cascading failures. When the circuit is open, return cached data or degraded responses.
  • Timeouts: Set appropriate timeouts to prevent long waits. When a timeout is reached, return 503 Service Unavailable or 504 Gateway Timeout, depending on your architectural role.
  • Fallbacks: When possible, return degraded functionality rather than complete failure.

INFO

These patterns are internal implementation details. API consumers SHOULD see only their effect, through standard HTTP responses; they MUST NOT see circuit breaker state or any other internal resilience mechanism.
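To make the circuit breaker and fallback patterns concrete, here is a minimal single-threaded circuit breaker. The class, its thresholds, and the injectable `clock` are assumptions for illustration. A concurrent version would need locking and a limit on trial calls in the half-open state.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised internally when the breaker rejects a call; never shown to clients."""


class CircuitBreaker:
    """Single-threaded sketch.

    closed -> open after `failure_threshold` consecutive failures;
    open -> half-open once `reset_timeout` seconds have passed;
    half-open -> closed on a successful trial call, or back to open on failure.
    """

    def __init__(self, failure_threshold: int = 5, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None  # None while the circuit is closed

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.reset_timeout:
            return "half-open"
        return "open"

    def call(self, fn: Callable[[], T], fallback: Optional[Callable[[], T]] = None) -> T:
        """Run `fn` unless the circuit is open; use `fallback` for degraded results."""
        if self.state == "open":
            # Skip the failing dependency entirely.
            if fallback is not None:
                return fallback()
            raise CircuitOpenError()
        try:
            result = fn()
        except Exception:
            self._record_failure()
            if fallback is not None:
                return fallback()
            raise
        # Any success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result

    def _record_failure(self) -> None:
        if self.state == "half-open":
            self.opened_at = self.clock()  # trial call failed: reopen
            return
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = self.clock()
```

In a request handler, the caller would translate a `CircuitOpenError`, or any exception that escapes without a fallback, into the generic 503 or 504 response described above. The breaker's state never appears in the API.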

Examples

Example: Service Unavailable (503)

http
HTTP/1.1 503 Service Unavailable
Content-Type: application/problem+json
Retry-After: 60

{
  "type": "/problems/service-unavailable",
  "title": "Service Temporarily Unavailable",
  "status": 503,
  "detail": "The service is temporarily unable to process your request. Please retry after the specified time."
}

Benefits:

  • Generic problem type
  • No implementation details exposed
  • Actionable guidance for the client
  • Includes Retry-After header

Example: Gateway Timeout (504)

When your API acts as a gateway/proxy and the upstream service times out:

http
HTTP/1.1 504 Gateway Timeout
Content-Type: application/problem+json
Retry-After: 30

{
  "type": "/problems/gateway-timeout",
  "title": "Gateway Timeout",
  "status": 504,
  "detail": "The server did not receive a timely response from an upstream service. Please try again.",
  "instance": "/api/resources/123"
}

Benefits:

  • Uses 504 to indicate gateway/proxy role
  • Abstracts "upstream service" (no specific service name)
  • Includes Retry-After for client guidance
  • Proper semantic HTTP status code

Example: Functional Category (Storage Unavailable)

When you want to indicate what capability is affected without exposing which vendor:

http
HTTP/1.1 503 Service Unavailable
Content-Type: application/problem+json
Retry-After: 60

{
  "type": "/problems/storage-unavailable",
  "title": "Storage Service Unavailable",
  "status": 503,
  "detail": "Document storage is temporarily unavailable, so your upload was not processed. Please retry after the specified time.",
  "instance": "/api/documents/upload"
}

Benefits:

  • Indicates "storage" capability is affected (helps client understand degraded functionality)
  • Does NOT reveal it's AWS S3, Azure Blob, or any specific vendor
  • Provides actionable information without exposing infrastructure
  • Maintains architectural abstraction

Example: Internal Error

http
HTTP/1.1 500 Internal Server Error
Content-Type: application/problem+json

{
  "type": "/problems/internal-error",
  "title": "Internal Server Error",
  "status": 500,
  "detail": "An unexpected error occurred. Please contact support if the problem persists."
}

Logging and Observability

While API responses MUST NOT expose third-party details, backend services MUST log comprehensive information for operations teams:

  • Full third-party error details and stack traces
  • Request IDs for correlation
  • Third-party service names and endpoints
  • Latency and timeout information
  • Circuit breaker state transitions

WARNING

This logging is strictly for internal operations and MUST NOT be exposed through the API, including in development or debug modes accessible to external users.
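The sketch below shows the split between what gets logged and what gets returned, using Python's standard `logging` module. The `UpstreamError` type, its fields, and the logger name are hypothetical. The structured `extra` fields assume a log formatter or handler that records them.

```python
import logging
import uuid
from typing import Optional

logger = logging.getLogger("app.upstream")  # hypothetical logger name


class UpstreamError(Exception):
    """Hypothetical internal exception carrying vendor details that stay server-side."""

    def __init__(self, vendor: str, endpoint: str, vendor_code: str, latency_ms: float):
        super().__init__(f"{vendor} {endpoint} failed with {vendor_code}")
        self.vendor = vendor
        self.endpoint = endpoint
        self.vendor_code = vendor_code
        self.latency_ms = latency_ms


def handle_upstream_error(err: UpstreamError, request_id: Optional[str] = None) -> dict:
    """Log full details internally; return only a generic problem body."""
    request_id = request_id or str(uuid.uuid4())
    logger.error(
        "upstream call failed",
        exc_info=err,  # full exception details go to the log only
        extra={
            "request_id": request_id,
            "vendor": err.vendor,
            "endpoint": err.endpoint,
            "vendor_code": err.vendor_code,
            "latency_ms": err.latency_ms,
        },
    )
    # Nothing from `err` is copied into the response.
    return {
        "type": "/problems/service-unavailable",
        "title": "Service Temporarily Unavailable",
        "status": 503,
        "detail": "The service is temporarily unable to process your request.",
    }
```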

References