[ADP-410] Third-Party Service Error Abstraction

Overview

When your API depends on third-party services (cloud providers, external APIs, databases), you MUST abstract internal implementation details and avoid exposing them to API consumers. This ADP defines how to translate third-party failures into appropriate API responses that maintain security and architectural opacity.

Guidance

Error Abstraction Requirements

API responses MUST NOT reveal third-party service names, vendors, or implementation details.
API responses MUST NOT expose third-party error codes or messages.
API responses MUST NOT reveal internal service topology or dependencies.
Error responses MUST use generic problem types as defined in ADP-403.
Error responses MUST comply with RFC 9457 HTTP Problem Details format per ADP-401.
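
As a rough illustration of the requirements above, the following sketch builds a problem-details body from fixed, generic text only. The `UpstreamError` class, the `GENERIC_PROBLEMS` table, and the `to_problem` function are hypothetical names introduced for this example; they are not part of any ADP or library.

```python
import json


class UpstreamError(Exception):
    """Hypothetical error raised by an internal third-party client wrapper."""

    def __init__(self, vendor: str, code: str, message: str) -> None:
        super().__init__(f"{vendor} {code}: {message}")
        self.vendor = vendor
        self.code = code
        self.message = message


# Generic problem type, title, and detail per status; none mention a vendor.
GENERIC_PROBLEMS = {
    500: ("/problems/internal-error", "Internal Server Error",
          "An unexpected error occurred. Please contact support if the problem persists."),
    503: ("/problems/service-unavailable", "Service Temporarily Unavailable",
          "The service is temporarily unable to process your request. "
          "Please retry after the specified time."),
    504: ("/problems/gateway-timeout", "Gateway Timeout",
          "The server did not receive a timely response from an upstream service. "
          "Please try again."),
}


def to_problem(exc: UpstreamError, status: int = 503) -> dict:
    """Return a problem-details body built only from generic, fixed text.

    Nothing from `exc` is copied into the body, so vendor names, upstream
    error codes, and upstream messages cannot leak to the API consumer.
    """
    if status not in GENERIC_PROBLEMS:
        status = 500  # unknown statuses fall back to the generic internal error
    problem_type, title, detail = GENERIC_PROBLEMS[status]
    return {"type": problem_type, "title": title, "status": status, "detail": detail}


err = UpstreamError("ExampleVendor", "E1234", "bucket quota exceeded")
print(json.dumps(to_problem(err), indent=2))
```

The exception is accepted as an argument only so that a caller can log it internally; the response body deliberately ignores it.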

Status Code Mapping

When translating third-party errors, APIs MUST map them to appropriate HTTP status codes:

503 vs 504: Understanding the Difference

The choice between 503 and 504 depends on your API's architectural role:

504 Gateway Timeout - Use when:

Your API acts as a gateway or proxy forwarding requests to the third-party service
The third-party service is the primary upstream dependency for fulfilling the request
Your API's role is primarily to route, aggregate, or transform responses from the third-party
Example: API gateway aggregating data from multiple microservices
Example: BFF (Backend for Frontend) forwarding requests to backend APIs

503 Service Unavailable - Use when:

The third-party is one of many internal dependencies supporting your service
Your API provides its own business logic and the third-party is a supporting component
The failure represents a temporary condition affecting your service's availability
Example: Payment service where payment gateway is one dependency among many (database, cache, etc.)
Example: User service that temporarily cannot send notification emails

Semantic Accuracy

504 explicitly signals "I'm acting as a gateway and upstream failed". 503 signals "I'm temporarily unable to serve you". Choose based on whether forwarding or proxying is your primary role, or whether the third-party is just a supporting dependency.

Error Mapping Table

Third-party service timeout → 503 or 504 (see above), MUST include Retry-After header
Third-party service unavailable → 503 Service Unavailable with Retry-After header
Third-party rate limit exceeded → 503 Service Unavailable (do not expose that it's a third-party limit)
Backend authentication/authorization failed → 500 Internal Server Error (never expose auth details)
Invalid backend configuration → 500 Internal Server Error
Temporary network error → 503 or 504 (depending on your architectural role)
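
The mapping table can be expressed as a small lookup, sketched below. The `Failure` enum, the `Mapping` tuple, and the `map_failure` function are hypothetical names for this example. The `is_gateway` flag stands in for the architectural-role decision described in the 503 vs 504 section, and `retry_after` is set only where the table itself calls for the header.

```python
from enum import Enum, auto
from typing import NamedTuple


class Failure(Enum):
    """Hypothetical internal classification of third-party failures."""
    TIMEOUT = auto()
    UNAVAILABLE = auto()
    RATE_LIMITED = auto()
    BACKEND_AUTH_FAILED = auto()
    INVALID_CONFIG = auto()
    NETWORK_ERROR = auto()


class Mapping(NamedTuple):
    status: int
    retry_after: bool  # True where the table calls for a Retry-After header


def map_failure(failure: Failure, is_gateway: bool) -> Mapping:
    """Translate an internal failure class into an HTTP status per the table."""
    # 504 when the API mainly forwards or proxies, 503 otherwise.
    role_status = 504 if is_gateway else 503
    table = {
        Failure.TIMEOUT: Mapping(role_status, True),
        Failure.UNAVAILABLE: Mapping(503, True),
        Failure.RATE_LIMITED: Mapping(503, False),
        Failure.BACKEND_AUTH_FAILED: Mapping(500, False),
        Failure.INVALID_CONFIG: Mapping(500, False),
        Failure.NETWORK_ERROR: Mapping(role_status, False),
    }
    return table[failure]


print(map_failure(Failure.TIMEOUT, is_gateway=True))
print(map_failure(Failure.RATE_LIMITED, is_gateway=False))
```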

Problem Type Design

APIs SHOULD use generic problem types that do not expose implementation:

Acceptable problem types:

/problems/service-unavailable - Generic unavailability (most abstract)
/problems/gateway-timeout - Gateway/proxy timeout
/problems/storage-unavailable - Storage capability affected (functional category)
/problems/payment-service-unavailable - Payment capability affected
/problems/notification-service-unavailable - Notification capability affected

Prohibited problem types (expose implementation):

/problems/s3-bucket-error - Exposes AWS S3 vendor
/problems/stripe-error - Exposes Stripe vendor
/problems/firebase-auth-timeout - Exposes Firebase vendor
/problems/sendgrid-unavailable - Exposes SendGrid vendor

Functional Categories

You MAY use functional categories (storage, payment, notification) in problem types to indicate what capability is affected, but you MUST NOT expose specific vendor names. This helps clients understand the degraded functionality without revealing your infrastructure choices.

Examples of proper abstraction:

✅ storage-unavailable instead of ❌ s3-error
✅ payment-service-timeout instead of ❌ stripe-timeout
✅ authentication-unavailable instead of ❌ auth0-error
✅ notification-failed instead of ❌ sendgrid-503
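
One way to catch vendor names before they reach a response is a simple check on problem type URIs, for example in a unit test or lint step. This is a minimal sketch: the `VENDOR_TERMS` denylist and the `exposes_vendor` function are assumptions made for illustration, and a real denylist would need to reflect the vendors an organization actually uses.

```python
import re

# Hypothetical denylist, seeded with the vendor names used as examples above.
VENDOR_TERMS = {"s3", "aws", "stripe", "firebase", "sendgrid", "auth0", "azure"}


def exposes_vendor(problem_type: str) -> bool:
    """Return True if any path or hyphen-separated token is a known vendor term."""
    tokens = re.split(r"[/\-_.]+", problem_type.lower())
    return any(token in VENDOR_TERMS for token in tokens)


for candidate in ("/problems/storage-unavailable", "/problems/s3-bucket-error"):
    verdict = "prohibited" if exposes_vendor(candidate) else "acceptable"
    print(f"{candidate}: {verdict}")
```

Matching whole tokens rather than substrings keeps functional names such as `storage` or `payment-service` from being flagged by accident.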

Response Headers

APIs SHOULD include a Retry-After header for temporary failures (503 and 504 status).
APIs MAY include Cache-Control header when serving stale cached data.
APIs MAY include custom headers like X-Cache-Status: STALE to indicate degraded mode.
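
The header guidance above might be assembled as in the sketch below. The function names are hypothetical, and the default `Cache-Control` value is an assumption, since the guidance does not prescribe one. `Retry-After` accepts either a delay in seconds or an HTTP-date, so both forms are shown.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def unavailable_headers(retry_after_seconds: int) -> dict[str, str]:
    """Headers for a temporary failure response, using the delay-seconds form."""
    if retry_after_seconds < 0:
        raise ValueError("Retry-After delay must be non-negative")
    return {
        "Content-Type": "application/problem+json",
        "Retry-After": str(retry_after_seconds),
    }


def retry_after_http_date(when: datetime) -> str:
    """Format the HTTP-date form of Retry-After.

    `when` should be timezone-aware; a naive value is treated as local time.
    """
    return format_datetime(when.astimezone(timezone.utc), usegmt=True)


def stale_cache_headers(cache_control: str = "no-store") -> dict[str, str]:
    """Headers for a degraded response served from stale cached data.

    The default Cache-Control value is an assumption, not part of the guidance.
    """
    return {"Cache-Control": cache_control, "X-Cache-Status": "STALE"}


print(unavailable_headers(60))
print(stale_cache_headers())
```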

Resilience Patterns

While the following are backend implementation details, APIs SHOULD implement resilience patterns to improve user experience:

Circuit Breaker: Temporarily stop calling failing services to prevent cascade failures. When the circuit is open, return cached data or degraded responses.
Timeouts: Set appropriate timeouts to prevent long waits. When the timeout is reached, return 503 Service Unavailable or 504 Gateway Timeout, depending on your API's architectural role.
Fallbacks: When possible, return degraded functionality rather than complete failure.

INFO

These patterns are internal implementation details. API consumers SHOULD only see the result through standard HTTP responses—they MUST NOT see circuit breaker state or other internal resilience mechanisms.
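
The circuit breaker and fallback patterns can be combined as in the following minimal sketch. It assumes a simple consecutive-failure threshold and a fixed reset delay; the `CircuitBreaker` class and its parameters are hypothetical, and production implementations are usually more elaborate. The caller only ever receives a result or a fallback value, never the breaker state.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitBreaker:
    """Stops calling a failing dependency until `reset_after` seconds pass."""

    def __init__(self, failure_threshold: int = 3, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def _is_open(self) -> bool:
        if self._opened_at is None:
            return False
        # After the reset delay, allow one trial call through (half-open).
        return self._clock() - self._opened_at < self.reset_after

    def call(self, operation: Callable[[], T], fallback: Callable[[], T]) -> T:
        if self._is_open():
            return fallback()  # skip the dependency entirely while open
        try:
            result = operation()
        except Exception:
            self._failures += 1
            if self._failures >= self.failure_threshold:
                self._opened_at = self._clock()  # open (or re-open) the circuit
            return fallback()
        # Success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result


attempts = []


def flaky() -> str:
    attempts.append(1)
    raise TimeoutError("upstream timed out")


breaker = CircuitBreaker(failure_threshold=2, reset_after=60.0)
for _ in range(5):
    print(breaker.call(flaky, lambda: "cached response"))
print(len(attempts))  # 2: the last three calls never reached the dependency
```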

Examples

Example: Service Unavailable (503)

```http
HTTP/1.1 503 Service Unavailable
Content-Type: application/problem+json
Retry-After: 60

{
  "type": "/problems/service-unavailable",
  "title": "Service Temporarily Unavailable",
  "status": 503,
  "detail": "The service is temporarily unable to process your request. Please retry after the specified time."
}
```

Benefits:

Generic problem type
No implementation details exposed
Actionable guidance for the client
Includes Retry-After header

Example: Gateway Timeout (504)

When your API acts as a gateway/proxy and the upstream service times out:

```http
HTTP/1.1 504 Gateway Timeout
Content-Type: application/problem+json
Retry-After: 30

{
  "type": "/problems/gateway-timeout",
  "title": "Gateway Timeout",
  "status": 504,
  "detail": "The server did not receive a timely response from an upstream service. Please try again.",
  "instance": "/api/resources/123"
}
```

Benefits:

Uses 504 to indicate gateway/proxy role
Abstracts "upstream service" (no specific service name)
Includes Retry-After for client guidance
Proper semantic HTTP status code

Example: Functional Category (Storage Unavailable)

When you want to indicate what capability is affected without exposing which vendor:

```http
HTTP/1.1 503 Service Unavailable
Content-Type: application/problem+json
Retry-After: 60

{
  "type": "/problems/storage-unavailable",
  "title": "Storage Service Unavailable",
  "status": 503,
  "detail": "The storage service is temporarily unavailable. Your request has been saved and will be processed when the service recovers.",
  "instance": "/api/documents/upload"
}
```

Benefits:

Indicates "storage" capability is affected (helps client understand degraded functionality)
Does NOT reveal it's AWS S3, Azure Blob, or any specific vendor
Provides actionable information without exposing infrastructure
Maintains architectural abstraction

Example: Internal Error

```http
HTTP/1.1 500 Internal Server Error
Content-Type: application/problem+json

{
  "type": "/problems/internal-error",
  "title": "Internal Server Error",
  "status": 500,
  "detail": "An unexpected error occurred. Please contact support if the problem persists."
}
```

Logging and Observability

While API responses MUST NOT expose third-party details, backend services MUST log comprehensive information for operations teams:

Full third-party error details and stack traces
Request IDs for correlation
Third-party service names and endpoints
Latency and timeout information
Circuit breaker state transitions

WARNING

This logging is strictly for internal operations and MUST NEVER be exposed through the API, even in development or debug modes accessible to external users.
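
The split between internal logging and the external response could look like the sketch below, which uses Python's standard `logging` module. The `handle_upstream_failure` function, the logger name, and the field names passed through `extra` are assumptions for illustration; the request ID is assumed to be supplied by the surrounding framework.

```python
import logging

logger = logging.getLogger("upstream")


def handle_upstream_failure(request_id: str, service: str, endpoint: str,
                            error: Exception, latency_ms: float) -> dict:
    """Log full third-party details internally and return a generic body."""
    # Internal only: vendor name, endpoint, latency, and the full traceback.
    logger.error(
        "third-party call failed",
        extra={
            "request_id": request_id,
            "upstream_service": service,
            "upstream_endpoint": endpoint,
            "latency_ms": latency_ms,
        },
        exc_info=error,
    )
    # External: nothing from the arguments above appears in the response body.
    return {
        "type": "/problems/service-unavailable",
        "title": "Service Temporarily Unavailable",
        "status": 503,
        "detail": "The service is temporarily unable to process your request. "
                  "Please retry after the specified time.",
    }


body = handle_upstream_failure(
    "req-123", "ExampleVendor", "https://vendor.example/v1/charges",
    TimeoutError("vendor error E42"), 5000.0,
)
print(body["type"])
```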

Related ADPs

ADP-401: HTTP Problem Basics - Problem details format
ADP-403: Problem Type Design - Designing problem types
ADP-139: Retry-After - When to retry
ADP-201: HTTP Status 504 - Gateway timeout
