[ADP-410] Third-Party Service Error Abstraction
Overview
When your API depends on third-party services (cloud providers, external APIs, databases), you MUST abstract internal implementation details and avoid exposing them to API consumers. This ADP defines how to translate third-party failures into appropriate API responses that maintain security and architectural opacity.
Guidance
Error Abstraction Requirements
- API responses MUST NOT reveal third-party service names, vendors, or implementation details.
- API responses MUST NOT expose third-party error codes or messages.
- API responses MUST NOT reveal internal service topology or dependencies.
- Error responses MUST use generic problem types as defined in ADP-403.
- Error responses MUST comply with RFC 9457 HTTP Problem Details format per ADP-401.
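One way to satisfy these requirements is to build every error body from a fixed catalogue of static text, so that nothing from a third-party exception can be interpolated into a response. The sketch below assumes this approach; `PROBLEMS`, `to_problem_body`, and `handle_upstream_failure` are hypothetical names, and the choice of which exceptions count as temporary is an illustrative assumption.

```python
import json

# Hypothetical catalogue of client-facing problems. Only static text lives
# here, so nothing from a third-party error can end up in a response body.
PROBLEMS = {
    "service-unavailable": {
        "type": "/problems/service-unavailable",
        "title": "Service Temporarily Unavailable",
        "status": 503,
        "detail": "The service is temporarily unable to process your request.",
    },
    "internal-error": {
        "type": "/problems/internal-error",
        "title": "Internal Server Error",
        "status": 500,
        "detail": "An unexpected error occurred.",
    },
}


def to_problem_body(problem_key: str) -> str:
    """Serialize a problem document built only from the static catalogue."""
    # Copy so callers cannot mutate the shared catalogue entry.
    return json.dumps(dict(PROBLEMS[problem_key]))


def handle_upstream_failure(exc: Exception) -> tuple[int, dict, str]:
    """Translate any third-party exception into a generic response.

    The exception is deliberately never interpolated into the body; it
    should be logged internally instead.
    """
    # Assumption: timeouts and connection errors are treated as temporary.
    temporary = isinstance(exc, (TimeoutError, ConnectionError))
    key = "service-unavailable" if temporary else "internal-error"
    headers = {"Content-Type": "application/problem+json"}
    return PROBLEMS[key]["status"], headers, to_problem_body(key)
```

Because the exception only selects a catalogue entry, a vendor message such as a bucket name or a card-processor error code has no path into the serialized body.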
Status Code Mapping
When translating third-party errors, APIs MUST map them to appropriate HTTP status codes:
503 vs 504: Understanding the Difference
The choice between 503 and 504 depends on your API's architectural role:
504 Gateway Timeout - Use when:
- Your API acts as a gateway or proxy forwarding requests to the third-party service
- The third-party service is the primary upstream dependency for fulfilling the request
- Your API's role is primarily to route, aggregate, or transform responses from the third-party
- Example: API gateway aggregating data from multiple microservices
- Example: BFF (Backend for Frontend) forwarding requests to backend APIs
503 Service Unavailable - Use when:
- The third-party is one of many internal dependencies supporting your service
- Your API provides its own business logic and the third-party is a supporting component
- The failure represents a temporary condition affecting your service's availability
- Example: Payment service where payment gateway is one dependency among many (database, cache, etc.)
- Example: User service that temporarily cannot send notification emails
Semantic Accuracy
504 explicitly signals "I'm acting as a gateway and the upstream failed", while 503 signals "I'm temporarily unable to serve you". Choose based on whether forwarding or proxying is your API's primary role, or whether the third party is just a supporting dependency.
Error Mapping Table
- Third-party service timeout → 503 or 504 (see above), MUST include Retry-After header
- Third-party service unavailable → 503 Service Unavailable with Retry-After header
- Third-party rate limit exceeded → 503 Service Unavailable (do not expose that it's a third-party limit)
- Backend authentication/authorization failed → 500 Internal Server Error (never expose auth details)
- Invalid backend configuration → 500 Internal Server Error
- Temporary network error → 503 or 504 (depending on your architectural role)
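The mapping above can be sketched as a single lookup function. This is a minimal illustration, not a prescribed implementation: the `Role` and `Failure` enums, the `map_failure` name, and the 60-second default are all assumptions, as is attaching Retry-After to rate-limit and network failures (the list above only requires it for timeouts and unavailability).

```python
from enum import Enum


class Role(Enum):
    GATEWAY = "gateway"  # API mainly forwards or aggregates upstream calls
    SERVICE = "service"  # API owns business logic; third party is supporting


class Failure(Enum):
    TIMEOUT = "timeout"
    UNAVAILABLE = "unavailable"
    RATE_LIMITED = "rate_limited"
    BACKEND_AUTH_FAILED = "backend_auth_failed"
    BAD_CONFIGURATION = "bad_configuration"
    NETWORK_ERROR = "network_error"


def map_failure(failure: Failure, role: Role, retry_after_seconds: int = 60) -> tuple[int, dict]:
    """Return (status, extra headers) for a classified third-party failure."""
    # Auth and configuration problems are our fault, not a temporary outage.
    if failure in (Failure.BACKEND_AUTH_FAILED, Failure.BAD_CONFIGURATION):
        return 500, {}
    # Timeouts and network errors depend on the API's architectural role.
    if failure in (Failure.TIMEOUT, Failure.NETWORK_ERROR):
        status = 504 if role is Role.GATEWAY else 503
    else:
        # Unavailability and rate limits both surface as a generic 503.
        status = 503
    # Assumption: every temporary failure carries Retry-After.
    return status, {"Retry-After": str(retry_after_seconds)}
```

Classifying the raw third-party error into a `Failure` value happens before this function is called, which keeps vendor-specific error codes out of the mapping logic entirely.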
Problem Type Design
APIs SHOULD use generic problem types that do not expose implementation:
Acceptable problem types:
- /problems/service-unavailable - Generic unavailability (most abstract)
- /problems/gateway-timeout - Gateway/proxy timeout
- /problems/storage-unavailable - Storage capability affected (functional category)
- /problems/payment-service-unavailable - Payment capability affected
- /problems/notification-service-unavailable - Notification capability affected
Prohibited problem types (expose implementation):
- /problems/s3-bucket-error - Exposes AWS S3 vendor
- /problems/stripe-error - Exposes Stripe vendor
- /problems/firebase-auth-timeout - Exposes Firebase vendor
- /problems/sendgrid-unavailable - Exposes SendGrid vendor
Functional Categories
You MAY use functional categories (storage, payment, notification) in problem types to indicate what capability is affected, but you MUST NOT expose specific vendor names. This helps clients understand the degraded functionality without revealing your infrastructure choices.
Examples of proper abstraction:
- ✅ storage-unavailable instead of ❌ s3-error
- ✅ payment-service-timeout instead of ❌ stripe-timeout
- ✅ authentication-unavailable instead of ❌ auth0-error
- ✅ notification-failed instead of ❌ sendgrid-503
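A rule like this can be checked automatically, for example in a unit test that runs over every problem type an API declares. The following is a rough sketch under stated assumptions: `VENDOR_TERMS` is an illustrative denylist (a real one would list the vendors your system actually uses), and `exposes_vendor` is a hypothetical helper name.

```python
import re

# Illustrative denylist; a real one would list the vendors your system uses.
VENDOR_TERMS = {"s3", "aws", "stripe", "firebase", "sendgrid", "auth0", "azure"}


def exposes_vendor(problem_type: str) -> bool:
    """Return True if any token of the problem type is a known vendor term."""
    # Split on anything that is not a letter or digit, so "s3-bucket-error"
    # becomes ["s3", "bucket", "error"]. Matching whole tokens avoids false
    # positives on words that merely contain a vendor term.
    tokens = re.split(r"[^a-z0-9]+", problem_type.lower())
    return any(token in VENDOR_TERMS for token in tokens)
```

A denylist can only catch vendors it knows about, so it complements review of new problem types rather than replacing it.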
Response Headers
- APIs SHOULD include Retry-After header for temporary failures (503 status).
- APIs MAY include Cache-Control header when serving stale cached data.
- APIs MAY include custom headers like X-Cache-Status: STALE to indicate degraded mode.
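HTTP allows a Retry-After value to be either a delay in seconds or an HTTP-date. As a small sketch of producing both forms with the Python standard library (`retry_after_value` is a hypothetical helper name, not part of any framework):

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def retry_after_value(when) -> str:
    """Format a Retry-After value from a delay in seconds or an aware datetime."""
    if isinstance(when, datetime):
        # HTTP-date form. format_datetime(usegmt=True) requires a UTC
        # datetime, so convert first.
        return format_datetime(when.astimezone(timezone.utc), usegmt=True)
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(when, int) and not isinstance(when, bool) and when >= 0:
        return str(when)
    raise ValueError("Retry-After needs a non-negative int or a datetime")
```

The delay form is usually the simpler choice because it does not depend on the client and server clocks agreeing.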
Resilience Patterns
While the following are backend implementation details, APIs SHOULD implement resilience patterns to improve user experience:
- Circuit Breaker: Temporarily stop calling failing services to prevent cascading failures. When the circuit is open, return cached data or degraded responses.
- Timeouts: Set appropriate timeouts to prevent long waits. When the timeout is reached, return 503 Service Unavailable or 504 Gateway Timeout, depending on your API's architectural role.
- Fallbacks: When possible, return degraded functionality rather than complete failure.
INFO
These patterns are internal implementation details. API consumers SHOULD only see the result through standard HTTP responses—they MUST NOT see circuit breaker state or other internal resilience mechanisms.
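To make the circuit breaker and fallback patterns concrete, here is a deliberately minimal sketch. It is not a production implementation: the class name, the consecutive-failure threshold, and the fixed reset window are all assumptions chosen for brevity, and a real service would more likely use an established resilience library.

```python
import time


class CircuitBreaker:
    """Minimal circuit breaker: opens after N consecutive failures."""

    def __init__(self, failure_threshold=3, reset_after=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after  # seconds to stay open
        self.clock = clock              # injectable for testing
        self.failures = 0
        self.opened_at = None           # None means the circuit is closed

    def call(self, func, fallback):
        # While open, skip the failing dependency until the reset window passes.
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                return fallback()
            self.opened_at = None  # half-open: allow one trial call
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            return fallback()
        self.failures = 0  # any success fully closes the circuit
        return result
```

The `fallback` callable is where cached data or a degraded response would be produced. The caller receives only that result, so the breaker's state never appears in the API response.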
Examples
Example: Service Unavailable (503)
HTTP/1.1 503 Service Unavailable
Content-Type: application/problem+json
Retry-After: 60

{
  "type": "/problems/service-unavailable",
  "title": "Service Temporarily Unavailable",
  "status": 503,
  "detail": "The service is temporarily unable to process your request. Please retry after the specified time."
}
Benefits:
- Generic problem type
- No implementation details exposed
- Actionable guidance for the client
- Includes Retry-After header
Example: Gateway Timeout (504)
When your API acts as a gateway/proxy and the upstream service times out:
HTTP/1.1 504 Gateway Timeout
Content-Type: application/problem+json
Retry-After: 30

{
  "type": "/problems/gateway-timeout",
  "title": "Gateway Timeout",
  "status": 504,
  "detail": "The server did not receive a timely response from an upstream service. Please try again.",
  "instance": "/api/resources/123"
}
Benefits:
- Uses 504 to indicate gateway/proxy role
- Abstracts "upstream service" (no specific service name)
- Includes Retry-After for client guidance
- Proper semantic HTTP status code
Example: Functional Category (Storage Unavailable)
When you want to indicate what capability is affected without exposing which vendor:
HTTP/1.1 503 Service Unavailable
Content-Type: application/problem+json
Retry-After: 60

{
  "type": "/problems/storage-unavailable",
  "title": "Storage Service Unavailable",
  "status": 503,
  "detail": "The storage service is temporarily unavailable. Your request has been saved and will be processed when the service recovers.",
  "instance": "/api/documents/upload"
}
Benefits:
- Indicates "storage" capability is affected (helps client understand degraded functionality)
- Does NOT reveal it's AWS S3, Azure Blob, or any specific vendor
- Provides actionable information without exposing infrastructure
- Maintains architectural abstraction
Example: Internal Error
HTTP/1.1 500 Internal Server Error
Content-Type: application/problem+json

{
  "type": "/problems/internal-error",
  "title": "Internal Server Error",
  "status": 500,
  "detail": "An unexpected error occurred. Please contact support if the problem persists."
}
Logging and Observability
While API responses MUST NOT expose third-party details, backend services MUST log comprehensive information for operations teams:
- Full third-party error details and stack traces
- Request IDs for correlation
- Third-party service names and endpoints
- Latency and timeout information
- Circuit breaker state transitions
WARNING
This logging is strictly for internal operations and MUST NEVER be exposed through the API, even in development or debug modes accessible to external users.
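The split between what is logged and what is returned can be sketched as follows. This assumes Python's standard `logging` module and a request ID supplied by the caller; `log_and_abstract`, the logger name `upstream`, and the log message layout are hypothetical choices for illustration.

```python
import logging

logger = logging.getLogger("upstream")


def log_and_abstract(exc: Exception, request_id: str, service_name: str,
                     endpoint: str, latency_ms: float) -> dict:
    """Log full third-party details internally; return only a generic problem."""
    # Everything vendor-specific goes to the internal log, including the
    # exception itself (exc_info attaches the stack trace when available).
    logger.error(
        "third-party call failed request_id=%s service=%s endpoint=%s "
        "latency_ms=%.1f error=%r",
        request_id, service_name, endpoint, latency_ms, exc,
        exc_info=exc,
    )
    # The client-facing body is static and carries no vendor details.
    return {
        "type": "/problems/service-unavailable",
        "title": "Service Temporarily Unavailable",
        "status": 503,
        "detail": "The service is temporarily unable to process your request.",
    }
```

Keeping the two outputs in one function makes it harder for a later change to copy a logged field into the response by accident.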
Related ADPs
- ADP-401: HTTP Problem Basics - Problem details format
- ADP-403: Problem Type Design - Designing problem types
- ADP-139: Retry-After - When to retry
- ADP-201: HTTP Status 504 - Gateway timeout