[ADP-43] API Rate Limiting

Guidance

APIs SHOULD apply rate limiting to endpoints exposed to external audiences, that is, where audience >= PUBLIC_EXTERNAL.
INFO: For information on API audiences, refer to ADP-752.
x-rate-limit SHOULD be added to every endpoint that is rate limited.
Rate limiting SHOULD be implemented based on users, API keys, or IP addresses.
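
As a minimal sketch of how a limiter might pick the identity it counts against, the helper below builds one key per client. The function name, the key prefixes, and the order of preference (user, then API key, then IP address) are assumptions made for illustration; the guidance above does not prescribe a priority.

```python
from typing import Optional


def client_key(user_id: Optional[str], api_key: Optional[str], remote_ip: str) -> str:
    """Return the identifier a rate limiter would count requests against.

    Assumed order of preference: authenticated user, then API key,
    then the caller's IP address as a last resort.
    """
    if user_id:
        return f"user:{user_id}"
    if api_key:
        return f"key:{api_key}"
    return f"ip:{remote_ip}"
```

Prefixing the key with its kind keeps a user ID from colliding with an API key or address that happens to have the same value.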

If the rate limit is exceeded, the API SHOULD return HTTP 429 with an error body that follows the HTTP Problem JSON format, for example:

POST /search HTTP/1.1

HTTP/1.1 429 Too Many Requests
Content-Type: application/problem+json
Retry-After: 60

{
  "type": "/problems/rate-limit",
  "title": "Too Many Requests",
  "status": 429,
  "detail": "You have exceeded the rate limit of 1000 requests per minute.",
  "instance": "/search",
  "retryAfter": "PT1M"
}
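
The response above could be assembled along the following lines. This is a sketch only: the function names and parameters are hypothetical, the Content-Type value is the standard media type for problem details, and the body fields simply mirror the example.

```python
def iso8601_duration(seconds: int) -> str:
    """Format whole seconds as an ISO 8601 duration, e.g. 60 -> 'PT1M'."""
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    parts = "".join(
        f"{value}{unit}"
        for value, unit in ((hours, "H"), (minutes, "M"), (secs, "S"))
        if value
    )
    return "PT" + (parts or "0S")


def rate_limit_problem(instance: str, limit: int, per: str, retry_after_seconds: int):
    """Return (status, headers, body) for a request that exceeded its limit.

    `per` is the human-readable window name used in the detail text,
    for example "minute".
    """
    headers = {
        "Content-Type": "application/problem+json",
        "Retry-After": str(retry_after_seconds),
    }
    body = {
        "type": "/problems/rate-limit",
        "title": "Too Many Requests",
        "status": 429,
        "detail": f"You have exceeded the rate limit of {limit} requests per {per}.",
        "instance": instance,
        "retryAfter": iso8601_duration(retry_after_seconds),
    }
    return 429, headers, body
```

Deriving both Retry-After and retryAfter from the same number keeps the header and the body from drifting apart.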

The rate limiting rules SHOULD be documented in the API documentation.

Implementation Recommendations

Consider using token bucket or leaky bucket algorithms to implement rate limiting.
Consider setting different rate limits for different endpoints or HTTP methods.
Tiered rate limiting can be provided for different customer plans.
Regularly monitor rate limiting usage and adjust limits as necessary.
Set up alerts that fire when users approach or exceed their rate limits.
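
One way to combine per-endpoint limits, tiered plans, and an early-warning check is a small lookup table, sketched below. The plan names, the numbers, the 80% alert threshold, and every identifier are hypothetical and only illustrate the shape of such a configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limit:
    requests: int      # allowed requests per window
    per_seconds: int   # window length in seconds


# Hypothetical plan defaults and per-endpoint overrides, for illustration only.
PLAN_DEFAULTS = {"free": Limit(60, 60), "pro": Limit(1000, 60)}
ENDPOINT_OVERRIDES = {
    ("POST", "/search"): {"free": Limit(10, 60), "pro": Limit(200, 60)},
}


def resolve_limit(plan: str, method: str, path: str) -> Limit:
    """Pick the endpoint-specific limit for a plan, else the plan default."""
    overrides = ENDPOINT_OVERRIDES.get((method.upper(), path), {})
    return overrides.get(plan, PLAN_DEFAULTS[plan])


def is_approaching(used: int, limit: Limit, threshold: float = 0.8) -> bool:
    """True once usage reaches the given fraction of the limit (alert hook)."""
    return used >= limit.requests * threshold
```

For example, resolve_limit("free", "POST", "/search") returns the stricter search limit, while any other endpoint falls back to the plan default.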

Rate Limiting Algorithm Explanation

Token Bucket: Tokens are added to a bucket at a fixed rate, up to the bucket's capacity. Each request consumes one token: if a token is available, the request is processed and the token is removed; if not, the request is rejected or delayed. Because unused tokens accumulate, this algorithm allows short bursts of traffic up to the bucket capacity.
Leaky Bucket: Requests are placed in a bucket (queue) and "leak" out for processing at a fixed rate, regardless of how fast they arrive. If the bucket is full, new requests are rejected. This smooths bursts into a steady output rate.
Sliding Window: This algorithm limits the number of requests within a time window that moves continuously with the current time. Only requests from the most recent window are counted, which avoids the burst at window boundaries that a fixed counter allows.
Counter (Fixed Window): This algorithm counts the requests within a fixed time period and resets the count when the period ends. Once the count exceeds the limit, subsequent requests in that period are rejected.
Queries Per Second (QPS): A metric for a system's request-processing capacity rather than an algorithm. Setting a QPS limit caps the number of requests the system handles per second, thereby preventing overload.
Circuit Breaker: A strategy that protects the system by temporarily stopping the processing of certain requests when the error rate exceeds a set threshold. After a period of time the circuit breaker recovers automatically and the system resumes processing requests.
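
The token bucket described above can be sketched in a few lines. This is an illustrative single-process version, not a reference implementation: the class name, parameters, and injectable clock are assumptions, and a production limiter would also need locking and one bucket per client.

```python
import time


class TokenBucket:
    """Single-process token bucket; not thread-safe."""

    def __init__(self, capacity: float, refill_per_second: float, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self._clock = clock              # injectable so the bucket can be tested
        self._tokens = float(capacity)   # start full, which permits an initial burst
        self._last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Refill for the elapsed time, then try to spend `cost` tokens."""
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        self._tokens = min(self.capacity, self._tokens + elapsed * self.refill_per_second)
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False
```

With a capacity of 2 and a refill rate of 1 token per second, two back-to-back requests pass, a third is rejected, and one more is admitted after a second has elapsed.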

Each of these algorithms and strategies has its advantages and disadvantages, and the appropriate rate limiting and protection measures should be chosen based on specific needs.
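
For comparison, a sliding window can be sketched by keeping the timestamps of recent requests. This "sliding log" variant is one of several ways to implement a sliding window; the class and parameter names are assumptions, and memory use grows with the limit because every admitted request is remembered until it leaves the window.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Sliding-log limiter: at most `limit` requests in any `window_seconds` span."""

    def __init__(self, limit: int, window_seconds: float, clock=time.monotonic):
        self.limit = limit
        self.window_seconds = window_seconds
        self._clock = clock      # injectable so the limiter can be tested
        self._hits = deque()     # timestamps of admitted requests, oldest first

    def allow(self) -> bool:
        now = self._clock()
        cutoff = now - self.window_seconds
        # Drop timestamps that have slid out of the window.
        while self._hits and self._hits[0] <= cutoff:
            self._hits.popleft()
        if len(self._hits) < self.limit:
            self._hits.append(now)
            return True
        return False
```

Unlike a fixed-window counter, the count here never resets all at once, so a client cannot send a full quota just before and just after a window boundary.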

Related ADPs

ADP-753: Describes how to add rate-limiting headers to API responses.
ADP-138: Describes the rate-limiting headers.
ADP-401: Error handling.
