Cloud Architectural Design Patterns - Part 1

With the rapid adoption of cloud computing across industries, businesses are increasingly leveraging cloud platforms to host and scale their applications. However, ensuring that these applications are robust, scalable, and reliable in the cloud requires us to follow certain best practices.

Cloud design patterns are solutions to common architectural challenges encountered when deploying or running applications on cloud platforms. These patterns help optimize the performance, reliability, and scalability of cloud-based applications.

Here are some key cloud design patterns:

1. Retry Pattern

The retry pattern ensures that an application doesn’t fail immediately on transient failures (for example, network timeouts or temporary service unavailability) that usually resolve themselves after a short time.

Key points in implementing the Retry Pattern are:

  • Identify transient errors: Determine when an operation has failed and whether the failure is transient. This usually means checking for specific exceptions or error codes. Don’t retry when the failure is caused by a genuine application error, because retrying will most likely not resolve it. For example, when making API calls you would retry on response codes such as 408 Request Timeout, 429 Too Many Requests, 502 Bad Gateway, 503 Service Unavailable, and 504 Gateway Timeout.
  • Retry logic:
    • Check for transient exceptions and retry (using a loop) when you encounter one.
    • Before trying again, wait for a configured delay. The delay can be fixed, incremental, or exponential.
    • Retry until you receive a successful response or reach the maximum retry count (a configuration value).

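The following example shows a simple implementation of the retry logic.

public async Task<string> FetchDataWithRetryAsync(string url, int maxRetries = 3, int delay = 1000)
{
    for (int attempt = 0; attempt < maxRetries; attempt++)
    {
        try
        {
            var response = await _httpClient.GetAsync(url);
            // Throws HttpRequestException for non-2xx responses; this simple version does not filter by status code
            response.EnsureSuccessStatusCode();
            return await response.Content.ReadAsStringAsync();
        }
        catch (HttpRequestException ex) when (attempt < maxRetries - 1)
        {
            Console.WriteLine($"Attempt {attempt + 1} failed: {ex.Message}");
            await Task.Delay(delay); // Wait before the next retry
            delay *= 2;              // Optionally increase the delay for exponential backoff
        }
    }
    // Only reached when maxRetries is less than 1; the last failed attempt rethrows its own exception
    throw new InvalidOperationException("maxRetries must be at least 1.");
}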
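The two decisions described in the key points (is this failure transient, and how long should we wait) can also be pulled out into small helpers. The following is a minimal sketch, not a definitive implementation: the names `RetryHelpers`, `BackoffKind`, `IsTransient`, and `GetDelay` are hypothetical and introduced only for illustration, and the status codes are the ones listed earlier in this section.

```csharp
using System;
using System.Collections.Generic;
using System.Net;

// Hypothetical names, used only for this illustration.
public enum BackoffKind { Fixed, Incremental, Exponential }

public static class RetryHelpers
{
    // Status codes treated as transient, taken from the list in this section.
    private static readonly HashSet<HttpStatusCode> TransientCodes = new HashSet<HttpStatusCode>
    {
        HttpStatusCode.RequestTimeout,      // 408
        (HttpStatusCode)429,                // 429 Too Many Requests
        HttpStatusCode.BadGateway,          // 502
        HttpStatusCode.ServiceUnavailable,  // 503
        HttpStatusCode.GatewayTimeout       // 504
    };

    public static bool IsTransient(HttpStatusCode code) => TransientCodes.Contains(code);

    // attempt is 1-based: the wait after the first failure uses attempt = 1.
    public static TimeSpan GetDelay(BackoffKind kind, int attempt, TimeSpan baseDelay)
    {
        switch (kind)
        {
            case BackoffKind.Fixed:
                return baseDelay;                                            // same wait every time
            case BackoffKind.Incremental:
                return TimeSpan.FromTicks(baseDelay.Ticks * attempt);        // base, 2x base, 3x base, ...
            case BackoffKind.Exponential:
                return TimeSpan.FromTicks(baseDelay.Ticks * (1L << (attempt - 1))); // base, 2x, 4x, 8x, ...
            default:
                throw new ArgumentOutOfRangeException(nameof(kind));
        }
    }
}
```

With a base delay of one second, the exponential option waits 1, 2, and 4 seconds after the first three failures, which matches the doubling used in the loop above.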

There is also a popular library called Polly that abstracts these details and lets you configure the Retry pattern quickly.

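In most interviews, mentioning that you implemented retries using the Polly library will be enough. The code can be as simple as the following. Polly offers several configuration options, for example retrying a fixed number of times:

RetryPolicy retry = Policy
  .Handle<HttpRequestException>() // example exception type; handle whichever exceptions are transient in your case
  .Retry(3);

OR with a delay calculated from the retry attempt:

RetryPolicy retryWithBackoff = Policy
  .Handle<HttpRequestException>() // example exception type
  .WaitAndRetry(3, retryAttempt =>
    TimeSpan.FromSeconds(Math.Pow(2, retryAttempt)));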

2. Circuit Breaker Pattern

Like the Retry Pattern, the Circuit Breaker Pattern is a strategy for handling transient failures in distributed systems. However, the circuit breaker applies a little more intelligence than the Retry pattern: it pauses further attempts for some time if it decides that they are likely to fail in the same way.

As the name suggests, a circuit breaker breaks the circuit (figuratively) and stops calls for some time, unlike the naive Retry Pattern, which keeps trying until it exhausts its retries. Remember, though, that for most cases the Retry pattern is enough, since in most scenarios you would retry only 3-5 times.

Conceptually, a Circuit Breaker has 3 states:

  • Closed: Calls go through to the service as usual and failures are counted; you can continue to retry when you encounter a failure.
  • Open: After a certain number of failures, the circuit breaker transitions to the open state and temporarily blocks all requests. In this state, any request fails immediately with an error, without attempting the underlying method or API call.
  • Half-Open: After a specified timeout period, the circuit breaker enters this state. It allows a limited number of requests to pass through to test whether the underlying service has recovered. If they succeed, the circuit breaker transitions back to closed; if they fail, it returns to open.

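Here is a simple example using Polly’s circuit breaker.

CircuitBreakerPolicy breaker = Policy
  .Handle<HttpRequestException>() // example exception type; handle whichever exceptions apply in your case
  .CircuitBreaker(
    exceptionsAllowedBeforeBreaking: 2,
    durationOfBreak: TimeSpan.FromMinutes(1)
  );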
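To make the three states concrete, here is a minimal hand-rolled sketch of the state machine. It is an illustration under stated assumptions, not how Polly is implemented: the class name `SimpleCircuitBreaker`, its members, and the injectable clock are all hypothetical, the half-open state allows exactly one trial call, and the class is not thread-safe.

```csharp
using System;

public enum CircuitState { Closed, Open, HalfOpen }

// Hypothetical, single-threaded illustration of the Closed / Open / Half-Open transitions.
public class SimpleCircuitBreaker
{
    private readonly int _failureThreshold;
    private readonly TimeSpan _breakDuration;
    private readonly Func<DateTime> _clock; // injectable so the break timeout can be simulated
    private int _failureCount;
    private DateTime _openedAt;

    public CircuitState State { get; private set; } = CircuitState.Closed;

    public SimpleCircuitBreaker(int failureThreshold, TimeSpan breakDuration, Func<DateTime> clock = null)
    {
        _failureThreshold = failureThreshold;
        _breakDuration = breakDuration;
        _clock = clock ?? (() => DateTime.UtcNow);
    }

    public T Execute<T>(Func<T> action)
    {
        if (State == CircuitState.Open)
        {
            // While the break is still running, fail fast without calling the service.
            if (_clock() - _openedAt < _breakDuration)
                throw new InvalidOperationException("Circuit is open; call was not attempted.");
            State = CircuitState.HalfOpen; // break elapsed: allow one trial call
        }

        try
        {
            T result = action();
            // Any success closes the circuit and resets the failure count.
            _failureCount = 0;
            State = CircuitState.Closed;
            return result;
        }
        catch (Exception)
        {
            _failureCount++;
            // A failed trial call, or too many failures while closed, opens the circuit.
            if (State == CircuitState.HalfOpen || _failureCount >= _failureThreshold)
            {
                State = CircuitState.Open;
                _openedAt = _clock();
            }
            throw;
        }
    }
}
```

With a threshold of 2 and a one-minute break, two failing calls open the circuit, later calls are rejected without reaching the service, and the first call made after the minute has passed decides whether the circuit closes again or reopens.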

Applications of Circuit Breaker

  • Use it when you want to avoid making unnecessary calls to a service. Wrapping the call in a circuit breaker blocks the call while the service is down.
  • It is useful when the service you are calling can be down for a prolonged period.
  • It is useful for background jobs (such as WebJobs) that repeatedly call a service that can be down for a long time.

3. Bulkhead Pattern

This is a common architectural pattern that improves resiliency by preventing cascading failures, where the failure of one part of a system brings down the others.

In ships and submarines, a bulkhead is a partition or wall that separates compartments, so that a leak or damage in one compartment stays contained there and does not flood the entire vessel.

The Bulkhead pattern partitions resources, such as threads, connections, or memory, into separate pools or compartments (like bulkheads on a ship), thereby limiting the impact of failures in one part of the system on other parts.

Key concepts in this pattern are:

Isolation: Isolate the parts of the system that can fail, so that a failure in one part doesn’t affect the others.

Resource Pooling: Resources such as threads, database connections, or network connections are grouped into separate pools or bulkheads.

Fault Tolerance: By limiting the number of resources available to each pool, the bulkhead pattern reduces the risk of one component consuming all resources and causing a system-wide failure.

Use Cases:

Microservices Architecture: Used to isolate different microservices from each other to prevent failures in one microservice from affecting others.

Concurrency Management: Helps manage concurrent access to shared resources such as database connections or thread pools.

Example scenario: Let’s assume Service A calls Service B and Service C. If Service B experiences a sudden surge in traffic or failures, the Bulkhead pattern ensures that Service C’s availability and performance are not affected. You can configure separate HttpClient instances for each service and use a bulkhead to limit the impact of failures.

Another scenario could be when you have critical operations that require low-latency connections and less critical operations that can tolerate higher latencies. Using separate HttpClient instances with different configuration settings (like timeouts and connection limits) helps enforce these priorities.
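One way to picture the compartments in code is a small concurrency limiter per downstream service. The sketch below is an assumption-laden illustration rather than a production implementation: the `Bulkhead` class and its members are hypothetical names, it uses `SemaphoreSlim` from the .NET base library, and it rejects calls immediately when the compartment is full instead of queueing them.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical illustration: one Bulkhead instance per downstream service.
public class Bulkhead
{
    private readonly SemaphoreSlim _slots;

    public Bulkhead(int maxConcurrentCalls)
    {
        _slots = new SemaphoreSlim(maxConcurrentCalls, maxConcurrentCalls);
    }

    // Number of calls that can still enter this compartment right now.
    public int AvailableSlots => _slots.CurrentCount;

    public async Task<T> ExecuteAsync<T>(Func<Task<T>> action)
    {
        // Reject immediately when the compartment is full instead of queueing.
        if (!await _slots.WaitAsync(TimeSpan.Zero))
            throw new InvalidOperationException("Bulkhead is full; call rejected.");
        try
        {
            return await action();
        }
        finally
        {
            _slots.Release(); // free the slot whether the call succeeded or failed
        }
    }
}
```

In the Service A example, Service A would keep one `Bulkhead` for calls to Service B and another for calls to Service C. If slow responses from Service B fill its compartment, further calls to B are rejected quickly, while calls to Service C still find free slots in their own compartment.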

4. Ambassador Pattern

The Ambassador pattern is a software design pattern used in distributed systems to delegate specific tasks from clients to a dedicated service (known as the Ambassador) that manages and optimizes these tasks on behalf of the clients. This pattern helps in improving resilience, scalability, and performance by centralizing and abstracting certain responsibilities away from the clients.

This may sound like the Gang of Four’s (GoF) Adapter or Proxy pattern, but don’t confuse them: the Ambassador pattern is an architectural pattern that operates at the service level, whereas the GoF patterns operate at the class/code level.

For example, consider an application where multiple microservices need to access a legacy system’s API over unreliable networks. Instead of each microservice implementing its own retry logic, error handling, and authentication mechanisms, they delegate these tasks to an Ambassador service.
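The responsibilities an ambassador takes over can be sketched in a few lines. Keep in mind that a real ambassador usually runs as a separate service or sidecar process next to the client; the class below is only an in-process stand-in to show the idea. All names (`LegacyApiAmbassador`, the two delegates, the `maxAttempts` parameter) are hypothetical, and the legacy call and token lookup are passed in as delegates so the sketch stays self-contained.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical in-process stand-in for an ambassador service.
public class LegacyApiAmbassador
{
    private readonly Func<string, string, Task<string>> _callLegacyApi; // (path, authToken) => response body
    private readonly Func<string> _getAuthToken;
    private readonly int _maxAttempts;

    public LegacyApiAmbassador(
        Func<string, string, Task<string>> callLegacyApi,
        Func<string> getAuthToken,
        int maxAttempts = 3)
    {
        _callLegacyApi = callLegacyApi;
        _getAuthToken = getAuthToken;
        _maxAttempts = maxAttempts;
    }

    // Clients only pass the path; authentication and retries are handled here.
    public async Task<string> GetAsync(string path)
    {
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return await _callLegacyApi(path, _getAuthToken());
            }
            catch (Exception) when (attempt < _maxAttempts)
            {
                // Incremental backoff before the next attempt; the last failure propagates to the caller.
                await Task.Delay(TimeSpan.FromMilliseconds(100 * attempt));
            }
        }
    }
}
```

Each microservice then calls `GetAsync` with just the path it needs, and changes to the retry or authentication rules happen in one place instead of in every client.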
