SAGA in Practice (Part 3): State Management and Compensation via Orchestrator

1. Escaping the event chaos

Welcome to Part 3 of our series on the SAGA Pattern. In Part 2 we implemented a choreography-based saga. The services communicated loosely coupled via events, which appeared elegant and decentralized.

But let’s be honest: When implementing the error case, we quickly noticed how confusing this approach could become. The business logic was distributed across all services. In order to compensate for an error in the Payment Service, the Order Service suddenly had to listen for a payment.failed event. What happens if 5 or 6 services are involved? The number of event connections explodes and monitoring the entire process becomes a nightmare.

This is exactly where the orchestration alternative comes in.

Instead of a group of dancers reacting to each other (choreography), we introduce a central conductor: the SAGA Orchestrator. This orchestrator is a separate service that knows the entire business process. It alone knows that the order (step 1) is followed by payment (step 2) and then the inventory check (step 3).

The individual microservices (Order, Payment, Inventory) become “dumb”. They don’t know the saga. They become pure command receivers that wait for instructions from the orchestrator, execute their atomic local transaction, and then report back “success” or “failure”.

In this article we are building exactly that: a central Spring Boot Orchestrator that controls the individual services via RabbitMQ commands and manages the state of the saga in its own database.

2. Architecture – A new boss in the ring

The core of the orchestration architecture is the new SagaOrchestrator service. This service has two main tasks:

State Management: It manages the state of each running saga (e.g. OrderID-123 is currently at STEP_PAYMENT).
Command & Control: It sends commands to the services and waits for their response events (replies).

The other services (Order, Payment, Inventory) are adjusted: They no longer listen to business events of other services (like OrderCreated), but only to direct commands that are addressed to them (e.g. CreateOrderCommand).

📡 The communication flow (Command/Reply)

We continue to use RabbitMQ, but the pattern fundamentally changes. Instead of a “Publish/Subscribe” model (events), we use a “Command/Reply” model.

The orchestrator sends a Command (e.g. ProcessPaymentCommand) to a dedicated queue (e.g. payment_command_queue).
The Payment Service listens on this queue.
After processing, the Payment Service sends a Reply event (e.g. PaymentSuccessful or PaymentFailed) back to a queue that only the Orchestrator listens on (e.g. orchestrator_reply_queue).

The “Happy Path” now looks like this:

Client -> SagaOrchestrator (REST API): A customer triggers the order. The orchestrator creates a new saga entry in its DB (status STARTED).
Orchestrator -> (Command): Sends CreateOrderCommand to the order_command_queue.
Order Service: Receives command, creates order (status PENDING), sends OrderCreatedReply.
Orchestrator: Receives OrderCreatedReply, updates saga status (e.g. to AWAITING_PAYMENT), sends ProcessPaymentCommand to payment_command_queue.
Payment Service: Receives Command, processes payment, sends PaymentSuccessfulReply.
Orchestrator: Receives PaymentSuccessfulReply, updates Saga status, sends UpdateInventoryCommand to inventory_command_queue.
Inventory Service: Receives command, reserves the goods, sends InventorySuccessfulReply.
Orchestrator: Receives InventorySuccessfulReply, marks Saga in its DB as COMPLETED.

The crucial difference: No service speaks to any other service. All communicate exclusively with the Orchestrator. The Payment Service knows nothing about an Order Service. It only knows how to process a ProcessPaymentCommand.
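The happy path above is effectively a small state machine: a reply is only meaningful in one particular saga status, and it determines the next status and the next command. The following framework-free sketch writes that flow down as a lookup table. The statuses, reply names, and queue names come from the flow above; `HappyPath`, `Transition`, and the string keys are hypothetical helpers used only for illustration.

```java
import java.util.Map;
import java.util.Optional;

// Happy-path statuses from the article.
enum SagaStatus { STARTED, AWAITING_PAYMENT, AWAITING_INVENTORY, COMPLETED }

// Result of a reply: the next status and the queue that receives the next command.
// commandQueue is null when the saga is finished and nothing more is sent.
record Transition(SagaStatus nextStatus, String commandQueue) {}

final class HappyPath {
    // Key format "<current status>/<reply type>" is an assumption of this sketch.
    private static final Map<String, Transition> TABLE = Map.of(
        key(SagaStatus.STARTED, "OrderCreatedReply"),
            new Transition(SagaStatus.AWAITING_PAYMENT, "payment_command_queue"),
        key(SagaStatus.AWAITING_PAYMENT, "PaymentSuccessfulReply"),
            new Transition(SagaStatus.AWAITING_INVENTORY, "inventory_command_queue"),
        key(SagaStatus.AWAITING_INVENTORY, "InventorySuccessfulReply"),
            new Transition(SagaStatus.COMPLETED, null)
    );

    private static String key(SagaStatus status, String replyType) {
        return status + "/" + replyType;
    }

    // Empty result: the reply does not fit the current status and should be ignored.
    static Optional<Transition> next(SagaStatus current, String replyType) {
        return Optional.ofNullable(TABLE.get(key(current, replyType)));
    }
}
```

Calling `HappyPath.next(SagaStatus.STARTED, "OrderCreatedReply")` yields the transition to AWAITING_PAYMENT, while a reply that arrives in the wrong status yields an empty result. The very first step (the REST call that sends CreateOrderCommand) is not in the table because no reply triggers it.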

3. Implementation – State Management & Commands

The orchestrator must know at all times what state each individual saga is in. A separate database table is essential for this. At the same time, it must control the command/reply communication via RabbitMQ.

🗄️ State management

In the SagaOrchestrator service we define an entity that represents the state of a saga. Here we use JPA and any relational database (e.g. PostgreSQL).

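// In the SagaOrchestrator service
// (jakarta.persistence applies to Spring Boot 3; on Spring Boot 2 the package is javax.persistence)
import jakarta.persistence.Entity;
import jakarta.persistence.EnumType;
import jakarta.persistence.Enumerated;
import jakarta.persistence.Id;
import java.util.UUID;

@Entity
public class SagaInstance {

    @Id
    private UUID sagaId; // Unique ID for this process flow

    private UUID orderId; // ID of the associated business object

    @Enumerated(EnumType.STRING)
    private SagaStatus status; // e.g. STARTED, AWAITING_PAYMENT, AWAITING_INVENTORY, COMPLETED, FAILED

    private String currentStep; // e.g. "payment"

    // ... constructors, getters, setters ...
}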

public enum SagaStatus {
    STARTED,
    AWAITING_PAYMENT,
    AWAITING_INVENTORY,
    COMPENSATING_ORDER,
    COMPENSATING_PAYMENT,
    COMPLETED,
    FAILED
}

// Plus a Spring Data JPA repository
public interface SagaInstanceRepository extends JpaRepository<SagaInstance, UUID> {
}

When the client starts the saga via the orchestrator’s REST API, a new SagaInstance is created, saved in the DB (status STARTED) and the first command is triggered.

📬 The Command/Reply communication

We define clear queues for each service and a central reply queue for the orchestrator.

Commands: The orchestrator sends commands to order_command_queue, payment_command_queue, etc.
Replies: All services (Order, Payment, Inventory) send their replies (success or failure) to the same orchestrator_reply_queue.

Example: The orchestrator starts the saga

The orchestrator receives the initial API call, saves the state and sends the first command.

// In the SagaOrchestrator service
@Autowired
private RabbitTemplate rabbitTemplate;
@Autowired
private SagaInstanceRepository sagaRepository;

public void startSaga(InitialOrderDto orderDto) {
    // 1. Create and persist the saga instance
    SagaInstance saga = new SagaInstance();
    saga.setSagaId(UUID.randomUUID());
    saga.setOrderId(orderDto.getOrderId()); // Assumption: the order ID is generated here or passed in
    saga.setStatus(SagaStatus.STARTED);
    saga.setCurrentStep("order");
    sagaRepository.save(saga);

    // 2. Send the first command
    // Note: save and send are two separate operations; if the send fails, the saga stays in STARTED.
    CreateOrderCommand command = new CreateOrderCommand(saga.getSagaId(), orderDto);

    // We send to the service-specific command queue
    rabbitTemplate.convertAndSend("order_command_queue", command);
}

Example: The Order Service responds

The Order Service is now just a “dumb” worker. It listens to its command queue and sends a response to the orchestrator’s reply queue.

// In the Order Service
@Autowired
private RabbitTemplate rabbitTemplate;

@RabbitListener(queues = "order_command_queue")
public void handleCreateOrder(CreateOrderCommand command) {
    try {
        // ... logic for persisting the order ...
        // The order is stored locally with status PENDING

        // 3. Send the success reply to the orchestrator
        OrderCreatedReply reply = new OrderCreatedReply(command.getSagaId(), /* ... */);
        rabbitTemplate.convertAndSend("orchestrator_reply_queue", reply);

    } catch (Exception e) {
        // 4. Send the failure reply to the orchestrator
        OrderFailedReply reply = new OrderFailedReply(command.getSagaId(), e.getMessage());
        rabbitTemplate.convertAndSend("orchestrator_reply_queue", reply);
    }
}
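One detail the handler above leaves open: with acknowledgements, RabbitMQ can deliver a command again if the consumer dies before acknowledging it, so the Order Service may see the same CreateOrderCommand twice. The sketch below shows one possible guard: remember the reply per sagaId and repeat it for a duplicate instead of creating a second order. `IdempotentOrderHandler`, the in-memory maps, and the string replies are assumptions for illustration; a real service would persist this bookkeeping in the same local transaction as the order.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical, framework-free stand-in for the Order Service's command handler.
final class IdempotentOrderHandler {
    final Map<UUID, String> orders = new HashMap<>();                // orderId -> status
    private final Map<UUID, String> repliesBySaga = new HashMap<>(); // sagaId -> reply already produced

    // Returns the name of the reply that would go to orchestrator_reply_queue.
    String handleCreateOrder(UUID sagaId, UUID orderId) {
        String previous = repliesBySaga.get(sagaId);
        if (previous != null) {
            return previous; // duplicate delivery: repeat the earlier reply, create nothing
        }
        orders.put(orderId, "PENDING"); // the local transaction from the article
        repliesBySaga.put(sagaId, "OrderCreatedReply");
        return "OrderCreatedReply";
    }
}
```

Handling the same sagaId twice returns the same reply both times and leaves exactly one order in `orders`.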

Example: The orchestrator receives replies

The orchestrator has a central listener for all responses. This is where the actual “orchestration” takes place – the next step is decided based on the answer.

// In the SagaOrchestrator service
// Assumes the configured message converter restores the concrete reply type.

@RabbitListener(queues = "orchestrator_reply_queue")
public void handleReplies(Object replyMessage) {

    SagaInstance saga; // Loaded from the DB

    if (replyMessage instanceof OrderCreatedReply reply) {
        saga = sagaRepository.findById(reply.getSagaId()).orElse(null);
        if (saga == null || saga.getStatus() != SagaStatus.STARTED) return; // Idempotency: ignore duplicates

        // 5. Update the state and send the next command
        saga.setStatus(SagaStatus.AWAITING_PAYMENT);
        saga.setCurrentStep("payment");
        sagaRepository.save(saga);

        ProcessPaymentCommand command = new ProcessPaymentCommand(saga.getSagaId(), /* ... */);
        rabbitTemplate.convertAndSend("payment_command_queue", command);

    } else if (replyMessage instanceof PaymentSuccessfulReply reply) {
        saga = sagaRepository.findById(reply.getSagaId()).orElse(null);
        if (saga == null || saga.getStatus() != SagaStatus.AWAITING_PAYMENT) return;

        // 6. Next step: inventory
        saga.setStatus(SagaStatus.AWAITING_INVENTORY);
        saga.setCurrentStep("inventory");
        sagaRepository.save(saga);

        UpdateInventoryCommand command = new UpdateInventoryCommand(saga.getSagaId(), /* ... */);
        rabbitTemplate.convertAndSend("inventory_command_queue", command);

    } else if (replyMessage instanceof InventorySuccessfulReply reply) {
        saga = sagaRepository.findById(reply.getSagaId()).orElse(null);
        // Same status guard as the other branches, so duplicate or stale replies are ignored
        if (saga == null || saga.getStatus() != SagaStatus.AWAITING_INVENTORY) return;

        // 7. SAGA COMPLETED SUCCESSFULLY
        saga.setStatus(SagaStatus.COMPLETED);
        saga.setCurrentStep(null);
        sagaRepository.save(saga);
        System.out.println("Saga " + saga.getSagaId() + " COMPLETED.");

    } else if (replyMessage instanceof PaymentFailedReply reply) {
        // ... error handling (covered in the next section) ...
    }
    // ... further reply types ...
}

This is the basic framework. We have a clear separation: the orchestrator manages the state and controls the flow, the services only perform atomic operations.
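To see the status checks at work without RabbitMQ or a database, here is a minimal, framework-free sketch of the same reply handling. The in-memory map stands in for SagaInstanceRepository and the `sent` list stands in for RabbitTemplate; `MiniOrchestrator`, `advance`, and the record-based reply types are hypothetical names for this illustration only.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

enum SagaStatus { STARTED, AWAITING_PAYMENT, AWAITING_INVENTORY, COMPLETED }

// Every reply carries the sagaId as its correlation key.
record OrderCreatedReply(UUID sagaId) {}
record PaymentSuccessfulReply(UUID sagaId) {}
record InventorySuccessfulReply(UUID sagaId) {}

final class MiniOrchestrator {
    final Map<UUID, SagaStatus> sagas = new HashMap<>(); // stand-in for the saga table
    final List<String> sent = new ArrayList<>();         // stand-in for sent commands, "queue:command"

    UUID startSaga() {
        UUID sagaId = UUID.randomUUID();
        sagas.put(sagaId, SagaStatus.STARTED);
        sent.add("order_command_queue:CreateOrderCommand");
        return sagaId;
    }

    void handleReply(Object reply) {
        if (reply instanceof OrderCreatedReply r) {
            advance(r.sagaId(), SagaStatus.STARTED, SagaStatus.AWAITING_PAYMENT,
                    "payment_command_queue:ProcessPaymentCommand");
        } else if (reply instanceof PaymentSuccessfulReply r) {
            advance(r.sagaId(), SagaStatus.AWAITING_PAYMENT, SagaStatus.AWAITING_INVENTORY,
                    "inventory_command_queue:UpdateInventoryCommand");
        } else if (reply instanceof InventorySuccessfulReply r) {
            advance(r.sagaId(), SagaStatus.AWAITING_INVENTORY, SagaStatus.COMPLETED, null);
        }
    }

    // Applies a transition only if the saga is in the expected status;
    // unknown sagas, duplicates, and out-of-order replies have no effect.
    private void advance(UUID sagaId, SagaStatus expected, SagaStatus next, String command) {
        if (sagas.get(sagaId) != expected) return;
        sagas.put(sagaId, next);
        if (command != null) sent.add(command);
    }
}
```

Feeding the same OrderCreatedReply in twice moves the saga to AWAITING_PAYMENT once and sends ProcessPaymentCommand once; the second delivery falls through the status check. An InventorySuccessfulReply that arrives before the payment reply is ignored for the same reason.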

4. The “Unhappy Path” (centrally controlled)

In our choreography example (part 2), the error case was complicated: the Order Service had to listen for a payment.failed event and “know” that it had to compensate itself.

In the orchestration model, this is considerably simpler. The responsibility lies solely with the orchestrator.

Scenario: Payment Fails

The Payment Service tries to execute the payment for the ProcessPaymentCommand but it fails (e.g. “Insufficient funds”).
The Payment Service does nothing other than “report” to the Orchestrator that it has failed. It sends a PaymentFailedReply to the orchestrator_reply_queue.

The central compensation in the Orchestrator

Now let’s look at the missing else if block in our Orchestrator listener:

// In the SagaOrchestrator service

@RabbitListener(queues = "orchestrator_reply_queue")
public void handleReplies(Object replyMessage) {

    SagaInstance saga;

    // ... (all the "happy path" branches from above) ...

    // HERE IS THE UNHAPPY PATH (continues the if / else-if chain):
    else if (replyMessage instanceof PaymentFailedReply reply) {

        saga = sagaRepository.findById(reply.getSagaId()).orElse(null);
        if (saga == null || saga.getStatus() != SagaStatus.AWAITING_PAYMENT) {
            // Already compensated or a stale message: ignore (idempotency)
            return;
        }

        // 1. FAILURE DETECTED: switch the state to compensation
        System.out.println("Saga " + saga.getSagaId() + " failed at Payment. Initiating compensation...");
        saga.setStatus(SagaStatus.COMPENSATING_ORDER);
        saga.setCurrentStep("compensating_order");
        sagaRepository.save(saga);

        // 2. Send an explicit compensation command
        // We order the Order Service to undo its action.
        CancelOrderCommand command = new CancelOrderCommand(saga.getSagaId(), saga.getOrderId());
        rabbitTemplate.convertAndSend("order_compensation_queue", command); // Dedicated queue for compensation

    } else if (replyMessage instanceof OrderCancelledReply reply) { // Wait for confirmation

        saga = sagaRepository.findById(reply.getSagaId()).orElse(null);
        if (saga == null || saga.getStatus() != SagaStatus.COMPENSATING_ORDER) return;

        // 3. Compensation succeeded: mark the saga as FAILED for good
        saga.setStatus(SagaStatus.FAILED);
        saga.setCurrentStep(null);
        sagaRepository.save(saga);
        System.out.println("Saga " + saga.getSagaId() + " FAILED and fully compensated.");
    }
}

What the Order Service needs to do

The Order Service now has to listen for this new compensation command. It is still “dumb”: it just does what it is told.

// In the Order Service
// orderRepository (a Spring Data repository for Order) and rabbitTemplate are injected fields.

@RabbitListener(queues = "order_compensation_queue")
public void handleCancelOrder(CancelOrderCommand command) {
    try {
        Order order = orderRepository.findById(command.getOrderId()).orElse(null);
        if (order != null && order.getStatus() == OrderStatus.PENDING) {

            // The actual compensating transaction
            order.setStatus(OrderStatus.CANCELLED);
            orderRepository.save(order);
        }

        // Report the successful compensation back
        OrderCancelledReply reply = new OrderCancelledReply(command.getSagaId());
        rabbitTemplate.convertAndSend("orchestrator_reply_queue", reply);

    } catch (Exception e) {
        // HELP! The compensation itself fails.
        // This is the worst case. It calls for robust retry mechanisms
        // or a dead letter queue for manual intervention.
        // For now we simply send a failure reply.
        OrderCompensationFailedReply reply = new OrderCompensationFailedReply(command.getSagaId(), e.getMessage());
        rabbitTemplate.convertAndSend("orchestrator_reply_queue", reply);
    }
}

We see the difference immediately: The entire logic (“If payment fails, THEN cancel order”) lives exclusively in the orchestrator. The Order Service doesn’t know why it should cancel; it does so simply because it receives the CancelOrderCommand.
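The orchestrator listener shown earlier has no branch yet for OrderCompensationFailedReply. One possible way to fill that gap, in the spirit of the retry and manual-intervention ideas mentioned in the code comments, is a bounded retry: re-send CancelOrderCommand a limited number of times, then flag the saga for a human. This is a sketch under assumptions: `CompensationRetryPolicy`, `Decision`, and the attempt limit are hypothetical, and NEEDS_MANUAL_INTERVENTION is not part of the SagaStatus enum defined above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical outcome of handling one OrderCompensationFailedReply.
enum Decision { RESEND_CANCEL_ORDER, NEEDS_MANUAL_INTERVENTION }

final class CompensationRetryPolicy {
    private final int maxAttempts; // total CancelOrderCommand sends allowed per saga
    private final Map<UUID, Integer> failedAttempts = new HashMap<>();

    CompensationRetryPolicy(int maxAttempts) {
        this.maxAttempts = maxAttempts;
    }

    // Call once per OrderCompensationFailedReply received for the given saga.
    Decision onCompensationFailed(UUID sagaId) {
        // merge() stores and returns the updated failure count for this saga
        int attempts = failedAttempts.merge(sagaId, 1, Integer::sum);
        return attempts < maxAttempts
                ? Decision.RESEND_CANCEL_ORDER
                : Decision.NEEDS_MANUAL_INTERVENTION;
    }
}
```

With `new CompensationRetryPolicy(3)`, the first two failures for a saga lead to a re-send and the third one escalates. In the real orchestrator the counter would have to live in the saga table rather than in memory, so that it survives a restart.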

5. Conclusion – orchestration vs. choreography

We have now seen both SAGA implementations: the decentralized choreography (part 2) and the centralized orchestration (part 3).

With the orchestrator, we have bundled the entire process logic and state management into a single service. The participating services (Order, Payment, Inventory) have been demoted to “dumb” but highly specialized workers that only execute commands and report back the status of their atomic transaction.

This approach addresses the choreography’s biggest weaknesses:

Benefits of Orchestration

✅ Centralized business logic: The entire process flow is defined in one class in the orchestrator. New developers can read that class and follow the whole process end to end.
✅ Superior Monitoring: To know the status of any order (e.g. SagaID-123), we only need to look at one table: the SagaInstance table in the orchestrator’s database.
✅ Explicit error handling: Compensation is no longer a reactive side effect, but an explicit else if (error) block in the central code. The rollback process is just as clearly defined as the “happy path”.
✅ Lower service complexity: The microservices themselves remain lean. The “Order Service” does not need to know anything about payments and vice versa.

Disadvantages of orchestration

❌ Single Point of Failure: The orchestrator is a critical service. If it fails, no new sagas can be started and running sagas cannot continue. It must be designed to be highly available.
❌ Central bottleneck: All command and reply traffic goes through this one service, which can become a scaling problem under extremely high loads.
❌ Coupling to the Orchestrator: The services are decoupled from each other, but they are now firmly bound to the Orchestrator and the Commands defined by it.
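On the single point of failure: because the saga state is persisted, a restarted orchestrator does not have to lose running sagas. One possible recovery step is to look for sagas that have been sitting in a non-terminal status for too long and re-send the command they are waiting on. The sketch below only covers the detection part. It assumes an extra `lastUpdated` timestamp that the SagaInstance entity above does not have; `SagaRow` and `StuckSagaFinder` are hypothetical names.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.UUID;

enum SagaStatus {
    STARTED, AWAITING_PAYMENT, AWAITING_INVENTORY,
    COMPENSATING_ORDER, COMPENSATING_PAYMENT, COMPLETED, FAILED;

    boolean isTerminal() {
        return this == COMPLETED || this == FAILED;
    }
}

// lastUpdated is an assumed extra column, not part of the article's entity.
record SagaRow(UUID sagaId, SagaStatus status, Instant lastUpdated) {}

final class StuckSagaFinder {
    // Returns sagas still in flight whose last update is older than the timeout.
    static List<SagaRow> findStuck(List<SagaRow> rows, Instant now, Duration timeout) {
        Instant cutoff = now.minus(timeout);
        return rows.stream()
                .filter(row -> !row.status().isTerminal())
                .filter(row -> row.lastUpdated().isBefore(cutoff))
                .toList();
    }
}
```

Re-sending a command for a stuck saga is only safe if the receiving service tolerates duplicates, which is the same idempotency requirement the orchestrator's status checks address on the reply side.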

The verdict

There is no “best” solution, only the “most suitable” one for your use case.

Use choreography for simple, linear sagas with few participants where maximum decoupling is more important than central monitoring.
Use orchestration for sagas that are complex, have many steps, require conditional logic (if/else), or where auditability and clear monitoring are critical to the business.