Sagas

When building distributed systems, it is crucial to ensure that the system remains consistent even in the presence of failures. One way to achieve this is by using the Saga pattern.

What is a Saga?

A Saga is a design pattern for handling transactions that span multiple services. It breaks the process into a sequence of local operations, each with a corresponding compensating action.

If a failure occurs partway through, these compensations are triggered to undo completed steps, ensuring your system stays consistent even when things go wrong.
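
To make the idea concrete, here is a minimal sketch of the pattern in plain TypeScript, without Restate. The names `SagaStep` and `runSaga` are hypothetical and exist only for this illustration. The sketch keeps its list of completed steps in memory, so it would lose track of them if the process crashed partway through.

```typescript
// A step pairs a local operation with the compensation that undoes it.
// SagaStep and runSaga are hypothetical names used only in this sketch.
interface SagaStep {
  name: string;
  action: () => Promise<void>;
  compensate: () => Promise<void>;
}

async function runSaga(steps: SagaStep[]): Promise<void> {
  const completed: SagaStep[] = [];
  try {
    for (const step of steps) {
      await step.action();
      completed.push(step);
    }
  } catch (err) {
    // Undo the completed steps, most recent first, then surface the failure.
    for (const step of completed.reverse()) {
      await step.compensate();
    }
    throw err;
  }
}
```

If the third of three steps throws, the compensations of the second and then the first step run before the error is rethrown.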

How does Restate help?

Restate makes it easy to implement resilient sagas in your code:

  • Durable Execution: Restate guarantees that your code runs to completion. If a transient failure occurs, Restate automatically retries from the point of failure and ensures that all compensations run.
  • Resilience built-in: No need to manually track state or retry logic. Restate handles all persistence and compensation orchestration for you.
  • Code-first approach: Define sagas using regular code, no DSLs. Track compensations in a list, and execute them on non-transient failures.

Example

Here is a typical travel booking workflow, where you book a flight, then rent a car, and finally book a hotel. If any step fails for a non-transient reason (e.g. driver license not accepted, hotel full), we want to roll back the previous steps to keep the system consistent.

Sagas example diagram

Restate lets us implement this purely in code without any DSLs or extra infrastructure.

  • Wrap your business logic in a try-block, and throw a terminal error for cases where you want to compensate and finish.
  • For each step you do in your try-block, add a compensation to a list.
  • In the catch block, in case of a terminal error, you run the compensations in reverse order, and rethrow the error.

Note that in Go, we use defer to run the compensations at the end.

import * as restate from "@restatedev/restate-sdk";

// BookingRequest and the flight, car rental, and hotel clients are defined elsewhere in the example project.
const bookingWorkflow = restate.service({
  name: "BookingWorkflow",
  handlers: {
    run: async (ctx: restate.Context, req: BookingRequest) => {
      const { customerId, flight, car, hotel } = req;

      // Create a list of undo actions
      const compensations: Array<() => Promise<unknown>> = [];
      try {
        // For each action, we register a compensation that will be executed on failures
        compensations.push(() => ctx.run("Cancel flight", () => flightClient.cancel(customerId)));
        await ctx.run("Book flight", () => flightClient.book(customerId, flight));

        compensations.push(() => ctx.run("Cancel car", () => carRentalClient.cancel(customerId)));
        await ctx.run("Book car", () => carRentalClient.book(customerId, car));

        compensations.push(() => ctx.run("Cancel hotel", () => hotelClient.cancel(customerId)));
        await ctx.run("Book hotel", () => hotelClient.book(customerId, hotel));
      } catch (e) {
        // Terminal errors are not retried by Restate, so undo previous actions and fail the workflow
        if (e instanceof restate.TerminalError) {
          // Restate guarantees that all compensations are executed
          for (const compensation of compensations.reverse()) {
            await compensation();
          }
        }
        throw e;
      }
    },
  },
});

restate.endpoint().bind(bookingWorkflow).listen(9080);
Example not available in your language?

This pattern is implementable with any of our SDKs. We are still working on translating all patterns to all SDK languages. If you need help with a specific language, please reach out to us via Discord or Slack.

When to use Sagas

Restate automatically retries all transient failures, like network hiccups or temporary service outages. But not all failures are temporary.

For these failures, sagas are essential:

  1. Business logic requirements:

    • Some failures are not transient but the result of a business decision (e.g. “Hotel is full” or “Driver license not accepted”), so retrying won't help.
    • In this case, you can throw a terminal error to stop the execution and trigger the compensations.
  2. User/system-initiated cancellations:

    • If a user cancels a long-running invocation (say via UI or CLI), this triggers a terminal error.
    • Restate will not retry.
    • Again, a saga can kick in to undo previous successful operations so the system doesn't end up in an inconsistent state (e.g., booking a hotel but not a car).
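
The distinction between the two kinds of failure can be sketched as follows. This is an illustration under assumptions: `HotelResponse` and `checkHotelResponse` are hypothetical, and the local `TerminalError` class is a stand-in for `restate.TerminalError` so that the snippet runs without the SDK. In a real handler you would throw `restate.TerminalError` instead.

```typescript
// Stand-in for restate.TerminalError so this sketch runs without the SDK.
class TerminalError extends Error {}

// Hypothetical response shape of a hotel API.
type HotelResponse =
  | { status: "confirmed" }
  | { status: "full" }
  | { status: "unavailable" };

function checkHotelResponse(res: HotelResponse): void {
  switch (res.status) {
    case "confirmed":
      return;
    case "full":
      // Business decision: retrying cannot succeed, so stop and compensate.
      throw new TerminalError("Hotel is full");
    case "unavailable":
      // Transient problem: a regular error is left for the runtime to retry.
      throw new Error("Hotel service temporarily unavailable");
  }
}
```

Only the "full" case should reach the catch block as a terminal error and trigger the compensations. The "unavailable" case is the kind of failure that gets retried.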

Running the example

1
Download the example

restate example typescript-patterns-use-cases && cd typescript-patterns-use-cases

2
Start the Restate Server

restate-server

3
Start the Service

npx tsx watch ./src/sagas/booking_workflow.ts

4
Register the services

restate deployments register localhost:9080

5
Send a request

curl localhost:8080/BookingWorkflow/run --json '{
  "flight": {
    "flightId": "12345",
    "passengerName": "John Doe"
  },
  "car": {
    "pickupLocation": "Airport",
    "rentalDate": "2024-12-16"
  },
  "hotel": {
    "arrivalDate": "2024-12-16",
    "departureDate": "2024-12-20"
  }
}'

6
Check the UI or service logs

See in the Restate UI (localhost:9070) how all steps were executed, and how the compensations were triggered because the hotel was full.

Sagas UI

Advanced: Idempotency and compensations

Since sagas in Restate are implemented in user code, compensations are flexible and powerful, as long as they're idempotent: you can reset service state, call other services to undo prior actions, or use ctx.run to delete rows or reverse database operations.

The example above uses the customer ID to guarantee idempotency, so that on retries it will not create duplicate bookings or rentals. The example assumes that the API provider deduplicates the requests based on this ID.
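
To illustrate that assumption, here is a sketch of a provider that deduplicates on the customer ID. `FakeFlightProvider` is a hypothetical in-memory stand-in, not a real API. It also shows why `cancel` should tolerate a booking that was never made: in the example above, the compensation is registered before the booking call.

```typescript
// Hypothetical in-memory provider that deduplicates bookings by customer ID.
class FakeFlightProvider {
  private bookings = new Map<string, string>(); // customerId -> flightId

  // Returns true if a new booking was created, false if it was a repeat.
  book(customerId: string, flightId: string): boolean {
    if (this.bookings.has(customerId)) return false;
    this.bookings.set(customerId, flightId);
    return true;
  }

  // Safe to call repeatedly, and safe even if the booking never happened.
  cancel(customerId: string): void {
    this.bookings.delete(customerId);
  }

  count(): number {
    return this.bookings.size;
  }
}
```

With a provider like this, a retried `book` call leaves a single booking in place, and running the same compensation twice has the same effect as running it once.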

Based on the API you are using, generating the idempotency key and registering the compensation can be done in different ways:

  1. Two-phase APIs: First you reserve, then confirm or cancel. Register the compensation after the reservation, when you have the resource ID. Reservations that are not confirmed are automatically cancelled by the API after a timeout.

     const bookingId = await ctx.run(() =>
       flightClient.reserve(customerId, flight)
     );
     compensations.push(() =>
       ctx.run(() => flightClient.cancel(bookingId))
     );
     // ... do other work, like reserving a car, etc. ...
     await ctx.run(() => flightClient.confirm(bookingId));

  2. One-shot APIs with idempotency key: First, you generate an idempotency key and persist it in Restate. Then, you register the compensation (e.g. refund), and finally you do the action (e.g. charge). The compensation must be registered before the action, because the action may have succeeded without us ever receiving the confirmation.

     const paymentId = ctx.rand.uuidv4();
     compensations.push(() =>
       ctx.run(() => paymentClient.refund(paymentId))
     );
     await ctx.run(() => paymentClient.charge(paymentInfo, paymentId));
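
The ordering for one-shot APIs can be illustrated with a small simulation. Everything here is hypothetical: `FlakyPaymentProvider` applies the charge but loses the response, and `payWithCompensation` treats that error as final for brevity. In Restate, a transient error like this would first be retried with the same key.

```typescript
// Hypothetical provider: the charge is applied, but the response never arrives.
class FlakyPaymentProvider {
  readonly charged = new Set<string>();

  charge(paymentId: string): void {
    this.charged.add(paymentId);
    throw new Error("connection reset before response");
  }

  // No-op if the payment was never charged.
  refund(paymentId: string): void {
    this.charged.delete(paymentId);
  }
}

function payWithCompensation(provider: FlakyPaymentProvider, paymentId: string): boolean {
  const compensations: Array<() => void> = [];
  // Registered before the charge, so it also covers a charge whose result we never saw.
  compensations.push(() => provider.refund(paymentId));
  try {
    provider.charge(paymentId);
    return true;
  } catch {
    for (const compensation of compensations.reverse()) {
      compensation();
    }
    return false;
  }
}
```

Had the refund been registered only after `charge` returned, the lost response would have left the customer charged with nothing on the list to undo it.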