Skip to content

Failure Strategies and Compensation

In production-grade orchestration, "catching an error" is only the first step. True resilience requires a strategy for what happens next. VibeBlocks provides sophisticated failure management, including localized retries and global compensation (undo) logic.

1. Local Resilience: Retry Policies

Every Block can be configured with a RetryPolicy to handle transient errors (like network timeouts or temporary API rate limits) before the wider flow is even aware of a problem.

Configuration Options

  • max_attempts: Total number of execution tries.
  • backoff: Strategy for wait times (FIXED, LINEAR, EXPONENTIAL).
  • jitter: Adds randomness to delays to prevent "thundering herd" issues in distributed systems.
  • retry_on / give_up_on: Fine-grained control based on exception types.

from vibeblocks.policies.retry import RetryPolicy, BackoffStrategy

custom_policy \= RetryPolicy(
max_attempts=5,
delay=1.0,
backoff=BackoffStrategy.EXPONENTIAL,
jitter=True,
retry_on=[ConnectionError] # Only retry on connection issues
)

@block(retry_policy=custom_policy)
def unstable_api_call(ctx):
...

Safety Limits

To prevent accidental resource exhaustion in production, RetryPolicy enforces hard caps during initialization:

Field Limit Behavior on Violation
max_attempts MAX_ATTEMPTS_LIMIT = 100 Capped at 100; negative values set to 0.
max_delay MAX_DELAY_LIMIT = 3600 s Capped at 1 hour; negative values set to 0.0.
delay Negative values clamped to 0.0.

These limits are applied silently via __post_init__, so no exception is raised:

# Both of these are silently capped at the safety limits
policy \= RetryPolicy(max_attempts=9999, max_delay=86400.0)
print(policy.max_attempts) # 100
print(policy.max_delay) # 3600.0

2. Global Resilience: Failure Strategies

When a block fails permanently (exceeds retries), the Flow orchestrator decides the fate of the entire process based on its FailureStrategy.

Strategy Description Best Use Case
ABORT Stops execution immediately. Critical data integrity where no undo is possible.
CONTINUE Logs the error and moves to the next block. Non-critical tasks (e.g., optional tracking).
COMPENSATE Stops and triggers undo logic for all successful blocks. Financial transactions or multi-step service updates.

3. The Compensation (Undo) Pattern

The COMPENSATE strategy implements the Saga Pattern. If a step fails, VibeBlocks automatically executes the undo handlers of all previously completed blocks in reverse order (LIFO).

How it Works

  1. Completion Tracking: VibeBlocks maintains a completed_blocks set in the ExecutionContext.
  2. Reverse Trigger: Upon failure, the orchestrator walks backward through the execution history.
  3. Execution: Only blocks that actually finished successfully have their undo functions called.

graph LR
subgraph Execution
A[Step 1: OK] --> B[Step 2: OK]
B --> C[Step 3: FAIL]
end

subgraph Compensation  
C \-.-\> D\[Undo Step 2\]  
D \-.-\> E\[Undo Step 1\]  
end

style C fill:\#f96,stroke:\#333  
style D fill:\#9cf,stroke:\#333  
style E fill:\#9cf,stroke:\#333

Code Implementation

def rollback_db_record(ctx):
print(f"Deleting record for user: {ctx.data.user_id}")

@block(undo=rollback_db_record)
def create_db_record(ctx):
# Simulate database insertion
ctx.data.user_id \= "uuid_99"

@block()
def trigger_failure(ctx):
raise RuntimeError("Simulated downstream failure")

# Setting the Flow to COMPENSATE ensures create_db_record is undone
flow \= Flow("TransactionalFlow", [create_db_record, trigger_failure], strategy=FailureStrategy.COMPENSATE)

4. Engineering Best Practices

  • Idempotency: Ensure your undo functions are idempotent. If a compensation is retried or interrupted, running it again should not cause inconsistent state.
  • Atomic State: Use the ExecutionContext to store IDs or tokens generated during a block's execution so the undo function knows exactly what to revert.
  • Async Compensation: If your undo logic involves network calls (e.g., a DELETE request), define it as an async def. VibeBlocks will automatically await it if you are using the AsyncRunner.

Engineered with precision by AA Digital Business. High-end AI Architecture.