[Rails] ActiveJob's retry_on now supports dynamic retry strategies based on the error object

Context

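Until now, the wait Proc passed to ActiveJob's retry_on could only receive the execution count (executions) as its argument. As a result, when the error itself carried information about when to retry, as with an API rate limit error that exposes a Retry-After header, that information could not be used.

With this change, the wait Proc can receive the error object as a second argument, which makes it possible to implement flexible retry strategies based on the error's attributes.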

Technical Detail

New interface

The wait Proc still supports the one-argument form and now also accepts a two-argument form:

class RemoteServiceJob < ActiveJob::Base
  # Existing one-argument form (kept for backward compatibility)
  retry_on StandardError, wait: ->(executions) { executions * 2 }

  # New two-argument form
  retry_on CustomError, wait: ->(executions, error) {
    error.retry_after || executions * 2
  }

  def perform
    # ...
  end
end

Implementation details

The core of the change is in the ActiveJob::Exceptions#determine_delay method:

def determine_delay(seconds_or_duration_or_algorithm:, executions:, error: nil, jitter: JITTER_DEFAULT)
  # ...
  when Proc
    algorithm = seconds_or_duration_or_algorithm
    # Check the arity and call the Proc with the matching arguments
    algorithm.arity == 1 ? algorithm.call(executions) : algorithm.call(executions, error)
  # ...
end

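By checking the Proc's arity (its number of arguments) at call time, the method behaves as follows:

arity == 1: only executions is passed (the existing behavior)
any other arity (2, or a negative arity produced by optional or splat parameters): both executions and error are passed (the new behavior)

Because one-argument Procs are still called exactly as before, existing code keeps working unchanged while the two-argument form becomes available.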
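To make the dispatch rule concrete, here is a minimal plain-Ruby sketch. It is not the Rails source: call_wait is a hypothetical helper that mirrors the arity check in the snippet above, and FakeError is a stand-in error class invented for illustration.

```ruby
# Hypothetical stand-in for an error that carries its own retry hint.
FakeError = Struct.new(:retry_after)

# Hypothetical helper mirroring the arity check shown above:
# one-argument callables get only executions, everything else gets both.
def call_wait(algorithm, executions, error)
  algorithm.arity == 1 ? algorithm.call(executions) : algorithm.call(executions, error)
end

ONE_ARG  = ->(executions) { executions * 2 }
TWO_ARGS = ->(executions, error) { error.retry_after || executions * 2 }
# One required and one optional parameter gives a negative arity (-2),
# so this lambda also receives the error.
OPTIONAL = ->(executions, error = nil) { error ? error.retry_after : executions }

ERROR = FakeError.new(30)

ONE_ARG_DELAY  = call_wait(ONE_ARG, 3, ERROR)   # error is ignored
TWO_ARGS_DELAY = call_wait(TWO_ARGS, 3, ERROR)  # error.retry_after is used
OPTIONAL_DELAY = call_wait(OPTIONAL, 3, ERROR)  # error is passed as well
```

With three executions and a retry_after of 30, the one-argument lambda yields 6 while the other two yield 30.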

Practical example: rate limit handling

An example of putting this to use with an API rate limit error:

class RateLimitError < StandardError
  attr_reader :retry_after

  def initialize(message, retry_after:)
    super(message)
    @retry_after = retry_after
  end
end

class ApiClientJob < ActiveJob::Base
  retry_on RateLimitError,
    wait: ->(executions, error) {
      # Prefer the retry_after value supplied by the error,
      # and fall back to a quadratic backoff when it is absent
      error.retry_after || (executions ** 2)
    },
    attempts: 5

  def perform(endpoint)
    response = api_client.get(endpoint)

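    if response.status == 429
      # Keep nil when the header is absent: nil.to_i would yield 0, which is
      # truthy in Ruby and would bypass the fallback in the wait Proc
      retry_after = response.headers['Retry-After']&.to_i
      raise RateLimitError.new("Rate limited", retry_after: retry_after)
    end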

    # process response
  end
end
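The Retry-After header can carry either a number of seconds or an HTTP date, and it may be missing entirely. As a sketch of how the header could be normalized before raising RateLimitError, the hypothetical helper retry_after_seconds below returns an Integer number of seconds, or nil when nothing usable is present. The helper name and the nil-on-failure convention are assumptions for illustration and are not part of Rails.

```ruby
require "time"

# Hypothetical helper: turns a raw Retry-After header value into seconds.
# Returns nil when the header is missing or unparseable, so that an
# expression like `error.retry_after || fallback` can reach its fallback.
def retry_after_seconds(header_value, now: Time.now)
  return nil if header_value.nil? || header_value.strip.empty?

  value = header_value.strip
  # Delay-seconds form, e.g. "120"
  return value.to_i if value.match?(/\A\d+\z/)

  # HTTP-date form, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
  seconds = (Time.httpdate(value) - now).ceil
  [seconds, 0].max # a date in the past means "retry immediately"
rescue ArgumentError
  # Time.httpdate raises ArgumentError for values it cannot parse
  nil
end
```

Passing the result as retry_after: when raising the error keeps the `||` fallback in the wait Proc meaningful, since a missing or malformed header becomes nil rather than 0.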

Test code

A test was added to verify the new behavior:

class RetryWaitIncludedInError < StandardError
  def retry_after
    10  # The error itself asks for a retry after 10 seconds
  end
end

class RetryJob < ActiveJob::Base
  retry_on RetryWaitIncludedInError, 
    wait: ->(executions, error) { error.retry_after + executions }
end

This test verifies that the wait time is the error's retry_after value (10 seconds) plus the execution count.
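As a quick check of the arithmetic, the same lambda can be evaluated outside of ActiveJob. This is a plain-Ruby sketch, not the Rails test itself. WAIT and DELAYS are names introduced here for illustration.

```ruby
class RetryWaitIncludedInError < StandardError
  def retry_after
    10
  end
end

# Same lambda as in the job definition above, extracted for inspection.
WAIT = ->(executions, error) { error.retry_after + executions }

# Wait times in seconds for the first three executions.
DELAYS = (1..3).map { |executions| WAIT.call(executions, RetryWaitIncludedInError.new) }
# DELAYS is [11, 12, 13]
```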

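Summary

This change makes it possible to implement more advanced retry strategies based on error context, such as external API rate limits, retry timing specified by the server, and wait times that depend on error severity. The arity check is also a useful reference as a Ruby-style way to preserve backward compatibility.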