Error Handling and Fault Tolerance in Erlang

April 8, 2024

erlang

This post covers Erlang’s built-in support for fault tolerance and error handling: supervisors, restart strategies, and the ‘let it crash’ philosophy, and how together they make resilient systems practical to build.

Supervisors in Erlang

Supervision is fundamental to building fault-tolerant systems in Erlang. A supervisor is a process responsible for starting, stopping, and monitoring its child processes. When a child terminates, the supervisor decides whether to restart it based on the child’s restart type (permanent, transient, or temporary), and its restart strategy determines which other children, if any, are restarted along with it.

Restart Strategies

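A supervisor’s restart strategy determines which children are restarted when one of them fails. Erlang/OTP provides four:

One For One (one_for_one): Only the failed process is restarted.
One For All (one_for_all): If one process fails, all other children are terminated and then all of them are restarted.
Rest For One (rest_for_one): The failed process and every child started after it (in start order) are terminated and restarted.
Simple One For One (simple_one_for_one): A simplified one_for_one in which all children are dynamically added instances of the same child specification, typically started with supervisor:start_child/2.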
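
To make the difference between the strategies concrete, here is a minimal sketch of a rest_for_one supervisor with three children. The module name rest_for_one_sketch, the registered names worker_a, worker_b, and worker_c, and the bare spawn_link worker are illustrative assumptions, not part of any library; a real system would normally supervise gen_server processes instead.

```erlang
-module(rest_for_one_sketch).
-behaviour(supervisor).
-export([start_link/0, init/1, start_worker/1]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

init([]) ->
    %% Map-based supervisor flags; the values 5 and 10 are illustrative.
    SupFlags = #{strategy => rest_for_one, intensity => 5, period => 10},
    %% Children are started in list order: worker_a, worker_b, worker_c.
    Children = [child(worker_a), child(worker_b), child(worker_c)],
    {ok, {SupFlags, Children}}.

%% Hypothetical helper that builds a child specification.
%% Omitted keys take their defaults (restart => permanent, type => worker).
child(Name) ->
    #{id => Name, start => {?MODULE, start_worker, [Name]}}.

%% Hypothetical worker: a linked process that idles until told to stop.
%% It is registered under Name so it is easy to look up with whereis/1.
start_worker(Name) ->
    Pid = spawn_link(fun() -> receive stop -> ok end end),
    register(Name, Pid),
    {ok, Pid}.
```

After starting the supervisor, killing worker_b with exit(whereis(worker_b), kill) should leave worker_a untouched while worker_b and worker_c come back with new pids. Changing the strategy to one_for_one would restart only worker_b, and one_for_all would restart all three.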

‘Let It Crash’ Philosophy

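Erlang promotes a ‘let it crash’ philosophy: rather than writing defensive code for every conceivable error, a process handles the cases it expects and simply fails on anything else. Because processes are isolated, one failure does not corrupt the rest of the system, and the supervisor restarts the failed process in a known good state. This keeps error-handling code small and concentrates recovery logic in supervisors, which manage the lifecycle of their child processes.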
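
The isolation this relies on can be seen without a supervisor at all. The following sketch, with the hypothetical module crash_sketch and function parse_isolated/1, runs a risky conversion in a separate monitored process. The worker contains no error handling; if the input is invalid it crashes, and the caller merely observes the exit.

```erlang
-module(crash_sketch).
-export([parse_isolated/1]).

%% Convert Text to an integer in a separate, monitored process.
%% The worker makes no attempt to validate its input.
parse_isolated(Text) ->
    Parent = self(),
    {Pid, Ref} = spawn_monitor(fun() ->
        %% list_to_integer/1 raises badarg on invalid input,
        %% which terminates only this worker process.
        Parent ! {self(), list_to_integer(Text)}
    end),
    receive
        {Pid, N} ->
            %% Success: drop the monitor and any pending 'DOWN' message.
            erlang:demonitor(Ref, [flush]),
            {ok, N};
        {'DOWN', Ref, process, Pid, {Reason, _Stack}} ->
            %% The worker crashed; the caller is unaffected.
            {crashed, Reason}
    end.
```

Calling crash_sketch:parse_isolated("42") returns {ok, 42}, while crash_sketch:parse_isolated("not a number") returns {crashed, badarg} and the calling process keeps running. A supervisor applies the same idea through links and adds the restart policy on top.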

Example

<pre>
  <code class="language-erlang">
-module(supervisor_example).
-behaviour(supervisor).
-export([start_link/0, init/1]).

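%% Start the supervisor and register it locally under the module name.
start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

%% Supervisor callback: returns {RestartStrategy, MaxRestarts, MaxSeconds}
%% followed by the list of child specifications (empty here).
init([]) ->
    {ok, {{one_for_one, 5, 10}, []}}.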
</code>
</pre>

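In this example, the module implements the supervisor behavior and specifies a one_for_one restart strategy with a maximum of 5 restarts within 10 seconds. If that limit is exceeded, the supervisor terminates its children and then itself, passing the failure up to its own supervisor. The child list is empty, so this supervisor starts no processes until child specifications are added.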
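
As a sketch of how the empty child list might be filled in, the variant below supervises a single worker. The module name one_for_one_sketch, the child id worker, and the start_worker/0 function are assumptions made for illustration. The supervisor flags use the map form, which is equivalent to the {one_for_one, 5, 10} tuple above.

```erlang
-module(one_for_one_sketch).
-behaviour(supervisor).
-export([start_link/0, init/1, start_worker/0]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

init([]) ->
    %% Same policy as the tuple form {one_for_one, 5, 10}.
    SupFlags = #{strategy => one_for_one, intensity => 5, period => 10},
    %% A permanent child is always restarted, whatever its exit reason.
    Worker = #{id => worker,
               start => {?MODULE, start_worker, []},
               restart => permanent},
    {ok, {SupFlags, [Worker]}}.

%% Hypothetical worker: a linked process that idles until told to stop.
start_worker() ->
    Pid = spawn_link(fun() -> receive stop -> ok end end),
    {ok, Pid}.
```

Once the supervisor is running, supervisor:which_children(one_for_one_sketch) lists the worker and its pid. Killing that pid should cause the supervisor to start a replacement, which shows up as a different pid in a later call.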