Snapshot algorithm
|
The snapshot algorithm is an algorithm used in distributed systems for recording a consistent global state of an asynchronous system.
The assumptions of the algorithm are as follows:
- There are no failures and all messages arrive intact and only once
- The communication channels are unidirectional and FIFO ordered
- There is a communication path between any two processes in the system
- Any process may initiate the snapshot algorithm
- The snapshot algorithm does not interfere with the normal execution of the processes
- Each process in the system records its local state and the state of its incoming channels
The algorithm works using marker messages. Each process that wants to initiate a snapshot records its local state and sends a marker on each of its outgoing channels. All the other processes, upon receiving a marker, record their local state, the state of the channel from which the marker just came as empty, and send marker messages on all of their outgoing channels. If a process receives a message after having recorded its local state, it records the state of the incoming channel from which the marker came as carrying all the messages received since it first recorded its local state.
Some of the assumptions of the algorithm can be facilitated using a more reliable communication protocol such as TCP/IP. The algorithm can be adapted so that there could be multiple snapshots occurring simultaneously.