We’ve recently set up alarms to monitor our background services, such as image processing and Dropbox integration.

When errors occur in a background service, you may not notice until it’s too late. For example, if a message notification isn’t sent, you won’t find out until the message author writes again, probably angry by that point.

These alarms help us provide a more reliable system:

  1. The failure gets reported.
  2. We can take immediate action.
  3. We can deploy a bugfix before users even notice.
  4. We can prevent the same situation in the future, so the system gets more reliable over time.

The goal is to let the developers know about the issue as soon as possible.
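
To make the idea concrete, here is a minimal Python sketch of what such an alarm could look like. It is not our actual setup: `run_with_alarm`, the `notify` callback, and the job name are hypothetical names chosen for illustration, and a real `notify` would post to email, chat, or a paging service instead of appending to a list.

```python
import logging
import traceback
from typing import Callable, List

logger = logging.getLogger("background")


def run_with_alarm(
    job_name: str,
    job: Callable[[], None],
    notify: Callable[[str], None],
) -> bool:
    """Run a background job and alert developers if it raises.

    Returns True on success, False if the job failed and an alarm was sent.
    `notify` is a hypothetical callback standing in for the real alert channel.
    """
    try:
        job()
        return True
    except Exception:
        # Capture the full traceback so the alert is actionable on its own.
        details = traceback.format_exc()
        logger.error("Background job %s failed", job_name)
        notify(f"[ALARM] {job_name} failed:\n{details}")
        return False


# Example: a notification job that fails silently from the user's point of view.
alerts: List[str] = []


def send_message_notification() -> None:
    # Simulated failure for illustration only.
    raise RuntimeError("SMTP timeout")


succeeded = run_with_alarm("message_notification", send_message_notification, alerts.append)
```

In this sketch the failure ends up in `alerts` right away, with the traceback attached, instead of surfacing later through an unhappy user.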