Monitoring

Visibility is key to a webhook setup that is both reliable and fault tolerant. You can’t afford to have webhooks operate in a black box, as that makes it difficult to verify that they work or to deal with issues when they arise.

In this article, I will go through the most common monitoring problems you might run into and how to solve them with Hookdeck. Each problem we tackle will also include a small discussion section to provide more insight into the problem.

Your monitoring setup: too much or too little?

Your current monitoring stack

Monitoring can be difficult to get right, and one reason is that many engineers focus on monitoring tools rather than monitoring principles. Prometheus, Datadog, New Relic, and Grafana are some big names in the monitoring space and are probably already part of your monitoring setup.

However, knowing what to monitor when working with webhooks is the starting point for choosing the right monitoring strategy. That knowledge is what tells you which tools will help you achieve your goal.

Drawbacks and overkill

Two things you want to avoid when setting up monitoring for your webhooks are drawbacks and overkill.

Drawbacks arise when your monitoring setup is inadequate: it captures too little information to be useful, its scope or coverage is limited, or the strategy itself is flawed (for example, it captures the wrong metrics). Any of these can frustrate debugging efforts.

Overkill is when you monitor more scenarios than required or use sophisticated monitoring tools for simple monitoring problems. This often creates noise that overwhelms administrators. For example, you probably don’t need 10+ dashboards in Grafana or low-level CPU metrics collected with Prometheus to monitor failing webhooks when you can simply collect logs and watch for HTTP 4xx and 5xx errors.
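To illustrate how little the simple case needs, here is a minimal Python sketch that scans NGINX access logs written in NGINX’s predefined `combined` format and counts webhook responses in the 4xx and 5xx ranges. It assumes webhooks arrive under a `/webhooks` path prefix; that prefix, the function names, and the sample log lines are hypothetical.

```python
import re
from collections import Counter

# NGINX's predefined "combined" access-log format:
# $remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent"
COMBINED = re.compile(
    r'(?P<addr>\S+) - (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\S+)'
)

def webhook_errors(lines, path_prefix="/webhooks"):
    """Yield (status, request line) for webhook requests that got a 4xx or 5xx response."""
    for line in lines:
        match = COMBINED.match(line)
        if match is None:
            continue  # not a combined-format line; skip it
        request = match.group("request").split()
        path = request[1] if len(request) > 1 else ""
        status = int(match.group("status"))
        if path.startswith(path_prefix) and 400 <= status < 600:
            yield status, match.group("request")

def summarize_errors(lines, path_prefix="/webhooks"):
    """Count webhook errors by class, keyed by '4xx' and '5xx'."""
    return Counter(f"{status // 100}xx" for status, _ in webhook_errors(lines, path_prefix))

# Made-up sample lines for illustration (hypothetical paths and user agents).
SAMPLE_LOG = [
    '203.0.113.7 - - [10/Oct/2023:13:55:36 +0000] "POST /webhooks/orders HTTP/1.1" 200 2 "-" "ExampleSender/1.0"',
    '203.0.113.7 - - [10/Oct/2023:13:55:40 +0000] "POST /webhooks/orders HTTP/1.1" 500 21 "-" "ExampleSender/1.0"',
    '198.51.100.2 - - [10/Oct/2023:13:56:01 +0000] "POST /webhooks/payments HTTP/1.1" 401 0 "-" "ExampleSender/1.0"',
    '198.51.100.9 - - [10/Oct/2023:13:56:05 +0000] "GET /health HTTP/1.1" 503 0 "-" "curl/8.0"',
]
print(summarize_errors(SAMPLE_LOG))
```

The `/health` request is ignored even though it returned a 503, because only paths under the webhook prefix are counted.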

Webhooks monitoring checklist

Depending on the sensitivity of the application, the scope of monitoring can be minimal and high-level or in-depth and thorough. Whichever strategy you choose, a base checklist of scenarios to monitor when working with webhooks consists of:

Availability and health of consumers
Webhook request errors (4xx and 5xx errors)
Consumer throughput and latency
Total number of webhook requests within a time period (e.g. requests/second)

These metrics are critical for observing your webhooks’ activity, fixing issues, and catching problems before they cross fault thresholds.
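The last three checklist items can be computed from per-delivery records. The sketch below assumes a hypothetical `Delivery` record holding the receive time, the consumer’s status code, and its response latency, and reports request rate, successful throughput, 4xx/5xx counts, and a 95th-percentile latency for one time window. Consumer availability is left out, as it would typically come from a separate health check.

```python
import math
from dataclasses import dataclass

@dataclass
class Delivery:
    timestamp: float   # when the webhook was received, in seconds
    status: int        # HTTP status code returned by the consumer
    latency_ms: float  # time the consumer took to respond

def percentile(sorted_values, pct):
    """Nearest-rank percentile of a sorted, non-empty list."""
    rank = max(1, math.ceil(pct / 100 * len(sorted_values)))
    return sorted_values[rank - 1]

def summarize(deliveries, window_start, window_end):
    """Checklist metrics for deliveries received in [window_start, window_end)."""
    window = [d for d in deliveries if window_start <= d.timestamp < window_end]
    seconds = window_end - window_start
    total = len(window)
    succeeded = sum(1 for d in window if 200 <= d.status < 300)
    errors_4xx = sum(1 for d in window if 400 <= d.status < 500)
    errors_5xx = sum(1 for d in window if 500 <= d.status < 600)
    latencies = sorted(d.latency_ms for d in window)
    return {
        "requests": total,
        "requests_per_second": total / seconds,
        "throughput_per_second": succeeded / seconds,  # successfully processed deliveries
        "errors_4xx": errors_4xx,
        "errors_5xx": errors_5xx,
        "error_rate": (errors_4xx + errors_5xx) / total if total else 0.0,
        "p95_latency_ms": percentile(latencies, 95) if latencies else None,
    }

deliveries = [
    Delivery(0.0, 200, 120.0),
    Delivery(1.0, 200, 80.0),
    Delivery(2.0, 500, 950.0),
    Delivery(3.0, 404, 30.0),
    Delivery(12.0, 200, 60.0),  # falls outside the 10-second window below
]
print(summarize(deliveries, window_start=0.0, window_end=10.0))
```

The nearest-rank percentile is used here because it always returns an observed latency; `statistics.quantiles` is an alternative that interpolates between values.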

For a detailed guide on monitoring webhooks, and a full list of scenarios, metrics, and performance indicators to monitor, check out our article “What to Monitor in a Webhook Infrastructure”.

Hookdeck's solution to common webhook monitoring problems

You’re not sure if webhooks are working

Problem

You need to see the status of each webhook.

Solution

With Hookdeck, all your webhook requests and events are automatically logged and you get an intuitive dashboard where you can query the status of all your webhooks to know which ones are successful and which ones failed.

Discussion

Without visibility into the activities of your webhooks, it is difficult to ascertain if a webhook has served its purpose or failed. You need to log information about your webhooks and set up a visualization that helps query and display information about the status of each webhook.

A standard setup should contain the following components:

A logging component on the receiving server (NGINX has access logs enabled by default, while Node.js servers require the developer to implement logging)
Log collection and querying components like Elasticsearch/Logstash or Splunk
A log data visualization component like Kibana or Grafana
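Application servers written in Python also need logging added by the developer. A minimal sketch using the standard `logging` module to emit one JSON object per webhook is shown below; JSON lines are a shape that collectors such as Logstash or Splunk can be configured to parse into fields. The field names (`source`, `event_id`, `status`, `latency_ms`, `outcome`) and the helper functions are hypothetical.

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object per line."""

    def format(self, record):
        entry = {
            "ts": round(record.created, 3),
            "level": record.levelname,
            "msg": record.getMessage(),
        }
        # Merge the webhook fields passed via extra={"webhook": {...}}.
        entry.update(getattr(record, "webhook", {}))
        return json.dumps(entry, sort_keys=True)

def delivery_fields(source, event_id, status, latency_ms):
    """Hypothetical field set for one webhook; adjust to what your queries need."""
    return {
        "source": source,
        "event_id": event_id,
        "status": status,
        "latency_ms": latency_ms,
        "outcome": "success" if status < 400 else "failure",
    }

logger = logging.getLogger("webhooks")
handler = logging.StreamHandler()  # writes to stderr; a FileHandler works for file-tailing shippers
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

def log_delivery(source, event_id, status, latency_ms):
    """Call once per received webhook, after the handler has produced its response."""
    level = logging.INFO if status < 400 else logging.ERROR
    logger.log(level, "webhook processed",
               extra={"webhook": delivery_fields(source, event_id, status, latency_ms)})

# Example: a failed delivery is logged at ERROR with every field as a JSON key.
log_delivery("orders", "evt_123", 502, 1840.5)
```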

You don’t know when webhooks are failing

Problem

You need to receive alerts when webhooks are failing.

Solution

Hookdeck allows you to configure notifications for failed webhooks. This helps you to be more proactive with how you handle webhook failures. These failure notifications also contain the payload of the webhook that failed and can be sent to different channels to quickly reach the individuals that need to take action.

Discussion

To troubleshoot your webhooks, you first need to be informed about a webhook that has failed. Thus, you need to set up a feedback system that informs you when there is a failure.

The recommended way to achieve this is by integrating an alerting component into your logging and monitoring system. When a webhook fails, an event is raised which then triggers an alert to the administrator. This alert can be in the form of an email, Slack message, or push notification.
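A minimal sketch of that flow: each processed delivery is passed to an alerter, and a failure triggers a call to a notification function you supply (actually sending the email, Slack message, or push notification is left to that function). The `FailureAlerter` class is hypothetical, and its per-source cooldown is an added assumption rather than part of the recommendation above; it keeps a burst of failures from turning into the kind of noise described under overkill.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict

@dataclass
class FailureAlerter:
    notify: Callable[[str], None]  # delivers the alert, e.g. via email or a chat integration
    cooldown_s: float = 300.0      # assumption: at most one alert per source every 5 minutes
    clock: Callable[[], float] = time.monotonic
    _last_sent: Dict[str, float] = field(default_factory=dict, init=False)

    def on_delivery(self, source: str, event_id: str, status: int) -> bool:
        """Handle one delivery event; return True if an alert was sent."""
        if status < 400:
            return False  # success: nothing to report
        now = self.clock()
        last = self._last_sent.get(source)
        if last is not None and now - last < self.cooldown_s:
            return False  # already alerted recently for this source
        self._last_sent[source] = now
        self.notify(f"Webhook {event_id} from {source} failed with HTTP {status}")
        return True

# Usage with a fake clock so the cooldown is easy to follow.
sent = []
fake_time = [0.0]
alerter = FailureAlerter(notify=sent.append, clock=lambda: fake_time[0])
alerter.on_delivery("orders", "evt_1", 200)  # success, no alert
alerter.on_delivery("orders", "evt_2", 500)  # first failure, alert sent
alerter.on_delivery("orders", "evt_3", 500)  # inside the cooldown, suppressed
fake_time[0] = 301.0
alerter.on_delivery("orders", "evt_4", 503)  # cooldown elapsed, alert sent
```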

You don’t know which webhooks failed

Problem

You need to find failed webhooks.

Solution

Hookdeck automatically logs all webhook events. The dashboard also comes with filters for finding failed webhooks using the server response status codes. You can also click a failed webhook to investigate the reason for its failure.

Discussion

To find webhooks that have failed, you need to be able to query your log data for the status of your webhooks.

We recommend implementing a log collection system with a query language, such as the ELK Stack or Splunk. This way, you can aggregate all the information collected from logging your webhooks and query for failed webhooks using the fields that indicate failure, such as the response status code.
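Once delivery logs are aggregated, finding failures comes down to filtering on the status field. The sketch below does that in plain Python over JSON-lines log entries, as a stand-in for the equivalent query in Kibana or Splunk; the field names (`ts`, `source`, `event_id`, `status`) and the sample entries are hypothetical.

```python
import json

def failed_webhooks(lines, source=None, since=None):
    """Return log entries for failed webhooks (HTTP status >= 400), in log order."""
    failures = []
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not JSON
        if not isinstance(entry, dict):
            continue  # skip JSON values that are not log objects
        if entry.get("status", 0) < 400:
            continue  # successful, or no status recorded
        if source is not None and entry.get("source") != source:
            continue
        if since is not None and entry.get("ts", 0) < since:
            continue
        failures.append(entry)
    return failures

LOG_LINES = [
    '{"ts": 1700000000.0, "source": "orders", "event_id": "evt_1", "status": 200}',
    '{"ts": 1700000005.0, "source": "orders", "event_id": "evt_2", "status": 500}',
    '{"ts": 1700000009.0, "source": "payments", "event_id": "evt_3", "status": 422}',
    "not a JSON line",
]
for entry in failed_webhooks(LOG_LINES):
    print(entry["event_id"], entry["status"])
```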

You don’t know why webhooks are failing

Problem

You need to see the error message, status code, and payload of failed webhooks.

Solution

When a webhook fails, Hookdeck allows you to investigate the failure by providing the status code, headers, response body, and payload for the webhook. You can also replay your webhooks after applying a fix to see if the error has been cleared.

Discussion

To troubleshoot and fix a failed webhook, you need to know why it failed. Each webhook needs to be tracked so that you collect the information required to determine why a failure occurred.

At a minimum, collect the following information on each webhook:

The HTTP status code to determine the nature of the error
The status message for more details and context on the error
The webhook headers and payload to help investigate the root cause of the error and/or recreate the error
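As a minimal sketch of that checklist, the dataclass below holds the three pieces of information for one failed webhook, using the standard reason phrase from Python’s `http.HTTPStatus` as the base status message. The names `FailedWebhook` and `capture_failure`, the `error_detail` parameter, and the sample header and payload are hypothetical.

```python
import json
from dataclasses import asdict, dataclass
from http import HTTPStatus

@dataclass
class FailedWebhook:
    status_code: int     # nature of the error (4xx: consumer rejected the request, 5xx: consumer failed)
    status_message: str  # more detail and context on the error
    headers: dict        # request headers, useful for tracing and signature checks
    payload: str         # raw request body, kept verbatim so the failure can be recreated

def capture_failure(status_code, headers, payload, error_detail=None):
    """Build a failure record from the response status and the original request."""
    try:
        phrase = HTTPStatus(status_code).phrase
    except ValueError:
        phrase = "Unknown status"  # non-standard status code
    message = f"{phrase}: {error_detail}" if error_detail else phrase
    return FailedWebhook(status_code, message, dict(headers), payload)

record = capture_failure(
    500,
    {"Content-Type": "application/json", "X-Example-Signature": "abc123"},
    '{"type": "order.created", "id": "evt_42"}',
    error_detail="KeyError: 'customer_id'",
)
print(json.dumps(asdict(record)))
```

Keeping the payload as the raw string, rather than re-serialized JSON, matters when the sender signs the exact request bytes: a re-encoded body may no longer match the signature when you try to recreate the error.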

Conclusion

Observability is critical to webhook communication. Hookdeck’s built-in monitoring features aim to give you the right tool for monitoring the webhooks your business depends on. With Hookdeck, you get just the right amount of monitoring to keep your webhooks reliable.

Managing Error Recovery