Sentinel
Real-time infrastructure monitoring dashboard with anomaly detection, alerting, and historical trend analysis

Every monitoring tool I had used either required constant threshold tuning or produced so many false positives that teams learned to ignore alerts. Sentinel builds a statistical baseline of normal behavior per metric and uses a suite of detectors to flag only meaningful deviations. It then routes alerts through an escalation matrix that accounts for schedules and rotations.
- Multi-source metric ingestion via a unified HTTP API supporting Prometheus remote write, CloudWatch webhooks, and a custom agent protocol
- Statistical anomaly detection using moving-window z-score, Holt-Winters forecasting, and seasonal decomposition for metrics with periodic patterns
- Escalation-based alert routing to PagerDuty, Slack, email, and custom webhooks, with schedule-aware rotation policies
Ingestion layer built in Go using a fan-out pattern: incoming metrics are written to a write-ahead log for durability, then asynchronously fanned out to the time-series store (TimescaleDB) and the detection pipeline. This decouples ingestion speed from detection latency, allowing the system to absorb burst traffic without backpressure.
Detection pipeline is a chain of composable detectors: z-score for level shifts, Holt-Winters for trend/seasonality changes, and a residual-based detector for irregular patterns. Each detector emits typed events that feed into a correlation engine which groups related alerts into incidents, reducing noise by deduplicating across metrics and time windows.
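
The first stage of that chain, the moving-window z-score, can be sketched as follows. This is an illustration under assumptions: the type name, window size, and threshold are hypothetical, and the real detector may handle warm-up and flat series differently.

```go
package main

import "math"

// ZScoreDetector flags values that deviate strongly from the mean
// of the most recent `size` observations.
type ZScoreDetector struct {
	window    []float64
	size      int
	next      int // index of the oldest value once the window is full
	threshold float64
}

// NewZScoreDetector builds a detector; size and threshold are tuning assumptions.
func NewZScoreDetector(size int, threshold float64) *ZScoreDetector {
	return &ZScoreDetector{size: size, threshold: threshold}
}

// Observe scores v against the current window, then adds v to the window.
// It reports nothing until the window is full, and treats a perfectly flat
// window (zero standard deviation) as unscorable.
func (d *ZScoreDetector) Observe(v float64) (z float64, anomalous bool) {
	if len(d.window) == d.size {
		mean, std := meanStd(d.window)
		if std > 0 {
			z = (v - mean) / std
			anomalous = math.Abs(z) > d.threshold
		}
	}
	if len(d.window) < d.size {
		d.window = append(d.window, v)
	} else {
		d.window[d.next] = v // overwrite the oldest value
		d.next = (d.next + 1) % d.size
	}
	return z, anomalous
}

// meanStd returns the mean and population standard deviation of xs.
func meanStd(xs []float64) (mean, std float64) {
	for _, x := range xs {
		mean += x
	}
	mean /= float64(len(xs))
	var ss float64
	for _, x := range xs {
		ss += (x - mean) * (x - mean)
	}
	return mean, math.Sqrt(ss / float64(len(xs)))
}
```

Scoring each value before it enters the window keeps a spike from inflating the baseline it is compared against. One side effect is that the spike then sits in the window for the next `size` observations, which a production detector might want to exclude.
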
Query layer exposes a D3-compatible time-series API that returns aggregated data at multiple granularities (raw, plus 1m, 5m, 1h, and 1d rollups) using TimescaleDB continuous aggregates. Dashboard queries are automatically routed to the appropriate granularity based on the requested time range, keeping response times under 200ms for queries spanning up to a year.
Alert manager implements a state machine per alert rule: normal → pending (after a breach) → firing (once the breach has lasted a configured duration) → acknowledged → resolved. Escalation policies are defined as directed acyclic graphs (DAGs) with configurable delays, allowing sequential or parallel notification paths. The system records alert history for post-mortem analysis and tuning recommendations.