[post-mortem] 2026-03-27 - Channel Partners Service Degradation
Incident Overview
Date: 2026-03-27
Time (PDT): 5:00 PM - 6:20 PM (00:00 - 01:20 UTC)
Duration: 70 minutes
Severity: Medium
Duration breakdown:
Degradation: ~50 minutes
Significant degradation: ~20 minutes
Services Impacted:
Channel Partners (internal API)
Cloud connectivity (downstream impact on user lookups)
Customer Impact:
Organization systems were inaccessible by Organization users.
VMS systems experienced delayed responses when submitting usage reports and requesting user information during the affected window.
Cloud-connected systems may have experienced slower authentication and user synchronization.
No data loss occurred. All requests were eventually retried and processed successfully.
Root Cause
A scheduled batch of requests from connected VMS systems at midnight UTC (5:00 PM PDT) overwhelmed a backend service that was under-provisioned for the traffic volume. The service had insufficient capacity to handle the concurrent request load, causing response times to increase significantly.
A routine deployment that occurred during the degradation window was initially suspected but was confirmed through analysis to not be the cause. The traffic spike began 50 minutes before the deployment started.
How We Fixed It
The service recovered automatically as the batch of scheduled requests completed. The backend service capacity was increased after the incident to handle future traffic spikes.
Corrective Actions
Already Implemented
Increased backend service capacity to handle peak traffic loads
Short Term
Adding monitoring and alerting for service latency and traffic volume
Adjusting deployment windows for this service to avoid peak traffic periods
Long Term
Evaluate if scheduled requests can be spread over a wider time window to reduce peak load
Reviewing traffic routing to ensure optimal load distribution across services