Incident Overview

  • Date & Time:

    • Start: 2024-11-26, 6:40 PM AEST

    • Resolved: 2024-11-27, 2:00 PM AEST

  • Duration: 19 hours, 20 minutes

  • Service Impacted: Connection times to systems routed through the Melbourne 1 Relay Server (~15% of Australian users) increased to as much as 2 minutes.

  • Severity: Low

  • Customer Impact: A small percentage of users experienced longer connection times.

  • Action Required: Update Firewall Passlist configurations and monitoring endpoints according to the https://support.networkoptix.com/hc/en-us/articles/360010795813-Firewall-Passlist article.

Investigation Log Timeline (AEST)

2024-11-26 2:19 PM: We began receiving complaints about significantly increased connection times in Oceania.

2024-11-26 7:07 PM: We narrowed the issue down to the Melbourne 1 Relay Server and started an investigation.

...

2024-11-27 1:02 AM: We identified the issue and started working on a solution.

2024-11-27 2:00 PM:

  • We deployed TWO new Relay Servers in the Asia-Pacific region:

    • Sydney, Australia 1 relay-au-syd-1-prod-dp.vmsproxy.com (95.173.193.212)

    • Sydney, Australia 2 relay-au-syd-2-prod-dp.vmsproxy.com (95.173.193.213)

  • We disabled TWO existing Relay Servers in the Asia-Pacific region:

    • Sydney 4, Australia vultr-syd-4.vmsproxy.com (45.77.51.96)

    • Melbourne 1, Australia vultr-mel-2.vmsproxy.com (67.219.103.112)

  • We restarted the Connection Mediator in that area to re-route traffic to the new Relay Servers.

2024-11-27 2:10 PM: The performance improvement was confirmed. The https://support.networkoptix.com/hc/en-us/articles/360010795813-Firewall-Passlist article has been updated.
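
The Relay Server changes in the timeline above can be checked against a local firewall passlist with a small script. The following is a minimal sketch, not an official tool: the hostnames and IP addresses are copied from this report, while the `passlist_delta` helper and the idea of representing a passlist as a set of strings are assumptions made for illustration. Treating the disabled relays as removable is also an assumption; the Firewall Passlist article remains the authoritative list.

```python
# Relay Servers deployed on 2024-11-27 (values copied from this report).
ADDED_RELAYS = {
    "relay-au-syd-1-prod-dp.vmsproxy.com": "95.173.193.212",
    "relay-au-syd-2-prod-dp.vmsproxy.com": "95.173.193.213",
}

# Relay Servers disabled on 2024-11-27 (values copied from this report).
DISABLED_RELAYS = {
    "vultr-syd-4.vmsproxy.com": "45.77.51.96",
    "vultr-mel-2.vmsproxy.com": "67.219.103.112",
}


def passlist_delta(current):
    """Compare a passlist with the relay changes (hypothetical helper).

    `current` is any iterable of hostnames and/or IP addresses.
    Returns (missing, stale): entries for the new relays that are not yet
    in the passlist, and entries that belong to the disabled relays.
    """
    current = set(current)
    required = set(ADDED_RELAYS) | set(ADDED_RELAYS.values())
    retired = set(DISABLED_RELAYS) | set(DISABLED_RELAYS.values())
    missing = sorted(required - current)
    stale = sorted(current & retired)
    return missing, stale


if __name__ == "__main__":
    # Example passlist; replace with the entries from your own firewall.
    example = {"vultr-syd-4.vmsproxy.com", "45.77.51.96", "95.173.193.212"}
    missing, stale = passlist_delta(example)
    print("Add:", ", ".join(missing))
    print("Review for removal:", ", ".join(stale))
```

The script only reports differences and leaves the decision to remove stale entries to the administrator.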

Root Cause

The incident was caused by high load on the Melbourne 1 Relay Server.

Corrective Actions

  1. Relay Servers Update
    The issue will be fixed once all Relay Servers are updated in all regions.

  2. Enhanced Monitoring
    The monitoring system will be updated to track issues associated with Relay Server performance.

  3. Required Actions
    Update Firewall Passlist configurations and monitoring endpoints according to the https://support.networkoptix.com/hc/en-us/articles/360010795813-Firewall-Passlist article.
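
For teams updating their own monitoring endpoints, a basic reachability probe against the new relay hostnames might look like the sketch below. It rests on several assumptions: port 443, the 5-second warning threshold, and the helper names are all hypothetical and do not come from this report. It also measures only the TCP handshake time, which is a rough proxy and not the end-to-end connection time users experienced during the incident.

```python
import socket
import time

# New relay hostnames copied from this report.
RELAY_HOSTS = [
    "relay-au-syd-1-prod-dp.vmsproxy.com",
    "relay-au-syd-2-prod-dp.vmsproxy.com",
]

WARN_SECONDS = 5.0  # hypothetical threshold, tune for your environment


def measure_connect_time(host, port=443, timeout=10.0):
    """Return seconds needed to open a TCP connection, or None on failure.

    Port 443 is an assumption; use the ports from the Firewall Passlist article.
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return None


def classify(seconds, warn_seconds=WARN_SECONDS):
    """Map a measured connect time to a simple status label."""
    if seconds is None:
        return "unreachable"
    return "slow" if seconds > warn_seconds else "ok"


if __name__ == "__main__":
    for host in RELAY_HOSTS:
        print(host, classify(measure_connect_time(host)))
```

Keeping the measurement and the classification in separate functions lets the threshold logic be tested without network access.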