Blog from November, 2024

Incident Overview

  • Date & Time:

    • Start: 2024-11-26, 06:40 PM AEST

    • Resolved: 2024-11-27, 2:00 PM AEST

  • Duration: 19 hours, 20 minutes

  • Service Impacted: Connection times for systems routed through the Melbourne 1 Relay Server (~15% of Australian users) increased to as much as 2 minutes.

  • Severity: Low

  • Customer Impact: A small percentage of users experienced longer connection times.

  • Action Required: Update Firewall Passlist configurations and monitoring endpoints according to the https://support.networkoptix.com/hc/en-us/articles/360010795813-Firewall-Passlist article.
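The duration stated above can be cross-checked from the start and resolution timestamps. The snippet below is a minimal sketch of that arithmetic; it assumes both timestamps are in the same time zone, so plain (naive) datetimes are enough.

```python
from datetime import datetime

# Timestamps copied from the overview above. Both are given in the same
# time zone, so naive datetimes are sufficient for the subtraction.
start = datetime(2024, 11, 26, 18, 40)    # 06:40 PM
resolved = datetime(2024, 11, 27, 14, 0)  # 2:00 PM the next day

elapsed = resolved - start
hours, remainder = divmod(int(elapsed.total_seconds()), 3600)
minutes = remainder // 60

print(f"{hours} hours, {minutes} minutes")  # prints "19 hours, 20 minutes"
```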

Investigation Log Timeline (AEST)

2024-11-26 2:19 PM: We started receiving complaints about significantly increased connection times in Oceania.

2024-11-26 7:07 PM: We narrowed the issue down to the Melbourne 1 Relay Server and started investigation.

2024-11-26 10:50 PM: We found that significant service degradation had started at 6:50 PM on the Melbourne 1 Relay Server vultr-mel-2.vmsproxy.com (67.219.103.112).

2024-11-27 1:02 AM: We identified the issue and started working on the solution.

2024-11-27 2:00 PM:

  • We deployed TWO new Relay Servers in the Asia-Pacific region:

    • Sydney, Australia 1 relay-au-syd-1-prod-dp.vmsproxy.com (95.173.193.212)

    • Sydney, Australia 2 relay-au-syd-2-prod-dp.vmsproxy.com (95.173.193.213)

  • We disabled TWO existing Relay Servers in the Asia-Pacific region:

    • Sydney 4, Australia vultr-syd-4.vmsproxy.com (45.77.51.96)

    • Melbourne 1, Australia vultr-mel-2.vmsproxy.com (67.219.103.112)

  • We restarted the Connection Mediator in that area to re-route traffic to the new Relay Servers.

2024-11-27 2:10 PM: The performance improvement was confirmed. The https://support.networkoptix.com/hc/en-us/articles/360010795813-Firewall-Passlist article has been updated.

Root Cause

The incident was caused by high load on the Melbourne 1 Relay Server.

Corrective Actions

  1. Relay Servers Update
    The issue will be fixed once all Relay Servers are updated in all regions.

  2. Enhanced Monitoring
    The monitoring system will be updated to track issues associated with Relay Server performance.

  3. Required Actions
    Update Firewall Passlist configurations and monitoring endpoints according to the https://support.networkoptix.com/hc/en-us/articles/360010795813-Firewall-Passlist article.

UPDATE OVERVIEW

The Relay Servers will be added in each region one by one. Then the traffic will be re-routed to the new Relay Servers.

Please add the new IP addresses and FQDNs to your firewall configurations and monitoring endpoints:

North America

New Relay Servers:

Ashburn 1, VA
relay-us-abn-1-prod-dp.vmsproxy.com (37.19.207.250)

Chicago 1, IL
relay-us-chi-1-prod-dp.vmsproxy.com (143.244.60.94)

Chicago 2, IL
relay-us-chi-2-prod-dp.vmsproxy.com (143.244.60.77)

Dallas 1, TX
relay-us-dal-1-prod-dp.vmsproxy.com (95.173.216.240)

Los Angeles 1, CA
relay-us-lax-1-prod-dp.vmsproxy.com (89.187.187.173)

Los Angeles 2, CA
relay-us-lax-2-prod-dp.vmsproxy.com (89.187.185.106)

Miami 1, FL
relay-us-mia-1-prod-dp.vmsproxy.com (121.127.41.217)

New York 1, NY
relay-us-nyc-1-prod-dp.vmsproxy.com (79.127.243.103)

New York 2, NY
relay-us-nyc-2-prod-dp.vmsproxy.com (79.127.243.102)

Seattle 1, WA
relay-us-sea-1-prod-dp.vmsproxy.com (79.127.221.31)

Old Relay Servers (will be removed once traffic is re-routed):

New York, NY 2
relay-ny2.vmsproxy.com (89.187.177.166)

New York, NY 3
dp-ny-3.vmsproxy.com (138.199.41.85)

New York, NY 4
dp-ny-4.vmsproxy.com (156.146.58.154)

Los Angeles, CA 1
relay-la.vmsproxy.com (185.152.67.150)

Los Angeles, CA 2
dp-la-2.vmsproxy.com (84.17.45.136)

Chicago, IL
relay-chi.vmsproxy.com (89.187.181.221)

Miami, FL
relay-mia.vmsproxy.com (212.102.60.89)

Dallas, TX
relay-dp-dal-1.vmsproxy.com (89.187.175.87)

Ashburn, VA
relay-dp-ash-1.vmsproxy.com (37.19.207.90)

Seattle, WA
relay-dp-sea-1.vmsproxy.com (138.199.12.70)

Asia-Pacific

New Relay Servers:

Sydney, Australia 1
relay-au-syd-1-prod-dp.vmsproxy.com (95.173.193.212)

Sydney, Australia 2
relay-au-syd-2-prod-dp.vmsproxy.com (95.173.193.213)

Singapore 1
relay-sg-sin-1-prod-dp.vmsproxy.com (169.150.243.58)

Singapore 2
relay-sg-sin-2-prod-dp.vmsproxy.com (169.150.207.238)

Old Relay Servers (will be removed once traffic is re-routed):

Sydney, Australia 4
vultr-syd-4.vmsproxy.com (45.77.51.96)

Melbourne 1, Australia
vultr-mel-2.vmsproxy.com (67.219.103.112)

Europe

New Relay Servers:

Prague 1, Czech Republic
relay-cz-prg-1-prod-dp.vmsproxy.com (143.244.58.97)

Frankfurt 1, Germany
relay-de-fra-1-prod-dp.vmsproxy.com (87.249.129.114)

Amsterdam 1, Netherlands
relay-nl-ams-1-prod-dp.vmsproxy.com (79.127.227.187)

Stockholm 1, Sweden
relay-se-sto-1-prod-dp.vmsproxy.com (79.127.249.120)

Old Relay Servers (will be removed once traffic is re-routed):

Frankfurt 1, Germany
relay-fr.vmsproxy.com (195.181.174.35)

Frankfurt 2, Germany
relay-dp-fr-2.vmsproxy.com (109.61.80.226)

Stockholm, Sweden
relay-dp-sto-1.vmsproxy.com (121.127.46.174)

Once the traffic is fully re-routed, old Relay Servers will be turned off (see Schedule below).

See https://support.networkoptix.com/hc/en-us/articles/360010795813-Firewall-Passlist for updated IP addresses.
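Before updating IP-based firewall rules, it can help to confirm that each new FQDN resolves to the address listed above. The following is a minimal sketch, not an official tool: the `EXPECTED` table holds only a few of the entries from this post, and `find_mismatches` is a hypothetical helper name. It checks DNS records only and does not test whether a firewall actually lets the traffic through.

```python
import socket
from typing import Callable, Dict, List

# A few of the new Relay Servers listed above (FQDN -> expected IPv4 address).
# Extend this table with the remaining entries from the post as needed.
EXPECTED: Dict[str, str] = {
    "relay-au-syd-1-prod-dp.vmsproxy.com": "95.173.193.212",
    "relay-au-syd-2-prod-dp.vmsproxy.com": "95.173.193.213",
    "relay-sg-sin-1-prod-dp.vmsproxy.com": "169.150.243.58",
}


def find_mismatches(
    expected: Dict[str, str],
    resolve: Callable[[str], str] = socket.gethostbyname,
) -> List[str]:
    """Return one message per FQDN that fails to resolve or resolves elsewhere."""
    problems: List[str] = []
    for fqdn, ip in expected.items():
        try:
            actual = resolve(fqdn)
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            problems.append(f"{fqdn}: lookup failed ({exc})")
            continue
        if actual != ip:
            problems.append(f"{fqdn}: expected {ip}, got {actual}")
    return problems
```

Calling `find_mismatches(EXPECTED)` returns an empty list when every name resolves to the listed address. The `resolve` parameter exists so the check can be exercised with a stub resolver instead of live DNS.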

RELEASE NOTES / SCOPE OF WORK

  • Update OS to Ubuntu 24.04.

  • Add the Coturn service (WebRTC-specific traffic relay in Nx WebRTC infrastructure).

  • Enhance monitoring of AWS hosts and Docker containers.

  • Update versions of all Docker containers and optimize their Dockerfiles.

  • Unify all Relay Server settings (except domain names).

  • Adjust RAM and swap file limits for all Docker containers on Relay Servers.

SCHEDULE

Thursday, Dec 19, 2024 - updating Relay Servers by regions and re-routing traffic:

  • 6:30 AM PST - North America

  • 7:30 AM PST - Asia-Pacific (Friday, Dec 20, 2:30 AM Sydney Time)

  • 8:30 AM PST - Europe (Thursday, Dec 19, 5:30 PM Europe time)

  • 9:30 AM PST - Deprecating old relays via API

  • 9:30 AM PST - Testing

Monday, Dec 23, 2024:

  • 5:30 PM PST - Confirming that the traffic is fully migrated from old Relay Servers

  • 5:40 PM PST - Disabling old Relay Servers

Wednesday, Dec 25, 2024:

  • 4:00 AM PST - Turning off old Relay Servers
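To translate the PST times in the schedule above into local time, the standard-library `zoneinfo` module can be used. This is a minimal sketch: the choice of `Europe/Berlin` to stand in for "Europe time" is an assumption (Central European Time), and `to_local` is a hypothetical helper name.

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# The schedule is published in Pacific time.
PACIFIC = ZoneInfo("America/Los_Angeles")


def to_local(year: int, month: int, day: int, hour: int, minute: int,
             target_zone: str) -> datetime:
    """Interpret the given wall-clock time as Pacific time and convert it."""
    pacific_time = datetime(year, month, day, hour, minute, tzinfo=PACIFIC)
    return pacific_time.astimezone(ZoneInfo(target_zone))


# Asia-Pacific slot: Thursday, Dec 19, 7:30 AM PST
sydney = to_local(2024, 12, 19, 7, 30, "Australia/Sydney")
# Europe slot: Thursday, Dec 19, 8:30 AM PST (Central European Time assumed)
europe = to_local(2024, 12, 19, 8, 30, "Europe/Berlin")

print(sydney.strftime("%A, %b %d, %I:%M %p"))
print(europe.strftime("%A, %b %d, %I:%M %p"))
```

The two converted values should agree with the Sydney and Europe times given in parentheses in the schedule.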

DOWNTIME

  • None

MAINTENANCE LOG

Dec 19th (PST):

  • 6:30 AM - North America update started

  • 7:25 AM - North America update completed

  • 7:30 AM - Asia-Pacific update started

  • 8:25 AM - Asia-Pacific update completed

  • 8:30 AM - Europe update started

  • 9:25 AM - Europe update completed

  • 9:30 AM - Deprecating old relays via API started

  • 9:40 AM - Deprecating old relays via API completed

  • 9:40 AM - Testing started

  • 10:15 AM - Testing successfully ended.

RELEASE NOTES

During the last update, we had to roll back DocDB (the service responsible for the cross-system layouts functionality) because of internal issues discovered during the final test round. The issues have been fixed.

DOWNTIME

No downtime.

MAINTENANCE LOG

Nov 19th:

17:20 - Preparation started
17:40 - DocDB update started
17:43 - DocDB update completed
17:43 - Testing started
18:17 - Testing ended

2024-11-13 12:29 AM: We are currently experiencing issues with the Amsterdam 1 Relay Server relay-dp-ams-1.vmsproxy.com (89.187.174.241).

We do not expect service degradation from the customers' point of view. Other Relay Servers are handling the traffic.

The issue is related to the DataPacket hosting server. We are waiting to get the replacement server from DataPacket.

We will send another update once the replacement server is back online. 

2024-11-13 11:07 AM: New server has been provided.

IMPORTANT: New IP and Hostname: https://relay-nl-ams-1-prod-dp.vmsproxy.com (79.127.227.187). Please update your firewall settings and monitoring endpoints.

We will update you once the server is online.

2024-11-13 11:25 AM: New server is up and running. The https://support.networkoptix.com/hc/en-us/articles/360010795813-Firewall-Passlist article has been updated. Please update your firewall settings and monitoring endpoints.

RELEASE NOTES

IMPROVEMENTS

  • Added support for the upcoming Mobile Client 25.1.

  • Improved load balancing logic for Connection Mediators.

  • Improved the logic of selecting the most suitable Relay Server.

BUG FIXES

  • Cross-system layouts did not show up in the Desktop Client if a user specified capital letters in the Cloud Email address. Fixed.
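Bugs of this kind usually come from comparing email addresses as exact strings. The snippet below is a minimal sketch of case-insensitive matching, not the actual fix that was deployed; `normalize_email` and `same_account` are hypothetical names, and the addresses are placeholders. Strictly speaking, the local part of an address may be case-sensitive under the SMTP standard, but account lookups are commonly made case-insensitive, which is what this sketch assumes.

```python
def normalize_email(address: str) -> str:
    """Trim whitespace and fold case so comparisons ignore capital letters."""
    return address.strip().casefold()


def same_account(a: str, b: str) -> bool:
    """Return True when two addresses refer to the same account after normalization."""
    return normalize_email(a) == normalize_email(b)


# An address typed with capital letters still matches the stored lower-case form.
print(same_account("Jane.Doe@Example.com", "jane.doe@example.com"))  # prints True
```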

DOWNTIME

No downtime. Current connections MAY be affected for VMS versions earlier than 5.1.3. Users might need to log back in to their systems.

MAINTENANCE LOG

Nov 14th:

  • 17:00 - Preparation started

  • 17:40 - DocDB and US-EAST1 Mediator update started

  • 17:50 - DocDB and US-EAST1 Mediator update completed

  • 17:52 - Starting Update for the rest of mediators (region by region)

  • 18:29 - Mediators update completed

  • 18:30 - Testing started

  • 18:55 - Testing ended

  • 18:55 - Bug found in DocDB

  • 18:55 - Manual testing

  • 19:10 - Call to roll back the DocDB service

  • 19:12 - Rollback changes applied

  • 19:19 - Rollback for DocDB completed

RELEASE NOTES

BUG FIXES

  • Streams from cameras in 5.1.x systems could not be displayed on Cloud Portal (View tab). Fixed.

  • Internal fixes for the upcoming Channel Partners feature.

DOWNTIME

  • There might be up to 4 minutes of downtime for the following services:

    • Cloud Portal

    • Email Notifications

    • Push Notifications

  • Cloud connectivity will not be affected.

MAINTENANCE LOG

  • 17:20 - Preparation started

  • 17:40 - Cloud Portal update started

  • 17:56 - 17:57 - Cloud Portal and Push Notifications unavailable (downtime)

  • 18:06 - Cloud Portal update completed

  • 18:07 - Testing started

  • 18:18 - Testing successfully ended

Incident Overview

  • Date & Time:

    • Start: 2024-11-01, 12:00 AM

    • Resolved: 2024-11-05, 12:40 PM

  • Duration: 4 days, 12 hours, 40 minutes

  • Service Impacted: Emails sent from VMS (partially)

  • Severity: Medium

  • Customer Impact: Emails sent from VMS via Cloud SMTP servers (not AWS) were delivered with significant delays. Emails sent between Oct 31, 12:00 AM and Nov 1, 6:00 PM were lost. Emails sent from Cloud Portal were not affected.

Investigation Log

  • 2024-11-01 12:45 PM: We started receiving complaints about Emails from Cloud Portal not going through. We began investigating and attempting to reproduce the issue.

  • 2024-11-04: The issue was reproduced and escalated to the Cloud team for further investigation.

  • 2024-11-05 9:18 AM: The issue was narrowed down to the systems using the new feature that allows sending Emails through Cloud SMTP server.

  • 2024-11-05 11:30 AM: The issue was fixed on the infrastructure side by restarting the Email service, after which the Email queue started processing.

  • 2024-11-05 12:40 PM: The Email queue was fully processed; however, some Emails sent between Oct 31, 12:00 AM and Nov 1, 6:00 PM were lost.

Root Cause

The incident is likely related to the latest Cloud Portal update. The exact root cause is still unknown.

Corrective Actions

  1. Enhanced Monitoring
    The monitoring system will be updated to track issues associated with Emails that originate from VMS and are processed through Cloud SMTP servers.

  2. Further Investigation
    We will keep watching the Email queue and monitoring its performance. Enhanced Monitoring will help us investigate the root cause and prepare a hotfix.

Summary

We identified an ongoing issue with Cloud Portal: streams from cameras in 5.1.x systems cannot be displayed.

Investigation Log

  • 2024-10-31: After the recent Cloud Portal update, we started receiving reports of issues with playing back camera streams on Cloud Portal.

  • 2024-11-01: After closer investigation, we found that the issues occur only on some 5.1.x systems.

  • 2024-11-04: We confirmed that the issue affects all customers on 5.1.x systems and that 6.0 systems are working fine.

Corrective Actions

  • We are preparing a hotfix to be deployed within a few days. We will send an update and schedule the hotfix once testing is complete.

  • We will follow up with a Post-Mortem / Root Cause Analysis.