Network issues in Dallas

Incident Report for ZerOne Hosting

Postmortem

On February 10, 2022, a top-of-rack switch (AS-R40) in our datacenter provider's Dallas network malfunctioned during a routine operation. The malfunction occurred while a new customer was being installed via a trunk hand-off to the switch. The root cause is still unknown to our provider's engineering department. The DC's engineering team will attempt to reproduce the issue in their lab to better understand it.

Events observed:

While implementing the new customer setup, the DC engineers noticed multiple BGP sessions flapping as well as Layer 2 issues. They also found that one of the virtual-chassis members was functioning abnormally, which caused both problems.

Remedial steps:

After extensive troubleshooting, the DC's engineers decided to remove the FPC member causing the disruption. Once the malfunctioning FPC was removed, normal functionality resumed. The FPC member was then wiped clean of its configuration and added back to the virtual chassis.

We sincerely apologize for the inconvenience this event has caused you. In our 10+ years of operation, this is the first downtime event to affect this many clients and services for more than one hour. Most clients were not affected, but some had intermittent network connectivity for up to 4 hours. If you were affected by this downtime, please open a support ticket and we'll issue the necessary SLA refunds as described in our TOS.

Posted Feb 14, 2022 - 13:07 UTC

Resolved

This incident has been resolved.

Posted Feb 11, 2022 - 02:31 UTC

Identified

A network switch in one of our racks is down. We are working to get it back up as soon as possible.

Posted Feb 10, 2022 - 23:03 UTC

Investigating

We are currently investigating network downtime in our Dallas datacenter.

We'll provide an update as soon as possible.

If you have any questions, email us at support@zeronehosting.com

Posted Feb 10, 2022 - 22:13 UTC

This incident affected: US - Central.