Last week, Microsoft 365 services around the world suffered a major outage that lasted about five hours. The company has now explained that the problem was caused by a mistake made while changing the IP address of a WAN router, which led to packet forwarding problems that cascaded across the other routers in the WAN.
Immediately after the problems began, Microsoft reported that the failure was caused by DNS and WAN configuration issues that had been triggered by a WAN update. This led to intermittent failures that peaked about every 30 minutes, as shown on the Microsoft Azure status page (which was itself affected, since the page periodically returned a “504 Gateway Time-out” error).
The affected services included Microsoft Teams, Exchange Online, Outlook, SharePoint Online, OneDrive for Business, Power BI, Microsoft 365 Admin Center, Microsoft Graph, Microsoft Intune, Microsoft Defender for Cloud Apps, and Microsoft Defender for Identity.
According to the company, the problem arose after the IP address of a WAN router was changed using a command that “was not thoroughly tested and showed different behavior on different network devices.”
Although the network eventually began to recover on its own, the automated systems responsible for keeping the WAN healthy had been paused because of the impact on the network. The pause affected the systems that identify and remove unhealthy devices, as well as the traffic management and optimization systems.
As a result of this pause, some network paths continued to lose packets until the systems were manually restarted and the WAN returned to optimal operating conditions, completing the recovery process.
Microsoft says it will now block the execution of commands that can lead to such a “resonance”, and will also require that every command executed strictly comply with its guidelines for safe configuration changes.
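To illustrate what such a safeguard might look like, here is a minimal sketch of a pre-execution check for configuration commands. It is not Microsoft's tooling: the `ChangeRequest` type, the `review` function, the platform names, and the command strings in `BLOCKED_PREFIXES` are all hypothetical and do not correspond to any real router syntax. The sketch only assumes two rules drawn from the statement above: high-impact commands are blocked outright, and a command must have been tested on every device type it targets.

```python
from dataclasses import dataclass

# Hypothetical block list: command prefixes treated as high-impact in this sketch.
# These strings are invented for illustration and are not real router syntax.
BLOCKED_PREFIXES = ("set interface address", "clear routing")


@dataclass(frozen=True)
class ChangeRequest:
    command: str                 # the command an operator wants to run
    tested_platforms: frozenset  # device types the command was validated on
    target_platforms: frozenset  # device types the change would touch


def review(change):
    """Return (allowed, reason) for a proposed configuration command."""
    # Normalize case and whitespace so trivial variations do not bypass the list.
    normalized = " ".join(change.command.lower().split())
    if normalized.startswith(BLOCKED_PREFIXES):
        return False, "command is on the high-impact block list"

    # Reject commands that were not validated on every targeted device type.
    untested = change.target_platforms - change.tested_platforms
    if untested:
        return False, "not tested on: " + ", ".join(sorted(untested))

    return True, "ok"
```

In this sketch, a command validated only on one device type but aimed at two would be rejected with the name of the untested type, which mirrors the “different behavior on different network devices” failure described in the article.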