TLDR.Chat

SourceHut's Unprecedented Network Outage and Emergency Migration

SourceHut network outage post-mortem

SourceHut's post-mortem covers an unprecedented 170-hour network outage and the emergency migration of its services to a different datacenter. The outage was caused by a layer 3 DDoS attack on the primary datacenter, which disrupted its network and left SourceHut's services unavailable. To restore service, the team migrated to a new datacenter in Amsterdam, working through difficulties in securing a network solution and provisioning services there. The incident exposed the need for better resilience and redundancy, and it accelerated progress toward the team's long-term infrastructure goals. Full service has since been restored, and the team is now focused on internal tasks to finalize the new installation and decommission the old datacenter. Further planned improvements to reliability and scalability include exploring an on-premises Kubernetes cluster. The report closes by thanking the team, network operators, community members, and peer organizations who supported SourceHut during the outage.
