Core service update

To our monday.com community,

On April 11th and 16th, we experienced service interruptions on our US server that may have impacted your account.

The incidents resulted in a total of 59 minutes of system downtime and 62 minutes of degraded performance across the two dates. There was no data loss or security risk, and EU and APAC servers were not affected.

We know that you rely on our platform and that any disruption impacts your workflow, and for that we deeply apologize. We take these interruptions very seriously, and we’ll continue to invest in preventing them from happening again.

A more detailed update follows below on exactly what happened and our plans for improving your experience moving forward.

What happened

Thursday, April 11th – Status page retro

The root cause of the incident on April 11th was a rare issue in the infrastructure of one of our service providers, which resulted in 20 minutes of downtime.

Our monitoring system alerted our R&D teams to the issue, and we subsequently began moving the system to a backup. Technical difficulties during that move caused system instability and kept the platform in read-only mode for another 62 minutes.

We have since updated our system readiness processes and fine-tuned our backup servers, enabling us to recover faster and reduce interruptions.

Total downtime on the US server: 20 minutes

Total time of degraded performance on the US server: 62 minutes

No service interruption to EU and APAC servers

Tuesday, April 16th – Status page retro

In the early hours of Tuesday, we experienced a 20-minute platform downtime. Our monitors detected the issue, and our teams worked to reboot the platform and begin investigating the cause.

During the investigation, a second issue occurred, causing fluctuations in the platform’s availability over the course of one hour and adding a further 19 minutes of downtime.

Our investigation confirmed that the issue was caused by a recently deployed monitoring service designed to flag large or complex queries. Although the new service had initially been running smoothly within the platform, it was identified as the cause of an infrastructure lock, and we promptly reverted to an older version of the service that had been operating for many months without any issues.

Total downtime on the US server: 39 minutes

No service interruption to EU and APAC servers

What’s next

Prevention and faster recovery

We continue to take a proactive approach to ensuring platform stability, with both immediate action items and long-term plans in place to reduce risk, improve our ability to recover quickly, and, most importantly, prevent platform instability through enhanced resilience.

Prevention & reduced risk:

Ongoing improvements to monitoring abilities and system flags
Comprehensive processes around implementation and deployment
Continued investments to isolate core flows and make them independently resilient

Recovery & agility:

Implementation of checks & procedures that enable much faster recovery
Additional ready-to-go fallback systems
Processes that will allow our teams to be more agile and make timely decisions

We’ll share progress on these action items in this blog post and via our X.com support account.

More communication

We understand that there’s room for improvement in how we share real-time updates with you, whether about planned maintenance or an unexpected issue, so that you know what to expect and how to get more support from us in mitigating the impact on your daily work.

We recommend following our X.com support account and our Status Page, which are updated in real time.

Building a platform you can rely on

The bottom line is that the recent platform experience doesn’t align with what we strive to provide, and we sincerely apologize for the interruptions and any frustration or inconvenience caused. We continue to strive to deliver a platform that creates efficiency, brings impact, innovates along with you, and ultimately helps you meet your goals with ease.

The entire monday.com team and I thank you for your patience, ongoing trust, and understanding.

Sergei Liakhovetsky

VP Engineering, Infrastructure
