Incident Post Mortem: October 27, 2021


Summary

Between approximately 6:40 am and 10:42 am PT, and again between 12:20 pm and 2:32 pm PT on Wednesday, October 27th, we experienced intermittent outages on Coinbase.com, Coinbase mobile apps, and Coinbase Pro. During these outages, many users experienced slow loading times and errors while attempting to access Coinbase, or were unable to use features like buying, selling, and trading through our Retail and Pro websites and apps. The Exchange itself was not materially impacted. This post describes what occurred, what caused it, and how we plan to avoid such problems in the future.

We’re still learning more about these events, and will update this post with additional details that may be of interest.


The Incident

On the morning of October 27th PT, we experienced a significant increase in traffic. As traffic increased, our engineers were alerted about elevated error rates appearing across a number of services.

The following functionality was affected:

  • Logged-out experience: users who were not logged in experienced errors when visiting coinbase.com or our mobile apps.
  • Coinbase Pro: users were temporarily unable to log in to Coinbase Pro.
  • Transfers: there was a higher rate of cancelled and refunded transfers during this time, as well as delays in processing on-chain money movements. Users may have been unable to see their latest transfer history.

Root Cause Analysis

These issues were caused by two separate but related outages. Both were triggered by system bottlenecks caused by the elevated traffic.

Figure: Traffic to Coinbase, 10/27/2021

In the first outage, we observed traffic patterns that were several times greater than previous peaks. This increase in traffic began to overload a datastore responsible for our rewards functionality. As latency increased on this database, related services became saturated and started to deplete resources as well. This resulted in a chain of failures and a more widespread outage.

Figure: Query capacity to key database cluster
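
One way to see why rising database latency can saturate the services that call it is Little's law: the average number of requests in flight equals the arrival rate multiplied by the average latency. The sketch below is only an illustration under that assumption. The function names, traffic figures, and pool size are hypothetical and are not taken from the incident.

```python
def in_flight_requests(arrival_rate_per_s: float, latency_ms: float) -> float:
    """Little's law: average in-flight requests = arrival rate * average latency."""
    return arrival_rate_per_s * latency_ms / 1000


def is_saturated(arrival_rate_per_s: float, latency_ms: float, pool_size: int) -> bool:
    """True when the expected in-flight requests exceed the caller's worker
    or connection pool (pool_size is a hypothetical limit)."""
    return in_flight_requests(arrival_rate_per_s, latency_ms) > pool_size
```

With these made-up numbers, a service receiving 1,000 requests per second against a 20 ms datastore keeps about 20 requests in flight, well within a pool of 100. If latency climbs to 500 ms at the same traffic level, about 500 requests are in flight and the pool is exhausted, which is how a slow datastore can spread into its callers.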

The second outage was also triggered by a spike in traffic. In the early afternoon, engineers were alerted that our payments cluster was being similarly overloaded. Unfortunately, an automated maintenance event that was already underway slowed our ability to scale this cluster up to meet demand, and a set of failures similar to those that occurred during the first outage followed.

Figure: Elevated query latency for Payments cluster

In this instance, the servers that power our logged-out experience were also affected. As these servers became overwhelmed, they were unable to serve new traffic and were ultimately marked by our load balancer as unhealthy and removed from its pool, causing coinbase.com to become unavailable to users who were logged out or who were attempting to log in. Other impacted functionality included the ability to buy, sell, and trade in both Coinbase’s retail application and Coinbase Pro.
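
The load balancer behavior described above can be sketched with a toy model. This is a simplified illustration, not a description of the actual load balancer or its configuration: the `Pool` and `Server` classes, the threshold of three consecutive failed checks, and the even split of traffic are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Server:
    name: str
    consecutive_failures: int = 0
    healthy: bool = True


class Pool:
    """Toy load balancer pool; names and thresholds are hypothetical."""

    def __init__(self, names: List[str], unhealthy_threshold: int = 3) -> None:
        self.servers: Dict[str, Server] = {n: Server(n) for n in names}
        self.unhealthy_threshold = unhealthy_threshold

    def record_check(self, name: str, ok: bool) -> None:
        """Record one health check result for a server."""
        server = self.servers[name]
        if ok:
            # Simplification: a single passing check restores the server.
            server.consecutive_failures = 0
            server.healthy = True
        else:
            server.consecutive_failures += 1
            if server.consecutive_failures >= self.unhealthy_threshold:
                server.healthy = False  # removed from rotation

    def healthy_count(self) -> int:
        return sum(1 for s in self.servers.values() if s.healthy)

    def load_per_server(self, total_rps: float) -> Optional[float]:
        """Traffic each healthy server receives, assuming an even split.
        Returns None when no healthy servers remain (site unavailable)."""
        count = self.healthy_count()
        if count == 0:
            return None
        return total_rps / count
```

In this model, each server that is removed pushes its share of traffic onto the remaining ones, so an already overloaded pool can shrink quickly until no healthy servers are left to serve requests.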

At 2:32 pm PT, our services returned to normal operation.

Resolution & Improvements

The first outage was resolved once caching changes were deployed, the rewards database was scaled up, and additional replicas became available. Afterwards, our system was able to resume normal operation.

To resolve the second outage, we upgraded the under-capacity payments cluster to a larger instance size and introduced additional read-only replicas.
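
The post does not detail what the caching changes were. As a general illustration of how caching reduces read load on an overloaded datastore, here is a minimal sketch of a read-through cache with a time-to-live. The `ReadThroughCache` class, the `loader` callback, and the TTL value are hypothetical and stand in for whatever was actually deployed.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class ReadThroughCache:
    """Serve repeated reads from memory and only call the datastore
    (the hypothetical `loader`) on a miss or after the entry expires."""

    def __init__(
        self,
        loader: Callable[[Hashable], Any],
        ttl_s: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._loader = loader
        self._ttl_s = ttl_s
        self._clock = clock
        # key -> (expiry time, cached value)
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def get(self, key: Hashable) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit: no datastore query
        value = self._loader(key)  # cache miss: one datastore query
        self._entries[key] = (now + self._ttl_s, value)
        return value
```

With a TTL of 30 seconds, for example, any number of reads for the same key within that window would result in a single query to the datastore, at the cost of serving data that may be up to 30 seconds stale.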

To prevent similar issues in the future, we are taking several additional actions:

  1. Reorganizing our largest services: we will continue to shard and isolate our largest services to avoid hitting limits like those mentioned previously.
  2. Enhanced load testing: we’re enhancing our load testing framework to be more representative of new traffic patterns that we saw during this event.
  3. Additional scaling: we are further scaling several of our databases that we observed operating close to limits at Wednesday’s elevated traffic levels.

We take the uptime and performance of our infrastructure very seriously, and we’re working hard to support the millions of customers who choose Coinbase to manage their cryptocurrency. If you’re interested in solving scaling challenges like those presented here, come work with us.


Incident Post Mortem: October 27, 2021 was originally published in The Coinbase Blog on Medium.

Author: Coinbase

