
A Blockbuster SQL

I’m not a fan of Black Friday. Lines, crowds, chaos. In fact this year, I avoided it all, played golf (poorly), then did most of my holiday shopping online in less than 4 hours. It seems I’m not alone.

This year, Black Friday and Cyber Monday set new records for online shopping with sales over $5 billion [1]. While this is small potatoes compared to the $9.3 billion [2] of November 11 “Singles Day” sales generated by Taobao (Alibaba’s online shopping site), it illustrates a significant shift in how we shop.
Just for fun, I looked up a few ecommerce statistics at www.selz.com [3]:

  • Ecommerce is expected to grow 20% to $1.5 trillion in 2014
  • 80% of the Internet population purchased online in 2014
  • There are 191.1 million US online buyers
  • Yet, only 28% of US small businesses are selling their products online

Clouds with a SQL Lining

One of the things that online retailers like Amazon.com, eBay, Zappos, Walmart, China’s Alibaba/Taobao, and India’s largest online retailer FlipKart have in common is the use of MySQL database technology. Oracle calls it “the world’s most popular open source database technology”. In fact, as of March 2014, MySQL had 16 million installations, which grew the entire operational database market by almost 50% [4].

MySQL and its variants power these online retail environments thanks to a scalable architecture that suits online shops, recommendation engines, content management, billing applications, data warehousing and analytics.

So what does this have to do with HGST, you may ask? Isn’t all of the traffic that visits online retailers just transactional web server activity? Why should we care?

Let me try to explain without going all geek here.

 

Sharding – the Secret Sauce to Scaling

MySQL and many other scale-out databases use a technique called ‘sharding’. Sharding partitions a database horizontally, so that the rows of a table are held separately on different physical servers. The MySQL deployment (typically the application or a sharding layer in front of the servers) then keeps track of which shard lives on which physical server as it ‘scales out’. Sharding allows for performance and growth: as the database gets bigger, you simply buy more servers.

Let’s dig a little deeper.

A traditional database will keep all information in a table like this:

 

[Figure: Traditional database table]

 

Obviously, this is a tiny table, and any modern database can easily handle this workload, but what happens when you are Alibaba’s Taobao and have 8 million merchants [5] and more than 618 million users [6]? This is where sharding comes in.

The illustration below shows how sharding removes load from a single database server.

 

[Figure: How sharding moves load from a single database server to multiple servers]

 

And it keeps growing, and growing, to hundreds or thousands of servers. Harnessing the power of all of these servers is what makes MySQL special. You can build the world’s largest web infrastructures with commodity hardware that might have previously required a supercomputer.
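For readers who do want to go a little geek, here is a minimal sketch of the routing idea behind sharding: take a row's key, hash it, and use the hash to pick a server. The server names, the `customer_id` key and the hash-modulo scheme are my assumptions for illustration. Real deployments often use lookup tables or range-based schemes instead, and none of this is MySQL's own API.

```python
import hashlib

# Hypothetical server names, for illustration only.
SHARD_SERVERS = ["db-shard-0", "db-shard-1", "db-shard-2", "db-shard-3"]


def shard_for(customer_id, servers=SHARD_SERVERS):
    """Return the name of the server that owns the row for this key."""
    # A stable hash means the same key always lands on the same server,
    # no matter which process or machine does the lookup.
    digest = hashlib.md5(str(customer_id).encode("utf-8")).hexdigest()
    return servers[int(digest, 16) % len(servers)]


def group_by_shard(customer_ids, servers=SHARD_SERVERS):
    """Split a collection of keys into one batch per server."""
    batches = {name: [] for name in servers}
    for cid in customer_ids:
        batches[shard_for(cid, servers)].append(cid)
    return batches


if __name__ == "__main__":
    # Show how twenty customer rows spread across the four servers.
    for server, ids in group_by_shard(range(1, 21)).items():
        print(server, ids)
```

One caveat with this simple modulo approach: adding a server changes where most keys map, so existing rows have to be moved. That is one reason larger sites tend to put a dedicated directory or routing layer in front of their shards.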

Not all Smooth SQLing

As you scale out across lots of servers, new problems suddenly appear, such as hardware failures and the need to protect data.

To address data protection, a typical MySQL deployment uses Master/Slave replication. Here the Master server for each shard serves write traffic, while an identical twin server, the Slave, acts as an asynchronous replication target that can be promoted when the Master fails. Slaves also handle read traffic to balance overall performance.
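The Master/Slave arrangement described above can be sketched as a toy in-memory model. This is not a real MySQL client: the `Server` and `ShardPair` classes and their methods are hypothetical names I made up to show the flow of writes, asynchronous replication, reads and failover.

```python
class Server:
    """A toy in-memory stand-in for one database server."""

    def __init__(self, name):
        self.name = name
        self.rows = {}


class ShardPair:
    """One shard: a Master for writes plus a Slave fed asynchronously."""

    def __init__(self, name):
        self.master = Server(name + "-master")
        self.slave = Server(name + "-slave")
        self.pending = []  # writes the Slave has not applied yet

    def write(self, key, value):
        # Writes always go to the Master; the Slave catches up later.
        self.master.rows[key] = value
        self.pending.append((key, value))

    def replicate(self):
        # Apply the queued writes to the Slave in order (the async step).
        for key, value in self.pending:
            self.slave.rows[key] = value
        self.pending.clear()

    def read(self, key):
        # Reads are served by the Slave, so they may lag behind the Master.
        return self.slave.rows.get(key)

    def fail_over(self):
        # Promote the Slave to Master. Writes still waiting in `pending`
        # never reached the Slave, so they are lost.
        lost = len(self.pending)
        self.pending.clear()
        self.master = self.slave
        # Seed a fresh Slave with a copy of the new Master's data.
        self.slave = Server("replacement-slave")
        self.slave.rows = dict(self.master.rows)
        return lost


if __name__ == "__main__":
    pair = ShardPair("shard0")
    pair.write("order:1", "shoes")
    pair.replicate()
    pair.write("order:2", "hat")  # not replicated before the failure
    print("writes lost on failover:", pair.fail_over())
    print("order:1 ->", pair.read("order:1"))
```

The sketch makes the trade-off visible: because replication is asynchronous, a write that has not yet reached the Slave disappears if the Master fails at the wrong moment.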

The key takeaway is that most sharded MySQL servers are deployed in Master/Slave pairs, so every shard you add means two more servers, and the total server count is double the number of shards. When you consider the exponential nature of today’s data and ecommerce growth, we have to question whether this model is sustainable.
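The arithmetic is simple enough to write down. The function below is a back-of-the-envelope sketch; the name `servers_needed` and the option of more than one Slave per Master are my own assumptions, not figures from any particular deployment.

```python
def servers_needed(shards, slaves_per_master=1):
    """Total servers when every shard has one Master plus its Slaves."""
    return shards * (1 + slaves_per_master)


if __name__ == "__main__":
    for shards in (10, 100, 1000):
        print(shards, "shards ->", servers_needed(shards), "servers")
```

With one Slave per Master, 100 shards means 200 servers, and adding a second Slave per shard for extra read capacity brings that to 300.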

Of course there are solutions to make MySQL deployment extremely sustainable. You might even say there are pools of solutions. Stay tuned for my next blog on 12/16/2014 as I elaborate on our new hardware/software solution called HGST Flash Pools for MySQL. Flash Pools utilizes HGST Virident Space software and FlashMAX SSDs for clustering and volume management when scaling out MySQL environments.

Until then, happy shopping to you!


 

[1] http://fortune.com/2014/11/10/ecommerce-predictions-adobe/

[2] http://www.reuters.com/article/2014/11/11/us-china-singles-day-idUSKCN0IV0BD20141111

[3] https://selz.com/blog/40-online-shopping-ecommerce-statistics-know/

[4] https://www.scalebase.com/the-state-of-the-open-source-database-market-mysql-leads-the-way/

[5] http://blogs.wsj.com/digits/2014/05/06/quick-hit-on-alibabas-taobaotmall/

[6] http://thenextweb.com/asia/2014/05/06/alibaba-files-ipo-618-million-users-5-55-billion-annual-revenue/

 

About Walter Hinton

Walter Hinton is Sr. Global Director for Marketing at Western Digital Corporation. He brings over 25 years of experience in storage and networking, including roles in Product Management at McDATA, Chief Strategist for StorageTek and CTO of ManagedStorage International.