We all know that data growth is exploding in every data center around the world. This isn't news. Everyone loves to show charts of how big data is, how much it's growing, and how much bigger it will be in five years.
I’m not going to show you a chart.
What I do want to talk about is the challenge this growth poses to your business: your storage budget is not growing at the same exponential rate at which your applications generate data.
Despite your IT team's efforts to avoid silos, the reality is that your data lives in many locations. You have copies of data in places you might not even know about. Cloud hasn't helped this problem at all; it just means you have more storage in even more places.
Now, let’s talk about backups and archives.
How many of these do you have spread across your data centers? Full backups, incremental backups, offsite copies, long-term archives? It adds up quickly.
How many copies is that?
How do you analyze all that data?
How much data do you have on tape backups that you can’t analyze at all?
How long does it take to move all that data around? Long enough that you are starting to suspect it takes longer to move the data than it does to actually run the analytics against it, particularly if you're trying to do real-time analytics with something like Apache Spark. What if there was a better way?
Start with a central repository.
Store all your data in a highly durable, extremely efficient archive. Naturally, we think HGST makes the best one: our vertical innovation gives us industry-leading storage density and the lowest power utilization, so you get the most bang for your buck in today's data center.
Get your data all in one place.
There are many ways to consolidate your existing storage footprint of tapes, backup appliances, optical media, old project folders, and all the other places data is hiding away, using backup and archive solutions. Then get all that data into one place: the HGST Active Archive System.
Analyze all your data in one place.
Take your Hadoop or Spark analytics clusters and point them at our simple storage solution, using the s3a connector. This is a fully supported part of the Apache Hadoop project. HGST has been very involved in optimizing this code, and we have done extensive testing with the Active Archive System.
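As a rough sketch, pointing a Hadoop or Spark cluster at an S3-compatible object store via the s3a connector comes down to a few filesystem properties, typically set in core-site.xml (or passed as Spark configuration with the spark.hadoop. prefix). The endpoint, bucket, and credentials below are placeholders, not real values for any particular deployment:

```xml
<!-- Sketch of core-site.xml settings for reading from an S3-compatible
     object store through the Hadoop s3a connector.
     The endpoint and credentials are placeholders for your own system. -->
<configuration>
  <property>
    <name>fs.s3a.endpoint</name>
    <value>archive.example.internal</value>
  </property>
  <property>
    <name>fs.s3a.access.key</name>
    <value>YOUR_ACCESS_KEY</value>
  </property>
  <property>
    <name>fs.s3a.secret.key</name>
    <value>YOUR_SECRET_KEY</value>
  </property>
  <property>
    <!-- Many private object stores require path-style addressing
         rather than virtual-hosted bucket names -->
    <name>fs.s3a.path.style.access</name>
    <value>true</value>
  </property>
</configuration>
```

With those properties in place, jobs can address archived data directly with an s3a:// URI (for example, a hypothetical `spark.read.text("s3a://archive-bucket/test-data/")`), and the cluster reads from the archive much as it would from HDFS, with no staging copy in between.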
One of the first places our solution has been put into use is right here at HGST. For every device we produce in our manufacturing facilities around the world, we generate a set of test data. Historically, that data sat on local storage in the manufacturing plant, and after a few months it was cycled off to tape.
That data is useful, but it sits in a format that makes it expensive to access and impossible to analyze across large data sets.
The test data now streams onto our own HGST Active Archive System, so our internal engineering and quality teams have immediate access when investigating issues with our devices. All of that historical data can now be analyzed, improving our manufacturing processes, speeding our design work, and increasing the overall quality of our products.
Better data, all in one place = better analysis for better products.
I’ll be talking about how to keep your data active at the ISC Cloud & Big Data Conference in Frankfurt, Germany, on September 30th. If you are there, let’s meet up.