The Key to Big Data Analytics: Flash Storage
Big Data Analytics Is in Major Demand
Big data analytics is in big demand. It appears that every organization is looking for new ways to gain insights from the vast amount of data that is available. In fact, according to a survey by Accenture, 87% of organizations expect big data to shift the competitive landscape in their particular industries within 3 years.
Oil and gas organizations, such as Royal Dutch Shell, are using big data analytics to improve drilling production and delivery. State governments such as Indiana's are using it to reduce infant mortality rates. It is also delivering dramatic improvements in healthcare and helping pharmaceutical companies revolutionize their R&D.
The Data Dilemma
Despite the drive to gain insights from data, Forrester estimates that organizations analyze only 12% of the data they already have. There are multiple reasons for this, including a shortage of data analysts, a lack of awareness about what analytics can do, and a lack of the compute resources needed to analyze all of the data they currently have.
The amount of data that organizations have to analyze is staggering. I am sure you have seen the statistic that 90% of the world’s data has been generated in just the last 2 years. The problem with that statistic is that it does not address the velocity of data growth today. And if estimates for the internet of things (IoT) are even close, the volume of new data will skyrocket. Gartner suggests that there will be 6.4 billion connected devices in 2016 and that the number will grow to 20.8 billion by 2020.
The challenge of working with this much data is probably best summed up by the big data initiatives of Coca-Cola. Their goal was to improve demand forecasting across their 5 major production facilities and 47 distribution centers. They produce about 18,000 cases of beverages every hour. With all of the data that they had to analyze, batch processing was taking hours longer than was acceptable.
To combat the storage bottleneck, some organizations turn to massive amounts of RAM. The problem with this solution is that a server can hold only a limited amount of RAM. That limitation is then overcome by building large server farms. As you can imagine, this is an incredibly expensive proposition and out of reach for most organizations.
This is where flash storage and analytics can become your big data hero. The storage challenge of big data has been threefold: to quickly analyze the vast amounts of data that big data environments create, then to automate workflows based on that analysis, and finally to engage with clients and customers quickly enough to realize the benefit of the analysis. Flash enables this process, so that businesses can act and, more importantly, respond quickly to their clients’ needs. As mentioned in my prior blog, MIT recently completed a test that found that 20 servers using 20TB of flash were just as fast as 40 servers using 10TB of RAM. The flash solution was also far cheaper and consumed less power. That is why Coca-Cola ultimately implemented a flash solution.
We are not suggesting that your project will require that kind of horsepower. After all, MIT was working with a 10TB dataset, which is well beyond most big data analytics projects today, and the current needs of most organizations are much more modest than that. The test does demonstrate, however, that you have options.
We recommend that you start by implementing a flash layer. IBM has repeatedly demonstrated that installing a flash layer that is only 5% of the total storage environment can improve performance by 3x. This is accomplished by keeping frequently accessed data in the high-speed flash storage.
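To make the idea of keeping frequently accessed data in flash more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not how any particular storage product works: the function names, the access log, and the latency figures (0.1 ms for flash, 5 ms for disk) are all hypothetical, and the result depends entirely on how skewed your access pattern is.

```python
from collections import Counter


def pick_hot_blocks(access_log, flash_fraction, total_blocks):
    """Choose which blocks to keep on flash (hypothetical helper).

    Picks the most frequently accessed block ids, up to the number of
    blocks that fit in a flash tier sized as a fraction of all storage.
    """
    capacity = max(1, int(total_blocks * flash_fraction))
    counts = Counter(access_log)
    return {block for block, _ in counts.most_common(capacity)}


def average_latency_ms(access_log, hot_blocks, flash_ms, disk_ms):
    """Average latency per access when hot blocks are served from flash."""
    hits = sum(1 for block in access_log if block in hot_blocks)
    misses = len(access_log) - hits
    return (hits * flash_ms + misses * disk_ms) / len(access_log)


# Assumed workload: 100 blocks in total, where 5 "hot" blocks receive
# 16 reads each and 20 other blocks receive a single read each.
access_log = [b for b in range(5) for _ in range(16)] + list(range(5, 25))

# Assumed latencies, for illustration only.
FLASH_MS = 0.1
DISK_MS = 5.0

hot = pick_hot_blocks(access_log, flash_fraction=0.05, total_blocks=100)
tiered = average_latency_ms(access_log, hot, FLASH_MS, DISK_MS)
disk_only = average_latency_ms(access_log, set(), FLASH_MS, DISK_MS)

print(f"disk only: {disk_only:.2f} ms per access")
print(f"5% flash tier: {tiered:.2f} ms per access")
print(f"speedup: {disk_only / tiered:.1f}x")
```

The point of the sketch is that a small flash tier pays off when a small share of the data receives most of the reads. With a flatter access pattern, the same 5% tier would help far less.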
This is just one of the many ways to provide additional business value to your organization. There are additional benefits to implementing flash storage beyond your big data analytics projects, which will be reviewed in this blog series.