As originally published in Forbes on October 25, 2011.
Pairing Big Data with business intelligence (“BI” or analytics) is revolutionizing the business world in profound ways and at a dizzying pace. In terms of pure volume, the latest IDC Digital Universe Study reports that more than 1.8 zettabytes—10 to the power of 21 bytes—of information will be created and stored in 2011 alone. McKinsey projects 40 percent annual growth in global data beginning this year, compared to a mere five percent annual growth in global information technology spending.
The vast majority of this growth is occurring in the realm of unstructured data, i.e., data that does not fit neatly into the relational tables of traditional databases. As a result, corporations of all sizes must meet the challenges posed by the growth of data such as electronic mail, video, images, and scientific research (e.g., medical records in various forms), to name a few. McKinsey’s Global Institute notes that “[e]ach second of high-definition video . . . generates more than 2,000 times as many bytes as required to store a single page of text. In a digitized world, consumers going about their day—communicating, browsing, buying, sharing, searching—create their own enormous trails of data.” It is estimated that by 2020, the amount of data will have grown 42-fold from 2009 levels.
While this data explosion cannot be ignored, the real story lies in how corporations can store, govern, and manage their data in order to leverage it as an asset. In particular, corporations are virtualizing—that is, replicating and recombining—previously siloed structured and unstructured data, then applying business intelligence analytics to extract increasingly valuable insight from their information. Turning Big Data into actionable data is made possible by at least two key factors: (1) cloud computing, which scales in processing power, thereby allowing (2) the application of business intelligence, which works most efficiently when integrated with the cloud.
In order to understand these issues more clearly, I recently spoke at length with Jack Domme, CEO of Hitachi Data Systems (“HDS”), who shared his expertise on both the data storage market and virtualization, the cornerstone of HDS’s approach to Big Data. By way of introduction, HDS’s data storage systems employ a three-tiered strategy. First, HDS has what it calls an Infrastructure Cloud, which serves as a platform for data center convergence, including integrated data management. Second, it has a Content Cloud to leverage content virtualization, as well as search, discovery, and the repurposing of content of every sort. And third, an Information Cloud where virtualized information is integrated with business intelligence to allow corporations to leverage their data assets to achieve tactical and strategic goals. In the Information Cloud, the rubber hits the road: Big Data meets BI. At this intersection, HDS’s architecture accomplishes at least three critical goals:
- It untethers data from its native applications;
- BI-driven analysis occurs in a cloud-based data warehouse; and
- Memory storage is optimized.
The substance of my conversation with Mr. Domme follows below. My questions are in bold.
Jack, can you give me an overview of the data storage market and how HDS fits into it?
From Hitachi’s perspective, we’re very excited about our success and about Big Data. We see that the amount of data that’s out there being produced, shared, and consumed is growing at rates that we would never have imagined just a few years ago. A lot of the data that is being created is of a different type—unstructured data. That is not your typical structured data that sits in an Oracle database or a spreadsheet. We decided many years ago that IT groups were not really ready for this onslaught of data, which is being created by all kinds of applications. Shared data. Unstructured data in the health and life sciences industry around imagery, lab tests, and CT scans. Unstructured data around electronic discovery documents. This kind of data is growing at ten times the rate of structured data. HDS believes that this amount of data is going to need a lot of scale to handle very large growth. And that growth has a cumulative effect.
The industry focus on Big Data has been about analyzing spreadsheets, P&L, and transactional data. But now there’s also an industry focused on unstructured data. For example, how do I analyze one million CT scans? Lab tests? Voice recordings from a doctor in a different department? Records from another health care provider? The question is how to analyze this data independent of the application in which it was created. And this is a global problem. HDS has the solutions to analyze this data independent of native applications—a massive data warehouse in the cloud where our customers can search all of that data and bring it together in what we call a virtual container. A virtual portal of your information at a scale beyond what has been architected in the past. Previous architectures could not even dream about handling data of this scale.
When you untether data from its native applications, do you have to convert it into a language that’s agnostic with respect to your analytical tools?
Yes, that’s XML—extensible markup language. We convert everything to XML, but then you may have to repurpose it on the way back out because XML is just a wrapper around the type of data. All you can do is put an XML header on the data, but you don’t know what’s actually inside of that CT scan or video, and this is where we at Hitachi are bringing in more of our analytics groups—for example, our energy group, where we have deep expertise. We have software that can search for facial recognition and other human patterns. We have a very large medical business that creates medical devices such as CT scanners.
So the world now has to ask how we can scan not only XML headers, but also what’s contained in the video. What can we learn from the content of a brain scan or a test on a pancreas? Is something really wrong with that CT scan? So we believe that the future is the bridging of Big Data and this type of analytics that can dive into the unstructured data world, which is growing much faster than the structured data world. It’s going to take very specific knowledge for someone to say that they can discover all the relevant scans and then analyze what’s inside. This is not a typical search of just XML headers. This might be 100 brain scans with views that match your own brain scan in a way that allows doctors to proceed in an informed manner. We believe that when you look at the data and its analysis, there’s an entirely new set of data around that, which is the iterative effect of Big Data. Analysis becomes interpretive, iterative, and decisive as a result of all this data.
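The idea of wrapping an opaque file in an XML header so it can be indexed and searched independent of its native application can be sketched in a few lines. This is a minimal illustration only; the element names, fields, and search logic below are hypothetical, not HDS’s actual schema or software.

```python
import xml.etree.ElementTree as ET

def wrap_with_metadata(filename, content_type, source_app, keywords):
    """Build an XML 'wrapper' describing an opaque file so it can be
    indexed and searched independent of the application that created it.
    (All element names here are illustrative, not a real HDS schema.)"""
    root = ET.Element("object", attrib={"name": filename})
    ET.SubElement(root, "content-type").text = content_type
    ET.SubElement(root, "source-application").text = source_app
    kw = ET.SubElement(root, "keywords")
    for word in keywords:
        ET.SubElement(kw, "keyword").text = word
    return ET.tostring(root, encoding="unicode")

def matches(xml_header, term):
    """Search only the XML header. Note that this never looks inside the
    underlying scan or video itself -- exactly the limitation Domme
    describes, which deeper content analytics would have to overcome."""
    root = ET.fromstring(xml_header)
    return any(term == k.text for k in root.iter("keyword"))

header = wrap_with_metadata("scan_0417.dcm", "image/dicom",
                            "radiology-pacs", ["brain", "ct-scan"])
print(matches(header, "brain"))     # header-level search finds the tag
print(matches(header, "pancreas"))  # but only tags, not the image content
```

The point of the sketch is the gap it exposes: a header search can find every file tagged “brain,” but deciding whether something is really wrong with that CT scan requires analytics that reach inside the unstructured content itself.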
How does your acquisition of BlueArc further these goals for Hitachi Data Systems?
As unstructured data grows, we needed a great platform and solution for people to be able to manage the growth of files—PPT, Word, scans, photos, videos, and so forth. We have a content platform that takes those files not only from BlueArc, but from all our content repositories, and we index all the actual content. We’re ingesting files from different native applications and allowing you to search them independent of the applications in which they were created. This allows us to go into serious data mining.
The other aspect is that this data is going to outlive you and me. It will definitely outlive the media on which it is stored and the application that created it. The IT world has to figure out how to govern all this data. In five years, that format or that application may not be around. So in our Content Cloud we’re talking about the governance of data throughout the data’s life, including changes in standards. We make sure that all that data is fresh in the sense that it has been migrated to new, appropriate media and to relevant formats. That’s true cloud governance.
How do you see your competitors responding to this acquisition and how long is the window of the competitive advantage it may give you?
This all started back a few years ago when virtualization became a really hot topic—sharing server resources. On our side of this world, we were concerned about data virtualization. We understand how powerful virtualization is in the server world, but it’s far more powerful in the cloud. We virtualize all of the data and storage behind applications. An application has no idea where data is actually stored. I can now place data anywhere in the cloud where we or our customers feel that it is optimized. This changed everything. If a file hasn’t been accessed in five days, we can move it to less expensive media. And if you don’t touch it again for another 10 days, we can move it again to further optimize resources.
What we’ve done at HDS that differentiates us from everyone else is that we can effectively handle all applications and virtualize all their data in the same cloud. Our competitors can only do this within individual boxes, which doesn’t allow for the same level of analysis. We can also virtualize even across our competitors’ storage systems. So a customer might have an EMC or IBM storage system, and our solutions can virtualize even across the data in those systems. That’s a huge advantage for us.
When you speak of storage optimization, is that a human decision or is it technology-based?
It’s both. We see the actual content of the file. Also, for example, who has accessed the file. Data becomes static quickly. But our customers want to retrieve it. Let me give you an example of how this works in the health care world. You might generate data during a hospital stay—say, 25 files. The hospital can analyze access rates and move files accordingly. So we will dynamically tier the data and move it down to a less expensive media, as we discussed. Then, should you check back into the hospital again a year later, your data, which is still accessible and the content of which is searchable, can be moved up to higher levels as necessary.
Email offers an excellent example of how tiered storage architectures lower costs. Not all email data has the same value. Typically, only relatively recent messages have the high-end requirements associated with top-of-the-line storage systems. As messages age and are archived, access frequency declines and needs change. Storing all email data on high performance, highly available storage is wasteful.
In a tiered storage environment, the IT administrator can establish policies where only new email messages are stored on tier one, high-end, enterprise-class storage. Older email messages can be hosted on tier two storage, offering performance and availability slightly below tier one at a much lower cost.
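The age-based policy described above—new messages on tier one, older ones migrated down as access frequency declines—amounts to a simple rule table mapping idle time to a storage tier. The sketch below is illustrative only: the tier names and the 5- and 15-day thresholds (echoing the earlier “five days, then another 10 days” example) are assumptions, not an actual HDS policy.

```python
from datetime import date, timedelta

# Illustrative policy: maximum days since last access -> storage tier.
# Real thresholds would be set per customer by the IT administrator.
TIER_POLICY = [
    (5,  "tier-1 (high-end enterprise storage)"),
    (15, "tier-2 (lower-cost, lower-performance)"),
]
ARCHIVE_TIER = "tier-3 (archival media)"

def assign_tier(last_accessed, today=None):
    """Return the tier a file belongs on, given its last access date."""
    today = today or date.today()
    idle_days = (today - last_accessed).days
    for max_idle, tier in TIER_POLICY:
        if idle_days <= max_idle:
            return tier
    return ARCHIVE_TIER  # untouched beyond every threshold: migrate down

today = date(2011, 10, 25)
print(assign_tier(today - timedelta(days=2), today))   # recent mail stays on tier-1
print(assign_tier(today - timedelta(days=30), today))  # stale mail drops to archive
```

A file promoted back up works the same way in reverse: a fresh access resets its idle time, so the next policy pass assigns it a higher tier, which matches the hospital example of a returning patient’s records being moved up as needed.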
What’s really interesting is that energy and power consumption is becoming a huge issue for customers. Growing data not only takes up space, it takes up power. So if a customer can take 95% of its data and move it down to optimal levels of storage that require less power, that’s a big win not only for our customers, but also for the planet.
Your income growth rate is outpacing competitors like EMC and IBM and is currently twice the growth rate of the storage market. From an earnings perspective, it seems as though IBM, NetApp, and HP are within reach. What’s your take on this?
From my perspective, our industry never thought that an infrastructure play would be talked about in terms of content. When we look at competitors’ earnings, part of what we look at is acquisitions. Does an acquisition automatically bring an integrated system to the market? The answer is obviously no. When you look at our competitors, most of their growth rate last quarter was attributable to acquisitions, yet they haven’t brought an integrated system to market.
We believe that our growth rate over the past two to three years has been mostly organic. And our growth is far outpacing IBM, EMC, HP, and now recently NetApp. The reason for this is that virtualization is a very difficult technology to master. The fact that our competitors cannot virtualize outside the box is a huge advantage for us. If you cannot integrate across systems, you cannot call yourself a truly integrated cloud. Our base foundation around virtualization of Big Data has been tried around the world under the most pressing conditions and across applications and transactional systems. I have an example of a customer that virtualized its infrastructure and didn’t have to buy any storage from us. Because we have an agnostic platform, they analyzed the data stored in our competitors’ systems and saved the costs of any new storage from HDS over a period of four years, which is an effective use of resources. It’s also a perfect example of the manner in which corporations protect the value of their assets. I just don’t think that our competitors have caught up to this and it’s a foundational advantage for us. Customers look for savings. Tested virtualization. Reductions in capitalization and operational spend.
With respect to your comments about earnings from acquisitions, isn’t it ironic that HDS itself recently announced its acquisition of BlueArc?
We have been partners with BlueArc for five years. They were integrated into our strategy. It was almost a no-brainer. We bought them for leverage purposes because they integrated into our system very well. This was a technology decision on a company that we had already made a substantial bet on. For us, it made perfect sense to acquire them because it represented organic growth given our relationship with them. David Hill, an analyst who writes for Network Computing, said that this acquisition reaffirms our storage strategy around a broad-based view of what virtualization can do in the cloud. He’s exactly right.
A lot of acquisitions exist to enhance revenue rather than to integrate solutions. It’s not easy to create an integrated solution unless you already have very close relationships with key partners.
We’ve bet again on the unstructured data world just as we did when we acquired Archivas five years ago. We know that the unstructured data world needs to be governed in a cloud architecture with analytics. You can imagine how excited we are at HDS because now we can manage that unstructured data, index its content, and now with our other global units, we have the unique expertise to be able to analyze our customers’ data and decipher business decisions out of that.
We have a two-fold view of the next five years. We have a very nice infrastructure solution through virtualization. It saves our customers money. We guarantee it. We’re outpacing the infrastructure market by 2x. On the flip side, we have a largely untapped market built around unstructured data.
HDS is a subsidiary of Hitachi (NYSE: HIT), which has a market cap of $240 billion. How has this relationship helped you compete in this very competitive market?
Hitachi is a really powerful company when you understand what it can bring to the table. Way back when, we at HDS wanted to globalize and we did that very successfully. We have R&D at HDS working closely with its counterpart at Hitachi.
Under the leadership of Mr. [Hiroaki] Nakanishi, Hitachi is globalizing all of the great technologies that we have. We have a vision that goes across Hitachi, and in this information space, HDS can outperform its competitors because they don’t have a medical division; they don’t build MRIs; they don’t build video cameras; they don’t have an energy company; and they haven’t built transportation systems such as the hybrid train for the 2012 London Olympics. We have decades of experience and expertise in all these areas. They don’t have a lot of the core competencies to go out and analyze Big Data. This takes a very specific vocabulary and vernacular and it takes decades to understand. From a vertical perspective, Hitachi has the infrastructure, the data untethered from its native applications, and the analytics. We can outdo our competitors because the game is changing dramatically around Big Data. It’s one thing to say you have nice databases. It’s another thing altogether to be able to use that data to make actionable business decisions.
I’m so excited because under Mr. Nakanishi, Hitachi wants to go to market as a unified company, not just in segments. We are finding a vision that is really bringing the power of all of our groups to bear, and Big Data is a focal point.