For most decision makers in large enterprises these days, “Big Data” is one of the leading topics on the executive agenda and a key driver of technology investment. However, adopting Big Data technology in any given environment remains a big challenge. The multitude of definitions, such as Gartner’s 3V definition, adds complexity to the discussion, as does the range of available open-source technologies and vendors or service providers, such as the Apache Hadoop framework, NoSQL databases, and Big Data platform providers.
Defining this Trough of Disillusionment
Enterprises are feeling the pressure to do “something” with Big Data. A few organizations have figured it out and are creating breakthrough insights. However, a much larger set has perhaps reached the stage of installing, say, 10 Hadoop nodes and is now wondering, “Now what?”
Per Gartner, this is the phase where excitement over the latest technology gives way to confusion or ambiguity – referred to as the “Trough of Disillusionment.”
Data Democracy – The Foundation for Big Data
Use cases involving Analytics or Data Mining with an integrated Social Media component are being thrown at enterprise executives. These use cases appear “cool” and compelling up front, but thorough analysis reveals that, from an implementation perspective, they miss necessary considerations such as Data/Info Security, Privacy regulations, and Data Lineage. In addition, they fail to build a compelling ROI case.
For any “cool” use case to generate eventual ROI, it is very important to focus on Big Data Integration, i.e. Access, Preparation, and Availability of the data (see “Firms must not overlook importance of big data integration”). Doing so essentially empowers enterprises to implement ANY use case that makes the most sense for their particular business.
“Data Democracy” should be the focus. This focus also helps address the technology challenge of handling ever-growing enterprise data efficiently, and it leverages the scalable and cost-effective nature of these technologies for an instant ROI!
Concept to Realization – Real Issues
Once this is understood, the next step is to figure out how to introduce these new technologies to achieve the above goals in the least disruptive and most cost-effective way. In fact, enterprises are looking at ETL as a standard use case for Big Data technologies like Hadoop. Using Hadoop as a Data Integration or ETL platform requires developing Data Integration applications with a programming model such as MapReduce, typically written in Java. This presents a new challenge: combining Java skill sets with expertise in ETL design and implementation. Most ETL designers do not have Java skills, as they are used to working in a tool environment, and most Java developers do not have experience in handling large volumes of data. The result is massive overhead in training, maintenance, and “firefighting” coding issues. This can cause massive delays and soak up valuable resources, only for organizations to solve half the problem.
Moreover, even after investing in hardware and in skill sets like MapReduce, development teams would be forced to rewrite their applications to take advantage of advances when the underlying technology platforms inevitably evolve.
Concept to Realization – a Possibility?
Yes, it is. One of the key criteria for any data integration development environment on Hadoop is code abstraction: users should be able to specify the data integration logic as a series of transformations chained together in a directed acyclic graph. Such a graph models how users think about data movement, which makes it significantly simpler to comprehend and change than a series of MapReduce programs.
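To make the idea of code abstraction concrete, here is a minimal sketch in Python of a data integration flow declared as a directed acyclic graph of transformations. The step names, the sample records, and the run_pipeline helper are hypothetical and chosen only for illustration; a real DI tool would offer a far richer model and would hand execution to a cluster rather than run in a single process.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


# Hypothetical transformations. Each receives a dict of upstream results,
# keyed by the name of the upstream step.
def read_orders(inputs):
    # Stand-in for a real source such as files on a cluster.
    return [
        {"customer": "a", "amount": 10.0},
        {"customer": "b", "amount": -5.0},
        {"customer": "a", "amount": 7.5},
    ]


def drop_invalid(inputs):
    # Keep only rows with a positive amount.
    return [row for row in inputs["read_orders"] if row["amount"] > 0]


def total_by_customer(inputs):
    # Aggregate the cleaned rows per customer.
    totals = {}
    for row in inputs["drop_invalid"]:
        totals[row["customer"]] = totals.get(row["customer"], 0.0) + row["amount"]
    return totals


# The pipeline is pure data: step name -> (function, upstream step names).
PIPELINE = {
    "read_orders": (read_orders, []),
    "drop_invalid": (drop_invalid, ["read_orders"]),
    "total_by_customer": (total_by_customer, ["drop_invalid"]),
}


def run_pipeline(pipeline):
    """Run every step after its upstream steps, in topological order."""
    graph = {name: set(deps) for name, (_, deps) in pipeline.items()}
    results = {}
    for name in TopologicalSorter(graph).static_order():
        func, deps = pipeline[name]
        results[name] = func({dep: results[dep] for dep in deps})
    return results


if __name__ == "__main__":
    print(run_pipeline(PIPELINE)["total_by_customer"])
```

Because the flow is declared as data, adding or rewiring a step means editing one entry in PIPELINE rather than rewriting job code.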
Another important feature to look out for is technology insulation: provisions in the design to replace the run-time environment, such as Hadoop, with whatever technology becomes prevalent in the future.
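One way to picture technology insulation is a job that talks only to an abstract run-time interface. The sketch below is an assumption-laden illustration in Python: Runtime, LocalRuntime, ThreadedRuntime, normalize, and run_job are hypothetical names, and a thread pool merely stands in for a distributed engine such as Hadoop.

```python
from abc import ABC, abstractmethod
from concurrent.futures import ThreadPoolExecutor


class Runtime(ABC):
    """Hypothetical execution back end; jobs never name a concrete engine."""

    @abstractmethod
    def map_records(self, func, records):
        """Apply func to every record and return the results in order."""


class LocalRuntime(Runtime):
    # Runs everything in the calling thread.
    def map_records(self, func, records):
        return [func(record) for record in records]


class ThreadedRuntime(Runtime):
    # Stand-in for a different engine: same interface, different execution.
    def __init__(self, workers=4):
        self.workers = workers

    def map_records(self, func, records):
        with ThreadPoolExecutor(max_workers=self.workers) as pool:
            # Executor.map yields results in input order.
            return list(pool.map(func, records))


def normalize(record):
    # The transformation logic knows nothing about where it runs.
    return {"name": record["name"].strip().lower()}


def run_job(runtime, records):
    # Swapping the runtime requires no change to the job logic.
    return runtime.map_records(normalize, records)


if __name__ == "__main__":
    data = [{"name": " Alice "}, {"name": "BOB"}]
    print(run_job(LocalRuntime(), data))
    print(run_job(ThreadedRuntime(), data))
```

The transformation is written once; only the object passed to run_job changes when the execution technology changes.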
The “3 V’s” in Big Data implementations are well defined – Volume, Variety, and Velocity – and relatively quantifiable. We should begin to define a 4th ‘V’, for “Value.” The fourth is equally important, or more important in some cases, but less tangible and less quantifiable.
Having said that, jumping off a diving board into a pool of Big Data doesn’t have to be a lonely job. The recommended approach is to seek help from Big Data experts like BitWise to assess whether you really need Big Data. If yes, which business areas will you target for the first use case, and which DI platform will you use? And lastly, how will you calculate the ROI of the Big Data initiative?