Enterprises are feeling the pressure to do “something” with Big Data. A few organizations have figured it out and are creating breakthrough insights. However, a much larger set may have reached the stage of installing, say, 10 Hadoop nodes and are wondering, “now what?”
Per Gartner, this is the phase where excitement over the latest technology gives way to confusion or ambiguity – referred to as the “Trough of Disillusionment.”
One needs to realize that for any “cool” use case to generate eventual ROI, it is very important to focus on Big Data Integration (i.e., access, preparation, and availability of the data – firms must not overlook the importance of Big Data integration). Doing so essentially empowers enterprises to implement ANY use case that makes the most sense for their particular business.
“Data Democracy” should be the focus. This focus also helps address the technology challenge of handling ever-growing enterprise data efficiently, leverages the scalable and cost-effective nature of these technologies, and delivers an instant ROI.
Once this is understood, the next step is to introduce these new technologies in the least disruptive and most cost-effective way. In fact, enterprises are looking at ETL as a standard use case for Big Data technologies like Hadoop. Using Hadoop as a Data Integration or ETL platform requires developing Data Integration applications with programming frameworks such as MapReduce. This presents a new challenge: combining Java skillsets with expertise in ETL design and implementation. Most ETL designers do not have Java skills, as they are used to working in a tool environment, and most Java developers do not have experience handling large volumes of data. The result is massive overhead in training, maintenance, and “firefighting” coding issues, which can cause significant delays and soak up valuable resources while solving only half the problem.
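To see why hand-coded MapReduce feels like overhead to an ETL designer, here is a minimal sketch in plain Python of what even a trivial per-key aggregation (total order amount per customer) looks like when expressed as explicit map, shuffle, and reduce steps. The record layout and function names are illustrative assumptions, not Hadoop’s actual Java API – the point is the amount of plumbing an ETL tool would normally hide.

```python
from collections import defaultdict

# Illustrative input: (customer_id, order_amount) records.
records = [("c1", 100), ("c2", 40), ("c1", 60), ("c3", 25), ("c2", 10)]

def map_phase(record):
    """Emit a (key, value) pair for each input record."""
    customer, amount = record
    yield customer, amount

def shuffle(pairs):
    """Group values by key, as the framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Aggregate all values observed for one key."""
    return key, sum(values)

mapped = [pair for record in records for pair in map_phase(record)]
totals = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(totals)  # {'c1': 160, 'c2': 50, 'c3': 25}
```

In a graphical ETL tool this entire job would be a single “aggregate” component; in MapReduce, every job repeats this boilerplate, which is where the training and maintenance cost comes from.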
Moreover, even after the investment in hardware and skillsets such as MapReduce is made, the underlying technology platforms will inevitably advance, and development teams will be forced to rewrite their applications to leverage those advancements.
Yes, it is. One of the key criteria for any data integration development environment on Hadoop is code abstraction: it should allow users to specify the data integration logic as a series of transformations chained together in a directed acyclic graph. This models how users think about data movement, making the logic significantly simpler to comprehend and change than a series of MapReduce scripts.
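The abstraction described above can be sketched in a few lines. This is a hypothetical `Pipeline` class (not any vendor’s API) showing a linear chain of named transformations – the simplest case of such a graph – where each step consumes the previous step’s output:

```python
class Pipeline:
    """Declaratively chain named transformations over a data source."""

    def __init__(self, source):
        self.source = source
        self.steps = []

    def then(self, name, fn):
        """Append a named transformation; returns self for fluent chaining."""
        self.steps.append((name, fn))
        return self

    def run(self):
        """Execute the steps in order, feeding each one the prior output."""
        data = self.source
        for name, fn in self.steps:
            data = fn(data)
        return data

# Illustrative input: (customer_id, order_amount) records.
orders = [("c1", 100), ("c2", 40), ("c1", 60), ("c3", 5)]

result = (
    Pipeline(orders)
    .then("filter_small", lambda rows: [r for r in rows if r[1] >= 10])
    .then("total", lambda rows: sum(amount for _, amount in rows))
    .run()
)
print(result)  # 200
```

A reader can see the data movement at a glance – filter, then total – and reorder or insert steps without touching the others, which is exactly the comprehension and change advantage claimed over a series of MapReduce scripts.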
Another important feature to look for is technology insulation – designing so that the run-time environment, such as Hadoop, can be swapped for whatever technology is prevalent in the future.
The “3 V’s” of Big Data implementations are well defined – Volume, Variety, and Velocity – and relatively quantifiable. We should begin to define a fourth ‘V’, for “Value.” The fourth is equally important, and in some cases more important, but less tangible and less quantifiable.
Having said that, jumping off a diving board into a pool of Big Data doesn’t have to be a lonely job. The recommended approach is to seek help from Big Data experts like BitWise to assess whether you really need Big Data; if so, which business areas you will target for the first use case; which DI platform you will use; and lastly, how you will calculate the ROI of the Big Data initiative.
Hemant heads the leading-edge Research and Development team at Bitwise, which is involved in two key offerings: building a proprietary Big Data offering using the Hadoop technology stack, and consulting with large Fortune enterprises on deploying Big Data in their IT environments.