ETL is best handled by a tool specifically designed to Extract, Transform and Load
Big Data development frameworks like MapReduce have only been around for 10 years or so, incubating in the open-source community. They come with good backing from the likes of Google and Yahoo. However, compared with the extensive support and research done by proprietary ETL tool vendors, they are still at a nascent stage.
ETL tool vendors across the board have always been at the cutting edge of BI needs. They also constantly seek feedback, or “demands”, from their customer base. Regular updates and upgrades that incorporate these suggestions, along with backward compatibility, ensure continuity for business processes without extensive retooling or relearning for technical and subject matter experts.
ETL tool vendors have generally recognized the need for businesses to connect to HDFS and process Big Data. They now provide the necessary components to do this within the development environment that developers are already familiar with.
The self-documenting nature of tool-based ETL, compared with script-based Big Data ELT, leads to better ramp-up times for new developers. This fits well with staffing needs, since the Hadoop framework has yet to reach the maturity at which a large pool of developers can support it. As far as learning curves go, developers respond better to visual aids than to extensive scripting. Experience in training warehouse solution specialists has indicated significantly better throughput with tool-based ETL.
Let me take a leap of faith here and say that metadata management is better with certain tools. Sure, the NameNode knows where the data lies, but for a business analyst looking for data lineage or a batch support analyst looking for change data capture (CDC), the GUI nature of tools gives tool-based ETL a distinct edge.
Business rules management
A few ETL tools offer value-added products on top of the base ETL suite that enable business analysts, data modelers, or even business end users to build the rules that govern how raw data is processed. The immediate and measurable effect is seen primarily in the high availability and quality of data. The ability to define and apply rules using the same tools that an ETL developer uses gives a project team an effective medium for collaboration. This becomes even more important when a BI requirement is run in an agile war room scenario.
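To make the idea of rules governing raw data a little more concrete, here is a minimal sketch in Python. It is not taken from any particular ETL product: the `Rule` class, the `RULES` list, the `apply_rules` function, and the sample field names (`amount`, `country`) are all hypothetical, chosen only to show rules being declared separately from the code that applies them.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Rule:
    """A named business rule (hypothetical structure, for illustration only)."""
    name: str
    check: Callable[[dict], bool]


# Rules are declared as data, so an analyst could review or extend this list
# without touching the processing loop below. Field names are assumptions.
RULES = [
    Rule("amount_is_positive", lambda r: r.get("amount", 0) > 0),
    Rule("country_is_known", lambda r: r.get("country") in {"US", "IN", "DE"}),
]


def apply_rules(records, rules):
    """Split records into accepted ones and rejected ones with failed rule names."""
    accepted, rejected = [], []
    for record in records:
        failed = [rule.name for rule in rules if not rule.check(record)]
        if failed:
            rejected.append((record, failed))
        else:
            accepted.append(record)
    return accepted, rejected


if __name__ == "__main__":
    sample = [
        {"amount": 10, "country": "US"},
        {"amount": -5, "country": "FR"},
    ]
    good, bad = apply_rules(sample, RULES)
    print("accepted:", good)
    print("rejected:", bad)
```

Keeping the rule definitions apart from the loop that applies them is the design choice that matters here: the rejected list records which rules each record failed, which is the kind of information a data quality review would need.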
The genesis of Hadoop lies in the time and effort of Google (or its developers). As far as bottom-line-driven corporations are concerned, why make the investment? Simply because there was nothing on the market to tackle the volume of data that they were dealing with at the time.
The ETL biggies are gearing up to handle big data and are offering services that Hadoop does not (like metadata management). This convert to ETL from distributed applications still thinks the future is challenging enough for ETL tools to continue to excite the IT specialist and business analyst. The key to recognizing whether or not your ETL tool is relevant is to focus on the value the ETL tool brings to the data.
Bitwise has built a development platform on top of Hadoop named BHS. It is a developer- and maintenance-friendly ELT solution that addresses some of the developer concerns related to maintainability and the ability to ramp up new teams. It provides an abstraction layer on Hadoop and lets a typical ETL developer build ETL jobs on Hadoop without needing to understand Big Data technologies.
Talk to us about how BHS can add value to your Big Data ETL.