The technology stack went through a couple of proofing iterations. At the end of these evaluations, the Bitwise and Capital One architecture teams decided to develop a custom solution, Hydrograph, to overcome the limitations identified in all the available options.
The frameworks were carefully selected to minimize development time while still delivering a robust product that meets the needs of an ETL developer. Hydrograph’s execution service is built on top of Cascading, with a UI built as an Eclipse plugin project.
We designed Hydrograph’s architecture to be decoupled. The development environment (the user interface) talks to the execution service through well-defined XML interfaces. The execution service consumes the XML files generated by the user interface and generates Cascading flows to run on Hadoop. This decoupled architecture provides a great deal of flexibility; for example, changing the compute fabric from MapReduce (MR) to Tez in Hydrograph is a matter of configuration.
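To make the decoupling concrete, here is a minimal sketch in Python of how a service might read a UI-generated job XML and turn it into an engine-neutral plan. It is an illustration only, not Hydrograph’s actual code: the XML element and attribute names, the `plan_from_xml` function, and the `"mr"`/`"tez"` engine values are all assumptions made up for this example. The point it shows is that the job definition never mentions the compute fabric, so the engine can be swapped through a single setting.

```python
import xml.etree.ElementTree as ET

# Hypothetical job definition, standing in for the XML a UI would emit.
# The element and attribute names are invented for this sketch.
JOB_XML = """
<graph name="dedupe_customers">
  <input id="in1" type="fileDelimited" path="/data/customers.csv"/>
  <operation id="op1" type="removeDups" from="in1"/>
  <output id="out1" type="fileParquet" from="op1" path="/data/customers_clean"/>
</graph>
"""

# Assumed configuration values for the compute fabric.
SUPPORTED_ENGINES = {"mr", "tez"}


def plan_from_xml(xml_text, engine="mr"):
    """Build an engine-neutral plan from job XML, plus an engine setting."""
    if engine not in SUPPORTED_ENGINES:
        raise ValueError(f"unsupported engine: {engine}")
    root = ET.fromstring(xml_text)
    # Each child element becomes one step; "source" records the upstream step.
    steps = [
        {
            "id": node.get("id"),
            "kind": node.tag,
            "type": node.get("type"),
            "source": node.get("from"),
        }
        for node in root
    ]
    return {"name": root.get("name"), "engine": engine, "steps": steps}


if __name__ == "__main__":
    # The same XML yields the same steps; only the engine setting differs.
    for engine in ("mr", "tez"):
        plan = plan_from_xml(JOB_XML, engine=engine)
        print(plan["engine"], [step["id"] for step in plan["steps"]])
```

In this sketch the XML is the only contract between the two sides, so the user interface and the execution service could evolve independently as long as the schema stays stable.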
| File Delimited | File Delimited | Cumulate | Limit |
| File Fixed Width | File Fixed Width | Filter | Remove Dups |
| File Mixed Scheme | File Mixed Scheme | Join | Sort |
| File Parquet | File Parquet | Lookup | Union All |
| Hive Text File | Hive Text File | Normalize | |
| Hive Parquet File | Hive Parquet File | Transform | |
The following features are currently works in progress:
Hydrograph will be available as an open source project under the Capital One organization on GitHub later this year.