
Hydrograph

Helps move and process large volumes of big data efficiently

Hydrograph is a next-generation ETL tool that not only extracts, transforms and loads data into the new big data platform, but also fits seamlessly into the existing enterprise architecture while automating the entire ETL offload process to Hadoop. Bitwise partnered with Capital One on a phased approach to prove and validate a viable solution in the big data landscape.

Evaluating the technology stack took a couple of iterations. At the end of these evaluations, the Bitwise and Capital One architecture teams decided to develop a custom solution, Hydrograph, to overcome the limitations identified in the available options.

The frameworks were carefully selected to minimize development time while still delivering a robust product that meets the needs of an ETL developer. Hydrograph’s execution service is built on top of Cascading, with a UI built as an Eclipse plugin project.

We designed Hydrograph’s architecture to be decoupled. The development environment (the user interface) talks to the execution service through well-defined XML interfaces. The execution service consumes the XML generated by the user interface and generates Cascading flows to run on Hadoop. This decoupled architecture provides a great deal of flexibility; for example, changing the compute fabric from MapReduce (MR) to Tez in Hydrograph is a matter of configuration.
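
As a rough illustration of this decoupling, the Java sketch below parses a small job XML and pairs it with an engine chosen from configuration. The XML element and attribute names, the `engine` property key and the `JobXmlSketch` class are all assumptions made for illustration; they are not Hydrograph's actual XML schema or API, and the sketch only prints a plan instead of building Cascading flows.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Properties;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class JobXmlSketch {

    /** Compute fabrics named on this page; the enum itself is hypothetical. */
    enum Engine { MR, TEZ }

    /** Reads "type:id" for every <component> element, in document order. */
    static List<String> readComponents(String jobXml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(jobXml.getBytes(StandardCharsets.UTF_8)));
        NodeList nodes = doc.getElementsByTagName("component");
        List<String> components = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            Element e = (Element) nodes.item(i);
            components.add(e.getAttribute("type") + ":" + e.getAttribute("id"));
        }
        return components;
    }

    /** Picks the engine from configuration only; the job XML is not touched. */
    static Engine engineFrom(Properties config) {
        return Engine.valueOf(config.getProperty("engine", "MR").toUpperCase(Locale.ROOT));
    }

    /** Combines the engine choice and the parsed job into a printable plan. */
    static String plan(String jobXml, Properties config) throws Exception {
        return engineFrom(config) + " -> " + String.join(" | ", readComponents(jobXml));
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical job definition, standing in for what a UI might emit.
        String jobXml = "<job name=\"demo\">"
                + "<component id=\"in1\" type=\"inputFileDelimited\"/>"
                + "<component id=\"f1\" type=\"filter\"/>"
                + "<component id=\"out1\" type=\"outputFileDelimited\"/>"
                + "</job>";

        Properties config = new Properties();
        config.setProperty("engine", "tez"); // switching fabric is a config change

        System.out.println(plan(jobXml, config));
        // prints: TEZ -> inputFileDelimited:in1 | filter:f1 | outputFileDelimited:out1
    }
}
```

The point of the sketch is that the job definition and the engine choice travel separately: the same XML yields the same component list whether the configuration says `mr` or `tez`.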

High-level product architecture

[Figure: Hydrograph high-level product architecture]


Current feature list

  • Fully functional drag-and-drop UI with job canvas, component palette, output console and project explorer
  • Out-of-the-box support for MR and Tez
  • Full-featured components with component properties and validations
  • Out-of-the-box integration with Git-based source control tools, as well as with build tools like Maven and Gradle
  • Local and remote job runs with basic debugging facilities
  • Parameterization, external schemas, and the ability to plug custom Java code into the data pipeline
  • Multi-platform support (Windows + Mac)
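
To give a feel for what plugging custom Java code into a pipeline can look like, here is a minimal sketch of a user-written filter. The `RecordFilter` interface, the `MinAmountFilter` class and the map-based record representation are hypothetical and chosen only for illustration; they are not Hydrograph's actual extension interfaces.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class CustomFilterSketch {

    /** Hypothetical plug-in contract; not Hydrograph's actual interface. */
    interface RecordFilter {
        boolean keep(Map<String, String> record);
    }

    /** Example user code: keep records whose "amount" meets a threshold. */
    static class MinAmountFilter implements RecordFilter {
        private final double minAmount;

        MinAmountFilter(double minAmount) {
            this.minAmount = minAmount; // could be supplied as a job parameter
        }

        @Override
        public boolean keep(Map<String, String> record) {
            return Double.parseDouble(record.get("amount")) >= minAmount;
        }
    }

    /** Stand-in for a pipeline step that calls the user code per record. */
    static List<Map<String, String>> apply(RecordFilter filter,
                                           List<Map<String, String>> records) {
        List<Map<String, String>> kept = new ArrayList<>();
        for (Map<String, String> record : records) {
            if (filter.keep(record)) {
                kept.add(record);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Map<String, String>> records = List.of(
                Map.of("id", "1", "amount", "5.00"),
                Map.of("id", "2", "amount", "25.50"));

        for (Map<String, String> record : apply(new MinAmountFilter(10.0), records)) {
            System.out.println("kept id=" + record.get("id"));
        }
        // prints: kept id=2
    }
}
```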

Current list of components

Input               Output              Transform         Straight Pull
Generate records    Discard             Aggregate         Clone
File Delimited      File Delimited      Cumulate          Limit
File Fixed Width    File Fixed Width    Filter            Remove Dups
File Mixed Scheme   File Mixed Scheme   Join              Sort
File Parquet        File Parquet        Lookup            Union All
Hive Text File      Hive Text File      Normalize
Hive Parquet File   Hive Parquet File   Transform
                                        Unique Sequence

Get Started with Hydrograph

The open source version of Hydrograph is available on GitHub through Capital One.
