Efficiently Process Big Data with a GUI-based ETL Tool

Hydrograph, a next-gen data integration tool, gives enterprises with big data workloads ETL functionality on Hadoop without requiring developers to write MapReduce or Spark code.

Built with practitioners who understand the pains of offloading ETL to Hadoop and other big data platforms, Hydrograph is engineered to accelerate ETL development in the big data ecosystem.


Hydrograph Enterprise Version

  • UI - Desktop-based UI for creating ETL jobs graphically by connecting components with links. Each job is saved as XML.
  • Execution Service - Core pluggable service that converts the job XML into jobs for the underlying execution engine.
  • MR Flow - Generates MapReduce (MR) jobs to run locally or on a Hadoop cluster, with built-in performance tuning.
  • Spark Batch and Streaming Flow - Generates Spark batch or streaming jobs, depending on the execution engine selected.
  • Glue Flow - Generates jobs for AWS Glue, a fully managed ETL service on AWS used for analytics.
  • Hadoop (HDFS and MapReduce) - Scalable, highly available, cost-efficient, distributed processing platform.
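
The idea of saving a graphical job as XML and handing it to an execution service can be illustrated with a minimal sketch. The XML layout, the element and attribute names, and the plan_job function below are all hypothetical and invented for this example; they are not Hydrograph's actual job format or API. The sketch only shows how components connected by links could be read from XML and ordered for a chosen engine.

```python
import xml.etree.ElementTree as ET
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical job XML: element and attribute names are invented for this
# sketch and do not reflect Hydrograph's real schema.
JOB_XML = """
<job name="customer_load">
  <component id="read_customers" type="inputFileDelimited"/>
  <component id="filter_active" type="filter"/>
  <component id="write_parquet" type="outputFileParquet"/>
  <link from="read_customers" to="filter_active"/>
  <link from="filter_active" to="write_parquet"/>
</job>
"""


def plan_job(job_xml: str, engine: str) -> list[str]:
    """Return the components in dependency order, tagged with the engine name."""
    root = ET.fromstring(job_xml)
    types = {c.get("id"): c.get("type") for c in root.iter("component")}

    # Map each component to the set of components it depends on.
    deps = {cid: set() for cid in types}
    for link in root.iter("link"):
        deps[link.get("to")].add(link.get("from"))

    # A topological sort guarantees upstream components come first.
    order = TopologicalSorter(deps).static_order()
    return [f"{engine}:{types[cid]}({cid})" for cid in order]


if __name__ == "__main__":
    for step in plan_job(JOB_XML, engine="spark-batch"):
        print(step)
```

Passing a different engine label (for example "mapreduce") to the same function reflects the pluggable design described above: the job definition stays the same and only the target engine changes.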

Key Benefits

  • Easy-to-use development environment that requires minimal retraining for ETL developers.
  • Harness the processing power of Big Data (up to 33% faster execution time in Spark compared to legacy ETL tools).
  • Ability to offload legacy Teradata, Netezza or Big Data Appliance ETLs to a modern big data ecosystem.
  • Ability to deliver new use case applications in an agile environment.
  • Ability to create scalable applications that meet today’s technology requirements while remaining flexible for future growth.

Key Differentiators

  • Bitwise partnered with business users to design a tool that would help them pivot from a traditional ‘closed’ system for ETL development to a more modern data processing environment.
  • Hydrograph helps enterprises bridge gaps between the ETL tools their developers are familiar with and Hadoop/Spark for meeting critical reporting and analytical requirements.
  • Hydrograph is available on both on-premises and cloud platforms (AWS, GCP and Azure).
  • Hydrograph Enterprise Version integrates with Hadoop Adaptor for Mainframe Data to convert EBCDIC data to ASCII.

Hydrograph ushers in a new era of Hadoop and Spark adoption for enterprises that are implementing an open source big data strategy by facilitating the migration of complex data integration jobs to more flexible and future-proof environments.
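
To make the EBCDIC-to-ASCII conversion mentioned among the differentiators concrete, here is a minimal sketch using only Python's standard library. It is not the Hadoop Adaptor for Mainframe Data; the function name ebcdic_to_ascii and the choice of code page cp037 (a common US/Canada EBCDIC variant) are assumptions made for illustration. It handles plain character data only.

```python
def ebcdic_to_ascii(record: bytes, codepage: str = "cp037") -> bytes:
    """Decode an EBCDIC character record and re-encode it as ASCII.

    cp037 is assumed here; real mainframe extracts may use another
    EBCDIC code page. Characters without an ASCII equivalent become '?'.
    """
    text = record.decode(codepage)
    return text.encode("ascii", errors="replace")


if __name__ == "__main__":
    # The bytes C8 C5 D3 D3 D6 spell "HELLO" in cp037.
    sample = bytes([0xC8, 0xC5, 0xD3, 0xD3, 0xD6])
    print(ebcdic_to_ascii(sample))  # b'HELLO'
```

Mainframe files often also contain binary or packed-decimal fields, which a byte-for-byte character conversion like this would corrupt; those need record-layout-aware handling.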

Open Source


Now Available on GitHub


From the Bit Blog

Traditional ETL vs ELT on Hadoop

The advent of Hadoop has taken enterprises by storm. The majority of enterprises today have one or more Hadoop clusters at various stages of maturity within their organization. Enterprises are trying to cut down on infrastructure and licensing costs by offloading storage and processing to Hadoop.

Open Source / Enterprise Version Comparison

Here are a few key features of Hydrograph. For a complete feature list, contact sales@bitwiseglobal.com.

Category Open Source Enterprise Version (includes all features of Open Source version)
Input/Output Components
  • File I/O: Delimited, Fixed Width, Mixed Scheme, XML, Avro, Parquet
  • Hive I/O: Text, Parquet
  • RDBMS:  Oracle, Teradata, MySQL, Redshift
  • Semi-Structured Files: EBCDIC, JSON, Regex Input, Excel files
  • Hive I/O: RC, ORC, Avro, Sequence
  • RDBMS: MS SQL Server, Generic JDBC I/O, IBM DB2 I/O, Netezza I/O
  • Kafka I/O
  • Cloud Data Warehouse: Snowflake I/O, BigQuery, AWS Glue, Cloud SQL, PubSub
  • NoSQL: MongoDB I/O, HBase I/O, Cassandra I/O
Execution Engine
  • Spark Batch, MapReduce
  • Spark Streaming, AWS Glue
  • Subjobs and parameterization for creating reusable jobs
  • Execution Tracking visuals
  • Generic Jobs – By externalization of transform logic
  • Problems View
  • Grid view for displaying logs
  • Unit Testing Framework
  • Schema Import Wizard
Transformation Capabilities
  • Date, String, Numeric Functions for transformations
  • Standard ETL transformation components
  • Support for encryption/decryption, hashing, geospatial functions
  • REST and SOAP web service component
  • Conditional schema while reading files
  • Vector Functions
  • CDC (Change Data Capture) component

Accelerate ETL in modern, open stack technology environments