Data Analytics and AI

Centralized Enterprise Data Warehouse Build for Modern Data Platform

One of the largest vertically integrated cannabis companies in the U.S. needed to build a modern data platform on the cloud with a centralized Enterprise Data Warehouse to deliver accurate, timely data to users and ensure data availability for reporting and BI functions.

Client Challenges and Requirements

  • Build an end-to-end, cloud-agnostic data platform that is secure, flexible and highly performant.
  • Solve current data quality and accuracy challenges.
  • Identify and implement open-source or native tool stacks to keep the solution cost-effective.
  • Onboard, cleanse and enrich data for companywide consumption.
  • Build a scalable architecture that can be extended to future use cases.

Bitwise Solution

  • Use Talend to replicate data from various SQL Server and Domo sources into the raw layer of the target data warehouse.
  • Use dbt to transform the raw data in the warehouse into the format required by the data models, and run automated data quality tests with dbt-expectations to ensure data quality.
  • Use Airflow to schedule the execution of the Talend ingestion pipelines and the dbt transformations and models, and trigger automated emails on DAG failure (a minimal DAG sketch follows this list).
  • Implement a branching strategy for each environment to facilitate a controlled release process and maintain separate codebases for different stages.
  • Configure Kubernetes containers to host Airflow, with an internal load balancer to expose the Airflow UI for scheduling. Create a separate log workspace for Kubernetes logs.
  • Support migrated data during all testing phases and fix reported issues with minimal turnaround time.
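
The scheduling and alerting pattern above can be sketched as a single Airflow DAG. This is a minimal illustration only, assuming hypothetical paths, job names and an alert address rather than the client's actual configuration: it chains the Talend ingestion script, dbt run and dbt test (which executes the dbt-expectations checks), runs every 15 minutes, and uses Airflow's email-on-failure setting for automated failure emails.

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

# All paths, job names and addresses below are placeholders for illustration.
default_args = {
    "owner": "data-platform",
    "retries": 1,
    "retry_delay": timedelta(minutes=5),
    # Send an alert email when a task fails (requires SMTP settings in Airflow config).
    "email": ["data-alerts@example.com"],
    "email_on_failure": True,
}

with DAG(
    dag_id="edw_refresh",
    default_args=default_args,
    start_date=datetime(2023, 1, 1),
    schedule_interval=timedelta(minutes=15),  # 15-minute refresh to stay in sync with sources
    catchup=False,
) as dag:

    # Talend ingestion job exported as a shell script; lands SQL Server / Domo
    # data into the raw layer. The trailing space stops Airflow from treating
    # the ".sh" path as a Jinja template file.
    talend_ingest = BashOperator(
        task_id="talend_ingest_raw",
        bash_command="/opt/talend/jobs/edw_ingest_run.sh ",
    )

    # Transform the raw layer into the modelled layer with dbt Core.
    dbt_run = BashOperator(
        task_id="dbt_run",
        bash_command="cd /opt/dbt/edw && dbt run",
    )

    # Run dbt tests, including dbt-expectations checks, to validate the load.
    dbt_test = BashOperator(
        task_id="dbt_test",
        bash_command="cd /opt/dbt/edw && dbt test",
    )

    talend_ingest >> dbt_run >> dbt_test
```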

Tools & Technologies We Used

Talend Open Studio
dbt Core
Airflow
Azure DevOps
Kubernetes
BCP
Postgres

Key Results

  • Data refreshed every 15 minutes in production to keep the warehouse in sync with source systems
  • 100% data integrity achieved with automated data quality checks on data loads
  • Improved efficiency with automated code deployments to higher environments
  • High scalability, portability and stability of the Airflow scheduler with Kubernetes
