Fortune 500 retail client

Modernizing a Unified Data Pipeline (UDP) to enhance connectivity

June 30, 2022

Project Overview

Modernized in-store systems built on a cloud-based microservices architecture

Confiz partners with America’s largest retail chain to migrate in-store legacy services to a cloud-based microservices architecture for its pharmaceutical division.

The Need

A Unified Data Pipeline (UDP) for seamless job execution

One of our Fortune 500 clients wanted to establish a UDP quickly but faced an internal capacity limitation. The client sought a team of certified data professionals and software engineers to augment its own team and to develop out-of-the-box source and sink connectors for key data systems. Revamping the pre-built connectors while building new ones would fulfill the client’s vision of a unified platform for all users, one that helps them manipulate batch data, assess data lineage, and visualize information through interactive portals for successful job execution.

The Solution

Modernizing the UDP by building and revamping data connectors

In response, Team Confiz enhanced the client’s existing UDP framework using Apache Beam and Spark. As part of modernizing the data architecture, the team developed and revamped a wide set of ingress and egress connectors to support data flow from source to destination. To make the solution end-to-end, the team also introduced monitoring and alerting, along with auditing capabilities that support data quality, metadata and lineage, and governance protocols. The metadata and lineage features let data analysts visually trace data provenance and gain the insights that are imperative to big data job execution.
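To make the connector idea concrete, here is a minimal sketch of an ingress (source) and egress (sink) connector pair with a simple audit record. It is plain Python, not Apache Beam or Spark code, and it is not the client's framework: the names `Source`, `Sink`, `InMemorySource`, `InMemorySink`, `AuditEntry`, and `run_job` are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, Iterator, List, Protocol

Record = Dict[str, object]


class Source(Protocol):
    """Hypothetical ingress connector: yields records from an upstream system."""
    name: str

    def read(self) -> Iterator[Record]: ...


class Sink(Protocol):
    """Hypothetical egress connector: writes records and returns how many it wrote."""
    name: str

    def write(self, records: Iterable[Record]) -> int: ...


@dataclass
class InMemorySource:
    """Stand-in for a real source such as a file or database table."""
    name: str
    rows: List[Record]

    def read(self) -> Iterator[Record]:
        return iter(self.rows)


@dataclass
class InMemorySink:
    """Stand-in for a real destination such as a warehouse table."""
    name: str
    rows: List[Record] = field(default_factory=list)

    def write(self, records: Iterable[Record]) -> int:
        before = len(self.rows)
        self.rows.extend(records)
        return len(self.rows) - before


@dataclass
class AuditEntry:
    """Minimal audit/lineage record: where data came from, where it went, and counts."""
    source: str
    sink: str
    records_read: int
    records_written: int


def run_job(source: Source, sink: Sink,
            transform: Callable[[Record], Record]) -> AuditEntry:
    """Move records from source to sink, applying transform and counting both ends."""
    read_count = 0

    def counted() -> Iterator[Record]:
        nonlocal read_count
        for record in source.read():
            read_count += 1
            yield transform(record)

    written = sink.write(counted())
    return AuditEntry(source.name, sink.name, read_count, written)


# Example run with made-up data.
orders = InMemorySource("orders_csv", [{"sku": "a1", "qty": 2}, {"sku": "b2", "qty": 5}])
warehouse = InMemorySink("orders_warehouse")
entry = run_job(orders, warehouse, lambda r: {**r, "sku": str(r["sku"]).upper()})
print(entry)
```

Keeping the read and write counts in one audit entry is one simple way to support the auditing and lineage goals described above: a mismatch between the two numbers flags records lost in transit, and the source and sink names record where the data came from and where it went.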

Avro was used as the data exchange format so that users could read directly from complex data files and migrate data into any product-supported data type. UI dashboards provided holistic insights through an interactive portal. In addition, UDP regression suite certifications validated features as reliable before they were published. Automating connectors and use cases where applicable reduced costs by minimizing manual testing before each release.
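The source does not describe how the regression suite works internally, so the following is only a rough sketch of the general idea: run each automated use case, compare its output with an expected result, and certify a release only if every case passes. The names `UseCase`, `run_regression_suite`, and `is_certified` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class UseCase:
    """One automated use case: a callable to run and the output it should produce."""
    name: str
    run: Callable[[], object]
    expected: object


def run_regression_suite(use_cases: List[UseCase]) -> Dict[str, bool]:
    """Run every use case; a case passes only if its output equals the expectation."""
    results: Dict[str, bool] = {}
    for case in use_cases:
        try:
            results[case.name] = case.run() == case.expected
        except Exception:
            # A crashing use case counts as a failure rather than stopping the suite.
            results[case.name] = False
    return results


def is_certified(results: Dict[str, bool]) -> bool:
    """A release is certified only if at least one case ran and all cases passed."""
    return bool(results) and all(results.values())


# Example suite with made-up use cases.
suite = [
    UseCase("split_csv_row", lambda: "a1,2".split(","), ["a1", "2"]),
    UseCase("uppercase_sku", lambda: "a1".upper(), "A1"),
]
results = run_regression_suite(suite)
print(results, is_certified(results))
```

Treating an exception as a failed case keeps one broken connector from hiding the status of the others, which matters when the suite gates every release.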

What we did

Apache Beam, Spark, Big data, Data analytics

The Outcome

Enhanced agility and data connectivity

Empowered decision making

Highly personalized dashboards help the management layer make crucial decisions around costing and budgeting.

Use case automation

90% of use cases were automated, saving manual testing time and ensuring continuity of business functions when the UDP changes rapidly.

Faster job completion

20% increase in jobs completed on time, achieved through automated system efficiency checkpoints and contributing to scalability and cost-effectiveness.

Improved code quality

Unit test coverage increased by more than 80%.
