Data Engineering Consultancy

We turn
unreliable data
into trusted decisions.

Bluespark Solutions designs and delivers production-grade data platforms on Microsoft Fabric, Azure, and Databricks. From raw ingestion to governed analytics, every pipeline we build is automated, auditable, and built to last.

300M+
records processed
2TB+
data engineered
42
governed artifacts
3
cloud platforms
Microsoft Fabric · Azure Data Factory · Databricks · Apache Spark · PySpark · Delta Lake · Power BI · Azure Data Lake Gen2 · Medallion Architecture · Spark SQL · Azure Logic Apps · Python · Git / GitHub · OneLake
About

Data platforms
that people
actually trust.

Bluespark Solutions is a data engineering consultancy founded by Sriram Murali, specialising in modern cloud data platforms. We design and build the infrastructure that turns raw, messy data into governed, analytics-ready systems.

Our work spans the full data lifecycle: ingestion architecture, transformation pipelines, data quality engineering, semantic modelling, and executive-facing dashboards. We work on Microsoft Fabric, Azure, and Databricks, applying industry-standard patterns like Medallion Architecture and Delta Lake wherever reliability and scale are non-negotiable.

We build with the rigour of a production engineering team. Every pipeline is version-controlled, every data quality failure is traced, and every platform is documented for the engineers who maintain it after we leave.

🏗️
Lakehouse Architecture
Medallion Bronze / Silver / Gold on OneLake and ADLS Gen2
Core
Distributed Processing
PySpark and Spark SQL for batch and streaming at scale
Scale
🛡️
Data Quality Engineering
Dead Letter Queue, validation layers, full lineage tracking
Core
🔄
Pipeline Orchestration
Scheduled, validated, zero-intervention Data Factory pipelines
Ops
📊
Semantic Modelling
Star schema, governed Power BI models, executive dashboards
BI
🔔
Real-Time Alerting
IoT integration and anomaly detection via Azure Logic Apps
IoT
Case Studies

Work that shipped
to production.

These are not prototypes or demos. Each project below was designed, built, version-controlled, and documented to enterprise standards: the kind of platform you can hand off to a team and trust to keep running.

Databricks · Regulated Finance
Databricks Regulated Data Enterprise Scale

Customer Risk and Loss ETL Platform

"We can't trust our loss reports. Manual reconciliation happens every cycle before anyone will sign them off."

Built for a client in a regulated environment, this ETL platform processes millions of financial loss records daily using Databricks and Spark SQL. The Silver Validated layer enforces strict quality gates: every record carries a validation status, rejection code, confidence score, and hash-based row ID for complete traceability. An average of 5 to 8 percent of daily records are flagged and reported, giving business and compliance teams full visibility into upstream data quality rather than discovering problems at reporting time.
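In spirit, that quality gate works like the minimal Python sketch below. Field names, key columns, and rejection codes here are illustrative placeholders, not the client's actual schema; the point is the pattern: every record gets a deterministic hash-based ID plus validation metadata before it reaches the Silver layer.

```python
import hashlib
import json

def row_id(record: dict, key_fields: list[str]) -> str:
    """Deterministic row ID: SHA-256 over the record's key fields,
    so the same source row always hashes to the same ID."""
    payload = json.dumps({k: record.get(k) for k in key_fields}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def validate(record: dict) -> dict:
    """Attach validation metadata to a loss record (illustrative rules)."""
    rejection_code = None
    if record.get("loss_amount") is None:
        rejection_code = "MISSING_AMOUNT"
    elif record["loss_amount"] < 0:
        rejection_code = "NEGATIVE_AMOUNT"
    return {
        **record,
        "row_id": row_id(record, ["account_id", "event_date"]),
        "validation_status": "REJECTED" if rejection_code else "VALID",
        "rejection_code": rejection_code,
    }

good = validate({"account_id": "A1", "event_date": "2024-01-05", "loss_amount": 120.0})
bad = validate({"account_id": "A2", "event_date": "2024-01-05", "loss_amount": -5.0})
```

Because the ID is derived from the record's keys rather than generated at load time, a rejected row can be traced back to its source across reruns.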

300M+
historical records
2TB+
data across layers
1 to 3M
daily records
Azure · IoT · Facilities
Azure Data Factory IoT Integration Near Real-Time

HVAC Climate Intelligence Pipeline

"We have sensors everywhere but our indoor and outdoor datasets live in different systems. We can't see the full picture."

An Azure-native platform that unifies IoT sensor data from 120 to 180 building sensors with live external weather feeds, bringing together two streams that were previously analysed in isolation. The Gold layer joins them on aligned timestamps, computing temperature and humidity deltas for operational reporting. Azure Logic Apps delivers near real-time anomaly alerts to facilities teams when HVAC conditions breach defined thresholds, enabling intervention before occupant comfort is affected.
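The timestamp-aligned join and threshold check can be sketched in plain Python. The field names and the 8 °C threshold are illustrative assumptions, not the deployed configuration:

```python
ALERT_THRESHOLD_C = 8.0  # illustrative: max tolerated indoor/outdoor delta

def join_on_timestamp(indoor: list[dict], outdoor: list[dict]) -> list[dict]:
    """Join indoor sensor readings to outdoor weather readings sharing the
    same aligned timestamp, computing the temperature delta per reading."""
    outdoor_by_ts = {r["ts"]: r for r in outdoor}
    joined = []
    for r in indoor:
        w = outdoor_by_ts.get(r["ts"])
        if w is None:
            continue  # no matching weather reading for this interval
        delta = r["temp_c"] - w["temp_c"]
        joined.append({
            "ts": r["ts"],
            "sensor_id": r["sensor_id"],
            "temp_delta_c": delta,
            "alert": abs(delta) > ALERT_THRESHOLD_C,  # would trigger a Logic Apps notification
        })
    return joined

rows = join_on_timestamp(
    [{"ts": "2024-06-01T10:00", "sensor_id": "S1", "temp_c": 31.0}],
    [{"ts": "2024-06-01T10:00", "temp_c": 22.0}],
)
```

In production the join runs in the Gold layer on aligned time windows; only rows with `alert` set would flow to the notification path.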

450K
peak daily records
10 to 15 min
end-to-end latency
18 mo
data retention
In the Lab

More from our
engineering work.

Real-Time Streaming Orders Pipeline

A production-style streaming pipeline using Apache Spark Structured Streaming and Delta Lake on Databricks. Processes continuously arriving order events through Bronze, Silver, and Gold layers. Invalid records are routed to a quarantine table. The Gold layer produces RFM segmentation, revenue aggregations, and return rate KPIs ready for BI consumption.

Databricks Structured Streaming Delta Lake PySpark
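The quarantine routing in that pipeline follows a simple rule: validate each event, and send anything that fails to a separate table rather than dropping it. A minimal sketch of the routing logic (required fields and rules are illustrative):

```python
def route(event: dict) -> str:
    """Validate an order event and return the layer it lands in.
    Invalid events go to a quarantine table instead of Silver."""
    required = ("order_id", "customer_id", "amount")
    if any(event.get(k) is None for k in required):
        return "quarantine"  # missing a mandatory field
    if event["amount"] <= 0:
        return "quarantine"  # non-positive order amount
    return "silver"

silver_rows, quarantine_rows = [], []
for e in [
    {"order_id": "O1", "customer_id": "C1", "amount": 49.90},
    {"order_id": "O2", "customer_id": None, "amount": 15.00},
    {"order_id": "O3", "customer_id": "C2", "amount": -3.00},
]:
    (silver_rows if route(e) == "silver" else quarantine_rows).append(e)
```

Keeping rejected events queryable in a quarantine table is what lets the Gold layer KPIs stay trustworthy without silently losing data.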

Data Warehouse with Dimensional Modelling


A batch data warehouse built with Spark SQL and Delta Lake, featuring SCD Type 2 customer tracking, star schema design, and full audit fields on every layer. Every SQL script is atomic and modular, written for Airflow and dbt compatibility. The validated Gold layer serves KPIs for monthly revenue, top customers, and brand performance.

Spark SQL SCD Type 2 Delta Lake Star Schema
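SCD Type 2 means history is never overwritten: when a tracked attribute changes, the current dimension row is closed with an end date and a new row is appended. The warehouse implements this in Spark SQL; the logic can be sketched in plain Python (column names and the single tracked attribute are illustrative):

```python
from datetime import date

HIGH_DATE = date(9999, 12, 31)  # open-ended validity marker for the current row

def apply_scd2(dim: list[dict], update: dict, as_of: date) -> list[dict]:
    """SCD Type 2 upsert: close the customer's current row if an attribute
    changed, then append a new row valid from `as_of`."""
    out, changed = [], True
    for row in dim:
        is_current_match = (row["customer_id"] == update["customer_id"]
                            and row["valid_to"] == HIGH_DATE)
        if is_current_match and row["city"] == update["city"]:
            changed = False  # no attribute change: keep the current row open
            out.append(row)
        elif is_current_match:
            out.append({**row, "valid_to": as_of, "is_current": False})  # close old version
        else:
            out.append(row)
    if changed:
        out.append({**update, "valid_from": as_of,
                    "valid_to": HIGH_DATE, "is_current": True})
    return out

dim = [{"customer_id": "C1", "city": "Leeds",
        "valid_from": date(2023, 1, 1), "valid_to": HIGH_DATE, "is_current": True}]
dim = apply_scd2(dim, {"customer_id": "C1", "city": "York"}, date(2024, 6, 1))
```

In the warehouse itself this is a single `MERGE` against the Delta dimension table, but the row-versioning semantics are the same.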
Contact

Ready to build a platform
your team can rely on?

Whether you are starting from scratch or untangling an existing pipeline, Bluespark Solutions brings the engineering depth to get it right.

sriram@bluespark-solutions.com LinkedIn GitHub