
03 / DATA-ENGINEERING

Big Data

Data Engineering

Pipelines that handle petabytes and teams that keep them running.

We design and build modern data platforms: batch and streaming ingestion, lakehouses and data warehouses, orchestration and data quality, on open-source and cloud-native stacks with predictable costs.

−60%

Storage cost vs legacy DWH

100%

Pipeline test coverage

<5 min

Typical streaming latency

§ A

Overview

An effective data platform isn't a stack of tools: it's a coherent architecture that separates storage from compute, handles schema evolution, supports time travel, and serves structured, semi-structured and unstructured data through the same interface.

We build lakehouse architectures on open table formats (Delta Lake, Iceberg, Hudi) that run on any major cloud and avoid vendor lock-in. Pipelines are declarative, tested and monitored, with predictable costs.
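To make time travel and schema evolution concrete, here is a minimal in-memory sketch. The `VersionedTable` class is hypothetical and is not the Delta Lake, Iceberg or Hudi API: it only shows that every commit creates a new readable version and that adding a column keeps older rows valid.

```python
from dataclasses import dataclass, field


@dataclass
class VersionedTable:
    """Toy in-memory table with snapshot versions (hypothetical class).

    Every commit produces a new immutable snapshot, so any earlier version can
    still be read ("time travel"). Delta Lake and Iceberg track data files in
    a log or metadata tree rather than copying rows, but reads behave alike.
    """
    snapshots: list = field(default_factory=list)  # [(schema, rows), ...]

    def commit(self, rows):
        """Append rows as a new version and return its version number."""
        prev_schema, prev_rows = self.snapshots[-1] if self.snapshots else ((), [])
        # Additive schema evolution: unseen columns are appended to the schema.
        schema = list(prev_schema)
        for row in rows:
            for col in row:
                if col not in schema:
                    schema.append(col)
        # Existing rows read as None for columns added in later versions.
        combined = list(prev_rows) + list(rows)
        snapshot = [{col: r.get(col) for col in schema} for r in combined]
        self.snapshots.append((tuple(schema), snapshot))
        return len(self.snapshots) - 1

    def read(self, version=None):
        """Read the latest version, or an earlier one by number."""
        if not self.snapshots:
            return []
        _, rows = self.snapshots[-1 if version is None else version]
        return [dict(r) for r in rows]
```

For example, after a second commit introduces a `currency` column, `read(0)` still returns the first version unchanged, while `read()` shows the new column with `None` for the older row.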

§ B

What's included

  • Source discovery and target architecture definition
  • Ingestion from databases (CDC), APIs, files, events (Kafka, Kinesis)
  • Storage layer on lakehouse or warehouse (Snowflake, BigQuery, Databricks)
  • Transformations with dbt, Spark, native SQL
  • Orchestration (Airflow, Dagster, Prefect)
  • Data quality (Great Expectations, dbt tests, Soda)
  • Catalog, lineage, discovery (DataHub, Unity Catalog)
  • FinOps: storage and compute cost optimisation
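These components run as a dependency graph: ingestion first, then transformations, then quality checks, then the layers that consume them. As a minimal sketch, assuming a hypothetical five-task pipeline, Python's standard `graphlib` module can order the tasks so that each starts only after its dependencies. Orchestrators such as Airflow, Dagster and Prefect add scheduling, retries and monitoring on top; the task names and the `run_pipeline` helper are illustrative only.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
PIPELINE = {
    "ingest_crm": set(),
    "ingest_orders": set(),
    "bronze_to_silver": {"ingest_crm", "ingest_orders"},
    "quality_checks": {"bronze_to_silver"},
    "silver_to_gold": {"quality_checks"},
}


def run_pipeline(dag, run_task):
    """Run every task after all of its dependencies; abort on the first failure.

    run_task(name) is expected to raise an exception if the task fails, so no
    later task in the order (including its dependents) is started.
    """
    completed = []
    for task in TopologicalSorter(dag).static_order():
        run_task(task)
        completed.append(task)
    return completed
```

Placing `quality_checks` between silver and gold means a failed test stops bad data before it reaches the business-facing layer.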

§ C

Deliverables

What you get during and at the end of a Data Engineering engagement.

  1. D/01 Documented target architecture
  2. D/02 Reproducible IaC pipelines
  3. D/03 Versioned gold/silver/bronze data models
  4. D/04 Quality tests and monitoring dashboard
  5. D/05 Documentation and training
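The gold/silver/bronze models follow the medallion layering: raw data lands in bronze, is cleaned and conformed in silver, and is aggregated for business use in gold. A minimal sketch with hypothetical order data and helper names, in which a quality gate modelled on dbt's `not_null` and `unique` tests must pass before gold is built. In a real project each step would be a dbt model or Spark job, not a Python function.

```python
from decimal import Decimal

# Hypothetical raw orders as landed in bronze: untyped strings, duplicates, gaps.
BRONZE = [
    {"order_id": "1", "customer": "acme", "amount": "120.50"},
    {"order_id": "2", "customer": "acme", "amount": "80"},
    {"order_id": "2", "customer": "acme", "amount": "80"},  # duplicate delivery
    {"order_id": "3", "customer": None, "amount": "15"},    # missing customer
    {"order_id": "4", "customer": "globex", "amount": "42"},
]


def to_silver(bronze):
    """Clean and conform: drop incomplete rows, deduplicate on order_id, cast types."""
    seen, silver = set(), []
    for row in bronze:
        if not row.get("order_id") or not row.get("customer"):
            continue
        if row["order_id"] in seen:
            continue
        seen.add(row["order_id"])
        silver.append({
            "order_id": int(row["order_id"]),
            "customer": row["customer"],
            "amount": Decimal(row["amount"]),
        })
    return silver


def check_silver(silver):
    """Quality gate in the spirit of dbt's not_null/unique tests; raises on failure."""
    ids = [r["order_id"] for r in silver]
    if any(i is None for i in ids):
        raise ValueError("not_null test failed on order_id")
    if len(ids) != len(set(ids)):
        raise ValueError("unique test failed on order_id")


def to_gold(silver):
    """Business-level aggregate: revenue per customer."""
    revenue = {}
    for row in silver:
        revenue[row["customer"]] = revenue.get(row["customer"], Decimal("0")) + row["amount"]
    return revenue
```

Keeping each layer as a separate, versioned step means a change to the cleansing rules can be re-run from bronze without re-ingesting the sources.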

§ D

Use cases

Legacy migration

From on-premises SQL Server or Oracle to a cloud lakehouse, using CDC to migrate with zero downtime.
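Zero-downtime migrations typically combine an initial snapshot with a continuous stream of changes read from the source's transaction log, so the legacy system stays live until cutover. A minimal sketch of the apply side, assuming a simplified, hypothetical event shape loosely modelled on Debezium's op codes (real Debezium events also carry before/after images and source metadata):

```python
from dataclasses import dataclass, field


@dataclass
class CdcTarget:
    """Keyed replica of a source table, kept in sync from change events.

    Event shape (hypothetical, loosely modelled on Debezium's op codes):
      seq: position in the source transaction log, increasing
      op:  "r" snapshot read, "c" insert, "u" update, "d" delete
      key: primary key; row: new row image (None for deletes)
    """
    rows: dict = field(default_factory=dict)
    last_seq: int = -1  # highest log position already applied

    def apply(self, events):
        """Apply events in log order, skipping any already applied."""
        for ev in sorted(events, key=lambda e: e["seq"]):
            if ev["seq"] <= self.last_seq:
                continue  # already applied, so replaying a batch is harmless
            if ev["op"] in ("r", "c", "u"):
                self.rows[ev["key"]] = ev["row"]
            elif ev["op"] == "d":
                self.rows.pop(ev["key"], None)
            else:
                raise ValueError(f"unknown op: {ev['op']!r}")
            self.last_seq = ev["seq"]
```

Tracking `last_seq` makes the apply step idempotent, which matters because a pipeline restart may redeliver events it has already processed.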

Customer 360

Unification of customer data from CRM, e-commerce, support and marketing into a dimensional model.
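The customer dimension sits at the centre of such a model. The sketch below merges hypothetical CRM and e-commerce records under an assumed matching rule (same normalised email, same customer) and assigns surrogate keys; `build_customer_dim` is illustrative, and real projects usually need fuzzier matching rules.

```python
def build_customer_dim(*sources):
    """Merge customer records from several systems into one row per customer.

    sources: (system_name, records) pairs, processed in priority order.
    Matching rule (an assumption for illustration): same normalised email means
    same customer. Earlier sources win; later ones only fill empty fields.
    """
    dim = {}  # normalised email -> customer row
    for system, records in sources:
        for rec in records:
            email = rec["email"].strip().lower()
            if email not in dim:
                # Surrogate key assigned in order of first appearance.
                dim[email] = {"customer_sk": len(dim) + 1, "email": email, "sources": []}
            row = dim[email]
            row["sources"].append(system)
            for col, value in rec.items():
                if col != "email" and value and not row.get(col):
                    row[col] = value
    return list(dim.values())
```

Fact tables from orders, tickets or campaigns then reference `customer_sk` instead of each system's own identifier.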

Real-time analytics

Streaming pipelines for operational use cases (fraud, IoT, logistics).
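Operational streaming use cases often reduce to counting events per key over time windows, for example card transactions per minute as a fraud signal. Engines such as Spark Structured Streaming provide windows and watermarks natively; the class below is a hypothetical, minimal sketch of the underlying logic for tumbling (fixed, non-overlapping) event-time windows.

```python
from collections import defaultdict


class TumblingWindowCounter:
    """Per-key event counts over fixed, non-overlapping event-time windows.

    A window is emitted once the watermark (highest event time seen minus the
    allowed lateness) passes its end; events arriving after that are dropped.
    """

    def __init__(self, window_s=60, allowed_lateness_s=0):
        self.window_s = window_s
        self.lateness = allowed_lateness_s
        self.counts = defaultdict(int)  # (key, window_start) -> count
        self.max_ts = None

    def process(self, ts, key):
        """Ingest one event; return any windows closed by it as ((key, start), count)."""
        start = ts - ts % self.window_s
        if self.max_ts is not None and start + self.window_s <= self.max_ts - self.lateness:
            return []  # late event for a window that is already closed
        self.counts[(key, start)] += 1
        self.max_ts = ts if self.max_ts is None else max(self.max_ts, ts)
        watermark = self.max_ts - self.lateness
        closed = sorted(k for k in self.counts if k[1] + self.window_s <= watermark)
        return [(k, self.counts.pop(k)) for k in closed]
```

The allowed lateness is the trade-off knob: a larger value tolerates more out-of-order events but delays every result by the same amount.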

Data mesh

Decentralised data ownership for product teams, with federated governance.

§ E

Our process

01

Discovery

Source inventory, use cases, freshness and quality requirements.

02

Architecture

Stack choice, logical model, cost strategy.

03

Foundation

Environment setup, CI/CD, first end-to-end data domain.

04

Scale-out

Onboarding new domains with templates and self-service.

05

Operations

Data SRE: monitoring, optimisation, evolution.
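Freshness requirements agreed during discovery become automated checks during operations. A minimal sketch, similar in spirit to dbt's source freshness checks but with a hypothetical `stale_tables` helper, table names and SLAs:

```python
from datetime import datetime, timedelta, timezone


def stale_tables(last_loaded_at, freshness_slas, now=None):
    """Return the names of tables that breach their freshness requirement.

    last_loaded_at: table name -> timestamp of the last successful load
    freshness_slas: table name -> maximum acceptable age (timedelta)
    A table with no recorded load at all is reported as stale.
    """
    now = now or datetime.now(timezone.utc)
    return sorted(
        name
        for name, max_age in freshness_slas.items()
        if name not in last_loaded_at or now - last_loaded_at[name] > max_age
    )


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    loads = {
        "orders_stream": now - timedelta(minutes=3),
        "customers_daily": now - timedelta(hours=30),
    }
    slas = {
        "orders_stream": timedelta(minutes=5),
        "customers_daily": timedelta(hours=24),
        "invoices": timedelta(hours=24),
    }
    print(stale_tables(loads, slas, now))  # ['customers_daily', 'invoices']
```

Reporting never-loaded tables as stale is a deliberate choice: a silent source is treated as a failure rather than ignored.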

§ F

Technologies

Snowflake · BigQuery · Databricks
Delta Lake · Apache Iceberg
dbt · Apache Spark
Airflow · Dagster · Prefect
Kafka · Kinesis · Pub/Sub
Fivetran · Airbyte · Debezium

Indicative stack. We adapt choices to your context, internal skills and existing constraints.

§ G

Frequently asked questions

Q/01 Which cloud do you recommend?

The one you already operate on, unless there's a strong reason. We prefer open formats (Iceberg, Delta) that let you switch provider without redoing everything.

Q/02 How much does a data platform cost?

Initial setup €50–150k. Cloud run-rate from €2k to €50k+/month depending on volumes. We work to keep it predictable.

Q/03 Can I keep using Power BI / Tableau?

Yes. Power BI, Tableau and other standard BI tools connect to the lakehouse through SQL.

Next step

Let's talk about data engineering.

A 30-minute call to understand your context and whether we can really help. No commitment.