top of page

Brazilian E-Commerce Public Dataset by Olist

Project Overview

​​

This project was developed as an end-to-end Business Intelligence solution with the goal of designing and building a complete data pipeline, from raw data ingestion to dashboard delivery.

Starting from publicly available Olist e-commerce datasets, the objective was to transform a normalized, OLTP-style data structure into a robust OLAP analytical model capable of supporting business reporting. The focus was not only on visualization, but on building a solid, scalable, and reliable data architecture, ensuring data consistency, performance, and maintainability.

The final output is an interactive Power BI dashboard powered by a structured BigQuery backend, enabling analysis of key business metrics such as revenue, order volume, customer behavior, and operational performance.

Design idea

The project design is based on the transformation of a raw OLTP dataset into a star schema (OLAP) optimized for analytical workloads (shown in the image below)

PBI DataModel.jpg

Execution and implementation

The transformation pipeline was implemented entirely in BigQuery SQL, following a modular and production-oriented structure aligned with modern BI engineering practices. Key implementation steps include:

  • Data cleaning and normalization using functions such as TRIM, UPPER, INITCAP, and SAFE_CAST

  • Explicit definition of table grain (order-level, item-level, payment-level)

  • Aggregation of transactional data into business-level metrics such as revenue, freight, item counts, delivery performance, and payment signals

  • Construction of the main fact table (fct_orders) by combining and aggregating multiple raw sources (orders, items, payments, reviews)

  • Implementation of partitioning (by date) and clustering (by high-cardinality fields) to optimize query performance and reduce processing costs

  • Creation of a semantic layer (views) to centralize business logic and ensure metric consistency across the dashboard

  • Creation of a physical aggregation layer (agg_*) to support high-performance queries for frequently used dashboard views

To ensure data quality and reliability, validation checks were implemented, including:

  • Uniqueness checks on primary keys

  • Referential integrity checks between fact and dimension tables

  • Reconciliation logic between order-level totals and payment-level values

The final model was then connected to Power BI, where the data is consumed through a clean semantic layer and visualized in an interactive dashboard.

bottom of page