Real-time E-commerce Data Pipeline: Kafka to SQL Server

A robust end-to-end data engineering project that simulates a real-time e-commerce order processing system. It demonstrates advanced relational database management (RDBMS) techniques, message queuing with Apache Kafka, and data visualization with Power BI.
I. Key Features
- Real-time Data Streaming: Utilizes Apache Kafka (KRaft mode) as a message broker to handle high-velocity order streams.
- Advanced SQL Server Logic:
  - Stored Procedures: centralized business logic for parsing JSON payloads and managing the order flow.
  - ACID Transactions: BEGIN TRANSACTION, COMMIT, and ROLLBACK guarantee data integrity (no overselling, no orphan records).
  - Find-or-Create Logic: during order processing, users are looked up and inserted automatically if they do not yet exist, keeping user data normalized.
  - Triggers: stock changes are automatically audit-logged into inventory_logs.
- 3NF Database Design: a fully normalized schema of 7 tables that minimizes data redundancy.
- Interactive Analytics: A comprehensive Power BI Dashboard connected via SQL Views for real-time business insights.
II. System Architecture

The pipeline follows a modern data stack architecture:
- Producer (Python): Simulates a storefront generating random orders (JSON format) every 4 seconds.
- Kafka Broker: Buffers and queues orders in the ecommerce_orders topic.
- Consumer (Python): Listens to Kafka and invokes SQL Stored Procedures via pyodbc.
- SQL Server (The Brain): Executes business logic, maintains integrity, and stores data.
- Power BI: Visualizes sales performance, customer demographics, and shipping efficiency.
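The producer step above can be sketched as follows. This is a minimal illustration, not the project's actual `python/producer.py`: the order fields, topic payload, and `generate_order` helper are assumptions (the real producer uses `faker` for richer user data).

```python
import json
import random
import time

def generate_order(order_id):
    """Build one random order as a JSON-serializable dict (illustrative fields)."""
    return {
        "order_id": order_id,
        "user_name": random.choice(["alice", "bob", "carol"]),
        "product_id": random.randint(1, 50),
        "quantity": random.randint(1, 5),
        "order_ts": time.strftime("%Y-%m-%dT%H:%M:%S"),
    }

if __name__ == "__main__":
    # Requires a running Kafka broker and the kafka-python package.
    from kafka import KafkaProducer

    producer = KafkaProducer(
        bootstrap_servers="localhost:9092",
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )
    order_id = 1
    while True:
        producer.send("ecommerce_orders", generate_order(order_id))
        order_id += 1
        time.sleep(4)  # one order every 4 seconds, as described above
```

Keeping order generation in a pure function makes the simulated payload easy to unit-test without a broker.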
III. Database Schema (3NF)

The database is designed following Third Normal Form (3NF) principles to ensure scalability and integrity. Core tables:
- orders & order_details: Fact tables storing transaction data.
- products, categories, users, shippers: Dimension tables for detailed attributes.
- inventory_logs: Audit table populated by SQL Triggers.
IV. Tech Stack
- Database: Microsoft SQL Server (T-SQL)
- Streaming: Apache Kafka (KRaft mode)
- Language: Python 3.x (kafka-python, pyodbc, faker)
- BI Tool: Microsoft Power BI
V. Getting Started
- Database Setup: Execute /sql/init_database.sql in SQL Server Management Studio (SSMS). This creates the database, tables, stored procedures, and triggers.
- Kafka Configuration: With Kafka running in KRaft mode, create the required topic:

```shell
.\bin\windows\kafka-topics.bat --create --topic ecommerce_orders --bootstrap-server localhost:9092
```
- Run the Pipeline: Start the consumer first so it is already listening, then start the producer:

```shell
# terminal 1: listen for incoming orders
python python/consumer.py

# terminal 2: generate orders
python python/producer.py
```
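The consumer side can be sketched like this. It is a hedged outline, not the project's actual `python/consumer.py`: the stored-procedure name `usp_process_order`, the connection string, and the `parse_order` / `process_message` helpers are assumptions for illustration.

```python
import json

# Assumed stored-procedure name; the real one lives in /sql/init_database.sql.
PROC_CALL = "{CALL usp_process_order (?)}"

def parse_order(raw_bytes):
    """Decode one Kafka message payload (UTF-8 JSON) into an order dict."""
    return json.loads(raw_bytes.decode("utf-8"))

def process_message(conn, raw_bytes):
    """Hand the raw JSON to SQL Server; the stored procedure parses it and
    enforces the ACID commit/rollback logic described above."""
    cursor = conn.cursor()
    cursor.execute(PROC_CALL, raw_bytes.decode("utf-8"))
    conn.commit()

if __name__ == "__main__":
    # Requires Kafka, SQL Server, and the kafka-python + pyodbc packages.
    import pyodbc
    from kafka import KafkaConsumer

    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};"
        "SERVER=localhost;DATABASE=ecommerce;Trusted_Connection=yes;"
    )
    consumer = KafkaConsumer("ecommerce_orders",
                             bootstrap_servers="localhost:9092")
    for message in consumer:
        process_message(conn, message.value)
```

Passing the raw JSON string through to the stored procedure keeps all parsing and integrity logic inside SQL Server, matching the "SQL Server as the brain" design above.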
VI. Visualizations

The Power BI dashboard provides real-time insights into:
- Revenue Analysis: Sales trends and top-performing categories.
- Customer Demographics: Order distribution by age, gender, and occupation.
- Logistics: Shipping provider performance and usage.
VII. Future Improvements
- Implement indexing for query optimization on large datasets.
- Add window functions for advanced analytics (e.g., Top 3 products per category).
- Integrate Row-Level Security (RLS) for multi-tenant reporting.
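The "Top 3 products per category" idea is a classic ROW_NUMBER() window-function query. The sketch below uses SQLite purely so it runs standalone; the query body (with the subquery alias) works the same way in T-SQL. Table and column names are invented for illustration.

```python
import sqlite3

# In-memory sample data standing in for the real orders/products tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product_sales (category TEXT, product TEXT, revenue REAL);
    INSERT INTO product_sales VALUES
        ('Books', 'SQL Primer', 120), ('Books', 'Kafka Guide', 300),
        ('Books', 'BI Basics', 90),   ('Books', 'Python 101', 40),
        ('Toys',  'Robot Kit', 500),  ('Toys',  'Puzzle', 150);
""")

# Rank products within each category by revenue, keep the top 3.
TOP3_PER_CATEGORY = """
    SELECT category, product, revenue
    FROM (
        SELECT category, product, revenue,
               ROW_NUMBER() OVER (
                   PARTITION BY category ORDER BY revenue DESC
               ) AS rn
        FROM product_sales
    ) AS ranked
    WHERE rn <= 3
    ORDER BY category, revenue DESC;
"""

rows = conn.execute(TOP3_PER_CATEGORY).fetchall()
# 'Books' keeps its top 3 of 4 products; 'Toys' has only 2, so both appear.
```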
Author: Đỗ Minh Hoàng
Project: Database Management System Final Project