This repo contains projects that apply data engineering concepts.
Background and further details on some of these concepts can be found in the Intro to Basics folder.
Linux and Shell Scripting
This project applies my Linux and shell scripting skills to a fictional scenario: working as a Linux developer at a top tech company.
Building Data Pipelines with Airflow
Apache Airflow is an open source workflow orchestration tool that lets you build, schedule, and run workflows.
This project collects data available in different formats and consolidates it into a single file.
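As a rough illustration of the consolidation step, here is a minimal Python sketch that merges a comma-separated and a tab-separated source into one CSV. The sample data, the column names (`id`, `name`, `extra`), and the helper names `read_delimited` and `consolidate` are assumptions made for this example; they are not taken from the project, and the Airflow DAG wiring is left out.

```python
import csv
import io


def read_delimited(text, delimiter):
    """Parse delimited text with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


def consolidate(sources, fieldnames):
    """Write rows from every source into one CSV string.

    Columns not listed in fieldnames are dropped (extrasaction="ignore").
    """
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=fieldnames, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for rows in sources:
        writer.writerows(rows)
    return out.getvalue()


# Hypothetical inputs standing in for files in different formats.
csv_text = "id,name,extra\n1,alpha,x\n2,beta,y\n"
tsv_text = "id\tname\n3\tgamma\n"

combined = consolidate(
    [read_delimited(csv_text, ","), read_delimited(tsv_text, "\t")],
    fieldnames=["id", "name"],
)
print(combined)
```

In an Airflow setup, each read and the final write would typically be separate tasks so that a failure in one source can be retried on its own.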
Building Data Pipelines with Kafka
Apache Kafka is a widely used open source event streaming platform.
This project creates a data pipeline that uses Kafka to collect streaming data and load it into a database.
Building Data Pipelines with Shell
Create shell scripts to extract, transform, and load data.
Create and populate a PostgreSQL table.
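The extract, transform, and load stages can be sketched as follows. The project itself uses shell scripts and PostgreSQL; this Python version swaps in the standard-library `sqlite3` module with an in-memory database so it runs without a server. The input lines, the `users` table, and its column names are hypothetical.

```python
import sqlite3

# Hypothetical colon-delimited input lines (assumption, not project data).
raw_lines = [
    "alice:x:1001:/home/alice",
    "bob:x:1002:/home/bob",
]


def extract(lines):
    """Keep fields 1, 3, and 4 of each line, like `cut -d: -f1,3,4`."""
    for line in lines:
        parts = line.strip().split(":")
        yield parts[0], parts[2], parts[3]


def transform(records):
    """Convert the numeric id from text to an integer."""
    for name, uid, home in records:
        yield name, int(uid), home


def load(conn, records):
    """Create the target table if needed and insert all records."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS users "
        "(username TEXT, userid INTEGER, homedirectory TEXT)"
    )
    conn.executemany("INSERT INTO users VALUES (?, ?, ?)", records)
    conn.commit()


# sqlite3 in-memory database used as a stand-in for PostgreSQL.
conn = sqlite3.connect(":memory:")
load(conn, transform(extract(raw_lines)))
rows = conn.execute(
    "SELECT username, userid, homedirectory FROM users ORDER BY userid"
).fetchall()
print(rows)
```

Keeping the three stages as separate functions mirrors a shell pipeline, where each command reads the previous command's output.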
Data Warehousing with Postgres
Design a data warehouse using fact and dimension tables, and load data into it.
Write aggregation queries using the CUBE and ROLLUP grouping clauses, and create materialized views (materialized query tables).
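To show what a ROLLUP query returns, here is a small Python sketch that emulates `GROUP BY ROLLUP(country, category)` with a `SUM` aggregate. It is an illustration of the result shape only, not the SQL used in the project; the sample rows and the `rollup` helper are assumptions, and `None` stands in for SQL `NULL`.

```python
from collections import defaultdict

# Hypothetical fact rows: (country, category, sales_amount).
sales = [
    ("US", "books", 10),
    ("US", "toys", 20),
    ("CA", "books", 5),
]


def rollup(rows):
    """Emulate GROUP BY ROLLUP(country, category) with SUM(amount).

    None marks the rolled-up column in subtotal and grand-total rows.
    """
    totals = defaultdict(int)
    for country, category, amount in rows:
        totals[(country, category)] += amount  # detail level
        totals[(country, None)] += amount      # subtotal per country
        totals[(None, None)] += amount         # grand total
    return dict(totals)


result = rollup(sales)
for key, total in result.items():
    print(key, total)
```

CUBE differs from ROLLUP in that it also produces the per-category subtotals, which here would be the `(None, category)` rows.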
NoSQL with MongoDB, Cassandra and IBM Cloudant
This project applies my skills with several NoSQL databases to move and analyze data.
Move data from one type of database to another and run basic queries on each.
Data Engineering and Machine Learning with Spark
Use Apache Spark for data engineering and machine learning.
Create an end-to-end Spark application that includes ETL and model training.