RedFin Data Pipeline built using Amazon EMR, S3, Snowflake, Apache Airflow and Power BI

sundarmd/Redfin-Data-Pipeline

Redfin Data Analytics Pipeline

This pipeline is designed to streamline and automate the process of extracting, transforming, and visualizing real estate data.

It leverages the following services:

Amazon EC2: Hosts Apache Airflow for orchestrating the ETL processes.

Amazon S3: Stores the raw data that Amazon EMR later transforms.

Amazon EMR: Transforms the data and writes the results back to S3 for Snowflake to consume.

Snowflake: Uses Snowpipe to automatically load the transformed data from S3 into Snowflake databases.

Power BI: Visualizes the Snowflake data for insightful reporting.
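
The hand-offs between these services can be sketched in a few lines of Python. This is a minimal illustration, not the code used in this repository: the bucket name, prefixes, script path, pipe, table, and stage names are all hypothetical, and the Parquet file format is an assumption. The first function builds an EMR step definition in the shape accepted by boto3's `add_job_flow_steps`; the second builds a Snowpipe definition with `AUTO_INGEST = TRUE`, which is how Snowpipe can pick up new files from S3 event notifications.

```python
from typing import Any

# Hypothetical placeholders; the real bucket layout is not documented here.
BUCKET = "example-redfin-bucket"
RAW_PREFIX = "raw/"
TRANSFORMED_PREFIX = "transformed/"


def s3_uri(prefix: str, name: str = "") -> str:
    """Build an s3:// URI inside the example bucket."""
    return f"s3://{BUCKET}/{prefix}{name}"


def build_emr_step(script_uri: str) -> dict[str, Any]:
    """Return an EMR step that runs a Spark script via command-runner.jar.

    The script is assumed to take the input and output locations as its
    two positional arguments.
    """
    return {
        "Name": "Transform Redfin data",
        "ActionOnFailure": "CANCEL_AND_WAIT",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": [
                "spark-submit",
                script_uri,
                s3_uri(RAW_PREFIX),          # where the raw data lands
                s3_uri(TRANSFORMED_PREFIX),  # where EMR writes its output
            ],
        },
    }


def build_snowpipe_ddl(pipe: str, table: str, stage: str) -> str:
    """Return a CREATE PIPE statement that auto-ingests files from a stage.

    Assumes the external stage already points at the transformed prefix
    and that the files are Parquet.
    """
    return (
        f"CREATE OR REPLACE PIPE {pipe} AUTO_INGEST = TRUE AS\n"
        f"COPY INTO {table}\n"
        f"FROM @{stage}\n"
        "FILE_FORMAT = (TYPE = 'PARQUET')\n"
        "MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE;"
    )


if __name__ == "__main__":
    step = build_emr_step(s3_uri("scripts/", "transform_redfin.py"))
    print(step["HadoopJarStep"]["Args"])
    print(build_snowpipe_ddl("redfin_pipe", "redfin_table", "redfin_stage"))
```

In an Airflow DAG, a dict like the one returned by `build_emr_step` would typically be passed to an EMR "add steps" task, followed by a sensor that waits for the step to finish; Snowpipe then loads the output without a further Airflow task.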

Architecture

[Image: architecture diagram]

S3 Bucket

[Image: S3 bucket]

Airflow DAG

[Image: Airflow DAG]

Snowflake Database

[Image: Snowflake database]

Power BI Dashboard

[Image: Power BI dashboard]

Features

  1. Scalable: Processes over 100,000 records every month.
  2. Efficient: Automates the ETL process end to end, so data moves from extraction to reporting without manual steps.
  3. Insightful: Provides detailed, interactive visualizations for real estate market analysis.
