This pipeline is designed to streamline and automate the process of extracting, transforming and visualizing real estate data.
It leverages the following services:
- Amazon EC2: Hosts Apache Airflow, which orchestrates the ETL process.
- AWS S3: Stores the raw data that Amazon EMR later transforms.
- Amazon EMR: Runs the data transformation jobs and writes the transformed data back to S3 for Snowflake to consume.
- Snowflake: Uses Snowpipe to automatically load the transformed data from S3 into Snowflake databases.
- Power BI: Visualizes the Snowflake data in interactive reports.
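The services above form a linear chain of stages: data lands in S3, EMR transforms it, Snowpipe loads it into Snowflake, and Power BI reports on it. The sketch below models that dependency chain with Python's standard-library `graphlib` instead of Airflow, so it runs anywhere. The task names are hypothetical, and this is not the pipeline's actual DAG. In Airflow, the same ordering would be declared between operators with the `>>` operator. Also note that Snowpipe auto-ingest is typically triggered by S3 events rather than by an orchestrator task.

```python
from graphlib import TopologicalSorter

# Hypothetical stage names (not taken from the real Airflow DAG).
# Each key is a stage; its value is the set of stages that must finish first.
PIPELINE: dict[str, set[str]] = {
    "extract_to_s3": set(),                    # raw data lands in S3
    "transform_on_emr": {"extract_to_s3"},     # EMR reads raw data, writes results to S3
    "snowpipe_load": {"transform_on_emr"},     # Snowpipe loads transformed files into Snowflake
    "power_bi_report": {"snowpipe_load"},      # Power BI reads from Snowflake
}


def execution_order(dag: dict[str, set[str]]) -> list[str]:
    """Return one run order that respects every dependency.

    Raises graphlib.CycleError if the dependencies contain a cycle.
    """
    return list(TopologicalSorter(dag).static_order())


if __name__ == "__main__":
    for step, stage in enumerate(execution_order(PIPELINE), start=1):
        print(f"{step}. {stage}")
```

Because every stage here has exactly one upstream dependency, only one valid order exists. An orchestrator such as Airflow adds scheduling, retries and monitoring on top of this kind of dependency graph.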
- Scalable: Processes over 100,000 records every month.
- Efficient: Automated ETL processes move data from S3 through EMR into Snowflake without manual steps.
- Insightful: Provides detailed, interactive visualizations for real estate market analysis.
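The transformation stage run on Amazon EMR is described only at a high level here. The sketch below is a minimal, hypothetical example of row-level cleaning for real estate records. It uses plain Python to keep the example self-contained, while on EMR this logic would usually be written for a distributed engine such as Spark. The field names (`listing_id`, `city`, `price`, `area_sqft`) and the derived `price_per_sqft` column are assumptions for illustration, not the pipeline's actual schema.

```python
import math
from collections.abc import Iterable
from dataclasses import dataclass


@dataclass(frozen=True)
class Listing:
    # Hypothetical cleaned-record schema, for illustration only.
    listing_id: str
    city: str
    price: float
    area_sqft: float
    price_per_sqft: float


def transform(raw_rows: Iterable[dict]) -> list[Listing]:
    """Clean raw rows and derive price per square foot.

    Drops rows with missing, non-numeric, non-finite or non-positive
    price/area values, and keeps only the first row for each listing_id.
    """
    seen: set[str] = set()
    cleaned: list[Listing] = []
    for row in raw_rows:
        try:
            listing_id = str(row["listing_id"]).strip()
            price = float(row["price"])
            area = float(row["area_sqft"])
        except (KeyError, TypeError, ValueError):
            continue  # malformed row: skip it
        if not (math.isfinite(price) and math.isfinite(area)):
            continue  # reject NaN / infinity
        if not listing_id or price <= 0 or area <= 0 or listing_id in seen:
            continue  # invalid values or duplicate listing
        seen.add(listing_id)
        city = str(row.get("city", "")).strip().title()  # normalize " austin " -> "Austin"
        cleaned.append(Listing(listing_id, city, price, area, round(price / area, 2)))
    return cleaned
```

For example, given one valid row, a duplicate of it, a row with a non-numeric price and a row with zero area, only the first row survives, with `price_per_sqft` computed as price divided by area.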