Demo for ingesting OpenCity and OSM data from S3 to DynamoDB with EMR, Kinesis Streams and Lambda with PoC web frontend

shuva10v/osm-dymamodb-storage

OpenCity & OSM DynamoDB backed geospatial storage demo

The project loads the Open City Model database and OpenStreetMap data into DynamoDB using EMR, Kinesis and Lambda. It also provides a PoC web application for running geospatial queries, viewing buildings and adding tags to them.

How it works:

  1. An EMR cluster is launched every N hours. It runs a single step: the Spark job spark-kinesis-ingester.
  2. spark-kinesis-ingester reads data from the S3 data lake (table metadata comes from the Glue Data Catalog) and puts it into a Kinesis stream. Open City Model data is partitioned by US state, and each invocation processes the data for one randomly chosen state.
  3. The Lambda function OpenCityDDBWriter reads records from the Kinesis stream and writes them to a DynamoDB table.
  4. A web application for navigating OpenCity data is built with Lambda and API Gateway.
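
Step 3 above can be sketched as follows. This is a minimal illustration, not the actual OpenCityDDBWriter code: it assumes each Kinesis record carries a JSON object, that the items are keyed by a `hash` field, and that the Lambda is written in Python. None of this is confirmed by the README. The `put_items` callback is a hypothetical stand-in for the real DynamoDB write. The event layout (`Records[].kinesis.data`, base64-encoded) is the standard shape Lambda receives from a Kinesis trigger.

```python
import base64
import json


def decode_kinesis_records(event):
    """Decode the base64 payloads of a Kinesis-triggered Lambda event into dicts."""
    items = []
    for record in event.get("Records", []):
        payload = base64.b64decode(record["kinesis"]["data"])
        # Assumption: the ingester wrote each record as a JSON object.
        items.append(json.loads(payload))
    return items


def dedupe_by_key(items, key="hash"):
    """Keep the last item seen per key.

    A single DynamoDB batch write rejects duplicate keys, so a batch is
    deduplicated before it is written. The key name is an assumption.
    """
    latest = {}
    for item in items:
        latest[item[key]] = item
    return list(latest.values())


def handler(event, context, put_items=None):
    """Hypothetical entry point; put_items stands in for the DynamoDB write."""
    items = dedupe_by_key(decode_kinesis_records(event))
    if put_items is not None:
        put_items(items)
    return {"written": len(items)}
```

Keeping the decoding and deduplication separate from the write makes the handler easy to exercise locally with a hand-built event and no AWS credentials.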

Overall architecture:

Deployment

  1. Create Glue table as described here

  2. Build spark-kinesis-ingester module:

mvn clean install
  3. Upload the built jar (target/spark-kinesis-ingester-1.0-SNAPSHOT.jar) to your S3 bucket.

  4. Go to the deploy folder and prepare the Terraform config file config.tfvars:

region="eu-west-1"
jar_path="s3://your_bucket/jars/spark-kinesis-ingester-1.0-SNAPSHOT.jar"
s3_static_bucket_name="static-content-bucket"
  5. Go to the webapp folder and build the frontend:
npm install
gulp
  6. Apply the Terraform configuration from the deploy folder:
terraform init
terraform plan -var-file=config.tfvars
terraform apply -var-file=config.tfvars

Terraform prints the API Gateway endpoint:

Outputs:

backend_api_url = https://???????.execute-api.eu-west-1.amazonaws.com/opencity
  7. Go to the webapp folder and rebuild the frontend with the API URL from the previous step:
gulp --api_endpoint https://???????.execute-api.eu-west-1.amazonaws.com/opencity
  8. Apply Terraform once more so the rebuilt frontend is deployed:
terraform apply -var-file=config.tfvars
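
The hand-off between the first apply and the frontend rebuild can be scripted. The sketch below is only an illustration under stated assumptions: it parses the plain `name = value` lines that Terraform prints under `Outputs:` and builds the `gulp` command used in the steps above. The helper names `parse_outputs` and `gulp_command` are hypothetical and not part of this repository.

```python
def parse_outputs(text):
    """Parse `name = value` lines from Terraform's Outputs section into a dict."""
    outputs = {}
    for line in text.splitlines():
        name, sep, value = line.partition(" = ")
        if sep:  # skip lines such as "Outputs:" that carry no assignment
            outputs[name.strip()] = value.strip().strip('"')
    return outputs


def gulp_command(outputs):
    """Build the frontend build command from the backend_api_url output."""
    return ["gulp", "--api_endpoint", outputs["backend_api_url"]]
```

The resulting argument list can be passed to `subprocess.run(..., cwd="webapp")` before the second `terraform apply`.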

Init OSM data

To init OSM data:

  1. Run an EMR cluster

  2. Create planet table as described here

  3. Run the job with:

spark-submit \
--deploy-mode cluster \
--conf spark.sql.catalogImplementation=hive \
--conf spark.yarn.maxAppAttempts=1 \
--class io.shuvalov.spark.kinesis.ingester.IngesterJob \
%jar_path% \
"SELECT concat('', id) as hash, type, to_json(tags) as tags, lat, lon, to_json(nds) as nds, to_json(members) as members,
unix_timestamp(timestamp) as timestamp, uid, user, version FROM opencitymodel.planet limit 10" \
OSM
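
To make the shape of the resulting records concrete, the projection in the query above can be mirrored in plain Python. This is a sketch for illustration only: the input is assumed to be one planet row given as a dict with a timezone-aware `timestamp`, and the function name `to_record` is hypothetical. The JSON strings Spark's `to_json` produces may differ in formatting from what `json.dumps` emits here.

```python
import json
from datetime import datetime, timezone


def _to_json(value):
    """Serialize nested values to a compact JSON string; keep NULL as None."""
    if value is None:
        return None
    return json.dumps(value, separators=(",", ":"))


def to_record(row):
    """Mirror the SELECT's projection for one planet row given as a dict."""
    return {
        "hash": str(row["id"]),                          # concat('', id)
        "type": row["type"],
        "tags": _to_json(row["tags"]),                   # to_json(tags)
        "lat": row["lat"],
        "lon": row["lon"],
        "nds": _to_json(row["nds"]),                     # to_json(nds)
        "members": _to_json(row["members"]),             # to_json(members)
        "timestamp": int(row["timestamp"].timestamp()),  # unix_timestamp(timestamp)
        "uid": row["uid"],
        "user": row["user"],
        "version": row["version"],
    }
```

The numeric id becomes a string `hash`, and the nested columns (`tags`, `nds`, `members`) are flattened to JSON strings, so every field of the record is a scalar.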

Web Application

The web app provides a simple UI with a map based on Leaflet.
