Skip to content

sunnygleason/st9-proto-service

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Introduction

ST9 is a nimble and nifty data store that is designed for scalability.

  • Nimble : because you only lock down the attributes of the schema that you have to
  • Nifty : because it introduces some features that SQL databases don't have -- is not your average SQL data store
  • Scalable : because ST9 is built from scalable primitives (key-value store, separate secondary indexes, and constant-time counters)

Some other noteworthy features of ST9:

  • REST API : designed to be super easy to work with any language or even the command line (i.e. Ruby, node.js, curl)
  • Developer-Friendly : provides Sqlite3 and H2 data stores for easy development, as well as the super-scalable MySQL-based store
  • Document-Oriented : stored entities may contain complex JSON structure (nested arrays and objects)
  • Multi-Get : retrieving several entities at one time without breaking a sweat!
  • 2-Query Model: secondary index queries return identifiers (primary keys), allowing client to decide which entities to load
  • Caching : can use a near cache (Memcached or Redis) to offload key-value lookups and support multi-get
  • Soft Deletion : the "delete" operation marks content as deleted but keeps original content until expunged
  • Quarantine : content "quarantining" complements deletion - quarantined content is available in KV store by passing an extra flag, removed from counters and hidden from index queries (unless an extra flag is passed to the query)
  • Secondary Indexes : define indexes (think: database tables with just the data you need) to support range queries and unique constraints
  • Unique Index Optimization : ST9 populates KV cache entries for unique indexes to optimize entity loading without hitting the database (think: load user by email or SSN)
  • Counters: schemas may define counters (think: select count(*) where ... group by X,Y,Z) to support data aggregation with minimal (constant time) overhead per update
  • Binary Encoding: under the hood, ST9 stores data in SMILE (binary JSON) format to minimize data storage requirements
  • Compression: under the hood, ST9 stores data with LZF compression to minimize data storage requirements even further

Building ST9

To build st9, run:

$ ./scripts/build.sh

Running ST9

To run st9, run:

$ ./scripts/run-sqlite.sh

(There is also a mysql version for power users, and an H2 (embedded in-memory) store for serious development.)

Using ST9

Schema API - the /s/ endpoint

A cook's tour of the Schema API, used to define entity types (think: tables but not tables):

# creates a schema for a new entity type

$ curl -v -v -X POST --data-binary @src/test/resources/schema07.json.txt -H "Content-Type: application/json" -H "Accept: application/json" "http://localhost:7331/1.0/s/point"

# retrieves a schema for the given entity type

$ curl -v -v -X GET "http://localhost:7331/1.0/s/point"

# updates a schema for the given entity type (including migrating counters and indexes as necessary)

$ curl -v -v -X PUT --data-binary @src/test/resources/schema07b.json.txt -H "Content-Type: application/json" -H "Accept: application/json" "http://localhost:7331/1.0/s/point"

# removes a data type (caution: do not use)

$ curl -v -v -X DELETE "http://localhost:7331/1.0/s/point"

Entity API - the /e/ endpoint

Creating, retrieving, updating and deleting entities (think: rows but more like documents):

# using POST to create a new entity

$ curl -v -v -X POST --data "{"x":1,"y":-1}" -H "Content-Type: application/json" -H "Accept: application/json" "http://localhost:7331/1.0/e/point"

# example of POST invalid data (type mismatch)

$ curl -v -v -X POST --data "{"x":1,"y":true}" -H "Content-Type: application/json" -H "Accept: application/json" "http://localhost:7331/1.0/e/point"

# example of GET a newly-created entity

$ curl -v -v -X GET "http://localhost:7331/1.0/e/@point:5eae81437e481933"

# updating an entity with PUT

$ curl -v -v -X PUT --data "{"x":7,"y":-3}" -H "Content-Type: application/json" -H "Accept: application/json" "http://localhost:7331/1.0/e/@point:5eae81437e481933"

# using DELETE to remove an entity

$ curl -v -v -X DELETE "http://localhost:7331/1.0/e/@point:5eae81437e481933"

Index API - the /i/ endpoint

Querying entities using secondary (non-primary key) indexes.

# Query point entities using the 'xy' index where 'x' equals 1:

$ curl -v -v -X GET "http://localhost:7331/1.0/i/point.xy?q=x+eq+1"

# Query point entities using the 'xy' index where 'x' not equal -1 and where 'y' not equal 1

$ curl -v -v -X GET "http://localhost:7331/1.0/i/point.xy?q=x+ne+-1+and+y+ne+1"

# Query point entities using the 'xy' index where 'x' is in the list (1, 2, 3) and where 'y' ne 1

$ curl -v -v -X GET "http://localhost:7331/1.0/i/point.xy?q=x+in+(1,2,3)+and+y+ne+1"

A query is a conjunction (list AND'ed together) of query terms.

Query term format (binary operators): field op value

Query term format (in operator): field in ( value1 , value2 )

Operators:

  • eq : equals operator
  • ne : not equals operator (try to avoid, requires full index scan)
  • gt : greater than operator
  • ge : greater than or equals operator
  • lt : less than operator
  • le : less than or equals operator

Possible value literals:

  • Integers: 1, -1, 0, ...
  • String literals: "some text between double quotes"
  • Boolean literals: true, false
  • Null literal: null

A query is a conjunction (list AND'ed together) of query terms.

Advanced schema and query stuff:

# defining a schema (entity type)

$ curl -v -v -X POST --data "{"attributes":[{"name":"msg","type":"UTF8_SMALLSTRING"},{"name":"hotness","type":"ENUM","values":["COOL","HOT"]}],"indexes":[{"name":"hotmsg","cols":[{"name":"hotness","sort":"ASC"},{"name":"msg","sort":"ASC"},{"name":"id","sort":"ASC"}]}]}" -H "Content-Type: application/json" -H "Accept: application/json" "http://localhost:7331/1.0/s/message"

# creating a couple entities

$ curl -v -v -X POST -H "Content-Type: application/json" -H "Accept: application/json" --data "{"msg":"hello world","hotness":"COOL"}" "http://localhost:7331/1.0/e/message"

$ curl -v -v -X POST -H "Content-Type: application/json" -H "Accept: application/json" --data "{"msg":"fly like a G6","hotness":"HOT"}" "http://localhost:7331/1.0/e/message"

# querying entities by enum value

$ curl -v -v -X GET "http://localhost:7331/1.0/i2/message.hotmsg?q=hotness+eq+\"COOL\""

$ curl -v -v -X GET "http://localhost:7331/1.0/i2/message.hotmsg?q=hotness+eq+\"HOT\""

$ curl -v -v -X GET "http://localhost:7331/1.0/i2/message.hotmsg?q=hotness+eq+\"FOO\""

# querying entities by enum value and string range queries

$ curl -v -v -X GET "http://localhost:7331/1.0/i2/message.hotmsg?q=hotness+eq+\"HOT\"+and+msg+lt+\"he\""

$ curl -v -v -X GET "http://localhost:7331/1.0/i2/message.hotmsg?q=hotness+eq+\"HOT\"+and+msg+gt+\"he\""

Unique Entity API - the /u/ endpoint

Queries "unique" entities using unique secondary (non-primary key) indexes, saving a second round-trip for the entity fetch.

This example assumes that the user has created a "point" schema with x and y attributes and a unique index on the x attribute called "uniq_x".

# Query point entities using the 'uniq_x' index where 'x' equals 1:

$ curl -v -v -X GET "http://localhost:7331/1.0/u/point.uniq_x?q=x+eq+1"

Counters API - the /c/ endpoint

Counters are pre-computed aggregations (think: GROUP BY, but maintained during entity updates) that allow efficient counts of entities.

# Define an entity with counters

$ curl -v -v -X POST --data @src/test/resources/schema20.json.txt -H "Content-Type: application/json" -H "Accept: application/json" "http://localhost:7331/1.0/s/awesome"

# Post a couple entities

$ curl -v -v -X POST --data "{"target":"foo:1","isAwesome":true,"hotness":"COOL","year":1970}" -H "Content-Type: application/json" -H "Accept: application/json" "http://localhost:7331/1.0/e/awesome"

$ curl -v -v -X POST --data "{"target":"foo:2","isAwesome":false,"hotness":"TEH_HOTNESS","year":1980}" -H "Content-Type: application/json" -H "Accept: application/json" "http://localhost:7331/1.0/e/awesome"

# Retrieve the first page of counts for the 'byTarget' counter

$ curl "http://localhost:7331/1.0/c/awesome.byTarget"

# Retrieve the specific counters corresponding to 'foo:1' and 'foo:2'

$ curl "http://localhost:7331/1.0/c/awesome.byTarget/foo:1"

$ curl "http://localhost:7331/1.0/c/awesome.byTarget/foo:2"

# Counters may be queried hierarchically by providing column values in left to right order

$ curl "http://localhost:7331/1.0/c/awesome.byAwesome"

$ curl "http://localhost:7331/1.0/c/awesome.byTargetHotnessYear"

$ curl "http://localhost:7331/1.0/c/awesome.byTargetHotnessYear/@foo:ce4ad6a1cd6293d9/TEH_HOTNESS"

$ curl "http://localhost:7331/1.0/c/awesome.byTargetHotnessYear/@foo:ce4ad6a1cd6293d9/TEH_HOTNESS/1980"

Quarantine Endpoints

# quarantine a point

$ curl -v -v -X POST "http://localhost:7331/1.0/q/@point:5eae81437e481933"

# get quarantine status of a point

$ curl -v -v -X GET "http://localhost:7331/1.0/q/@point:5eae81437e481933"

# unquarantine a point

$ curl -v -v -X DELETE "http://localhost:7331/1.0/q/@point:5eae81437e481933"

# retrieve an entity that has been quarantined (note /e/ endpoint)

$ curl -v -v -X GET "http://localhost:7331/1.0/e/@point:5eae81437e481933?includeQuarantine=true"

# include quarantined records in index searches (note /i/ endpoint)

$ curl -v -v -X GET "http://localhost:7331/1.0/i/point.xy?q=x+eq+1&includeQuarantine=true"

Administrative Endpoints

# ping the data store

$ curl -v -v -X GET "http://localhost:7331/ping"

# nuke (delete) all data in the data store, including schema

$ curl -v -v -X POST "http://localhost:7331/1.0/nuke"

# nuke (delete) all data in the data store, including schema

$ curl -v -v -X POST "http://localhost:7331/1.0/nuke?preserveSchemas=true"

# export all data in the data store in one-json-object-per-line format

$ curl -v -v -X GET "http://localhost:7331/1.0/x"

# import data into the data store, deleting all previous data

$ curl -v -v -X POST -H "Content-Type: application/json" --data-binary @the-file.json.txt "http://localhost:7331/1.0/x"

System Configuration Properties

Properties that are currently supported:

  • http.port : port to listen on (default 8080)
  • st9.storageModule : module name of storage module (default "com.g414.st9.proto.service.store.SqliteKeyValueStorage$SqliteKeyValueStorageModule", "com.g414.st9.proto.service.store.MySQLKeyValueStorage$MySQLKeyValueStorageModule")
  • st9.cacheClass : class name of cache module (default none, "com.g414.st9.proto.service.cache.RedisKeyValueCache", "com.g414.st9.proto.service.cache.MemcachedKeyValueCache")
  • log.dir : log directory (default "logs")
  • jdbc.url : url for jdbc data store (default "jdbc:sqlite:thedb.db")
  • jdbc.user : username for jdbc data store
  • jdbc.password : password for jdbc data store
  • redis.host : hostname of redis service (default "localhost")
  • redis.port : port of redis service (default "")
  • memcached.host : hostname and port of memcached service in "host:port" form (default "localhost:11211")
  • nuke.allowed : whether nukes are allowed on the data store (default "false")
  • strict.type.creation : whether types must be created via the schema endpoint before use (default "true")

That's all for now.

About

A RESTful Content Store designed with scalability in mind

Resources

Stars

Watchers

Forks

Packages

No packages published

Languages