REST API

weisenje edited this page Feb 19, 2021 · 20 revisions

SemTK REST endpoints can be accessed via many languages or tools such as curl, Swagger and Postman. Additionally, REST client code is available via:

There are two main types of endpoints:

  • asynchronous job execution - a sequence of calls to launch a job, wait until complete, and retrieve results. Details are below.
  • synchronous call - returns results in a single call. This is used during testing and for simpler operations which are likely to return quickly.

Queries are often run by using Nodegroup IDs, which refer to nodegroups that have already been saved in SemTK.

Many services are designed to accept SPARQL connection strings, to indicate the graphs and triplestore(s) with which the application will interact. The SPARQL connection string(s) should be easily configured so they can be changed, for example, when an application moves from development to production, or when a data source changes. Use of nodegroup default connections is strongly discouraged in production environments.

Many jobs return data in the form of a SemTK table.

All calls should check for errors.

Default ports

The default ports for the most commonly used services are:

  • nodegroup execution 12058 - this is the most commonly used service
  • nodegroup store 12056 - storage of nodegroups by id
  • nodegroup service 12059 - interrogating and changing nodegroups
  • ontology info 12057 - information about the model

Lesser used services have their common endpoints available in the nodegroup execution service:

  • status 12051 - job status
  • results 12052 - retrieve results
  • query 12050 - running queries

Each of these ports has a Swagger page with a full listing of endpoints (e.g. host:12058/swagger-ui.html). The most commonly used endpoints are shown below.
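The port list above can be kept in one place in client code. The following is a minimal Python sketch; the `DEFAULT_PORTS` keys, the `service_url` helper, and the `http` default protocol are illustrative assumptions and not part of SemTK.

```python
# Default SemTK service ports as listed above; a deployment may use others.
DEFAULT_PORTS = {
    "nodegroup_execution": 12058,
    "nodegroup_store": 12056,
    "nodegroup_service": 12059,
    "ontology_info": 12057,
    "status": 12051,
    "results": 12052,
    "query": 12050,
}


def service_url(host, service, path="", protocol="http"):
    """Hypothetical helper: build protocol://host:port/path for a named service."""
    port = DEFAULT_PORTS[service]
    return f"{protocol}://{host}:{port}/{path.lstrip('/')}"


# Example: the Swagger page of the nodegroup execution service.
print(service_url("localhost", "nodegroup_execution", "swagger-ui.html"))
```

Keeping the host and protocol as parameters makes it easier to point the same client at development and production services.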

Common endpoints

Execute a select query

Here is the simplest way to launch a query to select data.

curl -X POST protocol://host:12058/nodeGroupExecution/dispatchSelectById \
-H "Content-Type: application/json" \
-d '{"nodegroupId":"MyNodegroup", "sparqlConnection": "NODEGROUP_DEFAULT"}' 

The above assumes that Nodegroup ID "MyNodegroup" is already stored in SemTK, and uses the default sparqlConnection stored with that nodegroup. See Common REST parameters to override the sparqlConnection (recommended), limit the number of results, or provide runtime constraints to the query.

A successful response will return a JobId, which should be used as follows to wait for and retrieve results.

{
  "message": "operations succeeded.",
  "status": "success",
  "simpleresults": {
    "JobId": "req_2e5089be-ac98-4cf9-8492-f57b77b3c0c0"
  }
}
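The same launch step can be done from Python with only the standard library. This is a sketch under assumptions: the function names are hypothetical, the body keys are copied from the curl example above, and `dispatch_select` needs a reachable SemTK service to run.

```python
import json
import urllib.request


def build_select_request(base_url, nodegroup_id, sparql_connection="NODEGROUP_DEFAULT"):
    """Prepare (but do not send) a POST to dispatchSelectById."""
    body = {"nodegroupId": nodegroup_id, "sparqlConnection": sparql_connection}
    return urllib.request.Request(
        url=f"{base_url}/nodeGroupExecution/dispatchSelectById",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_job_id(response):
    """Return simpleresults.JobId from a parsed response, or raise on a failed call."""
    if response.get("status") != "success":
        raise RuntimeError(response.get("rationale", "SemTK call failed"))
    return response["simpleresults"]["JobId"]


def dispatch_select(base_url, nodegroup_id):
    """Send the request and return the JobId (requires a running service)."""
    request = build_select_request(base_url, nodegroup_id)
    with urllib.request.urlopen(request) as reply:
        return extract_job_id(json.load(reply))
```

For example, `dispatch_select("http://localhost:12058", "MyNodegroup")` would return a string such as the JobId shown in the response above, assuming the service runs locally.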

Execute a delete query

Delete query is similar to select.

POST: host:12058/nodeGroupExecution/dispatchDeleteById
{  
 "nodeGroupId": "BLAST_GRC_ExpectedFunding",  
 "sparqlConnection": "NODEGROUP_DEFAULT",      // override connection is recommended
 "runtimeConstraints": "[{\"SparqlID\":\"?sso\",\"Operator\":\"MATCHES\",\"Operands\":[\"200001934\"]}]"   // very common optional parameter
}  

Success will generate a response just like a select query:

{
  "message": "operations succeeded.",
  "status": "success",
  "simpleresults": {
    "JobId": "req_2e5089be-ac98-4cf9-8492-f57b77b3c0c0"
  }
}

simpleresults.JobId should be used for completing asynchronous jobs. The table returned from a successfully-completed delete query will have a @message column with a single cell containing a message from the triplestore describing the success.

Ingest CSV data

Ingesting data is performed using /ingestFromCsvStringsByIdAsync. This endpoint uses templateId instead of nodegroupId, with a sparqlConnection just like the query endpoints. csvContent is a string table of input data, whose column headings must match those specified by the template.

POST: host:12058/nodeGroupExecution/ingestFromCsvStringsByIdAsync
{  
 "templateId":"ingestTemplateName",  
 "csvContent":"column1, column2, column3\nvalue 1a, 2, 3.5\nvalue2a,3,42.6\n",
 "sparqlConnection":"{  \"name\":\"My_conn\",  \"domain\":\"\",  \"enableOwlImports\":true,  \"model\":[{    \"type\":\"neptune\",    \"url\":\"http://blast-cluster.cluster-ceg7ggop9fho.us-east-1.neptune.amazonaws.com:8182/\",    \"graph\":\"http://blast-test/model\"  }],  \"data\":[{    \"type\":\"neptune\",    \"url\":\"http://blast-cluster.cluster-ceg7ggop9fho.us-east-1.neptune.amazonaws.com:8182/\",    \"graph\":\"http://blast-test/data\"  }] }"  
}

A successful execution will generate a jobId, which should be handled as specified in the section Completing asynchronous jobs given a jobId.

{
  "message":"operations succeeded.",
  "status":"success",
  "simpleresults":{
    "JobId":"job-8c2bd241-6633-4866-b525-2e91cb9a4800"
  }
}

However, the results are the opposite of a query's: a message is generated upon success, and a table upon failure.

  • If /jobStatus returns "Failure", a table of detailed error messages can be retrieved through /getResultsTable
  • If /jobStatus returns "Success" then a message may be retrieved via /jobStatusMessage.

Both are described in "get the data". The table returned on failure should be treated as an error message.
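The ingestion request and its inverted result handling can be sketched in Python as follows. The helper names are hypothetical, and it is an assumption that the service accepts CSV as produced by Python's `csv` module (no spaces after commas, quoting only where needed).

```python
import csv
import io


def rows_to_csv_content(header, rows):
    """Serialize a header and data rows into a csvContent string."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


def build_ingest_body(template_id, header, rows, sparql_connection):
    """Assemble the request body for ingestFromCsvStringsByIdAsync."""
    return {
        "templateId": template_id,
        "csvContent": rows_to_csv_content(header, rows),
        "sparqlConnection": sparql_connection,
    }


def ingest_followup_endpoint(job_status):
    """Pick the next call for an ingestion job: message on success, table on failure."""
    if job_status == "Success":
        return "/jobStatusMessage"
    if job_status == "Failure":
        return "/getResultsTable"
    raise ValueError(f"unexpected job status: {job_status!r}")
```

The column headings passed as `header` must still match those specified by the ingestion template.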

Get nodegroup information

POST: http://host:12056/nodeGroupStore/getNodeGroupMetadata

This runs synchronously and returns a table of nodegroup id, comments, creation date, and creator.

{
  "message": "operations succeeded.",
  "table": {
    "@table": {
      "col_names": [
        "ID",
        "comments",
        "creationDate",
        "creator"
      ],
      "rows": [
        [
          "My favorite nodegroupID",
          "query returns information about something",
          "2020-04-19",
          "205000999"
        ],
        ...
      ]
    }
  }
}

Get nodegroup's runtime constraints

POST: host:12056/nodeGroupStore/getNodeGroupRuntimeConstraints
{  
 "nodeGroupId": "My nodegroup ID"
}  

Synchronously returns a table of each variable's ID, item type (PROPERTYITEM or NODE), and data type.

{
  "message": "operations succeeded.",
  "table": {
    "@table": {
      "col_names": [
        "valueId",
        "itemType",
        "valueType"
      ],
      "rows": [
        [
          "?productionStage",
          "PROPERTYITEM",
          "STRING"
        ],
        [
          "?alloyName",
          "PROPERTYITEM",
          "STRING"
        ]
      ],
      "type": "TABLE",
      "col_type": [
        "string",
        "string",
        "string"
      ],
      "col_count": 3,
      "row_count": 2
    }
  },
  "status": "success"
}
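A client might use this table to check which SPARQL IDs can be constrained before building runtime constraints. A minimal Python sketch, assuming the response has already been parsed from JSON; the function name is hypothetical.

```python
def constrainable_variables(response):
    """Map each valueId in the returned table to its valueType."""
    table = response["table"]["@table"]
    columns = table["col_names"]
    id_index = columns.index("valueId")
    type_index = columns.index("valueType")
    return {row[id_index]: row[type_index] for row in table["rows"]}


# Trimmed copy of the sample response above.
sample = {
    "status": "success",
    "table": {"@table": {
        "col_names": ["valueId", "itemType", "valueType"],
        "rows": [
            ["?productionStage", "PROPERTYITEM", "STRING"],
            ["?alloyName", "PROPERTYITEM", "STRING"],
        ],
    }},
}
print(constrainable_variables(sample))
```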

Completing asynchronous jobs given a jobId

The general flow for asynchronous jobs is:

  • launch a query, e.g. with /dispatchSelectById
  • wait with /waitForPercentOrMsec
  • if succeeded, get table with /getResultsTable

Wait for job to complete

Once a job is successfully launched, track its progress with /waitForPercentOrMsec, which returns when either maxWaitMsec has passed or the job reaches percentComplete. Keep maxWaitMsec short enough to avoid any timeout at the HTTP layer. percentComplete may simply be set to 100, or it can be increased incrementally in order to drive a status bar in your app.

curl -X POST protocol://host:12058/nodeGroupExecution/waitForPercentOrMsec \
-H "Content-Type: application/json" \
-d '{"jobID": "req_2e5089be-ac98-4cf9-8492-f57b77b3c0c0", "maxWaitMsec":10000, "percentComplete":5 }'

Sample response:

{
  "message": "operations succeeded.",
  "status": "success",
  "simpleresults": {
    "percentComplete": "100",
    "statusMessage": "",
    "status": "Success"
  }
}

If simpleresults.percentComplete is less than 100, make repeated calls until it reaches 100.

When interpreting this response, note that status and message refer to the REST call, while simpleresults.status indicates "Success" or "Failure" of the job. A sample REST failure has status of "failure" and a rationale, as below.

{
  "message": "operations failed.",
  "rationale": "nodeGroupExecutionService/waitForPercentOrMsec threw java.lang.Exception Can't find Job Xreq_2e5089be-ac98-4cf9-8492-f57b77b3c0c0\ncom.ge.research.semtk.edc.JobTracker.getJobPercentComplete(JobTracker.java:179)\ncom.ge.research.semtk.edc.JobTracker.waitForPercentOrMsec(JobTracker.java:1195)\ncom.ge.research.semtk.services.nodeGroupExecution.NodeGroupExecutionRestController.waitForPercentOrMsec(NodeGroupExecutionRestController.java:303)\n...",
  "status": "failure"
}
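The repeated-call pattern for waiting can be sketched as a small loop. This is an illustration, not SemTK client code: `post` is a caller-supplied function (hypothetical) that sends a JSON body to an endpoint and returns the parsed response, and `max_calls` is an assumed safety limit.

```python
def wait_for_job(post, job_id, max_wait_msec=10000, max_calls=60):
    """Poll waitForPercentOrMsec until the job reaches 100 percent.

    Returns (job status, status message). Raises RuntimeError if the REST
    call itself fails, and TimeoutError if max_calls is exhausted.
    """
    body = {"jobID": job_id, "maxWaitMsec": max_wait_msec, "percentComplete": 100}
    for _ in range(max_calls):
        reply = post("/nodeGroupExecution/waitForPercentOrMsec", body)
        # Outer status describes the REST call, not the job.
        if reply.get("status") != "success":
            raise RuntimeError(reply.get("rationale", "REST call failed"))
        results = reply["simpleresults"]
        # percentComplete arrives as a string in the sample response.
        if int(results["percentComplete"]) >= 100:
            return results.get("status"), results.get("statusMessage", "")
    raise TimeoutError(f"job {job_id} did not complete after {max_calls} calls")
```

Passing the transport in as `post` keeps the loop independent of any particular HTTP library and makes it easy to test with canned responses.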

Retrieve results

If /waitForPercentOrMsec returns a simpleresults.status of "Success", data can be retrieved via /getResultsTable.

curl -X POST protocol://host:12058/nodeGroupExecution/getResultsTable \
-H "Content-Type: application/json" \
-d '{"jobID": "req_2e5089be-ac98-4cf9-8492-f57b77b3c0c0" }'

Sample response:

{
  "message": "operations succeeded.",
  "table": {
    "@table": {
      "col_names": [
        "startdate",
        "finaltotal",
        "oppt_status"
      ],
      "rows": [
        [
          "2019-10-01T00:00:00",
          "0.0",
          "Proposal in progress"
        ],
        [
          "2019-10-01T00:00:00",
          "1.75028e+06",
          "Outstanding"
        ]
      ],
      "type": "TABLE",
      "col_type": [
        "http://www.w3.org/2001/XMLSchema#dateTime",
        "http://www.w3.org/2001/XMLSchema#double",
        "http://www.w3.org/2001/XMLSchema#string"
      ],
      "col_count": 3,
      "row_count": 2
    }
  },
  "status": "success"
}
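A table in this shape is often easier to use as a list of records. The sketch below converts a parsed response into dictionaries and turns XSD double columns into floats; the function name is hypothetical, and other column types are left as strings on purpose.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"


def table_to_records(response):
    """Convert a SemTK table response into a list of {column: value} dicts."""
    table = response["table"]["@table"]
    names = table["col_names"]
    # Some responses omit col_type; treat those columns as untyped strings.
    types = table.get("col_type") or [None] * len(names)
    records = []
    for row in table["rows"]:
        record = {}
        for name, col_type, cell in zip(names, types, row):
            record[name] = float(cell) if col_type == XSD + "double" else cell
        records.append(record)
    return records
```

Applied to the sample response above, the second record's `finaltotal` becomes the float `1750280.0`, while `startdate` and `oppt_status` stay as strings.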

Note: if your query may return very large results, you may need to switch to the SemTK results service /getTableResultsJsonForWebClient endpoint, which returns a URL.

Note: successful ingestion jobs have no table, but simply a success status message.

Note: some versions of SemTK require the capitalization jobID instead of jobId

Common REST parameters

Limit number of results, or retrieve offset results

 "limitOverride": -1,                           // optional query LIMIT
 "offsetOverride": -1,                          // optional query OFFSET

sparqlConnection

For the vast majority of cases, a sparqlConnection should be provided. It is a JSON string, so all quotes have to be escaped and the newlines shown below for clarity may not be allowed. This parameter is typically called an override connection and is loaded as part of your app's configuration so that dev, test, and stage work off different data connections, and the app is easy to update if data is moved.

POST: host:12058/nodeGroupExecution/dispatchSelectById
{  
 "nodeGroupId": "BLAST_GRC_ExpectedFunding",  
 "sparqlConnection":"{
  \"name\":\"My_conn\",
  \"domain\":\"\",
  \"enableOwlImports\":true,
  \"model\":[{
    \"type\":\"neptune\",
    \"url\":\"http://blast-cluster.cluster-ceg7ggop9fho.us-east-1.neptune.amazonaws.com:8182/\",
    \"graph\":\"http://blast-test/model\"
  }],
  \"data\":[{
    \"type\":\"neptune\",
    \"url\":\"http://blast-cluster.cluster-ceg7ggop9fho.us-east-1.neptune.amazonaws.com:8182/\",
    \"graph\":\"http://blast-test/data\"
  }]
 }"  
}  
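Escaping the quotes by hand is error-prone, so a client can build the connection as a normal object and serialize it twice: once into the sparqlConnection string, and once more as part of the request body. A minimal Python sketch; the function name and the example URL and graph names are placeholders, not real endpoints.

```python
import json


def build_sparql_connection(name, url, model_graph, data_graph, server_type="neptune"):
    """Return the sparqlConnection as a JSON string, ready to embed in a request body."""
    connection = {
        "name": name,
        "domain": "",
        "enableOwlImports": True,
        "model": [{"type": server_type, "url": url, "graph": model_graph}],
        "data": [{"type": server_type, "url": url, "graph": data_graph}],
    }
    return json.dumps(connection)


# Placeholder values; in practice these come from the app's configuration.
connection_string = build_sparql_connection(
    "My_conn", "http://triplestore.example.com:8182/",
    "http://example/model", "http://example/data",
)
# The second json.dumps escapes the inner quotes and emits no raw newlines.
request_body = json.dumps({
    "nodeGroupId": "MyNodegroup",
    "sparqlConnection": connection_string,
})
print(request_body)
```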

runtimeConstraints

Many endpoints accept runtime constraints of the form

"runtimeConstraints": "[{\"SparqlID\":\"?sso\",\"Operator\":\"MATCHES\",\"Operands\":[\"200001934\"]}]"

Each constraint object consists of the SPARQL ID of the item being constrained, an operator, and operands.

Valid operators:

  • MATCHES - operands are list of matches joined by "OR"
  • REGEX
  • GREATERTHAN
  • GREATERTHANOREQUALS
  • LESSTHAN
  • LESSTHANOREQUALS
  • VALUEBETWEEN - accepts two operands
  • VALUEBETWEENUNINCLUSIVE - accepts two operands
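The constraint structure and operator list above can be captured in a small builder that validates input before serializing. This is a sketch with hypothetical helper names; the operand-count check is only applied to the two operators documented above as accepting two operands.

```python
import json

VALID_OPERATORS = {
    "MATCHES", "REGEX", "GREATERTHAN", "GREATERTHANOREQUALS",
    "LESSTHAN", "LESSTHANOREQUALS", "VALUEBETWEEN", "VALUEBETWEENUNINCLUSIVE",
}
TWO_OPERAND_OPERATORS = {"VALUEBETWEEN", "VALUEBETWEENUNINCLUSIVE"}


def constraint(sparql_id, operator, operands):
    """Build one constraint object after basic validation."""
    if operator not in VALID_OPERATORS:
        raise ValueError(f"unknown operator: {operator}")
    if operator in TWO_OPERAND_OPERATORS and len(operands) != 2:
        raise ValueError(f"{operator} accepts two operands")
    return {"SparqlID": sparql_id, "Operator": operator, "Operands": list(operands)}


def runtime_constraints(*constraints):
    """Serialize constraints into the JSON string passed as runtimeConstraints."""
    return json.dumps(list(constraints))


print(runtime_constraints(constraint("?sso", "MATCHES", ["200001934"])))
```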

Error responses

All REST calls should be checked for both HTTP errors and SemTK errors.

HTTP-level errors may have a status number, an error, and a message:

Response:
{
  "timestamp": "2019-07-24T20:02:26.035+0000",
  "status": 400,
  "error": "Bad Request",
  "message": "JSON parse error: Unexpected character ('{' (code 123))",
  "path": "/nodeGroupExecution/dispatchSelectById"
}

Failures inside SemTK, on the other hand, always have a status of "failure" and a rationale:

Response:
{
  "message": "operations failed.",
  "rationale": "service: nodeGroupExecutionService method: dispatchAnyJobById() threw java.lang.Exception Could not find nodegroup with id: BLAST_GRC_ExpectedFunding NOPE\ncom.ge.research.semtk.api.nodeGroupExecution.NodeGroupExecutor.dispatchJob(NodeGroupExecutor.java:376)\ncom.ge.research.semtk.services.nodeGroupExecution.NodeGroupExecutionRestController.dispatchAnyJobById(NodeGroupExecutionRestController.java:475)\ncom.ge.research.semtk.services.nodeGroupExecution.NodeGroupExecutionRestController.dispatchSelectJobById(NodeGroupExecutionRestController.java:604)\n...",
  "status": "failure"
}

Note that these should not be confused with SemTK successfully indicating that a job failed. This is not a service-layer "error" but successful handling of a job failure. The outer status indicates the status of the service call, whereas the inner simpleresults.status indicates the status of the job. For example:

Response:
{
  "message":"operations succeeded.",
  "status":"success",
  "simpleresults":{
    "status":"Failure"
  }
}

Conversely, a failure retrieving results or status may be caused by HTTP layer or service failures. When this happens, the status of the actual job is unknown until the error is corrected.
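The three cases in this section can be told apart from the shape of the parsed response body alone. The sketch below is one possible classification, assuming the bodies look like the samples above; the function name and the returned labels are hypothetical.

```python
def classify_response(body):
    """Classify a parsed response as http_error, service_failure, job_failure, or ok."""
    status = body.get("status")
    # HTTP-level errors carry a numeric status such as 400.
    if isinstance(status, int):
        return "http_error"
    # SemTK failures carry status "failure" plus a rationale.
    if status != "success":
        return "service_failure"
    # The call succeeded; the job itself may still have failed.
    job_status = body.get("simpleresults", {}).get("status")
    if job_status == "Failure":
        return "job_failure"
    return "ok"
```

Only the `job_failure` and `ok` outcomes say anything about the job. For the other two, the job's status remains unknown until the call is retried successfully.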
