
REST API

weisenje edited this page Feb 19, 2021 · 20 revisions

Using the SemTK REST API

SemTK REST endpoints can be accessed via many languages or tools such as Swagger and Postman.

Additionally, pre-built REST client code is available.

Launching Jobs

There are a few main types of endpoints:

  • asynchronous job execution - the most common type. The call chain is described below in completing asynchronous jobs.
  • synchronous - used during testing and for simpler operations that are likely to return quickly.

Many jobs return data in the form of a SemTK table.

Many services are also designed to accept SPARQL connection strings, to indicate the graphs and triplestore(s) with which the application will interact. The SPARQL connection string(s) should be easily configured so they can be changed, for example, when an application moves from test to stage, or when a data source changes. For an example of this functionality, see select query. Use of nodegroup default connections is strongly discouraged in production environments.

All calls should check for errors.

Recipes for common endpoints

Default ports

The default ports for the most commonly used services are:

  • nodegroup execution 12058 - this is the most commonly used service
  • nodegroup store 12056 - storage of nodegroups by id
  • nodegroup service 12059 - interrogating and changing nodegroups
  • ontology info 12057 - information about the model

Lesser-used services have their most common endpoints also available through the nodegroup execution service:

  • status 12051 - job status
  • results 12052 - retrieve results
  • query 12050 - running queries

Each of these ports has a swagger page with a full listing of endpoints (e.g. host:12058/swagger-ui.html). The most commonly used are shown below.

Query using nodegroup id

Nodegroup IDs can be used to execute queries. The most common example is a SELECT query. This will be performed using the NodeGroupExecutionService, which is typically on port 12058.

Details follow.

Select query

Here is the simplest way to launch a query.

POST: host:12058/nodeGroupExecution/dispatchSelectById
{  
 "nodeGroupId": "MyNodegroup",  
 "sparqlConnection": "NODEGROUP_DEFAULT",       // optional; an override connection is recommended
 "limitOverride": -1,                           // optional query LIMIT
 "offsetOverride": -1,                          // optional query OFFSET
 "runtimeConstraints": "[{\"SparqlID\":\"?id\",\"Operator\":\"MATCHES\",\"Operands\":[\"98243-T\"]}]"  // optional
}  

A successful response will return a JobId:

Response:
{
  "message": "operations succeeded.",
  "status": "success",
  "simpleresults": {
    "JobId": "req_2e5089be-ac98-4cf9-8492-f57b77b3c0c0"
  }
}

simpleresults.JobId should be used for completing asynchronous jobs.
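As an illustration, the call above can be sketched in Python using only the standard library. The base URL is a placeholder (substitute your own host), and the helper names are our own, not part of SemTK:

```python
import json
import urllib.request

# Placeholder base URL -- replace "host". Port 12058 is the default
# for the nodegroup execution service.
BASE = "http://host:12058/nodeGroupExecution"

def build_select_body(nodegroup_id, conn="NODEGROUP_DEFAULT",
                      limit=-1, offset=-1, constraints=None):
    """Assemble the JSON body for /dispatchSelectById."""
    body = {
        "nodeGroupId": nodegroup_id,
        "sparqlConnection": conn,
        "limitOverride": limit,
        "offsetOverride": offset,
    }
    if constraints is not None:
        # runtimeConstraints is passed as a JSON *string*, not a nested object
        body["runtimeConstraints"] = json.dumps(constraints)
    return body

def post_json(url, body):
    """POST a JSON body and parse the JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))

def dispatch_select(nodegroup_id, **kwargs):
    """Launch the query and return the JobId for later polling."""
    envelope = post_json(BASE + "/dispatchSelectById",
                         build_select_body(nodegroup_id, **kwargs))
    if envelope.get("status") != "success":
        raise RuntimeError(envelope.get("rationale", "unknown failure"))
    return envelope["simpleresults"]["JobId"]
```

Note that `runtimeConstraints` is serialized with `json.dumps`, which produces the escaped-string form shown in the sample body above.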

Delete query

Delete query is similar to select.

POST: host:12058/nodeGroupExecution/dispatchDeleteById
{  
 "nodeGroupId": "BLAST_GRC_ExpectedFunding",  
 "sparqlConnection": "NODEGROUP_DEFAULT"       // override connection is recommended
  runtimeConstraints: "[{"SparqlID":"?sso","Operator":"MATCHES","Operands":["200001934"]}]"   // very common optional parameter
}  

Success will generate a response just like a select query:

Response:
{
  "message": "operations succeeded.",
  "status": "success",
  "simpleresults": {
    "JobId": "req_2e5089be-ac98-4cf9-8492-f57b77b3c0c0"
  }
}

simpleresults.JobId should be used for completing asynchronous jobs. The table returned from a successfully completed delete query will have a @message column with a single cell containing a message from the triplestore describing the success.

Ingesting CSV data

Ingesting data is performed using /ingestFromCsvStringsByIdAsync. This endpoint uses templateId instead of nodegroupId, with a sparqlConnection just like the query endpoints. csvContent is a string table of input data, whose column headings must match those specified by the template.

POST: host:12058/nodeGroupExecution/ingestFromCsvStringsByIdAsync
{  
 "templateId":"ingestTemplateName",  
 "csvContent":"column1, column2, column3\nvalue 1a, 2, 3.5\nvalue2a,3,42.6\n",
 "sparqlConnection":"{  \"name\":\"My_conn\",  \"domain\":\"\",  \"enableOwlImports\":true,  \"model\":[{    \"type\":\"neptune\",    \"url\":\"http://blast-cluster.cluster-ceg7ggop9fho.us-east-1.neptune.amazonaws.com:8182/\",    \"graph\":\"http://blast-test/model\"  }],  \"data\":[{    \"type\":\"neptune\",    \"url\":\"http://blast-cluster.cluster-ceg7ggop9fho.us-east-1.neptune.amazonaws.com:8182/\",    \"graph\":\"http://blast-test/data\"  }] }"  
}

A successful execution will generate a jobId, which should be handled as specified in the section completing asynchronous jobs.

Response:
{
  "message":"operations succeeded.",
  "status":"success",
  "simpleresults":{
    "JobId":"job-8c2bd241-6633-4866-b525-2e91cb9a4800"
  }
}

However, the result handling is the reverse of a query's: a status message is generated upon success, and a table upon failure.

  • If /jobStatus returns "Failure", a table of detailed error messages can be retrieved through /getResultsTable.
  • If /jobStatus returns "Success", a status message may be retrieved via /jobStatusMessage.

Both are described in get the data. In the failure case, the resulting table should be treated as an error report rather than query results.
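The csvContent string can be built from in-memory rows with Python's csv module, which handles quoting of commas and newlines in values. A sketch (the template id is a placeholder and the helper names are our own):

```python
import csv
import io

def rows_to_csv(headers, rows):
    """Serialize in-memory rows to the csvContent string the service expects."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(headers)
    writer.writerows(rows)
    return buf.getvalue()

def build_ingest_body(template_id, headers, rows, sparql_connection):
    """Assemble the JSON body for /ingestFromCsvStringsByIdAsync.
    Column headings must match those specified by the ingestion template."""
    return {
        "templateId": template_id,
        "csvContent": rows_to_csv(headers, rows),
        "sparqlConnection": sparql_connection,  # a JSON string; see Common REST parameters
    }

# "ingestTemplateName" is a placeholder template id
body = build_ingest_body(
    "ingestTemplateName",
    ["column1", "column2", "column3"],
    [["value 1a", 2, 3.5], ["value2a", 3, 42.6]],
    "NODEGROUP_DEFAULT",
)
```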

Get nodegroup information

POST: http://host:12056/nodeGroupStore/getNodeGroupMetadata

This runs synchronously and returns a table of nodegroup id, comments, creation date, and creator.

Response:
{
  "message": "operations succeeded.",
  "table": {
    "@table": {
      "col_names": [
        "ID",
        "comments",
        "creationDate",
        "creator"
      ],
      "rows": [
        [
          "My favorite nodegroupID",
          "query returns information about something",
          "2020-04-19",
          "205000999"
        ],
        ...
      ]
    }
  }
}

Get nodegroup's runtime constraints

POST: host:12056/nodeGroupStore/getNodeGroupRuntimeConstraints
{  
 "nodeGroupId": "My nodegroup ID"
}  

This runs synchronously and returns a table of each runtime-constrainable item's value id, item type (PROPERTYITEM or NODE), and value type.

Response:
{
  "message": "operations succeeded.",
  "table": {
    "@table": {
      "col_names": [
        "valueId",
        "itemType",
        "valueType"
      ],
      "rows": [
        [
          "?productionStage",
          "PROPERTYITEM",
          "STRING"
        ],
        [
          "?alloyName",
          "PROPERTYITEM",
          "STRING"
        ]
      ],
      "type": "TABLE",
      "col_type": [
        "string",
        "string",
        "string"
      ],
      "col_count": 3,
      "row_count": 2
    }
  },
  "status": "success"
}

Error responses

All REST calls should be checked for both HTTP errors and SemTK errors.

The HTTP level may return a **status** number, an **error**, and a **message**:

Response:
{
  "timestamp": "2019-07-24T20:02:26.035+0000",
  "status": 400,
  "error": "Bad Request",
  "message": "JSON parse error: Unexpected character ('{' (code 123))",
  "path": "/nodeGroupExecution/dispatchSelectById"
}

Failures inside SemTK, on the other hand, always have a status of "failure" and a rationale:

Response:
{
  "message": "operations failed.",
  "rationale": "service: nodeGroupExecutionService method: dispatchAnyJobById() threw java.lang.Exception Could not find nodegroup with id: BLAST_GRC_ExpectedFunding NOPE\ncom.ge.research.semtk.api.nodeGroupExecution.NodeGroupExecutor.dispatchJob(NodeGroupExecutor.java:376)\ncom.ge.research.semtk.services.nodeGroupExecution.NodeGroupExecutionRestController.dispatchAnyJobById(NodeGroupExecutionRestController.java:475)\ncom.ge.research.semtk.services.nodeGroupExecution.NodeGroupExecutionRestController.dispatchSelectJobById(NodeGroupExecutionRestController.java:604)\n...",
  "status": "failure"
}

Note that these should not be confused with SemTK successfully indicating that a job failed. That is not a service-layer "error" but successful handling of a job failure. The outer status indicates the status of the service call, while the inner simpleresults.status indicates the status of the job. For example:

Response:
{
  "message":"operations succeeded.",
  "status":"success",
  "simpleresults":{
    "status":"Failure"
  }
}

Conversely, a failure while retrieving results or status may be caused by HTTP-layer or service failures. When this happens, the status of the actual job is unknown until the error is corrected.
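A minimal sketch of distinguishing the service-call status from the job status, using the response shapes shown above (the helper names are illustrative, not part of SemTK):

```python
def check_envelope(envelope):
    """Raise if the SemTK service call itself failed; otherwise return the
    envelope. Assumes HTTP-level errors were already handled separately
    (e.g. by checking the HTTP status code)."""
    if envelope.get("status") != "success":
        raise RuntimeError("SemTK call failed: "
                           + envelope.get("rationale", "no rationale given"))
    return envelope

def job_status(envelope):
    """Extract the *job* status ("Success"/"Failure") from a successful
    service call, e.g. a /jobStatus response."""
    return check_envelope(envelope)["simpleresults"]["status"]
```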

Completing asynchronous jobs given a jobId

The general flow for asynchronous jobs is:

  • launch a query, e.g. with /dispatchSelectById
  • wait with /waitForPercentOrMsec
  • get status with /jobStatus - no longer needed, as this is returned by the final call to /waitForPercentOrMsec
  • if failed, /jobStatusMessage - no longer needed, as this is returned by the final call to /waitForPercentOrMsec
  • if succeeded, get table with /getResultsTable

Wait for job to complete

Once a job is successfully launched, track it with /waitForPercentOrMsec, which returns when either maxWaitMsec elapses or the job reaches percentComplete. Choose a maxWaitMsec short enough to avoid any timeout at the HTTP layer. percentComplete may simply be set to 100, or it can be raised incrementally in order to drive a status bar in your app.

POST: host:12058/nodeGroupExecution/waitForPercentOrMsec
{  
 "jobId":"req_2e5089be-ac98-4cf9-8492-f57b77b3c0c0",  
 "maxWaitMsec":10000,
 "percentComplete":5
}

A response will take the form:

Response:
{
  "message": "operations succeeded.",
  "status": "success",
  "simpleresults": {
    "percentComplete": "100"
  }
}

If simpleresults.percentComplete is less than 100, make repeated calls until it reaches 100.
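The polling loop can be sketched as follows. The base URL is a placeholder, and `post` is injectable so the loop can be exercised without a live server:

```python
import json
import urllib.request

def post_json(url, body):
    """POST a JSON body and parse the JSON response."""
    req = urllib.request.Request(
        url, data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))

def wait_for_job(job_id, base="http://host:12058/nodeGroupExecution",
                 max_wait_msec=10000, post=post_json):
    """Block until the job reports 100% complete, re-calling
    /waitForPercentOrMsec each time the wait window elapses."""
    while True:
        envelope = post(base + "/waitForPercentOrMsec",
                        {"jobId": job_id,
                         "maxWaitMsec": max_wait_msec,
                         "percentComplete": 100})
        if envelope.get("status") != "success":
            raise RuntimeError(envelope.get("rationale", "wait failed"))
        if int(envelope["simpleresults"]["percentComplete"]) >= 100:
            return
```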

A failure response has a status of "failure" and a rationale. Note that this means the REST call itself failed; it does not reflect the success or failure of the job.

Response:
{
  "message": "operations failed.",
  "rationale": "nodeGroupExecutionService/waitForPercentOrMsec threw java.lang.Exception Can't find Job Xreq_2e5089be-ac98-4cf9-8492-f57b77b3c0c0\ncom.ge.research.semtk.edc.JobTracker.getJobPercentComplete(JobTracker.java:179)\ncom.ge.research.semtk.edc.JobTracker.waitForPercentOrMsec(JobTracker.java:1195)\ncom.ge.research.semtk.services.nodeGroupExecution.NodeGroupExecutionRestController.waitForPercentOrMsec(NodeGroupExecutionRestController.java:303)\n...",
  "status": "failure"
}

Get job status

After a job is complete, its status can be obtained through /jobStatus

POST: host:12058/nodeGroupExecution/jobStatus

{  
 "jobId":"req_2e5089be-ac98-4cf9-8492-f57b77b3c0c0"
}

This call will generate a response of the form:

Response:
{
  "message":"operations succeeded.",
  "status":"success",
  "simpleresults":{
    "status":"Success"
  }
}

When interpreting this response, note that status and message refer to the REST call, while simpleresults.status indicates "Success" or "Failure" of the job. simpleresults.status will not be present unless status is "success".

Get the error message

If /jobStatus returns a simpleresults.status that is NOT "Success", its message can be retrieved through /jobStatusMessage.

Note: failed ingestion jobs should retrieve a JSON table of error messages instead of a status message. This is done using the same method queries use to retrieve results, shown in get the data.

POST: host:12058/nodeGroupExecution/jobStatusMessage
{  
 "jobId":"req_2e5089be-ac98-4cf9-8492-f57b77b3c0c0"
}

This returns simpleresults.message. Again, note that message and status refer to the REST call, not the job. status will be "success" regardless of the job status.

Response:
{
  "message": "operations succeeded.",
  "status": "success",
  "simpleresults": {
    "message": "Failed due to bad weather"
  }
}

Get the data

If /jobStatus returns a simpleresults.status of "Success", data can be retrieved via /getResultsTable.

Note: successful ingestion jobs have no table, only a success status message.

Note: some versions of SemTK require the capitalization jobID instead of jobId.

POST: host:12058/nodeGroupExecution/getResultsTable
{  
 "jobID":"req_2e5089be-ac98-4cf9-8492-f57b77b3c0c0"
}

Note: if your query may return very large results, you may need to switch to the SemTK results service /getTableResultsJsonForWebClient endpoint, which returns a URL.

/getResultsTable returns a JSON table inside table.@table, e.g.:

Response:
{
  "message": "operations succeeded.",
  "table": {
    "@table": {
      "col_names": [
        "startdate",
        "finaltotal",
        "oppt_status"
      ],
      "rows": [
        [
          "2019-10-01T00:00:00",
          "0.0",
          "Proposal in progress"
        ],
        [
          "2019-10-01T00:00:00",
          "1.75028e+06",
          "Outstanding"
        ]
      ],
      "type": "TABLE",
      "col_type": [
        "http://www.w3.org/2001/XMLSchema#dateTime",
        "http://www.w3.org/2001/XMLSchema#double",
        "http://www.w3.org/2001/XMLSchema#string"
      ],
      "col_count": 3,
      "row_count": 2
    }
  },
  "status": "success"
}
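The @table structure is easy to consume programmatically. A sketch (the helper name is illustrative) that converts it into a list of row dictionaries keyed by column name:

```python
def table_to_dicts(envelope):
    """Convert the table.@table structure of a /getResultsTable
    response into a list of row dicts keyed by column name."""
    table = envelope["table"]["@table"]
    cols = table["col_names"]
    return [dict(zip(cols, row)) for row in table["rows"]]
```

Note that all cell values arrive as strings; col_type carries the XSD type URIs if you need to convert them.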

Common REST parameters

sparqlConnection

For the vast majority of cases, a sparqlConnection should be provided. It is a JSON string, so all quotes have to be escaped and the newlines shown below for clarity may not be allowed. This parameter is typically called an override connection and is loaded as part of your app's configuration so that dev, test, and stage work off different data connections, and the app is easy to update if data is moved.

POST: host:12058/nodeGroupExecution/dispatchSelectById
{  
 "nodeGroupId": "BLAST_GRC_ExpectedFunding",  
 "sparqlConnection":"{
  \"name\":\"My_conn\",
  \"domain\":\"\",
  \"enableOwlImports\":true,
  \"model\":[{
    \"type\":\"neptune\",
    \"url\":\"http://blast-cluster.cluster-ceg7ggop9fho.us-east-1.neptune.amazonaws.com:8182/\",
    \"graph\":\"http://blast-test/model\"
  }],
  \"data\":[{
    \"type\":\"neptune\",
    \"url\":\"http://blast-cluster.cluster-ceg7ggop9fho.us-east-1.neptune.amazonaws.com:8182/\",
    \"graph\":\"http://blast-test/data\"
  }]
 }"  
}  
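Rather than escaping quotes by hand, the connection can be built as a native dictionary and serialized with json.dumps, which produces the required escaped string automatically. A sketch with placeholder endpoint and graph names:

```python
import json

def build_connection(name, triplestore_type, url, model_graph, data_graph):
    """Build a SPARQL connection and serialize it to the JSON string
    the services expect; json.dumps handles all quote escaping when the
    string is later embedded in a request body."""
    conn = {
        "name": name,
        "domain": "",
        "enableOwlImports": True,
        "model": [{"type": triplestore_type, "url": url, "graph": model_graph}],
        "data": [{"type": triplestore_type, "url": url, "graph": data_graph}],
    }
    return json.dumps(conn)

# Placeholder endpoint and graph URIs
conn_str = build_connection(
    "My_conn", "neptune", "http://my-cluster:8182/",
    "http://example/model", "http://example/data")
```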

runtimeConstraints

Many endpoints accept runtime constraints of the form

"runtimeConstraints": "[{\"SparqlID\":\"?sso\",\"Operator\":\"MATCHES\",\"Operands\":[\"200001934\"]}]"

Each constraint object consists of the SPARQL ID of the item being constrained, an operator, and operands.

Valid operators:

  • MATCHES - operands are a list of values joined by "OR"
  • REGEX
  • GREATERTHAN
  • GREATERTHANOREQUALS
  • LESSTHAN
  • LESSTHANOREQUALS
  • VALUEBETWEEN - accepts two operands
  • VALUEBETWEENUNINCLUSIVE - accepts two operands
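As with sparqlConnection, the constraints string is most easily produced with json.dumps rather than manual escaping. A sketch (the helper name and sample values are illustrative):

```python
import json

def build_constraints(*triples):
    """Serialize (sparql_id, operator, operands) triples into the
    runtimeConstraints JSON string."""
    return json.dumps([
        {"SparqlID": sparql_id, "Operator": op, "Operands": list(operands)}
        for sparql_id, op, operands in triples
    ])

rc = build_constraints(("?sso", "MATCHES", ["200001934"]),
                       ("?total", "VALUEBETWEEN", ["0", "1000"]))
```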