REST API
SemTK REST endpoints can be accessed via many languages or tools such as Swagger and Postman.
Additionally, REST client code is available via:
- Java - Java API page
- Python - semtk-python3
- Javascript - sample code can be found at sample_async_calls.js
There are a few main types of endpoints:
- asynchronous job execution - the most common type of job. Call chain is described below in completing asynchronous jobs.
- synchronous - used during testing and for simpler operations which are likely to return quickly.
Many jobs return data in the form of a SemTK table.
Many services are also designed to accept SPARQL connection strings, to indicate the graphs and triplestore(s) with which the application will interact. The SPARQL connection string(s) should be easily configured so they can be changed, for example, when an application moves from test to stage, or when a data source changes. For an example of this functionality, see select query. Use of nodegroup default connections is strongly discouraged in production environments.
All calls should check for errors.
The default ports for the most commonly used services are:
- nodegroup execution 12058 - this is the most commonly used service
- nodegroup store 12056 - storage of nodegroups by id
- nodegroup service 12059 - interrogating and changing nodegroups
- ontology info 12057 - information about the model
Lesser-used services have their common endpoints available in the nodegroup execution service:
- status 12051 - job status
- results 12052 - retrieve results
- query 12050 - running queries
Each of these ports has a swagger page with a full listing of endpoints (e.g. host:12058/swagger-ui.html). The most commonly used are shown below.
Nodegroup IDs can be used to execute queries. The most common example is a SELECT query. This will be performed using the NodeGroupExecutionService, which is typically on port 12058.
Details follow.
Here is the simplest way to launch a query.
POST: host:12058/nodeGroupExecution/dispatchSelectById
{
"nodeGroupId": "MyNodegroup",
"sparqlConnection": "NODEGROUP_DEFAULT", // optional override connection is recommended
"limitOverride": -1, // optional query LIMIT
"offsetOverride": -1, // optional query OFFSET
"runtimeConstraints": "[{\"SparqlID\":\"?id\",\"Operator\":\"MATCHES\",\"Operands\":[\"98243-T\"]}]", // optional
}
A successful response will return a JobId:
Response:
{
"message": "operations succeeded.",
"status": "success",
"simpleresults": {
"JobId": "req_2e5089be-ac98-4cf9-8492-f57b77b3c0c0"
}
}
simpleresults.JobId should be used for completing asynchronous jobs.
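As a sketch using only the Python standard library (the host, nodegroup name, and helper names here are illustrative, not part of the SemTK API), the dispatch call and the two-level error check look like this:

```python
import json
import urllib.request

def build_select_payload(nodegroup_id, sparql_connection="NODEGROUP_DEFAULT",
                         limit=-1, offset=-1, runtime_constraints=None):
    """Assemble the JSON body for /dispatchSelectById."""
    payload = {
        "nodeGroupId": nodegroup_id,
        "sparqlConnection": sparql_connection,
        "limitOverride": limit,
        "offsetOverride": offset,
    }
    if runtime_constraints is not None:
        payload["runtimeConstraints"] = runtime_constraints
    return payload

def dispatch_select(host, nodegroup_id, **kwargs):
    """POST the query and return the JobId of the launched job."""
    url = f"http://{host}:12058/nodeGroupExecution/dispatchSelectById"
    req = urllib.request.Request(
        url,
        data=json.dumps(build_select_payload(nodegroup_id, **kwargs)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:   # raises on HTTP-level errors
        body = json.loads(resp.read().decode("utf-8"))
    if body.get("status") != "success":         # SemTK-level failure
        raise RuntimeError(body.get("rationale", "dispatchSelectById failed"))
    return body["simpleresults"]["JobId"]
```

The returned JobId is then handed to the asynchronous-completion calls described below.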
Delete query is similar to select.
POST: host:12058/nodeGroupExecution/dispatchDeleteById
{
"nodeGroupId": "BLAST_GRC_ExpectedFunding",
"sparqlConnection": "NODEGROUP_DEFAULT", // optional; an override connection is recommended
"runtimeConstraints": "[{\"SparqlID\":\"?sso\",\"Operator\":\"MATCHES\",\"Operands\":[\"200001934\"]}]" // very common optional parameter
}
Success will generate a response just like a select query:
Response:
{
"message": "operations succeeded.",
"status": "success",
"simpleresults": {
"JobId": "req_2e5089be-ac98-4cf9-8492-f57b77b3c0c0"
}
}
simpleresults.JobId should be used for completing asynchronous jobs. The table returned from a successfully completed delete query will have a @message column with a single cell containing a message from the triplestore describing the success.
Ingesting data is performed using /ingestFromCsvStringsByIdAsync. This endpoint uses templateId instead of nodegroupId, with a sparqlConnection just like the query endpoints. csvContent is a string table of input data, whose column headings must match those specified by the template.
POST: host:12058/nodeGroupExecution/ingestFromCsvStringsByIdAsync
{
"templateId":"ingestTemplateName",
"csvContent":"column1, column2, column3\nvalue 1a, 2, 3.5\nvalue2a,3,42.6\n",
"sparqlConnection":"{ \"name\":\"My_conn\", \"domain\":\"\", \"enableOwlImports\":true, \"model\":[{ \"type\":\"neptune\", \"url\":\"http://blast-cluster.cluster-ceg7ggop9fho.us-east-1.neptune.amazonaws.com:8182/\", \"graph\":\"http://blast-test/model\" }], \"data\":[{ \"type\":\"neptune\", \"url\":\"http://blast-cluster.cluster-ceg7ggop9fho.us-east-1.neptune.amazonaws.com:8182/\", \"graph\":\"http://blast-test/data\" }] }"
}
A successful execution will generate a jobId which should be handled as specified in the section completing asynchronous jobs.
Response:
{
"message":"operations succeeded.",
"status":"success",
"simpleresults":{
"JobId":"job-8c2bd241-6633-4866-b525-2e91cb9a4800"
}
}
However, ingestion results are the reverse of a query's: a status message is generated upon success, and a table upon failure.
- If /jobStatus returns "Failure", a table of detailed error messages can be retrieved through /getResultsTable. This table should be treated as an error report.
- If /jobStatus returns "Success", a message may be retrieved via /jobStatusMessage.
Both calls are described in get the data.
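A minimal sketch of assembling the csvContent parameter from in-memory rows (the function name is illustrative; the header must match the ingestion template's columns):

```python
import csv
import io

def rows_to_csv_content(header, rows):
    """Serialize a header and data rows into the csvContent string
    expected by /ingestFromCsvStringsByIdAsync."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)   # column names must match the template
    writer.writerows(rows)
    return buf.getvalue()
```

Using the csv module rather than string concatenation ensures values containing commas or quotes are escaped correctly.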
POST: http://host:12056/nodeGroupStore/getNodeGroupMetadata
This runs synchronously and returns a table of nodegroup id, comments, creation date, and creator.
Response:
{
"message": "operations succeeded.",
"table": {
"@table": {
"col_names": [
"ID",
"comments",
"creationDate",
"creator"
],
"rows": [
[
"My favorite nodegroupID",
"query returns information about something",
"2020-04-19",
"205000999"
],
...
]
}
}
}
POST: host:12056/nodeGroupStore/getNodeGroupRuntimeConstraints
{
"nodeGroupId": "My nodegroup ID"
}
Synchronously returns a table of each runtime-constrainable variable's id, item type (PROPERTYITEM or NODE), and value type.
Response:
{
"message": "operations succeeded.",
"table": {
"@table": {
"col_names": [
"valueId",
"itemType",
"valueType"
],
"rows": [
[
"?productionStage",
"PROPERTYITEM",
"STRING"
],
[
"?alloyName",
"PROPERTYITEM",
"STRING"
]
],
"type": "TABLE",
"col_type": [
"string",
"string",
"string"
],
"col_count": 3,
"row_count": 2
}
},
"status": "success"
}
All REST calls should be checked for both HTTP errors and SemTK errors.
The HTTP level may return a **status** number, an **error**, and a **message**:
Response:
{
"timestamp": "2019-07-24T20:02:26.035+0000",
"status": 400,
"error": "Bad Request",
"message": "JSON parse error: Unexpected character ('{' (code 123))",
"path": "/nodeGroupExecution/dispatchSelectById"
}
Failures inside SemTK, on the other hand, always have a status of "failure" and a rationale:
Response:
{
"message": "operations failed.",
"rationale": "service: nodeGroupExecutionService method: dispatchAnyJobById() threw java.lang.Exception Could not find nodegroup with id: BLAST_GRC_ExpectedFunding NOPE\ncom.ge.research.semtk.api.nodeGroupExecution.NodeGroupExecutor.dispatchJob(NodeGroupExecutor.java:376)\ncom.ge.research.semtk.services.nodeGroupExecution.NodeGroupExecutionRestController.dispatchAnyJobById(NodeGroupExecutionRestController.java:475)\ncom.ge.research.semtk.services.nodeGroupExecution.NodeGroupExecutionRestController.dispatchSelectJobById(NodeGroupExecutionRestController.java:604)\n...",
"status": "failure"
}
Note that these should not be confused with SemTK successfully indicating that a job failed. That is not a service-layer "error" but the successful handling of a job failure. The outer status indicates the status of the service call, while the inner simpleresults.status indicates the status of the job. For example:
Response:
{
"message":"operations succeeded.",
"status":"success",
"simpleresults":{
"status":"Failure"
}
}
Conversely, a failure retrieving results or status may be caused by HTTP layer or service failures. When this happens, the status of the actual job is unknown until the error is corrected.
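As a sketch, a single pair of helpers can enforce this two-level check (the function names are illustrative, not part of the SemTK API):

```python
def check_semtk_response(body):
    """Raise if a SemTK response body reports a service-level failure.

    Note: this only checks the outer status. For an asynchronous job,
    the inner simpleresults.status may still be "Failure" even when
    the service call itself succeeded.
    """
    if body.get("status") != "success":
        raise RuntimeError(body.get("rationale", body.get("message", "SemTK failure")))
    return body

def job_succeeded(body):
    """Inspect the inner job status of a checked /jobStatus response."""
    return body.get("simpleresults", {}).get("status") == "Success"
```

Keeping the two checks separate mirrors the distinction above: one guards the service call, the other the job it tracks.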
The general flow for asynchronous jobs is:
- launch a query, e.g. with /dispatchSelectById
- wait with /waitForPercentOrMsec
- get status with /jobStatus - no longer needed, as this is returned by the final call to /waitForPercentOrMsec
- if the job failed, get its message with /jobStatusMessage - no longer needed, as this is returned by the final call to /waitForPercentOrMsec
- if succeeded, get table with /getResultsTable
Once a job is successfully launched, track it with /waitForPercentOrMsec, which will return when either maxWaitMsec passes or the job reaches percentComplete. maxWaitMsec should be short enough to avoid any timeout at the HTTP layer. percentComplete may simply be set to 100, or it can be increased incrementally in order to drive a status bar in your app.
POST: host:12058/nodeGroupExecution/waitForPercentOrMsec
{
"jobId":"req_2e5089be-ac98-4cf9-8492-f57b77b3c0c0",
"maxWaitMsec":10000,
"percentComplete":5
}
A response will take the form:
Response:
{
"message": "operations succeeded.",
"status": "success",
"simpleresults": {
"percentComplete": "100"
}
}
If simpleresults.percentComplete is less than 100, make repeated calls until it reaches 100.
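A polling sketch using only the standard library (the host and helper names are placeholders, not part of the SemTK API):

```python
import json
import urllib.request

def percent_complete(body):
    """Extract percentComplete from a /waitForPercentOrMsec response."""
    return int(body["simpleresults"]["percentComplete"])

def wait_for_job(host, job_id, max_wait_msec=10000):
    """Poll /waitForPercentOrMsec until the job reports 100% complete."""
    url = f"http://{host}:12058/nodeGroupExecution/waitForPercentOrMsec"
    while True:
        req = urllib.request.Request(
            url,
            data=json.dumps({"jobId": job_id,
                             "maxWaitMsec": max_wait_msec,
                             "percentComplete": 100}).encode("utf-8"),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            body = json.loads(resp.read().decode("utf-8"))
        if body.get("status") != "success":
            raise RuntimeError(body.get("rationale", "waitForPercentOrMsec failed"))
        if percent_complete(body) >= 100:
            return
```

Each iteration blocks server-side for up to maxWaitMsec, so the loop adds no busy-waiting of its own.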
A sample failure has status of "failure" and a rationale. Note that this means the REST call failed. This does not reflect upon the success of a job.
Response:
{
"message": "operations failed.",
"rationale": "nodeGroupExecutionService/waitForPercentOrMsec threw java.lang.Exception Can't find Job Xreq_2e5089be-ac98-4cf9-8492-f57b77b3c0c0\ncom.ge.research.semtk.edc.JobTracker.getJobPercentComplete(JobTracker.java:179)\ncom.ge.research.semtk.edc.JobTracker.waitForPercentOrMsec(JobTracker.java:1195)\ncom.ge.research.semtk.services.nodeGroupExecution.NodeGroupExecutionRestController.waitForPercentOrMsec(NodeGroupExecutionRestController.java:303)\n...",
"status": "failure"
}
After a job is complete, its status can be obtained through /jobStatus
POST: host:12058/nodeGroupExecution/jobStatus
{
"jobId":"req_2e5089be-ac98-4cf9-8492-f57b77b3c0c0",
}
This call will generate a response of the form:
Response:
{
"message":"operations succeeded.",
"status":"success",
"simpleresults":{
"status":"Success"
}
}
When interpreting this response, note that status and message refer to the REST call, while simpleresults.status indicates "Success" or "Failure" of the job. simpleresults.status will not exist unless status is "success".
If /jobStatus returns a simpleresults.status that is NOT "Success", its message can be retrieved through /jobStatusMessage
Note: failed ingestion jobs should retrieve a JSON table of error messages instead of a status message. This is done using the same method queries use to retrieve results, shown in get the data.
POST: host:12058/nodeGroupExecution/jobStatusMessage
{
"jobId":"req_2e5089be-ac98-4cf9-8492-f57b77b3c0c0",
}
which will generate a return with simpleresults.message. Again, note that message and status refer to the REST call, and not the job. status will be "success" regardless of the job status.
Response:
{
"message": "operations succeeded.",
"status": "success",
"simpleresults": {
"message": "Failed due to bad weather"
}
}
If /jobStatus returns a simpleresults.status of "Success", data can be retrieved via /getResultsTable.
Note: successful ingestion jobs have no table, but simply a success status message.
Note: some versions of SemTK require the capitalization jobID instead of jobId.
POST: host:12058/nodeGroupExecution/getResultsTable
{
"jobID":"req_2e5089be-ac98-4cf9-8492-f57b77b3c0c0",
}
Note: if your query may return very large results, you may need to switch to the SemTK results service /getTableResultsJsonForWebClient endpoint, which returns a URL.
/getResultsTable returns a JSON table inside table.@table, e.g.:
Response:
{
"message": "operations succeeded.",
"table": {
"@table": {
"col_names": [
"startdate",
"finaltotal",
"oppt_status"
],
"rows": [
[
"2019-10-01T00:00:00",
"0.0",
"Proposal in progress"
],
[
"2019-10-01T00:00:00",
"1.75028e+06",
"Outstanding"
]
],
"type": "TABLE",
"col_type": [
"http://www.w3.org/2001/XMLSchema#dateTime",
"http://www.w3.org/2001/XMLSchema#double",
"http://www.w3.org/2001/XMLSchema#string"
],
"col_count": 3,
"row_count": 2
}
},
"status": "success"
}
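A small sketch of turning the @table structure into row dictionaries (the function name is illustrative):

```python
def table_to_dicts(body):
    """Convert a SemTK /getResultsTable response into a list of
    column-name -> value dicts, one per row."""
    table = body["table"]["@table"]
    return [dict(zip(table["col_names"], row)) for row in table["rows"]]
```

Note that cell values arrive as strings; col_type carries the XSD types if numeric or date conversion is needed.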
For the vast majority of cases, a sparqlConnection should be provided. It is a JSON string, so all quotes must be escaped, and the newlines shown below for clarity may not be allowed. This parameter is typically called an override connection and is loaded as part of your app's configuration so that dev, test, and stage run against different data connections, and the app is easy to update if data is moved.
POST: host:12058/nodeGroupExecution/dispatchSelectById
{
"nodeGroupId": "BLAST_GRC_ExpectedFunding",
"sparqlConnection":"{
\"name\":\"My_conn\",
\"domain\":\"\",
\"enableOwlImports\":true,
\"model\":[{
\"type\":\"neptune\",
\"url\":\"http://blast-cluster.cluster-ceg7ggop9fho.us-east-1.neptune.amazonaws.com:8182/\",
\"graph\":\"http://blast-test/model\"
}],
\"data\":[{
\"type\":\"neptune\",
\"url\":\"http://blast-cluster.cluster-ceg7ggop9fho.us-east-1.neptune.amazonaws.com:8182/\",
\"graph\":\"http://blast-test/data\"
}]
}"
}
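Rather than escaping quotes by hand, the connection string can be produced from a plain dict with a JSON serializer; a sketch (the helper name is illustrative, and the triplestore URL and graph names are whatever your deployment uses):

```python
import json

def build_sparql_connection(name, url, model_graph, data_graph, ts_type="neptune"):
    """Serialize an override connection into the escaped JSON string
    that the sparqlConnection parameter expects."""
    return json.dumps({
        "name": name,
        "domain": "",
        "enableOwlImports": True,
        "model": [{"type": ts_type, "url": url, "graph": model_graph}],
        "data":  [{"type": ts_type, "url": url, "graph": data_graph}],
    })
```

json.dumps handles all quote escaping and produces a single line, avoiding the newline issue noted above.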
Many endpoints accept runtime constraints of the form
runtimeConstraints: "[{\"SparqlID\":\"?sso\",\"Operator\":\"MATCHES\",\"Operands\":[\"200001934\"]}]"
Each constraint object consists of the SPARQL ID of the item being constrained, an operator, and operands.
Valid operators:
- MATCHES - operands are a list of values joined by "OR"
- REGEX
- GREATERTHAN
- GREATERTHANOREQUALS
- LESSTHAN
- LESSTHANOREQUALS
- VALUEBETWEEN - accepts two operands
- VALUEBETWEENUNINCLUSIVE - accepts two operands
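Because runtimeConstraints is itself a JSON string, it is easiest to build with a serializer rather than by hand-escaping quotes; a sketch (the helper name is illustrative):

```python
import json

def build_runtime_constraints(*constraints):
    """Serialize (sparql_id, operator, operands) triples into the
    runtimeConstraints string, e.g.
    build_runtime_constraints(("?sso", "MATCHES", ["200001934"]))."""
    return json.dumps([
        {"SparqlID": sparql_id, "Operator": operator, "Operands": list(operands)}
        for sparql_id, operator, operands in constraints
    ])
```

The result can be passed directly as the runtimeConstraints parameter of the dispatch endpoints.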