weisenje edited this page Dec 4, 2020 · 46 revisions

SemTK: Semantics Toolkit

SemTK is an open-source project that makes it easy to interact with semantic triplestores (RDF stores). It is built on W3C Semantic Web standards.

It is composed of two main parts:

  • SemTK Java API / REST Services - code and services to facilitate interacting with semantic triplestore data (e.g. querying data, ingesting data)
  • SPARQLgraph - a JavaScript-based graphical web application providing drag-and-drop access to many SemTK features

SemTK was developed by the Knowledge Discovery Lab at GE Research. Contact: Paul Cuddihy

Demos are available at semtk.research.ge.com, along with a "Hello World" demo.

SemTK is licensed under Apache 2. Please include our logo whenever possible.

Key Features

Below are some of the features that SemTK provides:

  • SPARQL query generation & execution (supports Virtuoso, Fuseki, Neptune, and Jena triplestores; extensible to other SPARQL 1.1 stores)
  • Ingestion of tabular data
  • Storing queries by id
  • Utility functions (e.g. loading OWL/TTL files, clearing data)
  • Instance data browsing
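Executing a generated query ultimately means sending it to a triplestore's SPARQL endpoint. The following is a minimal Python sketch of that step, using only the standard library and the URL-encoded POST form of the SPARQL 1.1 Protocol. It is not SemTK's Java API; the endpoint URL in the usage note is a hypothetical placeholder, and the sketch assumes the store can return `application/sparql-results+json`.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict, List


def build_query_request(endpoint: str, query: str) -> urllib.request.Request:
    """Build a POST request carrying the query as an URL-encoded form field."""
    body = urllib.parse.urlencode({"query": query}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/sparql-results+json",
        },
        method="POST",
    )


def bindings_to_rows(results_json: str) -> List[Dict[str, str]]:
    """Flatten SPARQL JSON results into one dict of plain values per solution."""
    doc = json.loads(results_json)
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in doc["results"]["bindings"]
    ]


def run_select(endpoint: str, query: str) -> List[Dict[str, str]]:
    """Send a SELECT query and return its solutions (requires a reachable store)."""
    with urllib.request.urlopen(build_query_request(endpoint, query)) as resp:
        return bindings_to_rows(resp.read().decode("utf-8"))
```

For example, `run_select("http://localhost:3030/ds/query", "SELECT ?s WHERE { ?s ?p ?o } LIMIT 5")` would return a list of dicts keyed by variable name, provided a store is actually listening at that hypothetical address.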

The tool is designed for triplestores with an ontology-based model. We use SADL for ontology authoring.

Latest Additions

2020

  • visJs display of CONSTRUCT query results in SPARQLgraph
  • moving EDC (external data connections) to open source
  • moving FDC (federated data connections) and FDCCache to open source
  • improved ingestion speed using Jena in-memory cache

Capabilities Overview

SemTK was designed as a SPARQL generator, initially mainly for SELECT queries. It then evolved to include important features for ingesting data with auto-generated SPARQL INSERT queries. A cloud infrastructure was added to support the storage of nodegroups (subgraphs of interest along with their connection information) and stored-procedure-like capabilities that support application development.
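The ingestion side can be illustrated with a small Python sketch that turns CSV rows into a single SPARQL INSERT DATA update. This is not SemTK's ingestion mechanism: the base namespace, the class and property URIs, and the column-to-property mapping are all hypothetical, and a real loader would also need typed literals and batching.

```python
import csv
import io
from typing import Dict
from urllib.parse import quote

# Hypothetical namespace for generated instance URIs.
BASE = "http://example.org/data#"


def escape_literal(value: str) -> str:
    """Escape a value for use inside a double-quoted SPARQL string literal."""
    return (value.replace("\\", "\\\\").replace('"', '\\"')
                 .replace("\n", "\\n").replace("\r", "\\r"))


def rows_to_insert(csv_text: str, class_uri: str, id_column: str,
                   column_to_property: Dict[str, str]) -> str:
    """Build one INSERT DATA update: each row becomes an instance of class_uri."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Percent-encode the id so it is safe inside an IRI.
        subject = f"<{BASE}{quote(row[id_column], safe='')}>"
        triples.append(f"{subject} a <{class_uri}> .")
        for column, prop in column_to_property.items():
            if row.get(column):  # skip empty cells instead of inserting ""
                triples.append(f'{subject} <{prop}> "{escape_literal(row[column])}" .')
    body = "\n  ".join(triples)
    return f"INSERT DATA {{\n  {body}\n}}"
```

Generating one update per batch of rows, rather than per row, keeps the number of round trips to the triplestore low; the batch size is a tuning choice this sketch leaves out.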

Loading a connection

Part of each session, and stored with each nodegroup, is a **connection**.

Each connection lists a **server** and dataset for a model graph. The dataset is essentially a named graph within the given server. A connection may have multiple model connections, where the ontology is stored, and multiple data connections, where the instance data is stored.
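To make the shape of a connection concrete, here is a hypothetical Python model of one: a name, a list of model graphs, and a list of data graphs, each identified by a server URL and a dataset (named graph). SemTK stores connections in its own format; the class and field names below are illustrative assumptions, not that format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class GraphRef:
    server: str   # SPARQL endpoint URL of the triplestore
    dataset: str  # named graph on that server


@dataclass
class Connection:
    name: str
    model: List[GraphRef] = field(default_factory=list)  # graphs holding the ontology
    data: List[GraphRef] = field(default_factory=list)   # graphs holding instance data

    def all_datasets(self) -> List[str]:
        """Every named graph the connection covers, model graphs first, without duplicates."""
        return list(dict.fromkeys(g.dataset for g in self.model + self.data))
```

Keeping model and data graphs in separate lists mirrors the distinction above, while `all_datasets` gives the combined set a query needs to see.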

The OWL Imports checkbox indicates that SemTK should honor import triples in the model graph. Imports are performed by:

  • finding the graph on the same server whose name matches the OWL <rdf:RDF xml:base> field
  • loading the model from that graph, following imports recursively
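The import procedure amounts to a depth-first walk over `owl:imports` links. Below is a minimal Python sketch, assuming each model graph is available as an RDF/XML string and that a graph's name equals the `xml:base` of the file loaded into it; a dict stands in for the named graphs on one server. Visited graphs are tracked so that import cycles terminate, and unresolved imports are simply skipped. It illustrates the idea and is not SemTK's implementation.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional, Set

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OWL = "http://www.w3.org/2002/07/owl#"
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"  # how ElementTree names xml:base


def xml_base(rdf_xml: str) -> Optional[str]:
    """Return the xml:base attribute of the <rdf:RDF> root element, if any."""
    return ET.fromstring(rdf_xml).get(XML_BASE)


def imports_of(rdf_xml: str) -> List[str]:
    """Return the rdf:resource target of every owl:imports element, in document order."""
    root = ET.fromstring(rdf_xml)
    targets = (el.get(f"{{{RDF}}}resource") for el in root.iter(f"{{{OWL}}}imports"))
    return [t for t in targets if t]


def index_by_base(documents: List[str]) -> Dict[str, str]:
    """Stand-in for a server: map each document's xml:base (its graph name) to its content."""
    return {xml_base(doc): doc for doc in documents}


def load_with_imports(graph: str, server: Dict[str, str],
                      loaded: Optional[Set[str]] = None) -> List[str]:
    """Return graph names in load order: the graph, then its imports, recursively."""
    if loaded is None:
        loaded = set()
    if graph in loaded or graph not in server:
        return []  # already loaded (cycle guard) or not found on this server
    loaded.add(graph)
    order = [graph]
    for target in imports_of(server[graph]):
        order.extend(load_with_imports(target, server, loaded))
    return order
```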

After a connection is loaded, the main SPARQLgraph screen is divided into the four areas described below.

Ontology Info (Top left)

The top-left pane holds a cached version of the ontology that captures only the items below. This subset of OWL is the most useful for generating SPARQL queries.

  • Classes
  • Sub-class relationships
  • Properties
  • Domain / Range of properties - note that complex ranges are not yet supported
  • Enums (SADL "must be one of")
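The cached ontology can be pictured as a few lookup tables. The following hypothetical Python sketch holds classes, subclass links, properties with a single domain and range, and enumerations, and answers two questions a query builder needs: which classes sit (transitively) below a class, and which properties apply to a class through inheritance. Short names stand in for full URIs, and the structure is an assumption rather than SemTK's internal model.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


class OntologyCache:
    """Hypothetical cache of a small OWL subset."""

    def __init__(self) -> None:
        self.classes: Set[str] = set()
        self.parents: Dict[str, Set[str]] = defaultdict(set)   # class -> direct superclasses
        self.children: Dict[str, Set[str]] = defaultdict(set)  # class -> direct subclasses
        self.properties: Dict[str, Tuple[str, str]] = {}       # property -> (domain, range)
        self.enums: Dict[str, List[str]] = {}                  # class -> allowed instances

    def add_class(self, cls: str, *superclasses: str) -> None:
        self.classes.add(cls)
        for sup in superclasses:
            self.classes.add(sup)
            self.parents[cls].add(sup)
            self.children[sup].add(cls)

    def add_property(self, prop: str, domain: str, rng: str) -> None:
        # Only a single named class or datatype as range; complex ranges are not modeled.
        self.properties[prop] = (domain, rng)

    def add_enum(self, cls: str, values: List[str]) -> None:
        # Corresponds to a SADL "must be one of" restriction.
        self.enums[cls] = list(values)

    @staticmethod
    def _reachable(start: str, edges: Dict[str, Set[str]]) -> Set[str]:
        """Nodes reachable from start by following edges (start excluded unless on a cycle)."""
        seen: Set[str] = set()
        stack = list(edges.get(start, ()))
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(edges.get(node, ()))
        return seen

    def descendants(self, cls: str) -> Set[str]:
        return self._reachable(cls, self.children)

    def ancestors(self, cls: str) -> Set[str]:
        return self._reachable(cls, self.parents)

    def properties_of(self, cls: str) -> List[str]:
        """Properties whose domain is cls or one of its superclasses, sorted by name."""
        owners = {cls} | self.ancestors(cls)
        return sorted(p for p, (domain, _) in self.properties.items() if domain in owners)
```

Storing both parent and child links doubles the bookkeeping but makes upward lookups (inherited properties) and downward lookups (subclasses) equally cheap.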

Mousing over an item displays a tooltip with the item's full URI and any aliases or notes.

Nodegroup Pane (Top right)

In this pane, classes can be dragged and dropped, and properties can be chosen for returning, deleting, constraining, etc. Queries are then generated from these nodegroups.
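As a rough illustration of generating a query from a nodegroup, here is a hypothetical Python sketch in which each node has a SPARQL variable, a class, links to other nodes, properties chosen for returning, and optional FILTER constraints. It produces a plain SELECT only. SemTK's real nodegroups carry considerably more information, so treat the data structure and output format as assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    var: str                                                    # SPARQL variable for the node
    class_uri: str                                              # class dropped onto the canvas
    links: List[Tuple[str, str]] = field(default_factory=list)  # (property URI, target node var)
    returns: Dict[str, str] = field(default_factory=dict)       # property URI -> returned variable
    filters: List[str] = field(default_factory=list)            # raw FILTER expressions


def to_select(nodes: List[Node]) -> str:
    """Generate a SELECT query from a list of connected nodes."""
    select_vars: List[str] = []
    patterns: List[str] = []
    for node in nodes:
        patterns.append(f"?{node.var} a <{node.class_uri}> .")
        for prop, target in node.links:
            patterns.append(f"?{node.var} <{prop}> ?{target} .")
        for prop, var in node.returns.items():
            patterns.append(f"?{node.var} <{prop}> ?{var} .")
            select_vars.append(f"?{var}")
        patterns.extend(f"FILTER ({expr})" for expr in node.filters)
    if not select_vars:
        raise ValueError("nothing is marked for return")
    body = "\n  ".join(patterns)
    return f"SELECT {' '.join(select_vars)} WHERE {{\n  {body}\n}}"
```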

The ontology info is critical in building a query that matches the model.

The connection is used to include the proper FROM or USING clauses in the SPARQL query, so that the query runs against the entire ontology and all of the instance data.
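In SPARQL 1.1, a query's FROM clauses and an update's USING clauses both name the graphs merged into the default graph that the WHERE pattern is matched against. The hypothetical helpers below show how a connection's graphs (model and data alike) might be rendered into those clauses; the graph URIs are placeholders, and this is a sketch rather than SemTK's generator.

```python
from typing import List


def dataset_clauses(graphs: List[str], keyword: str) -> str:
    """One FROM (query) or USING (update) clause per distinct graph, order preserved."""
    if keyword not in ("FROM", "USING"):
        raise ValueError("keyword must be FROM or USING")
    return "\n".join(f"{keyword} <{g}>" for g in dict.fromkeys(graphs))


def select_with_graphs(select_vars: str, where: str, graphs: List[str]) -> str:
    return f"SELECT {select_vars}\n{dataset_clauses(graphs, 'FROM')}\nWHERE {{ {where} }}"


def delete_with_graphs(template: str, where: str, graphs: List[str], target: str) -> str:
    # WITH names the graph the DELETE template removes triples from; the USING
    # clauses (which take precedence over WITH for the WHERE clause) name the
    # graphs that the pattern is matched against.
    return (f"WITH <{target}>\nDELETE {{ {template} }}\n"
            f"{dataset_clauses(graphs, 'USING')}\nWHERE {{ {where} }}")
```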

Query (Center)

This is the query generated by SemTK. It may be a SELECT, COUNT, INSERT, or DELETE query.
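One common way to turn a generated SELECT into a COUNT is to wrap it as a subquery, as in this Python sketch; it is not necessarily how SemTK builds its COUNT queries. Because a SPARQL subquery cannot carry PREFIX declarations or FROM clauses, the sketch assumes the inner query has neither; those would belong on the outer query.

```python
def count_query(select_query: str) -> str:
    """Wrap a SELECT (without PREFIX or FROM clauses) in an outer COUNT query."""
    inner = "\n".join("  " + line for line in select_query.strip().splitlines())
    # COUNT(*) counts the inner query's solutions, so any DISTINCT or LIMIT
    # in the inner query is respected.
    return f"SELECT (COUNT(*) AS ?count) WHERE {{\n{inner}\n}}"
```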

SPARQL will include:

  • subclass inference (rdfs:subClassOf*) for any class that has subclasses in the ontology
  • subproperty inference (rdfs:subPropertyOf*) for any property that has subproperties in the ontology
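Both kinds of inference can be expressed with SPARQL 1.1 property paths. The Python sketch below shows one way to emit them only when the ontology says a class has subclasses or a property has subproperties. The function names and the caller-supplied predicate variable are assumptions, and SemTK's generated text may differ in detail.

```python
from typing import Set

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
SUBPROPERTY_OF = "http://www.w3.org/2000/01/rdf-schema#subPropertyOf"


def type_pattern(var: str, cls: str, classes_with_subclasses: Set[str]) -> str:
    """Match instances of cls, plus instances of its subclasses when it has any."""
    if cls in classes_with_subclasses:
        # rdf:type/rdfs:subClassOf* follows zero or more subClassOf links,
        # so direct instances of cls still match.
        return f"?{var} <{RDF_TYPE}>/<{SUBCLASS_OF}>* <{cls}> ."
    return f"?{var} <{RDF_TYPE}> <{cls}> ."


def property_pattern(subj: str, prop: str, obj: str,
                     props_with_subprops: Set[str], pred_var: str) -> str:
    """Match prop, plus any of its subproperties when it has any."""
    if prop in props_with_subprops:
        # A property path cannot contain a variable, so bind the predicate to a
        # variable and constrain it with subPropertyOf* (zero steps = prop itself).
        return f"?{subj} ?{pred_var} ?{obj} .\n?{pred_var} <{SUBPROPERTY_OF}>* <{prop}> ."
    return f"?{subj} <{prop}> ?{obj} ."
```

Emitting the path only when subclasses or subproperties exist keeps the common case a simple triple pattern, which most stores evaluate faster than a transitive path.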

Results (Bottom)

Displays the results of the most recent query.
