dentoir edited this page Sep 5, 2016 · 5 revisions

An experimental RESTful API is available for accessing DMI-TCAT.

It is experimental because it exposes only part of DMI-TCAT's functionality. Apart from retrieving some information about query bins, only the function to purge tweets has been implemented.

Invoking the API functions

A Representational State Transfer (REST) interface is characterised by three main features:

  • Resources, identified by URIs.
  • Operations, performed on the resources.
  • Representations, of the resources.

The representation is controlled by the standard HTTP "Accept" header in the HTTP request.

All API operations support JSON, HTML and plain text representations. Some API operations may support other representations too.

JSON

Normally the API is invoked by a computer program, which will want the machine-readable JSON representation. JSON is also the default representation when the HTTP request has no "Accept" header.

For example, using curl as the client, the following lists all the query bins in JSON:

curl -H "Accept: application/json" -u admin:«password» http://«hostname»/api/querybin.php

Note: explicitly specifying the Accept header for JSON should be unnecessary with curl. By default curl sends only the wildcard "Accept: */*", which expresses no preference, so the default JSON representation is returned.
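The same request can be prepared from a program. The following is a minimal Python sketch using only the standard library. It assumes the API accepts HTTP Basic authentication (which is what curl's -u option sends by default); the function name build_request, the host name and the credentials are placeholders for illustration, not part of DMI-TCAT.

```python
import base64
import urllib.request


def build_request(url, username, password, accept="application/json"):
    """Build (but do not send) a GET request with Basic auth and an Accept header."""
    # Basic authentication is "username:password", base64-encoded.
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url, method="GET")
    request.add_header("Accept", accept)
    request.add_header("Authorization", f"Basic {token}")
    return request


# Hypothetical host and credentials; replace with the values of your deployment.
req = build_request("http://tcat.example.org/api/querybin.php", "admin", "secret")
# To actually send it: urllib.request.urlopen(req).read()
```

Passing accept="text/plain" or accept="text/html" to the same function would request the other representations.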

HTML

For interactive use, the functions can be accessed via a Web browser, in which case the HTML representation is returned. Note: this is because Web browsers indicate in the HTTP "Accept" header that they would like to receive "text/html".

For example, visit in a Web browser:

http://«hostname»/api/querybin.php

Plain text

A human-readable plain text representation is also available.

For example:

curl -H "Accept: text/plain" -u admin:«password» http://«hostname»/api/querybin.php

Command line invocation

In addition to implementing access to the API via HTTP, the API scripts can also be invoked from the command line.

For example, on the machine where DMI-TCAT is installed, run:

php /var/www/dmi-tcat/api/querybin.php --help

The "--help" command line option prints a description of the available command line options.

In most cases, the output from the command line is the same as the plain text representation.

API functions

API version

Show the version of the API.

Resource: http://«hostname»/api/

Operation: GET

Note: the trailing slash is mandatory.

List query bins

List the names of all query bins in the deployment of DMI-TCAT.

Resource: http://«hostname»/api/querybin.php

Operation: GET

Query bin details

Shows some basic information about a specific query bin.

Resource: http://«hostname»/api/querybin.php/«binName»

Operation: GET

Query bin tweets

Warning: the information returned differs between the JSON/HTML and the CSV/TSV representations. This design "flaw" might be fixed in the future.

In the JSON or HTML representations, shows the number of tweets in the selected time period (or of all tweets, if no time period is specified).

In CSV or TSV, exports the actual tweets in the selected time period. Note: this is currently implemented as an HTTP redirect to the URL of the existing export function in DMI-TCAT.

Resource: http://«hostname»/api/querybin.php/«binName»/tweets

Operation: GET

Query parameters:

  • startdate: tweets before this timestamp are not included. If this parameter is not specified, it is as if the timestamp of the earliest tweet is specified. See timestamp syntax below.

  • enddate: tweets after this timestamp are not included. If this parameter is not specified, it is as if the timestamp of the latest tweet is specified. See timestamp syntax below.

  • export: optional query parameter to set representation to either 'csv' or 'tsv'. Useful when using a Web browser where the HTTP Accept header cannot be set.

The time period is inclusive: tweets whose timestamps equal startdate or enddate are included. If only startdate is specified, tweets at or after that time are included. If only enddate is specified, tweets from the beginning of capture up to and including that time are included. If neither startdate nor enddate is specified, all captured tweets are included.

Other values for HTTP Accept header:

  • text/csv - export selected tweets in Comma Separated Values format.
  • text/tab-separated-values - export selected tweets in TSV format.
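As an illustration of how the query parameters combine, here is a small Python sketch that builds the resource URL. The function name tweets_url and the host name are hypothetical. Percent-encoding the values matters in practice: an unencoded "+" in a timezone offset would otherwise be read as a space in a query string.

```python
from urllib.parse import quote, urlencode


def tweets_url(base, bin_name, startdate=None, enddate=None, export=None):
    """Return the tweets resource URL for a query bin, adding only the parameters given."""
    params = {}
    if startdate is not None:
        params["startdate"] = startdate
    if enddate is not None:
        params["enddate"] = enddate
    if export is not None:
        # The export parameter only accepts these two values.
        if export not in ("csv", "tsv"):
            raise ValueError("export must be 'csv' or 'tsv'")
        params["export"] = export
    url = f"{base}/api/querybin.php/{quote(bin_name, safe='')}/tweets"
    # urlencode percent-encodes ":" and "+" in the timestamps.
    return f"{url}?{urlencode(params)}" if params else url


# Hypothetical host and bin name.
print(tweets_url("http://tcat.example.org", "mybin",
                 startdate="2016-02-28T17:10:00Z", export="csv"))
```

Leaving out both startdate and enddate yields the plain resource URL, which selects all captured tweets.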

Purge tweets

Deletes tweets from the selected time period.

Note: this does not reduce the space occupied by the database on disk, since the database is not compacted. However, additional captures will not increase the size of the database on disk until the freed-up space has been reused.

Resource: http://«hostname»/api/querybin.php/«binName»/tweets

Operation: DELETE

Note: since standard Web browsers do not support the DELETE method, a POST request with the action=tweet-purge query parameter can be used as an alternative.

Query parameters:

  • startdate: tweets before this timestamp are not deleted. If this parameter is not specified, it is as if the timestamp of the earliest tweet is specified. See timestamp syntax below.

  • enddate: tweets after this timestamp are not deleted. If this parameter is not specified, it is as if the timestamp of the latest tweet is specified. See timestamp syntax below.
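The two ways of invoking a purge can be sketched in Python as follows. The function name purge_request and the host name are hypothetical, and authentication is left out for brevity. Requiring both startdate and enddate is a safety choice of this sketch, not of the API: as described above, omitting them selects all tweets.

```python
import urllib.request
from urllib.parse import urlencode


def purge_request(base, bin_name, startdate, enddate, use_post=False):
    """Build (but do not send) a purge request for the given time period."""
    params = {"startdate": startdate, "enddate": enddate}
    if use_post:
        # POST alternative for clients that cannot issue DELETE.
        params["action"] = "tweet-purge"
    url = f"{base}/api/querybin.php/{bin_name}/tweets?{urlencode(params)}"
    method = "POST" if use_post else "DELETE"
    return urllib.request.Request(url, method=method)


# Hypothetical host and bin name; nothing is sent until urlopen() is called.
req = purge_request("http://tcat.example.org", "mybin", "2016-03-14Z", "2016-03-14Z")
print(req.get_method(), req.full_url)
```

Building the request separately from sending it makes it easy to log or review the exact URL before any tweets are deleted.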

Known limitations:

The tcat_* tables are not modified when tweets are purged, so the original capture periods remain. After purging tweets, it will appear as if the capture(s) were performed but no tweets appeared during the purged time period.

A possible enhancement could be an option to modify the tcat_* tables so that it appears as if capturing was not performed during the purged time period.

Timestamps

Syntax

Timestamps must be in the form of "YYYY-MM-DD HH:MM:SS TZ". The letter "T" (with no whitespace around it) can also be used to separate the date from the time. The whitespace before the timezone is optional.

Timezone

The timezone can be "Z", "UTC" or an offset. The format of a timezone offset is [+-]HH(:MM). That is, a mandatory plus or minus sign, followed by a mandatory number of hours, optionally followed by a colon and a number of minutes.

For example,

  • 2016-02-28 17:10:00 Z
  • 2016-02-28T17:10:00UTC
  • 2016-02-28 17:10:00 +10:00
  • 2016-02-28 17:10:00-08:00

An API default timezone can be configured in the api/lib/common.php file. If configured, timestamps without an explicit timezone are interpreted in the API default timezone. If there is no API default timezone, timestamps without an explicit timezone are invalid.
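A client can check the shape of a full timestamp before sending it. The following Python sketch encodes the syntax described above as a regular expression. It is not the API's own parser: it assumes the date order YYYY-MM-DD shown in the examples, two-digit offset hours, and no configured default timezone (so the timezone is required). It checks the layout only, not whether the values are in range.

```python
import re

# Date, then "T" or a space, then time, optional space, then the timezone.
FULL_TIMESTAMP = re.compile(
    r"\d{4}-\d{2}-\d{2}"             # YYYY-MM-DD
    r"[T ]"                          # date/time separator
    r"\d{2}:\d{2}:\d{2}"             # HH:MM:SS
    r" ?"                            # optional whitespace before the timezone
    r"(?:Z|UTC|[+-]\d{2}(?::\d{2})?)"  # "Z", "UTC" or an offset such as +10:00
)


def is_full_timestamp(text):
    """Return True if text has the layout of a full timestamp with a timezone."""
    return FULL_TIMESTAMP.fullmatch(text) is not None


for example in ("2016-02-28 17:10:00 Z", "2016-02-28T17:10:00UTC",
                "2016-02-28 17:10:00 +10:00", "2016-02-28 17:10:00"):
    print(example, is_full_timestamp(example))
```

The last example has no timezone, so this sketch rejects it; a deployment with an API default timezone would accept it.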

Partial timestamps

Partial timestamps can be specified by omitting the least significant components. For example, a timestamp can specify everything up to the hour and omit the minutes and seconds.

Partial timestamps are interpreted as the beginning of the period for startdates, and as the end of the period for enddates.

For example, as the startdate:

  • 2016-03-14T09:15Z is 2016-03-14T09:15:00+00:00
  • 2016-03-14T09Z is 2016-03-14T09:00:00+00:00
  • 2016-03-14Z is 2016-03-14T00:00:00+00:00
  • 2016-03Z is 2016-03-01T00:00:00+00:00
  • 2016Z is 2016-01-01T00:00:00+00:00

For example, as the enddate:

  • 2016-03-14T09:15Z is 2016-03-14T09:15:59+00:00
  • 2016-03-14T09Z is 2016-03-14T09:59:59+00:00
  • 2016-03-14Z is 2016-03-14T23:59:59+00:00
  • 2016-03Z is 2016-03-31T23:59:59+00:00
  • 2016Z is 2016-12-31T23:59:59+00:00
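The expansion rule can be reproduced in a few lines of Python. This is a sketch of the behaviour described above, not the API's implementation: the function name expand is hypothetical, and it handles only timestamps ending in "Z", as in the examples.

```python
import calendar
import re
from datetime import datetime, timezone

# Each component after the year is optional, but only from the right.
PARTIAL = re.compile(
    r"(\d{4})(?:-(\d{2})(?:-(\d{2})(?:[T ](\d{2})(?::(\d{2})(?::(\d{2}))?)?)?)?)?Z"
)


def expand(text, is_end):
    """Expand a partial UTC timestamp to the start or the end of its period."""
    match = PARTIAL.fullmatch(text)
    if match is None:
        raise ValueError(f"unsupported timestamp: {text!r}")
    year, month, day, hour, minute, second = (
        int(g) if g is not None else None for g in match.groups()
    )
    # Fill each missing component with its smallest or largest value.
    if month is None:
        month = 12 if is_end else 1
    if day is None:
        # monthrange returns (weekday of first day, number of days in month).
        day = calendar.monthrange(year, month)[1] if is_end else 1
    if hour is None:
        hour = 23 if is_end else 0
    if minute is None:
        minute = 59 if is_end else 0
    if second is None:
        second = 59 if is_end else 0
    return datetime(year, month, day, hour, minute, second, tzinfo=timezone.utc)


print(expand("2016-03Z", is_end=False).isoformat())  # 2016-03-01T00:00:00+00:00
print(expand("2016-03Z", is_end=True).isoformat())   # 2016-03-31T23:59:59+00:00
```

Using the calendar module for the last day of the month keeps February correct in leap years such as 2016.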