IS4 edited this page Feb 23, 2024 · 6 revisions

Usage

The application's usage and supported options can be displayed by passing -? on the command line.

Usage: (describe|search|list|about|options) [options] input... output

Modes

The application operates in five modes: describe, search, list, about, and options:

describe

This is the primary usage mode of the application. In this mode, it loads a collection of input files and describes them using RDF, saving the output in a desired RDF serialization format to the last file specified.

search

In this mode, the application requires a list of SPARQL queries (passed via -s) in addition to the input files and evaluates them on the RDF descriptions, saving the output in a desired SPARQL results serialization format to the last file specified.

list

In this mode, the application only displays a list of supported components and their properties, which can be used to configure them further. Input and output files are not used.

about

In this mode, the application expects the names of components (such as analyzer:object) or collections (such as analyzer) and shows information about them, such as their type, their properties and the descriptions thereof, and the assembly in which they are defined.

options

In this mode, the application generates an XML document from the supplied options, storing it in the output file.

Input and output

Input files support wildcard characters like ? and * to select multiple files at once. Additionally, several special paths with custom handling are supported:

-, /dev/fd/0, or /dev/stdin (only for the console application)

When used as the input, uses the standard input stream to provide the data, in case you want to enter it directly or pipe from another process.

-, /dev/fd/1, or /dev/stdout (only for the console application)

When used as the output, writes it to the standard output stream, appearing directly in the console or piped to another process.

/dev/fd/2 or /dev/stderr

When used as the output, uses the standard error stream. May be useful if you want to see the RDF data but use the standard output for file extraction.

/dev/null (or NUL on Windows, case-insensitive)

When used as the output, any produced data is simply discarded (useful if you only want to see the log or perform extraction).

/dev/clipboard

When used as the input, attempts to load the data from the clipboard; when used as the output, attempts to store the data there. Does not work in the portable distribution and may not be usable in some browsers.

/dev/picker

Only available on the Windows distributions. When this path is encountered, a file dialog is opened to choose the file to load or save into.

/dev/folderpicker

Only available on the Windows distributions. When this path is encountered, a folder dialog is opened to choose the folder to load as the input.
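
The special output paths above can be summarized as a small lookup. The following Python sketch is illustrative only and is not the application's code: the function name `classify_output`, the `windows` flag, and the returned labels are hypothetical, and the clipboard and picker paths are left out.

```python
def classify_output(path: str, windows: bool = False) -> str:
    """Return a label for where data written to `path` would end up.

    Hypothetical helper: it mirrors only the special output paths
    documented above and treats everything else as a regular file.
    """
    if path in ("-", "/dev/fd/1", "/dev/stdout"):
        return "stdout"
    if path in ("/dev/fd/2", "/dev/stderr"):
        return "stderr"
    # NUL is matched case-insensitively, and only on Windows.
    if path == "/dev/null" or (windows and path.upper() == "NUL"):
        return "discard"
    return "file"


print(classify_output("-"))
print(classify_output("nul", windows=True))
print(classify_output("out.ttl"))
```

Keeping the mapping in one pure function makes it easy to check that, for example, `-` and `/dev/stdout` are treated the same way.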

Options

Additional arguments given to the application must be passed as options before the input files, beginning with - for the short form or -- for the long form. The options may also be provided equivalently through a configuration XML file.

| Short form | Long form | Argument | Description |
| --- | --- | --- | --- |
| q | quiet | | No logging messages, normally sent to the standard error, are produced. |
| i | include | pattern | From the previously excluded components, includes those matching the pattern. |
| x | exclude | pattern | From the previously included components, excludes those matching the pattern. |
| f | format | extension or MIME type | Sets the RDF serialization format of the output (ttl, jsonld, rdf etc.) in case of describe, or the SPARQL results format in case of search. Also deduced from the output file extension. |
| h | hash | pattern | Sets the primary data hash algorithm. The algorithm is also included as a component. Only a sufficiently collision-resistant hash algorithm should be used. |
| c | compress | | Enables gzip compression for the output. |
| m | metadata | | Adds annotation metadata to the output. |
| d | data-only | | Only treats the input files as plain data, without file information. |
| u | ugly | | Uses the compact, non-pretty mode of RDF output writing. |
| o | only-once | | Does not process the same entity again if it is encountered multiple times. |
| b | buffered | one of none (0), temporary (1), or full (2); full is used if the argument is omitted | Sets the level of graph buffering of data. With full, all triples are buffered in a single graph before being written out. With temporary (the default when the option is not given), triples are temporarily buffered in an intermediate graph. |
| r | root | URI | Sets the URI prefix that is used for unique entities which do not have a stable identifier. Without this option, only blank nodes are used. A prefix like urn:uuid: or a Skolem IRI prefix, under /.well-known/genid/, is recommended. |
| s | sparql-query | file | The given file is executed as a SPARQL query, in case of describe for selecting files, or in case of search to query for information from the description. |
| p | plugin | id | Loads a plugin with a particular identifier. |
| C | config | file | Loads additional configuration from a file. |
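
To illustrate how the output format could be deduced from the output file extension, here is a minimal Python sketch. It is not the application's implementation: the function `deduce_format` and the table `RDF_FORMATS` are hypothetical, only the three extensions named above are covered, and the rule that an explicitly given format takes precedence over the extension is an assumption.

```python
import os

# Extensions named in the format option's description, mapped to their
# registered media types. The application may support more formats.
RDF_FORMATS = {
    "ttl": "text/turtle",
    "jsonld": "application/ld+json",
    "rdf": "application/rdf+xml",
}


def deduce_format(output_path, explicit=None):
    """Return a media type for the output, or None if it is not recognized.

    Assumption: an explicitly given format wins over the file extension.
    """
    if explicit is not None:
        key = explicit
    else:
        key = os.path.splitext(output_path)[1].lstrip(".")
    key = key.lower()
    if key in RDF_FORMATS:
        return RDF_FORMATS[key]
    if key in RDF_FORMATS.values():
        # The value was already given as a MIME type.
        return key
    return None


print(deduce_format("out.ttl"))
print(deduce_format("-", explicit="rdf"))
```

When the output goes to a special path such as `-`, there is no extension to inspect, which is why the examples that write to the standard output pass the format explicitly.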

Examples

describe dir/* out.ttl
Describes all files in dir using the default components, and saves the RDF output to out.ttl.
describe -d -h sha1 dir out.ttl
The same as above, but only loads the files in the directory as data (-d), without storing their names or other metadata. In addition to that, the SHA-1 hash algorithm is used to produce ni: URIs for content.
describe -f rdf dir -
As above, but writes the RDF description as RDF/XML to the standard output.
describe -b -f jsonld dir -
Writes the RDF description in JSON-LD instead. This requires buffering the output (-b).
describe -r urn:uuid: dir -
Does not use blank nodes to identify entities, instead using URIs starting with urn:uuid:.
describe -x *-hash:* -i data-hash:sha1 dir -
Does not use any of the supported hash algorithms, with the exception of SHA-1, to describe data.
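
The last example relies on include and exclude patterns being applied in order. The Python sketch below shows one way such ordered filtering can behave. It is an illustration under assumptions, not the application's code: the function `select_components` is hypothetical, the patterns are treated as shell-style wildcards, every component is assumed to start out included, and the component names other than data-hash:sha1 and analyzer:object are made up.

```python
from fnmatch import fnmatchcase


def select_components(all_components, rules):
    """Apply ("include" | "exclude", pattern) rules in the order given.

    Assumption: every component is included before the first rule runs.
    """
    selected = set(all_components)
    for action, pattern in rules:
        matching = {c for c in all_components if fnmatchcase(c, pattern)}
        if action == "exclude":
            selected -= matching
        elif action == "include":
            selected |= matching
        else:
            raise ValueError(f"unknown action: {action}")
    return selected


# Hypothetical component list, for illustration only.
components = [
    "analyzer:object",
    "data-hash:md5",
    "data-hash:sha1",
    "file-hash:sha256",
]
rules = [("exclude", "*-hash:*"), ("include", "data-hash:sha1")]
print(sorted(select_components(components, rules)))
# prints ['analyzer:object', 'data-hash:sha1']
```

Because the rules are order-dependent, swapping the two would first include data-hash:sha1 (a no-op here) and then exclude every hash component, leaving SHA-1 out as well.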