# Conflicts:
#	CHANGELOG.md
#	docker/docker-compose.yml
#	pom.xml
#	src/main/scala/io/rml/framework/Main.scala
#	src/main/scala/io/rml/framework/core/util/ParameterUtil.scala
ghsnd committed May 30, 2022
1 parent 1f17e9e commit 73e83e8
Showing 66 changed files with 1,301 additions and 2,491 deletions.
12 changes: 10 additions & 2 deletions CHANGELOG.md
@@ -5,9 +5,16 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/)
and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html).

## Unreleased
## [2.4.0] - 2022-05-30

### Changed
* Use of external [FnO](https://fno.io/) function handling component

* Use `docker cp` for copying files between Docker containers.
* Updated Flink from version 1.14.0 to 1.14.4

### Fixed
* Parameter for FnO docs (internal [issue #150](https://gitlab.ilabt.imec.be/rml/proc/rml-streamer/-/issues/150))

## [2.3.0] - 2022-04-26

### Added
@@ -181,3 +188,4 @@ can be set with the program argument `--baseIRI`.
[2.2.1]: https://github.com/RMLio/RMLStreamer/compare/v2.2.0...v2.2.1
[2.2.2]: https://github.com/RMLio/RMLStreamer/compare/v2.2.1...v2.2.2
[2.3.0]: https://github.com/RMLio/RMLStreamer/compare/v2.2.2...v2.3.0
[2.4.0]: https://github.com/RMLio/RMLStreamer/compare/v2.3.0...v2.4.0
4 changes: 3 additions & 1 deletion README.md
@@ -19,7 +19,7 @@ If you want to develop, read [these instructions](documentation/README_DEVELOPME
RMLStreamer runs its jobs on Flink clusters.
More information on how to install Flink and getting started can be found [here](https://ci.apache.org/projects/flink/flink-docs-release-1.14/try-flink/local_installation.html).
At least a local cluster must be running in order to start executing RML Mappings with RMLStreamer.
Please note that this version works with Flink 1.14.0 with Scala 2.11 support, which can be downloaded [here](https://archive.apache.org/dist/flink/flink-1.14.0/flink-1.14.0-bin-scala_2.11.tgz).
Please note that this version works with Flink 1.14.4 with Scala 2.11 support, which can be downloaded [here](https://archive.apache.org/dist/flink/flink-1.14.4/flink-1.14.4-bin-scala_2.11.tgz).

### Building RMLStreamer

@@ -83,6 +83,8 @@ $FLINK_BIN run <path to RMLStreamer jar> toKafka --broker-list <host:port> --top
```
Usage: RMLStreamer [toFile|toKafka|toTCPSocket|noOutput] [options]
-f, --function-descriptions <function description location 1>,<function description location 2>...
An optional list of paths to function description files (in RDF using FnO). A path can be a file location or a URL.
-j, --job-name <job name>
The name to assign to the job on the Flink cluster. Put some semantics in here ;)
-i, --base-iri <base IRI>
1 change: 0 additions & 1 deletion configuration_example.properties

This file was deleted.

14 changes: 9 additions & 5 deletions docker/README.md
@@ -157,12 +157,10 @@ on `/var/lib/docker/volumes/docker_data/_data`, as shown by the `$ docker volume
Copy the `scenario-1` subfolder to our Docker data volume, which can be read by RMLStreamer:

```
$ [sudo] cp -r scenario-1 /var/lib/docker/volumes/docker_data/_data/
$ [sudo] chmod -R 777 /var/lib/docker/volumes/docker_data/_data
$ docker cp scenario-1/mapping.rml.ttl docker_taskmanager_1:/mnt/data/scenario-1/mapping.rml.ttl
$ docker cp scenario-1/input.json docker_taskmanager_1:/mnt/data/scenario-1/input.json
```

(TODO: is there a more user-friendly way to put data on docker volumes?)

### 2. Start RMLStreamer

Go back to your browser, and fill in the following `Program Arguments`:
@@ -178,5 +176,11 @@ If all goes well, you will see that the job has finished, after a few seconds:

![Job done](images/scenario-1-job-done.png)

The result is written to `/var/lib/docker/volumes/docker_data/_data/scenario-1/output.nt`
The result is written to `/mnt/data/scenario-1/output.nt` inside the Docker container
and should contain the same triples as `scenario-1/output.nq`.

You can get the generated output from the Docker container by copying it back to the host:

```
$ docker cp docker_taskmanager_1:/mnt/data/scenario-1/output.nt .
```
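
The `docker cp` steps in this scenario can be wrapped in two small shell functions. This is only a minimal sketch and not part of RMLStreamer: the function names `copy_files_in` and `copy_output_out` and the `DOCKER`, `CONTAINER` and `DATA_DIR` variables are assumptions made for illustration. The container name and paths are the ones used in this README.

```shell
#!/bin/sh
# Hypothetical helpers (not part of RMLStreamer) wrapping the docker cp steps.
# DOCKER can be overridden, e.g. DOCKER="echo docker" to print the commands only.
DOCKER="${DOCKER:-docker}"
CONTAINER="${CONTAINER:-docker_taskmanager_1}"   # container name used in this README
DATA_DIR="${DATA_DIR:-/mnt/data}"                # mount point of the data volume

# Copy the given files of a scenario folder from the host into the container.
# Usage: copy_files_in <scenario folder> <file>...
copy_files_in() {
    scenario="$1"
    shift
    for file in "$@"; do
        $DOCKER cp "$scenario/$file" "$CONTAINER:$DATA_DIR/$scenario/$file" || return 1
    done
}

# Copy the generated output of a scenario back to a host directory (default: .).
# Usage: copy_output_out <scenario folder> [target directory]
copy_output_out() {
    scenario="$1"
    target_dir="${2:-.}"
    $DOCKER cp "$CONTAINER:$DATA_DIR/$scenario/output.nt" "$target_dir"
}
```

With these definitions, `copy_files_in scenario-1 mapping.rml.ttl input.json` issues the two copy commands from step 1, and `copy_output_out scenario-1` fetches the result. Setting `DOCKER="echo docker"` beforehand prints the commands instead of running them.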
4 changes: 2 additions & 2 deletions docker/docker-compose.yml
@@ -2,7 +2,7 @@ version: '3'
services:

jobmanager:
image: flink:1.14.0-scala_2.11-java11
image: flink:1.14.4-scala_2.11-java11
expose:
- "6123"
ports:
@@ -14,7 +14,7 @@ services:
- data:/mnt/data

taskmanager:
image: flink:1.14.0-scala_2.11-java11
image: flink:1.14.4-scala_2.11-java11
expose:
- "6121"
- "6122"
180 changes: 45 additions & 135 deletions documentation/README_Functions.md
@@ -1,54 +1,56 @@
# README: Functions

## Built-in functions

When deploying and running jobs on Flink, make sure
- to either place the external jars (e.g. `IDLabFunctions.jar` and `GrelFunctions.jar`) in Flink's `lib` directory,
or package the RMLStreamer along with those jars by placing them in `src/main/resources`.
- to place the following files in the directory where the `flink run ...` command is issued.
These files can be obtained from `src/main/resources`:
- `functions_grel.ttl`
- `functions_idlab.ttl`
- `grel_java_mapping.ttl`
- `idlab_java_mapping.ttl`
---

## Example: RML Streamer + Flink
Flink's `lib` directory should contain the jar-files with the custom functions. In this example, these are marked with `*`
```
flink-1.14.0-scala_2.11
└── lib
├── GrelFunctions.jar *
├── IDLabFunctions.jar *
├── flink-dist_2.11-1.14.0.jar
├── flink-table-blink_2.11-1.14.0.jar
└── flink-table_2.11-1.14.0.jar
```
When running the RML Streamer on Flink, the directory should look like
```
.
├── RMLStreamer-2.2.0.jar
├── functions_grel.ttl
├── functions_idlab.ttl
├── grel_java_mapping.ttl
├── idlab_java_mapping.ttl
├── mapping.ttl
└── input_data.csv
```
Note that the function descriptions and function mappings are present.
Two function libraries are supported out of the box in RMLStreamer:
* GREL functions (<https://github.com/FnOio/grel-functions-java>)
* IDLab functions (<https://github.com/FnOio/idlab-functions-java>)

No extra configuration or parameters need to be provided to use those.

Check out [src/test/resources/fno-testcases](../src/test/resources/fno-testcases) for example RML Mappings.

## Using other function libraries

RMLStreamer can also execute functions from other libraries.
To use functions, three things have to be provided:

* A JAR file containing the code to execute functions;
* Function descriptions: a document describing the functions in the library semantically using [FnO](https://fno.io/).
* Implementation mappings: a document mapping the functions from the function descriptions
to the implementation in the JAR file, also using FnO.

Often the function descriptions and the implementation mappings are bundled in one document.

How to do this is explained as an example in the [Function Agent](https://github.com/FnOio/function-agent-java#example)
library, which is used by RMLStreamer to handle functions.

The command for running the RML Streamer on Flink should look like
A minimal example is provided in the tests: [src/test/resources/sandbox/function_related/external_jar/](../src/test/resources/sandbox/function_related/external_jar)

Once a JAR file and FnO descriptions are available, RMLStreamer can be invoked with the `-f` or `--function-descriptions`
parameter, like so:

```shell
$ $FLINK_BIN run [Flink options] -c io.rml.framework.Main <path to RMLStreamer jar> \
toFile \
--output-path /tmp/helloworld.nt \
--mapping-file /<path to RMLStreamer root>/src/test/resources/sandbox/function_related/external_jar/mapping.ttl \
--function-descriptions /<path to RMLStreamer root>/src/test/resources/sandbox/function_related/external_jar/simple-test-function-fno.ttl
```
~/flink/flink-1.14.0-scala_2.11/bin/flink run -c io.rml.framework.Main RMLStreamer-2.2.0.jar toFile --output-path $(pwd)'/out.ttl' -m mapping.ttl
```

Notes
- The paths to function descriptions or JAR files can also be URLs;
- One can pass multiple function description locations to the `--function-descriptions` parameter separated by a space.
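
For repeated runs, the invocation above can be wrapped in a shell function. The following is a hedged sketch rather than something shipped with RMLStreamer: the function name `run_with_functions` is hypothetical, and `FLINK_BIN` is assumed to point at Flink's `flink` executable. Only the options shown in the command above are used, with a single function description location.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of RMLStreamer) around the invocation shown above.
# FLINK_BIN is assumed to point at Flink's `flink` executable.
FLINK_BIN="${FLINK_BIN:-flink}"

# Usage: run_with_functions <RMLStreamer jar> <mapping file> <FnO description file> <output path>
run_with_functions() {
    if [ "$#" -ne 4 ]; then
        echo "usage: run_with_functions <jar> <mapping> <function descriptions> <output>" >&2
        return 2
    fi
    # Same structure as the documented command: toFile output plus one FnO document.
    "$FLINK_BIN" run -c io.rml.framework.Main "$1" \
        toFile \
        --output-path "$4" \
        --mapping-file "$2" \
        --function-descriptions "$3"
}
```

Setting `FLINK_BIN=echo` before calling the function prints the assembled arguments without contacting a Flink cluster, which is a cheap way to check paths before submitting a job.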

## Test Cases
### FnO Tests
The official FnO testcases that are working can be found at `test/resources/fno-testcases`.
The official FnO testcases that are working can be found at [src/test/resources/fno-testcases](../src/test/resources/fno-testcases).

### SandboxTests
> These tests are still experimental.
The resources can be found at `test/resources/sandbox/function_related` and are executed from `io.rml.framework.SandboxTests`.
The resources can be found at [src/test/resources/sandbox/function_related](../src/test/resources/sandbox/function_related)
and are executed from `io.rml.framework.SandboxTests`.
Every test output is compared to the RMLMapper's output (e.g. `output.ttl`) for that test. A test passes when its output is equal to the RMLMapper's output.
Tests marked as `pending` should be considered as not working.<br>

@@ -79,6 +81,8 @@ The sandbox testcases are

- `equal`: uses function `idlab-fn:equal`

- `external_jar`: uses the `HelloWorld` function from an external JAR file.

- `notEqual`: uses function `idlab-fn:notEqual`

- `using_trueCondition_and_contains`
@@ -89,101 +93,7 @@ The sandbox testcases are
- `idlab-fn:trueCondition`
- `idlab-fn:equal`




## Tutorial: Using a function from a local JAR

The following FnO-testcases use the function `toUpperCaseURL`
- RMLFNOTC0004-CSV
- RMLFNOTC0005-CSV
- RMLFNOTC0006-CSV

The following steps show how to integrate the `toUpperCaseURL` function in the RML Streamer.

### Step 1: creating the JAR
This step is based on the best-practice example [`grel-functions-java`](https://github.com/FnOio/grel-functions-java).
- Create package `io.fno.idlab` and within that package, create the class `IDLabFunctions`
- For these testcases, we need a function that returns the given URL in uppercase.
- Make sure to set the Maven compiler to a version compatible with the RMLStreamer's version.

The following listing serves a minimalistic example that shows a possible implementation of the `toUpperCaseURL`-function.
```Java
package io.fno.idlab;

public class IDLabFunctions {
public static String toUpperCaseURL(String s) {
if(!s.startsWith("http"))
return "http://" + s.toUpperCase();
return s.toUpperCase();
}
}
```

Use Maven to build a JAR-file, and move this JAR-file to the RMLStreamer’s `main/resources`.

### Step 2: defining the FnO descriptions
An FnO description represents the abstract definition of a function.<br>
The aforementioned testcases require a function that returns a valid URL in uppercase.
Its description is shown in the following listing, and can be found in `functions_idlab.ttl`.

```Turtle
idlab-fn:toUpperCaseURL
a fno:Function ;
fno:name "toUppercaseURL" ;
rdfs:label "toUppercaseURL" ;
dcterms:description "Returns an uppercase, valid url." ;
fno:solves grel:prob_ucase ;
fno:expects ( idlab-fn:_str ) ;
fno:returns ( idlab-fn:_stringOut ) .
```

### Step 3: map FnO descriptions to the corresponding implementations
In the previous step, the abstract functions were created.
The current step will define the link between abstract function descriptions and the corresponding implementation.
This is illustrated by the following listing, extracted from `idlab_java_mapping.ttl`.
```Turtle
grelm:IDLabFunctions
a fnoi:JavaClass ;
doap:download-page "IDLabFunctions.jar" ;
fnoi:class-name "io.fno.idlab.IDLabFunctions" .
grelm:uppercaseURLMapping
a fno:Mapping ;
fno:function idlab-fn:toUpperCaseURL;
fno:implementation grelm:IDLabFunctions ;
fno:parameterMapping [ ] ;
fno:returnMapping [ ] ;
fno:methodMapping [ a fnom:StringMethodMapping ;
fnom:method-name "toUpperCaseURL" ] ;
.
```
This mapping instructs the RML Streamer to look for a method called `toUpperCaseURL` within the `io.fno.idlab.IDLabFunctions`-class of the `IDLabFunctions.jar`. Make sure the JAR-file is located in `main/resources`.

## How the `FunctionLoader` works

The function descriptions and mappings mentioned in the previous steps will be used by the `FunctionLoader`.

First, a `FunctionLoader` has to be aware of the available functions.
Therefore, it can be instantiated providing file paths to the function description files.
When no such file paths are provided, the default function descriptions are used (i.e. `functions_grel.ttl`).

Secondly, function URIs are mapped to the corresponding implementations by parsing the function mappings
(e.g. `resources/grel_java_mapping.ttl` and `resources/idlab_java_mapping.ttl`).
During this step, every function URI is mapped to a `FunctionMetaData`-object which contains the necessary metadata such as: the *download-page* of the library, the *class-name* of the function, the *method-name*, *input parameters* and *output parameters*.


## How the `FunctionLoader` is used

Initially, the `FunctionLoader` is used to read and parse function descriptions and mappings.
Afterwards, when running FNOT-testcases, the `FunctionLoader`-instance is used by `io.rml.framework.engine.statement.FunctionMapGeneratorAssembler`
to load and bind every function as specified in the testcase's `mapping.ttl`.



# Remarks
- When the RMLStreamer is unable to find a function description or function mapping, or to bind method parameters to values, this is logged as an error to the console
- When the RMLStreamer is unable to find a function description or function mapping, or to bind method parameters to values, this is logged as a warning to the console
and the function will not be applied.

31 changes: 0 additions & 31 deletions documentation/README_Netty_Snapshot.md

This file was deleted.

34 changes: 32 additions & 2 deletions pom.xml
@@ -28,15 +28,15 @@ SOFTWARE.

<groupId>io.rml</groupId>
<artifactId>RMLStreamer</artifactId>
<version>2.3.0</version>
<version>2.4.0</version>
<packaging>jar</packaging>

<name>RMLStreamer</name>
<url>https://rml.io/</url>

<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<flink.version>1.14.0</flink.version>
<flink.version>1.14.4</flink.version>
<slf4j.version>1.7.32</slf4j.version>
<log4j.version>2.17.0</log4j.version>
<jena.version>4.3.1</jena.version>
@@ -55,6 +55,14 @@ SOFTWARE.


<repositories>
<repository>
<id>repo.maven.apache.org</id>
<url>https://repo.maven.apache.org/maven2/</url>
</repository>
<repository>
<id>jitpack.io</id>
<url>https://jitpack.io</url>
</repository>
<repository>
<id>bintray</id>
<url>https://jcenter.bintray.com</url>
@@ -376,6 +384,28 @@ SOFTWARE.
<version>4.1.4</version>
</dependency>

<dependency>
<groupId>com.github.FnOio</groupId>
<artifactId>function-agent-java</artifactId>
<version>v0.1.0</version>
<exclusions>
<exclusion>
<groupId>org.apache.jena</groupId>
<artifactId>*</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>com.github.fnoio</groupId>
<artifactId>grel-functions-java</artifactId>
<version>v0.7.3</version>
</dependency>
<dependency>
<groupId>com.github.fnoio</groupId>
<artifactId>idlab-functions-java</artifactId>
<version>v0.1.0</version>
</dependency>

</dependencies>

<!-- This profile helps to make things run out of the box in IntelliJ -->