Recipe

Recipe describes how Valor should execute a pipeline. A pipeline is one execution flow of Valor. Each recipe consists of one or more resources and also one or more of its framework frameworks.

Resource

Resource is something to be either validated, evaluated, or both. Each recipe should contains one or more resources. An example:

resources:
- name: user_account
  type: file
  path: ./example/resource
  format: json
  batch_size: 3 # new in v0.0.5
  regex_pattern: "[a-z]" # new in v0.0.6
  framework_names:
  - user_account_evaluation
...

Each resource is defined by a structure with the following fields:

Field	Description	Format	Example
name	a unique name of the resource	it is suggested to be descriptive and needs to follow regex [a-z_]+	user_account
path	the path where the resource should be read from	it has to follow the type format. for example, if the type is file (to indicate local file or directory), then the format should follow how a file path looks like	./example/resource
type	describes the type of path in order to get the resource	currently available: file	file
type	describes the type of path in order to get the resource	file describe that the path is of type file. if the path value is actually a directory but the type is set to be a file, then all files within that directory will be read.	file
format	indicates what format a resource was stored	currently available: json and yaml	json
batch_size	(new in v0.0.5) indicates the number of resources to be processed at one time	if not set, default value is 4 (four) if value is negative, batch size being used is the number of data within each resource other cases, batch size is based on the set value until maximum number of data for each resource	4
regex_pattern	(new in v0.0.6) regex pattern to match the path with	should be a valid regex pattern	[a-z]
framework_names	indicates what frameworks to be executed against a resource. execution of one framework name to another is done sequentially and independently.	each framework name should point to an existing framework	user_account_validation

Note that every field mentioned above is mandatory unless stated otherwise.

Framework

Framework describes how to validate and/or evaluate a resource and how to return the result. One framework can be used by multiple resources. An example of framework:

...
frameworks:
- name: user_account_evaluation
  schemas:
  - name: user_account_rule
    type: file
    format: json
    path: ./example/schema/user_account_rule.json
    output:
      treat_as: error
      targets:
      - name: std_output
        type: std
        format: yaml
  definitions:
  - name: memberships
    format: json
    type: file
    path: ./example/definition
    regex_pattern: "[a-z]" # new in v0.0.6
    function:
      type: file
      path: ./example/procedure/construct_membership_dictionary.jsonnet
  procedures:
  - name: enrich_user_account
    type: file
    format: jsonnet
    path: ./example/procedure/enrich_user_account.jsonnet
    output:
      treat_as: success
      targets:
      - name: std_output
        type: std
        format: yaml

The following is the general constructs for a framework:

Field	Required	Description	Format	Output
name	true	defines the name of a particular framework.	it is suggested to be descriptive and needs to follow regex `[a-z_]+`	-
schemas	false	defines how to validate a resource.	it is an array of `schema` that will be executed sequentially and independently.	for each schema, the output of validation is either a success or an error message.
definitions	false	definitions are data input that might be required by procedure. definitions helps evaluation to be more efficient when external data is referenced multiple times.	it is an array of `definition` that defines how a definition should be prepared.	for each definition, the output is expected to be an array of JSON object.
procedures	false	defines how to evaluate a resource.	it is an array of `procedure` that will be executed sequentially with the ability to pass on information from one procedure to the next.	vary, dependig on how the procedure is constructed.

Schema

Schema is mainly used for validation. A schema composes of one or more rules on how a data should look like. Currently, schema only follows the specification by JSON schema. The following is an example of basic construct of a schema in a framework:

...
name: user_account_rule
type: file
format: json
path: ./example/schema/user_account_rule.json
output:
  treat_as: error
  targets:
  - name: std_output
    type: std
    format: yaml
    path: ./out
...

Field	Description	Format
name	the name of schema	it has to be unique within a framework only and should follow `[a-z_]+`
type	the type of data to be read from the path specified by path	currently available is `file` only
format	the format being used to decode the data	currently available is `json` only, pointing that it's a JSON schema
path	the path where the schema rule to be read from	the valid format based on the type. if the specified path is a directory, then only the first file will be used as schema.
output	defines how output of the schema execution will be handled	it is optional. if it is being set, then its required fields should be specified.
output.treat_as	treatment that will be run against the output	currently availalbe: `info`, `warning`, `error`, `success`. if it is set to be `error`, then execution will not be continued.
output.targets	specifies the target output streams to write the result	it is an array of object, that needs to have a least one member
output.targets[].name	name of the output stream	it can be anything, but should be unique within the targets and should follow `[a-z_]+`
output.targets[].type	the type of output stream	currently available: `file` and `std`, where the `std` is the standard output on console.
output.targets[].format	format output that will be written	currently available: `yaml` and `json`
output.targets[].path	the path where to write the output	it is required when the target type is `file` but not considered when it is set to be `std`

Note that every field mentioned above is mandatory unless stated otherwise.

And the following is an example of JSON schema, pointed by path:

{
    "title": "user_account",
    "description": "Schema to validate user_account.",
    "type": "object",
    "properties": {
        "email": {
            "type": "string"
        },
        "membership_id": {
            "type": "integer"
        },
        "is_active": {
            "type": "boolean"
        }
    },
    "required": [
        "email",
        "membership_id"
    ],
    "additionalProperties": false
}

The above example is a validation rule for data user_account, where for every value in its email field should be a string, its membership field should an integer, and its is_active field should be a boolean. If any of the actual resource (or one could say, record) does not comply, then error will be triggered.

Definition

Definition is external data that could be used by procedures. Definition is usually utilized when one or more procedures want to load one or more externals data once and use it multiple times efficiently. Definition is like a static reference data. An example of definition construct in a framework:

...
name: memberships
format: json
type: file
path: ./example/definition
regex_pattern: "[a-z]" # new in v0.0.6
function:
  type: file
  path: ./example/procedure/construct_membership_dictionary.jsonnet
...

Field	Description	Format	Output
name	the name of definition	it has to be unique within a framework only and should follow `[a-z_]+`	-
type	the type of data to be read from the path specified by path	currently available is `file`	-
format	the format being used to decode the data	currently available is `json` and `yaml`	-
path	the path where to read the actual data from	the valid format based on the type	-
regex_pattern	*(new in v0.0.6)* regex pattern to match the path with	valid regex pattern	-
function	an optional instruction to build a definition, where the instruction follows the Jsonnet format	-	dictionary where the key is the name and the value is up to the actual function defined under function.path
function.type	defines the type of path specified by function.path	it should be valid for the given function.path with currently available is `file`	-
function.path	defines the path where to read the actual function	should be valid according to the function.type	-

Note that every field mentioned above is mandatory unless stated otherwise.

As mentionend, every definition function should follow Jsonnet format. Apart from that, there are a few additional rules involved when defining a definition:

the final definition output depends on the function:
- if function is not set, then the actual output will be a dictionary where the key is the definition name and the value is an array
- if function is set, then the actual output will be a dictionary where the key is the definition name and the value is up to the actual function to define
every definition function should define a special Jsonnet function with the following requirement:
- it has to be named construct
- it accepts one parameter
- it outputs one value
data being passed as the parameter of Jsonnet function is the raw data, which is an array of definition object
definition object is the actual data that is stored in the preferred place, such as a file
if the special function requires custom functions, then they should be initialized above the special function

The following is an example of actual definition function:

local construct (definitions) = {
    [std.toString(d.id)]: d
    for d in definitions
};

As shown above, there's only one function named construct. This is a special function that will be called by Valor, much like a "main" function. If needed, then the user can define some custom functions, like:

local custom_function() {
    // do something
};

local construct (definitions) = {
    custom_function(),

    [std.toString(d.id)]: d
    for d in definitions
};

The output when the definition function is not set:

{
    "memberships": [
        {
            "id": 1,
            "name": "premium",
            "description": "Membership which involves payment"
        }
    ]
}

When the definition function is defined where it outputs an object, then:

{
    "memberships": {
        "1": {
            "id": 1,
            "name": "premium",
            "description": "Membership which involves payment"
        }
    }
}

Procedure

Procedure is, like the name, one or more instruction to process data. Think of it like a the GO function. Procedure uses Jsonnet format. Even though it's similar with definition function in term of the format being used, it's acually different. If a definition function's purpose is to accept all external definition data and produces new data, then procedures's purpose is to accept every possible data and may or may not proceds new data. It might be a bit abstract, but let's take a look at its basic construct:

...
name: enrich_user_account
type: file
format: jsonnet
path: ./example/procedure/enrich_user_account.jsonnet
output:
  treat_as: success
  targets:
  - name: std_output
    type: std
    format: yaml
...

Field	Description	Format
name	the name of a procedure	it has to be unique within a framework only and should follow `[a-z_]+`
type	the type of data to be read from the path specified by path	currently available is `file` only
format	the format being used to decode the data	currently available is `jsonnet` only
path	the path where to read the actual data from	the valid format based on the type
output	defines how output of the procedure execution will be handled	it is optional. if it is being set, then its required fields should be specified.
output.treat_as	treatment that will be run against the output	currently availalbe: `info`, `warning`, `error`, `success`. if it is set to be `error`, then execution will not be continued.
output.targets	specifies the target output streams to write the result	it is an array of object, that needs to have a least one member
output.targets[].name	name of the output stream	it can be anything, but should be unique within the targets and should follow `[a-z_]+`
output.targets[].type	the type of output stream	currently available: `file` and `std`, where the `std` is the standard output on console.
output.targets[].format	format output that will be written	currently available: `yaml` and `json`
output.targets[].path	the path where to write the output	it is required when the target type is `file` but not considered when it is set to be `std`

Note that every field mentioned above is mandatory unless stated otherwise.

As mentioned, procedure follows the Jsonnet format. Though, there are some rules for it to be executed properly by Valor:

each procedure should have special Jsonnet function named evaluate, which:
- accepts resource, definition, and previous parameter sequentially, and
- may or may not return data, depending on how the function is defined
resource parameter in evaluate function refers to one resource data defined under resources, which is in a JSON format
definition parameter in evaluate function refers to the whole definition defined under definitions, which is in the form of dictionary (JSON object) where the key is the definition name
previous parameter in evaluate function refers to the output of the previous procedure execution within a framework and it will be a null value if the current procedure is the first to be executed
any additional function which might be required by the special function should be initialized beforehand

The following is an example of a procedure:

local evaluate(resource, definition, previous) =
    local membership_dict = definition['memberships'];
    local membership_id = std.toString(resource.membership_id);
    local current_membership = membership_dict[membership_id];
    {
        email: resource.email,
        membership: current_membership.name,
        is_active: resource.is_active
    };

On the procedure above, there's only one function, which is evalute. Behind the scene, Valor will call this function. In the line,

...
local membership_dict = definition['memberships'];
...

this function wants to extract a definition named memberships, and use it as a reference to process its business flow. This function may or may not use the provided parameters, and may or may not return any output. It is entirely up to the user. In the above example, this function outputs an object. If the funcition returns output, then it will be sent to the next pipeline, which can be:

a new procedure, where this output will be sent as parameter under previous, or
an output, where this output will be written out to output stream, or
nothing, where the output will not be used.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

recipe.md

recipe.md

Recipe

Resource

Framework

Schema

Definition

Procedure

Files

recipe.md

Latest commit

History

recipe.md

File metadata and controls

Recipe

Resource

Framework

Schema

Definition

Procedure