⚡️ Customized Implementation

You can build and customize your cluster from scratch according to your needs. This section covers: (1) system prerequisites, (2) AI features, (3) OpenMLDB evaluation, and (4) Flink evaluation.

Prerequisites

Before running the benchmarking scripts, make sure your environment meets the following version requirements. This assumes the target FE system is already configured correctly.

  • Java JDK: Version 1.8.0 or higher
  • Maven: 3.8.0 (recommended)
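As a quick sanity check, the versions above can be compared from the shell. This is a minimal sketch, not part of the benchmark: `version_ge` is a hypothetical helper name, and it assumes a `sort` that supports `-V` (GNU coreutils) and a `java -version` banner that prints the version in double quotes.

```shell
# Succeed if version $1 is at least version $2 (relies on `sort -V`).
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Report the Java version if java is on PATH.
if command -v java >/dev/null 2>&1; then
  # `java -version` writes to stderr, e.g.: openjdk version "1.8.0_392"
  java_ver=$(java -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
  java_ver=${java_ver%%_*}   # drop the update suffix: 1.8.0_392 -> 1.8.0
  if version_ge "$java_ver" 1.8.0; then
    echo "Java $java_ver: OK"
  else
    echo "Java $java_ver: older than 1.8.0"
  fi
else
  echo "java not found on PATH"
fi

# Print the Maven version line if mvn is on PATH (3.8.0 is recommended).
if command -v mvn >/dev/null 2>&1; then
  mvn -v | head -n 1
else
  echo "mvn not found on PATH"
fi
```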

AI Features

The features folder lists the features used in each of the 6 AI tasks. They are generated by the commercial automated ML tool HCML (a simplified version is available at https://github.com/4paradigm/AutoX).

OpenMLDB Evaluation

Step 1: Clone the repository

Step 2: Download and move the data files to the dataset directory of the repository

Step 3: Start the OpenMLDB cluster. For a quick start you can use Docker, but note that performance may not be optimal because all components are deployed on a single physical machine.

The default values of spark.driver.memory and spark.executor.memory may be too small for your workload. If you encounter a java.lang.OutOfMemoryError: Java heap space error, increase them either by setting spark.default.conf in conf/taskmanager.properties and restarting the taskmanager, or by setting the Spark parameters through the CLI. See Spark Client Configuration for details.

spark.default.conf=spark.driver.memory=32g;spark.executor.memory=32g
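If you prefer to script this change rather than edit the file by hand, the following sketch updates or appends a key in a properties file. `set_property` is a hypothetical helper, not something shipped with OpenMLDB, and it assumes plain `key=value` lines with no spaces around `=`.

```shell
# Set key=value in a properties file: replace the line if the key is
# already present, otherwise append it. Writes via a temp file so the
# original is only replaced when awk succeeds.
set_property() {
  file=$1; key=$2; value=$3
  touch "$file"
  awk -v k="$key" -v v="$value" '
    index($0, k "=") == 1 { print k "=" v; found = 1; next }
    { print }
    END { if (!found) print k "=" v }
  ' "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}
```

For example, `set_property conf/taskmanager.properties spark.default.conf "spark.driver.memory=32g;spark.executor.memory=32g"` would produce the line shown above. The taskmanager still has to be restarted afterwards.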

Step 4: Create your own conf.properties in the ./OpenMLDB/conf directory from conf.properties.template, then update its settings: the OpenMLDB cluster address and the locations of the data and queries.

4.1 Set the locations of the data and queries:

# Run from the root of the cloned repository
export FEBENCH_ROOT="$(pwd)"
# The file:// prefix is recommended for the OpenMLDB paths
sed "s#<path>#file://$FEBENCH_ROOT#" ./OpenMLDB/conf/conf.properties.template > ./OpenMLDB/conf/conf.properties
sed "s#<path>#$FEBENCH_ROOT#" ./flink/conf/conf.properties.template > ./flink/conf/conf.properties
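After generating the files, it can be worth confirming that no literal `<path>` placeholder was left behind. A minimal sketch follows; `check_no_placeholder` is a hypothetical helper name, and the only thing it looks for is the string `<path>`.

```shell
# Fail if any given file is missing or still contains the literal
# placeholder <path>; matching lines are printed with line numbers.
check_no_placeholder() {
  for f in "$@"; do
    [ -f "$f" ] || { echo "missing: $f" >&2; return 1; }
    if grep -n -F '<path>' "$f"; then
      echo "unreplaced <path> in $f" >&2
      return 1
    fi
  done
}
```

From the repository root, this could be called as `check_no_placeholder ./OpenMLDB/conf/conf.properties ./flink/conf/conf.properties`.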

4.2 Point the OpenMLDB cluster settings in conf.properties at your own cluster:

# ./OpenMLDB/conf/conf.properties
ZK_CLUSTER=127.0.0.1:7181
ZK_PATH=/openmldb
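Before compiling, you may want to confirm that the configured ZooKeeper endpoint is reachable. The sketch below is an assumption-laden convenience, not part of the benchmark: `get_property` and `check_zk` are hypothetical helpers, `nc` (netcat) with `-z` and `-w` support is assumed to be installed, and ZK_CLUSTER is assumed to hold one or more comma-separated host:port pairs.

```shell
# Print the value of a key from a properties file (first match only;
# everything after the first '=' is treated as the value).
get_property() {
  awk -v k="$2" 'index($0, k "=") == 1 { print substr($0, length(k) + 2); exit }' "$1"
}

# Probe each host:port in a comma-separated connection string.
check_zk() {
  for endpoint in $(printf '%s' "$1" | tr ',' ' '); do
    host=${endpoint%:*}
    port=${endpoint##*:}
    if nc -z -w 3 "$host" "$port" 2>/dev/null; then
      echo "$endpoint reachable"
    else
      echo "$endpoint NOT reachable"
    fi
  done
}
```

A typical call would be `check_zk "$(get_property ./OpenMLDB/conf/conf.properties ZK_CLUSTER)"`. An open port only shows that something is listening there, not that the OpenMLDB cluster is healthy.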

Step 5: Compile and run the test

cd OpenMLDB
./compile_test.sh
./test.sh <dataset_ID>

An example test result looks as follows: (image)

Flink Evaluation

Repeat Steps 1-5 of OpenMLDB Evaluation, with the following changes:

  1. In Step 3, additionally start a disk-based storage engine (e.g., RocksDB in MySQL) to persist the Flink table data. Note that (1) the listening port is set to 3306 by default and (2) you need to preload all the secondary tables into the storage engine.

  2. In Step 5, pass <dataset_ID> to the compile_test.sh script and run test.sh without parameters, e.g.,

./compile_test.sh 3 # compile and run the test of task3
./test.sh # rerun the test of task3
  3. You will need to rerun compile_test.sh if you modify the file conf.properties. This is not required for OpenMLDB Evaluation.
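Because a stale build is easy to overlook, the rerun rule for conf.properties can be automated with a checksum stamp. This is a sketch under stated assumptions: `conf_changed` and the stamp file name are hypothetical, and the usage line assumes the scripts are run from inside the flink directory, by analogy with the `cd OpenMLDB` step.

```shell
# Succeed if the file's checksum differs from the one recorded in the
# stamp file (or no stamp exists yet). The stamp is updated on every call.
conf_changed() {
  conf=$1; stamp=$2
  new=$(cksum < "$conf")
  old=$(cat "$stamp" 2>/dev/null || true)
  printf '%s\n' "$new" > "$stamp"
  [ "$new" != "$old" ]
}
```

For task3, this could be used as `if conf_changed conf/conf.properties .conf.stamp; then ./compile_test.sh 3; else ./test.sh; fi`.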
