Overview

This repository contains the material needed to reproduce the experiments of the TODO paper (TODOpaperLink).

Requirements

Linux or macOS operating system
Docker version 18.06.1-ce or higher
Code and data:

git clone https://github.com/NaelsonDouglas/DistributedMachineLearningThesis.git

To generate the plots, you need to install the following packages on your system:

apt install libxt6 libxrender1 libxext6 libgl1-mesa-glx libqt5widgets5

If you get the warning "dbind-WARNING **: 09:57:13.231: Error retrieving accessibility bus address: org.freedesktop.DBus.Error.ServiceUnknown: The name org.a11y.Bus was not provided by any .service files", you can resolve it with:

sudo apt-get install at-spi2-core

Running the experiments

Build the Docker image

After cloning this repository, execute the following commands.

Remark: GNU/Linux users may need to prefix the docker commands with sudo.

cd DistributedMachineLearningThesis/src/
docker build --no-cache -t dmlt .

Remark: the image must be tagged dmlt, so do NOT change the dmlt name in the command above.

Create an interactive Docker container and start a Bash session on it

First, set the DOCKERBIN variable according to your OS (run only the line that matches your system):

export DOCKERBIN="/usr/bin/docker" # Linux users
export DOCKERBIN="/usr/local/bin/docker" # macOS users

The following command will create a container and run a Bash session on it by using the dmlt Docker image:

export CID=$($DOCKERBIN run --cpus 1 -tid -v /var/run/docker.sock:/var/run/docker.sock -v $DOCKERBIN:/usr/bin/docker -v /tmp/results:/DistributedMachineLearningThesis/src/results dmlt) && $DOCKERBIN exec -ti $CID /bin/bash

The -v parameters mount the host's Docker socket and Docker binary into the container, which allows the prototype to create further containers from the just-created one, and mount /tmp/results so that those containers can access the shared storage (i.e., the host file system).

Run the prototype

From the running container, run the following commands:

export JULIABIN="/root/julia/bin/julia"
cd DistributedMachineLearningThesis/src/
git pull #TODO specify the right tag
$JULIABIN
include("call_experiment.jl")
args = ["4", "1000", "f1", "1234", "2", "2", "summary"]
experiment(args)

The 'args' vector contains the following entries, in this order:

Number of workers
Number of samples
Function name
Seed
n_cluster
dim
Version of the code to run: summary | histogram

You can select different configurations for 'args' and call experiment(args) as many times as you want. Within a session, you need to call include("call_experiment.jl") only once.
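Because the entries of 'args' are positional, it is easy to swap two of them by mistake. The following is a minimal sketch of a wrapper that builds the vector from named values. The helper name make_args and its keyword names are hypothetical and are not part of the repository; the only thing taken from the example above is the order of the entries and the fact that every entry is passed as a string.

```julia
# Hypothetical helper (not part of the repository): builds the positional
# `args` vector from keyword arguments, in the order listed above.
function make_args(; workers::Int, samples::Int, func::String, seed::Int,
                   n_cluster::Int, dim::Int, version::String = "summary")
    # Only the two versions named in the README are accepted.
    version in ("summary", "histogram") ||
        throw(ArgumentError("version must be \"summary\" or \"histogram\""))
    # The README example passes every entry as a string.
    return [string(workers), string(samples), func, string(seed),
            string(n_cluster), string(dim), version]
end

args = make_args(workers = 4, samples = 1000, func = "f1", seed = 1234,
                 n_cluster = 2, dim = 2)
println(args)
```

The resulting vector can then be passed to experiment(args) exactly like the hand-written one.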

To run a customized experiment, see the following sections.

Performance metrics

This research prototype is assessed using the following metrics:

Statistical metrics (accuracy)
MSE (mean squared error)
MAPE (mean absolute percentage error)
System metrics
Execution time: time taken to run the experiments
Exchanged data rate: network I/O among the Julia workers
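For reference, the two accuracy metrics and the execution time can be sketched as follows. These are the textbook definitions of MSE and MAPE, not necessarily the exact code used by the prototype; the function names mse and mape and the sample values are illustrative assumptions.

```julia
# Mean squared error between observed values y and predictions yhat.
mse(y, yhat) = sum((y .- yhat) .^ 2) / length(y)

# Mean absolute percentage error, in percent.
# Assumes that no entry of y is zero.
mape(y, yhat) = 100 * sum(abs.((y .- yhat) ./ y)) / length(y)

# Illustrative values only.
y    = [2.0, 4.0, 5.0]
yhat = [1.0, 4.0, 6.0]

println("MSE:  ", mse(y, yhat))
println("MAPE: ", mape(y, yhat))

# Execution time in seconds, measured with Base's @elapsed macro.
elapsed = @elapsed mse(y, yhat)
println("Elapsed: ", elapsed, " s")
```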

Understand the parameters

The command used to run the prototype includes its parameters as described next.

Open a Julia session and run the following command:

include("script.jl")

Edit the vectors in the script.jl file to set the experiment parameters, which are explained below.

nof_nodes is the number of distributed nodes that the experiment will launch.
nof_samples is the total number of examples in the synthetic data set; 80% are used for training and the remaining 20% for assessment.
function is the synthetic function that will be used to create the synthetic data sets. See here for the currently available functions.
seed specifies the seed that will be used to generate the TODO.
nof_neighborhoods
number_of_dataset_dimensions is the number of function parameters. It MUST be set according to the chosen function: $f1=2, f2=3, f3=2, f4=5, f10=1$ (more info here)
data_representation specifies whether to use histograms ("histogram") or summary statistics ("summary") to build the neighborhood. If not set, it is assumed to be "summary".
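Two of the rules above lend themselves to a quick sanity check before launching a run: the fixed number of dimensions per function and the 80/20 split of nof_samples. The sketch below is an assumption-laden illustration: the names REQUIRED_DIMS, check_dims and split_ranges are hypothetical, and splitting by position (first 80%, last 20%) is a guess, since the text does not say whether the prototype shuffles the data first.

```julia
# Dimensions required by each synthetic function, as listed above.
const REQUIRED_DIMS = Dict("f1" => 2, "f2" => 3, "f3" => 2, "f4" => 5, "f10" => 1)

# Hypothetical check: fails early if the dimension does not match the function.
function check_dims(func::String, dims::Int)
    expected = REQUIRED_DIMS[func]
    dims == expected ||
        throw(ArgumentError("$func requires $expected dimensions, got $dims"))
    return true
end

# Hypothetical split by position: the first 80% of the indices for training,
# the remaining 20% for assessment.
function split_ranges(nof_samples::Int; train_fraction = 0.8)
    n_train = round(Int, train_fraction * nof_samples)
    return 1:n_train, (n_train + 1):nof_samples
end

check_dims("f1", 2)
train_idx, test_idx = split_ranges(1000)
println(length(train_idx), " training / ", length(test_idx), " assessment")
```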

Output file format

The output file name follows this pattern:

TODO-Naelson

timestamp-function-number_of_distributed_sources-sample_size.txt
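As an illustration of the pattern, a file name could be assembled as in the sketch below. The helper output_filename is hypothetical, and the timestamp format (yyyymmddHHMMSS) is an assumption, since the pattern above does not specify one.

```julia
using Dates

# Hypothetical helper (not part of the repository): builds a name of the form
# timestamp-function-number_of_distributed_sources-sample_size.txt.
# The timestamp format is an assumption.
function output_filename(func, nof_nodes, nof_samples; timestamp = now())
    stamp = Dates.format(timestamp, "yyyymmddHHMMSS")
    return "$(stamp)-$(func)-$(nof_nodes)-$(nof_samples).txt"
end

# A fixed timestamp keeps the example reproducible.
name = output_filename("f1", 4, 1000; timestamp = DateTime(2019, 1, 2, 3, 4, 5))
println(name)
```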

Prototype assessment as presented in paper TODO

Experimental Setup

The experiments used the following parameter configuration:

nof_nodes = [4, 8, 12, 16, 20, 24, 28, 32, 36]
nof_samples = [10^3, 10^4, 10^5]
function = [f1, f2, f4]
seed = [1..10]
nof_neighborhoods = [2, 3]
number_of_dataset_dimensions = according to each function
data_representation = ["histogram", "summary"]
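To get a feeling for the size of this setup, the sketch below enumerates every combination of the values above and maps each one to an 'args'-style vector. It assumes that the experiments form the full cross product of all parameter vectors, which the text does not state explicitly. The variable functions is used instead of function because function is a reserved word in Julia, and the dims lookup uses the per-function dimensions given in the section Understand the parameters.

```julia
nof_nodes = [4, 8, 12, 16, 20, 24, 28, 32, 36]
nof_samples = [10^3, 10^4, 10^5]
functions = ["f1", "f2", "f4"]          # `function` is a reserved word in Julia
seeds = collect(1:10)
nof_neighborhoods = [2, 3]
data_representation = ["histogram", "summary"]

# Dimensions fixed per function (f1=2, f2=3, f4=5).
dims = Dict("f1" => 2, "f2" => 3, "f4" => 5)

# One args-style vector per combination, assuming a full cross product.
configs = [
    [string(n), string(s), f, string(seed), string(k), string(dims[f]), r]
    for n in nof_nodes, s in nof_samples, f in functions,
        seed in seeds, k in nof_neighborhoods, r in data_representation
]

println(length(configs))  # 9 * 3 * 3 * 10 * 2 * 2 = 3240
```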

Running the experiments

To run the experiments use the file script.jl:

include("script.jl")

The parameters are explained in the section Understand the parameters.

TODO MISC

use Julia standards
- documentation
- tests
- write execution data to log files with configurable log levels
define and implement performance metrics
- execution time
- speedup
- I/O rate
- throughput
- ...
write experiment data to files
- input
- configuration
- output
- execution time
write documentation in Markdown
allow parameter configuration (n. of repetitions, data set size, etc.)
naming
- replace salida by output
why @everywhere in many parts of the code?
- for example, in workers.jl it should only be necessary to call the code in the worker, so there is no need for @everywhere.
