This project is written in Rust. Follow these steps to run it:
- Install Rust by running the following command, then follow the on-screen instructions:
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
- Then clone the project:
git clone https://gitlab.engr.illinois.edu/shivamk4/cs-425-mp-4.git
- Then build and start a node on every machine:
cd cs-425-mp-4/sdfs
cargo build --release
cargo run --release
An input prompt will appear where you can enter your commands/requests.
Note: The scripts used for map and reduce operations must be Python scripts.
- Listing the node's membership list (stored as IP addresses):
list_mem
- Listing the node's own IP address:
list_self
- Leaving the system:
leave
- Uploading a file to the filesystem (PUT):
put <local_file_path> <remote_file_name>
Example:
put /home/tmp/local_file.dat remote_file.dat
- Downloading a file from the filesystem (GET):
get <remote_file_name> <local_file_path>
Example:
get remote_file.dat /home/tmp/local_file.dat
- Listing nodes storing a particular file:
ls <remote_file_name>
Example:
ls remote_file.dat
- Listing files stored by this particular node:
store
- Initiating a GET of the same SDFS file from multiple nodes (multi-read):
multiread <remote_file_name> <local_file_path> <ip_1> <ip_2> <ip_3> ..
The IPs are your nodes' IP addresses. You can list as many IPs as you like. Example:
multiread remote_file.dat /home/tmp/local_file.dat 127.0.0.1 128.0.0.1 129.0.0.1 130.0.0.1
- Perform a map operation:
maple <local_python_script_path> <num_tasks> <output_prefix> <remote_source_directory> <executable argument 1> <executable argument 2> ..
You can add as many executable arguments as you want. The following example puts a dataset onto the filesystem, then performs a regex search:
put dataset.csv dataset.csv
maple /home/scripts/regex_search_map.py 7 regex dataset \w*
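The interface between the framework and the map script is not described here. As a rough sketch only, assuming the script is handed input lines plus the regex as its first executable argument, and emits tab-separated key/value lines, a regex search map step might look like the following. The function name map_lines, the constant key "match", and the tab-separated output format are all hypothetical.

```python
import re
from typing import Iterable, Iterator, Tuple


def map_lines(lines: Iterable[str], pattern: str) -> Iterator[Tuple[str, str]]:
    """Yield a (key, value) pair for every input line the regex matches."""
    regex = re.compile(pattern)
    for line in lines:
        line = line.rstrip("\n")
        if regex.search(line):
            # Assumption: one constant key, with the matching line as value.
            yield ("match", line)


if __name__ == "__main__":
    # In-memory demo data; a real script would presumably read the task's
    # input and take the pattern from sys.argv (assumption).
    sample = ["alice,42\n", "\n", "bob,7\n"]
    for key, value in map_lines(sample, r"\w+"):
        print(f"{key}\t{value}")
```

Check the project source for the actual input and output contract before writing a real script.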
- Perform a reduce operation:
juice <local_python_script_path> <num_tasks> <input_prefix> <output_file_name> <true|false>
For the last argument, enter true or false to indicate whether to delete the input files.
The following example follows up on the previous one:
juice /home/scripts/regex_search_reduce.py 7 regex search_output.txt true
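How the reduce script receives its input is not specified here. As a sketch under the assumption that intermediate records arrive as tab-separated key/value lines, a reduce step that collects the matched lines could look like this. The names reduce_pairs and format_output and the record format are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def reduce_pairs(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Group tab-separated key/value lines by key, keeping value order."""
    grouped = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        key, _, value = line.partition("\t")
        grouped[key].append(value)
    return dict(grouped)


def format_output(grouped: Dict[str, List[str]]) -> List[str]:
    """Flatten the grouped values into output lines, ordered by key."""
    out = []
    for key in sorted(grouped):
        out.extend(grouped[key])
    return out


if __name__ == "__main__":
    # In-memory demo data standing in for the intermediate files.
    sample = ["match\talice,42\n", "match\tbob,7\n"]
    for line in format_output(reduce_pairs(sample)):
        print(line)
```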
- Perform an SQL filter using regex:
SELECT ALL FROM <dataset_directory> WHERE <regex>
The map and reduce examples above can be shortened to:
SELECT ALL FROM dataset WHERE \w*
Note that you don't need to provide an executable or wrap the regex string in quotes.
The output file name will be dataset_filter.
- Perform an SQL join using regex:
SELECT ALL FROM <dataset_1_directory> <dataset_2_directory> WHERE <d1_field> = <d2_field>
There must be spaces around =.
The following example uploads two datasets to the filesystem, then performs a join:
put cars.csv cars.csv
put trucks.csv trucks.csv
SELECT ALL FROM cars trucks WHERE cars.price = trucks.price
The output file name will be cars_trucks_join.
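To illustrate what the join command computes, here is a minimal sketch, not the project's implementation. It assumes the command is tokenized on whitespace (one plausible reason = needs surrounding spaces), that each CSV has a header row, and that the dataset prefix in cars.price only names the dataset. The functions parse_join and hash_join are hypothetical.

```python
import csv
import io
from collections import defaultdict


def parse_join(query):
    """Split a join command on whitespace; '=' must be its own token."""
    tokens = query.split()
    if (len(tokens) != 9 or tokens[:3] != ["SELECT", "ALL", "FROM"]
            or tokens[5] != "WHERE" or tokens[7] != "="):
        raise ValueError(
            "expected: SELECT ALL FROM <d1> <d2> WHERE <d1_field> = <d2_field>")
    # Assumption: "cars.price" refers to column "price" of dataset "cars".
    left_field = tokens[6].split(".", 1)[-1]
    right_field = tokens[8].split(".", 1)[-1]
    return tokens[3], tokens[4], left_field, right_field


def hash_join(left_csv, right_csv, left_field, right_field):
    """Equality join of two CSV texts (with header rows) on the given fields."""
    index = defaultdict(list)
    for row in csv.DictReader(io.StringIO(right_csv)):
        index[row[right_field]].append(row)
    joined = []
    for row in csv.DictReader(io.StringIO(left_csv)):
        for match in index.get(row[left_field], []):
            joined.append((row, match))
    return joined


if __name__ == "__main__":
    # Made-up demo data, not taken from the project.
    cars = "model,price\nA,100\nB,200\n"
    trucks = "model,price\nT,200\nU,300\n"
    d1, d2, f1, f2 = parse_join(
        "SELECT ALL FROM cars trucks WHERE cars.price = trucks.price")
    for left, right in hash_join(cars, trucks, f1, f2):
        print(left, right)
```

With this tokenization, writing cars.price=trucks.price without spaces yields too few tokens and is rejected.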