Setting up to run CCF Pipeline Processing on the WUSTL CHPC Cluster

Step 1. Get a CHPC account

The Connectome Coordination Facility (CCF) Pipeline Processing is currently run on the Washington University in St. Louis (WUSTL) Center for High Performance Computing (CHPC) cluster.

You will need to get an account on the CHPC cluster in order to perform CCF Pipeline Processing using the Human Connectome Project (HCP) Pipeline Scripts.

You can start the process of getting an account by emailing the CHPC cluster administrator (Malcolm Tobias) at [email protected].

In order to run the pipelines and have appropriate access to the data stored in the XNAT archive and the scratch directories, your CHPC account will need to be in the following groups: 1033 (connectome) and 60022 (hcpx).

For other related activities, your account should also be in the groups 60023 (hcpi) and 60026 (hcp_open).

Be sure to request that your account be in these groups when asking for the account to be created. You will probably have to get approval from Dan Marcus [email protected] to be included in those groups.
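Once you are able to log in (see Step 2), you can check which groups your account is in using the standard id command. The exact list will vary by account, but it should include the group names requested above, e.g.:

$ id -nG
<youraccount> connectome hcpx hcpi hcp_open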

Step 2. Login to your CHPC account to verify access and change your password

Step 2a. Pick an SSH Client to use

If your desktop system is a Linux system or a Mac OSX system, you should be able to start up a terminal window and issue the following commands to login to the CHPC. If your desktop system is a Windows system, you will need to use a terminal emulator/SSH client program like MobaXterm, PuTTY, or Xshell. (There's nothing "endorsed" about the listed terminal emulator/SSH client programs. If you already have an SSH client program you're happy using, feel free to keep using it.)

Step 2b. Use your chosen SSH Client to access your CHPC account

Once your Terminal/SSH Client is running and your CHPC account has been created, you should be able to access your account using commands similar to the following.

For the examples in this documentation, we'll assume that your CHPC account is named <youraccount>. You will need to substitute your actual account name (e.g. tbbrown) for <youraccount> in the examples.

$ ssh -Y <youraccount>@login.chpc.wustl.edu

Note: Your SSH client may have some other mechanism for using SSH to connect to your account on the CHPC. But the "machine" you should connect to is login.chpc.wustl.edu. The -Y in the above command enables X11 forwarding. That is, it sets things up so that GUI-based programs that you run while logged in to your CHPC account show their interface on your local system. You will need to enable X11 forwarding using whatever mechanism your SSH client uses in order to use GUI-based programs (like emacs or gedit mentioned below) while logged in to your CHPC account.
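One quick way to confirm that X11 forwarding is active is to check the DISPLAY environment variable after logging in; if it is empty, GUI programs will not be able to display on your local system. The exact value will vary, but it should look something like:

$ echo $DISPLAY
localhost:10.0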

You may be told that the authenticity of host 'login.chpc.wustl.edu' can't be established. You can enter yes to confirm that you want to continue connecting.

Step 2c. Change your CHPC account password

You should then be prompted for your password. Enter the password that was supplied to you when you were provided the information for your new account.

Next, follow the instructions provided at How do I change my password? to change your CHPC account password.

You may want to create SSH keys for accessing the CHPC cluster so that you do not have to enter your password every time you open up a terminal/SSH connection to your account on the CHPC cluster. Doing so is not (yet) covered in this documentation.
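If you do want to set up key-based login, a minimal sketch using the standard OpenSSH tools (run on your local Linux or Mac system) looks like this; ssh-copy-id appends your public key to your account's authorized_keys file on the cluster:

$ ssh-keygen -t rsa
$ ssh-copy-id <youraccount>@login.chpc.wustl.edu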

Step 3. Install this repository in your CHPC account

The tools in this repository are intended to submit the necessary jobs to the cluster job scheduler to run the HCP Pipelines. You will want to install the tools from this repository into your CHPC account.

The intent is for you not to be doing code development (e.g. editing these tools) from your CHPC account. So you won't want to clone this repository into your CHPC account. Instead, you should visit the release page for this repository and find out what the latest release is. In these instructions, we'll assume that the latest release is v1.5.0. You'll need to substitute the actual latest release for v1.5.0 in the example commands below.

Step 3a. Get the tools/scripts

There are a number of ways to download the code from this repository into your CHPC account (e.g. downloading the .zip or .tar.gz file with a browser on your desktop system and then transferring it over to your CHPC account). But the most straightforward way to get the code is to use the wget utility that is available on the CHPC. To do so, issue the following commands:

$ cd
$ mkdir pipeline_tools
$ cd pipeline_tools
$ wget https://github.com/Washington-University/xnat_pbs_jobs/archive/v1.5.0.tar.gz
$ tar xvf v1.5.0.tar.gz
$ ln -s xnat_pbs_jobs-1.5.0 xnat_pbs_jobs
$ rm v1.5.0.tar.gz
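
You can confirm that the download unpacked correctly and that the symbolic link points at the versioned directory; the output should look similar to:

$ ls -ld ~/pipeline_tools/xnat_pbs_jobs
lrwxrwxrwx ... xnat_pbs_jobs -> xnat_pbs_jobs-1.5.0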

Step 3b. Set up your environment

Next, you need to set the environment variables that are used by these tools.

There are a number of text editors installed on the CHPC systems for you to use. Among them are emacs, nano, vi/vim, and gedit. If you are not familiar with any of these and don't know which one to use, just use gedit for now. It has a fairly standard, menu-based GUI interface.

Open gedit with the following commands:

$ cd
$ gedit .bash_profile

Add the following environment variable setting lines to the end of the .bash_profile file.

# ============================================================
# Environment variables for xnat_pbs_jobs code
# https://github.com/Washington-University/xnat_pbs_jobs
# ============================================================

# Indicate that our compute cluster is the CHPC cluster
export COMPUTE=CHPC

# Indicate that we are using CHPC cluster version "2.0"
export CLUSTER="2.0"

# Code and scripts for submitting XNAT-aware PBS jobs to run HCP pipelines
export XNAT_PBS_JOBS=${HOME}/pipeline_tools/xnat_pbs_jobs

# Location to store log files that should not be pushed back into XNAT DB
# after pipeline runs
export XNAT_PBS_JOBS_LOG_DIR=${HOME}/joblogs

# Location where data is placed outside of XNAT DB for pipeline processing
# a.k.a. "the build space"
export XNAT_PBS_JOBS_BUILD_DIR=/HCP/hcpdb/build_ssd/chpc/BUILD

# Location where running status files are kept
export XNAT_PBS_JOBS_RUNNING_STATUS_DIR=/HCP/hcpdb/build_ssd/chpc/BUILD

# Location of the XNAT archive with which these jobs will be interacting
export XNAT_PBS_JOBS_ARCHIVE_ROOT=/HCP/hcpdb/archive

# Tools (binaries and scripts) supplied by NRG that are needed
export NRG_PACKAGES=/HCP/NRG/nrgpackages

# Some utilities for working with XNAT from the command line
export XNAT_UTILS_HOME=/export/HCP/xnat_utilities

# Location of XNAT Pipeline Engine
export XNAT_PBS_JOBS_PIPELINE_ENGINE=/export/HCP/pipeline

# XNAT Server to use
#export XNAT_PBS_JOBS_XNAT_SERVER=db.humanconnectome.org
export XNAT_PBS_JOBS_XNAT_SERVER=intradb.humanconnectome.org

# Location of Perl libraries needed for creating FreeSurfer Assessor
export PERL5LIB=${NRG_PACKAGES}/tools/HCP/Freesurfer/freesurfer_includes

# Add location of ImageMagick installation needed for creating FreeSurfer Assessor
# to the PATH
export PATH=/export/HCP/ImageMagick-6.6.7/bin:${PATH}

Be sure to save your changes to the .bash_profile file. After you've saved your changes, you will need to log out and log back in to your CHPC account for these environment variable settings to take effect.
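After logging back in, you can spot-check that the settings took effect by echoing one of the variables, for example:

$ echo $XNAT_PBS_JOBS
/home/<youraccount>/pipeline_tools/xnat_pbs_jobs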

Step 4. Install a repository of example control files for job submitting

The tools in this repository make use of so-called "control files". You will be editing these control files to determine some aspects of how jobs are submitted.

For example,

  • The resource limits for certain jobs are specified in initialization files.
  • The amount and kind of output produced on screen as jobs are submitted is controlled by settings in logging configuration files.
  • The sessions and subjects processed are specified in subject files.

Examples of these control files are in a separate repository.

It will be common for you to edit these files to control the jobs you submit. Even though such edits will seldom need to be pushed back to the actual control repository, we'll get the control file examples as a clone of the repository rather than as an installation of a release.

Step 4a. Get the example control files

Get the control files repository using the following commands:

$ cd ~/pipeline_tools
$ git clone https://github.com/Washington-University/xnat_pbs_jobs_control.git

Step 4b. Set up your environment

Next, you need to set the environment variable that is used to specify where the control files are located.

As before, open the ${HOME}/.bash_profile file in a text editor and add the following lines to the end of the file.

# ============================================================
# Environment variables for xnat_pbs_jobs_control
# https://github.com/Washington-University/xnat_pbs_jobs_control
# ============================================================

# Control files for XNAT PBS JOBS
export XNAT_PBS_JOBS_CONTROL=${HOME}/pipeline_tools/xnat_pbs_jobs_control

As above, you will need to save the changes to the .bash_profile file and log out and log back in to your CHPC account for these changes to take effect.
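Again, a quick spot-check after logging back in:

$ echo $XNAT_PBS_JOBS_CONTROL
/home/<youraccount>/pipeline_tools/xnat_pbs_jobs_control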

Step 5. Set up Python 3 usage for your CHPC account

These scripts require Python 3 to be configured for your CHPC account. By default, the Python version installed and configured for your CHPC account is in the Python 2 series (probably version 2.6.6).
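You can check which Python version your account currently runs before making any changes; the output should be similar to:

$ python --version
Python 2.6.6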

Step 5a. Set up your account to run Anaconda

We'll use Anaconda to set up your account so that it can use Python 3. First, edit your ${HOME}/.bash_profile file and add the following lines to the end of that file.

# ============================================================
# Environment variables for using Anaconda
# ============================================================

export PATH=/act/Anaconda3-2.3.0/bin:${PATH}

Save the file, log out, and log back in for this change to take effect.
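After logging back in, you can verify that the Anaconda tools are found on your PATH; given the PATH setting above, which should report:

$ which conda
/act/Anaconda3-2.3.0/bin/conda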

Step 5b. Create a Python 3 environment named python3

Once you have your PATH configured to allow you to use the Anaconda tool, issue the following commands:

$ conda create --name python3 python=3
...
Proceed ([y]/n)? y

Fetching packages ...
openssl-1.0.2l ...
...

#
# To activate this environment, use:
# $ source activate python3
#
# To deactivate this environment, use:
# $ source deactivate
#

Step 5c. Activate your python3 environment

As the messages indicate, once the python3 environment is created, you can activate the environment by entering a command like: source activate python3. Go ahead and activate the python3 environment now.

$ source activate python3

If all works as expected, your command line prompt will now be prefaced with (python3) so that it looks similar to:

(python3)[<youraccount>@login01 ~]$

It is not a good idea to add this line to one of your bash initialization files (e.g. .bashrc or .bash_profile) because some of the code you will be submitting to run on CHPC nodes using your account requires a different version of Python (v2.7.x).
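Instead, activate the environment manually in each session where you need Python 3, and deactivate it when you are done. (This is the older conda syntax shown in the messages above; more recent conda releases use conda activate and conda deactivate.)

$ source deactivate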

Step 5d. Install prerequisite Python 3 packages

You will need to install the requests package within this activated environment. To do so, make sure your python3 environment is activated and enter the following command:

$ conda install requests

Enter y when prompted whether to proceed.

You will need to install the pyqt package within this activated environment. To do so, make sure your python3 environment is activated and enter the following command:

$ conda install pyqt

Enter y when prompted whether to proceed.
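You can confirm that the requests package is importable in the activated environment with a one-line check (the version number printed will depend on what conda installed):

$ python -c "import requests; print(requests.__version__)"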

Step 5e. Setting up your Python path to find the library of code in this set of tools

There are a number of Python modules that are included in this set of tools that are library modules. Your Python 3 environment must be set up in such a way that these modules can be found. The easiest way to do this is to create a .pth file in your Anaconda site-packages directory for the python3 environment that you just created.

Take the following steps:

$ cd ~/.conda/envs/python3/lib
$ ls

In the files listed, look for a directory named python3.5, python3.6, or something similar. That directory contains the base installation of the Python version you are using; its exact name depends on which Python 3 release was current when you created the environment. Change into that directory and then work your way further down into the site-packages directory:

$ cd python3.6
$ cd site-packages

Create a text file in the site-packages directory named pipeline_tools_lib.pth.

Put a single line of text similar to the following in your pipeline_tools_lib.pth file:

/home/<youraccount>/pipeline_tools/xnat_pbs_jobs/lib

Save the file.
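If you prefer, you can create the file in one step from the command line while still in the site-packages directory (substitute your actual account name):

$ echo "/home/<youraccount>/pipeline_tools/xnat_pbs_jobs/lib" > pipeline_tools_lib.pth

To verify that the .pth file is being picked up, make sure the python3 environment is activated and check that the lib directory appears in Python's module search path:

$ python -c "import sys; print([p for p in sys.path if 'xnat_pbs_jobs' in p])"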

Step 6. Create the joblogs directory

This directory is referenced by the XNAT_PBS_JOBS_LOG_DIR environment variable set above. Files are not automatically deleted from it, so from time to time you will want to clean out old log files. It primarily holds log files from PUT jobs; most other log files go in the build directory (XNAT_PBS_JOBS_BUILD_DIR) and then end up stored in database resources as records of what processing happened.

Take the following steps:

$ cd
$ mkdir joblogs
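
Nothing cleans this directory automatically, so you may want an occasional manual cleanup. For example, the following standard find command (an illustration, not part of the pipeline tools) deletes log files older than 30 days:

$ find ~/joblogs -type f -mtime +30 -delete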

You should now be able to proceed to submitting jobs to run the HCP Pipelines.

See Configuring and running Structural Preprocessing to get started.