Setting up to run CCF Pipeline Processing on the WUSTL CHPC Cluster
The Connectome Coordination Facility (CCF) Pipeline Processing is currently run on the Washington University in St. Louis (WUSTL) Center for High Performance Computing (CHPC) cluster.
You will need to get an account on the CCF cluster in order to perform CCF Pipeline Processing using the Human Connectome Project (HCP) Pipeline Scripts.
You can start the process of getting an account by emailing the CHPC cluster administrator (Malcolm Tobias) at [email protected].
In order to run the pipelines and have appropriate access to the data stored in the XNAT archive and the scratch directories, your CHPC account will need to be in the following groups: 1033 (connectome) and 60022 (hcpx). For other related activities, your account should also be in the groups 60023 (hcpi) and 60026 (hcp_open).
Be sure to request that your account be in these groups when asking for the account to be created. You will probably need approval from Dan Marcus ([email protected]) to be included in those groups.
If your desktop system is a Linux system or a Mac OSX system, you should be able to start up a terminal window and issue the following commands to login to the CHPC. If your desktop system is a Windows system, you will need to use a terminal emulator/SSH client program like MobaXterm, PuTTY, or Xshell. (There's nothing "endorsed" about the listed terminal emulator/SSH client programs. If you already have an SSH client program you're happy using, feel free to keep using it.)
Once you are running your terminal/SSH client and your CHPC account has been created, you should be able to access your account using a command similar to the following. For the examples in this documentation, we'll assume that your CHPC account is named <youraccount>. You will need to substitute your actual account name (e.g. tbbrown) for <youraccount> in the examples.
$ ssh -Y <youraccount>@login.chpc.wustl.edu
Note: Your SSH client may have some other mechanism for using SSH to connect to your account on the CHPC, but the "machine" you should connect to is login.chpc.wustl.edu. The -Y in the above command enables X11 forwarding. That is, it sets things up so that GUI-based programs that you run while logged in to your CHPC account show their interface on your local system. You will need to enable X11 forwarding, using whatever mechanism your SSH client provides, in order to use GUI-based programs (like the emacs or gedit editors mentioned below) while logged in to your CHPC account.
You may be told that The authenticity of host 'login.chpc.wustl.edu' can't be established. You can enter yes to confirm that you want to continue connecting.
You should then be prompted for your password. Enter the password that was supplied to you when you were provided the information for your new account.
Next, follow the instructions provided at How do I change my password? to change your CHPC account password.
You may want to create SSH keys for accessing the CHPC cluster so that you do not have to enter your password every time you open up a terminal/SSH connection to your account on the CHPC cluster. Doing so is not (yet) covered in this documentation.
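If you do want to set up keys now, a minimal sketch follows, assuming an OpenSSH client on your desktop system. The key file name id_rsa_chpc is an arbitrary choice, and generating the key without a passphrase (-N "") trades some security for convenience; use a passphrase with ssh-agent if you prefer.

```shell
# Minimal sketch, assuming an OpenSSH client on your desktop system.
# The key file name id_rsa_chpc is an arbitrary choice.
mkdir -p ~/.ssh
ssh-keygen -t rsa -b 4096 -N "" -f ~/.ssh/id_rsa_chpc

# Then install the public key on the login node (prompts for your password once):
#   ssh-copy-id -i ~/.ssh/id_rsa_chpc.pub <youraccount>@login.chpc.wustl.edu
# Afterwards, connect with:
#   ssh -i ~/.ssh/id_rsa_chpc -Y <youraccount>@login.chpc.wustl.edu
```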
The tools in this repository are intended to submit the necessary jobs to the cluster job scheduler to run the HCP Pipelines. You will want to install the tools from this repository into your CHPC account.
The intent is for you not to be doing code development (e.g. editing these tools) from your CHPC account, so you won't want to clone this repository into your CHPC account. Instead, you should visit the release page for this repository and find out what the latest release is. In these instructions, we'll assume that the latest release is v1.5.0. You'll need to substitute the actual latest release for v1.5.0 in the example commands below.
There are a number of different ways you can download the code from this repository into your CHPC account (e.g. using a browser on your desktop system, downloading the .zip or .tar.gz file to your desktop system, and then transferring the file over to your CHPC account). But the most straightforward way to get the code is to use the wget utility that is available on the CHPC. To do so, issue the following commands:
$ cd
$ mkdir pipeline_tools
$ cd pipeline_tools
$ wget https://github.com/Washington-University/xnat_pbs_jobs/archive/v1.5.0.tar.gz
$ tar xvf v1.5.0.tar.gz
$ ln -s xnat_pbs_jobs-1.5.0 xnat_pbs_jobs
$ rm v1.5.0.tar.gz
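Since the release number appears in several of these commands, one way to keep them in sync is a small parameterized variant of the same steps. This is only a convenience sketch; RELEASE=1.5.0 matches the release assumed above, and the URL is the same one used in the wget command.

```shell
# Same steps as above, parameterized so updating to a newer release only
# means changing RELEASE. ln -sfn makes re-running the symlink step safe.
RELEASE=1.5.0
mkdir -p ~/pipeline_tools
cd ~/pipeline_tools
wget https://github.com/Washington-University/xnat_pbs_jobs/archive/v${RELEASE}.tar.gz
tar xf v${RELEASE}.tar.gz
ln -sfn xnat_pbs_jobs-${RELEASE} xnat_pbs_jobs
rm v${RELEASE}.tar.gz
```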
Next, you need to set the environment variables that are used by these tools.
There are a number of text editors installed on the CHPC systems for you to use. Among them are emacs, nano, vi/vim, and gedit. If you are not familiar with any of these and don't know which one to use, just use gedit for now. It has a fairly standard, menu-based GUI interface.
Open gedit with the following commands:
$ cd
$ gedit .bash_profile
Add the following environment variable setting lines to the end of the .bash_profile file.
# ============================================================
# Environment variables for xnat_pbs_jobs code
# https://github.com/Washington-University/xnat_pbs_jobs
# ============================================================
# Indicate that our compute cluster is the CHPC cluster
export COMPUTE=CHPC
# Indicate that we are using CHPC cluster version "2.0"
export CLUSTER="2.0"
# Code and scripts for submitting XNAT-aware PBS jobs to run HCP pipelines
export XNAT_PBS_JOBS=${HOME}/pipeline_tools/xnat_pbs_jobs
# Location to store log files that should not be pushed back into XNAT DB
# after pipeline runs
export XNAT_PBS_JOBS_LOG_DIR=${HOME}/joblogs
# Location where data is placed outside of XNAT DB for pipeline processing
# a.k.a. "the build space"
export XNAT_PBS_JOBS_BUILD_DIR=/HCP/hcpdb/build_ssd/chpc/BUILD
# Location where running status files are kept
export XNAT_PBS_JOBS_RUNNING_STATUS_DIR=/HCP/hcpdb/build_ssd/chpc/BUILD
# Location of the XNAT archive with which these jobs will be interacting
export XNAT_PBS_JOBS_ARCHIVE_ROOT=/HCP/hcpdb/archive
# Tools (binaries and scripts) supplied by NRG that are needed
export NRG_PACKAGES=/HCP/NRG/nrgpackages
# Some utilities for working with XNAT from the command line
export XNAT_UTILS_HOME=/export/HCP/xnat_utilities
# Location of XNAT Pipeline Engine
export XNAT_PBS_JOBS_PIPELINE_ENGINE=/export/HCP/pipeline
# XNAT Server to use
#export XNAT_PBS_JOBS_XNAT_SERVER=db.humanconnectome.org
export XNAT_PBS_JOBS_XNAT_SERVER=intradb.humanconnectome.org
# Location of Perl libraries needed for creating FreeSurfer Assessor
export PERL5LIB=${NRG_PACKAGES}/tools/HCP/Freesurfer/freesurfer_includes
# Add location of ImageMagick installation needed for creating FreeSurfer Assessor
# to the PATH
export PATH=/export/HCP/ImageMagick-6.6.7/bin:${PATH}
Be sure to save your changes to the .bash_profile file. After you've saved your changes, you will need to log out and log back in to your CHPC account for these environment variable settings to take effect.
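After logging back in, a quick way to confirm the settings took effect is to print them; the variable names below are exactly the ones exported above.

```shell
# Print each variable set in .bash_profile above; an empty value means the
# export was missed or you have not yet logged back in.
for v in COMPUTE CLUSTER XNAT_PBS_JOBS XNAT_PBS_JOBS_LOG_DIR \
         XNAT_PBS_JOBS_BUILD_DIR XNAT_PBS_JOBS_ARCHIVE_ROOT \
         XNAT_PBS_JOBS_XNAT_SERVER; do
    printf '%s=%s\n' "$v" "${!v}"
done
```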
The tools in this repository make use of so-called "control files". You will be editing these control files to determine some aspects of how jobs are submitted.
For example,
- The resource limits for certain jobs are specified in initialization files.
- The amount and kind of output produced on screen as jobs are submitted is controlled by settings in logging configuration files.
- The sessions and subjects processed are specified in subject files.
Examples of these control files are in a separate repository.
It will be common for you to edit these files to control the jobs you submit. Even though you may seldom make changes that should alter the contents of the actual control repository, we'll get the control file examples as a clone of the repository instead of as an installation of a release.
Get the control files repository using the following commands:
$ cd ~/pipeline_tools
$ git clone https://github.com/Washington-University/xnat_pbs_jobs_control.git
Next, you need to set the environment variable that is used to specify where the control files are located.
As before, open the ${HOME}/.bash_profile file in a text editor and add the following lines to the end of the file.
# ============================================================
# Environment variables for xnat_pbs_jobs_control
# https://github.com/Washington-University/xnat_pbs_jobs_control
# ============================================================
# Control files for XNAT PBS JOBS
export XNAT_PBS_JOBS_CONTROL=${HOME}/pipeline_tools/xnat_pbs_jobs_control
As above, you will need to save the changes to the .bash_profile file and log out and log back in to your CHPC account for these changes to take effect.
These scripts require Python 3 to be configured for your CHPC account. By default, the Python version installed and configured to run for your CHPC account is in the Python 2 series (probably version 2.6.6).
We'll use Anaconda to set up your account so that it can use Python 3. First, edit your ${HOME}/.bash_profile file and add the following lines to the end of that file.
# ============================================================
# Environment variables for using Anaconda
# ============================================================
export PATH=/act/Anaconda3-2.3.0/bin:${PATH}
Save the file, log out, and log back in for this change to take effect.
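Once logged back in, you can check that the Anaconda installation is on your PATH before creating any environments. This is only a sanity check; the fallback message avoids a confusing "command not found" error.

```shell
# Should print a path under /act/Anaconda3-2.3.0 if the PATH change took effect.
command -v conda || echo "conda not found - check your PATH and log in again"
conda --version 2>/dev/null || true
```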
Once you have your PATH configured to allow you to use the Anaconda tool, issue the following commands:
$ conda create --name python3 python=3
...
Proceed ([y]/n)? y
Fetching packages ...
openssl-1.0.2l ...
...
#
# To activate this environment, use:
# $ source activate python3
#
# To deactivate this environment, use:
# $ source deactivate
#
As the messages indicate, once the python3 environment is created, you can activate the environment by entering a command like source activate python3. Go ahead and activate the python3 environment now.
$ source activate python3
If all works as expected, your command line prompt will now be prefaced with (python3) so that it looks similar to:
(python3)[<youraccount>@login01 ~]$
It is not a good idea to add this line to one of your bash initialization files (e.g. .bashrc or .bash_profile) because some of the code you will be submitting to run on CHPC nodes using your account requires a different version of Python (v2.7.x).
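If you want activation to be quick without making it automatic, one option is an alias. The alias name py3 is an arbitrary choice, not part of these tools, and defining it is safe to put in .bashrc because merely defining the alias does not activate anything.

```shell
# Define a shortcut for activating the python3 environment on demand.
# Only actually typing "py3" later changes which Python you are running.
alias py3='source activate python3'
```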
You will need to install the requests package within this activated environment. To do so, make sure your python3 environment is activated and enter the following command:
$ conda install requests
Enter y when prompted whether to proceed.
You will need to install the pyqt package within this activated environment. To do so, make sure your python3 environment is activated and enter the following command:
$ conda install pyqt
Enter y when prompted whether to proceed.
There are a number of Python modules included in this set of tools that are library modules. Your Python 3 environment must be set up in such a way that these modules can be found. The easiest way to do this is to create a .pth file in your Anaconda site-packages directory for the python3 environment that you just created.
Take the following steps:
$ cd ~/.conda/envs/python3/lib
$ ls
In the files listed, look for a directory named python3.5, python3.6, or something similar. That directory will contain the base installation of the Python version you are using. Whether it is named python3.5, python3.6, or python3.<whatever> will depend on what was the latest version of the Python 3 series when you created this environment. Change into that directory and then work your way further down into the site-packages directory:
$ cd python3.6
$ cd site-packages
Create a text file in the site-packages directory named pipeline_tools_lib.pth. Put a single line of text similar to the following in your pipeline_tools_lib.pth file:
/home/<youraccount>/pipeline_tools/xnat_pbs_jobs/lib
Save the file.
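To confirm the .pth file is being picked up, you can ask Python (with the python3 environment activated) which pipeline_tools paths are on its module search path. This is only a quick check; no output means the .pth file was not found or is in the wrong site-packages directory.

```shell
# With the python3 environment active, this should print the lib directory
# named in pipeline_tools_lib.pth.
python -c "import sys; print('\n'.join(p for p in sys.path if 'pipeline_tools' in p))"
```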
This directory is referenced in the XNAT_PBS_JOBS_LOG_DIR environment variable set above. Files will not be automatically deleted from this directory, so from time to time you'll want to clean it up and clear out log files. Primarily, log files from PUT jobs go here; the other log files mostly go in the build directory (XNAT_PBS_JOBS_BUILD_DIR) and then end up being put into database resources as records of what processing happened.
Take the following steps:
$ cd
$ mkdir joblogs
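When it is time to clean up, a conservative sketch is to list old files first and delete only once you're satisfied. The 30-day cutoff is an arbitrary choice, not a project policy.

```shell
# Fall back to ~/joblogs if XNAT_PBS_JOBS_LOG_DIR is not set in this shell.
LOGDIR="${XNAT_PBS_JOBS_LOG_DIR:-$HOME/joblogs}"
mkdir -p "$LOGDIR"
# List log files older than 30 days; review the list before deleting anything.
find "$LOGDIR" -type f -mtime +30 -print
# Once satisfied, the same find with -delete removes them:
#   find "$LOGDIR" -type f -mtime +30 -delete
```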
You should now be able to proceed to submitting jobs to run the HCP Pipelines.
See Configuring and running Structural Preprocessing to get started.