# 5. Create config file
We use configuration (i.e., config) files to pass parameter options to the analysis scripts. The easiest way to get started is to modify a pre-existing file (here is an example).
The configuration file must be named according to the format `config-<task>_<label>.tsv`. The names used in the file name determine the naming of the output folders, as illustrated in the examples below.
Some other example config file names include:
- `config-pixar_ROI1-ROI2.tsv` - will generate a `pixar/ROI1-ROI2` results folder. The ROI names can be anything as long as they include no spaces.
- `config-pixar_ToM-timecourses.tsv` - will generate a `pixar/ToM-timecourses` results folder
Each row of the configuration file specifies an analysis option that the user can set when running a pipeline script. The fields provided in the example config file should not be deleted or renamed, but you can change the value assigned to each option. We may add options to the config file as we continue to develop pipelines in the lab, but the table below provides an overview of the current options.
### config file tips
- The pipeline scripts will parse the inputs passed to the config file options and will error out if an expected input is not provided. In other words, some of the options below require the option to be explicitly enabled or disabled (e.g., yes or no) rather than left blank. Please read through the table below carefully and include the appropriate input described in the final column.
- When editing the configuration file for your analysis, remember that this is a tab-separated values file, so you need to include a single tab (not a space) between the analysis option and the input value you're passing to it!
- Relatedly, the scripts expect the configuration file to have 2 columns (separated by tabs). If there's an extra space or tab after an input value, you might get an error such as `pandas.errors.ParserError: Error tokenizing data. C error: Expected 2 fields in line 4, saw 3`. This means that line 4 of your config file had 3 columns instead of 2, suggesting there's an extra space or tab after the input value.
- Some of the options are meant to provide flexibility throughout a series of analyses, so modifications to the config file might need to happen after starting an analysis. A good example of this is the resultsDir option. It should be left blank when initially running an analysis (when no results folder has been generated yet), but you will likely want to pass a results directory to this option if you want to run additional participants through the pipeline and have their results saved to the same results directory, run additional firstlevel pipelines and have the outputs saved to the same directory, etc. Reading through the table below carefully and understanding what the options do will ensure you make the most of the config file in your analyses.
- Not all of the options below are used by each pipeline, so you could, for example, specify only the options used by the firstlevel pipeline before running that analysis and then later specify the options for the timecourse or secondlevel pipeline. Ideally, you want to fill out your config file as completely as possible at the outset of the project, but if decisions are still being made or if you don't intend to use one of the pipelines, these fields can be left blank.
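To make the two-column requirement concrete, here is a minimal sketch (not the actual pipeline code) of how a script might read a config file with pandas, and how a single stray tab after a value reproduces the `ParserError` described in the tips:

```python
import pandas as pd
from io import StringIO

# A well-formed config: a single tab between each option and its value
good = "task\tpixar\nsmoothing\t5\nsplithalf\tyes\n"
config = pd.read_csv(StringIO(good), sep="\t", header=None)
options = dict(zip(config[0], config[1]))  # option -> value mapping

# The same file with a stray tab after "5": line 2 now has 3 fields,
# so the C parser raises "Expected 2 fields in line 2, saw 3"
bad = "task\tpixar\nsmoothing\t5\t\nsplithalf\tyes\n"
caught = False
try:
    pd.read_csv(StringIO(bad), sep="\t", header=None)
except pd.errors.ParserError:
    caught = True
```

Note that the extra tab creates an empty third field, which is invisible in most editors but still trips the parser.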
Option | Description | Inputs |
---|---|---|
sharedDir | Location of shared directory on the server. This is where singularities, data files, and scripts are saved and copied over when setting up a new project. Some scripts will search for files in this directory instead of your project directory to avoid having to copy over larger files (e.g., search space ROIs) for each project. | path to directory (on the EBC server: /EBC/processing) |
bidsDir | Location of BIDS directory | path to directory |
derivDir | Location of derivatives directory where fMRIPrep outputs are saved | path to directory |
resultsDir | Location of output directory with processed data. This option is useful if you want to output results files for new participants to a pre-existing folder. Use case examples include: running a pipeline for a subset of participants and wanting to save results for the remaining participants to the same output folder. | path to directory or leave blank |
smoothDir | Location of output directory with smoothed data. This option is useful if you want to pick up partially processed files (e.g., smoothed files) from a prior run of the pipeline. Use case examples include: (1) running the firstlevel pipeline and wanting to use the smoothed outputs for the timecourse pipeline or (2) running the firstlevel pipeline to generate one set of contrast files (e.g., mind-body) and wanting to use the smoothed files to generate another set of contrast files (e.g., faces-scenes). You can use this option with splithalf data (i.e., pick up and analyze data that has been split and smoothed). For this functionality, you should leave the splithalf field (described below) as yes so other files are named and split accordingly. The data will not be split again. If the smoothed files live within the results directory, the same path can be provided for both config fields. | path to directory or leave blank
resampleDir | A path to ROI or search space nifti files. Used by the resample_ROIs.py script. You can leave this blank if you're not resampling ROIs. | path to directory containing the ROI file(s) to resample or leave blank |
task | The name of the task used in the BIDS formatted file names | task name (e.g., pixar) |
sessions | The label(s) used for labelling session folders | 01 or no |
multiecho | Whether the functional data were acquired using a multi-echo sequence. This is currently no for the ongoing Richardson Lab projects | yes or no
FD_thresh | Framewise displacement threshold to use for tagging outlier volumes | any number (default: 1) |
DVARS_thresh | Standardized DVARS threshold to use for tagging outlier volumes | any number (default: 1.5) |
art_norm_thresh | Composite motion threshold relative to the previous timepoint | any number (default: 1) |
art_z_thresh | SD threshold for fluctuations in global signal | any number (default: 3) |
ntmpts_exclude | Run will be excluded if the proportion of outlier volumes is greater than this number | any number (default: .33) |
dropvols | Number of volumes to drop from beginning of each run | any number (default: 0) |
smoothing | Smoothing value to use; if 0 spatial smoothing will be skipped. If providing a resultsDir with previously smoothed data, this value should likely be 0 or the already smoothed data will be further smoothed. | any number (default: 5) |
hpf | Value to use for temporal high pass filtering (in seconds) | any number (default: 100) |
filter | Filtering method to use (not relevant for standard firstlevel_pipeline script) | no, butterworth, cosine |
detrend | Option to detrend data when running the timecourse pipeline (not relevant for standard firstlevel_pipeline script). This can usually be set to 'no' as slow drift (for the length of experiments we run) is typically dealt w/ by the high-pass filter. | yes or no |
standardize | Option to standardize data when running the timecourse pipeline (not relevant for standard firstlevel_pipeline script) | no (do not standardize the data), zscore (timeseries are shifted to zero mean and scaled to unit variance; uses population std), zscore_sample (timeseries are shifted to zero mean and scaled to unit variance; uses sample std), psc (timeseries are shifted to zero mean and scaled to PSC (as compared to original mean signal)) (more info on strategies to standardize signal) |
splithalf | Whether to analyse the data as separate halves | yes or no |
events | Events to analyse, separated by commas. Formatted to match the conds column in the contrasts file and the trial_type column in the events file(s), which should already use the same condition/trial labels. This can include adult timecourse regressors. | any number of events that are specified as conditions within the events files or ROI timecourse files: mind, body, faces, scenes (events); RTPJ, VMPFC (timecourses) |
modulators | Option to use specified events as parametric (or amplitude) modulators in the model. If yes, then the events files need to have an amplitude column that specifies the modulation value for each onset and trial type. | yes or no |
contrast | List of contrasts to model separated by commas or no. The events specified above can be contrasted in any way as long as there is a corresponding row in the contrasts file that describes the weights to assign each condition. The contrast names specified here must match the desc column in the contrasts file. The contrast files generated by the model will have this label. If no, the parameter estimate for each variable in the model will be returned. | any number of contrasts that are specified in the contrasts file: mind-body, faces-scenes (events); RTPJ-S2 (timecourses) |
timecourses | Option to use adult timecourse regressors. If desired, then the informative part of the file name needs to be passed to this argument, separated by commas (if more than 1 ROI file is needed). These files are described in greater detail on the first-level modeling wiki | list of ROI timecourses (e.g., ToM_ROIs, FaceSceneObject_ROIs) or no |
regressors | List of regressors to use in GLM, separated by commas. This includes timecourse regressors because a given ROI timecourse file may have more timecourses than desired for the model. If ROI timecourses are requested, they must use the same ROI names as the ROI timecourse file. | art, aCompCor, FD, DVARS, motion_params-6, motion_params-12 (or other fMRIPrep generated confounds), RTPJ, LTPJ, PC, DMPFC, MMPFC, VMPFC (or other ROI timecourses) (default: art, aCompCor) |
template | The name (or partial name) of the template data are resampled to. This is used to select the correct ROI/search space files where relevant or to resample ROI files (using the resample_ROIs.py script). | name (or partial name) of template (default: MNI152NLin2009cAsym_res-02_T1w) |
search_spaces | List of ROI search spaces to extract timeseries from, separated by commas | list of ROI names (matching file names) or none |
match_events | Field used by the define_fROIs.py script. To avoid having to modify the config file and re-run the define_fROI script for each search space and contrast of interest, the script will optionally check whether the search space ROI matches the events field. This is useful when the events are contrasts of timecourses (e.g., LTPJ-S2). In this case, you'd likely want to apply the LTPJ search space (and not the other ToM search spaces) to this contrast and so on for each timecourse contrast specified under the events field. If yes, the define_fROI script will check whether the search space ROI name exists within the event and skip search space ROIs that do not match the event name. If no, the define_fROI script will return fROIs for each search space and contrast specified in the config file. | yes or no |
top_nvox | Number of top voxels to extract from ROI search space (if requested) | any number (default: 80) |
mask | Which mask(s) to apply when extracting either timecourses or stats values from an image, separated by commas. If no ROI masking is desired, specify whole_brain. If the mask image starts with fROI, the script will assume a functional ROI file exists for each subject in the list provided. It will search for the fROI files in the resultsDir specified above. If a group or atlas based ROI is provided (i.e., fROI is not in the name), the script will search within your <PROJECT_NAME>/files/ROI folder for the ROI file requested. | a list of ROIs, e.g., whole_brain (no ROI), fROI-LTPJ_splithalf1 (fROI), RTPJ (group/atlas ROI)
extract | Option to specify whether you want mean or voxelwise values returned. If mean is requested, an averaged (across all voxels within the specified mask) timecourse vector will be saved. If voxelwise is requested, a time x voxel array will be saved. | mean or voxelwise |
nonparametric | Option for secondlevel modeling script. Whether to use a nonparametric (i.e., permutation-based) approach in the group-level analysis | yes or no |
npermutations | Option for secondlevel modeling script. Number of permutations to use. This will only be applied if nonparametric is yes | any number (default: 5000) |
group_comparison | Option for secondlevel modeling script. The type of group comparison: within-group or between-group. | within or between |
group_variables | Option for secondlevel modeling script. The variables to use in the group analysis. These must be included in the subject-condition file passed to the secondlevel pipeline. Only the variables listed in this field will be included in the model, even if additional variables are included in the subject-condition file. | any variables (e.g., age, gender) |
est_group_variances | Option for secondlevel modeling script. Whether variances should be estimated separately for each group. | yes or no (default: no) |
tfce | Option for secondlevel modeling script. Whether to threshold outputs using Threshold-Free Cluster Enhancement. | yes or no |
overwrite | Whether to overwrite results if outputs from a prior analysis are found. If no, a new timestamped output folder will be created alongside the prior outputs. | yes or no
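Putting several of these options together, a hypothetical `config-pixar_mind-body.tsv` might contain the fragment below (the values are illustrative, not recommendations, and each option and value must be separated by a single tab; the directory options such as sharedDir, bidsDir, and derivDir would also need to be filled in with paths on your system):

```
task	pixar
sessions	01
multiecho	no
FD_thresh	1
DVARS_thresh	1.5
dropvols	0
smoothing	5
hpf	100
splithalf	yes
events	mind,body
contrast	mind-body
regressors	art,aCompCor
overwrite	no
```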
Configuration files must be saved in the main project directory. This is where the pipeline scripts will search for them, and you will get a file not found error if they are saved elsewhere. Because the config files are always saved in the same location relative to the scripts, you do not need to pass the path to the config file when running the scripts that require a config file. Only the name of the config file needs to be provided so the pipeline script picks up the correct config file, in the event you have multiple saved in your project directory.
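As a sketch of why only the file name is needed, the lookup might work roughly like the helper below (`find_config` is a hypothetical illustration, not a function from the pipeline scripts):

```python
import os

def find_config(project_dir, config_name):
    """Look for a config file in the main project directory and return its
    full path; raise FileNotFoundError if it was saved elsewhere."""
    config_path = os.path.join(project_dir, config_name)
    if not os.path.isfile(config_path):
        raise FileNotFoundError(
            f"{config_name} not found in {project_dir}: "
            "config files must be saved in the main project directory")
    return config_path
```

Because the scripts already know the project directory, passing a name like `config-pixar_mind-body.tsv` is enough to select the right file even when several config files are saved in the same project.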