
5. Create config file


We use configuration (i.e., config) files to pass parameter options to the analysis scripts. The easiest way to get started is to modify a pre-existing file (here is an example).

Naming the file

The configuration file must be named config-<task>_<analysis label>.tsv. The names used in the file name determine the naming of the output folders, as illustrated in the examples below (a short sketch of this mapping follows the examples).

Some other example config file names include:

  • config-pixar_ROI1-ROI2.tsv - will generate a pixar/ROI1-ROI2 results folder. The ROI names can be anything as long as they include no spaces.
  • config-pixar_ToM-timecourses.tsv - will generate a pixar/ToM-timecourses results folder
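
For concreteness, the sketch below shows one way the task and analysis labels could be pulled out of the config file name to build the results folder path. This is illustrative only; the pipeline scripts may derive the folder names differently.

```python
# Illustrative only: parse "config-<task>_<analysis label>.tsv" into "<task>/<analysis label>".
# The actual pipeline scripts may implement this differently.
def output_folder(config_name: str) -> str:
    stem = config_name.removeprefix("config-").removesuffix(".tsv")  # requires Python 3.9+
    task, analysis = stem.split("_", 1)  # split on the first underscore only
    return f"{task}/{analysis}"

print(output_folder("config-pixar_ROI1-ROI2.tsv"))        # pixar/ROI1-ROI2
print(output_folder("config-pixar_ToM-timecourses.tsv"))  # pixar/ToM-timecourses
```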

Contents of the file

Each row of the configuration file sets an analysis option that is read when running a pipeline script. The fields provided in the example config file should not be deleted or renamed, but you can change the value passed in each row. We may add options to the config file as we continue to develop pipelines in the lab, but the table below provides an overview of the current options.

Config file tips

  • The pipeline scripts will parse the inputs passed to the config file options and will error out if an input is expected but not provided. In other words, some of the options below must be explicitly enabled or disabled (e.g., yes or no) rather than left blank. Please read through the table below carefully and include the appropriate input described in the final column.

  • When editing the configuration file for your analysis, remember that this is a tab-separated values file, so you need to include a single tab (not a space) between the analysis option and the input value you're passing to it (see the example excerpt after the options table below).

  • Relatedly, the scripts expect the configuration file to have 2 columns (separated by tabs). If there's an extra space or tab after an input value, you might get an error such as pandas.errors.ParserError: Error tokenizing data. C error: Expected 2 fields in line 4, saw 3. This means that line 4 of your config file contained 3 columns instead of 2, suggesting there's an extra space or tab after the input value. A small sanity-check sketch follows these tips.

  • Some of the options are meant to provide flexibility throughout a series of analyses, so modifications to the config file might need to happen after starting an analysis. A good example of this is the resultsDir option. This should be left blank when initially running an analysis (when no results folder has been generated), but you will likely want to pass a results directory to this option if you want to run additional participants through the pipeline and have their results saved to the same results directory, or to run additional firstlevel pipelines and have the outputs saved to the same directory, etc. Reading through the table below carefully and understanding what the options do will ensure you make the most of the config file in your analyses.

  • Not all of the options below are used by each pipeline, so you could, for example, specify only the options used by the firstlevel pipeline before running that analysis and then later specify the options for the timecourse or secondlevel pipeline. Ideally, you want to fill out your config file as completely as possible at the outset of the project, but if decisions are still being made or if you don't intend to use one of the pipelines, these fields can be left blank.
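
If you want to catch formatting problems before running a pipeline, a minimal sanity check along the lines of the sketch below can help. This helper is hypothetical (it is not part of the pipeline scripts); it simply flags rows that would violate the two-column, tab-separated layout described above.

```python
from pathlib import Path

def check_config(config_path: str) -> bool:
    """Flag config rows that are not 'option<TAB>value' (hypothetical helper)."""
    ok = True
    for lineno, line in enumerate(Path(config_path).read_text().splitlines(), start=1):
        if not line.strip():
            continue  # ignore completely blank lines
        fields = line.split("\t")
        if len(fields) > 2:  # this is what triggers "Expected 2 fields ..., saw 3" in pandas
            print(f"line {lineno}: found {len(fields)} tab-separated fields, expected 2")
            ok = False
        elif len(fields) == 2 and fields[1] != fields[1].strip():
            print(f"line {lineno}: extra whitespace around the value {fields[1]!r}")
            ok = False
        elif len(fields) == 1 and " " in fields[0]:
            print(f"line {lineno}: no tab found; the option and value may be separated by a space")
            ok = False
    return ok

# Example: check_config("config-pixar_mind-body.tsv")
```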

| Option | Description | Inputs |
| --- | --- | --- |
| sharedDir | Location of the shared directory on the server. This is where singularities, data files, and scripts are saved and copied over when setting up a new project. Some scripts will search for files in this directory instead of your project directory to avoid having to copy over larger files (e.g., search space ROIs) for each project. | path to directory (on the EBC server: /EBC/processing) |
| bidsDir | Location of BIDS directory | path to directory |
| derivDir | Location of derivatives directory where fMRIPrep outputs are saved | path to directory |
| resultsDir | Location of output directory with processed data. This option is useful if you want to output results files for new participants to a pre-existing folder. Use case examples include: running a pipeline for a subset of participants and wanting to save results for the remaining participants to the same output folder. | path to directory or leave blank |
| smoothDir | Location of output directory with smoothed data. This option is useful if you want to pick up partially processed files (e.g., smoothed files) from a prior run of the pipeline. Use case examples include: (1) running the firstlevel pipeline and wanting to use the smoothed outputs for the timecourse pipeline or (2) running the firstlevel pipeline to generate one set of contrast files (e.g., mind-body) and wanting to use the smoothed files to generate another set of contrast files (e.g., faces-scenes). You can use this option with splithalf data (i.e., pick up and analyze data that has been split and smoothed). For this functionality, you should leave the splithalf field (described below) as yes so other files are named and split accordingly; the data will not be split again. If the smoothed files live within the results directory, the same path can be provided for both config fields. | path to directory or leave blank |
| resampleDir | A path to ROI or search space nifti files. Used by the resample_ROIs.py script. You can leave this blank if you're not resampling ROIs. | path to directory containing the ROI file(s) to resample or leave blank |
| task | The name of the task used in the BIDS formatted file names | task name (e.g., pixar) |
| sessions | The label(s) used for session folders | 01 or no |
| multiecho | Whether the functional data were acquired using a multi-echo sequence. This is currently no for the ongoing Richardson Lab projects | yes or no |
| FD_thresh | Framewise displacement threshold to use for tagging outlier volumes | any number (default: 1) |
| DVARS_thresh | Standardized DVARS threshold to use for tagging outlier volumes | any number (default: 1.5) |
| art_norm_thresh | Composite motion threshold relative to the previous timepoint | any number (default: 1) |
| art_z_thresh | SD threshold for fluctuations in global signal | any number (default: 3) |
| ntmpts_exclude | A run will be excluded if the proportion of outlier volumes is greater than this number | any number (default: .33) |
| dropvols | Number of volumes to drop from the beginning of each run | any number (default: 0) |
| smoothing | Smoothing value to use; if 0, spatial smoothing will be skipped. If providing a resultsDir with previously smoothed data, this value should likely be 0 or the already smoothed data will be further smoothed. | any number (default: 5) |
| hpf | Value to use for temporal high-pass filtering (in seconds) | any number (default: 100) |
| filter | Filtering method to use (not relevant for the standard firstlevel_pipeline script) | no, butterworth, cosine |
| detrend | Option to detrend data when running the timecourse pipeline (not relevant for the standard firstlevel_pipeline script). This can usually be set to no, as slow drift (for the length of experiments we run) is typically dealt with by the high-pass filter. | yes or no |
| standardize | Option to standardize data when running the timecourse pipeline (not relevant for the standard firstlevel_pipeline script) | no (do not standardize the data), zscore (timeseries are shifted to zero mean and scaled to unit variance; uses population std), zscore_sample (timeseries are shifted to zero mean and scaled to unit variance; uses sample std), psc (timeseries are shifted to zero mean and scaled to percent signal change relative to the original mean signal) (more info on strategies to standardize signal) |
| splithalf | Whether to analyse the data as separate halves | yes or no |
| events | Events to analyse, separated by commas. Formatted to match the conds column in the contrasts file and the trial_type column in the events file(s), which should already use the same condition/trial labels. This can include adult timecourse regressors. | any number of events that are specified as conditions within the events files or ROI timecourse files: mind, body, faces, scenes (events); RTPJ, VMPFC (timecourses) |
| modulators | Option to use the specified events as parametric (or amplitude) modulators in the model. If yes, then the events files need to have an amplitude column that specifies the modulation value for each onset and trial type. | yes or no |
| contrast | List of contrasts to model, separated by commas, or no. The events specified above can be contrasted in any way as long as there is a corresponding row in the contrasts file that describes the weights to assign each condition. The contrast names specified here must match the desc column in the contrasts file. The contrast files generated by the model will have this label. If no, the parameter estimate for each variable in the model will be returned. | any number of contrasts that are specified in the contrasts file: mind-body, faces-scenes (events); RTPJ-S2 (timecourses) |
| timecourses | Option to use adult timecourse regressors. If desired, then the informative part of the file name needs to be passed to this argument, separated by commas (if more than 1 ROI file is needed). These files are described in greater detail on the first-level modeling wiki. | list of ROI timecourses (e.g., ToM_ROIs, FaceSceneObject_ROIs) or no |
| regressors | List of regressors to use in the GLM, separated by commas. This includes timecourse regressors because a given ROI timecourse file may have more timecourses than desired for the model. If ROI timecourses are requested, they must use the same ROI names as the ROI timecourse file. | art, aCompCor, FD, DVARS, motion_params-6, motion_params-12 (or other fMRIPrep generated confounds), RTPJ, LTPJ, PC, DMPFC, MMPFC, VMPFC (or other ROI timecourses) (default: art, aCompCor) |
| template | The name (or partial name) of the template the data are resampled to. This is used to select the correct ROI/search space files where relevant or to resample ROI files (using the resample_ROIs.py script). | name (or partial name) of template (default: MNI152NLin2009cAsym_res-02_T1w) |
| search_spaces | List of ROI search spaces to extract timeseries from, separated by commas | list of ROI names (matching file names) or none |
| match_events | Field used by the define_fROIs.py script. To avoid having to modify the config file and re-run the define_fROI script for each search space and contrast of interest, the script will optionally check whether the search space ROI matches the events field. This is useful when the events are contrasts of timecourses (e.g., LTPJ-S2). In this case, you'd likely want to apply the LTPJ search space (and not the other ToM search spaces) to this contrast, and so on for each timecourse contrast specified under the events field. If yes, the define_fROI script will check whether the search space ROI name exists within the event and skip search space ROIs that do not match the event name. If no, the define_fROI script will return fROIs for each search space and contrast specified in the config file. | yes or no |
| top_nvox | Number of top voxels to extract from the ROI search space (if requested) | any number (default: 80) |
| mask | Which mask(s) to apply when extracting either timecourses or stats values from an image, separated by commas. If no ROI masking is desired, specify whole_brain. If the mask image starts with fROI, the script will assume a functional ROI file exists for each subject in the list provided; it will search for the fROI files in the resultsDir specified above. If a group or atlas based ROI is provided (i.e., fROI is not in the name), the script will search within your `<PROJECT_NAME>/files/ROI` folder for the ROI file requested. | a list of ROIs (e.g., whole_brain (no ROI), fROI-LTPJ_splithalf1 (fROI), RTPJ (group/atlas ROI)) |
| extract | Option to specify whether you want mean or voxelwise values returned. If mean is requested, an averaged (across all voxels within the specified mask) timecourse vector will be saved. If voxelwise is requested, a time x voxel array will be saved. | mean or voxelwise |
| nonparametric | Option for the secondlevel modeling script. Whether to use a nonparametric (i.e., permutation-based) approach in the group-level analysis | yes or no |
| npermutations | Option for the secondlevel modeling script. Number of permutations to use. This will only be applied if nonparametric is yes | any number (default: 5000) |
| group_comparison | Option for the secondlevel modeling script. The type of group comparison: within-group or between-group. | within or between |
| group_variables | Option for the secondlevel modeling script. The variables to use in the group analysis. These must be included in the subject-condition file passed to the secondlevel pipeline. Only the variables listed in this field will be included in the model, even if additional variables are included in the subject-condition file. | any variables (e.g., age, gender) |
| est_group_variances | Option for the secondlevel modeling script. Whether variances should be estimated separately for each group. | yes or no (default: no) |
| tfce | Option for the secondlevel modeling script. Whether to threshold outputs using Threshold-Free Cluster Enhancement. | yes or no |
| overwrite | Whether to overwrite results if outputs from a prior analysis are found. If no, a new timestamped output folder will be created alongside the prior outputs. | yes or no |
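
For reference, a filled-in config file excerpt might look like the following. The option names come from the table above, but the values and paths are placeholders rather than recommendations, and the whitespace between the two columns must be a single tab (it may render as spaces here).

```tsv
bidsDir	/EBC/projects/my_project/bids
derivDir	/EBC/projects/my_project/derivatives
task	pixar
sessions	no
multiecho	no
dropvols	0
smoothing	5
hpf	100
splithalf	no
events	mind,body,faces,scenes
contrast	mind-body,faces-scenes
regressors	art,aCompCor
overwrite	no
```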

Saving the file

Configuration files must be saved in the main project directory. This is where the pipeline scripts will search for them, and you will get a file not found error if they are saved elsewhere. Because the config files are always saved in the same location relative to the scripts, you do not need to pass the path to the config file when running the scripts that require one. Only the name of the config file needs to be provided; this is enough for the pipeline script to pick up the correct config file in the event you have multiple saved in your project directory.
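
Conceptually, the name-only lookup amounts to joining the main project directory with the config file name, roughly as sketched below (this is an assumption about the behavior described above, not the pipeline's actual code).

```python
from pathlib import Path

# Hypothetical sketch of the lookup described above; the real scripts may differ.
def resolve_config(project_dir: str, config_name: str) -> Path:
    config_path = Path(project_dir) / config_name  # config files live in the main project directory
    if not config_path.exists():
        raise FileNotFoundError(f"No config file named {config_name} in {project_dir}")
    return config_path
```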