Releases: usc-softarch/arcade_core


12 Nov 04:17
Please note that ARCADE_Core is an experimental tool. If you find any bugs or have difficulty at any point in the execution, please contact me (Marcelo) through the link provided in the

What is included in this release

  • ARCADE_Core.jar: A packaged distribution of ARCADE_Core, including all of its dependencies.
  • A distribution of Mallet, which must be unpacked to be used as a fact extractor for ARC.
  • A directory containing resources used by certain phases of ARC, which must be unpacked in the same directory that ARCADE_Core.jar is run from. It may technically be unpacked elsewhere, but doing so may cause warnings to be raised during execution.
  • code-maat-1.0-SNAPSHOT-standalone.jar: A distribution of Code Maat, to be used as a fact extractor.
  • A distribution of PMD, to be used as a fact extractor.
  • A distribution of Apache Ant, to be used in executing PMD.
  • A distribution of Dependency Finder, to be used as a fact extractor.
  • Two Perl scripts that are executed when analyzing C-based systems. They must be placed in the same directory as ARCADE_Core itself.


To analyze C-based systems, you will need a working Perl installation. To use ARC on Windows, you will also need to set an environment variable MALLET_HOME pointing to the root of the Mallet distribution provided with ARCADE_Core.

Compiling ARCADE

While a binary is provided, ARCADE_Core can easily be recompiled using Maven and JDK 11 or later. To generate a jar, first run `mvn clean` to install ARCADE_Core's local dependencies into your .m2 directory, and then run `mvn package -Dmaven.test.skip=true` to generate the jar inside the target directory.

Known issues

  • The DependencyFinderProcessing smell detector for Interface-based and Change-based smells works only with Java-based systems, because it depends on Dependency Finder, a dependency extractor for Java-based systems. Therefore, all fact extraction components related to this smell detector, namely Code Maat, CodeMaatCleanUp, PMD and DependencyFinder, are not useful when analyzing C-based systems.

  • Due to limitations in the current version of ARCADE_Core, source directories MUST follow a specific naming convention. Version directories must be named <PROJECT_NAME>-<PROJECT_VERSION>, whereas a root directory containing multiple versions should NOT contain the dash character. For example, if the system under analysis is bash and the versions are 1.14.4 and 4.2, then the directory structure should be:

|- bash-1.14.4
|- bash-4.2
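Because a misnamed directory is only discovered once a long run fails, a quick pre-flight check can help. The following is a minimal bash sketch, not part of ARCADE_Core; `check_layout` is a hypothetical helper, and it assumes that a version directory is split at its first dash (so the project name itself should not contain a dash).

```shell
# Hypothetical pre-flight check for ARCADE_Core's directory naming convention.
# Assumption: <PROJECT_NAME>-<PROJECT_VERSION> is split at the FIRST dash.
check_layout() {
  local root="$1" root_name
  root_name="$(basename "$root")"

  # The root directory holding all versions must not contain a dash.
  if [[ "$root_name" == *-* ]]; then
    echo "error: root directory '$root_name' contains a dash" >&2
    return 1
  fi

  local re='^([^-]+)-(.+)$' dir name status=0
  for dir in "$root"/*/; do
    [[ -d "$dir" ]] || continue          # skips the literal glob if no subdirs
    name="$(basename "$dir")"
    if [[ $name =~ $re ]]; then
      echo "project=${BASH_REMATCH[1]} version=${BASH_REMATCH[2]}"
    else
      echo "error: '$name' does not match <PROJECT_NAME>-<PROJECT_VERSION>" >&2
      status=1
    fi
  done
  return "$status"
}
```

For the bash example above, `check_layout /path/to/root` would print one `project=bash version=...` line per version directory and return a non-zero status if any name breaks the convention.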


Fact Extraction

  • CSourceToDepsBuilder: This is the Fact Extractor for C-based systems. It takes the system source as input and outputs a list of dependencies in .RSF format and a serialized FeatureVectors file, which is used as input for later phases. Remember that this component will fail if the two Perl scripts included in this release are not placed in the same directory as ARCADE_Core's executable jar.

java -cp ARCADE_Core.jar edu.usc.softarch.arcade.facts.dependencies.CSourceToDepsBuilder <PATH_TO_SOURCE> <PATH_TO_RSF_OUTPUT> <PATH_TO_FVECS_OUTPUT>

<PATH_TO_SOURCE> : This should be a path pointing to the root of the system under analysis, so that CSourceToDepsBuilder may locate all source files by searching that directory's subtree.

<PATH_TO_RSF_OUTPUT> : This is the path where the output .RSF should be placed. Necessary directories will be created as long as ARCADE_Core has access permissions to the root directory. The path should include the desired filename.

<PATH_TO_FVECS_OUTPUT> : This is the path where the output .JSON should be placed. Necessary directories will be created as long as ARCADE_Core has access permissions to the root directory. The path should include the desired filename.

Note that this component outputs two result files. While they contain equivalent dependency information, they are used by different components of ARCADE: the RSF output is the most widely used and the more human-readable of the two, while the JSON output contains a few extra bits of information needed by the Clusterer component, which comprises ARC, WCA and Limbo.
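Since the RSF output is plain text, it can be sanity-checked with standard tools before moving on to clustering. This is a minimal bash/awk sketch that assumes each RSF line is a whitespace-separated triple such as `depends <source> <target>`; `rsf_summary` and `rsf_fanout` are hypothetical helpers, not part of ARCADE_Core.

```shell
# Hypothetical helpers for inspecting a dependency RSF file.
# Assumption: each line is "<relation> <source-entity> <target-entity>";
# lines with a different number of fields are ignored.

# Print the number of dependency edges and of distinct entities.
rsf_summary() {
  awk '
    NF == 3 { edges++; entities[$2] = 1; entities[$3] = 1 }
    END {
      n = 0
      for (e in entities) n++
      printf "edges=%d entities=%d\n", edges, n
    }
  ' "$1"
}

# Print the N (default 10) entities with the most outgoing dependencies,
# as "<count> <entity>" lines, highest count first.
rsf_fanout() {
  awk 'NF == 3 { c[$2]++ } END { for (s in c) print c[s], s }' "$1" |
    sort -rn | head -n "${2:-10}"
}
```

An empty or near-empty summary is an early sign that the extractor was pointed at the wrong directory.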

  • JavaSourceToDepsBuilder: This is the Fact Extractor for Java systems, and works similarly to CSourceToDepsBuilder.

java -cp ARCADE_Core.jar edu.usc.softarch.arcade.facts.dependencies.JavaSourceToDepsBuilder <PATH_TO_BINARIES> <PATH_TO_RSF_OUTPUT> <PATH_TO_FVECS_OUTPUT> <PACKAGE_PREFIX>

<PATH_TO_BINARIES> : This should be a path pointing to a directory containing the compiled binaries of the system under analysis. Note that providing the root of the system's source tree, as with CSourceToDepsBuilder, will result in empty output.

<PATH_TO_RSF_OUTPUT> : This is the path where the output .RSF should be placed. Necessary directories will be created as long as ARCADE_Core has access permissions to the root directory. The path should include the desired filename.

<PATH_TO_FVECS_OUTPUT> : This is the path where the output .JSON should be placed. Necessary directories will be created as long as ARCADE_Core has access permissions to the root directory. The path should include the desired filename.

<PACKAGE_PREFIX> : A prefix by which to filter the dependencies down to those relevant to the subject system. The fact extractor will omit dependencies from the output unless their package begins with <PACKAGE_PREFIX>. Passing the empty string includes all dependencies in the output, which may improve the results of the clustering phase. To do so, enter "" as the final argument.
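One way to compare filtered and unfiltered results without re-running extraction is to extract once with "" and derive prefix-filtered variants afterwards. The following bash/awk sketch is a hypothetical post-processing step, not what JavaSourceToDepsBuilder does internally: it assumes RSF lines of the form `depends <from> <to>` and keeps an edge only when both endpoints start with the prefix, which may differ from the extractor's own rule.

```shell
# Hypothetical post-processing: filter an unfiltered RSF (extracted with "")
# down to a single package prefix.
# Assumption: lines are "depends <from> <to>", and an edge is kept only when
# BOTH endpoints start with the prefix.
filter_rsf_by_prefix() {
  local rsf="$1" prefix="$2"
  # index(s, p) == 1 is a literal prefix test, so the dots in package names
  # are not treated as regex wildcards. An empty prefix keeps every edge.
  awk -v p="$prefix" '
    NF == 3 && (p == "" || (index($2, p) == 1 && index($3, p) == 1))
  ' "$rsf"
}
```

For example, `filter_rsf_by_prefix deps.rsf org.foo > deps_org_foo.rsf` would keep only edges between classes whose names start with `org.foo`.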

  • MalletRunner: This is a driver that executes Mallet with the settings needed to generate ARC's inputs.

java -cp ARCADE_Core.jar edu.usc.softarch.arcade.topics.MalletRunner <PATH_TO_SOURCE> <SOURCE_LANGUAGE> <MALLET_PATH> <ARTIFACTS_DIR> <STOPWORD_DIR>

<PATH_TO_SOURCE>: This should point to the root directory to be analyzed. That is, either the source directory of a single version, or a super-directory containing the source directories of multiple versions.

<SOURCE_LANGUAGE>: Language of the system under analysis, either c or java. Case-insensitive.

<MALLET_PATH>: Path to the Mallet executable file, either Mallet-202108\bin\mallet.bat for Windows or Mallet-202108\bin\mallet for Linux.

<ARTIFACTS_DIR>: Directory in which to place the output.

<STOPWORD_DIR>: Directory containing javakeywordsexpanded and ckeywordsexpanded. In this distribution, these are located under res, in the resources directory included with this release.
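Because a MalletRunner run can take a long time, it may be worth validating the five arguments before launching it. This is a minimal bash sketch of such a wrapper; `run_mallet_runner`, `ARCADE_JAR` and `DRY_RUN` are hypothetical names introduced here, not part of ARCADE_Core, and the checks only cover what is described above.

```shell
# Hypothetical wrapper that checks MalletRunner's required arguments first.
# ARCADE_JAR (default ARCADE_Core.jar) and DRY_RUN=1 are illustrative only.
run_mallet_runner() {
  if [[ $# -ne 5 ]]; then
    echo "usage: run_mallet_runner <src> <c|java> <mallet> <artifacts> <stopwords>" >&2
    return 2
  fi
  local src="$1" lang="$2" mallet="$3" artifacts="$4" stopwords="$5"

  [[ -d "$src" ]] || { echo "error: no source directory $src" >&2; return 1; }

  # The language argument is case-insensitive; accept only c or java.
  case "$(printf '%s' "$lang" | tr '[:upper:]' '[:lower:]')" in
    c|java) ;;
    *) echo "error: language must be c or java" >&2; return 1 ;;
  esac

  [[ -f "$mallet" ]] || { echo "error: Mallet executable not found: $mallet" >&2; return 1; }

  # Both keyword files must be present in the stopword directory.
  local kw
  for kw in javakeywordsexpanded ckeywordsexpanded; do
    [[ -e "$stopwords/$kw" ]] || { echo "error: $stopwords/$kw missing" >&2; return 1; }
  done

  mkdir -p "$artifacts"   # harmless if it already exists

  local cmd=(java -cp "${ARCADE_JAR:-ARCADE_Core.jar}"
             edu.usc.softarch.arcade.topics.MalletRunner
             "$src" "$lang" "$mallet" "$artifacts" "$stopwords")
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    printf '%s\n' "${cmd[*]}"   # print the command instead of running it
  else
    "${cmd[@]}"
  fi
}
```

Setting `DRY_RUN=1` prints the exact command, which can be useful for checking paths before committing to a full run.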

By default, this driver creates a temporary copy of the source directory containing only the files of interest, and removes that copy once execution is finished. This is necessary because Mallet does not allow the user to specify which file extensions are to be included in the analysis. As a result, running the driver on very large systems may take a while. To potentially speed things up, two optional arguments may be added to the command:

java -cp ARCADE_Core.jar edu.usc.softarch.arcade.topics.MalletRunner <PATH_TO_SOURCE> <SOURCE_LANGUAGE> <MALLET_PATH> <ARTIFACTS_DIR> <STOPWORD_DIR> <COPY_READY> <KEEP_COPY>

<COPY_READY>: Boolean true or false, lower case. Indicates that the copy directory <PATH_TO_SOURCE>_temp already exists. If this argument is set to false or is absent and a directory named <PATH_TO_SOURCE>_temp already exists, execution will fail, to avoid unintentionally overwriting existing files. Such a leftover copy can exist if a previous execution failed for any reason before it had a chance to remove the temporary copy.

<KEEP_COPY>: Boolean true or false, lower case. Indicates that the temporary copy directory should not be removed. Can be useful if the user intends to run multiple Mallet analyses over the same source.
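When the same large source tree is analyzed repeatedly, the temporary copy could be prepared once and then reused via <COPY_READY>. This is a minimal bash sketch of building <PATH_TO_SOURCE>_temp by hand; `make_mallet_copy` is hypothetical, and it assumes the "files of interest" are .c/.h files for C and .java files for Java, which may not match MalletRunner's actual filter.

```shell
# Hypothetical sketch: build <PATH_TO_SOURCE>_temp manually.
# Assumption: files of interest are *.c/*.h (C) or *.java (Java).
make_mallet_copy() {
  local src="${1%/}" lang="$2" dest
  dest="${src}_temp"

  [[ -d "$src" ]] || { echo "error: no source directory $src" >&2; return 1; }
  # Like MalletRunner, refuse to overwrite an existing copy.
  if [[ -e "$dest" ]]; then
    echo "error: $dest already exists" >&2
    return 1
  fi

  local -a patterns
  case "$(printf '%s' "$lang" | tr '[:upper:]' '[:lower:]')" in
    c)    patterns=(-name '*.c' -o -name '*.h') ;;
    java) patterns=(-name '*.java') ;;
    *)    echo "error: language must be c or java" >&2; return 1 ;;
  esac

  mkdir -p "$dest"
  # Copy matching files, recreating their relative directory structure.
  (cd "$src" && find . -type f \( "${patterns[@]}" \) -print0) |
    while IFS= read -r -d '' rel; do
      mkdir -p "$dest/$(dirname "$rel")"
      cp "$src/$rel" "$dest/$rel"
    done
}
```

After preparing the copy this way, MalletRunner could be run with `true` for both <COPY_READY> and <KEEP_COPY>, so the copy is neither rebuilt nor deleted between analyses.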

Lastly, by default, Mallet is run with 50 topics and 250 iterations. These values can be modified by adding two more optional arguments:


  • Code Maat: Code Maat is used to collect additional coupling information used by one of ARCADE's smell detectors. In order to run it, a few preliminary steps are required.

git log --all --numstat --date=short --pretty=format:'--%h--%ad--%aN' --no-renames > <ARTIFACTS_DIR>/project.log

<ARTIFACTS_DIR> : The artifacts directory. Note that project.log does not necessarily need to be placed in the artifacts directory, or indeed be named project.log, but we find that keeping fact extractor results in one place facilitates use of ARCADE.

This command should be run from the root of the system under analysis, so that git can read the system's version history.

sed "s/'//g" <ARTIFACTS_DIR>/project.log > <ARTIFACTS_DIR>/clean_project.log

Likewise, the naming and placement of these files can be changed. This command pre-processes the log file by removing all single quotes. As sed may not be available on Windows, other means of removing single quotes may be used instead, such as a manual find-and-replace in a text editor.
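On systems where a POSIX shell is available, the two steps above can also be combined into one pipeline, avoiding the intermediate project.log. This bash sketch uses `tr -d "'"`, which removes every single quote just like the sed command; `strip_single_quotes` and `make_clean_log` are hypothetical helper names.

```shell
# Remove every single quote from stdin; equivalent to: sed "s/'//g"
strip_single_quotes() {
  tr -d "'"
}

# Hypothetical helper: write the cleaned git log for the repository at $1
# to the file $2, using the same git log options as above.
make_clean_log() {
  local repo="$1" out="$2"
  git -C "$repo" log --all --numstat --date=short \
    --pretty=format:'--%h--%ad--%aN' --no-renames | strip_single_quotes > "$out"
}
```

For example, `make_clean_log /path/to/system <ARTIFACTS_DIR>/clean_project.log` would produce the file that Code Maat reads in the next step.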

Finally, execute the included distribution of Code Maat to obtain the project.csv used by the DependencyFinderProcessing smell detector.

`java -jar code-maat-1.0-SNAPSHOT-standalone.jar -l <ARTIFACTS_DIR>/clean_project.log -c git2 -a coupling > <ARTIFACTS_DIR...

