Releases · usc-softarch/arcade_core

12 Nov 04:17

v1.2.0

4c822e8

v1.2.0 Latest

DISCLAIMER

Please note that ARCADE_Core is an experimental tool. If you find any bugs or have difficulty at any point in the execution, please contact me (Marcelo) through the link provided in the readme.md.

What is included in this release

ARCADE_Core.jar : A packaged distribution of ARCADE_Core, including all of its dependencies.
Mallet-202108.zip : A distribution of Mallet, which must be unpacked to be used as a fact extractor for ARC.
resources.zip : A directory containing resources used by certain phases of ARC. It should be unpacked in the directory from which ARCADE_Core.jar is run; it can technically be unpacked elsewhere, but this may cause warnings to be raised during execution.
code-maat-1.0-SNAPSHOT-standalone.jar : A distribution of Code Maat, to be used as a fact extractor.
pmd-bin-5.3.2.zip : A distribution of PMD, to be used as a fact extractor.
apache-ant-1.9.6.zip : A distribution of Apache Ant, to be used in executing PMD.
DependencyFinder.zip : A distribution of Dependency Finder, to be used as a fact extractor.
mkdep.pl and mkfiles.pl : Two Perl scripts that are executed when analyzing C-based systems. They must be placed in the same directory as ARCADE_Core itself.

Requirements

For analyzing C-based systems, you will need a working installation of Perl. To use ARC on Windows, you will also need to set the environment variable MALLET_HOME to point to the root of the Mallet distribution provided with ARCADE_Core.
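As a minimal sketch, the MALLET_HOME setting can be sanity-checked before running ARC. This assumes a POSIX shell (for example Git Bash or WSL on Windows) and that the root of the unpacked Mallet distribution contains bin/mallet; the function name check_mallet_home is hypothetical and not part of ARCADE_Core.

```shell
# Hypothetical sanity check: MALLET_HOME must be set and point at the root of
# the unpacked Mallet distribution (assumed to contain bin/mallet).
check_mallet_home() {
    if [ -z "${MALLET_HOME:-}" ]; then
        echo "MALLET_HOME is not set" >&2
        return 1
    fi
    if [ ! -e "$MALLET_HOME/bin/mallet" ]; then
        echo "no bin/mallet under MALLET_HOME=$MALLET_HOME" >&2
        return 1
    fi
}
```

For example, after export MALLET_HOME="$PWD/Mallet-202108", calling check_mallet_home should return successfully if the distribution was unpacked in the current directory.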

Compiling ARCADE

While a binary is provided, re-compiling is easily achieved using Maven and a JDK 11+. To generate a jar, first run the command mvn clean, which installs ARCADE_Core's local dependencies to your .m2 directory, and then run mvn package -Dmaven.test.skip=true to generate the jar inside the target directory.
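The two Maven steps can be wrapped in a small function. This is only a sketch: it assumes mvn is on the PATH and that the current directory is the root of a local checkout of ARCADE_Core. The name build_arcade is hypothetical.

```shell
# Hypothetical wrapper around the two Maven invocations described above.
# Assumes the current directory is the root of the ARCADE_Core checkout.
build_arcade() {
    # Step 1: installs ARCADE_Core's local dependencies to the .m2 directory
    mvn clean || return 1
    # Step 2: builds the jar inside the target directory, skipping tests
    mvn package -Dmaven.test.skip=true || return 1
}
```

Stopping at the first failing step avoids packaging against dependencies that were never installed.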

Known issues

The DependencyFinderProcessing smell detector for Interface-based and Change-based smells only works with Java-based systems. This is due to its dependence on Dependency Finder, a dependency extractor for Java-based systems. Therefore, all fact extraction components related to this smell detector, viz. Code Maat, CleanUpCodeMaat, PMD and DependencyFinder, are not useful when analyzing C-based systems.
Due to limitations in the current version of ARCADE_Core, source directories MUST follow a specific naming convention. Version directories must be named <PROJECT_NAME>-<PROJECT_VERSION>, whereas the name of a root directory containing multiple versions must NOT contain the dash character. For example, if the system under analysis is bash and the versions are 1.14.4 and 4.2, then the directory structure should be:

bash
|- bash-1.14.4
|- bash-4.2
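
The naming convention above can be checked before running any analysis. The following is a minimal sketch, not part of ARCADE_Core: the function name check_layout is hypothetical, and it only verifies that the root directory name has no dash and that every sub-directory name has the form name-version.

```shell
# Hypothetical helper: returns 0 if the layout follows the convention above.
# $1: root directory containing one sub-directory per version
check_layout() {
    root=${1%/}
    case "$(basename "$root")" in
        *-*) echo "root directory name contains a dash: $root" >&2; return 1 ;;
    esac
    rc=0
    for dir in "$root"/*/; do
        [ -d "$dir" ] || continue
        name=$(basename "$dir")
        case "$name" in
            ?*-?*) ;;  # looks like <PROJECT_NAME>-<PROJECT_VERSION>
            *) echo "unexpected version directory name: $name" >&2; rc=1 ;;
        esac
    done
    return $rc
}
```

With the bash example above, check_layout bash succeeds, while a root named my-project or a version directory named 4.2 is reported on standard error.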

Functionalities

Fact Extraction

CSourceToDepsBuilder: This is the Fact Extractor for C-based systems. It takes the system source as input and outputs a list of dependencies in .RSF format and a serialized FeatureVectors file, which is used as input for later phases. Remember that this component will fail if mkdep.pl and mkfiles.pl (included in this release) are not placed in the same directory as ARCADE_Core's executable jar.

java -cp ARCADE_Core.jar edu.usc.softarch.arcade.facts.dependencies.CSourceToDepsBuilder <PATH_TO_SOURCE> <PATH_TO_RSF_OUTPUT> <PATH_TO_FVECS_OUTPUT>

<PATH_TO_SOURCE> : This should be a path pointing to the root of the system under analysis, so that CSourceToDepsBuilder may locate all source files by searching that directory's subtree.

<PATH_TO_RSF_OUTPUT> : This is the path where the output .RSF should be placed. Necessary directories will be created as long as ARCADE_Core has access permissions to the root directory. The path should include the desired filename.

<PATH_TO_FVECS_OUTPUT> : This is the path where the output .JSON should be placed. Necessary directories will be created as long as ARCADE_Core has access permissions to the root directory. The path should include the desired filename.

Note that this component outputs two result files. While largely equivalent, they are used by different components of ARCADE: the RSF output is the more widely used and the more human-readable of the two; the JSON output contains some additional information needed by the Clusterer component, which includes ARC, WCA and Limbo.
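
Since CSourceToDepsBuilder fails when the Perl scripts are not next to the jar, a pre-flight check can catch the problem early. This is a minimal sketch under that assumption; check_c_prereqs is a hypothetical name and not part of ARCADE_Core.

```shell
# Hypothetical pre-flight check for CSourceToDepsBuilder.
# $1: directory containing ARCADE_Core.jar (defaults to the current directory)
check_c_prereqs() {
    jar_dir=${1:-.}
    rc=0
    for f in ARCADE_Core.jar mkdep.pl mkfiles.pl; do
        if [ ! -f "$jar_dir/$f" ]; then
            echo "missing: $jar_dir/$f" >&2
            rc=1
        fi
    done
    # Perl is required to run the two scripts; only warn if it is not on the PATH.
    command -v perl >/dev/null 2>&1 || echo "warning: perl not found on PATH" >&2
    return $rc
}
```

A typical use would be to run check_c_prereqs . and invoke the CSourceToDepsBuilder command only if it succeeds.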

JavaSourceToDepsBuilder: This is the Fact Extractor for Java systems, and works similarly to CSourceToDepsBuilder.

java -cp ARCADE_Core.jar edu.usc.softarch.arcade.facts.dependencies.JavaSourceToDepsBuilder <PATH_TO_BINARIES> <PATH_TO_RSF_OUTPUT> <PATH_TO_FVECS_OUTPUT> <PACKAGE_PREFIX>

<PATH_TO_BINARIES> : This should be a path pointing to a directory containing the compiled binaries of the system under analysis. Do note that providing the root of the system's source, as in CSourceToDepsBuilder, will result in an empty output.

<PACKAGE_PREFIX> : A prefix by which to filter the dependencies to only those relevant to the subject system. The fact extractor will omit dependencies from the output unless their package begins with <PACKAGE_PREFIX>. Using the empty string as an argument includes all results in the output, and may improve the results of the clustering phase. To do so, input "" as the final argument.
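
When the package prefix is held in a shell variable, the empty string only reaches the command if the variable is quoted. The sketch below illustrates this with a hypothetical helper, count_args, which simply reports how many arguments a command would receive; the file names are placeholders.

```shell
# Hypothetical helper: prints the number of arguments it was called with.
count_args() { echo "$#"; }

PREFIX=""   # empty prefix: keep all dependencies in the output

count_args bin/ deps.rsf fvecs.json "$PREFIX"   # prints 4: the empty prefix is passed
count_args bin/ deps.rsf fvecs.json $PREFIX     # prints 3: unquoted, the empty prefix vanishes
```

The same applies to the JavaSourceToDepsBuilder command line: write "$PREFIX" (or a literal "") as the final argument rather than an unquoted variable.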

MalletRunner: This is a driver for executing Mallet correctly for generating ARC's inputs.

java -cp ARCADE_Core.jar edu.usc.softarch.arcade.topics.MalletRunner <PATH_TO_SOURCE> <SOURCE_LANGUAGE> <MALLET_PATH> <ARTIFACTS_DIR> <STOPWORD_DIR>

<PATH_TO_SOURCE>: This should point to the root directory to be analyzed. That is, either the source directory for a single version, or a super-directory containing the source directories of multiple versions.

<SOURCE_LANGUAGE>: Language of the system under analysis, either c or java. Case-insensitive.

<MALLET_PATH>: Path to the Mallet executable file, either Mallet-202108\bin\mallet.bat on Windows or Mallet-202108/bin/mallet on Linux.

<ARTIFACTS_DIR>: Directory in which to place the output.

<STOPWORD_DIR>: Directory containing javakeywordsexpanded and ckeywordsexpanded. In this distribution, these are contained in the resources.zip, under res.

By default, this driver will execute by creating a temporary copy of the source directory containing only the files of interest, and then removing that copy once execution is finished. This is due to a limitation of Mallet which does not allow the user to specify which file extensions are to be included in the analysis. For that reason, running the driver on very large systems may take a while. In order to potentially speed things up, two optional arguments may be added to the command:

java -cp ARCADE_Core.jar edu.usc.softarch.arcade.topics.MalletRunner <PATH_TO_SOURCE> <SOURCE_LANGUAGE> <MALLET_PATH> <ARTIFACTS_DIR> <STOPWORD_DIR> <COPY_READY> <KEEP_COPY>

<COPY_READY>: Boolean true or false, lower case. Indicates that the copy directory <PATH_TO_SOURCE>_temp already exists. If this argument is false or absent and a directory named <PATH_TO_SOURCE>_temp already exists, execution will fail, to avoid unwanted overwriting of existing files. Such a leftover directory can exist if a previous execution failed before it had a chance to remove the temporary copy.

<KEEP_COPY>: Boolean true or false, lower case. Indicates that the temporary copy directory should not be removed. Can be useful if the user intends to run multiple Mallet analyses over the same source.

Lastly, by default, the execution is set to run for 50 topics and 250 iterations. This can be modified by adding another two optional arguments:

java -cp ARCADE_Core.jar edu.usc.softarch.arcade.topics.MalletRunner <PATH_TO_SOURCE> <SOURCE_LANGUAGE> <MALLET_PATH> <ARTIFACTS_DIR> <STOPWORD_DIR> <COPY_READY> <KEEP_COPY> <NUM_TOPICS> <NUM_ITERATIONS>
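
Because the optional arguments appear to be positional, changing the topic or iteration count means also supplying the two booleans before them. The following sketch fills in the documented defaults so that only the leading arguments need to be given. The function name run_mallet_runner is hypothetical, and the positional behaviour is an assumption based on the command shown above.

```shell
# Hypothetical wrapper: supplies the documented defaults for the trailing
# optional arguments of MalletRunner.
run_mallet_runner() {
    src=$1; lang=$2; mallet=$3; artifacts=$4; stopwords=$5
    copy_ready=${6:-false}      # an absent value is treated like false
    keep_copy=${7:-false}       # by default the temporary copy is removed
    num_topics=${8:-50}         # documented default
    num_iterations=${9:-250}    # documented default
    java -cp ARCADE_Core.jar edu.usc.softarch.arcade.topics.MalletRunner \
        "$src" "$lang" "$mallet" "$artifacts" "$stopwords" \
        "$copy_ready" "$keep_copy" "$num_topics" "$num_iterations"
}
```

For instance, run_mallet_runner bash c Mallet-202108/bin/mallet artifacts res false false 100 would request 100 topics while keeping the default 250 iterations (all paths here are placeholders).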

Code Maat: Code Maat is used to collect additional coupling information used by one of ARCADE's smell detectors. In order to run it, a few preliminary steps are required.

git log --all --numstat --date=short --pretty=format:'--%h--%ad--%aN' --no-renames > <ARTIFACTS_DIR>/project.log

<ARTIFACTS_DIR> : The artifacts directory. Note that project.log does not necessarily need to be placed in the artifacts directory, or indeed be named project.log, but we find that keeping fact extractor results in one place facilitates use of ARCADE.

This command should be run from the root of the system under analysis, so that the version log is obtained from that system's git repository.

sed "s/'//g" <ARTIFACTS_DIR>/project.log > <ARTIFACTS_DIR>/clean_project.log

Likewise, the naming and placement of these files can be changed. This command pre-processes the log file by removing all single quotes. As sed may not be available on Windows, other means of removing single quotes may be used instead, such as a manual replacement using a text editor.
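
To make the effect of the pre-processing step concrete, the sketch below wraps the sed command in a hypothetical function, clean_log. The sample log line in the comment is invented for illustration only.

```shell
# Strip every single quote from a Git log, as the sed command above does.
# $1: raw log file, $2: cleaned log file (both names are up to the user)
clean_log() {
    sed "s/'//g" "$1" > "$2"
}

# Illustrative effect on one (made-up) header line:
#   '--abc1234--2021-01-01--Jane Doe'   becomes   --abc1234--2021-01-01--Jane Doe
```

Where sed is unavailable but tr is, the command tr -d "'" < project.log > clean_project.log removes the same characters.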

Finally, execute the included distribution of Code Maat to obtain the project.csv used by the DependencyFinderProcessing smell detector.

java -jar code-maat-1.0-SNAPSHOT-standalone.jar -l <ARTIFACTS_DIR>/clean_project.log -c git2 -a coupling > <ARTIFACTS_DIR...

07 Jun 00:12

MarceloLaser

v1.1.0

6256853

v1.1.0

What is included in this release

ARCADE_Core.jar : A packaged distribution of ARCADE_Core, including all of its dependencies.
mallet-2.0.7.zip : A modified distribution of Mallet 2.0.7, which must be unpacked to be used as a fact extractor for ARC.
resources.zip : A directory containing resources used by certain phases of ARC. It should be unpacked in the directory from which ARCADE_Core.jar is run; it can technically be unpacked elsewhere, but this may cause warnings to be raised during execution.
code-maat-1.0-SNAPSHOT-standalone.jar : A distribution of Code Maat, to be used as a fact extractor.
pmd-bin-5.3.2.zip : A distribution of PMD, to be used as a fact extractor.
apache-ant-1.9.6.zip : A distribution of Apache Ant, to be used in executing PMD.
DependencyFinder.zip : A distribution of Dependency Finder, to be used as a fact extractor.
mkdep.pl and mkfiles.pl : Two Perl scripts that are executed when analyzing C-based systems. They must be placed in the same directory as ARCADE_Core itself.

Requirements

Compiling ARCADE

Known issues

The DependencyFinderProcessing smell detector for Interface-based and Change-based smells only works with Java-based systems. This is due to its dependence on Dependency Finder, a dependency extractor for Java-based systems. Therefore, all fact extraction components related to this smell detector, viz. Code Maat, CleanUpCodeMaat, PMD and DependencyFinder, are not useful when analyzing C-based systems.

Functionalities

Fact Extraction

CSourceToDepsBuilder: This is the Fact Extractor for C-based systems. It takes the system source as input and outputs a list of dependencies in .RSF format and a serialized FeatureVectors file, which is used as input for later phases. Remember that this component will fail if mkdep.pl and mkfiles.pl (included in this release) are not placed in the same directory as ARCADE_Core's executable jar.

java -cp ARCADE_Core.jar edu.usc.softarch.arcade.facts.driver.CSourceToDepsBuilder <PATH_TO_SOURCE> <PATH_TO_RSF_OUTPUT> <PATH_TO_FVECS_OUTPUT>

<PATH_TO_SOURCE> : This should be a path pointing to the root of the system under analysis, so that CSourceToDepsBuilder may locate all source files by searching that directory's subtree.

JavaSourceToDepsBuilder: This is the Fact Extractor for Java systems, and works similarly to CSourceToDepsBuilder.

java -cp ARCADE_Core.jar edu.usc.softarch.arcade.facts.driver.JavaSourceToDepsBuilder <PATH_TO_BINARIES> <PATH_TO_RSF_OUTPUT> <PATH_TO_FVECS_OUTPUT> <PACKAGE_PREFIX>

PipeExtractor: This is an auxiliary Fact Extractor that produces one of the input artifacts for ARC. It takes the system language and a path to the system's root source directory as input, and provides a binary file as output.

java -cp ARCADE_Core.jar edu.usc.softarch.arcade.util.ldasupport.PipeExtractor <PATH_TO_SOURCE> <ARTIFACTS_DIR> <LANGUAGE> <PATH_TO_RESOURCES>

<PATH_TO_SOURCE> : This should be a path pointing to the root of the system under analysis, so that PipeExtractor may locate all source files by searching that directory's subtree.

<ARTIFACTS_DIR> : This is the path where the output should be placed. Necessary directories will be created as long as ARCADE_Core has access permissions to the root directory. This path should not include a filename: the output file will be created as output.pipe, and should not be renamed.

<LANGUAGE> : The language of the system under analysis. Supported languages are c and java.

<PATH_TO_RESOURCES> : Path to the resources directory included in this distribution.

Note that the directory used for the output, referred to as the artifacts directory, will be used as an input for ARC and therefore its structure is more or less hardcoded. Therefore, no files generated inside the artifacts directory should be renamed.

Mallet: The distribution included in this release is slightly modified to guarantee determinism, and should therefore not be replaced by a newer one. Note that these commands, as written here, may throw errors when executing from Windows. If this happens, try using mallet-2.0.7/bin/mallet.bat instead of mallet-2.0.7/bin/mallet. Two commands are required as fact extractors to be used by ARC:

mallet-2.0.7/bin/mallet import-dir --input <PATH_TO_SOURCE> --remove-stopwords TRUE --keep-sequence TRUE --output <ARTIFACTS_DIR>/topicmodel.data

<PATH_TO_SOURCE> : This should be a path pointing to the root of the system under analysis, so that Mallet may locate all source files by searching that directory's subtree.

<ARTIFACTS_DIR> : This is the path where the output should be placed. Necessary directories will be created as long as ARCADE_Core has access permissions to the root directory. Note that the path should include the filename topicmodel.data, which should not be renamed.

This first step will generate the topicmodel required by the next command.

mallet-2.0.7/bin/mallet train-topics --input <ARTIFACTS_DIR>/topicmodel.data --inferencer-filename <ARTIFACTS_DIR>/infer.mallet --num-top-words 50 --num-topics 100 --num-threads 3 --num-iterations 100 --doc-topics-threshold 0.1

<ARTIFACTS_DIR> : The artifacts directory. Note that this is used both to point to the command's input (topicmodel.data) and output (infer.mallet). The output should not be renamed, as ARC will expect it to be named infer.mallet.
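
The two Mallet steps can be chained so that the second only runs if the first succeeds. This is a sketch, assuming the distribution was unpacked as mallet-2.0.7 in the current directory; the MALLET variable and the function name run_mallet are hypothetical conveniences, while the flags are copied from the commands above.

```shell
# Path to the Mallet launcher; set MALLET to mallet-2.0.7/bin/mallet.bat on
# Windows if the default throws errors.
MALLET=${MALLET:-mallet-2.0.7/bin/mallet}

# Hypothetical wrapper around the two Mallet steps above.
# $1: root of the system under analysis, $2: artifacts directory
run_mallet() {
    src=$1; artifacts=$2
    "$MALLET" import-dir --input "$src" --remove-stopwords TRUE \
        --keep-sequence TRUE --output "$artifacts/topicmodel.data" || return 1
    "$MALLET" train-topics --input "$artifacts/topicmodel.data" \
        --inferencer-filename "$artifacts/infer.mallet" \
        --num-top-words 50 --num-topics 100 --num-threads 3 \
        --num-iterations 100 --doc-topics-threshold 0.1
}
```

The output names topicmodel.data and infer.mallet are fixed inside the function because ARC expects exactly those names in the artifacts directory.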

Code Maat: Code Maat is used to collect additional coupling information used by one of ARCADE's smell detectors. In order to run it, a few preliminary steps are required.

git log --all --numstat --date=short --pretty=format:'--%h--%ad--%aN' --no-renames > <ARTIFACTS_DIR>/project.log

This command should be run from the root of the system under analysis, so that the version log is obtained from that system's git repository.

sed "s/'//g" <ARTIFACTS_DIR>/project.log > <ARTIFACTS_DIR>/clean_project.log

Finally, execute the included distribution of Code Maat to obtain the project.csv used by the DependencyFinderProcessing smell detector.

java -jar code-maat-1.0-SNAPSHOT-standalone.jar -l <ARTIFACTS_DIR>/clean_project.log -c git2 -a coupling > <ARTIFACTS_DIR>/project.csv
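
The three Code Maat steps above can be run as one unit. The sketch below assumes git and java are on the PATH, that it is invoked from the root of the system under analysis, and that the artifacts directory already exists; the function name run_code_maat is hypothetical.

```shell
# Hypothetical wrapper chaining the three Code Maat steps above.
# $1: artifacts directory (must exist), $2: path to the Code Maat jar (optional)
run_code_maat() {
    artifacts=$1
    jar=${2:-code-maat-1.0-SNAPSHOT-standalone.jar}
    # Step 1: export the version log from git
    git log --all --numstat --date=short \
        --pretty=format:'--%h--%ad--%aN' --no-renames \
        > "$artifacts/project.log" || return 1
    # Step 2: remove all single quotes from the log
    sed "s/'//g" "$artifacts/project.log" > "$artifacts/clean_project.log" || return 1
    # Step 3: compute the coupling information used by DependencyFinderProcessing
    java -jar "$jar" -l "$artifacts/clean_project.log" -c git2 -a coupling \
        > "$artifacts/project.csv"
}
```

Each step stops the chain on failure, so a missing git repository does not produce an empty project.csv that looks like a valid result.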

CleanUpCodeMaat: This component modifies the results of Code Maat's execution for use with the DependencyFinderProcessing smell detector. It is only required when analyzing Java projects.

java -cp ARCADE_Core.jar logical_coupling.cleanUpCodeMaat <ARTIFACTS_DIR>

<ARTIFACTS_DIR> : The artifacts directory. Note that no filename should be included: CleanUpCodeMaat is set to execute over any .csv f...

13 May 09:46

MarceloLaser

v1.0.0

6cf5998

v1.0.0

ARCADE_Core v1.0.0

This is the first truly stable version of ARCADE_Core. All core functionalities from ARCADE are present and have been fully tested. In order to use this release, you will require Maven and a JDK 11+. For analyzing C-based systems, you will also require a working installation of Perl. To generate a jar with the desired functionality, modify lines 264 and 306 of the pom.xml to point to the desired entry point class. Then run the command mvn clean package -Dmaven.test.skip=true to generate the jar inside the target directory. ARCADE_Core is also available as a Maven package in this repository, to be used as a library. The following functionalities are available:

Fact Extraction

CSourceToDepsBuilder: This is the Fact Extractor for C-based systems. It takes the system source as input and outputs a list of dependencies in .RSF format and a serialized FeatureVectors file, which is used as input for later phases.

java -jar CSourceToDepsBuilder.jar <PATH_TO_SOURCE> <PATH_TO_RSF_OUTPUT> <PATH_TO_FVECS_OUTPUT>

JavaSourceToDepsBuilder: This is the Fact Extractor for Java systems, and works similarly to CSourceToDepsBuilder. It takes an additional input: a package prefix used to filter the dependencies to only those relevant to the subject system. The package prefix is optional; omitting it results in a dependencies list that includes external libraries and native Java packages, which may improve the results of the clustering phase.
PipeExtractor: This is an auxiliary Fact Extractor that produces one of the input artifacts for ARC. It takes the system language and a path to the system's root source directory as input, and provides a binary file as output.

java -jar PipeExtractor.jar <PATH_TO_SOURCE> <PATH_TO_OUTPUT> <LANGUAGE>

CleanUpCodeMaat: This is an auxiliary Fact Extractor that filters the results of Code Maat's execution for use with the DependencyFinderProcessing smell detector. Execution of Code Maat is described in the ARCADE manual, Section 5.1.1.1.

Clustering

ACDC: This is the Algorithm for Comprehension-Driven Clustering, designed and developed by Tzerpos and Holt. The version distributed with ARCADE is modified to conform with modern Java practices, but is otherwise functionally equivalent. It takes a dependencies.rsf file as input and provides a clusters.rsf architecture file as output.

java -jar ACDC.jar <PATH_TO_DEPS_RSF> <PATH_TO_CLUSTERS_OUTPUT>

ClusteringAlgoRunner: This is a shared entry point between ARC (Architecture Recovery with Concerns, Garcia et al.), WCA (Weighted Clustering Algorithm, Maqbool and Babri) and Limbo (scaLable InforMation BOttleneck, Andritsos and Tzerpos).

java -jar ClusteringAlgoRunner.jar <CLUSTERING_ALGORITHM> <LANGUAGE> <FVECS_PATH> <STOPPING_CRITERION_TYPE> <STOPPING_CRITERION_VALUE> <SIMILARITY_MEASURE> <SERIALIZATION_CRITERION> <SERIALIZATION_THRESHOLD> <SUBJECT_SYSTEM_NAME> <OUTPUT_PATH> <PACKAGE_PREFIX> <SYSTEM_ROOT> <AUXILIARY_ARTIFACTS>

<CLUSTERING_ALGORITHM>: This is the desired clustering algorithm. Options are "arc", "limbo" and "wca".
<LANGUAGE>: The language of the subject system. Currently supported languages are Java and C. Note that you must select the same language in both clustering and fact extraction.
<FVECS_PATH>: The path to a FeatureVectors file generated by a Fact Extractor.
<STOPPING_CRITERION_TYPE>: The criterion to be used to stop the clustering process. Currently, the only stopping criterion supported is "preselected", meaning the desired number of clusters is provided.
<STOPPING_CRITERION_VALUE>: The value to be used by the stopping criterion. As "preselected" is the only one supported, this is the number of clusters at which the process will stop.
<SIMILARITY_MEASURE>: This is the measure which will be used by the clustering algorithm. ARC currently only supports "js", which applies the Jensen-Shannon divergence. Limbo is designed to use the "il" measure, which stands for Info Loss. WCA may use either "uem" or "uemnm", which stand for Unbiased Ellenberg Measure and Unbiased Ellenberg Measure-NM.
<SERIALIZATION_CRITERION>: This is a criterion for how often to serialize the results of the clustering process. Currently supported are "archsize", "archsizemod" and "stepcount". "archsize" simply provides a specific size at which to serialize the results (note that if archsize < stopping_criterion_value, the results will not be serialized). "archsizemod" provides a modulo to be applied to the architecture size, such that it will be serialized whenever architecture_size % value == 0. "stepcount" is the inverse of "archsizemod", providing a value such that every n clustering steps, the result is serialized.
<SERIALIZATION_THRESHOLD>: This is the value to be used by the serialization criterion. Each criterion will utilize this value differently, see above.
<SUBJECT_SYSTEM_NAME>: This is a name to be used by serialization to identify the output files.
<OUTPUT_PATH>: This is a path to a directory in which to place the output files.
<PACKAGE_PREFIX>: This is used for clustering Java systems, and works the same way as in JavaSourceToDepsBuilder. Note that providing a package prefix in this phase is highly encouraged, though providing an empty string ("") will cause the selected clustering algorithm to execute over the entire system space. However, omitting the package prefix will cause results of Limbo and WCA to be entirely useless, and results of ARC to be less than ideal.
<SYSTEM_ROOT>: This argument is only required by ARC, and indicates the root directory of the subject system. This argument is optional for Limbo and WCA.
<AUXILIARY_ARTIFACTS>: This is the path to a directory containing the necessary auxiliary input artifacts to execute ARC. More information on this can be found in the ARCADE manual, Section 3.3.2. Note that this argument is also optional for Limbo and WCA.
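
With thirteen positional arguments, a mismatched algorithm and similarity measure is an easy mistake to make. The sketch below checks the pairings listed above before launching a run. The function name valid_measure is hypothetical, and treating the listed pairings as the only valid ones is an assumption (the text only says Limbo is designed for "il").

```shell
# Hypothetical check that a similarity measure matches the chosen algorithm,
# following the pairings listed above.
# $1: clustering algorithm, $2: similarity measure
valid_measure() {
    case "$1:$2" in
        arc:js|limbo:il|wca:uem|wca:uemnm) return 0 ;;
        *) echo "measure '$2' is not listed for algorithm '$1'" >&2; return 1 ;;
    esac
}
```

For example, valid_measure wca uemnm succeeds, while valid_measure arc il prints a message to standard error and returns a non-zero status.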

Smell Detection

ArchSmellDetector: This is the smell detector for Dependency-Based and Concern-Based smells. It takes six arguments: five inputs and one output.

java -jar ArchSmellDetector.jar <DEPS_FILE_PATH> <CLUSTERS_FILE_PATH> <OUTPUT_FILE_PATH> <LANGUAGE> <DOC_TOPICS_PATH> <IS_ARC>

<DEPS_FILE_PATH>: This is a path to a deps.rsf file as generated by a Fact Extractor.
<CLUSTERS_FILE_PATH>: This is a path to a cluster.rsf file as generated by a Clustering Algorithm.
<OUTPUT_FILE_PATH>: This is self-explanatory.
<LANGUAGE>: The language of the subject system. As before, supported languages are "java" and "c", and the language selected here should match the language selected in previous phases.
<DOC_TOPICS_PATH>: This is an optional argument to a doc_topics.json file. This is an additional output file generated by ARC, and is therefore only required when running smell detection over the results of ARC.
<IS_ARC>: This is a boolean input and should be provided as either "true" or "false". Smell detection over the results of ARC is special in that ARC provides concern-based information which enables the detection of concern-based smells. When running ArchSmellDetector over the results of ACDC, WCA or Limbo, only dependency-based smells will be detected.

DependencyFinderProcessing: This is the smell detector for Interface-Based and Change-Based smells. It also takes six arguments, though they differ from those of ArchSmellDetector. Note that since this smell detector includes change-based smells, it should be run over the clustering results of multiple versions of a subject system.

java -jar DependencyFinderProcessing.jar <CLUSTERS_DIRECTORY> <DEPENDENCIES_DIRECTORY> <CLONES_DIRECTORY> <CODE_MAAT_RESULTS> <PACKAGE_PREFIX> <OUTPUT_PATH>

<CLUSTERS_DIRECTORY>: This is a directory containing the cluster.rsf results of each version being analyzed.
<DEPENDENCIES_DIRECTORY>: This is a directory containing the outputs of running DependencyFinder on each version being analyzed.
<CLONES_DIRECTORY>: This is a directory containing the outputs of running PMD on each version being analyzed. PMD may be executed as follows:

apache-ant\bin\ant.bat -f pmd-bin\cpd.xml cpd -Din=<PATH_TO_VERSION_ROOT> -Dout=<PATH_TO_OUTPUT>.xml

Versions of both Apache Ant (1.9.6) and PMD (5.3.2) are provided in the repository.
<CODE_MAAT_RESULTS>: This is the file containing the cleaned results from Code Maat, obtained from running the cleanUpCodeMaat Fact Extractor.
<PACKAGE_PREFIX>: As before, this is an optional parameter for Java systems. When analyzing C-based systems, input the empty string ("").
<OUTPUT_PATH>: Self-explanatory.
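
Since the clones directory needs one PMD output per version, the PMD command above can be run in a loop. This is a sketch under several assumptions: a POSIX shell, apache-ant/bin/ant as the Unix counterpart of apache-ant\bin\ant.bat, and output files named after each version directory (the naming that DependencyFinderProcessing expects is not stated above). The ANT variable and the function name run_cpd_all are hypothetical.

```shell
# Ant launcher; assumed Unix counterpart of apache-ant\bin\ant.bat.
ANT=${ANT:-apache-ant/bin/ant}

# Hypothetical loop running PMD's copy-paste detector once per version.
# $1: root directory holding one sub-directory per version
# $2: clones directory that receives one XML file per version
run_cpd_all() {
    root=${1%/}; clones=${2%/}
    mkdir -p "$clones" || return 1
    for dir in "$root"/*/; do
        [ -d "$dir" ] || continue
        version=$(basename "$dir")
        "$ANT" -f pmd-bin/cpd.xml cpd -Din="${dir%/}" \
            -Dout="$clones/$version.xml" || return 1
    done
}
```

With the bash layout used as an example in later releases, run_cpd_all bash clones would write clones/bash-1.14.4.xml and clones/bash-4.2.xml.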

Future

ARCADE_Core has been a massive project of refactoring, testing, debugging, and overall rehabilitation of a primarily academic tool. There are still significant components that have yet to be re-integrated into this new version, and there are new functionalities currently being developed.

Metrics Components: The metrics components for calculating MoJo, A2A and a suite of Decay metrics are all still functional and present in ARCADE_Core, but compared to the other components which have already been re-integrated, their understandability is less than optimal. Furthermore, CVG has yet to be integrated into ARCADE_Core at all.
GUI: A web-based GUI for ARCADE_Core is currently under development, and should significantly improve its usability. While I have striven to make the CLI entry points as clear as possible, any tool that depends on a CLI is inherently limited.
Stopping Criteria...
