Caching artifact downloads for call graph construction #363

mir-am · 2021-12-13T12:00:28Z

As described in #338, we can optimize the network traffic of the FASTEN pipeline by:
1- Downloads should be cached in a configurable folder.
2- The download logic should be refactored such that it can be mocked for the CallGraphConstructorTest to avoid a download and make the test faster.

The text was updated successfully, but these errors were encountered:

proksch · 2021-12-13T14:44:32Z

We could also try to start a basic web server in the test and download the package from localhost, to avoid introducing any actual internet traffic.

The important thing is that the refactoring will make it possible to cache packages, so we can eliminate downloads in other tests that make use of this class.

proksch · 2022-05-19T21:55:29Z

Is this issue still valid? Now that all Maven coordinates are downloaded to the .m2 folder, it should be straight forward to get access to .jars without downloading them again.

mir-am · 2022-05-20T07:19:36Z

I can still look into this and extend the OPAL plugin to use the local JARs instead of downloading them again.

proksch · 2022-05-20T22:04:34Z

Can you do it, or do you do it?

mir-am · 2022-05-20T22:23:30Z

Can you do it, or do you do it?

@proksch
I will do it tomorrow.

proksch · 2022-05-21T19:16:05Z

One of us is missing the point of the issue.

From my understanding, the problem is (or maybe was) that OPAL did download the jar files for the analysis from the internet, rather than opening it from the local file system. This issue has requested to change the behavior of OPAL accordingly to use a dir (i.e., the .m2 folder), so we do not have downloads on every build. Since then, we have changed the PomAnalyzer logic such that is uses the .m2 folder, so this could be the caching folder that is mentioned in the first post of this PR...

I am not 100% how the proposed PR relates to that? I do not think that we need any kind of logic change in the OPAL plugin itself, what's required is a change in the MavenUtilities that (to the best of my knowledge) right now still download .jars from the internet, rather using the already downloaded file in the .m2 folder or (in case of non-existence) Shrinkwrap for downloading and putting the artifact into the .m2 folder.

proksch · 2022-05-21T19:30:45Z

In particular, I am talking about the various downloadPom methods and httpGetToFile, the implementations are completely brute-force and completely disrespect the cacheable nature of Maven. Instead, these methods should first check whether the file can be found in the local .m2 repo and if not, should use either Shrinkwrap or the Maven library to resolve the pom/jar, because this will actually create the file in the .m2 folder. Once the file exists there, the methods can just return the file pointer.

Looking at the discussion that we had with @michelescarlato and @MagielBruntink, this would also be the right place to implement the download logic for source .jars. Instead of just having parameters for group, artifact, and version and having two methods, one for poms and one for jars, we could reduce this to a single method with a much more general interface, in which we also allow to define the packaging type (e.g., pom, jar, war, ..) and the classifier (e.g., none, javadoc, sources, ...). The mechanism is then the same for all Maven artifacts and it would be one logic to rule them all.

mir-am · 2022-05-21T19:31:55Z

One of us is missing the point of the issue.

From my understanding, the problem is (or maybe was) that OPAL did download the jar files for the analysis from the internet, rather than opening it from the local file system. This issue has requested to change the behavior of OPAL accordingly to use a dir (i.e., the .m2 folder), so we do not have downloads on every build. Since then, we have changed the PomAnalyzer logic such that is uses the .m2 folder, so this could be the caching folder that is mentioned in the first post of this PR...

I am not 100% how the proposed PR relates to that? I do not think that we need any kind of logic change in the OPAL plugin itself, what's required is a change in the MavenUtilities that (to the best of my knowledge) right now still download .jars from the internet, rather using the already downloaded file in the .m2 folder or (in case of non-existence) Shrinkwrap for downloading and putting the artifact into the .m2 folder.

Please let me clarify this.
Before this fix, OPALPlugin downloads JAR files on the fly to generate a call graph. Now, OPALPlugin does this:

Checks if the JAR file of a given Maven coordinate exists in the .m2 folder (created by POMAnalyzer), if yes, it uses the local JAR file to generate a CG.
If the JAR file doesn't exist in the .m2 folder, the plugin uses MavenUtilties to downland the JAR file from Maven central to generate a call graph. This is now a "fallback" method.

That said, I can extend MavenUtilties to also store downloaded JARs in .m2.

proksch · 2022-05-21T19:34:34Z

This case distinction is not a concern of the OPAL plugin, this is the task of the implementation logic of the downloader. I strongly disagree with handling this logic in the OPAL plugin.

Would it make sense to clarify this task in person, before you spend more time on it?

mir-am · 2022-05-21T19:39:14Z

Sure, we can discuss this in person on Monday.

@proksch, also, if I understand correctly, POMAnalyzer already resolves POMs, JARs, and sources for a given Maven coordinate and stores them in .m2. So OPALPlugin re-uses the JAR artifacts from .m2 if available. HOWEVER, it would be good to also extend MavenUtilties to do the same.

proksch · 2022-05-21T20:11:54Z

The PomAnalyzer only resolves .pom and .jar, checks if a source file exists at the expected location, and -if so- includes a URL for it. Part of the downloading logic is indeed implemented in the MavenRepositoryUtils that are part of the PomAnalyzer... it would be a great idea to merge this class with the MavenUtilities and provide a more general version. In contrast to the latter, the MavenRepositoryUtils are thoroughly tested, these tests could be extended to cover the general case. Let's talk about this on Monday.

proksch · 2022-05-21T20:14:22Z

The Resolver class is also relevant here.

mir-am added the enhancement New feature or request label Dec 13, 2021

mir-am self-assigned this May 20, 2022

mir-am mentioned this issue May 21, 2022

Generate Java call graphs from local JARs in OPALPlugin #459

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Caching artifact downloads for call graph construction #363

Caching artifact downloads for call graph construction #363

mir-am commented Dec 13, 2021

proksch commented Dec 13, 2021

proksch commented May 19, 2022

mir-am commented May 20, 2022

proksch commented May 20, 2022

mir-am commented May 20, 2022 •

edited

Loading

proksch commented May 21, 2022

proksch commented May 21, 2022 •

edited

Loading

mir-am commented May 21, 2022 •

edited

Loading

proksch commented May 21, 2022

mir-am commented May 21, 2022

proksch commented May 21, 2022

proksch commented May 21, 2022

Caching artifact downloads for call graph construction #363

Caching artifact downloads for call graph construction #363

Comments

mir-am commented Dec 13, 2021

proksch commented Dec 13, 2021

proksch commented May 19, 2022

mir-am commented May 20, 2022

proksch commented May 20, 2022

mir-am commented May 20, 2022 • edited Loading

proksch commented May 21, 2022

proksch commented May 21, 2022 • edited Loading

mir-am commented May 21, 2022 • edited Loading

proksch commented May 21, 2022

mir-am commented May 21, 2022

proksch commented May 21, 2022

proksch commented May 21, 2022

mir-am commented May 20, 2022 •

edited

Loading

proksch commented May 21, 2022 •

edited

Loading

mir-am commented May 21, 2022 •

edited

Loading