Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Caching artifact downloads for call graph construction #363

Open
mir-am opened this issue Dec 13, 2021 · 12 comments
Open

Caching artifact downloads for call graph construction #363

mir-am opened this issue Dec 13, 2021 · 12 comments
Assignees
Labels
enhancement New feature or request

Comments

@mir-am
Copy link
Contributor

mir-am commented Dec 13, 2021

As described in #338, we can optimize the network traffic of the FASTEN pipeline by:
1- Downloads should be cached in a configurable folder.
2- The download logic should be refactored such that it can be mocked for the CallGraphConstructorTest to avoid a download and make the test faster.

@mir-am mir-am added the enhancement New feature or request label Dec 13, 2021
@proksch
Copy link
Contributor

proksch commented Dec 13, 2021

We could also try to start a basic web server in the test and download the package from localhost, to avoid introducing any actual internet traffic.

The important thing is that the refactoring will make it possible to cache packages, so we can eliminate downloads in other tests that make use of this class.

@proksch
Copy link
Contributor

proksch commented May 19, 2022

Is this issue still valid? Now that all Maven coordinates are downloaded to the .m2 folder, it should be straight forward to get access to .jars without downloading them again.

@mir-am
Copy link
Contributor Author

mir-am commented May 20, 2022

I can still look into this and extend the OPAL plugin to use the local JARs instead of downloading them again.

@proksch
Copy link
Contributor

proksch commented May 20, 2022

Can you do it, or do you do it?

@mir-am mir-am self-assigned this May 20, 2022
@mir-am
Copy link
Contributor Author

mir-am commented May 20, 2022

Can you do it, or do you do it?

@proksch
I will do it tomorrow.

@proksch
Copy link
Contributor

proksch commented May 21, 2022

One of us is missing the point of the issue.

From my understanding, the problem is (or maybe was) that OPAL did download the jar files for the analysis from the internet, rather than opening it from the local file system. This issue has requested to change the behavior of OPAL accordingly to use a dir (i.e., the .m2 folder), so we do not have downloads on every build. Since then, we have changed the PomAnalyzer logic such that is uses the .m2 folder, so this could be the caching folder that is mentioned in the first post of this PR...

I am not 100% how the proposed PR relates to that? I do not think that we need any kind of logic change in the OPAL plugin itself, what's required is a change in the MavenUtilities that (to the best of my knowledge) right now still download .jars from the internet, rather using the already downloaded file in the .m2 folder or (in case of non-existence) Shrinkwrap for downloading and putting the artifact into the .m2 folder.

@proksch
Copy link
Contributor

proksch commented May 21, 2022

In particular, I am talking about the various downloadPom methods and httpGetToFile, the implementations are completely brute-force and completely disrespect the cacheable nature of Maven. Instead, these methods should first check whether the file can be found in the local .m2 repo and if not, should use either Shrinkwrap or the Maven library to resolve the pom/jar, because this will actually create the file in the .m2 folder. Once the file exists there, the methods can just return the file pointer.

Looking at the discussion that we had with @michelescarlato and @MagielBruntink, this would also be the right place to implement the download logic for source .jars. Instead of just having parameters for group, artifact, and version and having two methods, one for poms and one for jars, we could reduce this to a single method with a much more general interface, in which we also allow to define the packaging type (e.g., pom, jar, war, ..) and the classifier (e.g., none, javadoc, sources, ...). The mechanism is then the same for all Maven artifacts and it would be one logic to rule them all.

@mir-am
Copy link
Contributor Author

mir-am commented May 21, 2022

One of us is missing the point of the issue.

From my understanding, the problem is (or maybe was) that OPAL did download the jar files for the analysis from the internet, rather than opening it from the local file system. This issue has requested to change the behavior of OPAL accordingly to use a dir (i.e., the .m2 folder), so we do not have downloads on every build. Since then, we have changed the PomAnalyzer logic such that is uses the .m2 folder, so this could be the caching folder that is mentioned in the first post of this PR...

I am not 100% how the proposed PR relates to that? I do not think that we need any kind of logic change in the OPAL plugin itself, what's required is a change in the MavenUtilities that (to the best of my knowledge) right now still download .jars from the internet, rather using the already downloaded file in the .m2 folder or (in case of non-existence) Shrinkwrap for downloading and putting the artifact into the .m2 folder.

Please let me clarify this.
Before this fix, OPALPlugin downloads JAR files on the fly to generate a call graph. Now, OPALPlugin does this:

  • Checks if the JAR file of a given Maven coordinate exists in the .m2 folder (created by POMAnalyzer), if yes, it uses the local JAR file to generate a CG.
  • If the JAR file doesn't exist in the .m2 folder, the plugin uses MavenUtilties to downland the JAR file from Maven central to generate a call graph. This is now a "fallback" method.

That said, I can extend MavenUtilties to also store downloaded JARs in .m2.

@proksch
Copy link
Contributor

proksch commented May 21, 2022

This case distinction is not a concern of the OPAL plugin, this is the task of the implementation logic of the downloader. I strongly disagree with handling this logic in the OPAL plugin.

Would it make sense to clarify this task in person, before you spend more time on it?

@mir-am
Copy link
Contributor Author

mir-am commented May 21, 2022

Sure, we can discuss this in person on Monday.

@proksch, also, if I understand correctly, POMAnalyzer already resolves POMs, JARs, and sources for a given Maven coordinate and stores them in .m2. So OPALPlugin re-uses the JAR artifacts from .m2 if available. HOWEVER, it would be good to also extend MavenUtilties to do the same.

@proksch
Copy link
Contributor

proksch commented May 21, 2022

The PomAnalyzer only resolves .pom and .jar, checks if a source file exists at the expected location, and -if so- includes a URL for it. Part of the downloading logic is indeed implemented in the MavenRepositoryUtils that are part of the PomAnalyzer... it would be a great idea to merge this class with the MavenUtilities and provide a more general version. In contrast to the latter, the MavenRepositoryUtils are thoroughly tested, these tests could be extended to cover the general case. Let's talk about this on Monday.

@proksch
Copy link
Contributor

proksch commented May 21, 2022

The Resolver class is also relevant here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants