Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OpenSearch's Unreliable Dependency - A Call for Enhanced API Compatibility #8982

Closed
peternied opened this issue Jul 28, 2023 · 12 comments · Fixed by #12974
Closed

OpenSearch's Unreliable Dependency - A Call for Enhanced API Compatibility #8982

peternied opened this issue Jul 28, 2023 · 12 comments · Fixed by #12974
Labels
bug Something isn't working

Comments

@peternied
Copy link
Member

This issue aims to shed light on a frustrating experience that is negatively impacting contributors, including myself. The intention is not to shame or punish but rather to address the problem constructively and foster a conversation for a potential solution.

Describe the bug

OpenSearch's main does not have any requirements for public API compatibility. This often causes changes on snapshot builds that break compilation of plugins or integration tests of plugins to fail. This issues are difficult to debug and consumer considerable time on its plugins.

Breaking changes are backported in previous major version numbers, these layers make it hard to know where new issues are coming from.

To Reproduce

Maintain a plugin that depends on OpenSearch.

Expected behavior

There are several options:

Keep the Status Quo:

  • OpenSearch core continues to grow with added functionalities.
  • Plugins contribute more functionality into the core codebase to reduce the burden of maintenance.

Strictly Follow SemVer:

  • OpenSearch adheres to Semantic Versioning, ensuring no public API changes without a major version increment.
  • Plugin teams can have operational build pipelines and better control over migration.
  • OpenSearch core contributions are more strictly monitored, limiting the introduction of non-essential breaking changes.

Increase Verification of Pull Requests:

  • OpenSearch pull requests undergo verification against plugins' GitHub Actions checks before merging.
  • Plugin teams integrate test suites that are run before pull requests are merged, enabling them to be considered as a blocking consumer.

Additional context

@peternied peternied added bug Something isn't working untriaged labels Jul 28, 2023
@davidlago
Copy link

davidlago commented Jul 31, 2023

@peternied
Copy link
Member Author

peternied commented Aug 1, 2023

Recent change that impacted all plugins:

@peternied
Copy link
Member Author

@nknize I know you are at work with refactoring components in core [1], the breaking changes being pushed out to plugin teams are painful. Is there a quantification of how far along that work is / for how much time plugin teams will have broken builds due to these changes?

peternied added a commit to peternied/OpenSearch-1 that referenced this issue Aug 1, 2023
Adds a gradle task and github action to check for breaking changes made
to the APIs in server by running comparison against most resent snapshot
build on sonotype maven repository.

Uses japicmp to perform the comparison against the jar files, learn
more https://siom79.github.io/japicmp/

- Related opensearch-project#8982

Signed-off-by: Peter Nied <[email protected]>
@nknize
Copy link
Collaborator

nknize commented Aug 1, 2023

Hey @peternied. I missed this message before posting my comment on the PR to detect compatibility impact.

Is there a quantification of how far along that work is / for how much time plugin teams will have broken builds due to these changes?

I'd like to target Phase 0 of #8110 to be complete no later than 2.11.

Of course that could slip, however, depending on the downstream surface area impacted once module-info.java is added to the core modules. That effort will likely bias towards minimal exports, which means downstreams will be temporarily cut off from some classes they currently extend. We'll take each breakage on a case by case basis and bias towards exporting, but quickly redesigning what needs to be redesigned to properly enforce semver.

In the meantime any mechanisms we can create to surface downstream impact early will help us assist downstreams in quickly opening PRs to change namespaces. Perhaps someone could put together a slackbot to post to a #build channel?

@peternied
Copy link
Member Author

@nknize Its good to hear we have a near-term target for those changes.

Building off of this - do we know if there is a list of planned breaking changes we need before we think we can target a semver safe API?

@peternied
Copy link
Member Author

@nknize Do you know who would be able to help assemble that list of breaking changes that should be expected?

@peternied
Copy link
Member Author

@peternied
Copy link
Member Author

peternied commented Aug 15, 2023

Capturing this proposal on how we could change the consumption model within OpenSearch Project

How does apache foundation handle cases like this?

It's varies by project. I can only speak for Lucene and Solr (and Elasticsearch, and even OpenSearch). Solr (and Elasticsearch, and OpenSearch core) absorbs upstream changes "on demand" the same way we do it here with Lucene. This is what I have been suggesting for a while.

Instead of upstream core OpenSearch repo force feeding per-merge SNAPSHOT artifacts to all downstreams, the downstreams should manually (or on some automated, e.g., weekly, cron period) trigger a snapshot build and update their ${opensearch_version} in build.gradle to their own created sha (e.g., 2.10.0-SNAPSHOT-4373c3b). This would keep downstream builds stable. The unpleasant side affect would depend on how often those snapshots are updated, but that would be under the control of the downstream repo community. It's not perfect (breaking changes never are) but least then the downstream controls their own pain. This happens a lot on Solr, OpenSearch, Elasticsearch when large changes are made to lucene (e.g., JPMS, gradle change from ant, Endianness changes)

From #9367 (comment)

@nknize
Copy link
Collaborator

nknize commented Aug 15, 2023

Follow up comment:

I wonder if we could invent a "toggle" mechanism to allow downstreams to switch between the two approaches at will? Maybe there's a GHA to enable/disable "force fed" snapshots? This could prove useful during large scale refactors to clot the bleeding? Downstreams cut over to the manual snapshot consumption model, and switch back when the change storm has passed?

@dblock
Copy link
Member

dblock commented Aug 24, 2023

@peternied I think there are two major shortcomings with this proposal, all while it does solve the problem for plugins to depend on a more stable core. It actually resembles the ODFE state where plugins depended on a previously released version of core, except that every plugin will be able to depend on some random snapshot build. That did not work well.

  1. In order to produce a working distribution build all plugins must depend on one specific snapshot build of OpenSearch - so how do you see us having any such build during the development cycle? The opensearch-build distribution builds everything from the top, from source. It cannot work without all plugins being at the same snapshot version, which will become very rare.
  2. What will prevent a plugin to simply never integrate with a newer version of core SNAPSHOT until the very last moment? this means that downstream impact of a large change is deferred to the very latest, and closest to release (aka worst) moment. This will likely slip release dates.

Maybe the problem is the breaking changes? Instead of making those with downstream impact that we're feeling, I would have first extracted a public API/SDK, declared that as a public API, and then migrated all plugins to that (similar to what we proposed with extensions, but in core).

@peternied
Copy link
Member Author

Thanks @dblock and @nknize for your thought

...every plugin will be able to depend on some random snapshot build. That did not work well.

That is a significant reason not to engage in this proposal. It would be different if changes were waiting out in 3.0.0 for a significant period, but instead they are moving into the 2.X line as fast as a day later.

Maybe the problem is the breaking changes?

If there were no breaking changes at java's public API level - this wouldn't be an issue.

extracted a public API/SDK, declared that as a public API

This sounds like it would be very useful to signal to plugins what contracts are safe, and which are going to be in flux. Would it be worthwhile to invest in, do we know a number of how many expect breaking changes are left around the refactor - that would help weight investment for myself if it made this difference between avoiding a number of breaking changes.

@peternied
Copy link
Member Author

RE: CompatibilityCheck:
Good news, the compatibility check is using the correct commits for the evaluation, you can confirm by looking at the SHA on the message.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants