Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix ABI compatibiity errors between abseil-cpp and dependent packages. #10003

Conversation

surfacepatterns
Copy link
Contributor

@surfacepatterns surfacepatterns commented Aug 1, 2024

Merge Checklist

All boxes should be checked before merging the PR (just tick any boxes which don't apply to this PR)

  • The toolchain has been rebuilt successfully (or no changes were made to it)
  • The toolchain/worker package manifests are up-to-date
  • Any updated packages successfully build (or no packages were changed)
  • Packages depending on static components modified in this PR (Golang, *-static subpackages, etc.) have had their Release tag incremented.
  • Package tests (%check section) have been verified with RUN_CHECK=y for existing SPEC files, or added to new SPEC files
  • All package sources are available
  • cgmanifest files are up-to-date and sorted (./cgmanifest.json, ./toolkit/scripts/toolchain/cgmanifest.json, .github/workflows/cgmanifest.json)
  • LICENSE-MAP files are up-to-date (./SPECS/LICENSES-AND-NOTICES/data/licenses.json, ./SPECS/LICENSES-AND-NOTICES/LICENSES-MAP.md, ./SPECS/LICENSES-AND-NOTICES/LICENSE-EXCEPTIONS.PHOTON)
  • All source files have up-to-date hashes in the *.signatures.json files
  • sudo make go-tidy-all and sudo make go-test-coverage pass
  • Documentation has been updated to match any changes to the build system
  • If you are adding/removing a .spec file that has multiple-versions supported, please add @microsoft/cbl-mariner-multi-package-reviewers team as reviewer (Eg. golang has 2 versions 1.18, 1.21+)
  • Ready to merge

Summary

The current abseil-cpp package is built with CMAKE_BUILD_TYPE set to None. This results in the package being built without NDEBUG being defined. This creates ABI incompatibilities between abseil-cpp and packages that depend on abseil-cpp that are compiled with NDEBUG defined.

I came across this issue when I was running tests for a custom grpc based service against Azure Linux. Each time I would run tests, the test executable would end up crashing with gdb tracebacks similar to this (sans traceback entries into proprietary code):

0  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
1  0x00007ffff6729ed3 in __pthread_kill_internal (signo=6, threadid=<optimized out>) at pthread_kill.c:78
2  0x00007ffff66ded86 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
3  0x00007ffff66c97e5 in __GI_abort () at abort.c:79
4  0x00007ffff63f2537 in absl::lts_20240116::raw_log_internal::(anonymous namespace)::RawLogVA(absl::lts_20240116::LogSeverity, char const*, int, char const*, __va_list_tag*) () at /lib/libabsl_raw_logging_internal.so.2401.0.0
5  0x00007ffff63f25ce in absl::lts_20240116::raw_log_internal::RawLog(absl::lts_20240116::LogSeverity, char const*, int, char const*, ...) ()
    at /lib/libabsl_raw_logging_internal.so.2401.0.0
6  0x00007ffff7b19280 in absl::lts_20240116::DeadlockCheck(absl::lts_20240116::Mutex*) () at /lib/libabsl_synchronization.so.2401.0.0
7  0x00007ffff7b1bb35 in absl::lts_20240116::Mutex::Lock() () at /lib/libabsl_synchronization.so.2401.0.0
8  0x00007ffff70113f0 in grpc_event_engine::experimental::BasicWorkQueue::Add(grpc_event_engine::experimental::EventEngine::Closure*) () at /lib/libgrpc.so.39
9  0x00007ffff700b19d in grpc_event_engine::experimental::WorkStealingThreadPool::WorkStealingThreadPoolImpl::Run(grpc_event_engine::experimental::EventEngine::Closure*) () at /lib/libgrpc.so.39
10 0x00007ffff700b2f3 in grpc_event_engine::experimental::WorkStealingThreadPool::Run(absl::lts_20240116::AnyInvocable<void ()>) () at /lib/libgrpc.so.39
11 0x00007ffff70039e2 in grpc_event_engine::experimental::TimerManager::TimerManager(std::shared_ptr<grpc_event_engine::experimental::ThreadPool>) ()
    at /lib/libgrpc.so.39
12 0x00007ffff6ff27a0 in grpc_event_engine::experimental::PosixEventEngine::PosixEventEngine() () at /lib/libgrpc.so.39
13 0x00007ffff6fdd1c3 in grpc_event_engine::experimental::DefaultEventEngineFactory() () at /lib/libgrpc.so.39
14 0x00007ffff6fdc7a5 in grpc_event_engine::experimental::CreateEventEngineInner() () at /lib/libgrpc.so.39
15 0x00007ffff6fdc7d2 in grpc_event_engine::experimental::CreateEventEngine() () at /lib/libgrpc.so.39
16 0x00007ffff6fdc9c9 in grpc_event_engine::experimental::GetDefaultEventEngine(grpc_core::SourceLocation) () at /lib/libgrpc.so.39
17 0x00007ffff6fdcd9c in grpc_event_engine::experimental::(anonymous namespace)::EnsureEventEngineInChannelArgs(grpc_core::ChannelArgs) () at /lib/libgrpc.so.39
18 0x00007ffff6fdd06b in std::_Function_handler<grpc_core::ChannelArgs (grpc_core::ChannelArgs), grpc_core::ChannelArgs (*)(grpc_core::ChannelArgs)>::_M_invoke(std::_Any_data const&, grpc_core::ChannelArgs&&) () at /lib/libgrpc.so.39
19 0x00007ffff6f851e6 in grpc_core::ChannelArgsPreconditioning::PreconditionChannelArgs(grpc_channel_args const*) const () at /lib/libgrpc.so.39
20 0x00007ffff71655d0 in grpc_server_create () at /lib/libgrpc.so.39
21 0x00007ffff7a728c4 in grpc::Server::Server(grpc::ChannelArguments*, std::shared_ptr<std::vector<std::unique_ptr<grpc::ServerCompletionQueue, std::default_delete<grpc::ServerCompletionQueue> >, std::allocator<std::unique_ptr<grpc::ServerCompletionQueue, std::default_delete<grpc::ServerCompletionQueue> > > > >, int, int, int, std::vector<std::shared_ptr<grpc::internal::ExternalConnectionAcceptorImpl>, std::allocator<std::shared_ptr<grpc::internal::ExternalConnectionAcceptorImpl> > >, grpc_server_config_fetcher*, grpc_resource_quota*, std::vector<std::unique_ptr<grpc::experimental::ServerInterceptorFactoryInterface, std::default_delete<grpc::experimental::ServerInterceptorFactoryInterface> >, std::allocator<std::unique_ptr<grpc::experimental::ServerInterceptorFactoryInterface, std::default_delete<grpc::experimental::ServerInterceptorFactoryInterface> > > >, grpc::experimental::ServerMetricRecorder*) () at /lib/libgrpc++.so.1.62
22 0x00007ffff7a6d0ca in grpc::ServerBuilder::BuildAndStart() () at /lib/libgrpc++.so.1.62
...

This happened because the implementation of absl::Mutex is changed by the presence of an NDEBUG definition. When abseil-cpp is compiled without NDEBUG, this definition of absl::Mutex::Dtor():

https://github.com/abseil/abseil-cpp/blob/master/absl/synchronization/mutex.cc#L742-L747

... is defined in the shared library and meant to be called by the absl::Mutex destructor. This helps clear out deadlock info when NDEBUG isn't defined. However, when NDEBUG is defined by dependent packages, this definition of absl::Mutex::Dtor():

https://github.com/abseil/abseil-cpp/blob/master/absl/synchronization/mutex.h#L1080

... is generated in the dependent binary for use by the dependent package, resulting in the former definition being unused, and causing grpc and other dependent packages to abort sporadically (but often) at absl::Mutex entry points.

One of the prominent maintainers of abseil has mentioned this issue before:

https://github.com/abseil/abseil-cpp/blob/master/FAQ.md#what-is-abi-and-why-dont-you-recommend-using-a-pre-compiled-version-of-abseil

FTR, this is also an issue in Fedora (I suspect the abseil-cpp rpm spec was taken from Fedora). I plan to make PRs there as well.

Change Log
  • abseil-cpp:
    • CMAKE_BUILD_TYPE changed to RelWithDebInfo so that abseil-cpp is compiled with NDEBUG defined.
    • Dependencies updated to note previously unacknowledged dependencies.
    • Flaky waiter tests disabled.
  • grpc:
    • Release bumped to rebuild against updated abseil-cpp.
    • Updated to depend on releases of abseil-cpp for which ABI compatibility can be maintained.
  • libarrow:
    • Release bumped to rebuild against updated abseil-cpp.
    • Updated to depend on releases of abseil-cpp for which ABI compatibility can be maintained.
    • Fix rpm warning.
  • opentelemetry-cpp:
    • Release bumped to rebuild against updated abseil-cpp.
    • Updated to depend on releases of abseil-cpp for which ABI compatibility can be maintained.
    • Provide explicit protobuf archive fetch in lieu of manual fetch.
  • protobuf:
    • Release bumped to rebuild against updated abseil-cpp.
    • Updated to depend on releases of abseil-cpp for which ABI compatibility can be maintained.
  • re2:
    • Release bumped to rebuild against updated abseil-cpp.
    • Updated to depend on releases of abseil-cpp for which ABI compatibility can be maintained.
Does this affect the toolchain?

NO

Associated issues
Test Methodology
  • Local build.

@surfacepatterns
Copy link
Contributor Author

I'm guessing the source signature check fails in the grpc signatures because the sources used in the pipeline are different than the sources that were generated during testing. I'll update the signature and the PR.

The current `abseil-cpp` package is built with `CMAKE_BUILD_TYPE` set to
`None`.  This results in the package being built without `NDEBUG` being
defined.  This creates ABI incompatibilities between `abseil-cpp` and packages
that depend on `abseil-cpp` that are compiled with `NDEBUG` defined.

I came across this issue when I was running tests for a custom `grpc` based
service against Azure Linux.  Each time I would run tests, the test executable
would end up crashing with `gdb` tracebacks similar to this (sans traceback
entries into proprietary code):

```
0  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
1  0x00007ffff6729ed3 in __pthread_kill_internal (signo=6, threadid=<optimized out>) at pthread_kill.c:78
2  0x00007ffff66ded86 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
3  0x00007ffff66c97e5 in __GI_abort () at abort.c:79
4  0x00007ffff63f2537 in absl::lts_20240116::raw_log_internal::(anonymous namespace)::RawLogVA(absl::lts_20240116::LogSeverity, char const*, int, char const*, __va_list_tag*) () at /lib/libabsl_raw_logging_internal.so.2401.0.0
5  0x00007ffff63f25ce in absl::lts_20240116::raw_log_internal::RawLog(absl::lts_20240116::LogSeverity, char const*, int, char const*, ...) ()
    at /lib/libabsl_raw_logging_internal.so.2401.0.0
6  0x00007ffff7b19280 in absl::lts_20240116::DeadlockCheck(absl::lts_20240116::Mutex*) () at /lib/libabsl_synchronization.so.2401.0.0
7  0x00007ffff7b1bb35 in absl::lts_20240116::Mutex::Lock() () at /lib/libabsl_synchronization.so.2401.0.0
8  0x00007ffff70113f0 in grpc_event_engine::experimental::BasicWorkQueue::Add(grpc_event_engine::experimental::EventEngine::Closure*) () at /lib/libgrpc.so.39
9  0x00007ffff700b19d in grpc_event_engine::experimental::WorkStealingThreadPool::WorkStealingThreadPoolImpl::Run(grpc_event_engine::experimental::EventEngine::Closure*) () at /lib/libgrpc.so.39
10 0x00007ffff700b2f3 in grpc_event_engine::experimental::WorkStealingThreadPool::Run(absl::lts_20240116::AnyInvocable<void ()>) () at /lib/libgrpc.so.39
11 0x00007ffff70039e2 in grpc_event_engine::experimental::TimerManager::TimerManager(std::shared_ptr<grpc_event_engine::experimental::ThreadPool>) ()
    at /lib/libgrpc.so.39
12 0x00007ffff6ff27a0 in grpc_event_engine::experimental::PosixEventEngine::PosixEventEngine() () at /lib/libgrpc.so.39
13 0x00007ffff6fdd1c3 in grpc_event_engine::experimental::DefaultEventEngineFactory() () at /lib/libgrpc.so.39
14 0x00007ffff6fdc7a5 in grpc_event_engine::experimental::CreateEventEngineInner() () at /lib/libgrpc.so.39
15 0x00007ffff6fdc7d2 in grpc_event_engine::experimental::CreateEventEngine() () at /lib/libgrpc.so.39
16 0x00007ffff6fdc9c9 in grpc_event_engine::experimental::GetDefaultEventEngine(grpc_core::SourceLocation) () at /lib/libgrpc.so.39
17 0x00007ffff6fdcd9c in grpc_event_engine::experimental::(anonymous namespace)::EnsureEventEngineInChannelArgs(grpc_core::ChannelArgs) () at /lib/libgrpc.so.39
18 0x00007ffff6fdd06b in std::_Function_handler<grpc_core::ChannelArgs (grpc_core::ChannelArgs), grpc_core::ChannelArgs (*)(grpc_core::ChannelArgs)>::_M_invoke(std::_Any_data const&, grpc_core::ChannelArgs&&) () at /lib/libgrpc.so.39
19 0x00007ffff6f851e6 in grpc_core::ChannelArgsPreconditioning::PreconditionChannelArgs(grpc_channel_args const*) const () at /lib/libgrpc.so.39
20 0x00007ffff71655d0 in grpc_server_create () at /lib/libgrpc.so.39
21 0x00007ffff7a728c4 in grpc::Server::Server(grpc::ChannelArguments*, std::shared_ptr<std::vector<std::unique_ptr<grpc::ServerCompletionQueue, std::default_delete<grpc::ServerCompletionQueue> >, std::allocator<std::unique_ptr<grpc::ServerCompletionQueue, std::default_delete<grpc::ServerCompletionQueue> > > > >, int, int, int, std::vector<std::shared_ptr<grpc::internal::ExternalConnectionAcceptorImpl>, std::allocator<std::shared_ptr<grpc::internal::ExternalConnectionAcceptorImpl> > >, grpc_server_config_fetcher*, grpc_resource_quota*, std::vector<std::unique_ptr<grpc::experimental::ServerInterceptorFactoryInterface, std::default_delete<grpc::experimental::ServerInterceptorFactoryInterface> >, std::allocator<std::unique_ptr<grpc::experimental::ServerInterceptorFactoryInterface, std::default_delete<grpc::experimental::ServerInterceptorFactoryInterface> > > >, grpc::experimental::ServerMetricRecorder*) () at /lib/libgrpc++.so.1.62
22 0x00007ffff7a6d0ca in grpc::ServerBuilder::BuildAndStart() () at /lib/libgrpc++.so.1.62
...
```

This happened because the implementation of `absl::Mutex` is changed by the
presence of an `NDEBUG` definition.  When `abseil-cpp` is compiled without
`NDEBUG`, this definition of `absl::Mutex::Dtor()`:

https://github.com/abseil/abseil-cpp/blob/master/absl/synchronization/mutex.cc#L742-L747

... is defined in the __shared__ __library__ and meant to be called by the
`absl::Mutex` destructor.  This helps clear out deadlock info when `NDEBUG`
isn't defined.  However, when `NDEBUG` is defined by dependent packages, this
definition of `absl::Mutex::Dtor()`:

https://github.com/abseil/abseil-cpp/blob/master/absl/synchronization/mutex.h#L1080

... is generated in the __dependent__ __binary__ for use by the dependent
package, resulting in the former definition being unused, and causing `grpc`
and other dependent packages to abort when re-using mutexes.

One of the prominent maintainers of `abseil` has mentioned this issue before:

https://github.com/abseil/abseil-cpp/blob/master/FAQ.md#what-is-abi-and-why-dont-you-recommend-using-a-pre-compiled-version-of-abseil

FTR, this is also an issue in Fedora (I suspect the `abseil-cpp` rpm spec was
taken from Fedora).  I plan to make PRs there as well.
@surfacepatterns
Copy link
Contributor Author

@microsoft/cbl-mariner-devs: AFAICT, this PR hasn't received any attention while newer PRs have been processed and approved. Is there something missing from this PR?

@PawelWMS
Copy link
Contributor

Buddy build for the modified packages.

…o danderson/abseil-abi-incompatibility-fix
@surfacepatterns surfacepatterns force-pushed the danderson/abseil-abi-incompatibility-fix branch from 0214bd3 to c719b35 Compare August 14, 2024 06:02
@PawelWMS
Copy link
Contributor

Another buddy build run after the 3.0-dev merge.

@PawelWMS PawelWMS merged commit bd89859 into microsoft:3.0-dev Aug 14, 2024
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants