crash on exit in as_cluster_tender (after libuv event loop is stopped) #149

Open
bsergean opened this issue Mar 19, 2024 · 5 comments

Comments

@bsergean

0   server                    0x131064b           gsignal (raise.c:51)
1   server                    0xba03c2            abort 
2   server                    0x5f3fb8            [inlined] uv__async_send (async.c:198)
3   server                    0x5f3fb8            uv_async_send.cold (async.c:73)
4   server                    0x213c818           as_event_execute (as_event_uv.c:246)
5   server                    0x213b26b           as_event_balance_connections (as_event.c:1846)
6   server                    0x212f7f5           [inlined] as_cluster_balance_connections (as_cluster.c:632)
7   server                    0x212f7f5           as_cluster_manage (as_cluster.c:653)
8   server                    0x212fe8d           as_cluster_tend (as_cluster.c:885)
9   server                    0x21304a0           as_cluster_tender (as_cluster.c:935)
10  server                    0x1304c48           start_thread (pthread_create.c:477)
11  server                    0x83aa4c2           __clone
  1. We use libuv (recent version)
  2. On shutdown we sequentially call:
        aerospike_destroy( cluster );
        LOG_INFO( "Closing aero event loops" );
        as_event_close_loops();

These calls happen before our event loop gets stopped. Any idea what's going on?
It feels like after calling as_event_close_loops the cluster_tend mechanism should stop (and that thread should exit).

@BrianNichols
Member

BrianNichols commented Mar 20, 2024

Do you call aerospike_close() before calling aerospike_destroy()?

aerospike_close() should perform a graceful shutdown of the cluster while aerospike_destroy() just frees cluster memory. aerospike_destroy() alone does not attempt to stop the cluster tend thread.

@bsergean
Author

Yes we do, this is what our cleanup looks like.

          as_error err{};
          as_error_reset( &err );
          auto * cluster = static_cast<aerospike *>( mInternalObject );
      
          if ( aerospike_close( cluster, &err ) != AEROSPIKE_OK )
          {
              LOG_ERROR( "Could not close connection to aerospike: error({}) {} at [{}:{}]", static_cast<int>( err.code ), err.message, err.file, err.line );
          }
          
          aerospike_destroy( cluster );
          
          LOG_INFO( "Closing aero event loops" );
          as_event_close_loops();

We changed the way we are closing our uv_loop, maybe this is our problem.

@bsergean
Author

Could it help to call

as_event_set_external_loop( ... );

with a null pointer to signal that the loop is gone?
Or maybe we could call uv_run a few times to advance the event loop, which might help the Aerospike client terminate gracefully.

@BrianNichols
Member

When are you closing the shared uv_loop?

If you are sharing libuv event loops with the C client via as_event_set_external_loop() or as_set_external_event_loop(), then closing those event loops must come after as_event_close_loops().

@bsergean
Author

Good point, we are calling uv_stop(loop) before our aero shutdown sequence. I think we need to reshuffle things.
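
For anyone landing here with the same crash, this is a minimal sketch of the reshuffled order, assuming the libuv loop is shared with the C client through as_event_set_external_loop(). AeroShutdownSteps and run_aero_shutdown are hypothetical names made up for illustration; they are not part of the Aerospike C client or libuv. The real calls named in the comments are the ones from this thread and would be supplied by the application as callables.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical helper (not an Aerospike or libuv API). It only pins down the
// order in which the caller-supplied teardown steps run.
struct AeroShutdownSteps
{
    std::function<void()> close_client;     // e.g. aerospike_close( cluster, &err )
    std::function<void()> destroy_client;   // e.g. aerospike_destroy( cluster )
    std::function<void()> close_aero_loops; // e.g. as_event_close_loops()
    std::function<void()> stop_shared_loop; // e.g. uv_stop( loop ) plus the application's own loop teardown
};

// Runs the steps with the shared-loop teardown last and returns the names of
// the steps that actually ran, which is handy for logging.
inline std::vector<std::string> run_aero_shutdown( const AeroShutdownSteps & steps )
{
    std::vector<std::string> order;

    auto run = [&order]( const char * name, const std::function<void()> & step ) {
        if ( !step )
        {
            return; // step not provided, skip it
        }
        step();
        order.emplace_back( name );
    };

    run( "close_client", steps.close_client );         // graceful cluster shutdown first
    run( "destroy_client", steps.destroy_client );     // then free the cluster memory
    run( "close_aero_loops", steps.close_aero_loops ); // then release the client's event loop state
    run( "stop_shared_loop", steps.stop_shared_loop ); // only now stop or close the shared uv_loop

    return order;
}
```

The only point of the wrapper is that the shared loop teardown cannot accidentally be moved ahead of as_event_close_loops(); calling the four functions directly in the same order works just as well.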
