Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: use MEMORY type cache #36

Merged
merged 5 commits into from
Aug 29, 2024
Merged

refactor: use MEMORY type cache #36

merged 5 commits into from
Aug 29, 2024

Conversation

zmstone
Copy link
Contributor

@zmstone zmstone commented Aug 24, 2024

Hopefully fixes #26, but not 100% sure because I cannot reproduce it.

The idea is to:

  • Change to use MEMORY type cache by default (because the reported error is seemingly related to file system/hard drive)
  • Allow caller to specify cache type and name. e.g. kinit(KeyTab, Principal, "FILE:/tmp/my_cc_name"). This should provide flexibility at application level.

*
* krb5 doc says krb5_cc_default is essentially krb5_cc_resolve with default ccname, but it does not work.
* So we set environment variable KRB5CCNAME and call krb5_cc_default instead */
setenv("KRB5CCNAME", (const char*)ccname_char, 1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This in fact might be the simple reason why this didn't work for me when I experimented with it previously. Great find!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, better news, this does work, but you have to set default name first, there might be a better way, but :

 if ((error = krb5_cc_set_default_name(context, (const char*)ccname_char)) != 0) {
        tag = "krb5_cc_set_default_name";
        goto kinit_finish;
    }

    if ((error = krb5_cc_default(context, &ccache)) != 0) {
        tag = "krb5_cc_default";
        goto kinit_finish;
    }

And voila, we don't have to rely on os envs now 😄 We'll need to look at caveats though around threads and such, though krb5 seems pretty thread safe.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better solution indeed, so you were half way there, but we need to make this call first : https://github.com/krb5/krb5/blob/b9b654e5b469140d5603f27af5bf83ee9a826349/src/lib/krb5/ccache/t_cc.c#L621C40-L621C51

Looks like there's a default memory cc struct (krb5_cc_ops) in an include somewhere.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

^ ignore, krb5_cc_resolve works fine, only need to register if it's a truly new cc type.

   if ((error = krb5_cc_resolve(context, (const char*)ccname_char, &ccache)) != 0) {
        tag = "krb5_cc_set_default_name";
        goto kinit_finish;
    }

    printf("CC TYPE : %s\r\n", krb5_cc_get_type(context, ccache));

then

1> example:main(client1).
CC TYPE : MEMORY
ok
2> 

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

krb5_cc_get_type probably will return what you want to see,
but what matters is the return value of sasl_client_new.

@starbelly
Copy link
Contributor

I'll give this a whirl in anger shortly.

@zmstone zmstone changed the title refactor: use memory MEMORY type cache refactor: use MEMORY type cache Aug 24, 2024
c_src/sasl_auth.c Outdated Show resolved Hide resolved
@starbelly
Copy link
Contributor

Alas, I'm running to the same problem I had with cyrus-sasl when testing this (no creds supplied error). That said, I have a reproducer for you. This can be done with just sasl_auth app, but for the sake of time, I did the following :

  1. git clone [email protected]:kafka4beam/brod_gssapi.git
  2. cd brod_gssapi
  3. mkdir _checkouts
  4. git clone [email protected]:kafka4beam/sasl_auth.git
  5. cd ..
  6. example
  7. ./up
  8. shell into the container spun up
  9. cd /opt/rebar3 && ./rebar3 local install && cd -
  10. Modify src/example.erl to look like the following :
-module(example).

-export([main/1, race_test/0]).


race_test() -> 
        ClientNames = [erlang:list_to_atom("client_" ++ erlang:integer_to_list(I)) || I <- lists:seq(1, 100)],
        [spawn(fun() -> example:main(Client) end) || Client <- ClientNames].

main(ClientName) ->
    try
        application:ensure_all_started(brod),
        application:load(brod_gssapi),
        application:set_env(brod_gssapi,
                            default_handshake_vsn,
                            1),
        KafkaBootstrapEndpoints = [{"kafka.kerberos-demo.local", 9093}],
        Topic = <<"mytest">>,
        %Partition = 0,
        KeyTab = <<"/var/lib/secret/rig.key">>,
        Principal = <<"[email protected]">>,
        Config = [{sasl, {callback, brod_gssapi, {gssapi, KeyTab, Principal}}}],
        brod:start_client(KafkaBootstrapEndpoints, ClientName, Config),
        brod:start_producer(ClientName, Topic, _ProducerConfig = [])
    catch
        ErrorClass:Reason:Stack -> 
            io:format("~n~n~n~n~n\033[0;31mFAIL\033[0m (Failed to send and receive messages with brod)~n~n ~p:~p:~p~n~n~n~n~n", [ErrorClass, Reason, Stack])
            %erlang:halt(1)
    end.
  1. rebar3 shell
  2. example:race_test()

If you do that on main, you'll end up with file missing errors from krb5, if you do that on this branch you'll end up with no credentials error.

As said, I started to experiment with this and other solutions (such as using unique files, though this can go poorly if you do not clean up afterwards). The most optimal solution is indeed using memory cache, even if your on a sane platform and using SSSD, that's still a bottle neck. iirc, when I looked to this previously, cyrus-sasl is doing something that prevents memory cache from effectively being used, though I do like to be wrong, and hope I and am wrong.

@zmstone
Copy link
Contributor Author

zmstone commented Aug 25, 2024

Thank you @starbelly

cyrus-sasl is doing something that prevents memory cache from effectively being used

It would be great if you can give some pointers about what is preventing memory cache.

@starbelly
Copy link
Contributor

starbelly commented Aug 25, 2024

Thank you @starbelly

cyrus-sasl is doing something that prevents memory cache from effectively being used

It would be great if you can give some pointers about what is preventing memory cache.

@zmstone I think actually this is working, there's just a potential problem perhaps with going across schedulers (hypothesis, not theory). If I reduce the number of schedulers to 1, such that all memory access happens within a single thread, I have no issues, and in fact can change the number of clients starting up to 1000 without issue (do not it was just starting the client and the producer as above, though that should be good enough for these purposes). We could also look at the latest cyrus-sasl and see if there is improvement there, but I think the intention by memory cache is that it's to be used by a single process (which I believe really means thread) : https://github.com/krb5/krb5/blob/b9b654e5b469140d5603f27af5bf83ee9a826349/doc/basic/ccache_def.rst?plain=1#L101

That said, maybe we can make a unique name per scheduler?

Edit:

This works, in fact, I think the solution right now was working, but path deps (as part of the brod_gssapi example client) was not working as expected (i.e., it didn't pull in the checkout). This is good news 😄

That said, I still wonder if should have a cache per scheduler per what I alluded to above.

Extra edit :

Sorry for the noise, but trying to get my comments in before I head out for an hour or so. I think we also want to ensure we only set this once, if possible.

@zmstone
Copy link
Contributor Author

zmstone commented Aug 25, 2024

found some discussions in memory cache thread-safe topic: https://bugzilla.redhat.com/show_bug.cgi?id=1605756
fixed in krb5-1.15.1-35.el7

@zmstone
Copy link
Contributor Author

zmstone commented Aug 25, 2024

Changed to set KRB5CCNAME when nif is loaded.
And kinit/3 is now @hidden.
I can try to test more before unhide it.

@starbelly
Copy link
Contributor

Changed to set KRB5CCNAME when nif is loaded. And kinit/3 is now @hidden. I can try to test more before unhide it.

No need for the env var 😄 See #36 (comment)

@zmstone
Copy link
Contributor Author

zmstone commented Aug 26, 2024

I tride to test krb5_cc_set_default_name and/or krb5_cc_resolve, only to get below error when calling client_new.

{error,{sasl_fail,<<"SASL(-1): generic failure: GSSAPI Error: Unspecified GSS failure.  Minor code may provide more information (No Kerberos credentials available (default cache: xxxxxxx))">>}}

@starbelly
Copy link
Contributor

I tride to test krb5_cc_set_default_name and/or krb5_cc_resolve, only to get below error when calling client_new.

{error,{sasl_fail,<<"SASL(-1): generic failure: GSSAPI Error: Unspecified GSS failure.  Minor code may provide more information (No Kerberos credentials available (default cache: xxxxxxx))">>}}

What os? I tried ubuntu, but I can try redhat today.

@zmstone
Copy link
Contributor Author

zmstone commented Aug 26, 2024

What os? I tried ubuntu, but I can try redhat today.

Ubuntu 20.

I noticed, but not tried to fix yet: rebar3 compile may copy the old .so file from priv to _build, if you are testing with pa _build/...

@zmstone
Copy link
Contributor Author

zmstone commented Aug 26, 2024

this is how I run my tests (rebar3 compile twice intended)

        rm -f c_src/*.o
        rm -f priv/*.so
        rm -rf _build/default/lib/sasl_auth
        rebar3 compile
        rebar3 compile
        ./scripts/int-test.sh

@starbelly
Copy link
Contributor

I tried that, I didn't get an error, but not sure the tests went well, but exited with error none the less. Can you try the test I described utilizing brod_gssapi?

@zmstone
Copy link
Contributor Author

zmstone commented Aug 26, 2024

I can confirm that krb5_cc_resolve works for this test: #36 (comment)

@zmstone
Copy link
Contributor Author

zmstone commented Aug 26, 2024

right. it works in Ubuntu 22 (libgssapi-krb5-2 1.19.2-2), but not 20 (libgssapi-krb5-2 1.17-6).

@starbelly
Copy link
Contributor

Beautiful! I suppose we should enhance tests here ala what's going on there.

@zmstone
Copy link
Contributor Author

zmstone commented Aug 26, 2024

I'll keep setting env in the load call.
And change to call krb5_cc_resolve for user provided ccname.

@zmstone
Copy link
Contributor Author

zmstone commented Aug 26, 2024

Sorry, Maybe I tested something wrong on Ubuntu 22, it's actually failing after I tried to repeat the tests with int-test.sh

@zmstone
Copy link
Contributor Author

zmstone commented Aug 26, 2024

I can confirm that krb5_cc_resolve works for this test: #36 (comment)

Sorry, it did not work after all.

With below diff.

--- a/c_src/sasl_auth.c
+++ b/c_src/sasl_auth.c
@@ -149,7 +149,7 @@ static int load(ErlNifEnv* env, void** UNUSED(priv), ERL_NIF_TERM UNUSED(info))
     // because the default FILE type cache has race conditions.
     // also krb5_cc_resolve does not always work in all systems (e.g. ubuntu 20)
     // so we need an override for default cache name
-    setenv("KRB5CCNAME", DEFAULT_CCNAME, 1);
+    //setenv("KRB5CCNAME", DEFAULT_CCNAME, 1);
     return !sasl_client_connection_nif_resource_type && !(cli_result == SASL_OK)
         && !sasl_server_connection_nif_resource_type && !(srv_result == SASL_OK);
 }
@@ -756,17 +756,10 @@ static ERL_NIF_TERM sasl_kinit(ErlNifEnv* env, int UNUSED(argc), const ERL_NIF_T

     handle_alive = 1;

-    if (ccname_char[0] != 0) {
-        if((error = krb5_cc_resolve(context, (const char*) ccname_char, &ccache)) != 0) {
+        if((error = krb5_cc_resolve(context, DEFAULT_CCNAME, &ccache)) != 0) {
             tag = "krb5_cc_resolve";
             goto kinit_finish;
         }
-    } else {
-        if((error = krb5_cc_default(context, &ccache)) != 0) {
-            tag = "krb5_cc_default";
-            goto kinit_finish;
-        }
-    }

     cache_alive = 1;

example:race_test() fails with errors like below:

error:{badmatch,
           {error,
               {client_down,
                   [{{"kafka.kerberos-demo.local",9093},
                     {{sasl_auth_error,
                          {sasl_fail,
                              <<"SASL(-1): generic failure: GSSAPI Error: No credentials were supplied, or the credentials were unavailable or inaccessible (No Kerberos credentials available (default cache: FILE:/tmp/krb5cc_0))">>}},
                      [{kpro_sasl,auth,7,
                           [{file,
                                "/opt/brod_gssapi/example/brod_client/_build/default/lib/kafka_protocol/src/kpro_sasl.erl"},
                            {line,43}]},
                       {kpro_connection,init_connection,3,
                           [{file,
                                "/opt/brod_gssapi/example/brod_client/_build/default/lib/kafka_protocol/src/kpro_connection.erl"},
                            {line,240}]},
                       {kpro_connection,init,4,
                           [{file,
                                "/opt/brod_gssapi/example/brod_client/_build/default/lib/kafka_protocol/src/kpro_connection.erl"},
                            {line,170}]},
                       {proc_lib,init_p_do_apply,3,
                           [{file,"proc_lib.erl"},{line,226}]}]}}]}}}:[{example,
                                                                        main,
                                                                        1,
                                                                        [{file,
                                                                          "/opt/brod_gssapi/example/brod_client/src/example.erl"},
                                                                         {line,
                                                                          24}]}]

The same error from int-test.sh.

@starbelly
Copy link
Contributor

Ok, let me take a further look this evening.

@starbelly
Copy link
Contributor

Ok, let me take a further look this evening.

Wasn't able to get to this last night, will plan on looking today.

@zmstone
Copy link
Contributor Author

zmstone commented Aug 28, 2024

Hi @starbelly by any chance you can help to have a look soon?
If not, I'd like to cut a release with the setenv implementation for now.

In the next iteration we can focus on:

  1. unhide the kinit/3 API, or add cache create API.
  2. add cache destroy API.

@starbelly
Copy link
Contributor

starbelly commented Aug 29, 2024

@zmstone I conquer with your findings, it can only work this way in fact, so not sure how I ended up with success previously, but that's another story. Sadly, cyrus-sasl's sasl_client_new provides no way to provide a ccname explicitly, which is what I think is ultimately needed to avoid env vars. Using env vars is unfortunate, but will work for now. I will give this one more look over code wise.

@starbelly starbelly merged commit 0010a80 into master Aug 29, 2024
2 checks passed
@zmstone zmstone deleted the 0823-per-node-ccache branch August 29, 2024 14:30
@zmstone
Copy link
Contributor Author

zmstone commented Aug 29, 2024

thanks! @starbelly

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Re-work kinit
2 participants