Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Julia fails to unwind Garbage Collection frames #300

Open
aristotelhs opened this issue Sep 1, 2023 · 1 comment
Open

Julia fails to unwind Garbage Collection frames #300

aristotelhs opened this issue Sep 1, 2023 · 1 comment

Comments

@aristotelhs
Copy link
Contributor

While using ddprof in PID mode to profile a Julia program there are a number of frames that end up being in [Incomplete] frames. Like in the following example:
image

These frames seems to be (mostly at least if not exclusively) coming from GC which is using SIGSEGV to sync multiple threads. DDprof has a problem unwinding these frames since the stack unwinding falls into JITTED Julia code and it is not able to follow up. (It seems that the jitted julia code is not exported to the normal jit-dump in the corresponding file since this is jitted during Julia startup).
Using GDB to unwind the same stack frame results also i n a weird stack-frame:

Thread 7 "julia" received signal SIGSEGV, Segmentation fault.
0x00007ffff725dae4 in jl_gc_state_set (state=<optimized out>, old_state=<optimized out>, ptls=0x7fffc4000b70) at /home/user/profiling/julia/julia/src/julia_threads.h:349
349             jl_gc_safepoint_(ptls);
(gdb) bt
#0  0x00007ffff725dae4 in jl_gc_state_set (state=<optimized out>, old_state=<optimized out>, ptls=0x7fffc4000b70)
    at /home/user/profiling/julia/julia/src/julia_threads.h:349
#1  maybe_collect (ptls=0x7fffc4000b70) at /home/user/profiling/julia/julia/src/julia_threads.h:342
#2  ijl_gc_managed_malloc (sz=8000000) at /home/user/profiling/julia/julia/src/gc.c:3993
#3  0x00007ffff7224ca5 in _new_array_ (elsz=<optimized out>, zeroinit=0 '\000', isunion=0 '\000', hasptr=<optimized out>, isunboxed=1 '\001',
    dims=0x7fff9c3ffdc8, ndims=1, atype=0x7fffe3962b70 <jl_system_image_data+2193584>) at /home/user/profiling/julia/julia/src/array.c:141
#4  _new_array (dims=<optimized out>, ndims=<optimized out>, atype=<optimized out>) at /home/user/profiling/julia/julia/src/array.c:198
#5  ijl_alloc_array_1d (atype=0x7fffe3962b70 <jl_system_image_data+2193584>, nr=<optimized out>) at /home/user/profiling/julia/julia/src/array.c:436
#6  0x00007fffc8f01e38 in ?? ()
#7  0x41431384d01825fa in ?? ()
#8  0x00007fffed32cb68 in ?? ()
#9  0x0000000000000004 in ?? ()
#10 0x0000000000000000 in ?? ()

The 0x00007fffc8f01e38 is in the jitted mmaped area, and the next stackframe 0x41431384d01825fa is completely off (this address is not even mmaped looking into /proc/$pid/maps)

However GDB can use the jit listener interface (https://sourceware.org/gdb/onlinedocs/gdb/JIT-Interface.html#JIT-Interface) . This is enabled for Julia by setting the variable export ENABLE_GDBLISTENER=1 before executing Julia.
In this case unwinding the stack after the signal works as it is supposed to:

(gdb) bt
#0  0x00007ffff725dae4 in jl_gc_state_set (state=<optimized out>, old_state=<optimized out>, ptls=0x5555555bd2e0)
    at /home/user/profiling/julia/julia/src/julia_threads.h:349
#1  maybe_collect (ptls=0x5555555bd2e0) at /home/user/profiling/julia/julia/src/julia_threads.h:342
#2  ijl_gc_managed_malloc (sz=8000000) at /home/user/profiling/julia/julia/src/gc.c:3993
#3  0x00007ffff7224ca5 in _new_array_ (elsz=<optimized out>, zeroinit=0 '\000', isunion=0 '\000',
    hasptr=<optimized out>, isunboxed=1 '\001', dims=0x7fff9afffdc8, ndims=1,
    atype=0x7fffe3962b70 <jl_system_image_data+2193584>) at /home/user/profiling/julia/julia/src/array.c:141
#4  _new_array (dims=<optimized out>, ndims=<optimized out>, atype=<optimized out>)
    at /home/user/profiling/julia/julia/src/array.c:198
#5  ijl_alloc_array_1d (atype=0x7fffe3962b70 <jl_system_image_data+2193584>, nr=<optimized out>)
    at /home/user/profiling/julia/julia/src/array.c:436
#6  in Array () at boot.jl:477
#7  Array () at boot.jl:486
#8  Array () at boot.jl:494
#9  rand () at /home/user/profiling/julia/julia/usr/share/julia/stdlib/v1.9/Random/src/Random.jl:291
#10 rand () at /home/user/profiling/julia/julia/usr/share/julia/stdlib/v1.9/Random/src/Random.jl:292
#11 rand () at /home/user/profiling/julia/julia/usr/share/julia/stdlib/v1.9/Random/src/Random.jl:280
#12 macro expansion () at /home/user/profiling/ddprof/test.jl:5
#13 julia_#3_24 () at threadingconstructs.jl:404
#14 0x00007fffc4f01f50 in jfptr_#3_25 ()
#15 0x00007ffff72228d7 in jl_apply (nargs=1, args=<unavailable>)
    at /home/user/profiling/julia/julia/src/julia.h:1880
#16 start_task () at 
/home/user/profiling/julia/julia/src/task.c:1092

GDB is able to continue because it is able to figure out that this is jitted code (probably looking up the symbols that exist in __jit_debug_descriptor or tracks them through __jit_debug_register_code and then stops the FP unwinding and uses jitted dwarf debug info (this last part is a bit of an educated guess)

Implementing the same mechanism right now might require a lot of effort but it would be great if there is a way for these frames to be detected and be reported something else instead of [Incomplete].

Although this was shown in Julia, I believe that this is relevant to all languages which use LLVM as a jit backend is tha same way.

@r1viollet
Copy link
Collaborator

I started looking into this working from @aristotelhs 's pointers.

I could not find dwarf information within the jitdump. It seems llvm does not emit it through jitdump.
It is not clear to me where debug information is coming from in gdb has some specific code to unwind jit information.
I will look more into this next week. A reproducer julia script would be great, if we can add this to the ticket.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants