Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GDB stub and async shim rework #29

Open
wants to merge 7 commits into
base: main
Choose a base branch
from
Open

Conversation

encounter
Copy link
Contributor

@encounter encounter commented Jul 31, 2024

This implements a working GDB stub for x86-emu and x86-unicorn. One larger change is the new state machine in cli/main.rs. This code is now shared between x86-emu and x86-unicorn: it allows stopping, resuming and single-stepping the machine emulation, handling breakpoints and machine errors in a unified way.

Another larger change is reworking the async shim handling. There's no longer any backend-specific code in builtin.rs. (Take a look at the simplified implementation!) x86-unicorn was updated to use the x86-emu approach for future handling (using eip=MAGIC_ADDR to poll futures instead of executing CPU), though I'd like to consider ideas to unify this code between the two as well, if possible.

Other changes:

  • x86-unicorn: Unmapped first 4k of memory to catch zero page accesses
  • Removed --trace-blocks because I didn't feel like reimplementing it (I could, though, if desired)
  • Removed x86-emu snapshot feature. I broke it while refactoring things to use Pin<>, and decided to rip it out and revisit later if it's still a desired feature. One big issue is that it's incompatible with any running futures.

TODO:

  • Fix x86-64 build
  • Fix web build
  • Multi-threading support
  • Restore --trace-points

Copy link
Owner

@evmar evmar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am sad to give up on snapshotting but I agree it is impossible(?) to preserve in the face of the futures. It is a super bummer, earlier I had partway implemented a reverse debugger that used snapshots. Also serde is a lot of goop that I would rather not deal with in general so dropping it is nice I guess.

I am generally positive on this but it would be nice to preserve --trace-points as we discussed in chat.

web/glue/src/lib.rs Outdated Show resolved Hide resolved
win32/derive/src/gen.rs Show resolved Hide resolved
}
_ => return false,
x86::CPUState::Error(message) => StopReason::Error {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW one reason I had a weird API here around returning a bool is that this run function is the hottest code when emulating. So when this returns a constructed StopReason enum, the caller is responsible for tearing that StopReason down when it drops which means the drop impl needs to check if it's a StopReason::Error and if so free the message. I cannot remember if I measured this mattering or if I was just worrying about it in the abstract.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure the drop check here will matter much, it won't have to do anything in most cases. Happy to rework it if you have suggestions, though

/// The CPU hit a debug breakpoint.
Breakpoint { eip: u32 },
/// The CPU hit a shim call.
ShimCall(&'static Shim),
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, I have been (slowly) working on a change to how shims work that will require something very similar to this.

The summary is that kernel32.dll becomes a real dll that contains e.g.

_ReadFile:
  syscall
  ret 32

and then the cpu's syscall handler will trigger a StopReason like the above.

(The motivation for this change is (1) using a real dll makes it easier to handle exposing raw symbols from dlls as needed for msvcrt, and (2) a demoscene unpacker attempts to walk the dll headers in memory so I need actual dlls to exist in memory to appease them.)

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Coincidentally another win32 emu I've been following just made a similar change
https://inuh.net/@[email protected]/112889440660822739

@evmar
Copy link
Owner

evmar commented Aug 2, 2024

Thinking about this change, I would like to merge it in pieces to better understand the different parts. Do you mind if i pick up parts of it (e.g. "remove snapshotting") separately?

@encounter
Copy link
Contributor Author

Do you mind if i pick up parts of it (e.g. "remove snapshotting") separately?

Go for it! I'll rebase accordingly once I get time to work on it again.

@evmar
Copy link
Owner

evmar commented Aug 4, 2024

1129538 removes all the snapshotting, including the signal handler and web UI for it

@evmar
Copy link
Owner

evmar commented Aug 4, 2024

I copied your change to add Handler::Async, and one thing I notice about it is we end up with two Boxed futures per async call. The first basically holds the decoded stack args, and then the second calls the first and updates state based on its result. The previous code had only one boxed future because it bundled those two things together. I'm not sure there's a good way to resolve this.

#31 -- my attempt

@encounter
Copy link
Contributor Author

encounter commented Aug 5, 2024

If handle_shim_call was always async, one Boxed future could be avoided. It’s only used to conditionally return a Future. Maybe it would work to make that function return Option<impl Future> instead?

edit: I realized that wouldn’t allow us to store it in a Vec, unless we converted it to a custom Future implementation to make it a concrete type.

@evmar
Copy link
Owner

evmar commented Aug 5, 2024

Hm yeah, I had a similar thought. Unfortunately it's put in Vec stored per-cpu, and the cpu layer doesn't know about shims, hrmm.

@encounter
Copy link
Contributor Author

encounter commented Aug 7, 2024

In 5c92c7d I was able to simplify the async shim handling. It pushes the responsibility for storing and polling the future up into the top-level event loop (cli or web, web updates in 6822158). machine.call_shim returns the BoxFuture<u32> from the shim directly, instead of wrapping it in another future for updating the registers. The event loop stores this future, polls it, and calls finish_shim_call when complete, which will then run the machine-specific register updates.

x86-emu and x86-unicorn only have to return StopReason::Blocked when their EIP=MAGIC_ADDR, which tells the outer event loop to perform future polling. This simplifies their implementations quite a bit.

@evmar
Copy link
Owner

evmar commented Aug 7, 2024

I haven't had a chance to look at this yet, but I wanted to note my pending builtins-as-dlls work lets us eliminate some of the post-shim code: 60b256e

basically the new flow is that the original binary does some call [SomeFn] which shows up at

SomeFn:
   syscall  ; causes Rust impl to be invoked
   ret 12   ; the number here is stack_consumed

The only thing the shim implementation code needs to manage is taking the return value of the Rust fn and putting it into eax.

I haven't quite figured it out yet but I am pretty sure that async fns will benefit similarly.

Now that I've typed this out, maybe the eax handling means this doesn't win anything, hrm.

@evmar
Copy link
Owner

evmar commented Aug 7, 2024

New idea: make the futures Vec a Vec of Future<Option<u32>>, where a present value means "put this in eax when done". Coupled with my other change that removes the other code that needs to run after an async block I think that is enough?

@encounter
Copy link
Contributor Author

Sounds good to me. I think that's similar to the solution I cooked up in the above commit:

struct AsyncShimCall {
  shim: &'static win32::shims::Shim,
  future: BoxFuture<u32>,
}
let mut shim_calls = Vec::<AsyncShimCall>::new();

// ...

match stop_reason {
  win32::StopReason::Blocked => {
    // Poll the last future.
    let shim_call = shim_calls.last_mut().unwrap();
    match shim_call.future.as_mut().poll(&mut ctx) {
      Poll::Ready(ret) => {
        target.machine.finish_shim_call(shim_call.shim, ret);

Except now finish_shim_call is only responsible for setting eax, instead of doing the eip and stack manipulation as well.

@encounter encounter force-pushed the gdb-stub branch 2 times, most recently from 7265fd6 to 25adc6f Compare August 13, 2024 01:58
@encounter encounter changed the title WIP GDB stub GDB stub and async shim rework Aug 13, 2024
@encounter encounter marked this pull request as ready for review August 13, 2024 01:59
@evmar
Copy link
Owner

evmar commented Sep 12, 2024

I am so so sorry this has taken me so long! I have merged pieces of it and I am working on the main debugger part, but it also managed to collide horribly with this dll change I've also been working on for a long time so it's been kind a three way train wreck between your change, my change, and also my personal life stuff.

Ok(stream)
}

pub type StateMachine<'a> = GdbStubStateMachine<'a, MachineTarget, std::net::TcpStream>;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am trying to follow why you went with implementing this StateMachine type rather that using the simpler event loop from the gdbstub examples. Can you explain a bit?

I would have expected something more like this
https://github.com/daniel5151/gdbstub/blob/6e72c26211515bf15b8a462d195f6ccf418f4c92/examples/armv4t/main.rs#L95

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants