Random Reboots (Spinlock Timeout Panic) on iOS 15 arm64e #274

Open
opa334 opened this issue Oct 24, 2023 · 30 comments
Labels
bug (Something isn't working), help wanted (Extra attention is needed)

Comments

@opa334
Owner

opa334 commented Oct 24, 2023

Mapping on top of dyld_shared_cache executable pages seems to trigger an edge case behaviour in the PPL that sometimes causes a timeout on the spinlock of a memory page, resulting in a kernel panic.

The more tweaks that hook C functions are installed and the more processes those inject into, the more often this behaviour seems to be triggered.

It appears this issue could be fixed by wiring down all pages that have been hooked, but userspace cannot take such a lock, and finding the vm_page object in kernel memory to flip the wired bit directly is proving difficult.

@opa334 opa334 pinned this issue Oct 24, 2023
@opa334 opa334 added the bug (Something isn't working) and help wanted (Extra attention is needed) labels Oct 24, 2023
@opa334
Owner Author

opa334 commented Nov 21, 2023

Here is an attempt at a more in-depth explanation of the issue, to the best of my current understanding. Keep in mind that it is based on assumptions that are basically impossible to verify.

In a multithreaded system, "locks" are used to prevent two threads from interfering with each other. One thread acquires the lock, makes its modification, and releases it. While the lock is held, any other thread trying to acquire it waits until it has been released again.

A spinlock is essentially the same thing, just used for performance-relevant stuff: the waiting thread busy-waits ("spins") instead of going to sleep. The main difference is that a spinlock can time out if something holds the lock for too long while another thread is trying to acquire it. So when a thread tries to acquire a lock that is already held, it waits for a number of ticks, and if the lock doesn't get released in that time frame, the acquisition times out.

This mechanism by itself is not the issue; the issue has to do with memory pages. Every memory page (which describes an area of 16 kB of RAM) has a spinlock so that there are no issues when multiple processes try to acquire the same page at the same time.

Specific pages can be mapped into multiple processes (e.g. if both load the same library); the processes then reuse the same page in order to save memory. Tweaks want to overwrite such memory on a per-process basis, so they first have to make a process-specific copy of the existing mapping and map it on top of the original, so that e.g. one page can be modified in one process while remaining stock in the other processes. The issue seems to specifically happen when mapping on top of a page that resides inside the dyld_shared_cache.
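
The "process-specific copy" idea can be tried out on any desktop system with a private (copy-on-write) file mapping. The sketch below is only an analogy, under the assumption that a throwaway temporary file stands in for the shared library page; it does not touch the dyld_shared_cache, and the helper names `patch_private_copy` and `demo` are made up for illustration.

```python
import mmap
import os
import tempfile


def patch_private_copy(path, offset, patch):
    """Map `path` copy-on-write, apply `patch`, and return the patched view."""
    with open(path, "rb") as f:
        # ACCESS_COPY creates a private copy-on-write mapping: writes land in
        # pages owned by this process and never reach the file or other mappers.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_COPY) as view:
            view[offset:offset + len(patch)] = patch
            return bytes(view[:])


def demo():
    """Return (patched view, bytes still on disk) for a throwaway file."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"ORIGINAL-CODE")  # stand-in for a shared code page
        path = tmp.name
    try:
        patched = patch_private_copy(path, 0, b"HOOKED!!")
        with open(path, "rb") as f:
            on_disk = f.read()
        return patched, on_disk
    finally:
        os.unlink(path)


if __name__ == "__main__":
    patched, on_disk = demo()
    print("this process sees:", patched)
    print("everyone else sees:", on_disk)
```

The patched bytes are visible only through the private mapping, while the backing file (the "stock" copy that every other process would map) stays unchanged.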

The problem is that Apple probably never tested this kind of hooking, and apparently, when you do it in a lot of processes, it can cause the original page (the one from the shared mapping) to be paged out because it's not actively being used. Paging out a page essentially removes it from RAM; when it is accessed again, it is loaded back in. On a stock system this will not happen because nothing has been hooked.

The root cause appears to be something trying to page a previously paged-out shared/executable page back in. This triggers a preemption issue where one thread takes the spinlock and, while holding it, gets preempted to a different context that also takes the same spinlock. (Preemption is essentially a mechanism that allows a thread to be used for something else even if it's currently busy; code has to explicitly disable and re-enable it around any piece of code that should always be executed in one go.) So there seems to be one code path, only invoked by this particular behaviour, where Apple does not correctly disable preemption. This leads to one thread taking the same spinlock twice, which makes it time out because the old context isn't executing anymore and can't release the spinlock.
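
The timeout and the suspected double acquisition can be modeled with a toy, single-threaded simulation. This is a sketch of the reasoning above, not of XNU's lock implementation: `ToySpinlock`, its tick counter, and the `release_after` parameter are all hypothetical.

```python
class SpinlockTimeout(Exception):
    """Raised when a waiter spins past the timeout (the kernel would panic here)."""


class ToySpinlock:
    def __init__(self, timeout_ticks):
        self.timeout_ticks = timeout_ticks
        self.owner = None  # name of the context holding the lock, if any

    def acquire(self, context, release_after=None):
        """Spin until the lock is free or the timeout is exceeded.

        `release_after` models the current holder releasing the lock after
        that many ticks; None means the holder never gets to run again.
        Returns the number of ticks spent spinning.
        """
        ticks = 0
        while self.owner is not None:
            if release_after is not None and ticks >= release_after:
                self.owner = None  # holder finished its critical section
                break
            ticks += 1
            if ticks > self.timeout_ticks:
                raise SpinlockTimeout(
                    f"{context} timed out after {ticks} ticks; "
                    f"lock held by {self.owner}"
                )
        self.owner = context
        return ticks

    def release(self):
        self.owner = None


if __name__ == "__main__":
    lock = ToySpinlock(timeout_ticks=1000)

    # Normal contention: the holder releases after 10 ticks and the waiter gets in.
    lock.acquire("thread A")
    print("thread B waited", lock.acquire("thread B", release_after=10), "ticks")
    lock.release()

    # Suspected bug: a context is preempted while holding the lock, and whatever
    # runs in its place spins on the same lock. The holder can never release it.
    lock.acquire("old context")
    try:
        lock.acquire("new context")
    except SpinlockTimeout as exc:
        print("panic:", exc)
```

In the second scenario nothing ever sets the lock free, so the waiter is guaranteed to run into the timeout, which matches the "timeout after N ticks" wording of the panic.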

As for mitigating it, I tried messing with spinlock-related variables to raise the threshold at which it times out. Unfortunately, Apple screwed us over because everything related to that is KTRR protected, for which we do not have a bypass. I guess the proper fix would be to "wire down" every to-be-hooked page before it's overwritten (wiring down a page prevents it from being paged out), to ensure that the page-out never happens and the code path involved in the issue therefore doesn't trigger. I have tried a bunch of stuff so far, but it seems to be straight-up impossible to acquire such a wiring from userspace, so it has to be done inside the kernel. Unfortunately, the structures involved in the specific shared mapping that causes the issue are very convoluted, and I have yet to find a way to get the correct page object to apply the wiring to.

@opa334
Owner Author

opa334 commented Nov 21, 2023

So the next step to try and fix it would be to find the vm_page structure of a DSC page in kernel memory. So far, all my attempts at finding such a structure have failed.

@eglacias

I assume since this is still here it’s still an issue in the latest release? Are there any reboot issues that have been resolved with the new release?

Sounds difficult. If I ever start doing iPhone programming, maybe I’ll take a look. Or maybe that would be the most awful first experience of such work I could think of, lol!

Well, I’m on my third day without a reboot, which is better than I did on palera1n anyway. If I start to get more than a week without any reboots, I’m definitely starting a success thread about it. As far as I can tell, it’s been an issue on all the rootless jailbreaks.

The main one in palera1n was solved by automatically scheduling a userspace reboot every 24 hours.

@opa334
Owner Author

opa334 commented Feb 21, 2024

This issue only affects arm64e and is fixed in 16.0. So if you're coming from palera1n, you don't need to worry.

@eglacias

eglacias commented Mar 2, 2024

You mean it’s fixed after iOS 16.0? Not sure what the 16.0 was. Thinking of buying a new in box iPhone 13 Pro Max because it will be on iOS 15 usually and even some used will be on 16.x, and this is the jailbreak that will support it and all my tweaks!

@Xad-ce

Xad-ce commented Mar 16, 2024

Hi,
I’m not sure if what I have is a spinlock issue, but I’ll try to explain.

I installed Dopamine on an iPhone 7 32GB on 15.7.
Even after I removed the jailbreak I still got the spinlock: the phone shows a white loading circle, and sometimes, after I can enter my passcode, it goes back to the spinlock and I need to force restart.
My app icons are buggy: they disappear or look glitchy.
CoreTrust also seems to be affected in a way: when I tap “Refresh App Registration” it shows the white loading circle, sometimes endlessly.
Also, I always have under 50MB of free storage, even if I remove everything unnecessary.

I’m a bit afraid of transferring all my data to a new iPhone. I received an iPhone 12 128GB; could the issue repeat on it?
I can upgrade to iOS 16, so the issue would disappear if I understood correctly.

I hope my explanation was clear and that it’s linked in some way x)
Good luck and thanks a lot for your projects!

@opa334
Owner Author

opa334 commented Mar 16, 2024

What you describe has nothing to do with this issue. A spinlock panic simply shows up as a random reboot, not a spinning wheel.

@Xad-ce

Xad-ce commented Mar 16, 2024

Oh okay, thanks for the quick response!

Yup, I don’t have random reboots, but I do have to force reboot sometimes.

@OnesuchDev

Have you tried mlock(2)? Its manual page even mentions "wired pages": https://developer.apple.com/library/archive/documentation/System/Conceptual/ManPages_iPhoneOS/man2/mlock.2.html

A single process can mlock the minimum of a system-wide ``wired pages'' limit and the per-process RLIMIT_MEMLOCK resource limit.

@opa334
Owner Author

opa334 commented Mar 31, 2024

You cannot mlock a page from the dyld shared cache, as it is shared between multiple processes. The only way to do so is to use kernel r/w to lock it down in the kernel.
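
For contrast, this is roughly what wiring an ordinary page with mlock(2) looks like from userspace. It is a minimal sketch assuming a POSIX system where the C library can be reached via `ctypes.CDLL(None)`; it wires a private anonymous page that the process owns, which is exactly the case that is allowed. The function name `try_wire_anonymous_page` is made up, and nothing here attempts the shared cache case, which per the comment above does not work.

```python
import ctypes
import mmap

PAGE_SIZE = mmap.PAGESIZE  # page size of the host running this sketch


def try_wire_anonymous_page():
    """mlock(2) one private anonymous page; return True if the kernel wired it."""
    libc = ctypes.CDLL(None, use_errno=True)  # symbols of the running process (POSIX)
    libc.mlock.argtypes = [ctypes.c_void_p, ctypes.c_size_t]
    libc.mlock.restype = ctypes.c_int
    libc.munlock.argtypes = [ctypes.c_void_p, ctypes.c_size_t]
    libc.munlock.restype = ctypes.c_int

    # Private, anonymous memory: owned by this process and not backed by a file.
    page = mmap.mmap(-1, PAGE_SIZE, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    handle = ctypes.c_char.from_buffer(page)
    try:
        addr = ctypes.addressof(handle)
        # mlock can fail, e.g. when RLIMIT_MEMLOCK would be exceeded.
        wired = libc.mlock(addr, PAGE_SIZE) == 0
        if wired:
            libc.munlock(addr, PAGE_SIZE)
        return wired
    finally:
        del handle  # drop the exported buffer so the mapping can be closed
        page.close()


if __name__ == "__main__":
    print("wired a private page:", try_wire_anonymous_page())
```

Whether the call succeeds depends on the per-process and system-wide wired-page limits quoted from the man page, so the function reports the result instead of assuming success.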

@shaylavi

shaylavi commented Apr 2, 2024

@opa334 I'm not sure how much this helps, but I actually managed to find a pattern for my reboots. It seems to happen based on geographical location, each time I drive my car around somewhere, but I do have a geographical-based automation set, so it might be related to that as well.

@opa334
Owner Author

opa334 commented Apr 8, 2024

@watcher00090

@opa334 Couldn't we get an old version of update_dyld_shared_cache (say from a MacOS 10.9 laptop) to see if it might still work?

@OnesuchDev

OnesuchDev commented Apr 8, 2024 via email

@opa334
Owner Author

opa334 commented Apr 9, 2024

Such tooling never worked on iOS in the first place. Even if I had the ability to replace the dyld_shared_cache, that wouldn't fix this issue either, as it's about applying different function hooks in different processes; just changing the global cache doesn't help.

@watcher00090

@opa334 I have an idea.

@watcher00090

watcher00090 commented Apr 13, 2024

@opa334

How can the dyld_shared_cache pages be signed and immutable? Doesn't the operating system need to make changes to the variables in the virtual memory of the system processes as they run? So what exactly is Apple signing and making immutable? The code pages of the system processes? Why can't we just tweak the values of the in-memory variables of these processes? I can't see how Apple could predict what the values of the in-memory variables of the system processes will be.

So the idea is as follows. Call mmap or sbrk for a given system process to allocate more data segment pages for it. Then, change the function pointers in the in-memory variables in the system processes to call into the functions in our newly-allocated page (we're sort of hacking the binaries here, we would need to write assembly code here), which we then wire down because it's not in the dyld_shared_cache. I thought Apple would only go to the dyld_shared_cache if we are paging out, so it seems like these hacky additional pages would (hopefully) be immune from this issue.

James Pedersen

@opa334
Owner Author

opa334 commented Apr 14, 2024

You can't "change function pointers", the entire dyld shared cache is mapped r-- so if you want to replace instructions (which you do when hooking a function, as you replace it with a branch to your code), you need to map a different page on top of it.
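
To make "replace it with a branch to your code" concrete: an A64 unconditional branch (`B`) is a single 4-byte instruction whose low 26 bits hold a signed word offset to the target. The sketch below only encodes such an instruction. The addresses are made up, `encode_branch` is a hypothetical helper, and an actual hooking library may well use a different or longer instruction sequence.

```python
import struct

B_OPCODE = 0x14000000  # A64 unconditional branch (B); imm26 lives in the low bits
B_RANGE = 1 << 27      # reachable distance in bytes: +/-128 MiB


def encode_branch(src, dst):
    """Return the 4 little-endian bytes of `B dst` when placed at address `src`."""
    offset = dst - src
    if offset % 4 != 0:
        raise ValueError("A64 instructions are 4-byte aligned")
    if not -B_RANGE <= offset < B_RANGE:
        raise ValueError("target out of range for a single B instruction")
    imm26 = (offset >> 2) & 0x03FFFFFF  # signed word offset in two's complement
    return struct.pack("<I", B_OPCODE | imm26)


if __name__ == "__main__":
    # Hypothetical addresses: hooked function at 0x1000, replacement at 0x1008.
    print(encode_branch(0x1000, 0x1008).hex())
    print(encode_branch(0x1008, 0x1000).hex())
```

Writing these 4 bytes over the first instruction of a function is what modifies the page, and because the original page cannot be written to, the write has to go to a different page mapped on top of it.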

I did get one idea from this though: Maybe it's possible to map in a page from the shared cache directly and maybe then you can wire it down, I will try that soon.

@watcher00090

@opa334 But why can't you change function pointers? Aren't they computed at runtime?

@opa334
Owner Author

opa334 commented Apr 16, 2024

Because there are no function pointers; what you refer to are direct branches.

@watcher00090

What about the following idea?

Clone the page from dyld_shared_cache that you'd like to change, then change the names of the functions in the cloned page and make sure that the symbol table reflects the changed names, then edit the pages of the process we are tweaking so that it calls into the functions with new names in the cloned page, then wire down the cloned page.

@opa334
Owner Author

opa334 commented Apr 26, 2024

The cloned page is not backed by a file, so it's always going to be "wired". Also, the names of the functions do not matter; it's a direct branch from point A to point B. Anyhow, my earlier idea to mmap the dyld_shared_cache myself and wire it down does seem to work, so it might be that this issue will be fixed soon.

@Geofferey

Geofferey commented Jun 2, 2024

This is still an issue for me, but I am guessing you are aware it still happens ;) I got my hands on an XR with 15.2 and found out about this curse the hard way. I’ve since tried everything from limiting my usage of tweaks to limiting my choice of tweaks and using Choicy to limit what the tweaks can interact with. At best I’ve gotten about two days of uptime. If there is anything I can help with, like providing my next panic-full log, just say the word. I did verify it is a true spinlock panic by peeking at the aforementioned log file after the crash.

{"bug_type":"210","timestamp":"2024-06-01 21:54:16.00 -0700","os_version":"iPhone OS 15.2 (19C56)","incident_id":"6DA1DCED-1DAE-4FA4-813F-172E7711BE30"}
{
  "build" : "iPhone OS 15.2 (19C56)",
  "product" : "iPhone11,8",
  "kernel" : "Darwin Kernel Version 21.2.0: Sun Nov 28 20:43:38 PST 2021; root:xnu-8019.62.2~1\/RELEASE_ARM64_T8020",
  "incident" : "6DA1DCED-1DAE-4FA4-813F-172E7711BE30",
  "crashReporterKey" : "7382076c32e341530e5d3decbf3054d41f6d6783",
  "date" : "2024-06-01 21:54:16.29 -0700",
  "panicString" : "panic(cpu 1 caller 0xfffffff0285e9408): Spinlock[0xfffffff046f0a6d4] timeout after 12595003 ticks; current state: 0x732e984af0fffff0, start time: 314350564125, now: 314363159128, timeout: 12582912 @locks.c:723\nDebugger message:
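
The numbers in the panic string can be cross-checked with a few lines of arithmetic. The sketch below parses the quoted panic string and compares the counters. The conversion to seconds rests on an assumption that is not in the log: a 24 MHz timebase. The helper `parse_spinlock_panic` is made up for this check.

```python
import re

PANIC = ("Spinlock[0xfffffff046f0a6d4] timeout after 12595003 ticks; "
         "current state: 0x732e984af0fffff0, start time: 314350564125, "
         "now: 314363159128, timeout: 12582912 @locks.c:723")

TIMEBASE_HZ = 24_000_000  # ASSUMPTION: 24 MHz timebase; the log does not state it


def parse_spinlock_panic(text):
    """Extract the tick counters from a spinlock timeout panic string."""
    m = re.search(
        r"timeout after (\d+) ticks;.*?start time: (\d+), now: (\d+), timeout: (\d+)",
        text,
    )
    if m is None:
        raise ValueError("not a spinlock timeout panic")
    waited, start, now, timeout = map(int, m.groups())
    return {"waited": waited, "start": start, "now": now, "timeout": timeout}


if __name__ == "__main__":
    info = parse_spinlock_panic(PANIC)
    # Does "now - start time" match the reported number of ticks waited?
    print("elapsed matches waited:", info["now"] - info["start"] == info["waited"])
    # Is the threshold a round power-of-two multiple?
    print("timeout is 12 * 2**20:", info["timeout"] == 12 * 2**20)
    # Only meaningful if the timebase assumption holds.
    print("timeout in seconds (assumed timebase):", info["timeout"] / TIMEBASE_HZ)
```

The waiter gave up shortly after the threshold was crossed, which is what one would expect from a lock that is never going to be released rather than one that was merely slow.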

@ljcool2006

Here's my panic log:
panic-full-2024-06-13-182234.000.ips.txt

@OnesuchDev

OnesuchDev commented Jul 10, 2024

@opa334 I saw that you posted elsewhere that the idea to wire down the dyld shared cache page didn't work. That's unfortunate. Did you also try to wire down the cloned page (both alone, and along with the dyld shared cache page)? You said above that it is always going to be wired because it's not backed by a file, but iOS uses memory compression (like zram) so that is not necessarily true. https://developer.apple.com/videos/play/wwdc2018/416/

@opa334
Owner Author

opa334 commented Jul 11, 2024

The idea of wiring down the shared cache did work (at least no tester had a spinlock panic with it), but it wasted so much RAM that on devices with less than 3GB RAM, things would go haywire and even on 3GB RAM devices stuff would still break after a few days of usage.

@OnesuchDev

OnesuchDev commented Jul 25, 2024 via email

@opa334
Owner Author

opa334 commented Jul 25, 2024

No, I was wiring down the entire shared cache, since it's very hard to keep track of which pages have been modified. I tried wiring down only the modified ones, but spinlock panics were still occurring.

@eglacias

eglacias commented Sep 7, 2024

Hey there, I never actually got a positive confirmation that this is only an issue on iOS 15. In other words, I never got a clarification that this only affects arm64e and only on iOS 15. You said it was fixed with “16”, but I don’t know what you meant by “16”, as you gave the number but didn’t say that you meant iOS 16!

By the way, I am on iOS 16.6.1 on an 8+ and I’ve never had a jailbreak this stable. I never get any resprings ever (!) and certainly no reboots. I’ve had uptime of up to 20 days before having to reboot myself for some other reason.
On all other jailbreaks since iOS 14, Taurine or unc0ver, certain unknown events were causing resprings for no consistent rhyme or reason, at least once a day.

@opa334 opa334 changed the title from "Random Reboots (Spinlock Timeout Panic)" to "Random Reboots (Spinlock Timeout Panic) on iOS 15 arm64e" on Sep 22, 2024