Potential deadlock between RequestRoute and LightningChannel commitment state machine #9060
bjohnson5 asked this question in Troubleshooting (unanswered)
Replies: 1 comment
Update: This appears to only be an issue if the graph cache is turned off.
I believe I have discovered a potential deadlock situation, but I am relatively new to LND and wanted to discuss it before opening an issue, to be sure that I am not missing something. This is when using the bbolt backend database.
In `lnwallet/channel.go` the `LightningChannel` struct defines several methods that the comments explain as the "state machine which corresponds to the current commitment protocol wire spec". These methods are: `SignNextCommitment`, `ReceiveNewCommitment`, `RevokeCurrentCommitment`, and `ReceiveRevocation`. Each of these will first lock the `LightningChannel`: `lc.lock()` and then they will typically attempt to update the channel db.

When updating the channel db, sometimes the database must be re-sized and re-mapped to memory using the `mmap` function in bbolt's `db.go` file. This function first attempts to lock the `mmaplock` mutex.

This is all fine, except that if one of the state machine functions is called while the node is trying to find a route, a deadlock could occur. The `RequestRoute` function in `payment_session.go` will get a routing graph from the db and this will acquire the `mmaplock` on the db (for good reason, it needs to be sure the db is not re-mapped while it is using it to find a route). It will eventually call functions of the `LightningChannel` struct in order to find bandwidth, balances, etc... It is possible that these functions are locked by one of the state machine methods and that state machine method could be stuck waiting on the `mmaplock`. For example:
Thread1: RequestRoute -> NewGraphSession() -> `mmaplock.lock()` ----------------------------------> p.pathFinder -> availableChanBandwidth -> attempts to call LC functions, blocks waiting on `lc.lock()`
Thread2: ReceiveRevocation -> `lc.lock()` -----------------------------------------> AdvanceCommitChainTail -> attempts to update db, blocks waiting on `mmaplock.lock()`
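To make the suspected lock-order inversion concrete, here is a minimal, standalone Go sketch of the interleaving above. It is not LND or bbolt code: `mmapLock`, `chanLock`, and `simulateInversion` are hypothetical stand-ins, and the sketch assumes the route-finding side holds the mmap lock as a reader while the remap needs it as a writer. It uses `TryLock` (Go 1.18+) in place of `Lock`, so it reports that each side would block rather than actually hanging.

```go
package main

import (
	"fmt"
	"sync"
)

// simulateInversion forces the interleaving from the trace and reports whether
// each side would block on its second lock. All names are illustrative
// stand-ins, not identifiers from LND or bbolt.
func simulateInversion() (routeBlocked, revokeBlocked bool) {
	var (
		mmapLock  sync.RWMutex   // stand-in for bbolt's mmaplock
		chanLock  sync.Mutex     // stand-in for the LightningChannel lock
		firstHeld sync.WaitGroup // barrier: both first locks are held
		attempted sync.WaitGroup // barrier: both second attempts are done
		done      sync.WaitGroup
	)
	firstHeld.Add(2)
	attempted.Add(2)
	done.Add(2)

	// "Thread1": route finding holds the mmap lock, then wants the channel lock.
	go func() {
		defer done.Done()
		mmapLock.RLock()
		defer mmapLock.RUnlock()
		firstHeld.Done()
		firstHeld.Wait()
		if chanLock.TryLock() {
			chanLock.Unlock()
		} else {
			routeBlocked = true // a real Lock() would wait here
		}
		attempted.Done()
		attempted.Wait() // keep holding the first lock until both have tried
	}()

	// "Thread2": the state machine holds the channel lock, then wants to remap.
	go func() {
		defer done.Done()
		chanLock.Lock()
		defer chanLock.Unlock()
		firstHeld.Done()
		firstHeld.Wait()
		if mmapLock.TryLock() {
			mmapLock.Unlock()
		} else {
			revokeBlocked = true // a real Lock() would wait here
		}
		attempted.Done()
		attempted.Wait()
	}()

	done.Wait()
	return routeBlocked, revokeBlocked
}

func main() {
	routeBlocked, revokeBlocked := simulateInversion()
	fmt.Println("route-finding side would block:", routeBlocked)
	fmt.Println("state-machine side would block:", revokeBlocked)
}
```

Under this forced interleaving both sides report that they would block, which is the cycle described above: with plain `Lock` calls neither goroutine could ever make progress. Whether LND can actually reach this interleaving depends on the real call paths, which this sketch does not model.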
If anyone has experience in this area, please let me know if this is all correct and if I should open an issue. Thanks.