Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] - Restarting node just got a LOT slower #5884

Open
TerminadaPool opened this issue Jun 25, 2024 · 10 comments
Open

[BUG] - Restarting node just got a LOT slower #5884

TerminadaPool opened this issue Jun 25, 2024 · 10 comments
Labels
Stale type: bug Something is not working

Comments

@TerminadaPool
Copy link

External

Area

  • Other: Node performance

Summary
Nodes very slow to restart with significant delay during "Pushing ledger state" from 22% to 44%.

Steps to reproduce
systemctl restart cardano-node

Expected behavior
Restart on a low power ARM machine used to take 8.5 minutes, now takes 36 minutes.

Restart on a more powerful AMD machine used to take 2 minutes, now takes 6 minutes.

All the extra time occurs when "Pushing ledger state" from 22% to 45%.

System info (please complete the following information):

  • OS Name: Debian

  • OS Version: 12 (Bookworm)

  • Node version (low power ARM machine)

cardano-node 8.9.3 - linux-aarch64 - ghc-9.8
git rev e7f5f3a

  • Node version (AMD machine)

cardano-node 8.9.3 - linux-x86_64 - ghc-9.8
git rev e7f5f3a

  • Both AMD and ARM versions compiled with:
    • ghc_version=9.8.2
    • cabal_version=3.10.2.1

Logs

  • Logs from low power ARM machine (equivalent processor to raspberry pi but with more RAM) running cardano-node version 8.9.3:
Jun 25 07:51:26 bp1 systemd[1]: Started cardano-node.service - Cardano Node.
... snip ...
Jun 25 07:51:29 bp1 cardano-node[783721]: [bp1:cardano.node.ChainDB:Info:11] [2024-06-24 21:51:29.36 UTC] Started opening Chain DB
Jun 25 07:51:29 bp1 cardano-node[783721]: [bp1:cardano.node.ChainDB:Info:11] [2024-06-24 21:51:29.36 UTC] Started opening Immutable DB
... snip ...
Jun 25 07:57:32 bp1 cardano-node[783721]: [bp1:cardano.node.ChainDB:Info:11] [2024-06-24 21:57:32.20 UTC] Opened lgr db
Jun 25 07:57:32 bp1 cardano-node[783721]: [bp1:cardano.node.ChainDB:Info:11] [2024-06-24 21:57:32.20 UTC] Started initial chain selection
Jun 25 07:57:33 bp1 cardano-node[783721]: [bp1:cardano.node.ChainDB:Info:11] [2024-06-24 21:57:33.70 UTC] Pushing ledger state for block b5b51f187630ebed4a3072f486a60b5a126241f1d99d563d2e42e02fcb97e31d at slot 127654809. Progress: 0.00%
Jun 25 07:58:36 bp1 cardano-node[783721]: [bp1:cardano.node.ChainDB:Info:11] [2024-06-24 21:58:36.66 UTC] before next, messages elided = 127654824
Jun 25 07:58:36 bp1 cardano-node[783721]: [bp1:cardano.node.ChainDB:Info:11] [2024-06-24 21:58:36.66 UTC] Pushing ledger state for block 009c88b1b75321ae217bcb8bd38ffcb0b51e458f9cdd4ce6abde0d399a41efbd at slot 127664808. Progress: 22.36%
... snip ...
Jun 25 08:24:11 bp1 cardano-node[783721]: [bp1:cardano.node.ChainDB:Info:11] [2024-06-24 22:24:11.57 UTC] before next, messages elided = 127664916
Jun 25 08:24:11 bp1 cardano-node[783721]: [bp1:cardano.node.ChainDB:Info:11] [2024-06-24 22:24:11.57 UTC] Pushing ledger state for block 731dde558c6ecb3bbd3e313147a3a4a19461ac445afba0d57d590b8a8ca81151 at slot 127674899. Progress: 44.92%
... snip ...
Jun 25 08:25:27 bp1 cardano-node[783721]: [bp1:cardano.node.ChainDB:Info:11] [2024-06-24 22:25:27.79 UTC] before next, messages elided = 127674927
Jun 25 08:25:27 bp1 cardano-node[783721]: [bp1:cardano.node.ChainDB:Info:11] [2024-06-24 22:25:27.79 UTC] Pushing ledger state for block d471b680b20f95be52bb10667bfecf84bf0beaee616ce3ea4a2db461909d304c at slot 127684905. Progress: 67.29%
... snip ...
Jun 25 08:26:51 bp1 cardano-node[783721]: [bp1:cardano.node.ChainDB:Info:11] [2024-06-24 22:26:51.68 UTC] before next, messages elided = 127684986
Jun 25 08:26:51 bp1 cardano-node[783721]: [bp1:cardano.node.ChainDB:Info:11] [2024-06-24 22:26:51.68 UTC] Pushing ledger state for block 2b2f68b7d1772fa157bca6ef1b53753464864dc13e06cc17b4a8045202e8328d at slot 127694979. Progress: 89.81%
... snip ...
Jun 25 08:27:25 bp1 cardano-node[783721]: [bp1:cardano.node.ChainDB:Info:11] [2024-06-24 22:27:25.94 UTC] before next, messages elided = 127695028
Jun 25 08:27:25 bp1 cardano-node[783721]: [bp1:cardano.node.ChainDB:Info:11] [2024-06-24 22:27:25.94 UTC] Pushing ledger state for block 159d6126d2a2cbfe76cafe5d8f244d96e5f8a1ae3e426b4fd946f5be93448219 at slot 127699537. Progress: 100.00%
Jun 25 08:27:25 bp1 cardano-node[783721]: [bp1:cardano.node.ChainDB:Info:11] [2024-06-24 22:27:25.94 UTC] Valid candidate at tip b5b51f187630ebed4a3072f486a60b5a126241f1d99d563d2e42e02fcb97e31d at slot 127654809
Jun 25 08:27:25 bp1 cardano-node[783721]: [bp1:cardano.node.ChainDB:Info:11] [2024-06-24 22:27:25.94 UTC] Initial chain selected
Jun 25 08:27:25 bp1 cardano-node[783721]: [bp1:cardano.node.ChainDB:Info:11] [2024-06-24 22:27:25.94 UTC] Opened db with immutable tip at e11fd98ac6b0a6fc4a504b30024bc9c75ea1111225bd9eb0cfb925c0b2aff27e at slot 127654750 and tip 159d6126d

Note that it took 7 minutes to get to "Pushing ledger state" progress 22% which is typical for this machine from past experience. However it then took over 25 minutes to progress "Pushing ledger state" from 22% to 44%. Normally this stage takes only around 1 to 1.5 minutes. It then took 1.5 minutes to Push the ledger state from 45% to 67% and another 1.5 minutes from 67% to 90% both of which is typical for this slow ARM machine compared to the past.

Logs from more powerful AMD machine running cardano-node version 8.9.3:

Jun 25 08:27:21 relay1 systemd[1]: Started cardano-node.service - Cardano Node.
... snip ...
Jun 25 08:28:39 relay1 cardano-node[183103]: [relay1:cardano.node.ChainDB:Info:9] [2024-06-24 22:28:39.58 UTC] Opened lgr db
Jun 25 08:28:39 relay1 cardano-node[183103]: [relay1:cardano.node.ChainDB:Info:9] [2024-06-24 22:28:39.58 UTC] Started initial chain selection
Jun 25 08:28:39 relay1 cardano-node[183103]: [relay1:cardano.node.ChainDB:Info:9] [2024-06-24 22:28:39.92 UTC] Pushing ledger state for block bc48fc99af50edfd1f0443d1c735743797d11c511d2ca36d1fb375c0b164b33e at slot 127657338. Progress: 0.00%
Jun 25 08:28:54 relay1 cardano-node[183103]: [relay1:cardano.node.ChainDB:Info:9] [2024-06-24 22:28:54.80 UTC] before next, messages elided = 127657363
Jun 25 08:28:54 relay1 cardano-node[183103]: [relay1:cardano.node.ChainDB:Info:9] [2024-06-24 22:28:54.80 UTC] Pushing ledger state for block be03bf8d2c52b0f81ea0f3496af44c9462262d0b003d632f742c69cb6b7ee2db at slot 127667331. Progress: 22.52%
... snip ...
Jun 25 08:32:58 relay1 cardano-node[183103]: [relay1:cardano.node.ChainDB:Info:9] [2024-06-24 22:32:58.36 UTC] before next, messages elided = 127667424
Jun 25 08:32:58 relay1 cardano-node[183103]: [relay1:cardano.node.ChainDB:Info:9] [2024-06-24 22:32:58.36 UTC] Pushing ledger state for block 489b5d6a0914bed294f47750e38f5b7ce6d118e804bd505ffcb8be3136a010ff at slot 127677401. Progress: 45.22%
... snip ...
Jun 25 08:33:16 relay1 cardano-node[183103]: [relay1:cardano.node.ChainDB:Info:9] [2024-06-24 22:33:16.05 UTC] before next, messages elided = 127677490
Jun 25 08:33:16 relay1 cardano-node[183103]: [relay1:cardano.node.ChainDB:Info:9] [2024-06-24 22:33:16.06 UTC] Pushing ledger state for block 2852062d7f2186f1dd8556b78639993b2a46868ca2ea1067f9d8751e06fd9d25 at slot 127687464. Progress: 67.90%
... snip ...
Jun 25 08:33:33 relay1 cardano-node[183103]: [relay1:cardano.node.ChainDB:Info:9] [2024-06-24 22:33:33.02 UTC] before next, messages elided = 127687563
Jun 25 08:33:33 relay1 cardano-node[183103]: [relay1:cardano.node.ChainDB:Info:9] [2024-06-24 22:33:33.02 UTC] Pushing ledger state for block 0b7b08976bcb3a8ece7da3bb95a3816d7394d456917b5983cf2077d54e8409ae at slot 127697558. Progress: 90.65%
... snip ...
Jun 25 08:33:40 relay1 cardano-node[183103]: [relay1:cardano.node.ChainDB:Info:9] [2024-06-24 22:33:40.25 UTC] before next, messages elided = 127697621
Jun 25 08:33:40 relay1 cardano-node[183103]: [relay1:cardano.node.ChainDB:Info:9] [2024-06-24 22:33:40.25 UTC] Pushing ledger state for block 9d2e8da71a8732120e7aa2ec1b4facf731789643484c23e7ac9cda0864598aad at slot 127701707. Progress: 100.00%
Jun 25 08:33:40 relay1 cardano-node[183103]: [relay1:cardano.node.ChainDB:Info:9] [2024-06-24 22:33:40.25 UTC] Valid candidate at tip bc48fc99af50edfd1f0443d1c735743797d11c511d2ca36d1fb375c0b164b33e at slot 127657338
Jun 25 08:33:40 relay1 cardano-node[183103]: [relay1:cardano.node.ChainDB:Info:9] [2024-06-24 22:33:40.25 UTC] Initial chain selected
Jun 25 08:33:40 relay1 cardano-node[183103]: [relay1:cardano.node.ChainDB:Info:9] [2024-06-24 22:33:40.25 UTC] Opened db with immutable tip at d25ceb3b1a757d5f874c62d0f93526722c5609ede55b31c9c5ad82e50f089bf7 at slot 127657337 and tip 9d2e8da71

The AMD machine is more powerful but it has similarly seen a significant increase in restart time with all the time increase occurring when pushing the ledger state from 22% to 45%.

From restart this machine took 1.5 minutes to get to "Pushing ledger state" progress 22%. But then to progress from 22% to 45% took 4 minutes. While the progress from 45% to 67% and 67% to 90% took only 18 seconds. In the past it only took around 15-20 seconds to progress the ledger state from 22% to 45% on this machine.

Something in the ledger is now making the nodes take a lot longer to push their ledger state. I also checked a relay running version 8.9.2 (similarly compiled with ghc 9.8.2) and it also now takes much longer to restart with all the extra time occurring when pushing the ledger state from 22% to 45%.

Additional context
Something has changed in the ledger which is causing the node software to do a lot more processing. Maybe there is some particular transaction causing a problem??? Maybe there is a bug???

@TerminadaPool TerminadaPool added the type: bug Something is not working label Jun 25, 2024
@btbf
Copy link

btbf commented Jun 25, 2024

A similar issue is occurring on my server.
It used to sync in around 4 minutes, but I just restarted it and it's stuck at 60% even after 15 minutes.

OS: Ubuntu
cardano-node: 8.9.2
CPU: 14core

@TerminadaPool
Copy link
Author

Things are getting much worse.

Restarting the node used to take 8.5 minutes on my ARM machine but it is now taking over 2hrs!!! (It takes over 1hr "Pushing ledger state" from 68% to 90% and then nearly another hour "Pushing ledger state" from 90% to 100%.)

One of my AMD machines that used to take 2.5 minutes to restart just took 35 minutes!!!

I am now quite fearful to restart my node running on a Contabo vm because this might take a couple of hours.

Any ideas what is going on???

@HeptaSean
Copy link

HeptaSean commented Jun 25, 2024

https://cardanoscan.io/address/addr1q9lywtk836axkaf985822lh033aj5l6nauau8kg7fhdkze9hmc43c96exwsre5xktq4td5h2mzfjmayhtuk44ryy4uas72wh4t and two other addresses push senseless transactions with 194 withdrawal validators each every minute or so causing all blocks to be almost full and all nodes to constantly have to evaluate these validators (that withdraw 0 ADA from unused stake addresses).

EDIT: And it started almost exactly 24 hours ago with https://cardanoscan.io/transaction/ab02ef16b0d863bfe4bf9f488873c967adfd461eb774c27689ac011227ea4f9f, so that fits your observation 13 hours ago.

@rayanebel
Copy link

rayanebel commented Jun 25, 2024

Hello here,

Same thing on my side and I was not able to understand the reason but, event starting from a fresh snapshot, it took ~40 min to my node (producer & relayer) to process ledger state

Replaying ledger from snapshot at 345c904e5201ec9ef9d21dfdb060144b8055201bedc820e84351dfe76c6e68f8 at slot 127704286
Replayed block: slot 127704296 out of 127722624. Progress: 0.05%
Replayed block: slot 127720769 out of 127722624. Progress: 89.88%
Opened lgr db
Started initial chain selection
Pushing ledger state for block c326b4051567c51c7b16415e0e87a6ca622225eec3ed7597b4d0dc4402037d79 at slot 127722628. Progress: 0.00%
Pushing ledger state for block 5efd56b57a8e97c334fa0b72bc5a657b50c447a6e54bbc1a0f50ed7ce3774324 at slot 127732616. Progress: 19.67%
Pushing ledger state for block 665d46763ccff34f3e395d4a0ff46b63a5d61417a65260eb1eae819dad710790 at slot 127732648. Progress: 19.74%
Pushing ledger state for block b4bede6f51369880235eee856e0302bfd6696a16a6a441629d694f2a6d533675 at slot 127732667. Progress: 19.77%

I've also discovered that our node is loading a very old ledger state on a old slot (~19h ago). I thought the node took a new snapshot every 72 minutes. am i wrong ?

We are running our cardano instance with 4CPU and 32GB of RAM with an NVME disk of 250GB.

version: 8.9.3

@rayanebel
Copy link

Hi,

How can we easily check if we can safely restart our nodes ?

@HeptaSean How did you find this transaction quickly ? did you check each transactions for the last 24h ?

@TerminadaPool
Copy link
Author

I just restarted my slow ARM node and it started in 11 minutes which is not unusual and certainly a lot better than over 2hrs like it was taking yesterday. The best I normally see when restarting this ARM node is around 8.5 minutes.

@HeptaSean
Copy link

@HeptaSean How did you find this transaction quickly ? did you check each transactions for the last 24h ?

Someone in the IOG Discord saw the huge transactions on eutxo.org and traced them to that address, those addresses. Then started looking at them in more detail.

Copy link

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 120 days.

@github-actions github-actions bot added the Stale label Jul 28, 2024
@TerminadaPool
Copy link
Author

The issue was fixed by "hotfix" releases 8.9.4 and 8.12.2.

@github-actions github-actions bot removed the Stale label Aug 4, 2024
Copy link

github-actions bot commented Sep 4, 2024

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 120 days.

@github-actions github-actions bot added the Stale label Sep 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Stale type: bug Something is not working
Projects
None yet
Development

No branches or pull requests

4 participants