
feat(ICP-Ledger): FI-1433: Implement V1 for ICP ledger - add ability to read from memory manager in post_upgrade #1020

Open
wants to merge 98 commits into master
Conversation

maciejdfinity (Contributor)

No description provided.

@maciejdfinity maciejdfinity changed the title Maciej icp mem mgr feat(ICP-Ledger): add ability to read from memory manager in post_upgrade Aug 20, 2024
@github-actions github-actions bot added the feat label Aug 20, 2024
@maciejdfinity maciejdfinity changed the title feat(ICP-Ledger): add ability to read from memory manager in post_upgrade feat(ICP-Ledger): FI-1433: add ability to read from memory manager in post_upgrade Aug 20, 2024
@maciejdfinity maciejdfinity changed the title feat(ICP-Ledger): FI-1433: add ability to read from memory manager in post_upgrade feat(ICP-Ledger): FI-1433: V1: add ability to read from memory manager in post_upgrade Oct 1, 2024
@maciejdfinity maciejdfinity changed the title feat(ICP-Ledger): FI-1433: V1: add ability to read from memory manager in post_upgrade feat(ICP-Ledger): FI-1433: Implement V1 for ICP ledger - add ability to read from memory manager in post_upgrade Oct 1, 2024
// In order to read the first bytes we need to use ic_cdk:
// dfn_core assumes the first 4 bytes store the stable memory length
// and returns bytes starting from the 5th byte.
let mut magic_bytes_reader = ic_cdk::api::stable::StableReader::default();
Contributor:
To avoid triggering this code unintentionally during a future upgrade, maybe we should put the code that touches memory behind a boolean that is set in the upgrade args?

Member:

Which code do you mean? For V1, the post-upgrade needs to be able to read stable memory written using StableWriter from dfn_core, or written using the MemoryManager. This code reads the first three bytes to detect which method was used, then reads the data, deserializes it, and initializes the ledger state on the heap. The code here needs to run in all upgrades until we have upgraded successfully to V3 (which will have migrated the allowances to stable structures); at that point we won't support reading data written using StableWriter anymore, and we can remove that part of the code.

Contributor:

I just meant that the code should be protected by a conditional like

if upgrade_args.run_v1_phase {
    ...
}

This way we avoid unintentionally running the part that messes with memory. We have had instances in the past where we executed code we did not mean to execute because of this kind of behaviour.
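A runnable sketch of the gating the reviewer proposes. The struct and field names are illustrative, not taken from the PR; the actual upgrade-args type would come from the ledger's Candid interface:

```rust
// Hypothetical upgrade argument gating the V1 migration path; the field
// name `run_v1_phase` is illustrative, not from the PR.
#[derive(Default)]
struct UpgradeArgs {
    run_v1_phase: bool,
}

// Returns whether the stable-memory-touching code would run, so the gate
// can be exercised in isolation.
fn should_run_v1_phase(args: &UpgradeArgs) -> bool {
    args.run_v1_phase
}

fn main() {
    // By default the migration code is skipped...
    assert!(!should_run_v1_phase(&UpgradeArgs::default()));
    // ...and it only runs when explicitly requested in the upgrade args.
    assert!(should_run_v1_phase(&UpgradeArgs { run_v1_phase: true }));
    println!("ok");
}
```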

Member:

I see. We did discuss different options for handling the different versions during the migration, including upgrade arguments, feature flags, feature branches, etc. The agreed-upon approach was to make small PRs for each version, only commit them to master when we are ready to release, and, once merged, qualify a release and make an upgrade proposal. Only after the changes from one version have been deployed on mainnet would we start committing changes for the subsequent version. After merging V1, I don't see a situation where we wouldn't want this code to run. This new code also doesn't change anything in memory; it only adds support for reading stable memory written using two different approaches.
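The detection described here can be sketched as follows. This is a simplified, self-contained version that reads from any `std::io::Read` source instead of `ic_cdk`'s `StableReader`; the function name is illustrative:

```rust
use std::io::Read;

// The memory manager writes the magic bytes "MGR" (followed by a 1-byte
// layout version) at the start of stable memory; dfn_core's StableWriter
// instead starts with a little-endian u32 length.
const MAGIC_BYTES: &[u8; 3] = b"MGR";

// Returns true if the first three bytes match the memory manager's magic
// bytes; false if they differ or fewer than three bytes could be read
// (e.g. empty stable memory).
fn memory_manager_found<R: Read>(reader: &mut R) -> bool {
    let mut first_bytes = [0u8; 3];
    match reader.read_exact(&mut first_bytes) {
        Ok(()) => first_bytes == *MAGIC_BYTES,
        Err(_) => false,
    }
}

fn main() {
    // Stable memory written by the memory manager: "MGR" + layout version.
    assert!(memory_manager_found(&mut &b"MGR\x01"[..]));
    // Stable memory written by StableWriter: a length prefix instead.
    assert!(!memory_manager_found(&mut &[0x10u8, 0x00, 0x00, 0x00][..]));
    // Empty stable memory: read_exact fails, so no memory manager.
    assert!(!memory_manager_found(&mut &b""[..]));
    println!("ok");
}
```

After this check, the post-upgrade can branch into the MemoryManager-based reading path or the legacy StableWriter path, as the comment above describes.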

const MAGIC_BYTES: &[u8; 3] = b"MGR";
let mut first_bytes = [0u8; 3];
let memory_manager_found = match magic_bytes_reader.read_exact(&mut first_bytes) {
    Ok(_) => first_bytes == *MAGIC_BYTES,
Contributor:

Could it be that dfn_core also writes "MGR" as the first three bytes, because the binary encoding of the stable memory length happens to start with "MGR"?

Member:

Good point! In addition to the magic bytes "MGR", the memory manager also writes the layout version, which is 1u8. So the first four bytes would be [77, 71, 82, 1], and this would correspond to a stable memory size of 25325943 bytes (StableWriter writes the length u32 as le_bytes). Looking at the heap memory usage of the ledger canisters, they all use an even number of bytes. Based only on this, and without having looked into the details of how the serialization is done (whether it could end up being an odd number of bytes), I would say it is very unlikely that dfn_core would end up writing the same four bytes as the memory manager, WDYT?

Contributor:

this would correspond to a stable memory size of 25325943 bytes

I've got a different memory size:

>>> int.from_bytes(b"MGR" + bytes([1]), "little")
22169421

I agree that it is unlikely to hit exactly this value, but I'm also not sure how to rule it out (I'm not sure that CBOR imposes any byte alignment, so it could be a coincidence that the sizes are all even).
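The contributor's Python check above can be reproduced in Rust, matching dfn_core's little-endian interpretation of the length prefix:

```rust
fn main() {
    // dfn_core interprets the first four bytes of stable memory as a
    // little-endian u32 length, so "MGR" + layout version 1 decodes as:
    let len = u32::from_le_bytes(*b"MGR\x01");
    assert_eq!(len, 22_169_421);
    println!("{len}"); // prints 22169421
}
```

So the collision case is a serialized ledger state of exactly 22169421 bytes, not 25325943 as stated earlier in the thread.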

Contributor:

it could be a coincidence that the sizes are all even

Nevertheless, maybe an acceptable risk to take.

Member:

I did some further local testing where I wrote b"MGR" + bytes([1]) to the first bytes of raw stable memory in the pre-upgrade without the memory manager, so that in the V1 post-upgrade the code would detect the memory manager. In this case, when subsequently trying to deserialize the state, expecting it to have been written using the memory manager, the canister trapped. A subsequent downgrade back to the V0 mainnet version was successful, so even if it happened that V0 wrote b"MGR" + bytes([1]) as the stable memory size, it is very likely that the subsequent memory manager initialization would fail. This would still be an issue for SNS ICRC ledgers, in the sense that they would require a fixed version to be able to upgrade further, but at least there wouldn't be any data loss/corruption. This is all extremely unlikely to happen anyway, so as you said, it may be an acceptable risk to take.

Contributor:

A subsequent downgrade back to the V0 mainnet version was successful

I guess this is because the actual size of the stable memory that was read was less than

>>> int.from_bytes(b"MGR" + bytes([1]), "little")
22169421

(otherwise dfn_core would likely fail)

Contributor:

On the other hand, in practice, the actual size would not be overridden by b"MGR" + bytes([1]), so the downgrade would actually be possible.

Member:

If this were a legitimate scenario, b"MGR" + bytes([1]) would be the actual stable memory size, so we should be able to read all of it correctly in a downgrade. In a scenario where we (accidentally or on purpose) set the length to that value, but the actual serialized ledger state is in fact larger, then, as you pointed out, dfn_core would likely fail.

if !downgrade_to_mainnet_should_succeed {
    panic!("Downgrade from memory manager directly to mainnet should fail (since mainnet is V0)!")
}
}
Contributor:

Now we're back at the mainnet version, and in the next step we will go up to the current version again instead of down to it.

Member:

I reread the test, and IIUC, the flow is as follows:

  1. Install the ledger canister with the mainnet version ledger_wasm_mainnet (which is currently V0 for ICP, V1 for the ICRC ledger) (line 2618)
  2. (Apply transactions)
  3. Upgrade the ledger twice with ledger_wasm_current, which is V1 for both ledgers (lines 2651, 2653)
  4. If specified, upgrade the ledger twice to ledger_wasm_nextmigrationversionmemorymanager, which is V2 for both ledgers (built with a feature flag used only for testing) (lines 2658, 2660)
  5. Downgrade from V2 to the mainnet version ledger_wasm_mainnet (V0 for ICP, V1 for ICRC) (line 2663).
    1. In case of the ICP ledger, this downgrade shall fail, since downgrading from V2 to V0 is not supported (writing with the memory manager, reading only from raw stable memory)
    2. In case of the ICRC ledger, this downgrade succeeds, since it is now only downgrading from V2 to V1, and V1 can read data written using the memory manager.

Does this match your understanding? Are there changes needed?

Contributor:

In case of the ICRC ledger, this downgrade succeeds, since it is now only downgrading from V2 to V1,

This is based on the assumption that mainnet = current, right? In the general case, the test should do V0 -> V1 -> V1 (-> V2 -> V2 -> V1, if applicable) -> V0, but the code actually does V0 -> V1 -> V1 -> V2 -> V2 -> V0 -> V1 -> V0 (meaning that V2 -> V1 is untested). If V0 = V1 (as you seem to point out), then there's no difference, but in the general case we seem to be lacking some coverage.

Member:

this is based on the assumption that mainnet = current, right?

For the ICRC ledger this is now the case, yes!

Thanks for spelling it out with the example! The original test was doing V0 -> V1 -> V1 -> V2 -> V2 -> V0 (this would fail, so the ledger would still be at V2) -> V1 -> V0, which would cover all the cases. Now, since mainnet ICRC is at V1, we are in fact missing the V2 -> V1 case for the ICRC ledger - I'll add this. In a separate PR, I will add fixed versions of the ledgers that are deployed on mainnet, so that if the versions currently in mainnet-canisters.bzl are updated, the test doesn't need to be adapted and no new test-coverage gaps are introduced.

Contributor:

if the versions currently in mainnet-canisters.bzl are updated, the test doesn't need to be adapted/no new test coverage gaps are introduced

I don't see how this would make any difference
