bcachefs: fix initial page state after falloc #249

dlrobertson · 2021-06-09T02:40:16Z

On a new page reservation check if the backing extent was already
allocated before adding to the number of sectors we will allocate.
It is possible to write to a page that is beyond the inode i_size
that is also backed by an allocated extent. This can happen when
fallocate is used with FALLOC_FL_KEEP_SIZE.

Signed-off-by: Dan Robertson <[email protected]>
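The scenario in the description can be reproduced from userspace roughly as follows. This is a minimal sketch, not the test that exposed the bug: the helper name `write_past_eof_after_falloc` and its parameters are made up for illustration, and it only relies on the standard `fallocate(2)`, `pwrite(2)` and `fstat(2)` calls.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: preallocate `len` bytes with FALLOC_FL_KEEP_SIZE,
 * then do a buffered write into the preallocated range.
 *
 * Returns 0 on success and -1 if any call fails (for example when the
 * filesystem does not support fallocate). On success, *size_before is
 * st_size right after fallocate and *size_after is st_size after the write.
 */
int write_past_eof_after_falloc(const char *path, off_t len,
                                off_t *size_before, off_t *size_after)
{
    struct stat st;
    char buf[512] = { 0 };
    int ret = -1;
    int fd = open(path, O_CREAT | O_TRUNC | O_RDWR, 0600);

    if (fd < 0)
        return -1;

    /* Allocate extents past EOF; KEEP_SIZE leaves i_size at 0. */
    if (fallocate(fd, FALLOC_FL_KEEP_SIZE, 0, len) < 0)
        goto out;
    if (fstat(fd, &st) < 0)
        goto out;
    *size_before = st.st_size;

    /* The page written here is beyond i_size but already backed by an extent. */
    if (pwrite(fd, buf, sizeof(buf), 0) != (ssize_t)sizeof(buf))
        goto out;
    if (fstat(fd, &st) < 0)
        goto out;
    *size_after = st.st_size;
    ret = 0;
out:
    close(fd);
    unlink(path);
    return ret;
}
```

Comparing `st_blocks` before and after the write on the filesystem under test is one way to observe whether the already-allocated sectors are counted a second time.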

Change fsck code to always put btree iterators - also, make some flow control improvements to deal with lock restarts better, and refactor check_extents() to not walk extents twice for counting/checking i_sectors. Signed-off-by: Kent Overstreet <[email protected]>

…dvance The way btree iterators work internally has been changing, particularly with the iter->real_pos changes, and bch2_btree_iter_next() is no longer hyper optimized - it's just advance followed by peek, so it's more efficient to just call advance where we're not using the return value of bch2_btree_iter_next(). Signed-off-by: Kent Overstreet <[email protected]>

btree node iterators need to obey the regular btree node invariants w.r.t. iter->real_pos; once they do, bch2_btree_iter_traverse will have less that it needs to check. Signed-off-by: Kent Overstreet <[email protected]>

External (to the btree iterator code) users of bch2_btree_iter_traverse expect that on success the iterator will be pointed at iter->pos and have that position locked - but since we split iter->pos and iter->real_pos, that means it has to update iter->real_pos if necessary. Internal users don't expect it to modify iter->real_pos, so we need two separate functions. Signed-off-by: Kent Overstreet <[email protected]>

After the v5.12 rebase, we started oopsing when truncate was passed ATTR_MODE, due to not passing mnt_userns to setattr_copy(). This refactors things so that truncate/extend finish by using bch2_setattr_nonsize(), which solves the problem. Signed-off-by: Kent Overstreet <[email protected]>

- We no longer mark subsets of extents, they're marked like regular keys now - which means we can drop the offset & sectors arguments to trigger functions
- Drop other arguments that are no longer needed in various places - fs_usage
- Drop the logic for handling extents in bch2_mark_update() that isn't needed anymore, to match bch2_trans_mark_update()
- Better logic for handling the BTREE_ITER_CACHED_NOFILL case, where we don't have an old key to mark

Signed-off-by: Kent Overstreet <[email protected]>

dlrobertson · 2021-06-18T00:35:04Z

Updated and seems to pass xfstests.

We need to ensure that packed formats can't represent fields larger than the unpacked format, which is a bit tricky since the calculations can also overflow a u64. This patch fixes a shift and simplifies the overall calculations. Signed-off-by: Kent Overstreet <[email protected]>
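The kind of pitfall this commit message describes can be illustrated in isolation. The sketch below is not the code from the patch: `max_for_bits` and `packed_field_fits` are hypothetical names, and the check assumes a packed field is described by an offset plus a bit width. It shows two things that are easy to get wrong: shifting a 64-bit value by 64 is undefined in C, and `offset + max` can wrap around a u64.

```c
#include <stdint.h>

/* Largest value representable in `bits` bits, for 0 <= bits <= 64. */
static inline uint64_t max_for_bits(unsigned bits)
{
    /* 1 << 64 is undefined behaviour, so the full-width case is explicit. */
    return bits >= 64 ? UINT64_MAX : (UINT64_C(1) << bits) - 1;
}

/*
 * Hypothetical check: returns 1 if every value the packed field can encode
 * (field_offset + [0, 2^bits - 1]) is <= unpacked_max, 0 otherwise.
 * The comparison is arranged so that nothing is added, which avoids
 * wrapping around UINT64_MAX.
 */
static inline int packed_field_fits(uint64_t field_offset, unsigned bits,
                                    uint64_t unpacked_max)
{
    if (field_offset > unpacked_max)
        return 0;
    return max_for_bits(bits) <= unpacked_max - field_offset;
}
```

Rearranging the inequality into a subtraction is a common way to keep such range checks overflow-free; whether the actual patch does it this way is not stated in the commit message.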

When initializing the page state, set the page sectors state to that of what exists in the btree. This allows us to check if the backing extent was already allocated before adding to the inode i_blocks value. Signed-off-by: Dan Robertson <[email protected]>

dlrobertson · 2021-06-29T12:21:33Z

Updated to only run the btree check if the page is not uptodate. I kept the check in the page state creation to ensure this check isn't run often.

dlrobertson · 2021-07-10T23:42:19Z

I think I have an idea for a better implementation of this. We don't seem to call write_begin for a buffered write. I think if we add that, I might be able to work something out there so that we essentially run the extra code added to __bch2_page_state_create, which checks the btree for reserved sectors, in write_begin when needed.

koverstreet added 30 commits June 9, 2021 19:40

bcachefs: Kill reflink option

eedba9b

An option was added to control whether reflink support was on or off because for a long time, reflink + inline data extent support was missing - but that's since been fixed, so we can drop the option now. Signed-off-by: Kent Overstreet <[email protected]>

bcachefs: Fix a btree iterator leak

0000eb6

Signed-off-by: Kent Overstreet <[email protected]>

bcachefs: Kill btree_iter_pos_changed()

421c083

this is used in only one place now, so just inline it into the caller. Signed-off-by: Kent Overstreet <[email protected]>

bcachefs: Add a print statement for when we go read-write

68e2feb

Signed-off-by: Kent Overstreet <[email protected]>

bcachefs: Don't list non journal devs in journal_debug_to_text()

9ad857b

Signed-off-by: Kent Overstreet <[email protected]>

bcachefs: Fix btree iterator leak in extent_handle_overwrites()

058bb7a

Signed-off-by: Kent Overstreet <[email protected]>

bcachefs: btree_iter_set_dontneed()

049d47f

This is a bit clearer than using bch2_btree_iter_free(). Signed-off-by: Kent Overstreet <[email protected]>

bcachefs: Require all btree iterators to be freed

521bd0b

We keep running into occasional bugs with btree transaction iterators overflowing - this will make those bugs more visible. Signed-off-by: Kent Overstreet <[email protected]>

bcachefs: Assert that iterators aren't being double freed

012c691

Signed-off-by: Kent Overstreet <[email protected]>

bcachefs: Kill bkey ops->debugcheck method

f67ffc8

This code used to be used for running some assertions on alloc info at runtime, but it long predates fsck and hasn't been good for much in ages - we can delete it now. Signed-off-by: Kent Overstreet <[email protected]>

bcachefs: Don't overwrite snapshot field in bch2_cut_back()

a323cdd

Signed-off-by: Kent Overstreet <[email protected]>

bcachefs: Validate bset version field against sb version fields

931da4c

The superblock version fields need to be accurate to know whether a filesystem is supported, thus we should be verifying them. Signed-off-by: Kent Overstreet <[email protected]>

bcachefs: Don't unconditially version_upgrade in initialize

0edf1f2

This is mkfs's job. Also, clean up the handling of feature bits some. Signed-off-by: Kent Overstreet <[email protected]>

bcachefs: Fix iterator picking

2ea9ffb

comparison was wrong Signed-off-by: Kent Overstreet <[email protected]>

bcachefs: Optimize bch2_btree_iter_verify_level()

cc34d11

Signed-off-by: Kent Overstreet <[email protected]>

bcachefs: Switch extent_handle_overwrites() to one key at a time

428d3b4

Prep work for snapshots Signed-off-by: Kent Overstreet <[email protected]>

bcachefs: Get disk reservation when overwriting data in old snapshot

39cc509

Signed-off-by: Kent Overstreet <[email protected]>

bcachefs: Iterators are now always consistent with iter->real_pos

225f02f

This means bch2_btree_iter_traverse_one() can be made more efficient. Signed-off-by: Kent Overstreet <[email protected]>

bcachefs: Kill btree_iter_peek_uptodate()

ff0c3d1

Since we're no longer doing next() immediately followed by peek(), this optimization isn't doing anything anymore. Signed-off-by: Kent Overstreet <[email protected]>

bcachefs: Internal btree iterator renaming

85624e5

This just gives some internal helpers some better names. Signed-off-by: Kent Overstreet <[email protected]>

bcachefs: Improve iter->real_pos handling

e7c28d3

Signed-off-by: Kent Overstreet <[email protected]>

bcachefs: Consolidate bch2_btree_iter_peek() and peek_with_updates()

04ec8e2

Ideally we'll be getting rid of peek_with_updates(), but the callers will need to be checked. Signed-off-by: Kent Overstreet <[email protected]>

bcachefs: Update iter->real_pos lazily

2e33af1

peek() has to update iter->real_pos - there's no need for bch2_btree_iter_set_pos() to update it as well. Signed-off-by: Kent Overstreet <[email protected]>

bcachefs: Include snapshot field in bch2_bpos_to_text

bf1bf0e

More prep work for snapshots. Signed-off-by: Kent Overstreet <[email protected]>

bcachefs: Add an .invalid method for bch2_btree_ptr_v2

bde7a53

It was using the method for btree_ptr_v1, but that wasn't checking all the fields. Signed-off-by: Kent Overstreet <[email protected]>

bcachefs: Improve inode deletion code

07d8dda

It had some silly redundancies. Signed-off-by: Kent Overstreet <[email protected]>

koverstreet and others added 6 commits June 15, 2021 13:58

bcachefs: Don't disable preemption unnecessarily

bf09fd4

Small improvements to some percpu utility code. Signed-off-by: Kent Overstreet <[email protected]>

fixup! bcachefs: More topology repair code

154c7c7

Signed-off-by: Kent Overstreet <[email protected]>

fixup! bcachefs: More topology repair code

4cc1334

bcachefs: ensure iter->should_be_locked is set

1a54de9

Ensure that iter->should_be_locked value is set to true before we call bch2_trans_update in ec_stripe_update_ptrs. Signed-off-by: Dan Robertson <[email protected]>

dlrobertson force-pushed the fix-generic-422 branch from 1778607 to 6ac3b1f Compare June 18, 2021 00:31

dlrobertson changed the base branch from master to testing June 18, 2021 00:32

dlrobertson force-pushed the fix-generic-422 branch from 6ac3b1f to b755ab3 Compare June 18, 2021 00:34

koverstreet added 7 commits June 21, 2021 17:42

bcachefs: Don't ratelimit certain fsck errors

39ad803

It's unhelpful if we see "Halting mark and sweep to start topology repair" but we don't see the error that triggered it. Signed-off-by: Kent Overstreet <[email protected]>

bcachefs: Start journal reclaim thread earlier

775b36b

Especially in userspace, we sometimes run into resource exhaustion issues with starting up threads after mark and sweep/fsck. Signed-off-by: Kent Overstreet <[email protected]>

fixup! bcachefs: More topology repair code

0df0b04

bcachefs: Don't loop into topology repair

1054926

Signed-off-by: Kent Overstreet <[email protected]>

bcachefs: Fix btree_node_read_all_replicas() error handling

49e4c3b

We weren't checking bch2_btree_node_read_done() for errors, oops. Signed-off-by: Kent Overstreet <[email protected]>

fixup! bcachefs: Improve iter->should_be_locked

ca3cfad

koverstreet force-pushed the testing branch from 67e5efa to cec8dcc Compare June 24, 2021 17:27

dlrobertson added 3 commits June 24, 2021 13:33

bcachefs: statfs bfree and bavail should be the same

c7dea9c

The value of f_bfree and f_bavail should be the same. The value of f_bfree is not currently scaled by the availability factor. Signed-off-by: Dan Robertson <[email protected]>
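The two fields named above are visible from userspace through `statvfs(3)`. As a small sketch (the helper name `free_and_avail_blocks` is hypothetical), the values for a mounted filesystem can be read like this:

```c
#include <sys/statvfs.h>

/*
 * Hypothetical helper: read f_bfree and f_bavail for the filesystem
 * containing `path`. Returns 0 on success, -1 if statvfs() fails.
 */
int free_and_avail_blocks(const char *path,
                          unsigned long long *bfree,
                          unsigned long long *bavail)
{
    struct statvfs st;

    if (statvfs(path, &st) < 0)
        return -1;

    *bfree = st.f_bfree;    /* free blocks */
    *bavail = st.f_bavail;  /* free blocks available to unprivileged users */
    return 0;
}
```

Calling this on a bcachefs mount point before and after the change would be one way to see whether the two numbers now agree.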

bcachefs: Fix bch2_acl_chmod() cleanup on error

21578ad

Avoid calling kfree on the returned error pointer if bch2_acl_from_disk fails. Signed-off-by: Dan Robertson <[email protected]>
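The general shape of this class of bug can be shown in userspace. The sketch below is an illustration under stated assumptions, not the bcachefs code: `ERR_PTR`, `PTR_ERR` and `IS_ERR` are simplified stand-ins for the kernel helpers of the same names, `free()` stands in for `kfree()`, and `load_acl` and `chmod_like` are hypothetical functions.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified userspace stand-ins for the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095
static inline void *ERR_PTR(intptr_t err) { return (void *)err; }
static inline intptr_t PTR_ERR(const void *p) { return (intptr_t)p; }
static inline int IS_ERR(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical loader: returns an allocation or an error pointer. */
static void *load_acl(int fail)
{
    return fail ? ERR_PTR(-EINVAL) : malloc(16);
}

/* Hypothetical caller with a shared cleanup path. */
static int chmod_like(int fail)
{
    void *acl = load_acl(fail);
    int ret = 0;

    if (IS_ERR(acl)) {
        ret = (int)PTR_ERR(acl);
        acl = NULL;     /* an error pointer must never reach free() */
        goto out;
    }
    if (!acl) {
        ret = -ENOMEM;
        goto out;
    }
    /* ... use acl ... */
out:
    free(acl);          /* free(NULL) is a no-op */
    return ret;
}
```

Resetting the pointer to NULL before jumping to the shared cleanup label is one common way to keep a single exit path; returning early on error is another.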

dlrobertson force-pushed the fix-generic-422 branch from b755ab3 to 3fe6e97 Compare June 29, 2021 02:20

koverstreet force-pushed the testing branch from 21578ad to 54f9c5f Compare July 4, 2021 20:29

dlrobertson mentioned this pull request Jul 10, 2021

meta: get the quick group of xfstests (more or less) passing #285

Open

6 tasks

koverstreet force-pushed the testing branch from 54f9c5f to 15178a6 Compare July 15, 2021 20:25

koverstreet force-pushed the testing branch from 15178a6 to 215ae52 Compare January 11, 2022 02:04
