Relocation fixes #1427

Open
wants to merge 10,000 commits into
base: ci

josefbacik

No description provided.

torvalds and others added 30 commits September 4, 2024 08:37
…rg/pub/scm/linux/kernel/git/akpm/mm

Pull misc fixes from Andrew Morton:
 "17 hotfixes, 15 of which are cc:stable.

  Mostly MM, no identifiable theme.  And a few nilfs2 fixups"

* tag 'mm-hotfixes-stable-2024-09-03-20-19' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  alloc_tag: fix allocation tag reporting when CONFIG_MODULES=n
  mm: vmalloc: optimize vmap_lazy_nr arithmetic when purging each vmap_area
  mailmap: update entry for Jan Kuliga
  codetag: debug: mark codetags for poisoned page as empty
  mm/memcontrol: respect zswap.writeback setting from parent cg too
  scripts: fix gfp-translate after ___GFP_*_BITS conversion to an enum
  Revert "mm: skip CMA pages when they are not available"
  maple_tree: remove rcu_read_lock() from mt_validate()
  kexec_file: fix elfcorehdr digest exclusion when CONFIG_CRASH_HOTPLUG=y
  mm/slub: add check for s->flags in the alloc_tagging_slab_free_hook
  nilfs2: fix state management in error path of log writing function
  nilfs2: fix missing cleanup on rollforward recovery error
  nilfs2: protect references to superblock parameters exposed in sysfs
  userfaultfd: don't BUG_ON() if khugepaged yanks our page table
  userfaultfd: fix checks for huge PMDs
  mm: vmalloc: ensure vmap_block is initialised before adding to queue
  selftests: mm: fix build errors on armhf
…/kernel/git/deller/parisc-linux

Pull parisc architecture fix from Helge Deller:

 - Fix boot issue where boot memory is marked read-only too early

* tag 'parisc-for-6.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
  parisc: Delay write-protection until mark_rodata_ro() call
Fixes this missed case:

xe 0000:00:02.0: [drm] Missing outer runtime PM protection
WARNING: CPU: 99 PID: 1455 at drivers/gpu/drm/xe/xe_pm.c:564 xe_pm_runtime_get_noresume+0x48/0x60 [xe]
Call Trace:
<TASK>
? show_regs+0x67/0x70
? __warn+0x94/0x1b0
? xe_pm_runtime_get_noresume+0x48/0x60 [xe]
? report_bug+0x1b7/0x1d0
? handle_bug+0x46/0x80
? exc_invalid_op+0x19/0x70
? asm_exc_invalid_op+0x1b/0x20
? xe_pm_runtime_get_noresume+0x48/0x60 [xe]
xe_device_declare_wedged+0x91/0x280 [xe]
gt_reset_worker+0xa2/0x250 [xe]

v2: Also move get and get the right Fixes tag (Himal, Brost)

Fixes: fb74b20 ("drm/xe: Introduce a simple wedged state")
Cc: Himal Prasad Ghimiray <[email protected]>
Cc: Matthew Brost <[email protected]>
Reviewed-by: Jonathan Cavitt <[email protected]>
Reviewed-by: Himal Prasad Ghimiray <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Rodrigo Vivi <[email protected]>
(cherry picked from commit bc947d9)
Signed-off-by: Rodrigo Vivi <[email protected]>
…t/rmk/linux

Pull ARM fix from Russell King:

 - Fix a build issue with older binutils with LD dead code elimination
   disabled

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rmk/linux:
  ARM: 9414/1: Fix build issue with LD_DEAD_CODE_DATA_ELIMINATION
Ole reported that event->mmap_mutex is strictly insufficient to
serialize the AUX buffer, add a per RB mutex to fully serialize it.

Note that in the lock order comment the perf_event::mmap_mutex order
was already wrong; that is, its nesting under mmap_lock is not new with
this patch.

Fixes: 45bfb2e ("perf: Add AUX area to ring buffer for raw data streams")
Reported-by: Ole <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Suspend fbdev sooner, and disable user access before suspending to
prevent some races. I've noticed this when comparing xe suspend to
i915's.

Matches the following commits from i915:
24b412b ("drm/i915: Disable intel HPD poll after DRM poll init/enable")
1ef28d8 ("drm/i915: Suspend the framebuffer console earlier during system suspend")
bd738d8 ("drm/i915: Prevent modesets during driver init/shutdown")

Thanks to Imre for pointing me to those commits.

Driver shutdown is currently missing, but I have some idea how to
implement it next.

Signed-off-by: Maarten Lankhorst <[email protected]>
Cc: Imre Deak <[email protected]>
Reviewed-by: Uma Shankar <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Maarten Lankhorst,,, <[email protected]>
(cherry picked from commit 492be2a)
Signed-off-by: Rodrigo Vivi <[email protected]>
Enable/Disable user access only during system suspend/resume.
This should not happen during runtime suspend/resume.

v2: rebased

Reviewed-by: Arun R Murthy <[email protected]>
Signed-off-by: Imre Deak <[email protected]>
Signed-off-by: Vinod Govindapillai <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit a64e7e5)
Signed-off-by: Rodrigo Vivi <[email protected]>
Fix circular locking dependency on runtime suspend.

<4> [74.952215] ======================================================
<4> [74.952217] WARNING: possible circular locking dependency detected
<4> [74.952219] 6.10.0-rc7-xe #1 Not tainted
<4> [74.952221] ------------------------------------------------------
<4> [74.952223] kworker/7:1/82 is trying to acquire lock:
<4> [74.952226] ffff888120548488 (&dev->mode_config.mutex){+.+.}-{3:3}, at: drm_modeset_lock_all+0x40/0x1e0 [drm]
<4> [74.952260]
but task is already holding lock:
<4> [74.952262] ffffffffa0ae59c0 (xe_pm_runtime_lockdep_map){+.+.}-{0:0}, at: xe_pm_runtime_suspend+0x2f/0x340 [xe]
<4> [74.952322]
which lock already depends on the new lock.

The commit 'b1d90a86 ("drm/xe: Use the encoder suspend helper also used
by the i915 driver")' didn't do anything wrong. It actually fixed a
critical bug: encoder_suspend was never called, because the function
returned early on if (has_display(xe)) instead of
if (!has_display(xe)). However, the fix also introduced the encoder
suspend calls into the runtime routines, causing the circular
locking dependency.

Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/2304
Fixes: b1d90a8 ("drm/xe: Use the encoder suspend helper also used by the i915 driver")
Cc: Imre Deak <[email protected]>
Reviewed-by: Jonathan Cavitt <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Rodrigo Vivi <[email protected]>
(cherry picked from commit 8da1944)
Signed-off-by: Rodrigo Vivi <[email protected]>
…kernel/git/vfs/vfs

Pull vfs fixes from Christian Brauner:
 "Two netfs fixes for this merge window:

   - Ensure that fscache_cookie_lru_timer is deleted when the fscache
     module is removed to prevent UAF

   - Fix filemap_invalidate_inode() to use invalidate_inode_pages2_range()

     Before it used truncate_inode_pages_partial() which causes
     copy_file_range() to fail on cifs"

* tag 'vfs-6.11-rc7.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  fscache: delete fscache_cookie_lru_timer when fscache exits to avoid UAF
  mm: Fix filemap_invalidate_inode() to use invalidate_inode_pages2_range()
Pull smb server fixes from Steve French:

 - Fix crash in session setup

 - Fix locking bug

 - Improve access bounds checking

* tag 'v6.11-rc6-server-fixes' of git://git.samba.org/ksmbd:
  ksmbd: Unlock on in ksmbd_tcp_set_interfaces()
  ksmbd: unset the binding mark of a reused connection
  smb: Annotate struct xattr_smb_acl with __counted_by()
…rnel/git/kdave/linux

Pull btrfs fixes from David Sterba:

 - followup fix for direct io and fsync under some conditions, reported
   by QEMU users

 - fix a potential leak when disabling quotas while some extent tracking
   work can still happen

 - in zoned mode handle unexpected change of zone write pointer in
   RAID1-like block groups, turn the zones to read-only

* tag 'for-6.11-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: fix race between direct IO write and fsync when using same fd
  btrfs: zoned: handle broken write pointer on zones
  btrfs: qgroup: don't use extent changeset when not needed
If the length of the name string is 1 and name[0] is a NUL byte, an
out-of-bounds access occurs in btf_name_valid_section() and the function
returns true, so the invalid name passes the check.

To fix this, check that the first position is not a NUL byte and that
the first character is printable.
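
The check described above can be sketched in userspace as follows. This is only an illustration, assuming the rule is "non-empty, and every character printable"; `name_valid_section()` is a hypothetical stand-in and not the kernel's btf_name_valid_section().

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in, not the kernel function. */
bool name_valid_section(const char *name)
{
	const unsigned char *p = (const unsigned char *)name;

	/* Reject a missing or empty name first: if name[0] is the NUL
	 * terminator there is nothing to validate, so it must not pass. */
	if (p == NULL || *p == '\0')
		return false;

	/* Every byte up to the terminator must be printable. */
	for (; *p != '\0'; p++) {
		if (!isprint(*p))
			return false;
	}
	return true;
}
```

With this ordering the empty string is rejected before any loop runs, so no byte past the terminator is ever read.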

Suggested-by: Eduard Zingerman <[email protected]>
Fixes: bd70a8f ("bpf: Allow all printable characters in BTF DATASEC names")
Signed-off-by: Jeongjun Park <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
Acked-by: Eduard Zingerman <[email protected]>
…/kernel/git/groeck/linux-staging

Pull hwmon fixes from Guenter Roeck:

 - hp-wmi-sensors: Check if WMI event data exists before accessing it

 - ltc2991: fix register bits defines

* tag 'hwmon-for-v6.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
  hwmon: (hp-wmi-sensors) Check if WMI event data exists
  hwmon: ltc2991: fix register bits defines
….org/pub/scm/linux/kernel/git/perf/perf-tools

Pull perf tools fixes from Namhyung Kim:
 "A number of small fixes for the late cycle:

   - Two more build fixes on 32-bit archs

   - Fixed a segfault during perf test

   - Fixed spinlock/rwlock accounting bug in perf lock contention"

* tag 'perf-tools-fixes-for-v6.11-2024-09-04' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools:
  perf daemon: Fix the build on more 32-bit architectures
  perf python: include "util/sample.h"
  perf lock contention: Fix spinlock and rwlock accounting
  perf test pmu: Set uninitialized PMU alias to null
Add selftest for cases where btf_name_valid_section() does not properly
check for certain types of names.

Suggested-by: Eduard Zingerman <[email protected]>
Signed-off-by: Jeongjun Park <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
Acked-by: Eduard Zingerman <[email protected]>
…id_section'

Jeongjun Park says:

====================
bpf: fix incorrect name check pass logic in btf_name_valid_section

This patch fixes an issue where btf_name_valid_section() did not properly
check names under certain conditions, resulting in an OOB access.
A selftest was added to verify the fix.
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
Pull bcachefs fixes from Kent Overstreet:

 - Fix a typo in the rebalance accounting changes

 - BCH_SB_MEMBER_INVALID: small on disk format feature which will be
   needed for full erasure coding support; this is only the minimum so
   that 6.11 can handle future versions without barfing.

* tag 'bcachefs-2024-09-04' of git://evilpiepirate.org/bcachefs:
  bcachefs: BCH_SB_MEMBER_INVALID
  bcachefs: fix rebalance accounting
Bareudp devices update their stats concurrently.
Therefore they need proper atomic increments.

Fixes: 571912c ("net: UDP tunnel encapsulation module for tunnelling different protocols like MPLS, IP, NSH etc.")
Signed-off-by: Guillaume Nault <[email protected]>
Reviewed-by: Willem de Bruijn <[email protected]>
Link: https://patch.msgid.link/04b7b9d0b480158eb3ab4366ec80aa2ab7e41fcb.1725031794.git.gnault@redhat.com
Signed-off-by: Jakub Kicinski <[email protected]>
We observed a null-ptr-deref in fou_gro_receive() while shutting down
a host.  [0]

The NULL pointer is sk->sk_user_data, and offset 8 is that of the
protocol member in struct fou.

When fou_release() is called due to netns dismantle or explicit tunnel
teardown, udp_tunnel_sock_release() sets sk->sk_user_data to NULL.
Then, the tunnel socket is destroyed after a single RCU grace period.

So, in-flight udp4_gro_receive() could find the socket and execute the
FOU GRO handler, where sk->sk_user_data could be NULL.

Let's use rcu_dereference_sk_user_data() in fou_from_sock() and add NULL
checks in FOU GRO handlers.

[0]:
BUG: kernel NULL pointer dereference, address: 0000000000000008
 PF: supervisor read access in kernel mode
 PF: error_code(0x0000) - not-present page
PGD 80000001032f4067 P4D 80000001032f4067 PUD 103240067 PMD 0
SMP PTI
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.10.216-204.855.amzn2.x86_64 #1
Hardware name: Amazon EC2 c5.large/, BIOS 1.0 10/16/2017
RIP: 0010:fou_gro_receive (net/ipv4/fou.c:233) [fou]
Code: 41 5f c3 cc cc cc cc e8 e7 2e 69 f4 0f 1f 80 00 00 00 00 0f 1f 44 00 00 49 89 f8 41 54 48 89 f7 48 89 d6 49 8b 80 88 02 00 00 <0f> b6 48 08 0f b7 42 4a 66 25 fd fd 80 cc 02 66 89 42 4a 0f b6 42
RSP: 0018:ffffa330c0003d08 EFLAGS: 00010297
RAX: 0000000000000000 RBX: ffff93d9e3a6b900 RCX: 0000000000000010
RDX: ffff93d9e3a6b900 RSI: ffff93d9e3a6b900 RDI: ffff93dac2e24d08
RBP: ffff93d9e3a6b900 R08: ffff93dacbce6400 R09: 0000000000000002
R10: 0000000000000000 R11: ffffffffb5f369b0 R12: ffff93dacbce6400
R13: ffff93dac2e24d08 R14: 0000000000000000 R15: ffffffffb4edd1c0
FS:  0000000000000000(0000) GS:ffff93daee800000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000008 CR3: 0000000102140001 CR4: 00000000007706f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
 <IRQ>
 ? show_trace_log_lvl (arch/x86/kernel/dumpstack.c:259)
 ? __die_body.cold (arch/x86/kernel/dumpstack.c:478 arch/x86/kernel/dumpstack.c:420)
 ? no_context (arch/x86/mm/fault.c:752)
 ? exc_page_fault (arch/x86/include/asm/irqflags.h:49 arch/x86/include/asm/irqflags.h:89 arch/x86/mm/fault.c:1435 arch/x86/mm/fault.c:1483)
 ? asm_exc_page_fault (arch/x86/include/asm/idtentry.h:571)
 ? fou_gro_receive (net/ipv4/fou.c:233) [fou]
 udp_gro_receive (include/linux/netdevice.h:2552 net/ipv4/udp_offload.c:559)
 udp4_gro_receive (net/ipv4/udp_offload.c:604)
 inet_gro_receive (net/ipv4/af_inet.c:1549 (discriminator 7))
 dev_gro_receive (net/core/dev.c:6035 (discriminator 4))
 napi_gro_receive (net/core/dev.c:6170)
 ena_clean_rx_irq (drivers/amazon/net/ena/ena_netdev.c:1558) [ena]
 ena_io_poll (drivers/amazon/net/ena/ena_netdev.c:1742) [ena]
 napi_poll (net/core/dev.c:6847)
 net_rx_action (net/core/dev.c:6917)
 __do_softirq (arch/x86/include/asm/jump_label.h:25 include/linux/jump_label.h:200 include/trace/events/irq.h:142 kernel/softirq.c:299)
 asm_call_irq_on_stack (arch/x86/entry/entry_64.S:809)
</IRQ>
 do_softirq_own_stack (arch/x86/include/asm/irq_stack.h:27 arch/x86/include/asm/irq_stack.h:77 arch/x86/kernel/irq_64.c:77)
 irq_exit_rcu (kernel/softirq.c:393 kernel/softirq.c:423 kernel/softirq.c:435)
 common_interrupt (arch/x86/kernel/irq.c:239)
 asm_common_interrupt (arch/x86/include/asm/idtentry.h:626)
RIP: 0010:acpi_idle_do_entry (arch/x86/include/asm/irqflags.h:49 arch/x86/include/asm/irqflags.h:89 drivers/acpi/processor_idle.c:114 drivers/acpi/processor_idle.c:575)
Code: 8b 15 d1 3c c4 02 ed c3 cc cc cc cc 65 48 8b 04 25 40 ef 01 00 48 8b 00 a8 08 75 eb 0f 1f 44 00 00 0f 00 2d d5 09 55 00 fb f4 <fa> c3 cc cc cc cc e9 be fc ff ff 66 66 2e 0f 1f 84 00 00 00 00 00
RSP: 0018:ffffffffb5603e58 EFLAGS: 00000246
RAX: 0000000000004000 RBX: ffff93dac0929c00 RCX: ffff93daee833900
RDX: ffff93daee800000 RSI: ffff93daee87dc00 RDI: ffff93daee87dc64
RBP: 0000000000000001 R08: ffffffffb5e7b6c0 R09: 0000000000000044
R10: ffff93daee831b04 R11: 00000000000001cd R12: 0000000000000001
R13: ffffffffb5e7b740 R14: 0000000000000001 R15: 0000000000000000
 ? sched_clock_cpu (kernel/sched/clock.c:371)
 acpi_idle_enter (drivers/acpi/processor_idle.c:712 (discriminator 3))
 cpuidle_enter_state (drivers/cpuidle/cpuidle.c:237)
 cpuidle_enter (drivers/cpuidle/cpuidle.c:353)
 cpuidle_idle_call (kernel/sched/idle.c:158 kernel/sched/idle.c:239)
 do_idle (kernel/sched/idle.c:302)
 cpu_startup_entry (kernel/sched/idle.c:395 (discriminator 1))
 start_kernel (init/main.c:1048)
 secondary_startup_64_no_verify (arch/x86/kernel/head_64.S:310)
Modules linked in: udp_diag tcp_diag inet_diag nft_nat ipip tunnel4 dummy fou ip_tunnel nft_masq nft_chain_nat nf_nat wireguard nft_ct curve25519_x86_64 libcurve25519_generic nf_conntrack libchacha20poly1305 nf_defrag_ipv6 nf_defrag_ipv4 nft_objref chacha_x86_64 nft_counter nf_tables nfnetlink poly1305_x86_64 ip6_udp_tunnel udp_tunnel libchacha crc32_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd glue_helper mousedev psmouse button ena ptp pps_core crc32c_intel
CR2: 0000000000000008

Fixes: d92283e ("fou: change to use UDP socket GRO")
Reported-by: Alphonse Kurian <[email protected]>
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
generic_ocp_write() requires the parameter "size" to be 4-byte aligned.
Therefore, writing the bp fails if mac->bp_num is odd. Align the size
to 4 to fix it. This may write an extra bp, but
rtl8152_is_fw_mac_ok() makes sure the value is 0 for any bp whose
index is greater than mac->bp_num. That is, there is no effect on the
firmware.
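
The size rounding can be sketched like this. It assumes each bp entry is 2 bytes wide, which is what would make an odd count misaligned; `ALIGN_UP` and `bp_write_size()` are illustrative names, not the driver's.

```c
#include <stddef.h>

/* Round x up to a multiple of a; a must be a power of two. */
#define ALIGN_UP(x, a) (((x) + ((a) - 1)) & ~((size_t)(a) - 1))

/* Bytes to write for bp_num entries, assuming 2-byte entries. */
size_t bp_write_size(size_t bp_num)
{
	return ALIGN_UP(bp_num * 2, 4);
}
```

For three entries the raw size is 6 bytes and the aligned size is 8, so one extra (zero) entry is written.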

In addition, check the return value of generic_ocp_write() to make sure
everything is correct.

Fixes: e5c266a ("r8152: set bp in bulk")
Signed-off-by: Hayes Wang <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
When userspace wants to take over a fdb entry by setting it as
EXTERN_LEARNED, we set both flags BR_FDB_ADDED_BY_EXT_LEARN and
BR_FDB_ADDED_BY_USER in br_fdb_external_learn_add().

If the bridge updates the entry later because its port changed, we clear
the BR_FDB_ADDED_BY_EXT_LEARN flag, but leave the BR_FDB_ADDED_BY_USER
flag set.

If userspace then wants to take over the entry again,
br_fdb_external_learn_add() sees that BR_FDB_ADDED_BY_USER is set and
skips setting the BR_FDB_ADDED_BY_EXT_LEARN flag, thus silently ignoring
the update.

Fix this by always allowing BR_FDB_ADDED_BY_EXT_LEARN to be set,
regardless of whether this was a user fdb entry or not.
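
The flag sequence can be replayed with a simplified model. The bit values and function names below are hypothetical, and the real br_fdb_external_learn_add() has more conditions than this sketch shows.

```c
#include <stdbool.h>

/* Hypothetical bit values; the kernel uses its own flag definitions. */
#define FDB_ADDED_BY_USER      (1u << 0)
#define FDB_ADDED_BY_EXT_LEARN (1u << 1)

/* Userspace takeover: with the fix, EXT_LEARN is set unconditionally. */
void ext_learn_add(unsigned int *flags, bool use_fix)
{
	if (!use_fix && (*flags & FDB_ADDED_BY_USER))
		return;	/* old behaviour: update silently ignored */
	*flags |= FDB_ADDED_BY_USER | FDB_ADDED_BY_EXT_LEARN;
}

/* The bridge relearning the entry on another port clears only EXT_LEARN. */
void bridge_port_update(unsigned int *flags)
{
	*flags &= ~FDB_ADDED_BY_EXT_LEARN;
}

/* Takeover, port change, second takeover; returns the final flags. */
unsigned int takeover_twice(bool use_fix)
{
	unsigned int flags = 0;

	ext_learn_add(&flags, use_fix);
	bridge_port_update(&flags);
	ext_learn_add(&flags, use_fix);
	return flags;
}
```

Without the fix the second takeover leaves EXT_LEARN clear; with it the flag is set again.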

Fixes: 710ae72 ("net: bridge: Mark FDB entries that were added by user as such")
Signed-off-by: Jonas Gorski <[email protected]>
Acked-by: Nikolay Aleksandrov <[email protected]>
Reviewed-by: Ido Schimmel <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
axienet_dma_err_handler can race with axienet_stop in the following
manner:

CPU 1                       CPU 2
======================      ==================
axienet_stop()
    napi_disable()
    axienet_dma_stop()
                            axienet_dma_err_handler()
                                napi_disable()
                                axienet_dma_stop()
                                axienet_dma_start()
                                napi_enable()
    cancel_work_sync()
    free_irq()

Fix this by setting a flag in axienet_stop telling
axienet_dma_err_handler not to bother doing anything. I chose not to use
disable_work_sync to allow for easier backporting.
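
A single-threaded replay of the interleaving shown above illustrates the effect of the flag. The structure and function names are hypothetical and heavily reduced; real code also needs proper synchronization, which this sketch does not model.

```c
#include <stdbool.h>

/* Hypothetical, heavily reduced device state; not the driver's structures. */
struct dev_state {
	bool stopping;		/* set by stop, checked by the error handler */
	bool napi_enabled;
};

/* Error handler: restart DMA and re-enable NAPI unless we are stopping. */
void dma_err_handler(struct dev_state *s, bool honor_flag)
{
	if (honor_flag && s->stopping)
		return;			/* device is going down: do nothing */
	s->napi_enabled = false;	/* napi_disable() and DMA stop */
	s->napi_enabled = true;		/* DMA start and napi_enable() */
}

/* Replays the racy ordering and reports the final NAPI state. */
bool napi_enabled_after_stop(bool honor_flag)
{
	struct dev_state s = { .stopping = false, .napi_enabled = true };

	s.stopping = true;		/* stop: set the flag first */
	s.napi_enabled = false;		/* stop: napi_disable(), DMA stop */
	dma_err_handler(&s, honor_flag);	/* runs before cancel_work_sync() */
	return s.napi_enabled;		/* stop then frees the IRQ */
}
```

When the handler ignores the flag, NAPI ends up enabled on a stopped device; when it honors the flag, NAPI stays disabled.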

Signed-off-by: Sean Anderson <[email protected]>
Fixes: 8a3b7a2 ("drivers/net/ethernet/xilinx: added Xilinx AXI Ethernet driver")
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
…/kernel/git/wireless/wireless

Kalle Valo says:

====================
wireless fixes for v6.11

Hopefully final fixes for v6.11 and this time only fixes to ath11k
driver. We need to revert hibernation support due to reported
regressions and we have a fix for kernel crash introduced in
v6.11-rc1.

* tag 'wireless-2024-09-04' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
  MAINTAINERS: wifi: cw1200: add net-cw1200.h
  Revert "wifi: ath11k: support hibernation"
  Revert "wifi: ath11k: restore country code during resume"
  wifi: ath11k: fix NULL pointer dereference in ath11k_mac_get_eirp_power()
====================

Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
…t/tnguy/net-queue

Tony Nguyen says:

====================
ice: fix synchronization between .ndo_bpf() and reset

Larysa Zaremba says:

PF reset can be triggered asynchronously, by tx_timeout or by a user. With some
unfortunate timings both ice_vsi_rebuild() and .ndo_bpf will try to access and
modify XDP rings at the same time, causing system crash.

The first patch factors out rtnl-locked code from VSI rebuild code to avoid
deadlock. The following changes lock rebuild and .ndo_bpf() critical sections
with an internal mutex as well and provide complementary fixes.

* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
  ice: do not bring the VSI up, if it was down before the XDP setup
  ice: remove ICE_CFG_BUSY locking from AF_XDP code
  ice: check ICE_VSI_DOWN under rtnl_lock when preparing for reset
  ice: check for XDP rings instead of bpf program when unconfiguring
  ice: protect XDP configuration with a mutex
  ice: move netif_queue_set_napi to rtnl-protected sections
====================

Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
We were allowing any user to create a high priority group without any
permission checks, which made a denial of service possible.

We now only allow the DRM master or users with the CAP_SYS_NICE
capability to set higher priorities than PANTHOR_GROUP_PRIORITY_MEDIUM.
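
The rule above amounts to a small gate at the ioctl level. The sketch below uses a hypothetical three-level enum and assumes -EACCES as the error code; neither is taken from the driver.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical priority levels in ascending order; values are illustrative. */
enum group_priority { PRIO_LOW, PRIO_MEDIUM, PRIO_HIGH };

/* Returns 0 if the caller may use prio, a negative errno otherwise. */
int check_group_priority(enum group_priority prio, bool is_drm_master,
			 bool has_cap_sys_nice)
{
	/* Anything above MEDIUM needs DRM master or CAP_SYS_NICE. */
	if (prio > PRIO_MEDIUM && !is_drm_master && !has_cap_sys_nice)
		return -EACCES;	/* error code chosen for this sketch only */
	return 0;
}
```

An unprivileged caller asking for MEDIUM still succeeds, which matches the hardcoded value the commit says Mesa uses.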

As the sole user of that uAPI lives in Mesa and hardcodes a value of
MEDIUM [1], this should be safe to do.

Additionally, as those checks are performed at the ioctl level,
panthor_group_create now only checks for priority level validity.

[1] https://gitlab.freedesktop.org/mesa/mesa/-/blob/f390835074bdf162a63deb0311d1a6de527f9f89/src/gallium/drivers/panfrost/pan_csf.c#L1038

Signed-off-by: Mary Guillemard <[email protected]>
Fixes: de85488 ("drm/panthor: Add the scheduler logical block")
Cc: [email protected]
Reviewed-by: Boris Brezillon <[email protected]>
Signed-off-by: Boris Brezillon <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
The WL-355608-A8 is a 3.5" 640x480@60Hz RGB LCD display from an unknown
OEM used in a number of handheld gaming devices made by Anbernic.
Previously committed using the OEM serial without a vendor prefix,
however following subsequent discussion the preference is to use the
integrating device vendor and name where the OEM is unknown.

There are 4 RG35XX series devices from Anbernic based on an Allwinner
H700 SoC using this panel, with the -Plus variant introduced first.
Therefore the -Plus is used as the fallback for the subsequent -H,
-2024, and -SP devices.

Alter the filename and compatible string to reflect the convention.

Fixes: 45b888a ("dt-bindings: display: panel: Add WL-355608-A8 panel")
Signed-off-by: Ryan Walklin <[email protected]>
Acked-by: Rob Herring (Arm) <[email protected]>
Signed-off-by: Maxime Ripard <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
As per the previous dt-binding commit, update the WL-355608-A8 panel
compatible to reflect the integrating device vendor and name as the
panel OEM is unknown.

Fixes: 62ea2ee ("drm: panel: nv3052c: Add WL-355608-A8 panel")
Signed-off-by: Ryan Walklin <[email protected]>
Signed-off-by: Maxime Ripard <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
In the off-chance that waiting for the firmware to signal its booted status
timed out in the fast reset path, one must flush the cache lines for the
entire FW VM address space before reloading the regions, otherwise stale
values eventually lead to a scheduler job timeout.

Fixes: 647810e ("drm/panthor: Add the MMU/VM logical block")
Cc: [email protected]
Signed-off-by: Adrián Larumbe <[email protected]>
Acked-by: Liviu Dudau <[email protected]>
Reviewed-by: Steven Price <[email protected]>
Reviewed-by: Boris Brezillon <[email protected]>
Signed-off-by: Boris Brezillon <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Document what was discussed multiple times on list and various
virtual / in-person conversations. guard() being okay in functions
<= 20 LoC is a bit of my own invention. If the function is trivial
it should be fine, but feel free to disagree :)

We'll obviously revisit this guidance as time passes and we and other
subsystems get more experience.

Reviewed-by: Eric Dumazet <[email protected]>
Reviewed-by: Nikolay Aleksandrov <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>
Deferred I/O requires struct page for framebuffer memory, which is
not guaranteed for all DMA ranges. We thus only install deferred I/O
if we have a framebuffer that requires it.

A reported bug affected the ipu-v3 and pl111 drivers, which have video
memory in either Normal or HighMem zones

[    0.000000] Zone ranges:
[    0.000000]   Normal   [mem 0x0000000010000000-0x000000003fffffff]
[    0.000000]   HighMem  [mem 0x0000000040000000-0x000000004fffffff]

where deferred I/O only works correctly with HighMem. See the Closes
tags for bug reports.

v2:
- test if screen_buffer supports deferred I/O (Sima)

Signed-off-by: Thomas Zimmermann <[email protected]>
Fixes: 808a40b ("drm/fbdev-dma: Implement damage handling and deferred I/O")
Reported-by: Alexander Stein <[email protected]>
Closes: https://lore.kernel.org/all/23636953.6Emhk5qWAg@steina-w/
Reported-by: Linus Walleij <[email protected]>
Closes: https://lore.kernel.org/dri-devel/CACRpkdb+hb9AGavbWpY-=uQQ0apY9en_tWJioPKf_fAbXMP4Hg@mail.gmail.com/
Tested-by: Alexander Stein <[email protected]>
Tested-by: Linus Walleij <[email protected]>
Cc: Thomas Zimmermann <[email protected]>
Cc: Javier Martinez Canillas <[email protected]>
Cc: Maarten Lankhorst <[email protected]>
Cc: Maxime Ripard <[email protected]>
Reviewed-by: Simona Vetter <[email protected]>
Reviewed-by: Linus Walleij <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
adam900710 and others added 30 commits October 3, 2024 19:53
[PROBLEM]
It is very common for udev to trigger a device scan, and every time a
mounted btrfs device gets re-scanned through a different soft link, we
get unnecessary device path updates. This is especially common
for LVM based storage:

 # lvs
  scratch1 test -wi-ao---- 10.00g
  scratch2 test -wi-a----- 10.00g
  scratch3 test -wi-a----- 10.00g
  scratch4 test -wi-a----- 10.00g
  scratch5 test -wi-a----- 10.00g
  test     test -wi-a----- 10.00g

 # mkfs.btrfs -f /dev/test/scratch1
 # mount /dev/test/scratch1 /mnt/btrfs
 # dmesg -c
 [  205.705234] BTRFS: device fsid 7be2602f-9e35-4ecf-a6ff-9e91d2c182c9 devid 1 transid 6 /dev/mapper/test-scratch1 (253:4) scanned by mount (1154)
 [  205.710864] BTRFS info (device dm-4): first mount of filesystem 7be2602f-9e35-4ecf-a6ff-9e91d2c182c9
 [  205.711923] BTRFS info (device dm-4): using crc32c (crc32c-intel) checksum algorithm
 [  205.713856] BTRFS info (device dm-4): using free-space-tree
 [  205.722324] BTRFS info (device dm-4): checking UUID tree

So far so good, but even if we just touch any soft link of
"dm-4", we get several unnecessary device path updates.

 # touch /dev/mapper/test-scratch1
 # dmesg -c
 [  469.295796] BTRFS info: devid 1 device path /dev/mapper/test-scratch1 changed to /dev/dm-4 scanned by (udev-worker) (1221)
 [  469.300494] BTRFS info: devid 1 device path /dev/dm-4 changed to /dev/mapper/test-scratch1 scanned by (udev-worker) (1221)

Such device path renames are unnecessary and can lead to random path
changes due to the udev race.

[CAUSE]
Inside device_list_add(), we use a very primitive way of checking whether
the device has changed: strcmp().

This can never handle links well, whether hard or soft.

So every different link of the same device will be treated as a different
device, causing the unnecessary device path update.

[FIX]
Introduce a helper, is_same_device(), and use path_equal() to properly
detect the same block device, so that different soft links won't
trigger the rename race.
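
The difference between comparing names and comparing identity can be shown with a userspace analogue. This is not how is_same_device() or path_equal() work internally; it uses stat(2) device and inode numbers purely as an assumed stand-in, and both function names are hypothetical.

```c
#include <stdbool.h>
#include <string.h>
#include <sys/stat.h>

/* Naive comparison: different spellings of one file look different. */
bool same_by_name(const char *a, const char *b)
{
	return strcmp(a, b) == 0;
}

/* Identity comparison: follow links, then compare device and inode. */
bool same_by_identity(const char *a, const char *b)
{
	struct stat sa, sb;

	if (stat(a, &sa) != 0 || stat(b, &sb) != 0)
		return false;	/* treat unresolvable paths as different */
	return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```

Two spellings of the same directory, such as "/" and "/.", differ by name but match by identity, which is the property the fix relies on for links.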

Reviewed-by: Filipe Manana <[email protected]>
Link: https://bugzilla.suse.com/show_bug.cgi?id=1230641
Reported-by: Fabian Vogt <[email protected]>
Signed-off-by: Qu Wenruo <[email protected]>
Signed-off-by: David Sterba <[email protected]>
[PROBLEM]
Currently btrfs accepts any file path for its device, resulting in some
weird situations:

 # ./mount_by_fd /dev/test/scratch1  /mnt/btrfs/

The program has the following source code:

 #include <fcntl.h>
 #include <stdio.h>
 #include <sys/mount.h>

 int main(int argc, char *argv[]) {
	int fd = open(argv[1], O_RDWR);
	char path[256];
	snprintf(path, sizeof(path), "/proc/self/fd/%d", fd);
	return mount(path, argv[2], "btrfs", 0, NULL);
 }

Then we can have the following weird device path:

 BTRFS: device fsid 2378be81-fe12-46d2-a9e8-68cf08dd98d5 devid 1 transid 7 /proc/self/fd/3 (253:2) scanned by mount_by_fd (18440)

Normally it's not a big deal, and later udev can trigger a device path
rename. But if udev didn't trigger, the device path "/proc/self/fd/3"
will show up in mtab.

[CAUSE]
The filename "/proc/self/fd/3" refers to the opened file descriptor 3.
In the above case, it's exactly the device we want to open, i.e. it
points to "/dev/test/scratch1", which is another symlink pointing to
"/dev/dm-2".

Inside the kernel we resolve the mount source using LOOKUP_FOLLOW, which
follows the symbolic link and grabs the proper block device.

But inside btrfs we also save the filename into btrfs_device::name, and
use that member to report our mount source, which leads to the above
situation.

[FIX]
Instead of unconditionally trusting the path, check if the original file
(not following the symbolic link) is inside "/dev/". If not, then
manually look up the path to its final destination, and use that as our
device path.
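
The selection rule can be sketched as below. The sketch does not perform the lookup itself: the resolved path is passed in by the caller, and both function names are hypothetical.

```c
#include <stdbool.h>
#include <string.h>

/* True when the unresolved path string starts with "/dev/". */
bool path_is_under_dev(const char *path)
{
	return strncmp(path, "/dev/", 5) == 0;
}

/*
 * Pick the name to record for the device: keep the given path if it is
 * already under "/dev/", otherwise fall back to the resolved destination.
 */
const char *pick_device_name(const char *given, const char *resolved)
{
	return path_is_under_dev(given) ? given : resolved;
}
```

A symlink under "/dev/" is kept as given, while a path such as "/proc/self/fd/3" is replaced by its resolved destination.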

This allows us to still use symbolic links, like
"/dev/mapper/test-scratch" from LVM2, which is required for fstests runs
with LVM2 setup.

And for really weird names, like the above case, we resolve it to
"/dev/dm-2" instead.

Reviewed-by: Filipe Manana <[email protected]>
Link: https://bugzilla.suse.com/show_bug.cgi?id=1230641
Reported-by: Fabian Vogt <[email protected]>
Signed-off-by: Qu Wenruo <[email protected]>
Signed-off-by: David Sterba <[email protected]>
Remove the duplicated transaction joining, block reserve setting and raid
extent inserting in btrfs_finish_ordered_extent().

While at it, also abort the transaction in case inserting a RAID
stripe-tree entry fails.

Suggested-by: Naohiro Aota <[email protected]>
Reviewed-by: Filipe Manana <[email protected]>
Signed-off-by: Johannes Thumshirn <[email protected]>
Signed-off-by: David Sterba <[email protected]>
…s enabled

When adding a delayed ref head, at delayed-ref.c:add_delayed_ref_head(),
if we fail to insert the qgroup record we don't error out, we ignore it.
In fact we treat it as if there was no error and there was already an
existing record - we don't distinguish between the cases where
btrfs_qgroup_trace_extent_nolock() returns 1, meaning a record already
existed and we can free the given record, and the case where it returns
a negative error value, meaning the insertion into the xarray that is
used to track records failed.

Effectively we end up ignoring that we are lacking a qgroup record in the
dirty extents xarray, resulting in incorrect qgroup accounting.

Fix this by checking for errors and returning them to the callers.
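
The three outcomes described above (inserted, already existed, failed) can be sketched in a small model. This is an illustration only: trace_extent() and add_head() are hypothetical stand-ins, a dict plays the role of the xarray, and the 0/1/negative convention follows the description in this message:

```python
import errno


def trace_extent(dirty_extents: dict, key: int, record: object,
                 fail: bool = False) -> int:
    """Hypothetical stand-in for the record insertion.

    Returns 0 if the record was inserted, 1 if a record already existed
    for the key, and a negative errno value if the insertion failed.
    """
    if fail:
        return -errno.ENOMEM
    if key in dirty_extents:
        return 1
    dirty_extents[key] = record
    return 0


def add_head(dirty_extents: dict, key: int, record: object,
             fail: bool = False) -> int:
    """Caller that distinguishes "already exists" from a real error."""
    ret = trace_extent(dirty_extents, key, record, fail)
    if ret < 0:
        return ret  # propagate the failure instead of ignoring it
    # ret == 1: the existing record is kept and the new one is dropped
    return 0
```

The point of the sketch is only the branch on a negative return value: before the fix, that case was handled the same way as a return value of 1.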

Fixes: 3cce39a ("btrfs: qgroup: use xarray to track dirty extents in transaction")
Reviewed-by: Qu Wenruo <[email protected]>
Signed-off-by: Filipe Manana <[email protected]>
Signed-off-by: David Sterba <[email protected]>
We are using the logical address ("bytenr") of an extent as the key for
qgroup records in the dirty extents xarray. This is a problem because the
xarrays use "unsigned long" for keys/indices, meaning that on a 32-bit
platform any extent starting at or beyond 4G has its key truncated. That
limit is far too low, as virtually everyone is using storage with more
than 4G of space. This means a "bytenr" of 4G gets truncated to 0, and so
do 8G and 16G for example, resulting in incorrect qgroup accounting.

Fix this by using sector numbers as keys instead, that is, using keys that
match the logical address right shifted by fs_info->sectorsize_bits, which
is what we do for the fs_info->buffer_radix that tracks extent buffers
(radix trees also use an "unsigned long" type for keys). This also makes
the index space more dense which helps optimize the xarray (as mentioned
at Documentation/core-api/xarray.rst).
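
The truncation can be checked with a few lines of arithmetic. The sketch below models a 32-bit "unsigned long" by masking to 32 bits and assumes a 4K sector size (sectorsize_bits of 12); both helper names are made up for the example:

```python
GIB = 1 << 30
SECTORSIZE_BITS = 12  # assumption: 4K sectors


def index_32bit(key: int) -> int:
    """Model storing a 64-bit key in a 32-bit "unsigned long" index."""
    return key & 0xFFFFFFFF


def sector_key(bytenr: int) -> int:
    """Key derived from the logical address shifted by the sector bits."""
    return bytenr >> SECTORSIZE_BITS


for bytenr in (4 * GIB, 8 * GIB, 16 * GIB):
    # Columns: bytenr, truncated bytenr key, truncated sector-number key
    print(bytenr, index_32bit(bytenr), index_32bit(sector_key(bytenr)))
```

With the bytenr as key, all three addresses collapse to index 0. With sector numbers as keys they stay distinct, and under the 4K assumption a 32-bit index only wraps at 2^44 bytes (16 TiB).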

Fixes: 3cce39a ("btrfs: qgroup: use xarray to track dirty extents in transaction")
Reviewed-by: Qu Wenruo <[email protected]>
Signed-off-by: Filipe Manana <[email protected]>
Signed-off-by: David Sterba <[email protected]>
While running checkpatch against a patch that modifies the
btrfs_qgroup_extent event class, it complained about using a comma instead
of a semicolon:

  $ ./scripts/checkpatch.pl qgroups/0003-btrfs-qgroups-remove-bytenr-field-from-struct-btrfs_.patch
  WARNING: Possible comma where semicolon could be used
  #215: FILE: include/trace/events/btrfs.h:1720:
  +		__entry->bytenr		= bytenr,
		__entry->num_bytes	= rec->num_bytes;

  total: 0 errors, 1 warnings, 184 lines checked

So replace the comma with a semicolon to silence checkpatch and possibly
other tools. It also makes the code consistent with the rest.

Reviewed-by: Qu Wenruo <[email protected]>
Signed-off-by: Filipe Manana <[email protected]>
Signed-off-by: David Sterba <[email protected]>
…ecord

Now that we track qgroup extent records in an xarray we don't need to have
a "bytenr" field in struct btrfs_qgroup_extent_record, since we can get
it from the index of the record in the xarray.

So remove the field and grab the bytenr from either the index key or any
other place where it's available (delayed refs). This reduces the size of
struct btrfs_qgroup_extent_record from 40 bytes down to 32 bytes, meaning
that we now can store 128 instances of this structure instead of 102 per
4K page.
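
The per-page figures follow from integer division of the page size by the structure size. A quick check, using only the sizes quoted above (the helper name is made up for the example):

```python
PAGE_SIZE = 4096  # 4K page, as stated above


def records_per_page(record_size: int, page_size: int = PAGE_SIZE) -> int:
    """Number of whole records of the given size that fit in one page."""
    return page_size // record_size


old_count = records_per_page(40)  # structure with the bytenr field
new_count = records_per_page(32)  # structure without it
print(old_count, new_count)
```

At 40 bytes per record, 16 bytes of each page are left unused (4096 - 102 * 40), while 32-byte records fill the page exactly.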

Reviewed-by: Qu Wenruo <[email protected]>
Signed-off-by: Filipe Manana <[email protected]>
Signed-off-by: David Sterba <[email protected]>
…_post()

Instead of extracting fs_info from the transaction multiple times, store
it in a local variable and use it.

Reviewed-by: Qu Wenruo <[email protected]>
Signed-off-by: Filipe Manana <[email protected]>
Signed-off-by: David Sterba <[email protected]>
…extent()

There's no need to hold the delayed refs spinlock when calling
btrfs_qgroup_trace_extent_nolock() from btrfs_qgroup_trace_extent(), since
it doesn't change anything in delayed refs and it only changes the xarray
used to track qgroup extent records, which is protected by the xarray's
lock.

Holding the lock is only adding unnecessary lock contention with other
tasks that actually need to take the lock to add/remove/change delayed
references. So remove the locking.

Reviewed-by: Qu Wenruo <[email protected]>
Signed-off-by: Filipe Manana <[email protected]>
Signed-off-by: David Sterba <[email protected]>
…xtent()

Instead of dereferencing the delayed refs from the transaction multiple
times, store it early in a local variable and then always use the
variable.

Reviewed-by: Qu Wenruo <[email protected]>
Signed-off-by: Filipe Manana <[email protected]>
Signed-off-by: David Sterba <[email protected]>
The qgroup record was allocated with kzalloc(), so it's pointless to set
its old_roots member to NULL. Remove the assignment.

Reviewed-by: Qu Wenruo <[email protected]>
Signed-off-by: Filipe Manana <[email protected]>
Signed-off-by: David Sterba <[email protected]>
…ecreased

During an incremental send we may end up sending an invalid clone
operation, for the last extent of a file which ends at an unaligned offset
that matches the final i_size of the file in the send snapshot, in case
the file had its initial size (the size in the parent snapshot) decreased
in the send snapshot. In this case the destination will fail to apply the
clone operation because its end offset is not sector size aligned and it
ends before the current size of the file.

Sending the truncate operation always happens when we finish processing an
inode, after we process all its extents (and xattrs, names, etc). So fix
this by ensuring the file has a valid size before we send a clone
operation for an unaligned extent that ends at the final i_size of the
file. The size we truncate to matches the start offset of the clone range
but it could be any value between that start offset and the final size of
the file since the clone operation will expand the i_size if the current
size is smaller than the end offset. The start offset of the range was
chosen because it's always sector size aligned and avoids a truncation
into the middle of a page, which results in dirtying the page due to
filling part of it with zeroes and then making the clone operation at the
receiver trigger IO.
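
A simplified model of the ordering described above is sketched below. It is not the send code: the function name, the tuple format, the 4K sector size and the explicit comparison against the receiver's current file size are all assumptions made for illustration, and the actual patch may decide when to truncate differently:

```python
SECTORSIZE = 4096  # assumption: 4K sectors


def ops_for_clone(clone_start: int, clone_len: int, final_size: int,
                  receiver_size: int) -> list:
    """Return the operations to send for one clone range (toy model)."""
    ops = []
    end = clone_start + clone_len
    unaligned_end = end % SECTORSIZE != 0
    if unaligned_end and end == final_size and receiver_size > end:
        # Shrink the file first, to the sector-aligned start of the
        # range, so that the clone ends at or beyond the current size.
        ops.append(("truncate", clone_start))
    ops.append(("clone", clone_start, clone_len))
    return ops
```

With the numbers from the reproducer below (a last extent of 128K + 5 bytes starting at offset 128K, cloned into a file that is still 1M at the receiver), the model emits a truncate to 128K followed by the clone.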

The following test reproduces the issue:

  $ cat test.sh
  #!/bin/bash

  DEV=/dev/sdi
  MNT=/mnt/sdi

  mkfs.btrfs -f $DEV
  mount $DEV $MNT

  # Create a file with a size of 256K + 5 bytes, having two extents, one
  # with a size of 128K and another one with a size of 128K + 5 bytes.
  last_ext_size=$((128 * 1024 + 5))
  xfs_io -f -d -c "pwrite -S 0xab -b 128K 0 128K" \
         -c "pwrite -S 0xcd -b $last_ext_size 128K $last_ext_size" \
         $MNT/foo

  # Another file which we will later clone foo into, but initially with
  # a larger size than foo.
  xfs_io -f -c "pwrite -S 0xef 0 1M" $MNT/bar

  btrfs subvolume snapshot -r $MNT/ $MNT/snap1

  # Now resize bar and clone foo into it.
  xfs_io -c "truncate 0" \
         -c "reflink $MNT/foo" $MNT/bar

  btrfs subvolume snapshot -r $MNT/ $MNT/snap2

  rm -f /tmp/send-full /tmp/send-inc
  btrfs send -f /tmp/send-full $MNT/snap1
  btrfs send -p $MNT/snap1 -f /tmp/send-inc $MNT/snap2

  umount $MNT
  mkfs.btrfs -f $DEV
  mount $DEV $MNT

  btrfs receive -f /tmp/send-full $MNT
  btrfs receive -f /tmp/send-inc $MNT

  umount $MNT

Running it before this patch:

  $ ./test.sh
  (...)
  At subvol snap1
  At snapshot snap2
  ERROR: failed to clone extents to bar: Invalid argument

A test case for fstests will be sent soon.

Reported-by: Ben Millwood <[email protected]>
Link: https://lore.kernel.org/linux-btrfs/CAJhrHS2z+WViO2h=ojYvBPDLsATwLbg+7JaNCyYomv0fUxEpQQ@mail.gmail.com/
Fixes: 46a6e10 ("btrfs: send: allow cloning non-aligned extent if it ends at i_size")
Reviewed-by: Qu Wenruo <[email protected]>
Signed-off-by: Filipe Manana <[email protected]>
Signed-off-by: David Sterba <[email protected]>
…acntion

[BUG]
Syzbot reported a NULL pointer dereference with the following crash:

FAULT_INJECTION: forcing a failure.
 start_transaction+0x830/0x1670 fs/btrfs/transaction.c:676
 prepare_to_relocate+0x31f/0x4c0 fs/btrfs/relocation.c:3642
 relocate_block_group+0x169/0xd20 fs/btrfs/relocation.c:3678
...
BTRFS info (device loop0): balance: ended with status: -12
Oops: general protection fault, probably for non-canonical address 0xdffffc00000000cc: 0000 [#1] PREEMPT SMP KASAN NOPTI
KASAN: null-ptr-deref in range [0x0000000000000660-0x0000000000000667]
RIP: 0010:btrfs_update_reloc_root+0x362/0xa80 fs/btrfs/relocation.c:926
Call Trace:
 <TASK>
 commit_fs_roots+0x2ee/0x720 fs/btrfs/transaction.c:1496
 btrfs_commit_transaction+0xfaf/0x3740 fs/btrfs/transaction.c:2430
 del_balance_item fs/btrfs/volumes.c:3678 [inline]
 reset_balance_state+0x25e/0x3c0 fs/btrfs/volumes.c:3742
 btrfs_balance+0xead/0x10c0 fs/btrfs/volumes.c:4574
 btrfs_ioctl_balance+0x493/0x7c0 fs/btrfs/ioctl.c:3673
 vfs_ioctl fs/ioctl.c:51 [inline]
 __do_sys_ioctl fs/ioctl.c:907 [inline]
 __se_sys_ioctl+0xf9/0x170 fs/ioctl.c:893
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
---[ end trace 0000000000000000 ]---

[CAUSE]
The allocation failure happens at the start_transaction() call inside
prepare_to_relocate(), and during the error handling we call
unset_reloc_control(), which sets fs_info->reloc_ctl to NULL.

Then we continue the error path cleanup in btrfs_balance() by calling
reset_balance_state() which will call del_balance_item() to fully delete
the balance item in the root tree.

However, during the small window between set_reloc_control() and
unset_reloc_control(), a subvolume tree update can create a reloc_root
for that subvolume.

Then we go into the final btrfs_commit_transaction() of
del_balance_item(), and into btrfs_update_reloc_root() inside
commit_fs_roots().

That function checks if fs_info->reloc_ctl is in the merge_reloc_tree
stage, but since fs_info->reloc_ctl is NULL, it results in a NULL pointer
dereference.

[FIX]
Just add an extra check on fs_info->reloc_ctl inside
btrfs_update_reloc_root(), before checking
fs_info->reloc_ctl->merge_reloc_tree.

That DEAD_RELOC_TREE handling is to prevent further modification to the
reloc tree during the merge stage, but since there is no reloc_ctl at all,
we do not need to bother with that.
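
The shape of the added check, sketched outside the kernel: test the pointer first, then the flag behind it. The classes and the function name below are hypothetical stand-ins for the kernel structures, with None playing the role of NULL:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RelocControl:
    """Stand-in for the relocation control structure."""
    merge_reloc_tree: bool = False


@dataclass
class FsInfo:
    """Stand-in for fs_info; reloc_ctl may be unset (None)."""
    reloc_ctl: Optional[RelocControl] = None


def in_merge_stage(fs_info: FsInfo) -> bool:
    """True only if a reloc control exists and is in the merge stage."""
    rc = fs_info.reloc_ctl
    # Short-circuit evaluation: merge_reloc_tree is never read when
    # reloc_ctl is None.
    return rc is not None and rc.merge_reloc_tree
```

Without the first half of the condition, the None case would fail on the attribute access, which is the model's equivalent of the reported crash.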

Reported-by: [email protected]
Link: https://lore.kernel.org/linux-btrfs/[email protected]/
Reviewed-by: Josef Bacik <[email protected]>
Signed-off-by: Qu Wenruo <[email protected]>
Signed-off-by: David Sterba <[email protected]>
The variable stop_loop was originally introduced in commit 625f1c8
("Btrfs: improve the loop of scrub_stripe"). It was initialized to 0 in
commit 3b080b2 ("Btrfs: scrub raid56 stripes in the right way").
However, in a later commit 18d30ab ("btrfs: scrub: use
scrub_simple_mirror() to handle RAID56 data stripe scrub"), the code
that modified stop_loop was removed, making the variable redundant.

Currently, stop_loop is only initialized with 0 and is never used or
modified within the scrub_stripe() function. As a result, this patch
removes the stop_loop variable to clean up the code and eliminate
unnecessary redundancy.

This change has no impact on functionality, as stop_loop was never
utilized in any meaningful way in the final version of the code.

Reviewed-by: Filipe Manana <[email protected]>
Signed-off-by: Riyan Dhiman <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
…umount

During unmount, at close_ctree(), we have the following steps in this order:

1) Park the cleaner kthread - this doesn't destroy the kthread, it basically
   halts its execution (wake ups against it work but do nothing);

2) We stop the cleaner kthread - this results in freeing the respective
   struct task_struct;

3) We call btrfs_stop_all_workers() which waits for any jobs running in all
   the work queues and then frees the work queues.

Syzbot reported a case where a fixup worker resulted in a crash when doing
a delayed iput on its inode while attempting to wake up the cleaner at
btrfs_add_delayed_iput(), because the task_struct of the cleaner kthread
was already freed. This can happen during unmount because we don't wait
for any fixup workers still running before we call kthread_stop() against
the cleaner kthread, which stops it and frees all its resources.

Fix this by waiting for any fixup workers at close_ctree() before we call
kthread_stop() against the cleaner and run pending delayed iputs.
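
The ordering that the fix enforces can be illustrated with a toy model in userspace. Everything below is hypothetical: a thread pool stands in for the fixup workers, a small class stands in for the cleaner kthread, and raising an exception stands in for the use-after-free:

```python
import time
from concurrent.futures import ThreadPoolExecutor


class Cleaner:
    """Toy cleaner: waking it up after stop() is treated as a bug."""

    def __init__(self) -> None:
        self.freed = False

    def wake_up(self) -> None:
        if self.freed:
            raise RuntimeError("cleaner used after it was freed")

    def stop(self) -> None:
        self.freed = True


def close_sketch(n_workers: int = 4) -> list:
    """Wait for all fixup work to finish before stopping the cleaner."""
    cleaner = Cleaner()
    pool = ThreadPoolExecutor(max_workers=n_workers)

    def fixup_work() -> bool:
        time.sleep(0.01)   # simulate work still running at unmount
        cleaner.wake_up()  # models waking the cleaner for a delayed iput
        return True

    futures = [pool.submit(fixup_work) for _ in range(n_workers)]
    pool.shutdown(wait=True)  # the fix: wait for the workers first
    cleaner.stop()            # only now is it safe to stop the cleaner
    return [f.result() for f in futures]
```

Swapping the last two steps in close_sketch() would let a worker call wake_up() on an already stopped cleaner, which is the race described above.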

The stack traces reported by syzbot were the following:

   BUG: KASAN: slab-use-after-free in __lock_acquire+0x77/0x2050 kernel/locking/lockdep.c:5065
   Read of size 8 at addr ffff8880272a8a18 by task kworker/u8:3/52

   CPU: 1 UID: 0 PID: 52 Comm: kworker/u8:3 Not tainted 6.12.0-rc1-syzkaller #0
   Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
   Workqueue: btrfs-fixup btrfs_work_helper
   Call Trace:
    <TASK>
    __dump_stack lib/dump_stack.c:94 [inline]
    dump_stack_lvl+0x241/0x360 lib/dump_stack.c:120
    print_address_description mm/kasan/report.c:377 [inline]
    print_report+0x169/0x550 mm/kasan/report.c:488
    kasan_report+0x143/0x180 mm/kasan/report.c:601
    __lock_acquire+0x77/0x2050 kernel/locking/lockdep.c:5065
    lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5825
    __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
    _raw_spin_lock_irqsave+0xd5/0x120 kernel/locking/spinlock.c:162
    class_raw_spinlock_irqsave_constructor include/linux/spinlock.h:551 [inline]
    try_to_wake_up+0xb0/0x1480 kernel/sched/core.c:4154
    btrfs_writepage_fixup_worker+0xc16/0xdf0 fs/btrfs/inode.c:2842
    btrfs_work_helper+0x390/0xc50 fs/btrfs/async-thread.c:314
    process_one_work kernel/workqueue.c:3229 [inline]
    process_scheduled_works+0xa63/0x1850 kernel/workqueue.c:3310
    worker_thread+0x870/0xd30 kernel/workqueue.c:3391
    kthread+0x2f0/0x390 kernel/kthread.c:389
    ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
    ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
    </TASK>

   Allocated by task 2:
    kasan_save_stack mm/kasan/common.c:47 [inline]
    kasan_save_track+0x3f/0x80 mm/kasan/common.c:68
    unpoison_slab_object mm/kasan/common.c:319 [inline]
    __kasan_slab_alloc+0x66/0x80 mm/kasan/common.c:345
    kasan_slab_alloc include/linux/kasan.h:247 [inline]
    slab_post_alloc_hook mm/slub.c:4086 [inline]
    slab_alloc_node mm/slub.c:4135 [inline]
    kmem_cache_alloc_node_noprof+0x16b/0x320 mm/slub.c:4187
    alloc_task_struct_node kernel/fork.c:180 [inline]
    dup_task_struct+0x57/0x8c0 kernel/fork.c:1107
    copy_process+0x5d1/0x3d50 kernel/fork.c:2206
    kernel_clone+0x223/0x880 kernel/fork.c:2787
    kernel_thread+0x1bc/0x240 kernel/fork.c:2849
    create_kthread kernel/kthread.c:412 [inline]
    kthreadd+0x60d/0x810 kernel/kthread.c:765
    ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
    ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244

   Freed by task 61:
    kasan_save_stack mm/kasan/common.c:47 [inline]
    kasan_save_track+0x3f/0x80 mm/kasan/common.c:68
    kasan_save_free_info+0x40/0x50 mm/kasan/generic.c:579
    poison_slab_object mm/kasan/common.c:247 [inline]
    __kasan_slab_free+0x59/0x70 mm/kasan/common.c:264
    kasan_slab_free include/linux/kasan.h:230 [inline]
    slab_free_hook mm/slub.c:2343 [inline]
    slab_free mm/slub.c:4580 [inline]
    kmem_cache_free+0x1a2/0x420 mm/slub.c:4682
    put_task_struct include/linux/sched/task.h:144 [inline]
    delayed_put_task_struct+0x125/0x300 kernel/exit.c:228
    rcu_do_batch kernel/rcu/tree.c:2567 [inline]
    rcu_core+0xaaa/0x17a0 kernel/rcu/tree.c:2823
    handle_softirqs+0x2c5/0x980 kernel/softirq.c:554
    __do_softirq kernel/softirq.c:588 [inline]
    invoke_softirq kernel/softirq.c:428 [inline]
    __irq_exit_rcu+0xf4/0x1c0 kernel/softirq.c:637
    irq_exit_rcu+0x9/0x30 kernel/softirq.c:649
    instr_sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1037 [inline]
    sysvec_apic_timer_interrupt+0xa6/0xc0 arch/x86/kernel/apic/apic.c:1037
    asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:702

   Last potentially related work creation:
    kasan_save_stack+0x3f/0x60 mm/kasan/common.c:47
    __kasan_record_aux_stack+0xac/0xc0 mm/kasan/generic.c:541
    __call_rcu_common kernel/rcu/tree.c:3086 [inline]
    call_rcu+0x167/0xa70 kernel/rcu/tree.c:3190
    context_switch kernel/sched/core.c:5318 [inline]
    __schedule+0x184b/0x4ae0 kernel/sched/core.c:6675
    schedule_idle+0x56/0x90 kernel/sched/core.c:6793
    do_idle+0x56a/0x5d0 kernel/sched/idle.c:354
    cpu_startup_entry+0x42/0x60 kernel/sched/idle.c:424
    start_secondary+0x102/0x110 arch/x86/kernel/smpboot.c:314
    common_startup_64+0x13e/0x147

   The buggy address belongs to the object at ffff8880272a8000
    which belongs to the cache task_struct of size 7424
   The buggy address is located 2584 bytes inside of
    freed 7424-byte region [ffff8880272a8000, ffff8880272a9d00)

   The buggy address belongs to the physical page:
   page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x272a8
   head: order:3 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
   flags: 0xfff00000000040(head|node=0|zone=1|lastcpupid=0x7ff)
   page_type: f5(slab)
   raw: 00fff00000000040 ffff88801bafa500 dead000000000122 0000000000000000
   raw: 0000000000000000 0000000080040004 00000001f5000000 0000000000000000
   head: 00fff00000000040 ffff88801bafa500 dead000000000122 0000000000000000
   head: 0000000000000000 0000000080040004 00000001f5000000 0000000000000000
   head: 00fff00000000003 ffffea00009caa01 ffffffffffffffff 0000000000000000
   head: 0000000000000008 0000000000000000 00000000ffffffff 0000000000000000
   page dumped because: kasan: bad access detected
   page_owner tracks the page as allocated
   page last allocated via order 3, migratetype Unmovable, gfp_mask 0xd20c0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 2, tgid 2 (kthreadd), ts 71247381401, free_ts 71214998153
    set_page_owner include/linux/page_owner.h:32 [inline]
    post_alloc_hook+0x1f3/0x230 mm/page_alloc.c:1537
    prep_new_page mm/page_alloc.c:1545 [inline]
    get_page_from_freelist+0x3039/0x3180 mm/page_alloc.c:3457
    __alloc_pages_noprof+0x256/0x6c0 mm/page_alloc.c:4733
    alloc_pages_mpol_noprof+0x3e8/0x680 mm/mempolicy.c:2265
    alloc_slab_page+0x6a/0x120 mm/slub.c:2413
    allocate_slab+0x5a/0x2f0 mm/slub.c:2579
    new_slab mm/slub.c:2632 [inline]
    ___slab_alloc+0xcd1/0x14b0 mm/slub.c:3819
    __slab_alloc+0x58/0xa0 mm/slub.c:3909
    __slab_alloc_node mm/slub.c:3962 [inline]
    slab_alloc_node mm/slub.c:4123 [inline]
    kmem_cache_alloc_node_noprof+0x1fe/0x320 mm/slub.c:4187
    alloc_task_struct_node kernel/fork.c:180 [inline]
    dup_task_struct+0x57/0x8c0 kernel/fork.c:1107
    copy_process+0x5d1/0x3d50 kernel/fork.c:2206
    kernel_clone+0x223/0x880 kernel/fork.c:2787
    kernel_thread+0x1bc/0x240 kernel/fork.c:2849
    create_kthread kernel/kthread.c:412 [inline]
    kthreadd+0x60d/0x810 kernel/kthread.c:765
    ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
    ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
   page last free pid 5230 tgid 5230 stack trace:
    reset_page_owner include/linux/page_owner.h:25 [inline]
    free_pages_prepare mm/page_alloc.c:1108 [inline]
    free_unref_page+0xcd0/0xf00 mm/page_alloc.c:2638
    discard_slab mm/slub.c:2678 [inline]
    __put_partials+0xeb/0x130 mm/slub.c:3146
    put_cpu_partial+0x17c/0x250 mm/slub.c:3221
    __slab_free+0x2ea/0x3d0 mm/slub.c:4450
    qlink_free mm/kasan/quarantine.c:163 [inline]
    qlist_free_all+0x9a/0x140 mm/kasan/quarantine.c:179
    kasan_quarantine_reduce+0x14f/0x170 mm/kasan/quarantine.c:286
    __kasan_slab_alloc+0x23/0x80 mm/kasan/common.c:329
    kasan_slab_alloc include/linux/kasan.h:247 [inline]
    slab_post_alloc_hook mm/slub.c:4086 [inline]
    slab_alloc_node mm/slub.c:4135 [inline]
    kmem_cache_alloc_noprof+0x135/0x2a0 mm/slub.c:4142
    getname_flags+0xb7/0x540 fs/namei.c:139
    do_sys_openat2+0xd2/0x1d0 fs/open.c:1409
    do_sys_open fs/open.c:1430 [inline]
    __do_sys_openat fs/open.c:1446 [inline]
    __se_sys_openat fs/open.c:1441 [inline]
    __x64_sys_openat+0x247/0x2a0 fs/open.c:1441
    do_syscall_x64 arch/x86/entry/common.c:52 [inline]
    do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
    entry_SYSCALL_64_after_hwframe+0x77/0x7f

   Memory state around the buggy address:
    ffff8880272a8900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
    ffff8880272a8980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
   >ffff8880272a8a00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                               ^
    ffff8880272a8a80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
    ffff8880272a8b00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
   ==================================================================

Reported-by: [email protected]
Link: https://lore.kernel.org/linux-btrfs/[email protected]/
Reviewed-by: Qu Wenruo <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: Filipe Manana <[email protected]>
Signed-off-by: David Sterba <[email protected]>
This macro is no longer used after the "btrfs: Cleaned up folio->page
conversion" series patch [1] was applied, so remove it.

[1]: https://patchwork.kernel.org/project/linux-btrfs/cover/[email protected]/

Reviewed-by: Neal Gompa <[email protected]>
Signed-off-by: Youling Tang <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
Fix some confusing spelling errors that were currently identified,
the details are as follows:

	block-group.c: 2800: 	uncompressible 	==> incompressible
	extent-tree.c: 3131:	EXTEMT		==> EXTENT
	extent_io.c: 3124: 	utlizing 	==> utilizing
	extent_map.c: 1323: 	ealier		==> earlier
	extent_map.c: 1325:	possiblity	==> possibility
	fiemap.c: 189:		emmitted	==> emitted
	fiemap.c: 197:		emmitted	==> emitted
	fiemap.c: 203:		emmitted	==> emitted
	transaction.h: 36:	trasaction	==> transaction
	volumes.c: 5312:	filesysmte	==> filesystem
	zoned.c: 1977:		trasnsaction	==> transaction

Signed-off-by: Shen Lichuan <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
Disable ratelimiting for btrfs_printk when CONFIG_BTRFS_DEBUG is
enabled. This allows for more verbose output which is often needed by
functions like btrfs_dump_space_info().

Reviewed-by: Qu Wenruo <[email protected]>
Signed-off-by: Leo Martins <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
Add first stash of very basic self tests for the RAID stripe-tree.

More test cases will follow exercising the tree.

Signed-off-by: Johannes Thumshirn <[email protected]>
Reviewed-by: Filipe Manana <[email protected]>
…one info

At btrfs_load_zone_info() we have an error path that is dereferencing the
name of a device which is an RCU string but we are not holding an RCU read
lock, which is incorrect.

Fix this by using btrfs_err_in_rcu() instead of btrfs_err().

The problem is there since commit 08e11a3 ("btrfs: zoned: load zone's
allocation offset"), back then at btrfs_load_block_group_zone_info() but
then later on that code was factored out into the helper
btrfs_load_zone_info() by commit 09a4672 ("btrfs: zoned: factor out
per-zone logic from btrfs_load_block_group_zone_info").

Fixes: 08e11a3 ("btrfs: zoned: load zone's allocation offset")
Reviewed-by: Johannes Thumshirn <[email protected]>
Reviewed-by: Qu Wenruo <[email protected]>
Reviewed-by: Naohiro Aota <[email protected]>
Signed-off-by: Filipe Manana <[email protected]>
This BUG_ON is meant to catch backref cache problems, but these can
arise from either bugs in the backref cache or corruption in the extent
tree.  Fix it to be a proper error and change it to an ASSERT() so that
developers notice problems.

Signed-off-by: Josef Bacik <[email protected]>
Now that we're not updating the backref cache when we switch transids we
can remove the changed list.

We're going to keep the new_bytenr field because it serves as a good
sanity check for the backref cache and relocation, and can prevent us
from making extent tree corruption worse.

Signed-off-by: Josef Bacik <[email protected]>
Add a comment for this field so we know what it is used for.  Previously
we used it to update the backref cache, so people may mistakenly think
it is useless, but in fact exists to make sure the backref cache makes
sense.

Signed-off-by: Josef Bacik <[email protected]>
We have this set up as a loop, but in reality we will never walk back up
the backref tree; if we do then it's a bug.  Get rid of the loop and
handle the case where we have node->new_bytenr set at all.  Previously the
check was only if node->new_bytenr != root->node->start, but if it did
then we would hit the WARN_ON() and walk back up the tree.

Instead we want to just freak out if ->new_bytenr is set, and then do
the normal updating of the node for the reloc root and carry on.

Signed-off-by: Josef Bacik <[email protected]>
Since we no longer maintain backref cache across transactions, and this
is only called when we're creating the reloc root for a newly created
snapshot in the transaction critical section, we will end up doing a
bunch of work that will just get thrown away when we start the
transaction in the relocation loop.  Delete this code as it no longer
does anything for us.

Signed-off-by: Josef Bacik <[email protected]>
We already determine the owner for any blocks we find when we're
relocating, and for cowonly blocks (and the data reloc tree) we cow down
to the block and call it good enough.  However we still build a whole
backref tree for them, even though we're not going to use it, and then
just don't put these blocks in the cache.

Rework the code to check if the block belongs to a cowonly root or the
data reloc root, and then just cow down to the block, skipping the
backref cache generation.

Signed-off-by: Josef Bacik <[email protected]>
Now that we handle relocation for non-shareable roots without using the
backref cache, remove the ->cowonly field from the backref nodes and
update the handling to throw an ASSERT()/error.

Signed-off-by: Josef Bacik <[email protected]>
We rely on finding all our nodes on the various lists in the backref
cache, when they are all also in the rbtree.  Instead just search
through the rbtree and free everything.

Signed-off-by: Josef Bacik <[email protected]>
Before we were keeping all of our nodes on various lists in order to
make sure everything got cleaned up correctly.  We used node->lowest to
indicate that node->lower was linked into the cache->leaves list.  Now
that we do cleanup based on the rb tree both the list and the flag are
useless, so delete them both.

Signed-off-by: Josef Bacik <[email protected]>
We don't ever look at this list, remove it.

Signed-off-by: Josef Bacik <[email protected]>