Update to Linux v6.6 drivers #283

Merged
merged 4,599 commits into from
Aug 11, 2024

wulf7
Contributor

@wulf7 wulf7 commented Jan 29, 2024

Status:

Both Intel and AMD generally work. Tested on SkyLake, TigerLake, AlderLake, AMD780M, 7600M and GreenSardine.

What is not done yet:

  • Intel DG2 GUC/HUC support. Depends on MEI driver (not ported yet) and PXP
  • Intel PXP. Depends on aggregate driver support
  • AMD NUMA
  • folio support

The full path to the linuxkpi_video.ko module must be specified, because it conflicts with the one in the base system. This can be done, for example, by adding one of the following lines to rc.conf:
kld_list="/boot/modules/linuxkpi_video.ko /boot/modules/i915kms.ko"
for i915kms, or
kld_list="/boot/modules/linuxkpi_video.ko /boot/modules/amdgpu.ko"
for amdgpu.
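
The same setting can probably be applied without editing rc.conf by hand. The following is only a sketch: the helper name drm_kld_list is hypothetical (it is not part of drm-kmod), and it assumes the modules live under /boot/modules as in the lines above.

```sh
#!/bin/sh
# Hypothetical helper (not part of drm-kmod): print a kld_list value that
# names linuxkpi_video.ko by full path, followed by the chosen DRM driver.
drm_kld_list() {
    driver="$1"
    case "$driver" in
        i915kms|amdgpu) ;;
        *) echo "unsupported driver: $driver" >&2; return 1 ;;
    esac
    # Assumption: both modules are installed under /boot/modules.
    printf '%s %s\n' "/boot/modules/linuxkpi_video.ko" "/boot/modules/${driver}.ko"
}

# On a FreeBSD system (as root) the value could then be stored with sysrc(8):
#   sysrc kld_list="$(drm_kld_list amdgpu)"
```

Note that this overwrites any existing kld_list value, so other modules already listed there would need to be added back.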

@evadot
Contributor

evadot commented Jun 27, 2024

I think that all the linuxkpi patches are in -CURRENT now?
What's the status of suspend/resume? IIRC @emaste said he had problems on an Intel laptop with this branch (while it worked with 6.1).

@wulf7
Contributor Author

wulf7 commented Jul 2, 2024

> I think that all the linuxkpi patches are in -CURRENT now?

I just posted the remaining patches to Phabricator, except the pci_iomap_range one, which needs reworking to properly support Meteor Lake and later.

> What's the status of suspend/resume? IIRC @emaste said he had problems on an Intel laptop with this branch (while it worked with 6.1).

I don't know. Both of my S3-capable Intel laptops died recently. It is now up to someone else to bisect the 6.6 branch.

@evadot
Contributor

evadot commented Jul 4, 2024

>> I think that all the linuxkpi patches are in -CURRENT now?
>
> I just posted the remaining patches to Phabricator, except the pci_iomap_range one, which needs reworking to properly support Meteor Lake and later.
>
>> What's the status of suspend/resume? IIRC @emaste said he had problems on an Intel laptop with this branch (while it worked with 6.1).
>
> I don't know. Both of my S3-capable Intel laptops died recently. It is now up to someone else to bisect the 6.6 branch.

OK, thanks for the info. I'll test and bisect once all the needed LinuxKPI parts are committed.

@rudrabhoj

How much longer before we can expect these patches to be merged? What would be the go-to way to try this right now?

@wulf7
Contributor Author

wulf7 commented Jul 7, 2024

All patches are in Phabricator now. I expect them to be committed to -CURRENT in 2 weeks and then MFC-ed to 14-stable 1 week later.

@wulf7 wulf7 changed the title WIP: Update to Linux v6.6 drivers Update to Linux v6.6 drivers Jul 21, 2024
@wulf7 wulf7 marked this pull request as ready for review July 21, 2024 13:26
@wulf7
Contributor Author

wulf7 commented Jul 21, 2024

> How much longer before we can expect these patches to be merged? What would be the go-to way to try this right now?

All patches are merged to -CURRENT now, so you can test them. Don't forget to specify the full path to the linuxkpi_video.ko module, as it conflicts with the one in the base system. This can be done, for example, by adding one of the following lines to rc.conf:
kld_list="/boot/modules/linuxkpi_video.ko /boot/modules/i915kms.ko"
or
kld_list="/boot/modules/linuxkpi_video.ko /boot/modules/amdgpu.ko"

@JohnAZoidberg

JohnAZoidberg commented Jul 22, 2024

I tried it on an AlderLake system, and while the driver loads, I get an error when starting Xorg:

intel(0): intel_uxa_set_pixmap_bo: size of buffer object does not match constraints: size=14680064, must be greater than 13860864, but less than 4194304

See attached logfile
Xorg.0.log

My world and kernel are built from freebsd-src on the main branch (the "dirty" suffix is just a patch to use a different Ethernet driver):

FreeBSD marigold-dvt2 15.0-CURRENT FreeBSD 15.0-CURRENT #3 main-n271308-1cbd613f3343-dirty: Sun Jul 21 20:04:01 UTC 2024     root@marigold-dvt2:/usr/obj/home/zoid/freebsd-src/amd64.amd64/sys/GENERIC-NODEBUG amd64

@wulf7
Contributor Author

wulf7 commented Jul 22, 2024

The intel Xorg driver does not work on AlderLake. Delete xf86-video-intel and use modesetting.
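
As a rough sketch of that switch: the helper name modesetting_device_section, the identifier "Card0" and the file name 20-modesetting.conf below are illustrative assumptions, as is the xorg.conf.d location. Xorg may also pick the modesetting driver on its own once xf86-video-intel is removed, in which case no configuration file is needed.

```sh
#!/bin/sh
# Hypothetical helper: print an Xorg "Device" section that selects the
# generic modesetting driver instead of the intel driver.
modesetting_device_section() {
    cat <<'EOF'
Section "Device"
    Identifier "Card0"
    Driver     "modesetting"
EndSection
EOF
}

# On FreeBSD (as root), assuming Xorg reads /usr/local/etc/X11/xorg.conf.d:
#   pkg delete xf86-video-intel
#   modesetting_device_section > /usr/local/etc/X11/xorg.conf.d/20-modesetting.conf
```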

@aokblast

aokblast commented Jul 24, 2024

I have tested on Meteor Lake (though it is unstable in v6.6) and it works fine once Xorg is running. However, there are problems with drmfb, as the picture shows. (This is not bad photography on my part; the picture shows what I actually saw on screen.)
photo_2024-07-22_15-49-46

@wulf7
Contributor Author

wulf7 commented Jul 27, 2024

What do sysctl hw.intel_graphics_stolen_size and sysctl hw.intel_graphics_stolen_base report? Are the values non-zero?
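
A small sketch of that check, for anyone who wants to script it. The two sysctl names come from the question above; the helper name check_stolen and its output strings are made up for illustration, and the values are assumed to be printed as decimal integers.

```sh
#!/bin/sh
# Hypothetical helper: given the stolen base and size values, report
# whether the stolen memory area appears to have been detected.
check_stolen() {
    base="$1"
    size="$2"
    # Both values must be non-zero for the area to count as detected.
    if [ "$base" -ne 0 ] && [ "$size" -ne 0 ]; then
        echo "stolen memory detected"
    else
        echo "stolen memory not detected"
    fi
}

# On FreeBSD the values could be read with sysctl -n:
#   check_stolen "$(sysctl -n hw.intel_graphics_stolen_base)" \
#                "$(sysctl -n hw.intel_graphics_stolen_size)"
```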

@aokblast

All of the values are zero.

@wulf7
Contributor Author

wulf7 commented Jul 27, 2024

Importing the stolen area detection code for MeteorLake will probably fix the problem. That code is on the base system side rather than in drm-kmod.

@daniloegea

I tested it on my AMD Ryzen 9 7950X (#305) but unfortunately it's not quite working yet.

Over DisplayPort, loading the driver appears to partly work: I get no video output afterwards, but the system doesn't hang. I can still enter commands and access it via SSH.

When I use the HDMI output it seems to be hanging.

@aokblast

Hmm, I can't find a corresponding quirk in the original Linux kernel source. Maybe it is just unstable. I found that Intel says Meteor Lake Arc graphics is stable as of 6.8.

@evadot
Contributor

evadot commented Aug 2, 2024

Tested on my RX550, seems to work fine. Suspend-resume is working fine too.

I'll test on some Intel devices next week, I think.
@wulf7 can you upstream the linuxkpi_video changes to src so we don't have to include this module in drm-kmod anymore? Thanks

@wulf7
Contributor Author

wulf7 commented Aug 5, 2024

> @wulf7 can you upstream the linuxkpi_video changes to src so we don't have to include this module anymore in drm-kmod

See https://reviews.freebsd.org/D46224

@evadot
Contributor

evadot commented Aug 7, 2024

>> @wulf7 can you upstream the linuxkpi_video changes to src so we don't have to include this module anymore in drm-kmod
>
> See https://reviews.freebsd.org/D46224

Thanks.
Since we have (or will have) everything needed in base for linuxkpi_video, I've removed the module from the master branch.
I've also tested on a few Intel machines and didn't have any problems, so once you've rebased for the linuxkpi_video conflict, feel free to merge this into master. If there are any problems, we can work from there instead of from this PR.

Thanks again.

Ran Sun added 6 commits August 9, 2024 22:15
Fix the following errors reported by checkpatch:

ERROR: open brace '{' following struct go on the same line

Signed-off-by: Ran Sun <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Fix the following errors reported by checkpatch:

ERROR: open brace '{' following struct go on the same line
ERROR: Use C99 flexible arrays

Signed-off-by: Ran Sun <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Fix the following errors reported by checkpatch:

ERROR: open brace '{' following struct go on the same line
ERROR: space prohibited before open square bracket '['
ERROR: "foo * bar" should be "foo *bar"

Signed-off-by: Ran Sun <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Fix the following errors reported by checkpatch:

ERROR: space prohibited before open square bracket '['
ERROR: "foo * bar" should be "foo *bar"

Signed-off-by: Ran Sun <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Fix the following errors reported by checkpatch:

ERROR: that open brace { should be on the previous line
ERROR: space prohibited before that ',' (ctx:WxW)

Signed-off-by: Ran Sun <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Fix the following errors reported by checkpatch:

ERROR: that open brace { should be on the previous line

Signed-off-by: Ran Sun <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
vsyrjala and others added 21 commits August 9, 2024 22:15
If we can't find a free fence register to handle a fault in the GMADR
range just return VM_FAULT_NOPAGE without populating the PTE so that
userspace will retry the access and trigger another fault. Eventually
we should find a free fence and the fault will get properly handled.

A further improvement idea might be to reserve a fence (or one per CPU?)
for the express purpose of handling faults without having to retry. But
that would require some additional work.

Looks like this may have gotten broken originally by
commit 39965b376601 ("drm/i915: don't trash the gtt when running out of fences")
as that changed the errno to -EDEADLK which wasn't handle by the gtt
fault code either. But later in commit 2feeb52859fc ("drm/i915/gt: Fix
-EDEADLK handling regression") I changed it again to -ENOBUFS as -EDEADLK
was now getting used for the ww mutex dance. So this fix only makes
sense after that last commit.

Cc: [email protected]
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/9479
Fixes: 2feeb52859fc ("drm/i915/gt: Fix -EDEADLK handling regression")
Signed-off-by: Ville Syrjälä <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Reviewed-by: Andi Shyti <[email protected]>
(cherry picked from commit 7f403caabe811b88ab0de3811ff3f4782c415761)
Signed-off-by: Rodrigo Vivi <[email protected]>
Looks like RADV is actually hitting this.

Signed-off-by: Christian König <[email protected]>
Fixes: ca6c1e210aa7 ("drm/amdgpu: use the new drm_exec object for CS v3")
Acked-by: Alex Deucher <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
As drm_dp_get_mst_branch_device_by_guid() is called from
drm_dp_get_mst_branch_device_by_guid(), mstb parameter has to be checked,
otherwise NULL dereference may occur in the call to
the memcpy() and cause following:

[12579.365869] BUG: kernel NULL pointer dereference, address: 0000000000000049
[12579.365878] #PF: supervisor read access in kernel mode
[12579.365880] #PF: error_code(0x0000) - not-present page
[12579.365882] PGD 0 P4D 0
[12579.365887] Oops: 0000 [#1] PREEMPT SMP NOPTI
...
[12579.365895] Workqueue: events_long drm_dp_mst_up_req_work
[12579.365899] RIP: 0010:memcmp+0xb/0x29
[12579.365921] Call Trace:
[12579.365927] get_mst_branch_device_by_guid_helper+0x22/0x64
[12579.365930] drm_dp_mst_up_req_work+0x137/0x416
[12579.365933] process_one_work+0x1d0/0x419
[12579.365935] worker_thread+0x11a/0x289
[12579.365938] kthread+0x13e/0x14f
[12579.365941] ? process_one_work+0x419/0x419
[12579.365943] ? kthread_blkcg+0x31/0x31
[12579.365946] ret_from_fork+0x1f/0x30

As get_mst_branch_device_by_guid_helper() is recursive, moving condition
to the first line allow to remove a similar one for step over of NULL elements
inside a loop.

Fixes: 5e93b8208d3c ("drm/dp/mst: move GUID storage from mgr, port to only mst branch")
Cc: <[email protected]> # 4.14+
Signed-off-by: Lukasz Majczak <[email protected]>
Reviewed-by: Radoslaw Biernacki <[email protected]>
Signed-off-by: Manasi Navare <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
abo->tbo.resource may be NULL in amdgpu_vm_bo_update.

Fixes: 180253782038 ("drm/ttm: stop allocating dummy resources during BO creation")
Signed-off-by: Felix Kuehling <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
In amdgpu_dma_buf_move_notify reserve fences for the page table updates
in amdgpu_vm_clear_freed and amdgpu_vm_handle_moved. This fixes a BUG_ON
in dma_resv_add_fence when using SDMA for page table updates.

Signed-off-by: Felix Kuehling <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Remove a redundant call to amdgpu_ctx_priority_is_valid() from
amdgpu_ctx_priority_permit(), which is called from amdgpu_ctx_init() which is
called from amdgpu_ctx_alloc() which is called from amdgpu_ctx_ioctl(), where
we've called amdgpu_ctx_priority_is_valid() already first thing in the
function.

Cc: Alex Deucher <[email protected]>
Cc: Christian König <[email protected]>
Signed-off-by: Luben Tuikov <[email protected]>
Acked-by: Christian König <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
When supporting OA for TGL, it was seen that the context valid bit in
the report ID was not defined, however revisiting the spec seems to have
this bit defined. The bit is used to determine if a context is valid on
a context switch and is essential to determine active and idle periods
for a context. Re-enable the context valid bit for gen12 platforms.

BSpec: 52196 (description of report_id)

v2: Include BSpec reference (Ashutosh)

Fixes: 00a7f0d7155c ("drm/i915/tgl: Add perf support on TGL")
Signed-off-by: Umesh Nerlige Ramappa <[email protected]>
Reviewed-by: Ashutosh Dixit <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit 7eeaedf79989a8f131939782832e21e9218ed2a0)
Signed-off-by: Rodrigo Vivi <[email protected]>
The steering control and semaphore registers are inside an "always on"
power domain with respect to RC6.  However there are some issues if
higher-level platform sleep states are entering/exiting at the same time
these registers are accessed.  Grabbing GT forcewake and holding it over
the entire lock/steer/unlock cycle ensures that those sleep states have
been fully exited before we access these registers.

This is expected to become a formally documented/numbered workaround
soon.

Note that this patch alone isn't expected to have an immediately
noticeable impact on MCR (mis)behavior; an upcoming pcode firmware
update will also be necessary to provide the other half of this
workaround.

v2:
 - Move the forcewake inside the Xe_LPG-specific IP version check.  This
   should only be necessary on platforms that have a steering semaphore.

Fixes: 3100240bf846 ("drm/i915/mtl: Add hardware-level lock for steering")
Cc: Radhakrishna Sripada <[email protected]>
Cc: Jonathan Cavitt <[email protected]>
Signed-off-by: Matt Roper <[email protected]>
Reviewed-by: Radhakrishna Sripada <[email protected]>
Reviewed-by: Jonathan Cavitt <[email protected]>
Reviewed-by: Andi Shyti <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit 8fa1c7cd1fe9cdfc426a603e1f1eecd3f463c487)
Signed-off-by: Rodrigo Vivi <[email protected]>
When the driver unbinds, pmu is unregistered and i915->uabi_engines is
set to RB_ROOT. Due to this, when i915 PMU tries to stop the engine
events, it issues a warn_on because engine lookup fails.

All perf hooks are taking care of this using a pmu->closed flag that is
set when PMU unregisters. The stop event seems to have been left out.

Check for pmu->closed in pmu_event_stop as well.

Based on discussion here -
https://patchwork.freedesktop.org/patch/492079/?series=105790&rev=2

v2: s/is/if/ in commit title
v3: Add fixes tag and cc stable

Cc: <[email protected]> # v5.11+
Fixes: b00bccb3f0bb ("drm/i915/pmu: Handle PCI unbind")
Signed-off-by: Umesh Nerlige Ramappa <[email protected]>
Reviewed-by: Tvrtko Ursulin <[email protected]>
Reviewed-by: Andi Shyti <[email protected]>
Signed-off-by: Andi Shyti <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit 31f6a06f0c543b43a38fab10f39e5fc45ad62aa2)
Signed-off-by: Rodrigo Vivi <[email protected]>
Originally we were quirking ASPM disabled specifically for VI when
used with Alder Lake, but it appears to have problems with Rocket
Lake as well.

Like we've done in the case of dpm for newer platforms, disable
ASPM for all Intel systems.

Cc: [email protected] # 5.15+
Fixes: 0064b0ce85bb ("drm/amd/pm: enable ASPM by default")
Reported-and-tested-by: Paolo Gentili <[email protected]>
Closes: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2036742
Signed-off-by: Mario Limonciello <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
If *any* object of a certain WW mutex class is locked, lockdep will
consider *all* mutexes of that class as locked. Also the lock allocation
tracking code will apparently register only the address of the first
mutex of a given class locked in a sequence.
This has the odd consequence that if that first mutex is unlocked while
other mutexes of the same class remain locked and then its memory then
freed, the lock alloc tracking code will incorrectly assume that memory
is freed with a held lock in there.

For now, work around that for drm_exec by releasing the first grabbed
object lock last.

v2:
- Fix a typo (Danilo Krummrich)
- Reword the commit message a bit.
- Add a Fixes: tag

Related lock alloc tracking warning:
[  322.660067] =========================
[  322.660070] WARNING: held lock freed!
[  322.660074] 6.5.0-rc7+ #155 Tainted: G     U           N
[  322.660078] -------------------------
[  322.660081] kunit_try_catch/4981 is freeing memory ffff888112adc000-ffff888112adc3ff, with a lock still held there!
[  322.660089] ffff888112adc1a0 (reservation_ww_class_mutex){+.+.}-{3:3}, at: drm_exec_lock_obj+0x11a/0x600 [drm_exec]
[  322.660104] 2 locks held by kunit_try_catch/4981:
[  322.660108]  #0: ffffc9000343fe18 (reservation_ww_class_acquire){+.+.}-{0:0}, at: test_early_put+0x22f/0x490 [drm_exec_test]
[  322.660123]  #1: ffff888112adc1a0 (reservation_ww_class_mutex){+.+.}-{3:3}, at: drm_exec_lock_obj+0x11a/0x600 [drm_exec]
[  322.660135]
               stack backtrace:
[  322.660139] CPU: 7 PID: 4981 Comm: kunit_try_catch Tainted: G     U           N 6.5.0-rc7+ #155
[  322.660146] Hardware name: ASUS System Product Name/PRIME B560M-A AC, BIOS 0403 01/26/2021
[  322.660152] Call Trace:
[  322.660155]  <TASK>
[  322.660158]  dump_stack_lvl+0x57/0x90
[  322.660164]  debug_check_no_locks_freed+0x20b/0x2b0
[  322.660172]  slab_free_freelist_hook+0xa1/0x160
[  322.660179]  ? drm_exec_unlock_all+0x168/0x2a0 [drm_exec]
[  322.660186]  __kmem_cache_free+0xb2/0x290
[  322.660192]  drm_exec_unlock_all+0x168/0x2a0 [drm_exec]
[  322.660200]  drm_exec_fini+0xf/0x1c0 [drm_exec]
[  322.660206]  test_early_put+0x289/0x490 [drm_exec_test]
[  322.660215]  ? __pfx_test_early_put+0x10/0x10 [drm_exec_test]
[  322.660222]  ? __kasan_check_byte+0xf/0x40
[  322.660227]  ? __ksize+0x63/0x140
[  322.660233]  ? drmm_add_final_kfree+0x3e/0xa0 [drm]
[  322.660289]  ? _raw_spin_unlock_irqrestore+0x30/0x60
[  322.660294]  ? lockdep_hardirqs_on+0x7d/0x100
[  322.660301]  ? __pfx_kunit_try_run_case+0x10/0x10 [kunit]
[  322.660310]  ? __pfx_kunit_generic_run_threadfn_adapter+0x10/0x10 [kunit]
[  322.660319]  kunit_generic_run_threadfn_adapter+0x4a/0x90 [kunit]
[  322.660328]  kthread+0x2e7/0x3c0
[  322.660334]  ? __pfx_kthread+0x10/0x10
[  322.660339]  ret_from_fork+0x2d/0x70
[  322.660345]  ? __pfx_kthread+0x10/0x10
[  322.660349]  ret_from_fork_asm+0x1b/0x30
[  322.660358]  </TASK>
[  322.660818]     ok 8 test_early_put

Cc: Christian König <[email protected]>
Cc: Boris Brezillon <[email protected]>
Cc: Danilo Krummrich <[email protected]>
Cc: [email protected]
Fixes: 09593216bff1 ("drm: execution context for GEM buffers v7")
Signed-off-by: Thomas Hellström <[email protected]>
Reviewed-by: Boris Brezillon <[email protected]>
Reviewed-by: Danilo Krummrich <[email protected]>
Reviewed-by: Christian König <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
They are stubbed in LinuxKPI now.

Sponsored by:	Serenity Cyber Security, LLC
The function has been removed since Linux kernel v5.4.

Sponsored by:	Serenity Cyber Security, LLC
With addition of following files:
- drivers/gpu/drm/amd/display/dc/dce/dce_clk_mgr.c
- drivers/gpu/drm/ttm/tests/*
- include/drm/drm_gem_dma_helper.h
- include/drm/drm_gpuva_mgr.h
- include/drm/drm_kunit_helpers.h
- include/uapi/drm/ivpu_accel.h

Sponsored by:	Serenity Cyber Security, LLC
Sponsored by:	Serenity Cyber Security, LLC
It is unneeded as sysfs link support is stubbed in LKPI now

Sponsored by:	Serenity Cyber Security, LLC
It is unneeded as sysfs link support routines are stubbed in LKPI now

Sponsored by:	Serenity Cyber Security, LLC
It is unneeded as sysfs link support routines are stubbed in LKPI now

Sponsored by:	Serenity Cyber Security, LLC
It is unneeded as sysfs link support routines are stubbed in LKPI now

Sponsored by:	Serenity Cyber Security, LLC
Sponsored by:   Serenity CyberSecurity, LLC
Fixes:  "drm/amdgpu: register a dirty framebuffer callback for fbcon"
@chrislongros

I compiled the drm-61-kmod port under FreeBSD CURRENT but I did not set kld_list="/boot/modules/linuxkpi_video.ko /boot/modules/amdgpu.ko" ... I used # sysrc kld_list+=amdgpu instead and I got another kernel panic :(

image

@chrislongros

Is there a way to fix it from this point or should I reinstall FreeBSD? What do you think is the problem?

@wulf7
Contributor Author

wulf7 commented Aug 10, 2024

I bet you forgot to install the firmware.

@chrislongros

chrislongros commented Aug 10, 2024

Yeah 😅 I thought the port provided them

It is a part of LKPI in the base system now.
It may be reenabled with adding DEVELOPER parameter to make.

This effectively reapplies 199e4b9 "linuxkpi_video: Remove module"

Sponsored by:	Serenity CyberSecurity, LLC
@wulf7 wulf7 merged commit ad81525 into freebsd:master Aug 11, 2024
1 check failed
@wulf7
Contributor Author

wulf7 commented Aug 11, 2024

> feel free to merge this into master and if there are any problems we can work from there instead of this PR.

done
