-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Lolipop dev #1
Open
quesoo
wants to merge
2,213
commits into
lolipop
Choose a base branch
from
lolipop-dev
base: lolipop
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Lolipop dev #1
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
commit c8f6d8351ba8c89d5cd4c562552ec7ec29274e31 upstream. Like on UL30VT, the ACPI video driver can't control backlight correctly on Asus UL30A. Vendor driver (asus-laptop) can work. This patch is to add "Asus UL30A" to ACPI video detect blacklist in order to use asus-laptop for video control on the "Asus UL30A" rather than ACPI video driver. Signed-off-by: Bastian Triller <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]> Signed-off-by: Ben Hutchings <[email protected]> Cc: Yijing Wang <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
commit 4ef366c583d6180b1c951147869ee5a3038834f2 upstream. On HP 1000 lapops, BIOS reports minimum backlight on boot and causes backlight to dim completely. This ignores the initial backlight values and set to max brightness. References:: https://bugs.launchpad.net/bugs/1167760 Signed-off-by: Alex Hung <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]> Signed-off-by: Ben Hutchings <[email protected]> Cc: Yijing Wang <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
commit e6391a182865efc896cb2a8d79e07b7ac2f45b48 upstream. The element of `das08_boards[]` for the 'das08jr-16-ao' board has the `ai_encoding` member set to `das08_encode12`. It should be set to `das08_encode16` same as the 'das08jr/16' board. After all, this board has 16-bit AI resolution. The description of the A/D LSB register at offset 0 seems incorrect in the user manual "cio-das08jr-16-ao.pdf" as it implies that the AI resolution is only 12 bits. The diagrams of the A/D LSB and MSB registers show 15 data bits and a sign bit, which matches what the software expects for the `das08_encode16` AI encoding method. Signed-off-by: Ian Abbott <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> [bwh: Backported to 3.2: adjust indentation] Signed-off-by: Ben Hutchings <[email protected]> Cc: Yijing Wang <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
commit 4b18f08be01a7b3c7b6df497137b6e3cb28adaa3 upstream. `do_cmd_ioctl()` is called with the comedi device's mutex locked to process the `COMEDI_CMD` ioctl to set up comedi's asynchronous command handling on a comedi subdevice. `comedi_read()` and `comedi_write()` are the `read` and `write` handlers for the comedi device, but do not lock the mutex (for performance reasons, as some things can hold the mutex for quite a long time). There is a race condition if `comedi_read()` or `comedi_write()` is running at the same time and for the same file object and comedi subdevice as `do_cmd_ioctl()`. `do_cmd_ioctl()` sets the subdevice's `busy` pointer to the file object way before it sets the `SRF_RUNNING` flag in the subdevice's `runflags` member. `comedi_read() and `comedi_write()` check the subdevice's `busy` pointer is pointing to the current file object, then if the `SRF_RUNNING` flag is not set, will call `do_become_nonbusy()` to shut down the asyncronous command. Bad things can happen if the asynchronous command is being shutdown and set up at the same time. To prevent the race, don't set the `busy` pointer until after the `SRF_RUNNING` flag has been set. Also, make sure the mutex is held in `comedi_read()` and `comedi_write()` while calling `do_become_nonbusy()` in order to avoid moving the race condition to a point within that function. Change some error handling `goto cleanup` statements in `do_cmd_ioctl()` to simple `return -ERRFOO` statements as a result of changing when the `busy` pointer is set. Signed-off-by: Ian Abbott <[email protected]> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <[email protected]> Cc: Yijing Wang <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
commit b5e2f339865fb443107e5b10603e53bbc92dc054 upstream. We need to check the length parameter before doing the memcpy(). I've actually changed it to strlcpy() as well so that it's NUL terminated. You need CAP_NET_ADMIN to trigger these so it's not the end of the world. Reported-by: Nico Golde <[email protected]> Reported-by: Fabian Yamaguchi <[email protected]> Signed-off-by: Dan Carpenter <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <[email protected]> Cc: Yijing Wang <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
commit e66fc1fba248738d32f3b64508f9ef1176d9e767 upstream. This patch create and initalizes a new device id of 0x172 as reported by Rinat Camalov <[email protected]>. In addition, a comment is added to the potential invalid existing device id. Reported-by: Rinat Camalov <[email protected]> Signed-off-by: Kevin McKinney <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <[email protected]> Cc: Yijing Wang <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
commit 4f29ef050848245f7c180b95ccf67dfcd76b1fd8 upstream. This patch adds two new products and modifies the device id table to include them. In addition, product of 0xbccd - BCM_USB_PRODUCT_ID_SM250 is removed because Beceem, ZTE, Sprint use this id for block devices. Reported-by: Muhammad Minhazul Haque <[email protected]> Signed-off-by: Kevin McKinney <[email protected]> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <[email protected]> Cc: Yijing Wang <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
commit bf593907f7236e95698a76b7c7a2bbf8b1165327 upstream. Normally, the kernel emulates a few instructions that are unimplemented on some processors (e.g. the old dcba instruction), or privileged (e.g. mfpvr). The emulation of unimplemented instructions is currently not working on the PowerNV platform. The reason is that on these machines, unimplemented and illegal instructions cause a hypervisor emulation assist interrupt, rather than a program interrupt as on older CPUs. Our vector for the emulation assist interrupt just calls program_check_exception() directly, without setting the bit in SRR1 that indicates an illegal instruction interrupt. This fixes it by making the emulation assist interrupt set that bit before calling program_check_interrupt(). With this, old programs that use no-longer implemented instructions such as dcba now work again. Signed-off-by: Paul Mackerras <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <[email protected]> Cc: Yijing Wang <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
…nning_secondaries commit 8246aca7058f3f2c2ae503081777965cd8df7b90 upstream. the smp_release_cpus is a normal funciton and called in normal environments, but it calls the __initdata spinning_secondaries. need modify spinning_secondaries to match smp_release_cpus. the related warning: (the linker report boot_paca.33377, but it should be spinning_secondaries) ----------------------------------------------------------------------------- WARNING: arch/powerpc/kernel/built-in.o(.text+0x23176): Section mismatch in reference from the function .smp_release_cpus() to the variable .init.data:boot_paca.33377 The function .smp_release_cpus() references the variable __initdata boot_paca.33377. This is often because .smp_release_cpus lacks a __initdata annotation or the annotation of boot_paca.33377 is wrong. WARNING: arch/powerpc/kernel/built-in.o(.text+0x231fe): Section mismatch in reference from the function .smp_release_cpus() to the variable .init.data:boot_paca.33377 The function .smp_release_cpus() references the variable __initdata boot_paca.33377. This is often because .smp_release_cpus lacks a __initdata annotation or the annotation of boot_paca.33377 is wrong. ----------------------------------------------------------------------------- Signed-off-by: Chen Gang <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]> Signed-off-by: Ben Hutchings <[email protected]> Cc: Yijing Wang <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
…rvisor commit f5f6cbb61610b7bf9d9d96db9c3979d62a424bab upstream. /proc/powerpc/lparcfg is an ancient facility (though still actively used) which allows access to some informations relative to the partition when running underneath a PAPR compliant hypervisor. It makes no sense on non-pseries machines. However, currently, not only can it be created on these if the kernel has pseries support, but accessing it on such a machine will crash due to trying to do hypervisor calls. In fact, it should also not do HV calls on older pseries that didn't have an hypervisor either. Finally, it has the plumbing to be a module but is a "bool" Kconfig option. This fixes the whole lot by turning it into a machine_device_initcall that is only created on pseries, and adding the necessary hypervisor check before calling the H_GET_EM_PARMS hypercall Signed-off-by: Benjamin Herrenschmidt <[email protected]> [bwh: Backported to 3.2: lparcfg_cleanup() was a bit different] Signed-off-by: Ben Hutchings <[email protected]> Cc: Yijing Wang <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
…ic() commit 8f21bd0090052e740944f9397e2be5ac7957ded7 upstream. The csum_partial_copy_generic() function saves the PowerPC non-volatile r14, r15, and r16 registers for the main checksum-and-copy loop. Unfortunately, it fails to restore them upon error exit from this loop, which results in silent corruption of these registers in the presumably rare event of an access exception within that loop. This commit therefore restores these register on error exit from the loop. Signed-off-by: Paul E. McKenney <[email protected]> Signed-off-by: Anton Blanchard <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]> [bwh: Backported to 3.2: register name macros use lower-case 'r'] Signed-off-by: Ben Hutchings <[email protected]> Cc: Yijing Wang <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
commit 5676005acf26ab7e924a8438ea4746e47d405762 upstream. need set '\0' for 'local_buffer'. SPLPAR_MAXLENGTH is 1026, RTAS_DATA_BUF_SIZE is 4096. so the contents of rtas_data_buf may truncated in memcpy. if contents are really truncated. the splpar_strlen is more than 1026. the next while loop checking will not find the end of buffer. that will cause memory access violation. Signed-off-by: Chen Gang <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]> Signed-off-by: Ben Hutchings <[email protected]> Cc: Yijing Wang <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
commit 84b073868b9d9e754ae48b828337633d1b386482 upstream. When reading from the dispatch trace log (dtl) userspace interface, I sometimes see duplicate entries. One example: # hexdump -C dtl.out 00000000 07 04 00 0c 00 00 48 44 00 00 00 00 00 00 00 00 00000010 00 0c a0 b4 16 83 6d 68 00 00 00 00 00 00 00 00 00000020 00 00 00 00 10 00 13 50 80 00 00 00 00 00 d0 32 00000030 07 04 00 0c 00 00 48 44 00 00 00 00 00 00 00 00 00000040 00 0c a0 b4 16 83 6d 68 00 00 00 00 00 00 00 00 00000050 00 00 00 00 10 00 13 50 80 00 00 00 00 00 d0 32 The problem is in scan_dispatch_log() where we call dtl_consumer() but bail out before incrementing the index. To fix this I moved dtl_consumer() after the timebase comparison. Signed-off-by: Anton Blanchard <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <[email protected]> Cc: Yijing Wang <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
commit 771d09b3c4c45d4d534a83a68e6331b97fd82e15 upstream. On a HP Pavilion dm4 laptop the BIOS sets minimum backlight on boot, completely dimming the screen. Ignore this initial value for this machine. Signed-off-by: Gustavo Maciel Dias Vieira <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]> Signed-off-by: Ben Hutchings <[email protected]> [wyj: Backported to 3.4: adjust context] Signed-off-by: Yijing Wang <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
commit d1211af3049f4c9c1d8d4eb8f8098cc4f4f0d0c7 upstream. arch/powerpc/kernel/sysfs.c exports PURR with write permission. This may be valid for kernel in phyp mode. But writing to the file in guest mode causes crash due to a priviledge violation Signed-off-by: Madhavan Srinivasan <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]> [Backported to 3.4: adjust context] Signed-off-by: Yijing Wang <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
… entry commit 4a705fef986231a3e7a6b1a6d3c37025f021f49f upstream. There's a race between fork() and hugepage migration, as a result we try to "dereference" a swap entry as a normal pte, causing kernel panic. The cause of the problem is that copy_hugetlb_page_range() can't handle "swap entry" family (migration entry and hwpoisoned entry) so let's fix it. [[email protected]: coding-style fixes] Signed-off-by: Naoya Horiguchi <[email protected]> Acked-by: Hugh Dickins <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: <[email protected]> [2.6.37+] Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
commit d05f0cdcbe6388723f1900c549b4850360545201 upstream. In v2.6.34 commit 9d8cebd ("mm: fix mbind vma merge problem") introduced vma merging to mbind(), but it should have also changed the convention of passing start vma from queue_pages_range() (formerly check_range()) to new_vma_page(): vma merging may have already freed that structure, resulting in BUG at mm/mempolicy.c:1738 and probably worse crashes. Fixes: 9d8cebd ("mm: fix mbind vma merge problem") Reported-by: Naoya Horiguchi <[email protected]> Tested-by: Naoya Horiguchi <[email protected]> Signed-off-by: Hugh Dickins <[email protected]> Acked-by: Christoph Lameter <[email protected]> Cc: KOSAKI Motohiro <[email protected]> Cc: Minchan Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
commit 3d28bd840b2d3981cd28caf5fe1df38f1344dd60 upstream. Add ID of the Telewell 4G v2 hardware to option driver to get legacy serial interface working Signed-off-by: Bernd Wachter <[email protected]> Signed-off-by: Johan Hovold <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
commit b9326057a3d8447f5d2e74a7b521ccf21add2ec0 upstream. Corsair USB Dongles are shipped with Corsair AXi series PSUs. These are cp210x serial usb devices, so make driver detect these. I have a program, that can get information from these PSUs. Tested with 2 different dongles shipped with Corsair AX860i and AX1200i units. Signed-off-by: Andras Kovacs <[email protected]> Signed-off-by: Johan Hovold <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
commit 5a7fbe7e9ea0b1b9d7ffdba64db1faa3a259164c upstream. This patch adds PID 0x0003 to the VID 0x128d (Testo). At least the Testo 435-4 uses this, likely other gear as well. Signed-off-by: Bert Vermeulen <[email protected]> Cc: Johan Hovold <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
commit 391acf970d21219a2a5446282d3b20eace0c0d7a upstream. When runing with the kernel(3.15-rc7+), the follow bug occurs: [ 9969.258987] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:586 [ 9969.359906] in_atomic(): 1, irqs_disabled(): 0, pid: 160655, name: python [ 9969.441175] INFO: lockdep is turned off. [ 9969.488184] CPU: 26 PID: 160655 Comm: python Tainted: G A 3.15.0-rc7+ #85 [ 9969.581032] Hardware name: FUJITSU-SV PRIMEQUEST 1800E/SB, BIOS PRIMEQUEST 1000 Series BIOS Version 1.39 11/16/2012 [ 9969.706052] ffffffff81a20e60 ffff8803e941fbd0 ffffffff8162f523 ffff8803e941fd18 [ 9969.795323] ffff8803e941fbe0 ffffffff8109995a ffff8803e941fc58 ffffffff81633e6c [ 9969.884710] ffffffff811ba5dc ffff880405c6b480 ffff88041fdd90a0 0000000000002000 [ 9969.974071] Call Trace: [ 9970.003403] [<ffffffff8162f523>] dump_stack+0x4d/0x66 [ 9970.065074] [<ffffffff8109995a>] __might_sleep+0xfa/0x130 [ 9970.130743] [<ffffffff81633e6c>] mutex_lock_nested+0x3c/0x4f0 [ 9970.200638] [<ffffffff811ba5dc>] ? kmem_cache_alloc+0x1bc/0x210 [ 9970.272610] [<ffffffff81105807>] cpuset_mems_allowed+0x27/0x140 [ 9970.344584] [<ffffffff811b1303>] ? __mpol_dup+0x63/0x150 [ 9970.409282] [<ffffffff811b1385>] __mpol_dup+0xe5/0x150 [ 9970.471897] [<ffffffff811b1303>] ? __mpol_dup+0x63/0x150 [ 9970.536585] [<ffffffff81068c86>] ? copy_process.part.23+0x606/0x1d40 [ 9970.613763] [<ffffffff810bf28d>] ? trace_hardirqs_on+0xd/0x10 [ 9970.683660] [<ffffffff810ddddf>] ? monotonic_to_bootbased+0x2f/0x50 [ 9970.759795] [<ffffffff81068cf0>] copy_process.part.23+0x670/0x1d40 [ 9970.834885] [<ffffffff8106a598>] do_fork+0xd8/0x380 [ 9970.894375] [<ffffffff81110e4c>] ? __audit_syscall_entry+0x9c/0xf0 [ 9970.969470] [<ffffffff8106a8c6>] SyS_clone+0x16/0x20 [ 9971.030011] [<ffffffff81642009>] stub_clone+0x69/0x90 [ 9971.091573] [<ffffffff81641c29>] ? system_call_fastpath+0x16/0x1b The cause is that cpuset_mems_allowed() try to take mutex_lock(&callback_mutex) under the rcu_read_lock(which was hold in __mpol_dup()). And in cpuset_mems_allowed(), the access to cpuset is under rcu_read_lock, so in __mpol_dup, we can reduce the rcu_read_lock protection region to protect the access to cpuset only in current_cpuset_is_being_rebound(). So that we can avoid this bug. This patch is a temporary solution that just addresses the bug mentioned above, can not fix the long-standing issue about cpuset.mems rebinding on fork(): "When the forker's task_struct is duplicated (which includes ->mems_allowed) and it races with an update to cpuset_being_rebound in update_tasks_nodemask() then the task's mems_allowed doesn't get updated. And the child task's mems_allowed can be wrong if the cpuset's nodemask changes before the child has been added to the cgroup's tasklist." Signed-off-by: Gu Zheng <[email protected]> Acked-by: Li Zefan <[email protected]> Signed-off-by: Tejun Heo <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
commit df86754b746e9a0ff6f863f690b1c01d408e3cdc upstream. temp2_input should not be writable, fix it. Reported-by: Guenter Roeck <[email protected]> Signed-off-by: Axel Lin <[email protected]> Signed-off-by: Guenter Roeck <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
commit 1035a9e3e9c76b64a860a774f5b867d28d34acc2 upstream. Writing to fanX_div does not clear the cache. As a result, reading from fanX_div may return the old value for up to two seconds after writing a new value. This patch ensures the fan_div cache is updated in set_fan_div(). Reported-by: Guenter Roeck <[email protected]> Signed-off-by: Axel Lin <[email protected]> Signed-off-by: Guenter Roeck <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
commit f56029410a13cae3652d1f34788045c40a13ffc7 upstream. We are seeing a lot of PMU warnings on POWER8: Can't find PMC that caused IRQ Looking closer, the active PMC is 0 at this point and we took a PMU exception on the transition from negative to 0. Some versions of POWER8 have an issue where they edge detect and not level detect PMC overflows. A number of places program the PMC with (0x80000000 - period_left), where period_left can be negative. We can either fix all of these or just ensure that period_left is always >= 1. This patch takes the second option. Signed-off-by: Anton Blanchard <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
commit ae0f78de2c43b6fadd007c231a352b13b5be8ed2 upstream. Make it clear that values printed are times, and that it is error since last fsck. Also add note about fsck version required. Signed-off-by: Pavel Machek <[email protected]> Signed-off-by: Theodore Ts'o <[email protected]> Reviewed-by: Andreas Dilger <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
commit 0986c1a55ca64b44ee126a2f719a6e9f28cbe0ed upstream. When we set the valid bit on invalid GART entries they are loaded into the TLB when an adjacent entry is loaded. This poisons the TLB with invalid entries which are sometimes not correctly removed on TLB flush. For stable inclusion the patch probably needs to be modified a bit. Signed-off-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
commit 099ed151675cd1d2dbeae1dac697975f6a68716d upstream. Disabling reading and writing to the trace file should not be able to disable all function tracing callbacks. There's other users today (like kprobes and perf). Reading a trace file should not stop those from happening. Reviewed-by: Masami Hiramatsu <[email protected]> Signed-off-by: Steven Rostedt <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
commit 397335f004f41e5fcf7a795e94eb3ab83411a17c upstream. The current deadlock detection logic does not work reliably due to the following early exit path: /* * Drop out, when the task has no waiters. Note, * top_waiter can be NULL, when we are in the deboosting * mode! */ if (top_waiter && (!task_has_pi_waiters(task) || top_waiter != task_top_pi_waiter(task))) goto out_unlock_pi; So this not only exits when the task has no waiters, it also exits unconditionally when the current waiter is not the top priority waiter of the task. So in a nested locking scenario, it might abort the lock chain walk and therefor miss a potential deadlock. Simple fix: Continue the chain walk, when deadlock detection is enabled. We also avoid the whole enqueue, if we detect the deadlock right away (A-A). It's an optimization, but also prevents that another waiter who comes in after the detection and before the task has undone the damage observes the situation and detects the deadlock and returns -EDEADLOCK, which is wrong as the other task is not in a deadlock situation. Signed-off-by: Thomas Gleixner <[email protected]> Cc: Peter Zijlstra <[email protected]> Reviewed-by: Steven Rostedt <[email protected]> Cc: Lai Jiangshan <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Mike Galbraith <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
commit 82084984383babe728e6e3c9a8e5c46278091315 upstream. When we walk the lock chain, we drop all locks after each step. So the lock chain can change under us before we reacquire the locks. That's harmless in principle as we just follow the wrong lock path. But it can lead to a false positive in the dead lock detection logic: T0 holds L0 T0 blocks on L1 held by T1 T1 blocks on L2 held by T2 T2 blocks on L3 held by T3 T4 blocks on L4 held by T4 Now we walk the chain lock T1 -> lock L2 -> adjust L2 -> unlock T1 -> lock T2 -> adjust T2 -> drop locks T2 times out and blocks on L0 Now we continue: lock T2 -> lock L0 -> deadlock detected, but it's not a deadlock at all. Brad tried to work around that in the deadlock detection logic itself, but the more I looked at it the less I liked it, because it's crystal ball magic after the fact. We actually can detect a chain change very simple: lock T1 -> lock L2 -> adjust L2 -> unlock T1 -> lock T2 -> adjust T2 -> next_lock = T2->pi_blocked_on->lock; drop locks T2 times out and blocks on L0 Now we continue: lock T2 -> if (next_lock != T2->pi_blocked_on->lock) return; So if we detect that T2 is now blocked on a different lock we stop the chain walk. That's also correct in the following scenario: lock T1 -> lock L2 -> adjust L2 -> unlock T1 -> lock T2 -> adjust T2 -> next_lock = T2->pi_blocked_on->lock; drop locks T3 times out and drops L3 T2 acquires L3 and blocks on L4 now Now we continue: lock T2 -> if (next_lock != T2->pi_blocked_on->lock) return; We don't have to follow up the chain at that point, because T2 propagated our priority up to T4 already. [ Folded a cleanup patch from peterz ] Signed-off-by: Thomas Gleixner <[email protected]> Reported-by: Brad Mouring <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Mike Galbraith <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
commit 03bd4e1f7265548832a76e7919a81f3137c44fd1 upstream. The following bug can be triggered by hot adding and removing a large number of xen domain0's vcpus repeatedly: BUG: unable to handle kernel NULL pointer dereference at 0000000000000004 IP: [..] find_busiest_group PGD 5a9d5067 PUD 13067 PMD 0 Oops: 0000 [#3] SMP [...] Call Trace: load_balance ? _raw_spin_unlock_irqrestore idle_balance __schedule schedule schedule_timeout ? lock_timer_base schedule_timeout_uninterruptible msleep lock_device_hotplug_sysfs online_store dev_attr_store sysfs_write_file vfs_write SyS_write system_call_fastpath Last level cache shared mask is built during CPU up and the build_sched_domain() routine takes advantage of it to setup the sched domain CPU topology. However, llc_shared_mask is not released during CPU disable, which leads to an invalid sched domainCPU topology. This patch fix it by releasing the llc_shared_mask correctly during CPU disable. Yasuaki also reported that this can happen on real hardware: https://lkml.org/lkml/2014/7/22/1018 His case is here: == Here is an example on my system. My system has 4 sockets and each socket has 15 cores and HT is enabled. In this case, each core of sockes is numbered as follows: | CPU# Socket#0 | 0-14 , 60-74 Socket#1 | 15-29, 75-89 Socket#2 | 30-44, 90-104 Socket#3 | 45-59, 105-119 Then llc_shared_mask of CPU#30 has 0x3fff80000001fffc0000000. It means that last level cache of Socket#2 is shared with CPU#30-44 and 90-104. When hot-removing socket#2 and #3, each core of sockets is numbered as follows: | CPU# Socket#0 | 0-14 , 60-74 Socket#1 | 15-29, 75-89 But llc_shared_mask is not cleared. So llc_shared_mask of CPU#30 remains having 0x3fff80000001fffc0000000. After that, when hot-adding socket#2 and #3, each core of sockets is numbered as follows: | CPU# Socket#0 | 0-14 , 60-74 Socket#1 | 15-29, 75-89 Socket#2 | 30-59 Socket#3 | 90-119 Then llc_shared_mask of CPU#30 becomes 0x3fff8000fffffffc0000000. It means that last level cache of Socket#2 is shared with CPU#30-59 and 90-104. So the mask has the wrong value. Signed-off-by: Wanpeng Li <[email protected]> Tested-by: Linn Crosetto <[email protected]> Reviewed-by: Borislav Petkov <[email protected]> Reviewed-by: Toshi Kani <[email protected]> Reviewed-by: Yasuaki Ishimatsu <[email protected]> Cc: David Rientjes <[email protected]> Cc: Prarit Bhargava <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]> Signed-off-by: Zefan Li <[email protected]>
commit e0e5070b20e01f0321f97db4e4e174f3f6b49e50 upstream. This will simplify code when we add new flags. v3: - Kees pointed out that no_new_privs should never be cleared, so we shouldn't define task_clear_no_new_privs(). we define 3 macros instead of a single one. v2: - updated scripts/tags.sh, suggested by Peter Cc: Ingo Molnar <[email protected]> Cc: Miao Xie <[email protected]> Cc: Tetsuo Handa <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Acked-by: Kees Cook <[email protected]> Signed-off-by: Zefan Li <[email protected]> Signed-off-by: Tejun Heo <[email protected]> [lizf: Backported to 3.4: - adjust context - remove no_new_priv code - add atomic_flags to struct task_struct]
commit 2ad654bc5e2b211e92f66da1d819e47d79a866f0 upstream. When we change cpuset.memory_spread_{page,slab}, cpuset will flip PF_SPREAD_{PAGE,SLAB} bit of tsk->flags for each task in that cpuset. This should be done using atomic bitops, but currently we don't, which is broken. Tetsuo reported a hard-to-reproduce kernel crash on RHEL6, which happened when one thread tried to clear PF_USED_MATH while at the same time another thread tried to flip PF_SPREAD_PAGE/PF_SPREAD_SLAB. They both operate on the same task. Here's the full report: https://lkml.org/lkml/2014/9/19/230 To fix this, we make PF_SPREAD_PAGE and PF_SPREAD_SLAB atomic flags. v4: - updated mm/slab.c. (Fengguang Wu) - updated Documentation. Cc: Peter Zijlstra <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Miao Xie <[email protected]> Cc: Kees Cook <[email protected]> Fixes: 950592f ("cpusets: update tasks' page/slab spread flags in time") Reported-by: Tetsuo Handa <[email protected]> Signed-off-by: Zefan Li <[email protected]> Signed-off-by: Tejun Heo <[email protected]> [lizf: Backported to 3.4: - adjust context - check current->flags & PF_MEMPOLICY rather than current->mempolicy]
commit 8a574cfa2652545eb95595d38ac2a0bb501af0ae upstream. Every mcount() call in the MIPS 32-bit kernel is done as follows: [...] move at, ra jal _mcount addiu sp, sp, -8 [...] but upon returning from the mcount() function, the stack pointer is not adjusted properly. This is explained in details in 58b69401c797 (MIPS: Function tracer: Fix broken function tracing). Commit ad8c396936e3 ("MIPS: Unbreak function tracer for 64-bit kernel.) fixed the stack manipulation for 64-bit but it didn't fix it completely for MIPS32. Signed-off-by: Markos Chandras <[email protected]> Cc: [email protected] Patchwork: https://patchwork.linux-mips.org/patch/7792/ Signed-off-by: Ralf Baechle <[email protected]> Signed-off-by: Zefan Li <[email protected]>
commit 56d7acc792c0d98f38f22058671ee715ff197023 upstream. This bug leads to reproducible silent data loss, despite the use of msync(), sync() and a clean unmount of the file system. It is easily reproducible with the following script: ----------------[BEGIN SCRIPT]-------------------- mkfs.nilfs2 -f /dev/sdb mount /dev/sdb /mnt dd if=/dev/zero bs=1M count=30 of=/mnt/testfile umount /mnt mount /dev/sdb /mnt CHECKSUM_BEFORE="$(md5sum /mnt/testfile)" /root/mmaptest/mmaptest /mnt/testfile 30 10 5 sync CHECKSUM_AFTER="$(md5sum /mnt/testfile)" umount /mnt mount /dev/sdb /mnt CHECKSUM_AFTER_REMOUNT="$(md5sum /mnt/testfile)" umount /mnt echo "BEFORE MMAP:\t$CHECKSUM_BEFORE" echo "AFTER MMAP:\t$CHECKSUM_AFTER" echo "AFTER REMOUNT:\t$CHECKSUM_AFTER_REMOUNT" ----------------[END SCRIPT]-------------------- The mmaptest tool looks something like this (very simplified, with error checking removed): ----------------[BEGIN mmaptest]-------------------- data = mmap(NULL, file_size - file_offset, PROT_READ | PROT_WRITE, MAP_SHARED, fd, file_offset); for (i = 0; i < write_count; ++i) { memcpy(data + i * 4096, buf, sizeof(buf)); msync(data, file_size - file_offset, MS_SYNC)) } ----------------[END mmaptest]-------------------- The output of the script looks something like this: BEFORE MMAP: 281ed1d5ae50e8419f9b978aab16de83 /mnt/testfile AFTER MMAP: 6604a1c31f10780331a6850371b3a313 /mnt/testfile AFTER REMOUNT: 281ed1d5ae50e8419f9b978aab16de83 /mnt/testfile So it is clear, that the changes done using mmap() do not survive a remount. This can be reproduced a 100% of the time. The problem was introduced in commit 136e8770cd5d ("nilfs2: fix issue of nilfs_set_page_dirty() for page at EOF boundary"). If the page was read with mpage_readpage() or mpage_readpages() for example, then it has no buffers attached to it. In that case page_has_buffers(page) in nilfs_set_page_dirty() will be false. Therefore nilfs_set_file_dirty() is never called and the pages are never collected and never written to disk. This patch fixes the problem by also calling nilfs_set_file_dirty() if the page has no buffers attached to it. [[email protected]: s/PAGE_SHIFT/PAGE_CACHE_SHIFT/] Signed-off-by: Andreas Rohner <[email protected]> Tested-by: Andreas Rohner <[email protected]> Signed-off-by: Ryusuke Konishi <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> Signed-off-by: Zefan Li <[email protected]>
commit 5760a97c7143c208fa3a8f8cad0ed7dd672ebd28 upstream. There is a deadlock case which reported by Guozhonghua: https://oss.oracle.com/pipermail/ocfs2-devel/2014-September/010079.html This case is caused by &res->spinlock and &dlm->master_lock misordering in different threads. It was introduced by commit 8d400b8 ("ocfs2/dlm: Clean up refmap helpers"). Since lockres is new, it doesn't not require the &res->spinlock. So remove it. Fixes: 8d400b8 ("ocfs2/dlm: Clean up refmap helpers") Signed-off-by: Joseph Qi <[email protected]> Reviewed-by: joyce.xue <[email protected]> Reported-by: Guozhonghua <[email protected]> Cc: Joel Becker <[email protected]> Cc: Mark Fasheh <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> Signed-off-by: Zefan Li <[email protected]>
commit b928095b0a7cff7fb9fcf4c706348ceb8ab2c295 upstream. If overwriting an empty directory with rename, then need to drop the extra nlink. Test prog: #include <stdio.h> #include <fcntl.h> #include <err.h> #include <sys/stat.h> int main(void) { const char *test_dir1 = "test-dir1"; const char *test_dir2 = "test-dir2"; int res; int fd; struct stat statbuf; res = mkdir(test_dir1, 0777); if (res == -1) err(1, "mkdir(\"%s\")", test_dir1); res = mkdir(test_dir2, 0777); if (res == -1) err(1, "mkdir(\"%s\")", test_dir2); fd = open(test_dir2, O_RDONLY); if (fd == -1) err(1, "open(\"%s\")", test_dir2); res = rename(test_dir1, test_dir2); if (res == -1) err(1, "rename(\"%s\", \"%s\")", test_dir1, test_dir2); res = fstat(fd, &statbuf); if (res == -1) err(1, "fstat(%i)", fd); if (statbuf.st_nlink != 0) { fprintf(stderr, "nlink is %lu, should be 0\n", statbuf.st_nlink); return 1; } return 0; } Signed-off-by: Miklos Szeredi <[email protected]> Signed-off-by: Al Viro <[email protected]> Signed-off-by: Zefan Li <[email protected]>
commit 5ca918e5e3f9df4634077c06585c42bc6a8d699a upstream. The alignment fixup incorrectly decodes faulting ARM VLDn/VSTn instructions (where the optional alignment hint is given but incorrect) as LDR/STR, leading to register corruption. Detect these and correctly treat them as unhandled, so that userspace gets the fault it expects. Reported-by: Simon Hosie <[email protected]> Signed-off-by: Robin Murphy <[email protected]> Signed-off-by: Russell King <[email protected]> Signed-off-by: Zefan Li <[email protected]>
commit d3cb8bf6081b8b7a2dabb1264fe968fd870fa595 upstream. A migration entry is marked as write if pte_write was true at the time the entry was created. The VMA protections are not double checked when migration entries are being removed as mprotect marks write-migration-entries as read. It means that potentially we take a spurious fault to mark PTEs write again but it's straight-forward. However, there is a race between write migrations being marked read and migrations finishing. This potentially allows a PTE to be write that should have been read. Close this race by double checking the VMA permissions using maybe_mkwrite when migration completes. [[email protected]: use maybe_mkwrite] Signed-off-by: Mel Gorman <[email protected]> Acked-by: Rik van Riel <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> [lizf: Backported to 3.4: adjust context] Signed-off-by: Zefan Li <[email protected]>
commit 6c72e3501d0d62fc064d3680e5234f3463ec5a86 upstream. Oleg noticed that a cleanup by Sylvain actually uncovered a bug; by calling perf_event_free_task() when failing sched_fork() we will not yet have done the memset() on ->perf_event_ctxp[] and will therefore try and 'free' the inherited contexts, which are still in use by the parent process. This is bad.. Suggested-by: Oleg Nesterov <[email protected]> Reported-by: Oleg Nesterov <[email protected]> Reported-by: Sylvain 'ythier' Hitier <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Ingo Molnar <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> Signed-off-by: Zefan Li <[email protected]>
commit 361e9dfbaae84b0b246ed18d1ab7c82a1a41b53e upstream. The buffers sized by CONFIG_LOG_BUF_SHIFT and CONFIG_LOG_CPU_MAX_BUF_SHIFT do not exist if CONFIG_PRINTK=n, so don't ask about their size at all. Signed-off-by: Josh Triplett <[email protected]> Acked-by: Randy Dunlap <[email protected]> [lizf: Backported to 3.4: - drop the change to CONFIG_LOG_CPU_MAX_BUF_SHIFT as it doesn't exist in 3.4] Signed-off-by: Zefan Li <[email protected]>
commit 46f341ffcfb5d8530f7d1e60f3be06cce6661b62 upstream. Commit 2da78092 changed the locking from a mutex to a spinlock, so we now longer sleep in this context. But there was a leftover might_sleep() in there, which now triggers since we do the final free from an RCU callback. Get rid of it. Reported-by: Pontus Fuchs <[email protected]> Signed-off-by: Jens Axboe <[email protected]> Signed-off-by: Zefan Li <[email protected]>
commit bd8c78e78d5011d8111bc2533ee73b13a3bd6c42 upstream. In testmode and vendor command reply/event SKBs we use the skb cb data to store nl80211 parameters between allocation and sending. This causes the code for CONFIG_NETLINK_MMAP to get confused, because it takes ownership of the skb cb data when the SKB is handed off to netlink, and it doesn't explicitly clear it. Clear the skb cb explicitly when we're done and before it gets passed to netlink to avoid this issue. Reported-by: Assaf Azulay <[email protected]> Reported-by: David Spinadel <[email protected]> Signed-off-by: Johannes Berg <[email protected]> [lizf: Backported to 3.4: adjust context] Signed-off-by: Zefan Li <[email protected]> Conflicts: net/wireless/nl80211.c
commit 36de928641ee48b2078d3fe9514242aaa2f92013 upstream. If we run into some kind of error, such as ENOMEM, while calling ext4_getblk() or ext4_dx_find_entry(), we need to make sure this error gets propagated up to ext4_find_entry() and then to its callers. This way, transient errors such as ENOMEM can get propagated to the VFS. This is important so that the system calls return the appropriate error, and also so that in the case of ext4_lookup(), we return an error instead of a NULL inode, since that will result in a negative dentry cache entry that will stick around long past the OOM condition which caused a transient ENOMEM error. Google-Bug-Id: #17142205 Signed-off-by: Theodore Ts'o <[email protected]> [lizf: Backported to 3.4: - adjust context - s/old.bh/old_bh/g - s/new.bh/new_bh/g - drop the changes to ext4_find_delete_entry() and ext4_cross_rename() - add return value check for one more exr4_find_entry() in ext4_rename()] Signed-off-by: Zefan Li <[email protected]>
commit a9cfcd63e8d206ce4235c355d857c4fbdf0f4587 upstream. Thanks to Dan Carpenter for extending smatch to find bugs like this. (This was found using a development version of smatch.) Fixes: 36de928641ee48b2078d3fe9514242aaa2f92013 Reported-by: Dan Carpenter <[email protected] Signed-off-by: Theodore Ts'o <[email protected]> [lizf: Backported to 3.4: - s/new.bh/new_bh/ - drop the change to ext4_cross_rename()] Signed-off-by: Zefan Li <[email protected]>
BugLink: http://bugs.launchpad.net/bugs/1348670 Fix regression introduced in pre-3.14 kernels by cherry-picking aa07c713ecfc0522916f3cd57ac628ea6127c0ec (NFSD: Call ->set_acl with a NULL ACL structure if no entries). The affected code was removed in 3.14 by commit 4ac7249ea5a0ceef9f8269f63f33cc873c3fac61 (nfsd: use get_acl and ->set_acl). The ->set_acl methods are already able to cope with a NULL argument. Signed-off-by: Sergio Gelato <[email protected]> Signed-off-by: Zefan Li <[email protected]>
commit b3f207855f57b9c8f43a547a801340bb5cbc59e5 upstream. When running a 32-bit userspace on a 64-bit kernel (eg. i386 application on x86_64 kernel or 32-bit arm userspace on arm64 kernel) some of the perf ioctls must be treated with special care, as they have a pointer size encoded in the command. For example, PERF_EVENT_IOC_ID in 32-bit world will be encoded as 0x80042407, but 64-bit kernel will expect 0x80082407. In result the ioctl will fail returning -ENOTTY. This patch solves the problem by adding code fixing up the size as compat_ioctl file operation. Reported-by: Drew Richardson <[email protected]> Signed-off-by: Pawel Moll <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Jiri Olsa <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]> Cc: David Ahern <[email protected]> [lizf: Backported to 3.4 by David Ahern] Signed-off-by: Zefan Li <[email protected]>
commit da64c27d3c93ee9f89956b9de86c4127eb244494 upstream. LDISCs shouldn't call tty->ops->write() from within ->write_wakeup(). ->write_wakeup() is called with port lock taken and IRQs disabled, tty->ops->write() will try to acquire the same port lock and we will deadlock. Acked-by: Marcel Holtmann <[email protected]> Reviewed-by: Peter Hurley <[email protected]> Reported-by: Huang Shijie <[email protected]> Signed-off-by: Felipe Balbi <[email protected]> Tested-by: Andreas Bießmann <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> [[email protected]: rebased on 3.4.103] Signed-off-by: Tim Niemeyer <[email protected]> Signed-off-by: Zefan Li <[email protected]> Conflicts: drivers/bluetooth/hci_ldisc.c
commit a6138db815df5ee542d848318e5dae681590fccd upstream. Kenton Varda <[email protected]> discovered that by remounting a read-only bind mount read-only in a user namespace the MNT_LOCK_READONLY bit would be cleared, allowing an unprivileged user to the remount a read-only mount read-write. Correct this by replacing the mask of mount flags to preserve with a mask of mount flags that may be changed, and preserve all others. This ensures that any future bugs with this mask and remount will fail in an easy to detect way where new mount flags simply won't change. Acked-by: Serge E. Hallyn <[email protected]> Signed-off-by: "Eric W. Biederman" <[email protected]> Cc: Francis Moreau <[email protected]> Signed-off-by: Zefan Li <[email protected]>
commit d49ec52ff6ddcda178fc2476a109cf1bd1fa19ed upstream. The DM crypt target accesses memory beyond allocated space resulting in a crash on 32 bit x86 systems. This bug is very old (it dates back to 2.6.25 commit 3a7f6c9 "dm crypt: use async crypto"). However, this bug was masked by the fact that kmalloc rounds the size up to the next power of two. This bug wasn't exposed until 3.17-rc1 commit 298a9fa08a ("dm crypt: use per-bio data"). By switching to using per-bio data there was no longer any padding beyond the end of a dm-crypt allocated memory block. To minimize allocation overhead dm-crypt puts several structures into one block allocated with kmalloc. The block holds struct ablkcipher_request, cipher-specific scratch pad (crypto_ablkcipher_reqsize(any_tfm(cc))), struct dm_crypt_request and an initialization vector. The variable dmreq_start is set to offset of struct dm_crypt_request within this memory block. dm-crypt allocates the block with this size: cc->dmreq_start + sizeof(struct dm_crypt_request) + cc->iv_size. When accessing the initialization vector, dm-crypt uses the function iv_of_dmreq, which performs this calculation: ALIGN((unsigned long)(dmreq + 1), crypto_ablkcipher_alignmask(any_tfm(cc)) + 1). dm-crypt allocated "cc->iv_size" bytes beyond the end of dm_crypt_request structure. However, when dm-crypt accesses the initialization vector, it takes a pointer to the end of dm_crypt_request, aligns it, and then uses it as the initialization vector. If the end of dm_crypt_request is not aligned on a crypto_ablkcipher_alignmask(any_tfm(cc)) boundary the alignment causes the initialization vector to point beyond the allocated space. Fix this bug by calculating the variable iv_size_padding and adding it to the allocated size. Also correct the alignment of dm_crypt_request. struct dm_crypt_request is specific to dm-crypt (it isn't used by the crypto subsystem at all), so it is aligned on __alignof__(struct dm_crypt_request). Also align per_bio_data_size on ARCH_KMALLOC_MINALIGN, so that it is aligned as if the block was allocated with kmalloc. Reported-by: Krzysztof Kolasa <[email protected]> Tested-by: Milan Broz <[email protected]> Signed-off-by: Mikulas Patocka <[email protected]> Signed-off-by: Mike Snitzer <[email protected]> [lizf: Backported by Mikulas] Signed-off-by: Zefan Li <[email protected]>
commit d555a2abf3481f81303d835046a5ec2c4fb3ca8e upstream. We unconditionally execute scsi_eh_get_sense() to make sure all failed commands that should have sense attached, do. However, the routine forgets that some commands, because of the way they fail, will not have any sense code ... we should not bother them with a REQUEST_SENSE command. Fix this by testing to see if we actually got a CHECK_CONDITION return and skip asking for sense if we don't. Tested-by: Alan Stern <[email protected]> Signed-off-by: James Bottomley <[email protected]> Signed-off-by: Zefan Li <[email protected]>
Currently the route garbage collector gets called by dst_alloc() if it have more entries than the threshold. But it's an expensive call, that don't really need to be done by then. Another issue with current way is that it allows running the garbage collector with the same start parameters on multiple CPUs at once, which is not optimal. A system may even soft lockup if the cache is big enough as the garbage collectors will be fighting over the hash lock entries. This patch thus moves the garbage collector to run asynchronously on a work queue, much similar to how rt_expire_check runs. There is one condition left that allows multiple executions, which is handled by the next patch. Signed-off-by: Marcelo Ricardo Leitner <[email protected]> Acked-by: Hannes Frederic Sowa <[email protected]> Signed-off-by: Ben Hutchings <[email protected]> Signed-off-by: Zefan Li <[email protected]>
When rt_intern_hash() has to deal with neighbour cache overflowing, it triggers the route cache garbage collector in an attempt to free some references on neighbour entries. Such call cannot be done async but should also not run in parallel with an already-running one, so that they don't collapse fighting over the hash lock entries. This patch thus blocks parallel executions with spinlocks: - A call from worker and from rt_intern_hash() are not the same, and cannot be merged, thus they will wait each other on rt_gc_lock. - Calls to gc from rt_intern_hash() may happen in parallel but we must wait for it to finish in order to try again. This dedup and synchrinozation is then performed by the locking just before calling __do_rt_garbage_collect(). Signed-off-by: Marcelo Ricardo Leitner <[email protected]> Acked-by: Hannes Frederic Sowa <[email protected]> Signed-off-by: Ben Hutchings <[email protected]> Signed-off-by: Zefan Li <[email protected]>
Further tests revealed that after moving the garbage collector to a work queue and protecting it with a spinlock may leave the system prone to soft lockups if bottom half gets very busy. It was reproced with a set of firewall rules that REJECTed packets. If the NIC bottom half handler ends up running on the same CPU that is running the garbage collector on a very large cache, the garbage collector will not be able to do its job due to the amount of work needed for handling the REJECTs and also won't reschedule. The fix is to disable bottom half during the garbage collecting, as it already was in the first place (most calls to it came from softirqs). Signed-off-by: Marcelo Ricardo Leitner <[email protected]> Acked-by: Hannes Frederic Sowa <[email protected]> Acked-by: David S. Miller <[email protected]> Signed-off-by: Ben Hutchings <[email protected]> Signed-off-by: Zefan Li <[email protected]>
Dmitry Semyonov reported that after upgrading from 3.2.54 to 3.2.57 the rtl8192ce driver will crash when its interface is brought up. The oops message shows: [ 1833.611397] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010 [ 1833.611455] IP: [<ffffffffa0410c6a>] rtl92ce_update_hal_rate_tbl+0x29/0x4db [rtl8192ce] ... [ 1833.613326] Call Trace: [ 1833.613346] [<ffffffffa02ad9c6>] ? rtl92c_dm_watchdog+0xd0b/0xec9 [rtl8192c_common] [ 1833.613391] [<ffffffff8105b5cf>] ? process_one_work+0x161/0x269 [ 1833.613425] [<ffffffff8105c598>] ? worker_thread+0xc2/0x145 [ 1833.613458] [<ffffffff8105c4d6>] ? manage_workers.isra.25+0x15b/0x15b [ 1833.613496] [<ffffffff8105f6d9>] ? kthread+0x76/0x7e [ 1833.613527] [<ffffffff81356b74>] ? kernel_thread_helper+0x4/0x10 [ 1833.613563] [<ffffffff8105f663>] ? kthread_worker_fn+0x139/0x139 [ 1833.613598] [<ffffffff81356b70>] ? gs_change+0x13/0x13 Disassembly of rtl92ce_update_hal_rate_tbl() shows that the 'sta' parameter was null. None of the changes to the rtlwifi family between 3.2.54 and 3.2.57 seem to directly cause this, and reverting commit f78bccd79ba3 ('rtlwifi: rtl8192ce: Fix too long disable of IRQs') doesn't fix it. rtl92c_dm_watchdog() calls rtl92ce_update_hal_rate_tbl() via rtl92c_dm_refresh_rate_adaptive_mask(), which does not appear in the call trace as it was inlined. That function has been completely removed upstream which may explain why this crash wasn't seen there. I'm not sure that it is sensible to completely remove rtl92c_dm_refresh_rate_adaptive_mask() without making other compensating changes elsewhere, so try to work around this for 3.2 by checking for a null pointer in rtl92c_dm_refresh_rate_adaptive_mask() and then skipping the call to rtl92ce_update_hal_rate_tbl(). References: https://bugs.debian.org/745137 References: https://bugs.debian.org/745462 Reported-by: Dmitry Semyonov <[email protected]> Signed-off-by: Ben Hutchings <[email protected]> Cc: Larry Finger <[email protected]> Cc: Chaoming Li <[email protected]> Cc: Satoshi IWAMOTO <[email protected]> Signed-off-by: Zefan Li <[email protected]>
commit 916e4cf46d0204806c062c8c6c4d1f633852c5b6 upstream. Currently we generate a new fragmentation id on UFO segmentation. It is pretty hairy to identify the correct net namespace and dst there. Especially tunnels use IFF_XMIT_DST_RELEASE and thus have no skb_dst available at all. This causes unreliable or very predictable ipv6 fragmentation id generation while segmentation. Luckily we already have pregenerated the ip6_frag_id in ip6_ufo_append_data and can use it here. Signed-off-by: Hannes Frederic Sowa <[email protected]> Signed-off-by: David S. Miller <[email protected]> [bwh: Backported to 3.2: adjust filename, indentation] Signed-off-by: Ben Hutchings <[email protected]> Signed-off-by: Zefan Li <[email protected]>
commit cdb3f4a31b64c3a1c6eef40bc01ebc9594c58a8c upstream. There are many cases where this feature does not improve performance or even reduces it. For example, here are the results from tests that I've run using 3.12.6 on one Intel Xeon W3565 and one i7 920 connected by ixgbe adapters. The results are from the Xeon, but they're similar on the i7. All numbers report the mean±stddev over 10 runs of 10s. 1) latency tests similar to what is described in "c6e1a0d net: Allow no-cache copy from user on transmit" There is no statistically significant difference between tx-nocache-copy on/off. nic irqs spread out (one queue per cpu) 200x netperf -r 1400,1 tx-nocache-copy off 692000±1000 tps 50/90/95/99% latency (us): 275±2/643.8±0.4/799±1/2474.4±0.3 tx-nocache-copy on 693000±1000 tps 50/90/95/99% latency (us): 274±1/644.1±0.7/800±2/2474.5±0.7 200x netperf -r 14000,14000 tx-nocache-copy off 86450±80 tps 50/90/95/99% latency (us): 334.37±0.02/838±1/2100±20/3990±40 tx-nocache-copy on 86110±60 tps 50/90/95/99% latency (us): 334.28±0.01/837±2/2110±20/3990±20 2) single stream throughput tests tx-nocache-copy leads to higher service demand throughput cpu0 cpu1 demand (Gb/s) (Gcycle) (Gcycle) (cycle/B) nic irqs and netperf on cpu0 (1x netperf -T0,0 -t omni -- -d send) tx-nocache-copy off 9402±5 9.4±0.2 0.80±0.01 tx-nocache-copy on 9403±3 9.85±0.04 0.838±0.004 nic irqs on cpu0, netperf on cpu1 (1x netperf -T1,1 -t omni -- -d send) tx-nocache-copy off 9401±5 5.83±0.03 5.0±0.1 0.923±0.007 tx-nocache-copy on 9404±2 5.74±0.03 5.523±0.009 0.958±0.002 As a second example, here are some results from Eric Dumazet with latest net-next. tx-nocache-copy also leads to higher service demand (cpu is Intel(R) Xeon(R) CPU X5660 @ 2.80GHz) lpq83:~# ./ethtool -K eth0 tx-nocache-copy on lpq83:~# perf stat ./netperf -H lpq84 -c MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to lpq84.prod.google.com () port 0 AF_INET Recv Send Send Utilization Service Demand Socket Socket Message Elapsed Send Recv Send Recv Size Size Size Time Throughput local remote local remote bytes bytes bytes secs. 10^6bits/s % S % U us/KB us/KB 87380 16384 16384 10.00 9407.44 2.50 -1.00 0.522 -1.000 Performance counter stats for './netperf -H lpq84 -c': 4282.648396 task-clock # 0.423 CPUs utilized 9,348 context-switches # 0.002 M/sec 88 CPU-migrations # 0.021 K/sec 355 page-faults # 0.083 K/sec 11,812,797,651 cycles # 2.758 GHz [82.79%] 9,020,522,817 stalled-cycles-frontend # 76.36% frontend cycles idle [82.54%] 4,579,889,681 stalled-cycles-backend # 38.77% backend cycles idle [67.33%] 6,053,172,792 instructions # 0.51 insns per cycle # 1.49 stalled cycles per insn [83.64%] 597,275,583 branches # 139.464 M/sec [83.70%] 8,960,541 branch-misses # 1.50% of all branches [83.65%] 10.128990264 seconds time elapsed lpq83:~# ./ethtool -K eth0 tx-nocache-copy off lpq83:~# perf stat ./netperf -H lpq84 -c MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to lpq84.prod.google.com () port 0 AF_INET Recv Send Send Utilization Service Demand Socket Socket Message Elapsed Send Recv Send Recv Size Size Size Time Throughput local remote local remote bytes bytes bytes secs. 10^6bits/s % S % U us/KB us/KB 87380 16384 16384 10.00 9412.45 2.15 -1.00 0.449 -1.000 Performance counter stats for './netperf -H lpq84 -c': 2847.375441 task-clock # 0.281 CPUs utilized 11,632 context-switches # 0.004 M/sec 49 CPU-migrations # 0.017 K/sec 354 page-faults # 0.124 K/sec 7,646,889,749 cycles # 2.686 GHz [83.34%] 6,115,050,032 stalled-cycles-frontend # 79.97% frontend cycles idle [83.31%] 1,726,460,071 stalled-cycles-backend # 22.58% backend cycles idle [66.55%] 2,079,702,453 instructions # 0.27 insns per cycle # 2.94 stalled cycles per insn [83.22%] 363,773,213 branches # 127.757 M/sec [83.29%] 4,242,732 branch-misses # 1.17% of all branches [83.51%] 10.128449949 seconds time elapsed CC: Tom Herbert <[email protected]> Signed-off-by: Benjamin Poirier <[email protected]> Signed-off-by: David S. Miller <[email protected]> Signed-off-by: Zefan Li <[email protected]>
commit 936597631dd310e220544dc5c6075d924efd39b2 upstream. The commit 4197aa7 implements 64 bit per ring statistics. But the driver resets the 'total_bytes' and 'total_packets' from RX and TX rings in the RX and TX interrupt handlers to zero. This results in statistics being lost and user space reporting RX and TX statistics as zero. This patch addresses the issue by preventing the resetting of RX and TX ring statistics to zero. Signed-off-by: Narendra K <[email protected]> Tested-by: Sibai Li <[email protected]> Signed-off-by: Jeff Kirsher <[email protected]> Signed-off-by: David S. Miller <[email protected]> Cc: Weng Meiling <[email protected]> Signed-off-by: Zefan Li <[email protected]>
commit eed4d839b0cdf9d84b0a9bc63de90fd5e1e886fb upstream. Use dst_entry held by sk_dst_get() to retrieve tunnel's PMTU. The dst_mtu(__sk_dst_get(tunnel->sock)) call was racy. __sk_dst_get() could return NULL if tunnel->sock->sk_dst_cache was reset just before the call, thus making dst_mtu() dereference a NULL pointer: [ 1937.661598] BUG: unable to handle kernel NULL pointer dereference at 0000000000000020 [ 1937.664005] IP: [<ffffffffa049db88>] pppol2tp_connect+0x33d/0x41e [l2tp_ppp] [ 1937.664005] PGD daf0c067 PUD d9f93067 PMD 0 [ 1937.664005] Oops: 0000 [#1] SMP [ 1937.664005] Modules linked in: l2tp_ppp l2tp_netlink l2tp_core ip6table_filter ip6_tables iptable_filter ip_tables ebtable_nat ebtables x_tables udp_tunnel pppoe pppox ppp_generic slhc deflate ctr twofish_generic twofish_x86_64_3way xts lrw gf128mul glue_helper twofish_x86_64 twofish_common blowfish_generic blowfish_x86_64 blowfish_common des_generic cbc xcbc rmd160 sha512_generic hmac crypto_null af_key xfrm_algo 8021q garp bridge stp llc tun atmtcp clip atm ext3 mbcache jbd iTCO_wdt coretemp kvm_intel iTCO_vendor_support kvm pcspkr evdev ehci_pci lpc_ich mfd_core i5400_edac edac_core i5k_amb shpchp button processor thermal_sys xfs crc32c_generic libcrc32c dm_mod usbhid sg hid sr_mod sd_mod cdrom crc_t10dif crct10dif_common ata_generic ahci ata_piix tg3 libahci libata uhci_hcd ptp ehci_hcd pps_core usbcore scsi_mod libphy usb_common [last unloaded: l2tp_core] [ 1937.664005] CPU: 0 PID: 10022 Comm: l2tpstress Tainted: G O 3.17.0-rc1 #1 [ 1937.664005] Hardware name: HP ProLiant DL160 G5, BIOS O12 08/22/2008 [ 1937.664005] task: ffff8800d8fda790 ti: ffff8800c43c4000 task.ti: ffff8800c43c4000 [ 1937.664005] RIP: 0010:[<ffffffffa049db88>] [<ffffffffa049db88>] pppol2tp_connect+0x33d/0x41e [l2tp_ppp] [ 1937.664005] RSP: 0018:ffff8800c43c7de8 EFLAGS: 00010282 [ 1937.664005] RAX: ffff8800da8a7240 RBX: ffff8800d8c64600 RCX: 000001c325a137b5 [ 1937.664005] RDX: 8c6318c6318c6320 RSI: 000000000000010c RDI: 0000000000000000 [ 1937.664005] RBP: ffff8800c43c7ea8 R08: 0000000000000000 R09: 0000000000000000 [ 1937.664005] R10: ffffffffa048e2c0 R11: ffff8800d8c64600 R12: ffff8800ca7a5000 [ 1937.664005] R13: ffff8800c439bf40 R14: 000000000000000c R15: 0000000000000009 [ 1937.664005] FS: 00007fd7f610f700(0000) GS:ffff88011a600000(0000) knlGS:0000000000000000 [ 1937.664005] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 1937.664005] CR2: 0000000000000020 CR3: 00000000d9d75000 CR4: 00000000000027e0 [ 1937.664005] Stack: [ 1937.664005] ffffffffa049da80 ffff8800d8fda790 000000000000005b ffff880000000009 [ 1937.664005] ffff8800daf3f200 0000000000000003 ffff8800c43c7e48 ffffffff81109b57 [ 1937.664005] ffffffff81109b0e ffffffff8114c566 0000000000000000 0000000000000000 [ 1937.664005] Call Trace: [ 1937.664005] [<ffffffffa049da80>] ? pppol2tp_connect+0x235/0x41e [l2tp_ppp] [ 1937.664005] [<ffffffff81109b57>] ? might_fault+0x9e/0xa5 [ 1937.664005] [<ffffffff81109b0e>] ? might_fault+0x55/0xa5 [ 1937.664005] [<ffffffff8114c566>] ? rcu_read_unlock+0x1c/0x26 [ 1937.664005] [<ffffffff81309196>] SYSC_connect+0x87/0xb1 [ 1937.664005] [<ffffffff813e56f7>] ? sysret_check+0x1b/0x56 [ 1937.664005] [<ffffffff8107590d>] ? trace_hardirqs_on_caller+0x145/0x1a1 [ 1937.664005] [<ffffffff81213dee>] ? trace_hardirqs_on_thunk+0x3a/0x3f [ 1937.664005] [<ffffffff8114c262>] ? spin_lock+0x9/0xb [ 1937.664005] [<ffffffff813092b4>] SyS_connect+0x9/0xb [ 1937.664005] [<ffffffff813e56d2>] system_call_fastpath+0x16/0x1b [ 1937.664005] Code: 10 2a 84 81 e8 65 76 bd e0 65 ff 0c 25 10 bb 00 00 4d 85 ed 74 37 48 8b 85 60 ff ff ff 48 8b 80 88 01 00 00 48 8b b8 10 02 00 00 <48> 8b 47 20 ff 50 20 85 c0 74 0f 83 e8 28 89 83 10 01 00 00 89 [ 1937.664005] RIP [<ffffffffa049db88>] pppol2tp_connect+0x33d/0x41e [l2tp_ppp] [ 1937.664005] RSP <ffff8800c43c7de8> [ 1937.664005] CR2: 0000000000000020 [ 1939.559375] ---[ end trace 82d44500f28f8708 ]--- Fixes: f34c4a35d879 ("l2tp: take PMTU from tunnel UDP socket") Signed-off-by: Guillaume Nault <[email protected]> Acked-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]> Cc: Guillaume Nault <[email protected]> Signed-off-by: Zefan Li <[email protected]>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
No description provided.