Automatic merge of 'next-test' into merge-test (2023-10-22 12:35)
mpe committed Oct 22, 2023
2 parents 1a6e74a + 5e69d46 commit 3c13950
Showing 90 changed files with 1,364 additions and 1,248 deletions.
12 changes: 12 additions & 0 deletions Documentation/ABI/testing/sysfs-kernel-fadump
@@ -38,3 +38,15 @@ Contact: [email protected]
Description: read only
Provide information about the amount of memory reserved by
FADump to save the crash dump in bytes.
What: /sys/kernel/fadump/hotplug_ready
Date: Sep 2023
Contact: [email protected]
Description: read only
The Kdump scripts utilize udev rules to monitor memory add/remove
events, ensuring that FADUMP is automatically re-registered when
system memory changes occur. This re-registration was necessary
to update the elfcorehdr, which describes the system memory to the
second kernel. Now, if this sysfs node holds a value of 1, it
indicates to userspace that FADUMP does not require re-registration
since the elfcorehdr is now generated in the second kernel.
User: kexec-tools
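
For illustration only (not part of this commit), a minimal userspace sketch
of how a kexec-tools-style helper could consult this node before deciding
whether to re-register FADump; the function name and the fallback behaviour
below are assumptions, not the actual kexec-tools code::

    #include <stdio.h>

    /*
     * Read /sys/kernel/fadump/hotplug_ready. A value of 1 means the
     * elfcorehdr is generated in the second kernel, so FADump does not
     * need to be re-registered on memory add/remove events; any other
     * value, or a missing node, is treated as "re-registration needed".
     */
    static int fadump_hotplug_ready(void)
    {
            FILE *f = fopen("/sys/kernel/fadump/hotplug_ready", "r");
            int val = 0;

            if (!f)
                    return 0;  /* node absent: keep the old udev-driven path */
            if (fscanf(f, "%d", &val) != 1)
                    val = 0;
            fclose(f);
            return val == 1;
    }

    int main(void)
    {
            if (fadump_hotplug_ready())
                    printf("FADump re-registration not required\n");
            else
                    printf("re-register FADump after memory hotplug\n");
            return 0;
    }
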
91 changes: 42 additions & 49 deletions Documentation/powerpc/firmware-assisted-dump.rst
@@ -134,12 +134,12 @@ that are run. If there is dump data, then the
memory is held.

If there is no waiting dump data, then only the memory required to
hold CPU state, HPTE region, boot memory dump, FADump header and
elfcore header, is usually reserved at an offset greater than boot
memory size (see Fig. 1). This area is *not* released: this region
will be kept permanently reserved, so that it can act as a receptacle
for a copy of the boot memory content in addition to CPU state and
HPTE region, in the case a crash does occur.
hold CPU state, HPTE region, boot memory dump, and FADump header is
usually reserved at an offset greater than boot memory size (see Fig. 1).
This area is *not* released: this region will be kept permanently
reserved, so that it can act as a receptacle for a copy of the boot
memory content in addition to CPU state and HPTE region, in the case
a crash does occur.

Since this reserved memory area is used only after the system crash,
there is no point in blocking this significant chunk of memory from
@@ -153,22 +153,22 @@ that were present in CMA region::

o Memory Reservation during first kernel

Low memory Top of memory
0 boot memory size |<--- Reserved dump area --->| |
| | | Permanent Reservation | |
V V | | V
+-----------+-----/ /---+---+----+-------+-----+-----+----+--+
| | |///|////| DUMP | HDR | ELF |////| |
+-----------+-----/ /---+---+----+-------+-----+-----+----+--+
| ^ ^ ^ ^ ^
| | | | | |
\ CPU HPTE / | |
------------------------------ | |
Boot memory content gets transferred | |
to reserved area by firmware at the | |
time of crash. | |
FADump Header |
(meta area) |
Low memory Top of memory
0 boot memory size |<------ Reserved dump area ----->| |
| | | Permanent Reservation | |
V V | | V
+-----------+-----/ /---+---+----+-----------+-------+----+-----+
| | |///|////| DUMP | HDR |////| |
+-----------+-----/ /---+---+----+-----------+-------+----+-----+
| ^ ^ ^ ^ ^
| | | | | |
\ CPU HPTE / | |
-------------------------------- | |
Boot memory content gets transferred | |
to reserved area by firmware at the | |
time of crash. | |
FADump Header |
(meta area) |
|
|
Metadata: This area holds a metadata structure whose
@@ -186,20 +186,33 @@ that were present in CMA region::
0 boot memory size |
| |<------------ Crash preserved area ------------>|
V V |<--- Reserved dump area --->| |
+-----------+-----/ /---+---+----+-------+-----+-----+----+--+
| | |///|////| DUMP | HDR | ELF |////| |
+-----------+-----/ /---+---+----+-------+-----+-----+----+--+
| |
V V
Used by second /proc/vmcore
kernel to boot
+----+---+--+-----/ /---+---+----+-------+-----+-----+-------+
| |ELF| | |///|////| DUMP | HDR |/////| |
+----+---+--+-----/ /---+---+----+-------+-----+-----+-------+
| | | | | |
----- ------------------------------ ---------------
\ | |
\ | |
\ | |
\ | ----------------------------
\ | /
\ | /
\ | /
/proc/vmcore


+---+
|///| -> Regions (CPU, HPTE & Metadata) marked like this in the above
+---+ figures are not always present. For example, OPAL platform
does not have CPU & HPTE regions while Metadata region is
not supported on pSeries currently.

+---+
|ELF| -> elfcorehdr: it is created in the second kernel after a crash.
+---+

Note: Memory from 0 to the boot memory size is used by second kernel

Fig. 2


@@ -353,26 +366,6 @@ TODO:
- Need to come up with the better approach to find out more
accurate boot memory size that is required for a kernel to
boot successfully when booted with restricted memory.
- The FADump implementation introduces a FADump crash info structure
in the scratch area before the ELF core header. The idea of introducing
this structure is to pass some important crash info data to the second
kernel which will help second kernel to populate ELF core header with
correct data before it gets exported through /proc/vmcore. The current
design implementation does not address a possibility of introducing
additional fields (in future) to this structure without affecting
compatibility. Need to come up with the better approach to address this.

The possible approaches are:

1. Introduce version field for version tracking, bump up the version
whenever a new field is added to the structure in future. The version
field can be used to find out what fields are valid for the current
version of the structure.
2. Reserve the area of predefined size (say PAGE_SIZE) for this
structure and have unused area as reserved (initialized to zero)
for future field additions.

The advantage of approach 1 over 2 is we don't need to reserve extra space.
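
The TODO above is among the lines this commit removes, now that the
elfcorehdr is generated in the second kernel. For illustration only, a
minimal sketch of approach 1 (a versioned structure); the structure name
and fields below are hypothetical, not the real FADump crash info layout::

    #include <stdint.h>

    #define CRASH_INFO_VERSION 2

    /*
     * Hypothetical versioned header: the version field is bumped whenever
     * a member is appended, so a consumer knows which fields are valid.
     */
    struct crash_info {
            uint32_t version;               /* producer's CRASH_INFO_VERSION */
            uint64_t elfcorehdr_addr;       /* valid since version 1 */
            uint64_t boot_mem_size;         /* valid since version 1 */
            uint64_t preserved_area_start;  /* appended in version 2 */
    };

    /* Consumers only trust fields covered by the producer's version. */
    static inline int has_preserved_area(const struct crash_info *ci)
    {
            return ci->version >= 2;
    }
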

Author: Mahesh Salgaonkar <[email protected]>

1 change: 1 addition & 0 deletions arch/powerpc/Kconfig
@@ -237,6 +237,7 @@ config PPC
select HAVE_EFFICIENT_UNALIGNED_ACCESS
select HAVE_FAST_GUP
select HAVE_FTRACE_MCOUNT_RECORD
select HAVE_FUNCTION_ARG_ACCESS_API
select HAVE_FUNCTION_DESCRIPTORS if PPC64_ELF_ABI_V1
select HAVE_FUNCTION_ERROR_INJECTION
select HAVE_FUNCTION_GRAPH_TRACER
16 changes: 10 additions & 6 deletions arch/powerpc/boot/install.sh
@@ -21,13 +21,17 @@ set -e
# this should work for both the pSeries zImage and the iSeries vmlinux.sm
image_name=`basename $2`

if [ -f $4/$image_name ]; then
mv $4/$image_name $4/$image_name.old

echo "Warning: '${INSTALLKERNEL}' command not available... Copying" \
"directly to $4/$image_name-$1" >&2

if [ -f $4/$image_name-$1 ]; then
mv $4/$image_name-$1 $4/$image_name-$1.old
fi

if [ -f $4/System.map ]; then
mv $4/System.map $4/System.old
if [ -f $4/System.map-$1 ]; then
mv $4/System.map-$1 $4/System-$1.old
fi

cat $2 > $4/$image_name
cp $3 $4/System.map
cat $2 > $4/$image_name-$1
cp $3 $4/System.map-$1
83 changes: 31 additions & 52 deletions arch/powerpc/include/asm/book3s/32/pgtable.h
@@ -20,15 +20,15 @@

#define _PAGE_PRESENT 0x001 /* software: pte contains a translation */
#define _PAGE_HASHPTE 0x002 /* hash_page has made an HPTE for this pte */
#define _PAGE_USER 0x004 /* usermode access allowed */
#define _PAGE_READ 0x004 /* software: read access allowed */
#define _PAGE_GUARDED 0x008 /* G: prohibit speculative access */
#define _PAGE_COHERENT 0x010 /* M: enforce memory coherence (SMP systems) */
#define _PAGE_NO_CACHE 0x020 /* I: cache inhibit */
#define _PAGE_WRITETHRU 0x040 /* W: cache write-through */
#define _PAGE_DIRTY 0x080 /* C: page changed */
#define _PAGE_ACCESSED 0x100 /* R: page referenced */
#define _PAGE_EXEC 0x200 /* software: exec allowed */
#define _PAGE_RW 0x400 /* software: user write access allowed */
#define _PAGE_WRITE 0x400 /* software: user write access allowed */
#define _PAGE_SPECIAL 0x800 /* software: Special page */

#ifdef CONFIG_PTE_64BIT
@@ -42,26 +42,13 @@
#define _PMD_PRESENT_MASK (PAGE_MASK)
#define _PMD_BAD (~PAGE_MASK)

/* We borrow the _PAGE_USER bit to store the exclusive marker in swap PTEs. */
#define _PAGE_SWP_EXCLUSIVE _PAGE_USER
/* We borrow the _PAGE_READ bit to store the exclusive marker in swap PTEs. */
#define _PAGE_SWP_EXCLUSIVE _PAGE_READ

/* And here we include common definitions */

#define _PAGE_KERNEL_RO 0
#define _PAGE_KERNEL_ROX (_PAGE_EXEC)
#define _PAGE_KERNEL_RW (_PAGE_DIRTY | _PAGE_RW)
#define _PAGE_KERNEL_RWX (_PAGE_DIRTY | _PAGE_RW | _PAGE_EXEC)

#define _PAGE_HPTEFLAGS _PAGE_HASHPTE

#ifndef __ASSEMBLY__

static inline bool pte_user(pte_t pte)
{
return pte_val(pte) & _PAGE_USER;
}
#endif /* __ASSEMBLY__ */

/*
* Location of the PFN in the PTE. Most 32-bit platforms use the same
* as _PAGE_SHIFT here (ie, naturally aligned).
@@ -97,20 +84,7 @@ static inline bool pte_user(pte_t pte)
#define _PAGE_BASE_NC (_PAGE_PRESENT | _PAGE_ACCESSED)
#define _PAGE_BASE (_PAGE_BASE_NC | _PAGE_COHERENT)

/*
* Permission masks used to generate the __P and __S table.
*
* Note:__pgprot is defined in arch/powerpc/include/asm/page.h
*
* Write permissions imply read permissions for now.
*/
#define PAGE_NONE __pgprot(_PAGE_BASE)
#define PAGE_SHARED __pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_RW)
#define PAGE_SHARED_X __pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_RW | _PAGE_EXEC)
#define PAGE_COPY __pgprot(_PAGE_BASE | _PAGE_USER)
#define PAGE_COPY_X __pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_EXEC)
#define PAGE_READONLY __pgprot(_PAGE_BASE | _PAGE_USER)
#define PAGE_READONLY_X __pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_EXEC)
#include <asm/pgtable-masks.h>

/* Permission masks used for kernel mappings */
#define PAGE_KERNEL __pgprot(_PAGE_BASE | _PAGE_KERNEL_RW)
@@ -170,7 +144,14 @@ void unmap_kernel_page(unsigned long va);
* value (for now) on others, from where we can start layout kernel
* virtual space that goes below PKMAP and FIXMAP
*/
#include <asm/fixmap.h>

#define FIXADDR_SIZE 0
#ifdef CONFIG_KASAN
#include <asm/kasan.h>
#define FIXADDR_TOP (KASAN_SHADOW_START - PAGE_SIZE)
#else
#define FIXADDR_TOP ((unsigned long)(-PAGE_SIZE))
#endif

/*
* ioremap_bot starts at that address. Early ioremaps move down from there,
@@ -224,9 +205,6 @@ void unmap_kernel_page(unsigned long va);
/* Bits to mask out from a PGD to get to the PUD page */
#define PGD_MASKED_BITS 0

#define pte_ERROR(e) \
pr_err("%s:%d: bad pte %llx.\n", __FILE__, __LINE__, \
(unsigned long long)pte_val(e))
#define pgd_ERROR(e) \
pr_err("%s:%d: bad pgd %08lx.\n", __FILE__, __LINE__, pgd_val(e))
/*
@@ -343,7 +321,7 @@ static inline pte_t ptep_get_and_clear(struct mm_struct *mm, unsigned long addr,
static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long addr,
pte_t *ptep)
{
pte_update(mm, addr, ptep, _PAGE_RW, 0, 0);
pte_update(mm, addr, ptep, _PAGE_WRITE, 0, 0);
}

static inline void __ptep_set_access_flags(struct vm_area_struct *vma,
@@ -402,8 +380,16 @@ static inline pte_t pte_swp_clear_exclusive(pte_t pte)
}

/* Generic accessors to PTE bits */
static inline int pte_write(pte_t pte) { return !!(pte_val(pte) & _PAGE_RW);}
static inline int pte_read(pte_t pte) { return 1; }
static inline bool pte_read(pte_t pte)
{
return !!(pte_val(pte) & _PAGE_READ);
}

static inline bool pte_write(pte_t pte)
{
return !!(pte_val(pte) & _PAGE_WRITE);
}

static inline int pte_dirty(pte_t pte) { return !!(pte_val(pte) & _PAGE_DIRTY); }
static inline int pte_young(pte_t pte) { return !!(pte_val(pte) & _PAGE_ACCESSED); }
static inline int pte_special(pte_t pte) { return !!(pte_val(pte) & _PAGE_SPECIAL); }
@@ -438,10 +424,10 @@ static inline bool pte_ci(pte_t pte)
static inline bool pte_access_permitted(pte_t pte, bool write)
{
/*
* A read-only access is controlled by _PAGE_USER bit.
* We have _PAGE_READ set for WRITE and EXECUTE
* A read-only access is controlled by _PAGE_READ bit.
* We have _PAGE_READ set for WRITE
*/
if (!pte_present(pte) || !pte_user(pte) || !pte_read(pte))
if (!pte_present(pte) || !pte_read(pte))
return false;

if (write && !pte_write(pte))
@@ -465,7 +451,7 @@ static inline pte_t pfn_pte(unsigned long pfn, pgprot_t pgprot)
/* Generic modifiers for PTE bits */
static inline pte_t pte_wrprotect(pte_t pte)
{
return __pte(pte_val(pte) & ~_PAGE_RW);
return __pte(pte_val(pte) & ~_PAGE_WRITE);
}

static inline pte_t pte_exprotect(pte_t pte)
@@ -495,6 +481,9 @@ static inline pte_t pte_mkpte(pte_t pte)

static inline pte_t pte_mkwrite_novma(pte_t pte)
{
/*
* write implies read, hence set both
*/
return __pte(pte_val(pte) | _PAGE_RW);
}

@@ -518,16 +507,6 @@ static inline pte_t pte_mkhuge(pte_t pte)
return pte;
}

static inline pte_t pte_mkprivileged(pte_t pte)
{
return __pte(pte_val(pte) & ~_PAGE_USER);
}

static inline pte_t pte_mkuser(pte_t pte)
{
return __pte(pte_val(pte) | _PAGE_USER);
}

static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
{
return __pte((pte_val(pte) & _PAGE_CHG_MASK) | pgprot_val(newprot));