Merge branch 'master' into next
Merge master back into next, this allows us to resolve some conflicts in
arch/powerpc/Kconfig, and also re-sort the symbols under config PPC so
that they are in alphabetical order again.
mpe committed May 8, 2021
2 parents 32b48bf + dd86005 commit f96271c
Showing 1,332 changed files with 33,908 additions and 14,493 deletions.
30 changes: 30 additions & 0 deletions Documentation/ABI/testing/sysfs-bus-event_source-devices-dsa
@@ -0,0 +1,30 @@
What: /sys/bus/event_source/devices/dsa*/format
Date: April 2021
KernelVersion: 5.13
Contact: Tom Zanussi <tom.zanussi@linux.intel.com>
Description: Read-only. Attribute group to describe the magic bits
that go into perf_event_attr.config or
perf_event_attr.config1 for the IDXD DSA pmu. (See also
ABI/testing/sysfs-bus-event_source-devices-format).

Each attribute in this group defines a bit range in
perf_event_attr.config or perf_event_attr.config1.
All supported attributes are listed below (See the
IDXD DSA Spec for possible attribute values)::

event_category = "config:0-3" - event category
event = "config:4-31" - event ID

filter_wq = "config1:0-31" - workqueue filter
filter_tc = "config1:32-39" - traffic class filter
filter_pgsz = "config1:40-43" - page size filter
filter_sz = "config1:44-51" - transfer size filter
filter_eng = "config1:52-59" - engine filter

What: /sys/bus/event_source/devices/dsa*/cpumask
Date: April 2021
KernelVersion: 5.13
Contact: Tom Zanussi <tom.zanussi@linux.intel.com>
Description: Read-only. This file always returns the cpu to which the
IDXD DSA pmu is bound for access to all dsa pmu
performance monitoring events.
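
The bit ranges listed in the `format` group above can be turned into `perf_event_attr.config` and `perf_event_attr.config1` values with plain shifts. The following is a minimal sketch of that packing, not code from the idxd driver or from perf; the helper name `pack_dsa_attr` and the sample field values are hypothetical and chosen only to show where each field lands.

```python
# Bit ranges copied from the "format" attribute list above.
# Each entry: field name -> (attr word, low bit, high bit), bounds inclusive.
DSA_FORMAT = {
    "event_category": ("config", 0, 3),
    "event": ("config", 4, 31),
    "filter_wq": ("config1", 0, 31),
    "filter_tc": ("config1", 32, 39),
    "filter_pgsz": ("config1", 40, 43),
    "filter_sz": ("config1", 44, 51),
    "filter_eng": ("config1", 52, 59),
}


def pack_dsa_attr(**fields):
    """Hypothetical helper: shift each named field into its documented bit range."""
    words = {"config": 0, "config1": 0}
    for name, value in fields.items():
        word, lo, hi = DSA_FORMAT[name]  # KeyError for a field not in the list
        width = hi - lo + 1
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value:#x} does not fit in {width} bits")
        words[word] |= value << lo
    return words


# Illustrative values only; real category/event IDs come from the IDXD DSA spec.
attr = pack_dsa_attr(event_category=0x1, event=0x2, filter_wq=0x1)
print(hex(attr["config"]), hex(attr["config1"]))
```

Keeping the ranges in one table mirrors how the sysfs attributes describe them, so a mismatch with the kernel's list is easy to spot.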
2 changes: 1 addition & 1 deletion Documentation/ABI/testing/sysfs-devices-system-cpu
@@ -285,7 +285,7 @@ Description: Disable L3 cache indices

All AMD processors with L3 caches provide this functionality.
For details, see BKDGs at
- http://developer.amd.com/documentation/guides/Pages/default.aspx
+ https://www.amd.com/en/support/tech-docs?keyword=bios+kernel


What: /sys/devices/system/cpu/cpufreq/boost
9 changes: 9 additions & 0 deletions Documentation/ABI/testing/sysfs-driver-input-exc3000
@@ -15,3 +15,12 @@ Description: Reports the model identification provided by the touchscreen, fo
Access: Read

Valid values: Represented as string

What: /sys/bus/i2c/devices/xxx/type
Date: Jan 2021
Contact: linux-input@vger.kernel.org
Description: Reports the type identification provided by the touchscreen, for example "PCAP82H80 Series"

Access: Read

Valid values: Represented as string
31 changes: 30 additions & 1 deletion Documentation/ABI/testing/sysfs-fs-f2fs
@@ -276,7 +276,7 @@ Date April 2019
Contact: "Daniel Rosenberg" <drosen@google.com>
Description: If checkpoint=disable, it displays the number of blocks that
are unusable.
- If checkpoint=enable it displays the enumber of blocks that
+ If checkpoint=enable it displays the number of blocks that
would be unusable if checkpoint=disable were to be set.

What: /sys/fs/f2fs/<disk>/encoding
@@ -409,3 +409,32 @@ Description: Give a way to change checkpoint merge daemon's io priority.
I/O priority "3". We can select the class between "rt" and "be",
and set the I/O priority within valid range of it. "," delimiter
is necessary in between I/O class and priority number.

What: /sys/fs/f2fs/<disk>/ovp_segments
Date: March 2021
Contact: "Jaegeuk Kim" <jaegeuk@kernel.org>
Description: Shows the number of overprovision segments.

What: /sys/fs/f2fs/<disk>/compr_written_block
Date: March 2021
Contact: "Daeho Jeong" <daehojeong@google.com>
Description: Show the block count written after compression since mount. Note
that when the compressed blocks are deleted, this count doesn't
decrease. If you write "0" here, you can initialize
compr_written_block and compr_saved_block to "0".

What: /sys/fs/f2fs/<disk>/compr_saved_block
Date: March 2021
Contact: "Daeho Jeong" <daehojeong@google.com>
Description: Show the saved block count with compression since mount. Note
that when the compressed blocks are deleted, this count doesn't
decrease. If you write "0" here, you can initialize
compr_written_block and compr_saved_block to "0".

What: /sys/fs/f2fs/<disk>/compr_new_inode
Date: March 2021
Contact: "Daeho Jeong" <daehojeong@google.com>
Description: Show the count of inode newly enabled for compression since mount.
Note that when the compression is disabled for the files, this count
doesn't decrease. If you write "0" here, you can initialize
compr_new_inode to "0".
25 changes: 25 additions & 0 deletions Documentation/ABI/testing/sysfs-kernel-mm-cma
@@ -0,0 +1,25 @@
What: /sys/kernel/mm/cma/
Date: Feb 2021
Contact: Minchan Kim <minchan@kernel.org>
Description:
/sys/kernel/mm/cma/ contains a subdirectory for each CMA
heap name (also sometimes called CMA areas).

Each CMA heap subdirectory (that is, each
/sys/kernel/mm/cma/<cma-heap-name> directory) contains the
following items:

alloc_pages_success
alloc_pages_fail

What: /sys/kernel/mm/cma/<cma-heap-name>/alloc_pages_success
Date: Feb 2021
Contact: Minchan Kim <minchan@kernel.org>
Description:
the number of pages CMA API succeeded to allocate

What: /sys/kernel/mm/cma/<cma-heap-name>/alloc_pages_fail
Date: Feb 2021
Contact: Minchan Kim <minchan@kernel.org>
Description:
the number of pages CMA API failed to allocate
2 changes: 1 addition & 1 deletion Documentation/admin-guide/devices.txt
@@ -4,7 +4,7 @@

1 char Memory devices
1 = /dev/mem Physical memory access
- 2 = /dev/kmem Kernel virtual memory access
+ 2 = /dev/kmem OBSOLETE - replaced by /proc/kcore
3 = /dev/null Null device
4 = /dev/port I/O port access
5 = /dev/zero Null byte source
11 changes: 6 additions & 5 deletions Documentation/admin-guide/gpio/gpio-mockup.rst
@@ -17,17 +17,18 @@ module.
gpio_mockup_ranges

This parameter takes an argument in the form of an array of integer
- pairs. Each pair defines the base GPIO number (if any) and the number
- of lines exposed by the chip. If the base GPIO is -1, the gpiolib
- will assign it automatically.
+ pairs. Each pair defines the base GPIO number (non-negative integer)
+ and the first number after the last of this chip. If the base GPIO
+ is -1, the gpiolib will assign it automatically. while the following
+ parameter is the number of lines exposed by the chip.

- Example: gpio_mockup_ranges=-1,8,-1,16,405,4
+ Example: gpio_mockup_ranges=-1,8,-1,16,405,409

The line above creates three chips. The first one will expose 8 lines,
the second 16 and the third 4. The base GPIO for the third chip is set
to 405 while for two first chips it will be assigned automatically.

- gpio_named_lines
+ gpio_mockup_named_lines

This parameter doesn't take any arguments. It lets the driver know that
GPIO lines exposed by it should be named.
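
The updated `gpio_mockup_ranges` wording in the hunk above can be read as follows: with a fixed base, the second value of a pair is the first GPIO number past the chip; with a base of -1, the second value is the line count. The sketch below applies that reading to the documented example. It is an interpretation of the documentation text, not the driver's own parser, and the function name `parse_mockup_ranges` is hypothetical.

```python
def parse_mockup_ranges(arg):
    """Hypothetical helper: return a list of (base or None, number_of_lines)."""
    values = [int(v) for v in arg.split(",")]
    if len(values) % 2:
        raise ValueError("gpio_mockup_ranges needs an even number of values")

    chips = []
    for base, second in zip(values[0::2], values[1::2]):
        if base < 0:
            # Base -1: gpiolib assigns the base; the second value is the line count.
            chips.append((None, second))
        else:
            # Fixed base: the second value is the first number after the chip's last line.
            if second <= base:
                raise ValueError(f"end {second} must be greater than base {base}")
            chips.append((base, second - base))
    return chips


print(parse_mockup_ranges("-1,8,-1,16,405,409"))
# [(None, 8), (None, 16), (405, 4)]
```

The result matches the description in the text: three chips exposing 8, 16 and 4 lines, with only the third chip given a fixed base of 405.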
41 changes: 36 additions & 5 deletions Documentation/admin-guide/kernel-parameters.txt
@@ -1469,6 +1469,12 @@
Don't use this when you are not running on the
android emulator

gpio-mockup.gpio_mockup_ranges
[HW] Sets the ranges of gpiochip of for this device.
Format: <start1>,<end1>,<start2>,<end2>...
gpio-mockup.gpio_mockup_named_lines
[HW] Let the driver know GPIO lines should be named.

gpt [EFI] Forces disk with valid GPT signature but
invalid Protective MBR to be treated as GPT. If the
primary GPT is corrupted, it enables the backup/alternate
@@ -1492,10 +1498,6 @@
Format: <unsigned int> such that (rxsize & ~0x1fffc0) == 0.
Default: 1024

- gpio-mockup.gpio_mockup_ranges
- [HW] Sets the ranges of gpiochip of for this device.
- Format: <start1>,<end1>,<start2>,<end2>...
-
hardlockup_all_cpu_backtrace=
[KNL] Should the hard-lockup detector generate
backtraces on all cpus.
@@ -1833,6 +1835,18 @@
initcall functions. Useful for debugging built-in
modules and initcalls.

initramfs_async= [KNL]
Format: <bool>
Default: 1
This parameter controls whether the initramfs
image is unpacked asynchronously, concurrently
with devices being probed and
initialized. This should normally just work,
but as a debugging aid, one can get the
historical behaviour of the initramfs
unpacking being completed before device_ and
late_ initcalls.

initrd= [BOOT] Specify the location of the initial ramdisk

initrdmem= [KNL] Specify a physical address and size from which to
@@ -2802,7 +2816,24 @@
seconds. Use this parameter to check at some
other rate. 0 disables periodic checking.

- memtest= [KNL,X86,ARM,PPC] Enable memtest
+ memory_hotplug.memmap_on_memory
+ [KNL,X86,ARM] Boolean flag to enable this feature.
+ Format: {on | off (default)}
+ When enabled, runtime hotplugged memory will
+ allocate its internal metadata (struct pages)
+ from the hotadded memory which will allow to
+ hotadd a lot of memory without requiring
+ additional memory to do so.
+ This feature is disabled by default because it
+ has some implication on large (e.g. GB)
+ allocations in some configurations (e.g. small
+ memory blocks).
+ The state of the flag can be read in
+ /sys/module/memory_hotplug/parameters/memmap_on_memory.
+ Note that even when enabled, there are a few cases where
+ the feature is not effective.
+
+ memtest= [KNL,X86,ARM,PPC,RISCV] Enable memtest
Format: <integer>
default : 0 <disable>
Specifies the number of memtest passes to be
9 changes: 9 additions & 0 deletions Documentation/admin-guide/mm/memory-hotplug.rst
@@ -357,6 +357,15 @@ creates ZONE_MOVABLE as following.
Unfortunately, there is no information to show which memory block belongs
to ZONE_MOVABLE. This is TBD.

.. note::
Techniques that rely on long-term pinnings of memory (especially, RDMA and
vfio) are fundamentally problematic with ZONE_MOVABLE and, therefore, memory
hot remove. Pinned pages cannot reside on ZONE_MOVABLE, to guarantee that
memory can still get hot removed - be aware that pinning can fail even if
there is plenty of free memory in ZONE_MOVABLE. In addition, using
ZONE_MOVABLE might make page pinning more expensive, because pages have to be
migrated off that zone first.

.. _memory_hotplug_how_to_offline_memory:

How to offline memory
107 changes: 66 additions & 41 deletions Documentation/admin-guide/mm/userfaultfd.rst
@@ -63,68 +63,93 @@ the generic ioctl available.

The ``uffdio_api.features`` bitmask returned by the ``UFFDIO_API`` ioctl
defines what memory types are supported by the ``userfaultfd`` and what
- events, except page fault notifications, may be generated.
-
- If the kernel supports registering ``userfaultfd`` ranges on hugetlbfs
- virtual memory areas, ``UFFD_FEATURE_MISSING_HUGETLBFS`` will be set in
- ``uffdio_api.features``. Similarly, ``UFFD_FEATURE_MISSING_SHMEM`` will be
- set if the kernel supports registering ``userfaultfd`` ranges on shared
- memory (covering all shmem APIs, i.e. tmpfs, ``IPCSHM``, ``/dev/zero``,
- ``MAP_SHARED``, ``memfd_create``, etc).
-
- The userland application that wants to use ``userfaultfd`` with hugetlbfs
- or shared memory need to set the corresponding flag in
- ``uffdio_api.features`` to enable those features.
-
- If the userland desires to receive notifications for events other than
- page faults, it has to verify that ``uffdio_api.features`` has appropriate
- ``UFFD_FEATURE_EVENT_*`` bits set. These events are described in more
- detail below in `Non-cooperative userfaultfd`_ section.
-
- Once the ``userfaultfd`` has been enabled the ``UFFDIO_REGISTER`` ioctl should
- be invoked (if present in the returned ``uffdio_api.ioctls`` bitmask) to
- register a memory range in the ``userfaultfd`` by setting the
+ events, except page fault notifications, may be generated:
+
+ - The ``UFFD_FEATURE_EVENT_*`` flags indicate that various other events
+ other than page faults are supported. These events are described in more
+ detail below in the `Non-cooperative userfaultfd`_ section.
+
+ - ``UFFD_FEATURE_MISSING_HUGETLBFS`` and ``UFFD_FEATURE_MISSING_SHMEM``
+ indicate that the kernel supports ``UFFDIO_REGISTER_MODE_MISSING``
+ registrations for hugetlbfs and shared memory (covering all shmem APIs,
+ i.e. tmpfs, ``IPCSHM``, ``/dev/zero``, ``MAP_SHARED``, ``memfd_create``,
+ etc) virtual memory areas, respectively.
+
+ - ``UFFD_FEATURE_MINOR_HUGETLBFS`` indicates that the kernel supports
+ ``UFFDIO_REGISTER_MODE_MINOR`` registration for hugetlbfs virtual memory
+ areas.
+
+ The userland application should set the feature flags it intends to use
+ when invoking the ``UFFDIO_API`` ioctl, to request that those features be
+ enabled if supported.
+
+ Once the ``userfaultfd`` API has been enabled the ``UFFDIO_REGISTER``
+ ioctl should be invoked (if present in the returned ``uffdio_api.ioctls``
+ bitmask) to register a memory range in the ``userfaultfd`` by setting the
uffdio_register structure accordingly. The ``uffdio_register.mode``
bitmask will specify to the kernel which kind of faults to track for
- the range (``UFFDIO_REGISTER_MODE_MISSING`` would track missing
- pages). The ``UFFDIO_REGISTER`` ioctl will return the
+ the range. The ``UFFDIO_REGISTER`` ioctl will return the
``uffdio_register.ioctls`` bitmask of ioctls that are suitable to resolve
userfaults on the range registered. Not all ioctls will necessarily be
- supported for all memory types depending on the underlying virtual
- memory backend (anonymous memory vs tmpfs vs real filebacked
- mappings).
+ supported for all memory types (e.g. anonymous memory vs. shmem vs.
+ hugetlbfs), or all types of intercepted faults.

Userland can use the ``uffdio_register.ioctls`` to manage the virtual
address space in the background (to add or potentially also remove
memory from the ``userfaultfd`` registered range). This means a userfault
could be triggering just before userland maps in the background the
user-faulted page.

- The primary ioctl to resolve userfaults is ``UFFDIO_COPY``. That
- atomically copies a page into the userfault registered range and wakes
- up the blocked userfaults
- (unless ``uffdio_copy.mode & UFFDIO_COPY_MODE_DONTWAKE`` is set).
- Other ioctl works similarly to ``UFFDIO_COPY``. They're atomic as in
- guaranteeing that nothing can see an half copied page since it'll
- keep userfaulting until the copy has finished.
+ Resolving Userfaults
+ --------------------
+
+ There are three basic ways to resolve userfaults:
+
+ - ``UFFDIO_COPY`` atomically copies some existing page contents from
+ userspace.
+
+ - ``UFFDIO_ZEROPAGE`` atomically zeros the new page.
+
+ - ``UFFDIO_CONTINUE`` maps an existing, previously-populated page.
+
+ These operations are atomic in the sense that they guarantee nothing can
+ see a half-populated page, since readers will keep userfaulting until the
+ operation has finished.
+
+ By default, these wake up userfaults blocked on the range in question.
+ They support a ``UFFDIO_*_MODE_DONTWAKE`` ``mode`` flag, which indicates
+ that waking will be done separately at some later time.
+
+ Which ioctl to choose depends on the kind of page fault, and what we'd
+ like to do to resolve it:
+
+ - For ``UFFDIO_REGISTER_MODE_MISSING`` faults, the fault needs to be
+ resolved by either providing a new page (``UFFDIO_COPY``), or mapping
+ the zero page (``UFFDIO_ZEROPAGE``). By default, the kernel would map
+ the zero page for a missing fault. With userfaultfd, userspace can
+ decide what content to provide before the faulting thread continues.
+
+ - For ``UFFDIO_REGISTER_MODE_MINOR`` faults, there is an existing page (in
+ the page cache). Userspace has the option of modifying the page's
+ contents before resolving the fault. Once the contents are correct
+ (modified or not), userspace asks the kernel to map the page and let the
+ faulting thread continue with ``UFFDIO_CONTINUE``.

Notes:

- - If you requested ``UFFDIO_REGISTER_MODE_MISSING`` when registering then
- you must provide some kind of page in your thread after reading from
- the uffd. You must provide either ``UFFDIO_COPY`` or ``UFFDIO_ZEROPAGE``.
- The normal behavior of the OS automatically providing a zero page on
- an anonymous mmaping is not in place.
+ - You can tell which kind of fault occurred by examining
+ ``pagefault.flags`` within the ``uffd_msg``, checking for the
+ ``UFFD_PAGEFAULT_FLAG_*`` flags.

- None of the page-delivering ioctls default to the range that you
registered with. You must fill in all fields for the appropriate
ioctl struct including the range.

- You get the address of the access that triggered the missing page
event out of a struct uffd_msg that you read in the thread from the
- uffd. You can supply as many pages as you want with ``UFFDIO_COPY`` or
- ``UFFDIO_ZEROPAGE``. Keep in mind that unless you used DONTWAKE then
- the first of any of those IOCTLs wakes up the faulting thread.
+ uffd. You can supply as many pages as you want with these IOCTLs.
+ Keep in mind that unless you used DONTWAKE then the first of any of
+ those IOCTLs wakes up the faulting thread.

- Be sure to test for all errors including
(``pollfd[0].revents & POLLERR``). This can happen, e.g. when ranges
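
The fault-resolution rules described in the userfaultfd.rst hunk above can be summarised as a small decision function: inspect `pagefault.flags`, then pick from the ioctls the text names for that kind of fault. This is an illustrative sketch only, and it issues no ioctls. The numeric flag values are assumptions taken from memory of `linux/userfaultfd.h` and should be checked against the installed header, and `resolving_ioctls` is a hypothetical name.

```python
# Assumed values of the UFFD_PAGEFAULT_FLAG_* bits; verify against
# linux/userfaultfd.h on the target system before relying on them.
UFFD_PAGEFAULT_FLAG_WRITE = 1 << 0
UFFD_PAGEFAULT_FLAG_WP = 1 << 1
UFFD_PAGEFAULT_FLAG_MINOR = 1 << 2


def resolving_ioctls(pagefault_flags):
    """Hypothetical helper: map pagefault.flags to (fault kind, candidate ioctls)."""
    if pagefault_flags & UFFD_PAGEFAULT_FLAG_WP:
        # Write-protect faults are not discussed in the passage above,
        # so no candidate ioctl is listed for them here.
        return ("write-protect", ())
    if pagefault_flags & UFFD_PAGEFAULT_FLAG_MINOR:
        # Minor fault: the page already exists in the page cache; once its
        # contents are correct, ask the kernel to map it.
        return ("minor", ("UFFDIO_CONTINUE",))
    # Missing fault: provide page contents or map the zero page.
    return ("missing", ("UFFDIO_COPY", "UFFDIO_ZEROPAGE"))


for flags in (0, UFFD_PAGEFAULT_FLAG_WRITE, UFFD_PAGEFAULT_FLAG_MINOR):
    print(flags, resolving_ioctls(flags))
```

Whichever ioctl is chosen, the range must still be filled in explicitly in the ioctl struct, as the notes above point out.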