Skip to content

Commit

Permalink
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Browse files Browse the repository at this point in the history
Pull networking updates from David Miller:

 1) Support 6Ghz band in ath11k driver, from Rajkumar Manoharan.

 2) Support UDP segmentation in code TSO code, from Eric Dumazet.

 3) Allow flashing different flash images in cxgb4 driver, from Vishal
    Kulkarni.

 4) Add drop frames counter and flow status to tc flower offloading,
    from Po Liu.

 5) Support n-tuple filters in cxgb4, from Vishal Kulkarni.

 6) Various new indirect call avoidance, from Eric Dumazet and Brian
    Vazquez.

 7) Fix BPF verifier failures on 32-bit pointer arithmetic, from
    Yonghong Song.

 8) Support querying and setting hardware address of a port function via
    devlink, use this in mlx5, from Parav Pandit.

 9) Support hw ipsec offload on bonding slaves, from Jarod Wilson.

10) Switch qca8k driver over to phylink, from Jonathan McDowell.

11) In bpftool, show list of processes holding BPF FD references to
    maps, programs, links, and btf objects. From Andrii Nakryiko.

12) Several conversions over to generic power management, from Vaibhav
    Gupta.

13) Add support for SO_KEEPALIVE et al. to bpf_setsockopt(), from Dmitry
    Yakunin.

14) Various https url conversions, from Alexander A. Klimov.

15) Timestamping and PHC support for mscc PHY driver, from Antoine
    Tenart.

16) Support bpf iterating over tcp and udp sockets, from Yonghong Song.

17) Support 5GBASE-T i40e NICs, from Aleksandr Loktionov.

18) Add kTLS RX HW offload support to mlx5e, from Tariq Toukan.

19) Fix the ->ndo_start_xmit() return type to be netdev_tx_t in several
    drivers. From Luc Van Oostenryck.

20) XDP support for xen-netfront, from Denis Kirjanov.

21) Support receive buffer autotuning in MPTCP, from Florian Westphal.

22) Support EF100 chip in sfc driver, from Edward Cree.

23) Add XDP support to mvpp2 driver, from Matteo Croce.

24) Support MPTCP in sock_diag, from Paolo Abeni.

25) Commonize UDP tunnel offloading code by creating udp_tunnel_nic
    infrastructure, from Jakub Kicinski.

26) Several pci_ --> dma_ API conversions, from Christophe JAILLET.

27) Add FLOW_ACTION_POLICE support to mlxsw, from Ido Schimmel.

28) Add SK_LOOKUP bpf program type, from Jakub Sitnicki.

29) Refactor a lot of networking socket option handling code in order to
    avoid set_fs() calls, from Christoph Hellwig.

30) Add rfc4884 support to icmp code, from Willem de Bruijn.

31) Support TBF offload in dpaa2-eth driver, from Ioana Ciornei.

32) Support XDP_REDIRECT in qede driver, from Alexander Lobakin.

33) Support PCI relaxed ordering in mlx5 driver, from Aya Levin.

34) Support TCP syncookies in MPTCP, from Flowian Westphal.

35) Fix several tricky cases of PMTU handling wrt. briding, from Stefano
    Brivio.

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2056 commits)
  net: thunderx: initialize VF's mailbox mutex before first usage
  usb: hso: remove bogus check for EINPROGRESS
  usb: hso: no complaint about kmalloc failure
  hso: fix bailout in error case of probe
  ip_tunnel_core: Fix build for archs without _HAVE_ARCH_IPV6_CSUM
  selftests/net: relax cpu affinity requirement in msg_zerocopy test
  mptcp: be careful on subflow creation
  selftests: rtnetlink: make kci_test_encap() return sub-test result
  selftests: rtnetlink: correct the final return value for the test
  net: dsa: sja1105: use detected device id instead of DT one on mismatch
  tipc: set ub->ifindex for local ipv6 address
  ipv6: add ipv6_dev_find()
  net: openvswitch: silence suspicious RCU usage warning
  Revert "vxlan: fix tos value before xmit"
  ptp: only allow phase values lower than 1 period
  farsync: switch from 'pci_' to 'dma_' API
  wan: wanxl: switch from 'pci_' to 'dma_' API
  hv_netvsc: do not use VF device if link is down
  dpaa2-eth: Fix passing zero to 'PTR_ERR' warning
  net: macb: Properly handle phylink on at91sam9x
  ...
  • Loading branch information
torvalds committed Aug 6, 2020
2 parents 8186749 + c1055b7 commit 47ec530
Show file tree
Hide file tree
Showing 2,088 changed files with 117,531 additions and 41,040 deletions.
36 changes: 36 additions & 0 deletions Documentation/bpf/btf.rst
Original file line number Diff line number Diff line change
Expand Up @@ -691,6 +691,42 @@ kernel API, the ``insn_off`` is the instruction offset in the unit of ``struct
bpf_insn``. For ELF API, the ``insn_off`` is the byte offset from the
beginning of section (``btf_ext_info_sec->sec_name_off``).

4.2 .BTF_ids section
====================

The .BTF_ids section encodes BTF ID values that are used within the kernel.

This section is created during the kernel compilation with the help of
macros defined in ``include/linux/btf_ids.h`` header file. Kernel code can
use them to create lists and sets (sorted lists) of BTF ID values.

The ``BTF_ID_LIST`` and ``BTF_ID`` macros define unsorted list of BTF ID values,
with following syntax::

BTF_ID_LIST(list)
BTF_ID(type1, name1)
BTF_ID(type2, name2)

resulting in following layout in .BTF_ids section::

__BTF_ID__type1__name1__1:
.zero 4
__BTF_ID__type2__name2__2:
.zero 4

The ``u32 list[];`` variable is defined to access the list.

The ``BTF_ID_UNUSED`` macro defines 4 zero bytes. It's used when we
want to define unused entry in BTF_ID_LIST, like::

BTF_ID_LIST(bpf_skb_output_btf_ids)
BTF_ID(struct, sk_buff)
BTF_ID_UNUSED
BTF_ID(struct, task_struct)

All the BTF ID lists and sets are compiled in the .BTF_ids section and
resolved during the linking phase of kernel build by ``resolve_btfids`` tool.

5. Using BTF
************

Expand Down
21 changes: 15 additions & 6 deletions Documentation/bpf/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -5,10 +5,10 @@ BPF Documentation
This directory contains documentation for the BPF (Berkeley Packet
Filter) facility, with a focus on the extended BPF version (eBPF).

This kernel side documentation is still work in progress. The main
This kernel side documentation is still work in progress. The main
textual documentation is (for historical reasons) described in
`Documentation/networking/filter.rst`_, which describe both classical
and extended BPF instruction-set.
:ref:`networking-filter`, which describe both classical and extended
BPF instruction-set.
The Cilium project also maintains a `BPF and XDP Reference Guide`_
that goes into great technical depth about the BPF Architecture.

Expand Down Expand Up @@ -48,6 +48,15 @@ Program types
bpf_lsm


Map types
=========

.. toctree::
:maxdepth: 1

map_cgroup_storage


Testing and debugging BPF
=========================

Expand All @@ -67,7 +76,7 @@ Other
ringbuf

.. Links:
.. _Documentation/networking/filter.rst: ../networking/filter.txt
.. _networking-filter: ../networking/filter.rst
.. _man-pages: https://www.kernel.org/doc/man-pages/
.. _bpf(2): http://man7.org/linux/man-pages/man2/bpf.2.html
.. _BPF and XDP Reference Guide: http://cilium.readthedocs.io/en/latest/bpf/
.. _bpf(2): https://man7.org/linux/man-pages/man2/bpf.2.html
.. _BPF and XDP Reference Guide: https://docs.cilium.io/en/latest/bpf/
169 changes: 169 additions & 0 deletions Documentation/bpf/map_cgroup_storage.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,169 @@
.. SPDX-License-Identifier: GPL-2.0-only
.. Copyright (C) 2020 Google LLC.
===========================
BPF_MAP_TYPE_CGROUP_STORAGE
===========================

The ``BPF_MAP_TYPE_CGROUP_STORAGE`` map type represents a local fix-sized
storage. It is only available with ``CONFIG_CGROUP_BPF``, and to programs that
attach to cgroups; the programs are made available by the same Kconfig. The
storage is identified by the cgroup the program is attached to.

The map provide a local storage at the cgroup that the BPF program is attached
to. It provides a faster and simpler access than the general purpose hash
table, which performs a hash table lookups, and requires user to track live
cgroups on their own.

This document describes the usage and semantics of the
``BPF_MAP_TYPE_CGROUP_STORAGE`` map type. Some of its behaviors was changed in
Linux 5.9 and this document will describe the differences.

Usage
=====

The map uses key of type of either ``__u64 cgroup_inode_id`` or
``struct bpf_cgroup_storage_key``, declared in ``linux/bpf.h``::

struct bpf_cgroup_storage_key {
__u64 cgroup_inode_id;
__u32 attach_type;
};

``cgroup_inode_id`` is the inode id of the cgroup directory.
``attach_type`` is the the program's attach type.

Linux 5.9 added support for type ``__u64 cgroup_inode_id`` as the key type.
When this key type is used, then all attach types of the particular cgroup and
map will share the same storage. Otherwise, if the type is
``struct bpf_cgroup_storage_key``, then programs of different attach types
be isolated and see different storages.

To access the storage in a program, use ``bpf_get_local_storage``::

void *bpf_get_local_storage(void *map, u64 flags)

``flags`` is reserved for future use and must be 0.

There is no implicit synchronization. Storages of ``BPF_MAP_TYPE_CGROUP_STORAGE``
can be accessed by multiple programs across different CPUs, and user should
take care of synchronization by themselves. The bpf infrastructure provides
``struct bpf_spin_lock`` to synchronize the storage. See
``tools/testing/selftests/bpf/progs/test_spin_lock.c``.

Examples
========

Usage with key type as ``struct bpf_cgroup_storage_key``::

#include <bpf/bpf.h>

struct {
__uint(type, BPF_MAP_TYPE_CGROUP_STORAGE);
__type(key, struct bpf_cgroup_storage_key);
__type(value, __u32);
} cgroup_storage SEC(".maps");

int program(struct __sk_buff *skb)
{
__u32 *ptr = bpf_get_local_storage(&cgroup_storage, 0);
__sync_fetch_and_add(ptr, 1);

return 0;
}

Userspace accessing map declared above::

#include <linux/bpf.h>
#include <linux/libbpf.h>

__u32 map_lookup(struct bpf_map *map, __u64 cgrp, enum bpf_attach_type type)
{
struct bpf_cgroup_storage_key = {
.cgroup_inode_id = cgrp,
.attach_type = type,
};
__u32 value;
bpf_map_lookup_elem(bpf_map__fd(map), &key, &value);
// error checking omitted
return value;
}

Alternatively, using just ``__u64 cgroup_inode_id`` as key type::

#include <bpf/bpf.h>

struct {
__uint(type, BPF_MAP_TYPE_CGROUP_STORAGE);
__type(key, __u64);
__type(value, __u32);
} cgroup_storage SEC(".maps");

int program(struct __sk_buff *skb)
{
__u32 *ptr = bpf_get_local_storage(&cgroup_storage, 0);
__sync_fetch_and_add(ptr, 1);

return 0;
}

And userspace::

#include <linux/bpf.h>
#include <linux/libbpf.h>

__u32 map_lookup(struct bpf_map *map, __u64 cgrp, enum bpf_attach_type type)
{
__u32 value;
bpf_map_lookup_elem(bpf_map__fd(map), &cgrp, &value);
// error checking omitted
return value;
}

Semantics
=========

``BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE`` is a variant of this map type. This
per-CPU variant will have different memory regions for each CPU for each
storage. The non-per-CPU will have the same memory region for each storage.

Prior to Linux 5.9, the lifetime of a storage is precisely per-attachment, and
for a single ``CGROUP_STORAGE`` map, there can be at most one program loaded
that uses the map. A program may be attached to multiple cgroups or have
multiple attach types, and each attach creates a fresh zeroed storage. The
storage is freed upon detach.

There is a one-to-one association between the map of each type (per-CPU and
non-per-CPU) and the BPF program during load verification time. As a result,
each map can only be used by one BPF program and each BPF program can only use
one storage map of each type. Because of map can only be used by one BPF
program, sharing of this cgroup's storage with other BPF programs were
impossible.

Since Linux 5.9, storage can be shared by multiple programs. When a program is
attached to a cgroup, the kernel would create a new storage only if the map
does not already contain an entry for the cgroup and attach type pair, or else
the old storage is reused for the new attachment. If the map is attach type
shared, then attach type is simply ignored during comparison. Storage is freed
only when either the map or the cgroup attached to is being freed. Detaching
will not directly free the storage, but it may cause the reference to the map
to reach zero and indirectly freeing all storage in the map.

The map is not associated with any BPF program, thus making sharing possible.
However, the BPF program can still only associate with one map of each type
(per-CPU and non-per-CPU). A BPF program cannot use more than one
``BPF_MAP_TYPE_CGROUP_STORAGE`` or more than one
``BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE``.

In all versions, userspace may use the the attach parameters of cgroup and
attach type pair in ``struct bpf_cgroup_storage_key`` as the key to the BPF map
APIs to read or update the storage for a given attachment. For Linux 5.9
attach type shared storages, only the first value in the struct, cgroup inode
id, is used during comparison, so userspace may just specify a ``__u64``
directly.

The storage is bound at attach time. Even if the program is attached to parent
and triggers in child, the storage still belongs to the parent.

Userspace cannot create a new entry in the map or delete an existing entry.
Program test runs always use a temporary storage.
2 changes: 1 addition & 1 deletion Documentation/devicetree/bindings/misc/fsl,qoriq-mc.txt
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,7 @@ such as network interfaces, crypto accelerator instances, L2 switches,
etc.

For an overview of the DPAA2 architecture and fsl-mc bus see:
Documentation/networking/device_drivers/freescale/dpaa2/overview.rst
Documentation/networking/device_drivers/ethernet/freescale/dpaa2/overview.rst

As described in the above overview, all DPAA2 objects in a DPRC share the
same hardware "isolation context" and a 10-bit value called an ICID
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -22,6 +22,7 @@ select:
- amlogic,meson8m2-dwmac
- amlogic,meson-gxbb-dwmac
- amlogic,meson-axg-dwmac
- amlogic,meson-g12a-dwmac
required:
- compatible

Expand All @@ -36,6 +37,7 @@ allOf:
- amlogic,meson8m2-dwmac
- amlogic,meson-gxbb-dwmac
- amlogic,meson-axg-dwmac
- amlogic,meson-g12a-dwmac

then:
properties:
Expand Down Expand Up @@ -95,6 +97,7 @@ properties:
- amlogic,meson8m2-dwmac
- amlogic,meson-gxbb-dwmac
- amlogic,meson-axg-dwmac
- amlogic,meson-g12a-dwmac
contains:
enum:
- snps,dwmac-3.70a
Expand Down
Loading

0 comments on commit 47ec530

Please sign in to comment.