Kristof Provost [Tue, 22 Feb 2022 09:21:38 +0000 (10:21 +0100)]
ovpn: Introduce OpenVPN DCO support
OpenVPN Data Channel Offload (DCO) moves OpenVPN data plane processing
(i.e. tunneling and cryptography) into the kernel, rather than using tap
devices.
This avoids significant copying and context switching overhead between
kernel and user space and improves OpenVPN throughput.
In my test setup throughput improved from around 660Mbit/s to around
2Gbit/s.
Kristof Provost [Fri, 24 Jun 2022 07:41:00 +0000 (09:41 +0200)]
pf: ensure mbufs are long enough before we copy out IP(v6) headers
This isn't likely to be an issue on real hardware (as Ethernet has a
minimal packet length of 64 bytes), but can cause panics with short
packets on if_epair.
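A minimal user-space sketch of the kind of length check described above. It is not the pf code: `copy_ip4_header()` and `struct ip4_hdr_sketch` are hypothetical names, and a flat byte buffer stands in for an mbuf chain.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a fixed-size IPv4 header (20 bytes, no options). */
struct ip4_hdr_sketch {
	uint8_t	bytes[20];
};

/*
 * Copy the header out only if the packet really contains that many bytes.
 * Returns false, leaving *out untouched, for a short packet.
 */
static bool
copy_ip4_header(const uint8_t *pkt, size_t pktlen, struct ip4_hdr_sketch *out)
{
	if (pktlen < sizeof(*out))
		return (false);
	memcpy(out, pkt, sizeof(*out));
	return (true);
}
```

A caller would drop or otherwise reject the packet when the function returns false, rather than reading past the end of the data.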
Roger Pau Monné [Mon, 27 Jun 2022 13:51:28 +0000 (15:51 +0200)]
elfnote: place note in a PT_NOTE program header
Some tools (firecracker loader) only check for notes in PT_NOTE program
headers, so make sure the notes added using the ELFNOTE macro end up
in such a header.
Output from readelf -Wl for an amd64 kernel after the change:
Elf file type is EXEC (Executable file)
Entry point 0xffffffff8038a000
There are 11 program headers, starting at offset 64
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
PHDR 0x000040 0xffffffff80200040 0x0000000000200040 0x000268 0x000268 R 0x8
INTERP 0x0002a8 0xffffffff802002a8 0x00000000002002a8 0x00000d 0x00000d R 0x1
[Requesting program interpreter: /red/herring]
LOAD 0x000000 0xffffffff80200000 0x0000000000200000 0x189e28 0x189e28 R 0x200000
LOAD 0x18a000 0xffffffff8038a000 0x000000000038a000 0xe447e8 0xe447e8 R E 0x200000
LOAD 0xfce7f0 0xffffffff811ce7f0 0x00000000011ce7f0 0x6b955c 0x6b955c R 0x200000
LOAD 0x1800000 0xffffffff81a00000 0x0000000001a00000 0x000140 0x000140 RW 0x200000
LOAD 0x1801000 0xffffffff81a01000 0x0000000001a01000 0x1c8480 0x5ff000 RW 0x200000
DYNAMIC 0x1800000 0xffffffff81a00000 0x0000000001a00000 0x000140 0x000140 RW 0x8
GNU_RELRO 0x1800000 0xffffffff81a00000 0x0000000001a00000 0x000140 0x000140 R 0x1
GNU_STACK 0x000000 0x0000000000000000 0x0000000000000000 0x000000 0x000000 RW 0
NOTE 0x1687ae0 0xffffffff81887ae0 0x0000000001887ae0 0x0001c0 0x0001c0 R 0x4
Section to Segment mapping:
Segment Sections...
[...]
10 .note.gnu.build-id .note.Xen
Reported by: cperciva
Fixes: 1a9cdd373a6a ('xen: add PV/PVH kernel entry point')
Fixes: 93ee134a24fa ('Integrate support for xen in to i386 common code.')
Sponsored by: Citrix Systems R&D
Reviewed by: emaste
Differential revision: https://reviews.freebsd.org/D35611
Kirk McKusick [Tue, 28 Jun 2022 04:46:15 +0000 (21:46 -0700)]
Correctly update fs_dsize in growfs(8)
When growing a UFS/FFS filesystem, the size of the summary information
may expand into additional blocks. These blocks must be removed from
fs_dsize which records the number of blocks in the filesystem that can
be used to hold filesystem data.
While here also update the fs_old_dsize and fs_old_size fields for
compatibility with kernels that were compiled before the addition
of UFS2.
Reported by: Edward Tomasz Napiera
MFC after: 1 week
Kyle Evans [Tue, 28 Jun 2022 03:54:13 +0000 (22:54 -0500)]
date: attempt to more accurately describe year limitations with -v
The previous description was both incorrect and incomplete -- the 2038
limit doesn't apply on !i386 platforms, and it didn't note that values
above 100 are accepted and interpreted differently. Further, it didn't
note that absolute years are accepted.
Greg V [Mon, 27 Jun 2022 20:41:59 +0000 (14:41 -0600)]
devmatch: Properly ignore commented fields
Any field that starts with # is a commented-out field (there as a
placeholder only; the data in that placeholder is completely ignored). The
previous code improperly detected this using strcmp. Instead, any field
whose name starts with '#' is ignored.
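The difference between the two checks can be sketched as follows. Both helpers are hypothetical and are not the devmatch source; in particular, the exact form of the old strcmp comparison is an assumption.

```c
#include <stdbool.h>
#include <string.h>

/* Assumed shape of the old check: only a field named exactly "#" matches. */
static bool
field_is_commented_strcmp(const char *name)
{
	return (strcmp(name, "#") == 0);
}

/* Prefix check: any field whose name starts with '#' is a placeholder. */
static bool
field_is_commented(const char *name)
{
	return (name[0] == '#');
}
```

With a field named `#vendor`, the exact comparison reports "not commented" while the prefix check correctly ignores the field.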
The function's goal is to compare old/new nhop/nexthop group for the route
and decompose it into the series of RTM_ADD/RTM_DELETE single-nhop
events, calling specified callback for each event.
Simplify it by properly leveraging the fact that both old/new groups
are sorted nhop-# ascending.
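Because both groups are sorted by nexthop index, the decomposition can be done in a single merge-style pass. The sketch below is an illustration under that assumption only: the types, the callback signature and `diff_sorted_groups()` are hypothetical, plain integer indices stand in for nexthops, and weights are ignored.

```c
#include <stddef.h>

enum nh_event { NH_EVENT_ADD, NH_EVENT_DELETE };

typedef void (*nh_event_cb)(enum nh_event ev, unsigned int idx, void *arg);

/* Example callback state: counts how many events of each kind were seen. */
struct nh_event_counts {
	unsigned int adds;
	unsigned int deletes;
};

static void
count_event(enum nh_event ev, unsigned int idx, void *arg)
{
	struct nh_event_counts *c = arg;

	(void)idx;
	if (ev == NH_EVENT_ADD)
		c->adds++;
	else
		c->deletes++;
}

/*
 * Walk two index arrays, both sorted ascending.  An index found only in
 * old_idx is reported as a delete, one found only in new_idx as an add,
 * and an index present in both produces no event.
 */
static void
diff_sorted_groups(const unsigned int *old_idx, size_t old_n,
    const unsigned int *new_idx, size_t new_n, nh_event_cb cb, void *arg)
{
	size_t i = 0, j = 0;

	while (i < old_n && j < new_n) {
		if (old_idx[i] == new_idx[j]) {
			i++;
			j++;
		} else if (old_idx[i] < new_idx[j]) {
			cb(NH_EVENT_DELETE, old_idx[i++], arg);
		} else {
			cb(NH_EVENT_ADD, new_idx[j++], arg);
		}
	}
	while (i < old_n)
		cb(NH_EVENT_DELETE, old_idx[i++], arg);
	while (j < new_n)
		cb(NH_EVENT_ADD, new_idx[j++], arg);
}
```

For an old group {1, 2, 4} and a new group {2, 3, 4, 5} this reports one delete (1) and two adds (3 and 5), visiting each element once.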
routing: actually sort nexthops in nhgs by their index
Nexthops in the nexthop groups need to be deterministically sorted
by some property of theirs to reduce the reporting cost when changing
large nexthop groups.
Fix reporting by actually sorting next hops by their indices (`wn_cmp_idx()`).
As calc_min_mpath_slots_fast() has an assumption that next hops are sorted
using their relative weight in the nexthop groups, it needs to be
addressed as well. The latter sorting is required to quickly determine the
layout of the next hops in the actual forwarding group. For example,
what's the best way to split the traffic between nhops with weights
19,31 and 47 if the maximum nexthop group width is 64?
It is worth mentioning that such sorting is only required during nexthop
group creation and is not used elsewhere. Lastly, normally all nexthops
are of the same weight. With that in mind, (a) use spare 32 bytes inside
`struct weightened_nexthop` to avoid another memory allocation and
(b) use insertion sort to sort the nexthop weights.
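Insertion sort suits this case because it makes a single pass with no moves when the input is already in order, which is what happens when all weights are equal. A minimal sketch, assuming a simplified `struct wn_sketch` (hypothetical, not the kernel's `struct weightened_nexthop`) and ascending order:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a weighted nexthop entry. */
struct wn_sketch {
	uint32_t idx;		/* nexthop index */
	uint32_t weight;	/* relative weight within the group */
};

/* Stable insertion sort by weight, ascending (ordering is an assumption). */
static void
sort_by_weight(struct wn_sketch *wn, size_t n)
{
	for (size_t i = 1; i < n; i++) {
		struct wn_sketch key = wn[i];
		size_t j = i;

		/* Shift heavier entries right until key's slot is found. */
		while (j > 0 && wn[j - 1].weight > key.weight) {
			wn[j] = wn[j - 1];
			j--;
		}
		wn[j] = key;
	}
}
```

Entries with equal weights keep their relative order, so a group that is already sorted by index stays sorted by index within each weight.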
Yuri [Mon, 27 Jun 2022 15:48:31 +0000 (09:48 -0600)]
smartpqi: Allocate DMA memory NOWAIT
We're not allowed to wait in this allocation path, so allocate the
memory NOWAIT instead of WAITOK. The code already copes with the
failures that may result, so no additional code is needed.
PR: 263008
Reviewed by: markj, Scott Benesh at Microsemi, imp
Differential Revision: https://reviews.freebsd.org/D35601
Ed Maste [Sun, 26 Jun 2022 17:23:39 +0000 (13:23 -0400)]
Fix cross-builds from macOS
The macOS linker does not support -zrelro/-znorelro. Since it is only
used for build tools that run on the host, and WITH_RELRO or
WITHOUT_RELRO does not matter there, just skip the option.
Reviewed by: markj
Fixes: 2f3a961487c9 ("Add RELRO build knob, default to enabled")
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D35589
Bjoern A. Zeeb [Mon, 27 Jun 2022 01:23:24 +0000 (01:23 +0000)]
hwpmc: further fix build (__diagused/debug/missing files entries)
Fix builds after 1459a22787ea16e3798694067c8dcb20325dca4b and 59191f3573f6cb2ea055ac319cbcb68823ca8e17 by using __diagused
for variables only used in KASSERT().
In addition remove two debug lines that look like a copy and paste error
from dmc620 to cmn600.
Further add the newly introduced files to sys/conf/files.arm64 as well
so that LINT compiles without missing symbols.
Fix build after e3572eb654733a94e1e765fe9e95e0579981d851 as
struct pmc_md_dmc620_pmu_op_pmcallocate is needed when building
libpmc/pmclog.c as it is part of the public API via machine/pmc_mdep.h.
Alan Cox [Sun, 26 Jun 2022 16:48:12 +0000 (11:48 -0500)]
iommu_gas: Fix a recent regression with IOMMU_MF_CANSPLIT
As of 19bb5a7244ff, the IOMMU_MF_CANSPLIT case in iommu_gas_match_one()
must take into account the specified offset. Otherwise, the recently
changed end calculation in iommu_gas_match_insert() could produce an
end address that crosses the specified boundary by one page.
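The effect of ignoring the offset can be illustrated with a small boundary test. This is a sketch under assumptions, not the iommu_gas code: the function names are hypothetical, a 4 KiB page is assumed, and the end address is assumed to be the start plus offset and size rounded up to a whole page.

```c
#include <stdbool.h>
#include <stdint.h>

#define	SKETCH_PAGE_SIZE	0x1000ULL	/* assumed 4 KiB IOMMU page */

/* Round x up to a multiple of the power-of-two value y. */
static uint64_t
sketch_roundup2(uint64_t x, uint64_t y)
{
	return ((x + (y - 1)) & ~(y - 1));
}

/*
 * Does a mapping that begins at page-aligned 'start', with the data
 * beginning 'offset' bytes into the first page, end at or before the next
 * multiple of the power-of-two 'boundary'?  A boundary of 0 means no limit.
 */
static bool
fits_below_boundary(uint64_t start, uint64_t offset, uint64_t size,
    uint64_t boundary)
{
	uint64_t end, limit;

	if (boundary == 0)
		return (true);
	end = start + sketch_roundup2(offset + size, SKETCH_PAGE_SIZE);
	limit = (start + boundary) & ~(boundary - 1);
	return (end <= limit);
}
```

With a start of 0xF000, a size of 0x1000 and a boundary of 0x10000, the mapping fits when the offset is 0, but an offset of 0x800 pushes the rounded end to 0x11000, one page past the boundary.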
Bjoern A. Zeeb [Sun, 26 Jun 2022 19:17:04 +0000 (19:17 +0000)]
LinuxKPI: 802.11: remove an early bandaid to make sure queues are allocated
iwlwifi allocates queues on first wakeup. This takes a lot longer on
FreeBSD's work implementation than it seems to on Linux based on some
discussion. That meant that we couldn't get non-data frames out quickly
enough initially and failed to associate. d0d2911035192473e8bd3f6b99ed5ca9b1b29e47 should have solved most of this
for us with iwlwifi. None of the other drivers ported to LinuxKPI/802.11
up to today will call a dequeue so we get notified when the queues are
allocated or even need to do so.
Remove the bandaid initially put in for iwlwifi now and speed up the
overall process of getting us associated.
Bjoern A. Zeeb [Sun, 26 Jun 2022 19:13:00 +0000 (19:13 +0000)]
LinuxKPI: 802.11: cleanup lsta better
This change cleans up lsta from the VIF station list as well as
deals with freeing the lsta itself so it is not leaked.
lkpi_iv_update_bss() makes this more complicated than it should be
as we tie more sta state (incl. drv/fw) to the node that net80211
does not know about. There is more work to be done detangling this
now that it is better understood.
Bjoern A. Zeeb [Sun, 26 Jun 2022 19:04:16 +0000 (19:04 +0000)]
LinuxKPI: 802.11: sync sta->addr in lkpi_iv_update_bss()
In lkpi_iv_update_bss() introduced in d9f59799fc3e7 we swap lsta and
along with that sta and drv state if ni gets reused and swapped under
us by net80211. What we did not do was to sync sta->addr which later
(usually in lkpi_sta_assoc_to_run) during a bss_info update caused
problems in drivers (or firmware) as the BSSID and the station address
were not aligned.
If this proves to hold up to fix iwlwifi issues seen on firmware
for older chipsets, multi-assoc runs, and rtw89 (which this fixes)
we should add asserts that lkpi_iv_update_bss() can only happen in
pre-auth stages and/or make sure we factor out syncing more state
fields.
Move pytest wrapper to the collection of the other atf wrappers
in libexec. It solves the problem of combining bits & pieces from
bsd.test.mk and bsd.prog.mk to address "test binary, but not the
suite binary".
Use unified guidelines for the severity across the routing subsystem.
Update severity for some of the already-used messages to adhere to the
guidelines.
Convert rtsock logging to the new FIB_ reporting format.
routing: fix crash when RTM_CHANGE results in no-op for the multipath
route.
Reporting logic assumed there is always some nhop change for every
successful modification operation. Explicitly check that the changed
nexthop indeed exists when reporting back to userland.
Implementation consists of the pytest plugin implementing ATF format and
a simple C++ wrapper, which reorders the provided arguments from ATF format
to the format understandable by pytest. Each test has this wrapper specified
after the shebang. When kyua executes the test, wrapper calls pytest, which
loads atf plugin, does the work and returns the result. Additionally, a
separate python "package", `/usr/tests/atf_python` has been added to collect
code that may be useful across different tests.
Current limitations:
* Opaque metadata passing via X-Name properties. Requires some fixtures to be written
* `-s srcdir` parameter passed by the runner is ignored.
* No `atf-c-api(3)` or similar - relying on pytest framework & existing python libraries
* No support for `atf_tc_<get|has>_config_var()` & `atf_tc_set_md_var()`.
Can be probably implemented with env variables & autoload fixtures
RTM_CHANGE operates on a single component of the multipath route (e.g. on a single nexthop).
The search for this nexthop is performed by iterating over each component of the multipath (nexthop)
group, using `check_info_match_nhop()`. The problem with the current code is that it incorrectly
assumes that `check_info_match_nhop()` returns a true value on match, while in reality it
returns an error code on failure. Fix this by properly comparing the result with 0.
Additionally, the follow-up code modified the original nexthop group instead of the new one.
Fix this by targeting the new nexthop group instead.
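The return-value mix-up described above can be sketched with a stand-in matcher. Everything here is hypothetical: `match_nhop_sketch()` only mimics the "0 on match, error code otherwise" convention, and integers stand in for nexthops.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical matcher: returns 0 on match, an errno value otherwise. */
static int
match_nhop_sketch(int candidate, int wanted)
{
	return (candidate == wanted ? 0 : ESRCH);
}

/* Buggy lookup: treats a non-zero (error) return as "matched". */
static int
find_component_buggy(const int *nhops, size_t n, int wanted)
{
	for (size_t i = 0; i < n; i++) {
		if (match_nhop_sketch(nhops[i], wanted))
			return ((int)i);
	}
	return (-1);
}

/* Fixed lookup: a return of 0 means the component matched. */
static int
find_component(const int *nhops, size_t n, int wanted)
{
	for (size_t i = 0; i < n; i++) {
		if (match_nhop_sketch(nhops[i], wanted) == 0)
			return ((int)i);
	}
	return (-1);
}
```

Searching {10, 20, 30} for 20, the buggy version stops at the first component that fails to match (index 0), while the fixed version returns index 1.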
busdma: Protect ARM busdma bounce page counters using the bounce page lock.
In bus_dmamap_unload() on ARM, the counters for free_bpages and reserved_bpages
appear to be vulnerable to unprotected read-modify-write operations that result
in accounting that looks like a page leak.
This was noticed on a 2GB quad core i.MX6 system that has more than one device
attached via FTDI based USB serial connection.
Submitted by: John Hein <jcfyecrayz@liamekaens.com>
Differential Revision: https://reviews.freebsd.org/D35553
PR: 264836
MFC after: 3 days
Sponsored by: NVIDIA Networking
Doug Moore [Sat, 25 Jun 2022 07:40:16 +0000 (02:40 -0500)]
rb_tree: optimize tree rotation
The RB_ROTATE macros begin with fetching a field via a pointer. In
most cases, that value is one that has already been pulled into a
register, and the compiler cannot infer that. So, to eliminate those
needless fetches, have the caller of the RB_ROTATE macros present the
data in the third macro parameter, rather than having the macro fetch
it.
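The idea can be sketched on a plain binary tree with parent pointers. These macros are hypothetical and much simpler than the real RB_ROTATE macros in sys/tree.h; they only contrast a macro that fetches the child itself with one that expects the caller to supply it.

```c
#include <stddef.h>

struct node {
	struct node *left, *right, *parent;
	int key;
};

/*
 * Caller-supplied form: 'tmp' must already equal (elm)->right, typically
 * because the caller loaded it for its own purposes just before.
 */
#define	SKETCH_ROTATE_LEFT_GIVEN(root, elm, tmp) do {			\
	if (((elm)->right = (tmp)->left) != NULL)			\
		(tmp)->left->parent = (elm);				\
	if (((tmp)->parent = (elm)->parent) != NULL) {			\
		if ((elm) == (elm)->parent->left)			\
			(elm)->parent->left = (tmp);			\
		else							\
			(elm)->parent->right = (tmp);			\
	} else								\
		(root) = (tmp);						\
	(tmp)->left = (elm);						\
	(elm)->parent = (tmp);						\
} while (0)

/* Fetching form: the macro reloads (elm)->right through the pointer. */
#define	SKETCH_ROTATE_LEFT_FETCH(root, elm, tmp) do {			\
	(tmp) = (elm)->right;						\
	SKETCH_ROTATE_LEFT_GIVEN(root, elm, tmp);			\
} while (0)
```

Both forms produce the same tree; the caller-supplied form simply skips a load when the caller already holds the child pointer in a local variable.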
Alan Cox [Wed, 22 Jun 2022 21:51:47 +0000 (16:51 -0500)]
busdma_iommu: Fine-grained locking for the dmamap's map list
Introduce fine-grained locking on the dmamap's list of map entries,
replacing the use of the domain lock. This is not the most significant
source of lock contention, but it is the easiest to address.
Rick Macklem [Fri, 24 Jun 2022 20:56:35 +0000 (13:56 -0700)]
nfscommon: Clean up the code by removing the vnode_vtype() macro
The vnode_vtype() macro was used to make the code compatible
with Mac OSX, for the Mac OSX port.
For FreeBSD, this macro just obscured the code and, therefore,
use of the macro has been deleted by previous commits.
This commit deletes the, now unused, macro.
This commit should not result in a semantics change.
Rick Macklem [Fri, 24 Jun 2022 20:47:57 +0000 (13:47 -0700)]
nfscommon: Clean up the code by not using the vnode_vtype() macro
The vnode_vtype() macro was used to make the code compatible
with Mac OSX, for the Mac OSX port.
For FreeBSD, this macro just obscured the code, so
avoid using it to clean up the code.
This commit should not result in a semantics change.
Gleb Smirnoff [Fri, 24 Jun 2022 16:09:11 +0000 (09:09 -0700)]
tests/unix_passfd: compile SOCK_STREAM and SOCK_DGRAM versions
Most tests pass identically on different kinds of sockets. However,
a few edge cases work differently on stream and datagram sockets. We
want to exercise and document this.
Gleb Smirnoff [Fri, 24 Jun 2022 16:09:11 +0000 (09:09 -0700)]
libc/syslog: fully deprecate and don't try to open "/dev/log"
The "/dev/log" socket existed in pre-FreeBSD times. Later it was
replaced by a compatibility symlink. The symlink creation was
deprecated in FreeBSD 10.2 and 9-STABLE.
Gleb Smirnoff [Fri, 24 Jun 2022 16:09:11 +0000 (09:09 -0700)]
unix/dgram: smart socket buffers for one-to-many sockets
A one-to-many unix/dgram socket is a socket that has been bound
with bind(2) and can get multiple connections. A typical example
is /var/run/log bound by syslogd(8) and receiving multiple
connections from libc syslog(3) API. Until now all of these
connections shared the same receive socket buffer of the bound
socket. This made the socket vulnerable to an overflow attack.
See 240d5a9b1ce for a historical attempt to work around the problem.
This commit creates a per-connection socket buffer for every single
connected socket and eliminates the problem. The new behavior will
favor infrequent writers over frequent writers. See added test case
scenarios and code comments for more detailed description of the
new behavior.
Gleb Smirnoff [Fri, 24 Jun 2022 16:09:11 +0000 (09:09 -0700)]
unix/dgram: reduce mbuf chain traversals in send(2) and recv(2)
o Use m_pkthdr.memlen from m_uiotombuf()
o Modify unp_internalize() to keep track of allocated space and memory
as well as pointer to the last buffer.
o Modify unp_addsockcred() to keep track of allocated space and memory
as well as pointer to the last buffer.
o Record the datagram len/memlen/ctllen in the first (from) mbuf of the
chain in uipc_sosend_dgram() and reuse it in uipc_soreceive_dgram().
Gleb Smirnoff [Fri, 24 Jun 2022 16:09:11 +0000 (09:09 -0700)]
m_uiotombuf: write total memory length of the allocated chain in pkthdr
Data allocated by m_uiotombuf() usually goes into a socket buffer.
We are interested in the length of useful data to be added to sb_acc,
as well as the total memory used by mbufs. The latter would be added to
sb_mbcnt. Calculating this value at allocation time saves an
extra traversal of the mbuf chain.
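The bookkeeping can be illustrated in user space: while a chain of buffers is being allocated, both the useful data length and the total memory footprint are accumulated, so no second walk is needed later. The types and functions below are hypothetical stand-ins, not mbuf(9) APIs.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define	SKETCH_BUF_DATA	256	/* hypothetical payload bytes per buffer */

struct sketch_buf {
	struct sketch_buf *next;
	size_t		 len;		/* useful bytes in data[] */
	char		 data[SKETCH_BUF_DATA];
};

struct sketch_chain {
	struct sketch_buf *head;
	size_t		 len;		/* total useful data */
	size_t		 memlen;	/* total memory held by the chain */
};

static void
sketch_chain_free(struct sketch_chain *c)
{
	struct sketch_buf *b;

	while ((b = c->head) != NULL) {
		c->head = b->next;
		free(b);
	}
	c->len = c->memlen = 0;
}

/* Build the chain from n bytes and record both totals while allocating. */
static int
sketch_chain_fill(struct sketch_chain *c, const char *src, size_t n)
{
	struct sketch_buf *b, **tailp;

	c->head = NULL;
	c->len = c->memlen = 0;
	tailp = &c->head;
	while (n > 0) {
		if ((b = malloc(sizeof(*b))) == NULL) {
			sketch_chain_free(c);
			return (-1);
		}
		b->next = NULL;
		b->len = n < SKETCH_BUF_DATA ? n : SKETCH_BUF_DATA;
		memcpy(b->data, src, b->len);
		*tailp = b;
		tailp = &b->next;
		c->len += b->len;
		c->memlen += sizeof(*b);
		src += b->len;
		n -= b->len;
	}
	return (0);
}
```

A consumer that needs both totals, in the way a socket buffer tracks data and memory usage separately, can read them from the chain header in constant time.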
Gleb Smirnoff [Fri, 24 Jun 2022 16:09:10 +0000 (09:09 -0700)]
sockets: enable protocol specific socket buffers
Split struct sockbuf into common shared fields and protocol specific
union, where protocols are free to implement whatever buffer they
want. Such protocols should mark themselves with PR_SOCKBUF and are
expected to initialize their buffers in their pr_attach and tear
them down in pr_detach.
Gleb Smirnoff [Fri, 24 Jun 2022 16:09:10 +0000 (09:09 -0700)]
unix: provide an option to return locked from unp_connectat()
Use this new version in unix/dgram socket when sending to a target
address. This removes extra lock release/acquisition and possible
counter-intuitive ENOTCONN.
Gleb Smirnoff [Fri, 24 Jun 2022 16:09:10 +0000 (09:09 -0700)]
unix/dgram: add a specific receive method - uipc_soreceive_dgram
With this second step PF_UNIX/SOCK_DGRAM has a protocol-specific
implementation. This opens up some possible performance
optimizations. However, it still operates on the same struct
socket as all other sockets do.
Gleb Smirnoff [Fri, 24 Jun 2022 16:09:10 +0000 (09:09 -0700)]
unix/dgram: add a specific send method - uipc_sosend_dgram()
This is the first step towards splitting the classic BSD socket
implementation into separate classes. The first to be
split is PF_UNIX/SOCK_DGRAM as it has the most differences
from SOCK_STREAM sockets and from PF_INET sockets.
Historically a protocol shall provide two methods for sendmsg(2):
pru_sosend and pru_send. The former is a generic send method,
e.g. sosend_generic() which would internally call the latter,
uipc_send() in our case. There is one important exception, though,
the sendfile(2) code will call pru_send directly. But sendfile
doesn't work on SOCK_DGRAM, so we can do the trick. We will create
socket class specific uipc_sosend_dgram() which will carry only
important bits from sosend_generic() and uipc_send().
Rick Macklem [Thu, 23 Jun 2022 23:13:12 +0000 (16:13 -0700)]
nfscl: Clean up the code by not using the vnode_vtype() macro
The vnode_vtype() macro was used to make the code compatible
with Mac OSX, for the Mac OSX port.
For FreeBSD, this macro just obscured the code, so
avoid using it to clean up the code.
This commit should not result in a semantics change.
Mitchell Horne [Thu, 23 Jun 2022 18:44:28 +0000 (15:44 -0300)]
subr_bus: restore bus_null_rescan()
Partially revert the previous change; we need to keep this method as a
specific override for pci_driver subclasses which should not use
pci_rescan_method() -- cardbus and ofw_pcibus. However, change the return
value to ENODEV for the same reasoning given in the original commit, and
use this as the default rescan method in bus_if.m.
Reported by: jhb
Fixes: 36a8572ee8f5 ("bus_if: provide a default null rescan method")
MFC with: 36a8572ee8f5
Vitaliy Gusev [Thu, 23 Jun 2022 18:46:06 +0000 (11:46 -0700)]
bhyve: Snapshot improvements for 'blockif' backend
When pausing a block I/O device model as part of suspending a VM, wait
for all active block I/O requests to finish before saving snapshot
data. This avoids having to save information about in-flight requests
both in the block_if layer and in storage device models.
For the AHCI device model, the queues are now guaranteed to be idle
when taking a snapshot, so remove the code to save queue state and
rely on the initial state in a resumed VM having all queues already
idle.
This will also simplify adding NVMe snapshot support in the future.