Dawid Gorecki [Fri, 10 Jun 2022 09:18:10 +0000 (11:18 +0200)]
ena: Use device_set_desc in probe
During probe the driver created a temporary buffer to which the value of
DEVICE_DESC constant was printed. This buffer was then copied to the
device structure using device_set_desc_copy. Since the value of this
string is exactly the same for every device using the ENA driver, using
sprintf is unnecessary, and device_set_desc can be used instead.
Dawid Gorecki [Fri, 10 Jun 2022 09:18:10 +0000 (11:18 +0200)]
ena: Move ena_copy_eni_metrics into separate task
Copying ENI metrics was done in callout context. This caused the driver
to panic when sample_interval was set to a value other than 0, as the
admin queue call which was executed could sleep while waiting on
a condition variable. Taskqueue, unlike callout, allows for sleeping, so
moving the function to a separate taskqueue fixes the problem.
ena_timer_service is still responsible for scheduling the taskqueue.
Stop draining the callout during ena_up/ena_down. This was done to
prevent a race between ena_up/down and ena_copy_eni_metrics admin queue
calls. Since ena_metrics_task is protected by ENA_LOCK there is no
possibility of a race between ena_up/down and ena_metrics_task.
Remove a comment about locking in ena_timer_service. With ENI metrics
in a separate task this comment became obsolete.
Dawid Gorecki [Fri, 10 Jun 2022 09:18:09 +0000 (11:18 +0200)]
ena: Store ticks of last Tx cleanup
Store timestamp of last cleanup in Tx ring structure. This does not
change anything during normal operation of the driver but could be
useful when the device fails for some reason.
Dawid Gorecki [Fri, 10 Jun 2022 09:18:08 +0000 (11:18 +0200)]
ena: Prevent LLQ initialization when membar isn't exposed
The ena_com_config_dev_mode() function performs many LLQ related
calculations and sends an admin command to configure LLQ in the device.
All the LLQ related operations are unnecessary if the driver fails to
find LLQ memory bar.
Move LLQ memory bar allocation to separate helper function
ena_map_llq_mem_bar and execute this function before LLQ configuration.
If the LLQ memory bar cannot be allocated, then LLQ configuration is
skipped.
Dawid Gorecki [Fri, 10 Jun 2022 09:17:53 +0000 (11:17 +0200)]
ena: Move reset completion logging to the reset function
While ena_restore_device is called from the reset task, it can also be
called from other locations in the driver, for example in netmap
specific code. Move the reset completion logging to reset task, so it
better represents when the reset actually happened.
Mark Johnston [Thu, 30 Jun 2022 14:19:23 +0000 (10:19 -0400)]
pf: Make sure that pfi_update_status() always zeros counters
pfi_update_status() can return early if the status interface doesn't
exist. But in this case pf_getstatus() was copying uninitialized stack
memory into the output nvlist.
Reported by: Jenkins (KMSAN job)
Reviewed by: kp
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D35659
sysctl.conf.5: Document rc.d/sysctl and rc.d/sysctl_lastload
Also, update the BUGS section. The example describes an issue, which is
not true anymore thanks to sysctl_lastload. Point readers to rcorder(8)
instead.
Kristof Provost [Thu, 30 Jun 2022 11:34:53 +0000 (13:34 +0200)]
dummynet: handle IPV6 layer 2 traffic
When pf sends layer 2 traffic into dummynet it still marks IPv6 with
IPFW_ARGS_IPV6 (which dummynet translates to PROTO_V6). That in turn
results in it not matching the 'DIR_IN | PROTO_LAYER2' case, and
triggering the 'bad switch' error message.
Roger Pau Monné [Tue, 28 Jun 2022 15:37:00 +0000 (17:37 +0200)]
x86/xen: stop assuming kernel memory loading order in PVH
Do not assume that start_info will always be loaded at the highest
memory address, and instead check the position of all the loaded
elements in order to find the last loaded one, and thus a likely safe
place to use as early boot allocation memory space.
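The approach described above can be sketched in plain C. This is only an illustration, not the kernel code: struct loaded_region and first_free_addr() are hypothetical names, and the real implementation works on the Xen start_info data rather than on a simple array.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor for one element the loader placed in memory. */
struct loaded_region {
	uint64_t start;	/* start address of the element */
	uint64_t size;	/* length in bytes */
};

/*
 * Return the first page-aligned address past the end of every region,
 * without assuming that any particular region was loaded last.
 * page_size must be a power of two.
 */
static uint64_t
first_free_addr(const struct loaded_region *r, size_t n, uint64_t page_size)
{
	uint64_t end = 0;

	for (size_t i = 0; i < n; i++) {
		uint64_t e = r[i].start + r[i].size;

		if (e > end)
			end = e;
	}
	/* Round up to the next page boundary. */
	return ((end + page_size - 1) & ~(page_size - 1));
}
```

The point of the sketch is that every element is inspected, so the result does not depend on the order in which the loader placed them.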
Jamie Gritton [Wed, 29 Jun 2022 17:47:39 +0000 (10:47 -0700)]
jail: Remove a prison's shared memory when it dies

Add shm_remove_prison(), that removes all POSIX shared memory segments
belonging to a prison. Call it from prison_cleanup() so a prison
won't be stuck in a dying state due to the resources still held.
Jamie Gritton [Wed, 29 Jun 2022 17:33:05 +0000 (10:33 -0700)]
jail: add prison_cleanup() to release resources held by a dying jail
Currently, when a jail starts dying, either by losing its last user
reference or by being explicitly killed,
osd_jail_call(...PR_METHOD_REMOVE...) is called. Encapsulate this
into a function prison_cleanup() that can then do other cleanup.
Andrew Turner [Tue, 28 Jun 2022 11:44:49 +0000 (11:44 +0000)]
Decode the arm64 SVE ID register
The field values are only valid when the ID_AA64PFR0_EL1.SVE or
ID_AA64PFR1_EL1.SME fields are non-zero. When this is not the case
the register is reserved as zero so is safe to read, but the SVEver
field will be incorrect so only print the decoded register when
the SVE or SME fields indicate it is valid.
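A minimal sketch of the validity check, written as portable C so it can be tried outside the kernel. The helper name zfr0_is_valid() is hypothetical, and the bit positions are assumptions taken from the Arm architecture reference (SVE at bits [35:32] of ID_AA64PFR0_EL1, SME at bits [27:24] of ID_AA64PFR1_EL1); check them against the manual before reuse.

```c
#include <stdbool.h>
#include <stdint.h>

/* Extract a 4-bit ID register field starting at the given bit. */
#define	ID_FIELD(reg, shift)	(((reg) >> (shift)) & 0xfULL)

/* Assumed field positions, see the note above. */
#define	PFR0_SVE_SHIFT	32
#define	PFR1_SME_SHIFT	24

/*
 * The SVE ID register should only be decoded and printed when either
 * the SVE or the SME field reports that the feature is present.
 */
static bool
zfr0_is_valid(uint64_t pfr0, uint64_t pfr1)
{
	return (ID_FIELD(pfr0, PFR0_SVE_SHIFT) != 0 ||
	    ID_FIELD(pfr1, PFR1_SME_SHIFT) != 0);
}
```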
Andrew Turner [Wed, 29 Jun 2022 16:34:41 +0000 (17:34 +0100)]
Allow use of the arm64 unnamed register form
On arm64 all registers have a name that encodes op0, op1, CRn, CRm, and
op2 that are used to encode the register in the instruction. As some
registers we need to access may not be supported by older compilers, or
are only supported when specific extensions are enabled, support this
alternative form.
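To illustrate the naming scheme, the following sketch builds the unnamed form S<op0>_<op1>_C<CRn>_C<CRm>_<op2> as a string. The function sysreg_unnamed() is a hypothetical helper for illustration only; the kernel change itself deals with macros that emit such names into mrs/msr instructions.

```c
#include <stdio.h>

/*
 * Format the generic (unnamed) system register name from its encoding
 * fields. Returns the number of characters written, as snprintf() does.
 */
static int
sysreg_unnamed(char *buf, size_t len, unsigned op0, unsigned op1,
    unsigned crn, unsigned crm, unsigned op2)
{
	return (snprintf(buf, len, "S%u_%u_C%u_C%u_%u",
	    op0, op1, crn, crm, op2));
}
```

For example, the fields 3, 0, 0, 4, 4 produce S3_0_C0_C4_4, which should correspond to ID_AA64ZFR0_EL1 according to the Arm architecture reference; treat that mapping as an assumption to verify.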
Gleb Smirnoff [Wed, 29 Jun 2022 16:42:58 +0000 (09:42 -0700)]
unix: change error code for recvmsg() failed due to RLIMIT_NOFILE
Instead of returning EMSGSIZE pass the error code from fdallocn() directly
to userland. That would be EMFILE, which makes much more sense. This
error code is not listed in the specification[1], but the specification
doesn't cover such an edge case at all. Meanwhile the specification lists
EMSGSIZE as the error code for invalid value of msg_iovlen, and FreeBSD
follows that, see sys_recvmsg(). Differentiating these two cases will make
a developer's or admin's life much easier when debugging.
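From the caller's side, the two cases can now be told apart by errno alone. The sketch below is a user-space illustration; recvmsg_failure_hint() and its message strings are hypothetical and not part of any FreeBSD API.

```c
#include <errno.h>

/*
 * Map the errno value left by a failed recvmsg() on a unix socket to a
 * short diagnostic. Only the two cases discussed above are singled out.
 */
static const char *
recvmsg_failure_hint(int error)
{
	switch (error) {
	case EMFILE:
		/* Passed descriptors could not be allocated. */
		return ("descriptor limit (RLIMIT_NOFILE) reached");
	case EMSGSIZE:
		/* Invalid msg_iovlen, as the specification lists. */
		return ("invalid msg_iovlen");
	default:
		return ("other error");
	}
}
```

A caller would pass errno immediately after recvmsg() returns -1, before any other call can overwrite it.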
Wojciech Macek [Wed, 29 Jun 2022 08:48:01 +0000 (10:48 +0200)]
mac_veriexec: Authorize reads of secured sysctls
Writes to sysctls flagged with CTLFLAG_SECURE are blocked if the
appropriate secure level is set. mac_veriexec does not behave this way:
it blocks reads of such sysctls as well.
This change aims to make mac_veriexec behave like secure levels, as
intended by the original commit ed377cf41.
Bjoern A. Zeeb [Thu, 23 Jun 2022 00:17:14 +0000 (00:17 +0000)]
ACPI: change arguments to internal acpi_find_dsd()
acpi_find_dsd() is not a bus function and we only need the acpi_device (ad).
The only caller has already looked up the ad (from ivars) for us.
Directly pass the ad to acpi_find_dsd() instead of bus, dev and remove
the extra call to device_get_ivars(); the changed argument also means we
now call AcpiEvaluateObject directly on the handle.
This optimisation was done a while ago while debugging a driver which
ended up with a bad bus, dev combination making the old version fail.
testing: pass ATF vars to pytest via env instead of arguments.
This change is a continuation of the 9c42645a1e4d workaround.
Apparently the pytest argument parser is not happy when parsing values
with spaces or just more than one --atf-var argument.
Switch wrapper to send these kv pairs as env variables. Specifically,
use _ATF_VAR_key=value format to distinguish from the other vars.
Add the `atf_vars` fixture returning all passed kv pairs as a dict.
Kristof Provost [Tue, 22 Feb 2022 09:21:38 +0000 (10:21 +0100)]
ovpn: Introduce OpenVPN DCO support
OpenVPN Data Channel Offload (DCO) moves OpenVPN data plane processing
(i.e. tunneling and cryptography) into the kernel, rather than using tap
devices.
This avoids significant copying and context switching overhead between
kernel and user space and improves OpenVPN throughput.
In my test setup throughput improved from around 660Mbit/s to around
2Gbit/s.
Kristof Provost [Fri, 24 Jun 2022 07:41:00 +0000 (09:41 +0200)]
pf: ensure mbufs are long enough before we copy out IP(v6) headers
This isn't likely to be an issue on real hardware (as Ethernet has a
minimal packet length of 64 bytes), but can cause panics with short
packets on if_epair.
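The general pattern is to compare the available length against the header size before copying. The sketch below shows it in user-space C on a plain byte buffer; struct ip4_hdr_min and copy_ip4_header() are hypothetical stand-ins, and the real fix operates on mbufs inside pf.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for a minimal (option-less) 20-byte IPv4 header. */
struct ip4_hdr_min {
	uint8_t bytes[20];
};

/*
 * Copy the header out of the packet only if the packet is long enough.
 * Returns false, leaving *out untouched, for short packets.
 */
static bool
copy_ip4_header(const uint8_t *pkt, size_t pktlen, struct ip4_hdr_min *out)
{
	if (pktlen < sizeof(*out))
		return (false);
	memcpy(out, pkt, sizeof(*out));
	return (true);
}
```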
Roger Pau Monné [Mon, 27 Jun 2022 13:51:28 +0000 (15:51 +0200)]
elfnote: place note in a PT_NOTE program header
Some tools (Firecracker loader) only check for notes in PT_NOTE program
headers, so make sure the notes added using the ELFNOTE macro end up
in such header.
Output from readelf -Wl for an amd64 kernel after the change:
Elf file type is EXEC (Executable file)
Entry point 0xffffffff8038a000
There are 11 program headers, starting at offset 64
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
PHDR 0x000040 0xffffffff80200040 0x0000000000200040 0x000268 0x000268 R 0x8
INTERP 0x0002a8 0xffffffff802002a8 0x00000000002002a8 0x00000d 0x00000d R 0x1
[Requesting program interpreter: /red/herring]
LOAD 0x000000 0xffffffff80200000 0x0000000000200000 0x189e28 0x189e28 R 0x200000
LOAD 0x18a000 0xffffffff8038a000 0x000000000038a000 0xe447e8 0xe447e8 R E 0x200000
LOAD 0xfce7f0 0xffffffff811ce7f0 0x00000000011ce7f0 0x6b955c 0x6b955c R 0x200000
LOAD 0x1800000 0xffffffff81a00000 0x0000000001a00000 0x000140 0x000140 RW 0x200000
LOAD 0x1801000 0xffffffff81a01000 0x0000000001a01000 0x1c8480 0x5ff000 RW 0x200000
DYNAMIC 0x1800000 0xffffffff81a00000 0x0000000001a00000 0x000140 0x000140 RW 0x8
GNU_RELRO 0x1800000 0xffffffff81a00000 0x0000000001a00000 0x000140 0x000140 R 0x1
GNU_STACK 0x000000 0x0000000000000000 0x0000000000000000 0x000000 0x000000 RW 0
NOTE 0x1687ae0 0xffffffff81887ae0 0x0000000001887ae0 0x0001c0 0x0001c0 R 0x4
Section to Segment mapping:
Segment Sections...
[...]
10 .note.gnu.build-id .note.Xen
Reported by: cperciva
Fixes: 1a9cdd373a6a ('xen: add PV/PVH kernel entry point')
Fixes: 93ee134a24fa ('Integrate support for xen in to i386 common code.')
Sponsored by: Citrix Systems R&D
Reviewed by: emaste
Differential revision: https://reviews.freebsd.org/D35611
Kirk McKusick [Tue, 28 Jun 2022 04:46:15 +0000 (21:46 -0700)]
Correctly update fs_dsize in growfs(8)
When growing a UFS/FFS filesystem, the size of the summary information
may expand into additional blocks. These blocks must be removed from
fs_dsize which records the number of blocks in the filesystem that can
be used to hold filesystem data.
While here also update the fs_old_dsize and fs_old_size fields for
compatibility with kernels that were compiled before the addition
of UFS2.
Reported by: Edward Tomasz Napierała
MFC after: 1 week
Kyle Evans [Tue, 28 Jun 2022 03:54:13 +0000 (22:54 -0500)]
date: attempt to more accurately describe year limitations with -v
The previous description was both incorrect and incomplete: the 2038
limit doesn't apply on !i386 platforms, and it didn't note that values
above 100 are accepted and interpreted differently. Further, it didn't
note that absolute years are accepted.
Greg V [Mon, 27 Jun 2022 20:41:59 +0000 (14:41 -0600)]
devmatch: Properly ignore commented fields
Any field that starts with # is a commented out field (there as a place
holder only, the data in that place holder is completely ignored). The
previous code improperly detected this using strcmp. Instead, any field
whose name starts with '#' is ignored.
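A minimal sketch of the prefix test, assuming field names are NUL-terminated strings. field_is_commented() is a hypothetical name, and the strcmp() call in the comment only illustrates how an exact comparison can miss commented fields; it is not a quote of the old code.

```c
#include <stdbool.h>

/*
 * A field is a commented-out placeholder when its name begins with '#'.
 * An exact comparison such as strcmp(name, "#") == 0 would only match
 * the bare "#" and miss names like "#vendor".
 */
static bool
field_is_commented(const char *name)
{
	return (name[0] == '#');
}
```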
The function's goal is to compare old/new nhop/nexthop group for the route
and decompose it into the series of RTM_ADD/RTM_DELETE single-nhop
events, calling specified callback for each event.
Simplify it by properly leveraging the fact that both old/new groups
are sorted nhop-# ascending.
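Because both groups are sorted by nexthop index, the decomposition can be done with a single merge-style walk. The sketch below illustrates that idea on plain index arrays; the names decompose_change(), ev_cb, struct ev_count and count_events() are hypothetical, and the real code works on nexthop group structures and routing events rather than integers.

```c
#include <stddef.h>
#include <stdint.h>

enum ev_type { EV_DELETE, EV_ADD };
typedef void (*ev_cb)(enum ev_type type, uint32_t idx, void *arg);

/*
 * Walk two ascending index arrays in lockstep. Indices only in the old
 * group yield a delete event, indices only in the new group yield an
 * add event, and indices present in both yield nothing.
 */
static void
decompose_change(const uint32_t *old_idx, size_t n_old,
    const uint32_t *new_idx, size_t n_new, ev_cb cb, void *arg)
{
	size_t i = 0, j = 0;

	while (i < n_old && j < n_new) {
		if (old_idx[i] == new_idx[j]) {
			i++;
			j++;
		} else if (old_idx[i] < new_idx[j])
			cb(EV_DELETE, old_idx[i++], arg);
		else
			cb(EV_ADD, new_idx[j++], arg);
	}
	while (i < n_old)
		cb(EV_DELETE, old_idx[i++], arg);
	while (j < n_new)
		cb(EV_ADD, new_idx[j++], arg);
}

/* Example callback that just counts the events it receives. */
struct ev_count {
	int adds;
	int dels;
};

static void
count_events(enum ev_type type, uint32_t idx, void *arg)
{
	struct ev_count *c = arg;

	(void)idx;
	if (type == EV_ADD)
		c->adds++;
	else
		c->dels++;
}
```

The walk visits each entry once, so the cost is linear in the combined size of the two groups.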
routing: actually sort nexthops in nhgs by their index
Nexthops in the nexthop groups need to be deterministically sorted
by some property of theirs to reduce the reporting cost when changing
large nexthop groups.
Fix reporting by actually sorting next hops by their indices (`wn_cmp_idx()`).
As calc_min_mpath_slots_fast() has an assumption that next hops are sorted
using their relative weight in the nexthop groups, it needs to be
addressed as well. The latter sorting is required to quickly determine the
layout of the next hops in the actual forwarding group. For example,
what's the best way to split the traffic between nhops with weights
19,31 and 47 if the maximum nexthop group width is 64?
It is worth mentioning that such sorting is only required during nexthop
group creation and is not used elsewhere. Lastly, normally all nexthops
are of the same weight. With that in mind, (a) use spare 32 bytes inside
`struct weightened_nexthop` to avoid another memory allocation and
(b) use insertion sort to sort the nexthop weights.
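Insertion sort suits this case because input whose weights are all equal is handled in a single pass with no element moves. A minimal sketch follows; struct wnh and sort_by_weight() are hypothetical simplifications and do not reproduce the kernel's data layout.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for a weighted nexthop entry. */
struct wnh {
	uint32_t idx;		/* nexthop index */
	uint32_t weight;	/* relative weight in the group */
};

/*
 * Stable insertion sort by ascending weight. When all weights are equal,
 * the inner loop never runs, so the cost is linear in n.
 */
static void
sort_by_weight(struct wnh *a, size_t n)
{
	for (size_t i = 1; i < n; i++) {
		struct wnh key = a[i];
		size_t j = i;

		while (j > 0 && a[j - 1].weight > key.weight) {
			a[j] = a[j - 1];
			j--;
		}
		a[j] = key;
	}
}
```

With the weights 47, 19 and 31 from the example above, the sorted order is 19, 31, 47.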
Yuri [Mon, 27 Jun 2022 15:48:31 +0000 (09:48 -0600)]
smartpqi: Allocate DMA memory NOWAIT
We're not allowed to wait in this allocation path, so allocate the
memory NOWAIT instead of WAITOK. The code already copes with the
failures that may result, so no additional code is needed.
PR: 263008
Reviewed by: markj, Scott Benesh at Microsemi, imp
Differential Revision: https://reviews.freebsd.org/D35601