Michael Osipov [Fri, 24 Nov 2023 09:26:41 +0000 (10:26 +0100)]
periodic: Make security diff(1) output as small is possible
Make, by default, security diff(1) produce a unified output with a context of
zero (0) lines. This reduces output of unrelated lines in e-mails delivered
to root.
Xin LI [Sat, 27 Jan 2024 03:09:39 +0000 (19:09 -0800)]
releng-gce: Advertise the availability of UEFI support in GCE images.
The amd64 and arm64 images supported UEFI, mark it as so users can take
advantage of UEFI boot on GCE. This is already done on FreeBSD
14.0-RELEASE but never codified into the release tools (and should).
Aaron LI [Mon, 22 Jan 2024 16:18:56 +0000 (10:18 -0600)]
wg: detach bpf upon destroy as well
bpfattach() is called in wg_clone_create(), but the bpfdetach() is
missing from wg_close_destroy(). Add the missing bpfdetach() to avoid
leaking both the associated bpf bits as well as the ifnet that bpf will
hold a reference to.
Aaron LI [Wed, 17 Jan 2024 23:29:23 +0000 (23:29 +0000)]
if_wg: fix access to noise_local->l_has_identity and l_private
These members are protected by the identity lock, so rlock it in
noise_remote_alloc() and then assert that we have it held to some extent
in noise_precompute_ss().
Aaron LI [Wed, 17 Jan 2024 23:29:23 +0000 (23:29 +0000)]
if_wg: fix erroneous calculation in calculate_padding() for p_mtu == 0
In practice this is harmless; only keepalive packets may realistically have
p_mtu == 0, and they'll also have no payload so the math works out the same
either way. Still, let's prefer technical accuracy and calculate the amount
of padding needed rather than the padded length...
Mark Johnston [Mon, 15 Jan 2024 17:29:02 +0000 (12:29 -0500)]
condvar: Fix a user-after-free in _cv_wait() when ktrace is enabled
When a thread wakes up after sleeping on a CV, it must not dereference
the CV structure, as it may already have been freed. At least ZFS
relies on this invariant, see commit c636f94bd2ff15be5b904939872b4bce31456c18 for example.
Thus, when logging context-switch events, copy the wmesg into a stack
buffer while it is still safe to do so, and log that after waking up.
While here, move the initial ktrcsw() call later, after assertions and
the SCHEDULER_STOPPED_TD() condition are checked.
Mark Johnston [Mon, 15 Jan 2024 17:27:11 +0000 (12:27 -0500)]
condvar: Clean up condvar.h a bit
- Remove a typedef that has been unused for a long time.
- Remove a LOCORE guard. MI headers like condvar.h don't need such a
guard in general.
- Move a forward declaration into the _KERNEL block.
- Add a types.h include to make the file self-contained.
Cy Schubert [Sat, 20 Jan 2024 13:52:35 +0000 (05:52 -0800)]
rc.d/kdc: Support start of MIT krb5kdc
Some users wishing to use the MIT krb5kdc have discovered the
kdc script workaround applied to the MIT krb5 ports is insufficient.
Let's build into this rc script the smarts to determine whether
base or ports Hiemdal kdc is being invoked or the MIT krb5kdc.
While at it, remove kdc_start_precmd(). This will simplify a future
jail patch.
Chuck Tuffli [Thu, 12 Oct 2023 22:04:17 +0000 (15:04 -0700)]
bhyve nvme: Add NQN value
Add a NVMe Qualified Name (NQN) to the Controller Data structure using
the "first format" (i.e., "... used by any organization that owns a
domain name" Section 7.9 NVM-Express 1.4c 2021.06.28 Ratified).
This avoids a Linux kernel warning about a missing or invalid NQN.
Olivier Certner [Thu, 4 Jan 2024 17:45:52 +0000 (18:45 +0100)]
pthread_attr_get_np(): Use malloc(), report ENOMEM, don't tamper on error
Similarly as in the previous commit, using calloc() instead of malloc()
is useless here in the regular case since the subsequent call to
cpuset_getaffinify() is going to completely fill the allocated memory.
However, there is an additional complication. This function tries to
allocate memory to hold the cpuset if it previously wasn't, and does so
before the thread lock is acquired, which can fail on a bad thread ID.
In this case, it is necessary to deallocate the memory allocated in this
function so that the attributes object appears unmodified to the caller
when an error is returned. Without this, a subsequent call to
pthread_attr_getaffinity_np() would expose uninitialized memory (not
a security problem per se, since it comes from the same process) instead
of returning a full mask as it would before the failing call to
pthread_attr_get_np(). So the caller would be able to notice a change
in the state of the attributes object even if pthread_attr_get_np()
reported failure, which would be quite surprising. A similar problem
that could occur on failure of cpuset_setaffinity() has been fixed.
Finally, we shall always report memory allocation failure. This already
goes for pthread_attr_init(), so, if for nothing else, just be
consistent.
Reviewed by: emaste, kib
Approved by: emaste (mentor)
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D43329
Olivier Certner [Thu, 4 Jan 2024 15:10:40 +0000 (16:10 +0100)]
libthr: thr_attr.c: More style and clarity fixes
The change of argument for sizeof() (from a type to an object) is to be
consistent with the change done for the malloc() code just above in the
preceding commit touching this file.
Consider bit flags as integers and test whether they are set with an
explicit comparison with 0.
Use an explicit flag value (PTHREAD_SCOPE_SYSTEM) in place of a variable
that has this value at point of substitution.
Mark Johnston [Wed, 2 Nov 2022 17:08:07 +0000 (13:08 -0400)]
inpcb: Allow SO_REUSEPORT_LB to be used in jails
Currently SO_REUSEPORT_LB silently does nothing when set by a jailed
process. It is trivial to support this option in VNET jails, but it's
also useful in traditional jails.
This patch enables LB groups in jails with the following semantics:
- all PCBs in a group must belong to the same jail,
- PCB lookup prefers jailed groups to non-jailed groups
This is a straightforward extension of the semantics used for individual
listening sockets. One pre-existing quirk of the lbgroup implementation
is that non-jailed lbgroups are searched before jailed listening
sockets; that is preserved with this change.
Mark Johnston [Wed, 2 Nov 2022 17:03:41 +0000 (13:03 -0400)]
inpcb: Remove NULL checks of credential references
Some auditing of the code shows that "cred" is never non-NULL in these
functions, either because all callers pass a non-NULL reference or
because they unconditionally dereference "cred". So, let's simplify the
code a bit and remove NULL checks. No functional change intended.
Kyle Evans [Fri, 12 Jan 2024 19:57:53 +0000 (13:57 -0600)]
bhyveload(8): document some SECURITY CONSIDERATIONS
The situation is improved now that we're running in a sandbox, but there
is still some host machine access that could be concerning depending on
the context. These concerns may be somewhat mitigated by the fact that
the host machine usually provides the loader binary, even when the guest
image is providing the loader scripts -- they only bring the lua
scripts, and they have to be able to execute arbitrary syscalls rather
than the interfaces provided by libsa(3).
Kyle Evans [Tue, 9 Jan 2024 03:08:16 +0000 (21:08 -0600)]
bhyveload: add CAP_SEEK to our dirfd rights
In the case of hostbase_fd, this is infact a bug fix; we have a seek
callback that the host: filesystem may use in loader, and we really
don't have a good excuse to break it.
bootfd-derived fds will only be used with fdlopen(3) and rtld doesn't
seem to need pread / lseek at all for it today, but there's no reason to
break if it finds a good reason to later.
Kyle Evans [Mon, 8 Jan 2024 17:49:40 +0000 (11:49 -0600)]
bhyveload: make error printing consistent
Previously we used a mix of perror(3) + exit(3) and err(3); standardize
on the latter instead. This does remove one free() in an error path,
because we're decidedly leaking a lot more than just the loader name
there (loader handle, vcpu, vmctx...) anyways.
Kyle Evans [Fri, 5 Jan 2024 06:21:15 +0000 (00:21 -0600)]
bhyveload: support guest rebooting from the loader
userboot has a EXIT_REBOOT code that it uses when the 'reboot' loader
command is executed. Use that and longjmp back to reinit the VM
entirely with a reboot request. This fixes the 'reboot' option in the
loader menu to actually reboot rather than shutdown the VM.
The JMP_* constants are introduced to keep track of why we're doing a
longjmp, though they aren't currently used. We'll notably still do a
complete reload of the interpreter to give the rebooted VM that new
loader smell. It just seemed forward thinking to just keep track of the
different setjmp points.
While we're here, we don't actually need to keep the fd we passed to
fdlopen(3), so let's avoid leaking it.
Kyle Evans [Fri, 5 Jan 2024 06:21:14 +0000 (00:21 -0600)]
bhyveload: limit rights on the dirfds we create
In neither case do we need write access to the directories we're working
with; userboot doesn't support fo_write on the host device, and the
bootfd is only ever needed for loader loading.
This improves on 8bf0882e18 ("bhyveload: enter capability mode [...]")
so that arbitrary code in the loader can't open writable fds to either
of the directories we need to maintain access to.
Kyle Evans [Wed, 3 Jan 2024 22:17:59 +0000 (16:17 -0600)]
bhyveload: hold /boot and do relative lookups for the loader
The next change will push bhyveload into capability mode right after we
allocate vcpu state, before we've setup or entered the loader, to limit
the surface area that a rogue loader script can touch.
With an explicit -l loader, we don't need to preopen /boot because
changing interpreters isn't allowed. We'll just dlopen() entirely in
advance in that case to eliminate some complexity.
Mike Karels [Mon, 15 Jan 2024 21:14:54 +0000 (15:14 -0600)]
route: error on IPv4 network routes with incorrect destination
Route destinations like 10/8 are most likely intended as a shorthand
for 10.0.0.0/8, but instead it means 0.0.0.10/8, which includes
only bits in the host part of the mask, and hence adds a route to
0.0.0.0/8. In 12.x, there was code to "do what I mean", which was
removed as part of a cleanup of old network class remnants. Given
that we have gone this long without that code, do not restore that
behavior. Instead, detect the issue and produce an error.
Specifically, if there are no dots in a numeric IPv4 address, the
mask is specified with CIDR notation (using a slash), and there are
bits set in the host part, produce an error like this for 10/8:
route: malformed address, bits set after mask; 10 means 0.0.0.10
vm.subr's default vm_extra_pre_umount removes /qemu and
/etc/resolv.conf. When vm_extra_pre_umount is overridden these steps
need to be performed in the cloud-specific conf file.
PR: 271602
Reviewed by: dch, lwhsu
Event: Kitchener-Waterloo Hackathon 202305
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D40257
Lexi Winter [Wed, 27 Dec 2023 17:30:31 +0000 (17:30 +0000)]
nfsstat: update option strings in docs
Add the missing -q option to the nfsstat(1) manpage SYNOPSIS (it is
already documented in DESCRIPTION), and add the missing -E and -q
options to the built-in usage output.
Alan Somers [Mon, 9 Oct 2023 18:26:25 +0000 (12:26 -0600)]
Fix multiple bugs with ctld's UCL parsing
* Don't segfault when parsing a misformatted auth-group section
* If the config file specifies a chap section within a target but no
auth-group, create a new anonymous auth-group. That matches the
behavior with non-UCL config files.
* Protect some potential segfaults with assertions
Alan Somers [Wed, 15 Nov 2023 17:56:05 +0000 (10:56 -0700)]
Remove _POSIX_PRIORITIZED_IO references from man pages
We don't support it, so there's no need to tell readers what would
happen if we did. Also, don't remind the user that a certain field is
ignored by aio_read. Mentioning every ignored field would make the man
pages too verbose.
Alan Somers [Wed, 12 Jul 2023 20:46:27 +0000 (14:46 -0600)]
zfsd: fault disks that generate too many I/O delay events
If ZFS reports that a disk had at least 8 I/O operations over 60s that
were each delayed by at least 30s (implying a queue depth > 4 or I/O
aggregation, obviously), fault that disk. Disks that respond this
slowly can degrade the entire system's performance.
Alexander Motin [Wed, 27 Dec 2023 00:36:34 +0000 (19:36 -0500)]
iichid(4): Switch taskqueue to "fast"
While "fast" taskqueue may be more expensive due to spinlock use,
when used mainly for timeout tasks it allows to avoid extra context
switches to and from callout thread, that is even more expensive.
Alexander Motin [Wed, 27 Dec 2023 00:28:56 +0000 (19:28 -0500)]
iichid(4): Unify two taskqueue tasks
taskqueue_enqueue_timeout(0) is equivalent to taskqueue_enqueue(),
so no need to create a separate periodic_task and event_task to run
exactly the same handler.
Alexander Motin [Sun, 24 Dec 2023 00:10:49 +0000 (19:10 -0500)]
iichid(4): Restore/increase sampling rate
My previous commit by reducing precision reduced the sampling rate
from 60Hz to 40Hz on idle system. Return it back to 60-80Hz range,
that should be good for mouse smoothness on 60Hz displays.
Alexander Motin [Sat, 23 Dec 2023 03:50:52 +0000 (22:50 -0500)]
iichid(4): Improve idle sampling hysteresis
In sampling mode some devices return same data indefinitely even if
there is nothing to report. Previous idle hysteresis implementation
activated only when device returned no data, so some devices ended up
polled at fast rate all the time. This new implementation compares
each new report with the previous, and, if they are identical, after
reaching threshold also drop sampling rate to slow.
On my Dell XPS 13 9310 with iichid(4) touchscreen and touchpad this
reduces idle power consumption by ~0.5W by reducing number of context
switches in the driver from ~4000 to ~700 per second when not touched.
Alexander Motin [Sun, 24 Dec 2023 23:18:11 +0000 (18:18 -0500)]
ig4: Actively use FIFO thresholds
Before every wait for FIFO interrupt set how much data/space do we
want to see there. Previous code was not using it for receive, as
result aggregating interrupts only within processing latency. The
new code needs only one interrupt per transfer per FIFO length.
On my Dell XPS 13 9310 with iichid(4) touchscreen and touchpad this
reduces the interrupt rate per device down to 2 per sample or 16-20
per second when idle and 120-160 per second when actively touched.
Alexander Motin [Sun, 24 Dec 2023 00:02:49 +0000 (19:02 -0500)]
ig4: Fix FIFO depths detection
At least on my Tiger Lake-LP queue depth detection failed before the
ig4iic_set_config() call, resulting in no FIFO use. Moving it after
solves the problem, getting proper 64 bytes size.
On my Dell XPS 13 9310 with iichid(4) touchscreen and touchpad this
by few times reduces context switch rate in the driver, and probably
also improves the I2C bus utilization.
Joerg Pulz [Fri, 27 Oct 2023 15:27:37 +0000 (17:27 +0200)]
isp(4): Rework firmware handling/loading
Correctly identify the active firmware in flash on adapters with
primary and secondary firmware region in flash.
Correctly identify the active NVRAM on adapters with primary
and secondary NVRAM region in flash.
Loading ispfw(4) moved from isp_pci_attach() to isp_reset().
Drop the reference to ispfw(4) after using it so one can kldunload(8) it.
New isp_load_ram() function to load either ispfw(4) or flash firmware
into RISC's RAM.
New functions to read data from flash. The old ones will be removed later.
A bunch of new helper functions to identify and validate active flash
regions for firmware, auxiliary and NVRAM.
Overhaul ISP_FW_* macros and make use of it when comparing firmware
versions. We can handle firmware versions up to 255.255.255.
Firmware load priority slightly changed:
For 27xx and newer adapters:
- load ispfw(4) firmware
- request (active) flash firmware information
- compare version numbers of ispfw(4) and flash firmware
- load firmware with highest version into RISC's RAM
- if loading ispfw(4) is disabled or failed - load firmware from flash
- if everything else fails use MBOX_LOAD_FLASH_FIRMWARE as fallback
For 26xx and older adapters nothing changed:
- load ispfw(4) firmware and load it into RISC's RAM
- if loading ispfw(4) is disabled or failed use MBOX_EXEC_FIRMWARE
- for 26xx a preceding MBOX_LOAD_FLASH_FIRMWARE is used
New read only sysctl(8)'s:
dev.isp.N.fw_version_run: the firmware version actually running
dev.isp.N.fw_version_ispfw: the firmware version provided by ispfw(4)
dev.isp.N.fw_version_flash: the (active) firmware version in flash
While here:
- firmware attribute handling/parsing reworked
+ renamed defines from ISP2400_FW_ATTR_* to ISP_FW_ATTR_*
+ changed values to match new handling/parsing
+ added some more attributes
- enable FLT support on 26xx based adapters
- log level adjustments
- new function return status codes (some for now, some for later use)
- some minor style changes
Tested and approved to work on real hardware with:
- Qlogic ISP 2532 (QLogic QLE2560 8Gb FC Adapter)
- Qlogic ISP 2031 (QLogic QLE2662 16Gbit 2Port FC Adapter)
- Qlogic ISP 2722 (QLogic QLE2690 16Gb FC Adapter)
- Qlogic ISP 2812 (QLogic QLE2772 32Gbit 2Port FC Adapter)
PR: 273263
Reviewed by: mav
Pull Request: https://github.com/freebsd/freebsd-src/pull/877
MFC after: 1 month
Sponsored by: Technical University of Munich
The ISP26xx based HBAs are left as is for now with static NVRAM addressing.
Those HBAs are known as 83xx (2031 and 8031 for real) and need special handling.
This is left for further investigation for now.
Cosmetics:
- rename functions and defines as they are no longer specific to 28xx
- set reasonable log levels
- sort FLT and NVRAM functions (in the order they are used)
Tested and approved to work on real hardware with:
- Qlogic ISP 2532 (QLogic QLE2562 8Gb 2Port FC Adapter)
- Qlogic ISP 2722 (QLogic QLE2690 16Gb FC Adapter)
- Qlogic ISP 2812 (QLogic QLE2772 32Gbit 2Port FC Adapter)
PR: 271062
Reviewed by: imp, mav
Sponsored by: Technical University of Munich
Pull Request: https://github.com/freebsd/freebsd-src/pull/726
isp(4): Add support to read contents of the FLT (flash layout table)
The FLT is like a TOC for the flash area and contains entries for every flash
region with start/end address, size and flags.
Start using NVRAM addresses from FLT instead of hardcoded ones for ISP28xx
based HBAs.
The FLT should be available on earlier HBAs too, probably since ISP24xx based.
This needs further investigation and testing.
PR: 271062
Reviewed by: imp, mav
Sponsored by: Technical University of Munich
Pull Request: https://github.com/freebsd/freebsd-src/pull/726
This covers the following HBAs:
ISP2812-based 64/32G Fibre Channel to PCIe Controller:
QLE2770 Single Port 32GFC PCIe Gen4 x8 Adapter
QLE2772 Dual Port 32GFC PCIe Gen4 x8 Adapter
QLE2870 Single Port 64GFC PCIe Gen4 x8 Adapter
QLE2872 Dual Port 64GFC PCIe Gen4 x8 Adapter
ISP2814-based 64/32G Fibre Channel to PCIe Controller:
QLE2774 Quad Port 32GFC PCIe Gen4 x16 Adapter
QLE2874 Quad Port 64GFC PCIe Gen4 x16 Adapter
While here, add required bits to support 64GB FC.
Default framesize is set to 2048 for ISP28xx based HBAs for now.
PR: 271062
Reviewed by: imp, mav
Sponsored by: Technical University of Munich
Pull Request: https://github.com/freebsd/freebsd-src/pull/726
Alexander Motin [Tue, 26 Dec 2023 02:19:28 +0000 (21:19 -0500)]
acpi_cpu: Reduce BUS_MASTER_RLD manipulations
Instead of setting and clearing BUS_MASTER_RLD register on every C3
state enter/exit, set it only once if the system supports C3 state
and we are going to "disable" bus master arbitration while in it.
This is what Linux does for the past 14 years, and for even more time
this register is not implemented in a relevant hardware. Same time
since this is only a single bit in a bigger register, ACPI has to
do take a global lock and do read-modify-write for it, that is too
expensive, saved only by C3 not entered frequently, but enough to be
seen in idle system CPU profiles.