In `copy()`, if no digest was requested (which is the common case), we
use `copy_file_range()` to avoid needlessly copying the contents of the
file into user space and back. When `copy_file_range()` returns
successfully (which, again, is the common case), we simply return, and
therefore never get to the point where we call `fsync()` if the `-S`
option was specified. Fix this.
There's no need to copy the path twice to split it into base and dir.
We simply call `basename()` first, then handle the two trivial cases in
which it isn't safe to call `dirname()`.
While here, add an early check that the destination is not an empty
string. This would always fail eventually, so it may as well fail
right away. Also add a test case for this shortcut.
Previously, we would only use a temporary file if explicitly asked to
with the `-S` option, and even then, only if the target file already
existed. This meant that an outside observer looking for the target
file might see a partial file, and might see the file disappear and
then reappear.
With this patch, we always use a temporary file, ensuring atomicity.
The downside is slightly increased disk usage. The upside is never
having to worry about, for instance, cron jobs randomly failing if
they happen to run simultaneously with `make installworld`.
The `-S` option is retained, partly for compatibility, and partly
to control the use of `fsync(2)`, which has a non-negligible cost
(approximately 10% increase in wall time for `make installworld`).
Ka Ho Ng [Fri, 12 Apr 2024 16:57:35 +0000 (16:57 +0000)]
ibcore: Remove the use of NULL_IB_OBJECT
LinuxKPI's XArray implementation accepts NULL as an input as of the
following commit:
- linuxkpi: Accept NULL as a value in linux_xarray (3102ea3b15b6)
Ka Ho Ng [Fri, 12 Apr 2024 16:56:42 +0000 (16:56 +0000)]
uart_snps: Register a device xref for UARTs
This is useful for other drivers to be able to find the UART (such as
the case of UARTs where hardware flow control lines are handled by
another device.)
Alexander Ziaee [Fri, 12 Apr 2024 16:57:54 +0000 (10:57 -0600)]
intro.1: 2024 edition
Modernize intro.1, attempting to preserve style and brevity,
including a paragraph about installing more commands, a FILES
section explaining where the commands are located and why, and
adding section number to HISTORY for clarity.
in6_mapped_sockaddr() and in6_mapped_peeraddr() both define a local
variable named 'inp', but in the non-INET case, this variable is set
and never used, causing a compiler error:
/src/freebsd/src/lf/sys/netinet6/in6_pcb.c:547:16: error:
variable 'inp' set but not used [-Werror,-Wunused-but-set-variable]
547 | struct inpcb *inp;
| ^
/src/freebsd/src/lf/sys/netinet6/in6_pcb.c:573:16: error:
variable 'inp' set but not used [-Werror,-Wunused-but-set-variable]
573 | struct inpcb *inp;
Fix this by guarding all the INET-specific logic, including the variable
definition, behind #ifdef INET.
While here, tweak formatting in in6_mapped_peeraddr() so both functions
are the same.
Andrew Turner [Tue, 12 Mar 2024 17:01:26 +0000 (17:01 +0000)]
sys/ddb: Add hardware breakpoint support to ddb
As with hardware watchpoints add support for hardware breakpoints. The
command is only enabled on architectures that report support for them.
Currently no architectures do, however arm64 will add support in a
future change.
Reviewed by: jhb (earlier version)
Sponsored by: Arm Ltd
Sponsored by: Innovate UK
Differential Revision: https://reviews.freebsd.org/D22191
Andrew Turner [Thu, 21 Mar 2024 14:11:17 +0000 (14:11 +0000)]
ddb: Start to generalise breakpoints
To allow for hardware breakpoints it is useful to reuse the same
management code. Start to generalise the code by moving common data
into a new struct and pas this to internal functions to work with.
Sponsored by: Arm Ltd
Differential Revision: https://reviews.freebsd.org/D44461
Merge commit 55c466da2f2f from llvm-project (by Benjamin Kramer):
[X86][AVX512BF16] Add a few missing insert/extract patterns
These are really the same as the f16 (and i16) instructions, but we need
them for any type that can occur.
Merge commit 2e4e04c59043 from llvm-project (by Phoebe Wang):
[X86][BF16] Do not lower to VCVTNEPS2BF16 without AVX512VL (#86395)
Fixes: #86305
These should fix "fatal error: error in backend: Cannot select: t71:
v32bf16 = insert_subvector t67, t64, Constant:i32<16>" when building the
misc/ncnn port.
The author reported that this patch was needed to avoid
crashes on a fairly busy RISC-V system. The author did not
provide details w.r.t. the crashes. Although I
have not seen any such crash, the patch looks reasonable
and I have not found any regressions when testing it.
Since "rdirplus" is not a default option, the patch is
only needed if you are doing NFS mounts with the "rdirplus"
mount option and seeing crashes related to the name cache.
ds1307: use the correct Microchip part number in enum and device description
During a minor refactoring two years ago (part of 2486b446), the newly
created enum used the wrong part number - MCP7491x instead of MCP7941x. The
device description string got the same transposition of digits.
This change swaps the digits back to what they should be.
fdwrite.c: initialize pointers to NULL and a few other cleanups
1. Both trackbuf and vrfybuf are initialized to
zero (NULL). While it's okay to initialize pointers
to zero, to keep consistency, as they're explicitly
pointers, it's better to just use NULL ((void *)0)
instead of 0 (both are equivalent to the compilers).
2. Call free() for both trackbuf and vrfybuf after
their job has been done.
3. Remove the register keyword. Compilers generally
ignore this keyword (except for very very old compilers
and CPUs).
4. Remove the ctype.h header. It's not being used
anywhere in the file.
John Grafton [Thu, 11 Apr 2024 18:11:18 +0000 (12:11 -0600)]
adduser(8): support creation of ZFS dataset
On systems utilizing ZFS, default to creating a ZFS dataset for a new
user's home directory if the parent directory resides on a ZFS dataset.
Add a flag that disables this behavior if the administrator explicitly
does not want it.
If run during installation from within a chroot, set mountpoint to legacy
after dataset creation and mount directly into the chroot. Then umount
and reset the mountpoint to inherit from parent.
Also support ZFS default encryption on user's home directory.
Hot-unplugging a sound device, such as a USB sound card, whilst being
consumed by an application, results in an infinite loop until either the
application closes the device's file descriptor, or the channel
automatically times out after hw.snd.timeout seconds. In the case of a
detach however, the timeout approach is still not ideal, since we want
all resources to be released immediatelly, without waiting for N seconds
until we can use the bus again.
The timeout mechanism works by calling chn_sleep() in chn_read() and
chn_write() (see pcm/channel.c) in order to send the thread to sleep,
using cv_timedwait_sig(). Since chn_sleep() sets the CHN_F_SLEEPING flag
while waiting for cv_timedwait_sig() to return, we can test this flag in
pcm_unregister() (called during detach) and wakeup the sleeping
thread(s) to immediately kill the channel(s) being consumed.
Sponsored by: The FreeBSD Foundation
MFC after: 2 months
PR: 194727
Reviewed by: dev_submerge.ch, bapt, markj
Differential Revision: https://reviews.freebsd.org/D43545
sound: Get rid of snd_clone and use DEVFS_CDEVPRIV(9)
Currently the snd_clone framework creates device nodes on-demand for
every channel, through the dsp_clone() callback, and is responsible for
routing audio to the appropriate channel(s). This patch gets rid of the
whole snd_clone framework (including any related sysctls) and instead
uses DEVFS_CDEVPRIV(9) to handle device opening, channel allocation and
audio routing. This results in a significant reduction in code size as
well as complexity.
Behavior that is preserved:
- hw.snd.basename_clone.
- Exclusive access of an audio device (i.e VCHANs disabled).
- Multiple processes can read from/write to the device.
- A device can only be opened as many times as the maximum allowed
channel number (see SND_MAXHWCHAN in pcm/sound.h).
- OSSv4 compatibility aliases are preserved.
Behavior changes:
Only one /dev/dspX device node is created (on attach) for each audio
device, as opposed to the current /dev/dspX.Y devices created by
snd_clone. According to the sound(4) man page, devices are not meant to
be opened through /dev/dspX.Y anyway, so it is best if we do not create
device nodes for them in the first place. As a result of this, modify
dsp_oss_audioinfo() to print /dev/dspX in the "ai->devnode", instead of
/dev/dspX.Y.
Sponsored by: The FreeBSD Foundation
MFC after: 2 months
Reviewed by: dev_submerge.ch, bapt, markj
Differential Revision: https://reviews.freebsd.org/D44411
Colin Percival [Thu, 11 Apr 2024 16:24:59 +0000 (09:24 -0700)]
release/Makefile.vm: Support read-only ports tree
Build qemu (if needed) with WRKDIRPREFIX=/tmp/ports DISTDIR=/tmp/distfiles
so that we can have a read-only /usr/ports and don't contaminate it. This
became an issue when I enabled parallel release building, since one image
might be creating its ports.txz file at the same time as we're building
qemu as a prerequisite for building another image.
Mark Johnston [Wed, 10 Apr 2024 14:10:10 +0000 (10:10 -0400)]
arm64/vmm: Define a dummy _start symbol in vmm_hyp_blob.elf
To silence a linker warning about _start being missing. This blob
contains code executed at EL2 and is only meant to be entered via
exception handlers.
Reviewed by: bz, emaste
Fixes: 47e073941f4e ("Import the kernel parts of bhyve/arm64")
Differential Revision: https://reviews.freebsd.org/D44735
Cristian Marussi [Mon, 11 Dec 2023 08:33:01 +0000 (08:33 +0000)]
scmi: Introduce a new SCMI API and port CLK SCMI driver to it
Expose new scmi_buf_get/put API methods to build and send messages;
command request descriptors are now pre-allocated when the SCMI core is
initialized and kept in a free list, instead of being allocated on the
stack of the caller of the SCMI request.
Dynamically allocated descriptors enable the SCMI core to keep around
and track outstanding transactions for as long as needed, outliving the
lifetime of the caller stack: this allows tracking of late or missing
replies and it will be needed when adding support for SCMI transports
that allows for more messages to be inflight concurrently.
Move the existing CLK SCMI driver to the new API.
Reviewed by: andrew
Tested on: Arm Morello Board
Sponsored by: Arm Ltd
Differential Revision: https://reviews.freebsd.org/D43046
scmi: Add SCMI message tracking and centralize tx/rx logic
In order to be able to support also new, more parallel, SCMI transports
that by nature can allow multiple concurrent commands to be in-flight,
pending a reply, we must be able to use the sequence number provided in
the SCMI messages to track the message status, matching commands and
replies while keeping track of timeouts and duplicates.
Add the needed message tracking machinery in the core SCMI stack and
move the residual common tx/rx logic from the specific transports to
the core SCMI stack, while adding one more interface to let the
transports customize ther behaviour.
Reviewed by: andrew
Tested on: Arm Morello Board
Sponsored by: Arm Ltd
Differential Revision: https://reviews.freebsd.org/D43045
scmi: Add new SCMI interfaces for init and message processing
Introduce a couple of new SCMI interface methods to allow centralized
initialization of transport-specific features and a couple of methods
to handle message reception from the SCMI core.
Move SCMI SMT related calls out of the core common SCMI code into the
transport specific layers Mailbox/SMC.
Make SCMI Mailbox/SMC transports use the new interface methods for
initialization and message reception.
Reviewed by: andrew
Tested on: Arm Morello Board
Sponsored by: Arm Ltd
Differential Revision: https://reviews.freebsd.org/D43044
scmi: Protect SCMI/SMT channels from concurrent transmissions
The SCMI/SMT memory areas are used from the agent and the platform as
channels to exchage commands and replies.
Once the platform has completed its processing and a reply is ready to
be read from the agent, the platform will relinquish the channel to the
agent by setting the CHANNEL_FREE bits in the related SMT area.
When this happens, though, the agent has still to effectively read back
the reply message and any other concurrent request happened to have been
issued in the meantime will have been to be hold back until the reply
is processed or risk to be overwritten by the new request.
The base->mtx lock that currently guards the whole scmi_request()
operation is released when sleeping waiting for a reply, so the above
mentioned race can still happen or, in a slightly different scenario,
the concurrent transmission could just fail, finding the channel busy,
after having sneaked through the mutex.
Adding a new mechanism to let the agent explicitly acquire/release the
channel paves the way, in the future, to remove such central commmon
lock in favour of new dedicated per-transport locking mechanisms, since
not all transports will necessarily need the same level of protection.
Add a flag, controlled by the agent, to mark when the channel has an
inflight command transaction still pending to be completed and make the
agent spin on it when queueing multiple concurrent messages on the same
SMT channel.
Reviewed by: andrew
Tested on: Arm Morello Board
Sponsored by: Arm Ltd
Differential Revision: https://reviews.freebsd.org/D43043
When the system is cold, the SCMI stack processes commands in polling
mode with the current polling mechanism being a check of the status
register in the mailbox controller to see if there is any pending
doorbell request.
Anyway, the completion interrupt is optional by the SCMI specification
and a system could have been simply designed without it: for this
reason polling on the mailbox controller status registers is not going
to work in all situations.
Moreover even alternative SCMI transports based on shared memory, like
SMC, will not have at all a mailbox controller to poll for.
On the other side, the associated SCMI Shared Memory Transport defines
dedicated channel flags and status bits that can be used by the agent to
explicitly request a polling-based transaction, even if the completion
interrupt was available, and to check afterwards when the platform has
completed its processing on the outstanding command.
Use SCMI/SMT specific mechanism to process transactions in polling mode.
Reviewed by: andrew
Tested on: Arm Morello Board
Sponsored by: Arm Ltd
Differential Revision: https://reviews.freebsd.org/D43042
Add a few new common public scmi_shmem methods to be used to handle SCMI
shared memory areas from multiple transports; while doing that review
the shared memory accesses to read only the SMT header fields strictly
relevant to the SCMI message processing.
Move all the SCMI shmem related code to the existing scmi_shmem.c file
and add a new dedicated scmi_shmem.h header.
Introduce some commonly needed message header manipulation macros.
Reviewed by: andrew
Tested on: Arm Morello Board
Sponsored by: Arm Ltd
Differential Revision: https://reviews.freebsd.org/D43041
Bojan Novković [Tue, 9 Apr 2024 19:02:12 +0000 (21:02 +0200)]
vm: improve kstack_object pindex calculation to avoid pindex holes
This commit replaces the linear transformation of kernel virtual
addresses to kstack_object pindex values with a non-linear
scheme that circumvents physical memory fragmentation caused by
kernel stack guard pages. The new mapping scheme is used to
effectively "skip" guard pages and assign pindices for
non-guard pages in a contiguous fashion.
The new allocation scheme requires that all default-sized kstack KVAs
come from a separate, specially aligned region of the KVA space.
For this to work, this commited introduces a dedicated per-domain
kstack KVA arena used to allocate kernel stacks of default size.
The behaviour on 32-bit platforms remains unchanged due to a
significatly smaller KVA space.
Aside from fullfilling the requirements imposed by the new scheme, a
separate kstack KVA arena facilitates superpage promotion in the rest
of kernel and causes most kstacks to have guard pages at both ends.
Jessica Clarke [Mon, 19 Feb 2024 22:40:24 +0000 (22:40 +0000)]
bhyve: Fix arm64 PCI I/O range to match FDT
This is supposed to combine with the memory range to make one contiguous
block, as is laid out in the FDT, so make this match what the OS is told
and thus actually configures.
Also drop the confusing leading zero from all three of these constants
that is making these 9 rather than 8 hex digits long (as one would
expect for a 32-bit address).
Jessica Clarke [Sat, 17 Feb 2024 01:44:51 +0000 (01:44 +0000)]
bhyve: Support legacy PCI interrupts on arm64
This allows us to remove various #ifdef hacks and enable building more
PCI devices.
Note that a hole is left in the interrupt mapping for the RTC rather
than having the two core devices straddle the PCIe interrupts. QEMU's
virt machine also takes this approach.
Mark Johnston [Wed, 3 Apr 2024 17:45:06 +0000 (13:45 -0400)]
libvmmapi: Conditionalize compilation of some functions
Hide definitions of several functions that currently don't have
implementatations in the arm64 vmm port. In particular, add a
WITH_VMMAPI_SNAPSHOT preprocessor variable that can be used to enable
compilation of save/restore functions, and conditionalize compilation of
some functions only used by amd64 bhyve. If in the long term they
remain amd64-only, they can move to vmmapi_machdep.c, but for now it's
not clear to me that that's the right thing to do.
Mark Johnston [Wed, 3 Apr 2024 17:44:40 +0000 (13:44 -0400)]
bhyve: Push option parsing down into bhyverun_machdep.c
After a couple of attempts I think this is the cleanest approach despite
the expense of some code duplication. Quite a few of the single-letter
bhyve options are x86-specific.
I think that going forward we should strongly discourage the addition of
new options and instead configure guests using the more general
configuration file syntax.
Mark Johnston [Wed, 3 Apr 2024 17:11:37 +0000 (13:11 -0400)]
bhyve: Add PCI mappings for arm64
- The extended config space and BAR ranges are listed in the FDT.
- Avoid referencing I/O ports in ACPI tables. Currently the arm64 port
does not support ACPI in any case.
Mark Johnston [Wed, 3 Apr 2024 17:11:24 +0000 (13:11 -0400)]
bhyve: Do not compile PCI passthrough support on arm64
Some required kernel functionality is not yet implemented.
For now this means that one cannot specify host PCI register values, but
that functionality is only used by amd64-specific device models for now.
Note that this limitation is rather artificial; it arises only because
pci_host_read_config() lives in pci_passthru.c.
Mark Johnston [Wed, 3 Apr 2024 17:09:32 +0000 (13:09 -0400)]
libvmmapi: Make vm_raise_msi() a common function
Currently, bhyve PCI emulation uses vm_lapic_msi() to raise an MSI in
the guest. The arm64 port has a similar function, vm_raise_msi().
Add vm_raise_msi() on amd64 as well and have it simply call
vm_lapic_msi() so that bhyve can use a common, generically named
function.
Mark Johnston [Wed, 3 Apr 2024 17:01:31 +0000 (13:01 -0400)]
libvmmapi: Make memory segment handling a bit more abstract
libvmmapi leaves a hole at [3GB, 4GB) in the guest physical address
space. This hole is not used in the arm64 port, which maps everything
above 4GB. This change makes the code a bit more general to accomodate
arm64 more naturally. In particular:
- Remove vm_set_lowmem_limit(): it is unused and doesn't have
well-defined constraints, e.g., nothing prevents a consumer from
setting a lowmem limit above the highmem base.
- Define a constant for the highmem base and use that everywhere that
the base is currently hard-coded.
- Make the lowmem limit a compile-time constant instead of a vmctx field.
- Store segment info in an array.
- Add vm_get_highmem_base(), for use in bhyve since the current value is
hard-coded in some places.
Mark Johnston [Wed, 3 Apr 2024 16:56:22 +0000 (12:56 -0400)]
libvmmapi: Move PCI passthrough ioctl wrappers into a separate file
The arm64 port doesn't implement PCI passthrough and in particular
doesn't define the ioctls used by these wrappers. It might be that the
ppt ioctl interface will require modification to support arm64. Until
that's sorted out one way or another, put this code in a separate file
so that it's easy to conditionally compile.
Mark Johnston [Wed, 3 Apr 2024 16:55:54 +0000 (12:55 -0400)]
libvmmapi: Split the ioctl list into MI and MD lists
To enable use in capability mode, libvmmapi needs a list of all the
ioctls that might be invoked on the vmm device handle. Some of these
ioctls are amd64-specific. Move the ioctl list to vmmapi_machdep.c and
define a list of MI ioctls so that the arm64 port can build its own list
without duplicating common ioctls. No functional change intended.
Mark Johnston [Wed, 3 Apr 2024 16:52:25 +0000 (12:52 -0400)]
libvmmapi: Move some ioctl wrappers to vmmapi_machdep.c
ioctls relating to segments and various x86-specific interrupt
controllers are easy candidates to move to vmmapi_machdep.c.
In vmmapi.h I'm just ifdefing MD prototypes for now. We could instead
split vmmapi.h into multiple headers, e.g., vmmapi.h and
vmmapi_machdep.h, but it's not obvious to me yet that that's the right
approach.
Mark Johnston [Wed, 3 Apr 2024 16:50:21 +0000 (12:50 -0400)]
bhyve: Add FDT building code for arm64
fdt.c provides some basic routines which let platform initialization
code build the FDT that gets passed into the guest. For now this is not
very generic; we declare info about CPUs, memory, a single UART
(specified by -o console), a PCIe controller (used for virtio devices),
an interrupt controller and the platform timer.
Co-authored-by: andrew
Reviewed by: corvink, jhb
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D40996
Mark Johnston [Wed, 3 Apr 2024 16:48:45 +0000 (12:48 -0400)]
bhyve: Provide optional libfdt linking
The arm64 port currently does not support ACPI, it instead builds up an
FDT which is exported to the guest. This mechanism will not be used on
amd64 but isn't really arm64-specific either, so provide an opt-in
mechanism to link libfdt.
sys_procctl(): Make it clear that negative commands are invalid
An initial reading of the preamble of sys_procctl() gives the impression
that no test prevents a malicious user from passing a negative commands
index (in 'uap->com'), which is soon used as an index into the static
array procctl_cmds_info[].
However, a closer examination leads to the conclusion that the existing
code is technically correct. Indeed, the comparison of 'uap->com' to
the nitems() expression, which expands to a ratio of sizeof(), leads to
a conversion of 'uap->com' to an 'unsigned int' as per Usual Arithmetic
Conversions/Integer Promotions applied by '<=', because sizeof() returns
'size_t' values, and we define 'size_t' as an equivalent of 'unsigned
int' (which is not mandated by the standard, the latter allowing, e.g.,
integers of lower ranks).
With this conversion, negative values of 'uap->com' are automatically
ruled-out since they are converted to very big unsigned integers which
are caught by the test. An analysis of assembly code produced by LLVM
16 on amd64 and practical tests confirm that no exploitation is possible.
However, the guard code as written is misleading to readers and might
trip up static analysis tools. Make sure that negative values are
explicitly excluded so that it is immediately clear that EINVAL will be
returned in this case.
Build tested with clang 16 and GCC 12.
Approved by: markj (mentor)
MFC after: 1 week
Sponsored by: The FreeBSD Foundation