When checking for the destructor pointer belonging to some still
loaded dso, do not limit the possible dso to the one instantiated the
destructor. For instance, dso could set up the dtr pointer to a function
from libcxx.
* VERSION (_MAKE_VERSION): 20240430
Merge with NetBSD make, pick up
o main.c: ensure '.include <makefile>' respects MAKESYSPATH.
Dir_FindFile will search .CURDIR first unless ".DOTLAST" is seen.
2024-04-28 Simon J Gerraty <sjg@beast.crufty.net>
* VERSION (_MAKE_VERSION): 20240428
Merge with NetBSD make, pick up
o simplify freeing of lists
o arch.c: trim pointless comments
o var.c: delay variable assignments until actually needed
don't reallocate memory after evaluating an expression, result is
almost always short-lived.
2024-04-26 Simon J Gerraty <sjg@beast.crufty.net>
* VERSION (_MAKE_VERSION): 20240426
Merge with NetBSD make, pick up
o job.c: in debug output, print the directory in which a job
failed at same time as failed target so it is more easily found in
build log.
2024-04-24 Simon J Gerraty <sjg@beast.crufty.net>
* VERSION (_MAKE_VERSION): 20240424
Merge with NetBSD make, pick up
o clean up comments, code and tests
2024-04-23 Simon J Gerraty <sjg@beast.crufty.net>
* VERSION (_MAKE_VERSION): 20240422
Merge with NetBSD make, pick up
o var.c: avoid LazyBuf for :*time modifiers.
LazyBuf's are not nul terminated so not suitable for passing to
functions that expect that. These modifiers are used sparingly so
an extra allocation is not a problem.
2024-04-20 Simon J Gerraty <sjg@beast.crufty.net>
* VERSION (_MAKE_VERSION): 20240420
Merge with NetBSD make, pick up
o provide more context information for parse/evaluate errors
2024-04-14 Simon J Gerraty <sjg@beast.crufty.net>
* VERSION (_MAKE_VERSION): 20240414
Merge with NetBSD make, pick up
o parse.c: print -dp debug info earlier so we see which
.if or .for line is being parsed.
2024-04-04 Simon J Gerraty <sjg@beast.crufty.net>
* VERSION (_MAKE_VERSION): 20240404
Merge with NetBSD make, pick up
o fix some unit tests for Cygwin
o parse.c: exit immediately after reading a null byte from a makefile
* fix generation of bmake.cat1
2024-03-19 Simon J Gerraty <sjg@beast.crufty.net>
* VERSION (_MAKE_VERSION): 20240314
Add/Improve support for Cygwin
o uname -s output isn't useful so allow configure to
set FORCE_MAKE_OS - to force the value of .MAKE.OS
and use Cygwin which matches uname -o
o fix some unit-tests for Cygwin
* configure.in: use_makefile=no for Cygwin et al.
NOTE: bmake does not support Cygwin and likely never will,
* init.mk: allow for _ as well as . to join V
and Q from QUALIFIED_VAR_LIST and VAR_QUALIFIER_LIST.
* progs.mk: avoid overlap between PROG_VARS and
init.mk's QUALIFIED_VAR_LIST since PROG would also
match its VAR_QUALIFIER_LIST,
libs.mk does not have the same issue.
* subdir.mk: _SUBDIRUSE for realinstall should run install
remove include of ${.CURDIR}/Makefile.inc that can be done via
local.subdir.mk where needed
Justin Hibbits [Mon, 13 Nov 2023 16:33:44 +0000 (11:33 -0500)]
tpm: Refactor TIS and add a SPI attachment
Summary:
Though mostly used in x86 devices, TPM can be used on others, with a
direct SPI attachment. Refactor the TPM 2.0 driver set to use an
attachment interface, and implement a SPI bus interface.
Test Plan:
Tested on a Raspberry Pi 4, with a GeeekPi TPM2.0 module (SLB9670
TPM) using security/tpm2-tools tpm2_getcaps for very light testing against the
spibus attachment.
Notable upstream pull request merges:
#15839 c3f2f1aa2 vdev probe to slow disk can stall mmp write checker
#15888 5044c4e3f Fast Dedup: ZAP Shrinking
#15996 db499e68f Overflowing refreservation is bad
#16118 67d13998b Make more taskq parameters writable
#16128 21bc066ec Fix updating the zvol_htable when renaming a zvol
#16130 645b83307 Improve write issue taskqs utilization
#16131 8fd3a5d02 Slightly improve dnode hash
#16134 a6edc0adb zio: try to execute TYPE_NULL ZIOs on the current task
#16141 b28461b7c Fix arcstats for FreeBSD after zfetch support
Warner Losh [Fri, 3 May 2024 15:08:03 +0000 (09:08 -0600)]
MINIMAL: Grow minimal to support ata, scsi and nvme
Until the boot loader automatically loads these things (including the
CAM dependency), we need to have them in the minimal kernel since they
are needed to boot. These aren't strictly required to be in the kernel,
since modules work, but are high enough demand items that until we sort
out boot loader automation, I'm adding them here. These devices are also
common in vm environments. The delta is relatively small in size. Once
the boot loader automation arrives, these and a lot of other things can
be trimmed. It's less than ideal, but is a good middle ground for the
moment.
Gleb Smirnoff [Fri, 3 May 2024 14:45:07 +0000 (07:45 -0700)]
tests/sendfile: test operation on unix/stream socket
Although there are already multiple tests in the tests collection
that utilize sendfile(2) support over unix/stream socket, they all
don't exercise the asynchronous part of the operation. This test
framework, however, uses a trick to toggle true async operation and
guarantee that pr_ready method of unix/stream is also tested.
Tijl Coosemans [Fri, 3 May 2024 13:27:29 +0000 (15:27 +0200)]
linuxkpi: Fix set_memory_*
set_memory_* is currently implemented using PHYS_TO_DMAP but not all
architectures have a DMAP. Looking at how this function is used the
given address isn't physical but virtual so the PHYS_TO_DMAP call can
simply be removed.
Also cast numpages before shifting it to avoid overflow.
Shawn Bayern [Fri, 3 May 2024 07:46:18 +0000 (00:46 -0700)]
Tighten boundary check in split(1) to prevent a potential buffer overflow.
Before increasing sufflen, make sure the current name plus two (including
the terminating NUL character and the to-be-added character) does not
exceed the fixed buffer length, and stop immediately if this would occur.
In worst case scenario the code would write an nul character beyond the
boundary, however it would be caught by open(2) and based on the memory
layout, we do not believe this would constitute a security vulnerability.
John Baldwin [Thu, 2 May 2024 23:35:40 +0000 (16:35 -0700)]
nvmfd: A simple userspace daemon for the NVMe over Fabrics controller
This daemon can operate as a purely userspace controller exporting one
or more simulated RAM disks or local block devices as NVMe namespaces
to a remote host. In this case the daemon provides a discovery
controller with a single entry for an I/O controller.
nvmfd can also offload I/O controller queue pairs to the nvmft.ko
in-kernel Fabrics controller when -K is passed. In this mode, nvmfd
still accepts connections and performs initial transport-specific
negotitation in userland. The daemon still provides a userspace-only
discovery controller with a single entry for an I/O controller.
However, queue pairs for the I/O controller are handed off to the CTL
NVMF frontend.
Eventually ctld(8) should be refactored to to provide an abstraction
for the frontend protocol and the discovery and the kernel mode of
this daemon should be merged into ctld(8). At that point this daemon
can be moved to tools/tools/nvmf as a debugging tool (mostly as sample
code for a userspace controller using libnvmf).
John Baldwin [Thu, 2 May 2024 23:35:32 +0000 (16:35 -0700)]
nvmfdd: A simple userspace NVMe over Fabrics host
This program uses libnvmf to connect to a remote Fabrics controller
and perform a single read or write operation. The write command reads
data from stdin to construct one or more NVM Write commands sent to
the remote namespace. The read command uses one or more NVM Read
commands to read blocks from a remote namespace writing the data to
stdout.
John Baldwin [Thu, 2 May 2024 23:34:45 +0000 (16:34 -0700)]
nvmft: The in-kernel NVMe over Fabrics controller
This is the server (target in SCSI terms) for NVMe over Fabrics.
Userland is responsible for accepting a new queue pair and receiving
the initial Connect command before handing the queue pair off via an
ioctl to this CTL frontend.
This frontend exposes CTL LUNs as NVMe namespaces to remote hosts.
Users can ask LUNS to CTL that can be shared via either iSCSI or
NVMeoF.
John Baldwin [Thu, 2 May 2024 23:34:26 +0000 (16:34 -0700)]
ctl: Add NVMF port type and ioctls
- Add CTL_PORT_NVMF as a new port type.
- Define a new CTL_NVMF ioctl for NVMF-specific operations similar to
CTL_ISCSI. This ioctl supports a command to handoff a single
queue pair, a command to enumerate active associations, and a
command to disconnect one or more active associations.
John Baldwin [Thu, 2 May 2024 23:33:50 +0000 (16:33 -0700)]
ctl_backend_ramdisk: Add support for NVMe
One known caveat is that the support for WRITE_UNCORRECTABLE is not
quite correct as reads from LBAs after a WRITE_UNCORRECTABLE will
return zeroes rather than an error. Fixing this would likely require
special handling for PG_ANCHOR for NVMe requests (or adding a new
PG_UNCORRECTABLE).
John Baldwin [Thu, 2 May 2024 23:32:41 +0000 (16:32 -0700)]
ctl: Add helper routines to populate NVMe namespace data IDs for a LUN
These will be used by the backends to populate the unique ID fields
like EUI64 in the NVMe namespace data (CNS == 0) and namespace
identification descriptor list (CNS == 3).
John Baldwin [Thu, 2 May 2024 23:32:09 +0000 (16:32 -0700)]
ctl: Support for NVMe commands
- Add support for queueing and executing NVMe admin and NVM commands
via ctl_run and ctl_queue. This requires fixing a few places that
were SCSI-specific to add NVME logic.
- NVMe has much simpler command ordering requirements than SCSI. In
particular, the HBA is not required to enforce any specific ordering
for requests with overlapping LBAs. The host is required to manage
that ordering. However, fused commands (currently only COMPARE and
WRITE NVM commands can be fused) are required to be executed
atomically.
To support fused commands, make the second half of a fused command
block on the first half, and have commands submitted after a fused
command pair block on the second half.
- Add handlers and command tables for admin and NVM commands that
operate on individual namespaces and will be passed down from an
NVMe over Fabrics controller to a CTL LUN.
John Baldwin [Thu, 2 May 2024 23:31:59 +0000 (16:31 -0700)]
ctl: Add assertions in SCSI-only paths
Assert that only SCSI I/O requests are passed in various places
that assume a SCSI I/O request (that is, places that access fields
in io->scsiio directly).
John Baldwin [Thu, 2 May 2024 23:31:44 +0000 (16:31 -0700)]
ctl: Update some core data paths to be protocol agnostic
- Add wrapper routines for invoking the be_move_done and io_continue
callbacks in SCSI and NVMe I/O requests.
- Use wrapper routines for access to shared fields between SCSI and
NVMe I/O requests.
- ctl_config_write_done is not fully updated since it resubmits SCSI
commands via ctl_scsiio. This will be completed in a subsequent
commit when ctl_nvmeio is added.
John Baldwin [Thu, 2 May 2024 23:30:20 +0000 (16:30 -0700)]
ctl: Avoid an upcast for calling ctl_scsi_path_string
Change the first argument of ctl_scsi_path_string to be the embedded
header structure instead of the union. Currently union ctl_io and
struct ctl_scsiio have the same alignment, but this changes on i386 if
a new union member is added that contains a uint64_t member (such as
an embedded struct nvme_command for NVMeoF). In that case, union
ctl_io requires stronger alignment, so the upcast from struct
ctl_scsiio to union ctl_io in ctl_scsi_sense_sbuf raises an increasing
alignment warning on i386.
Avoid the warning by passing struct ctl_io_hdr as the first argument
to ctl_scsi_path_string instead.
John Baldwin [Thu, 2 May 2024 23:30:10 +0000 (16:30 -0700)]
nvmecontrol: New commands to support Fabrics hosts
- discover: Connects to a remote Discovery controller, fetches its
Discovery Log Page, and enumerates the remote controllers described
in the log page.
The -v option can be used to display the Identify Controller data
structure for the Discovery controller. This is only really useful
for debugging.
- connect: Connects to a remote I/O controller and establishes an
association of an admin queue and a single I/O queue. The
association is handed off to the in-kernel host to create a new
nvmeX device.
- connect-all: Connects to a Discovery controller and attempts to
create an association with each I/O controller enumerated in the
Discovery controller's Discovery Log Page.
- reconnect: Establishes a new association with a remote I/O
controller for an existing nvmeX device. This can be used to
restore access to a remote I/O controller after the loss of a prior
association due to a transport error, controller reboot, etc.
- disconnect: Deletes one or more nvmeX devices after detaching its
namespaces and terminating any active associations. The devices to
delete can be identified by either a nvmeX device name or the NQN of
the remote controller.
- disconnect-all: Deletes all active associations with remote
controllers.
John Baldwin [Thu, 2 May 2024 23:29:37 +0000 (16:29 -0700)]
nvmf: The in-kernel NVMe over Fabrics host
This is the client (initiator in SCSI terms) for NVMe over Fabrics.
Userland is responsible for creating a set of queue pairs and then
handing them off via an ioctl to this driver, e.g. via the 'connect'
command from nvmecontrol(8). An nvmeX new-bus device is created
at the top-level to represent the remote controller similar to PCI
nvmeX devices for PCI-express controllers.
As with nvme(4), namespace devices named /dev/nvmeXnsY are created and
pass through commands can be submitted to either the namespace devices
or the controller device. For example, 'nvmecontrol identify nvmeX'
works for a remote Fabrics controller the same as for a PCI-express
controller.
nvmf exports remote namespaces via nda(4) devices using the new NVMF
CAM transport. nvmf does not support nvd(4), only nda(4).
John Baldwin [Thu, 2 May 2024 23:28:47 +0000 (16:28 -0700)]
nvmf_tcp: Add a TCP transport for NVMe over Fabrics
Structurally this is very similar to the TCP transport for iSCSI
(icl_soft.c). One key difference is that NVMeoF transports use a more
abstract interface working with NVMe commands rather than transport
PDUs. Thus, the data transfer for a given command is managed entirely
in the transport backend.
Similar to icl_soft.c, separate kthreads are used to handle transmit
and receive for each queue pair. On the transmit side, when a capsule
is transmitted by an upper layer, it is placed on a queue for
processing by the transmit thread. The transmit thread converts
command response capsules into suitable TCP PDUs where each PDU is
described by an mbuf chain that is then queued to the backing socket's
send buffer. Command capsules can embed data along with the NVMe
command.
On the receive side, a socket upcall notifies the receive kthread when
more data arrives. Once enough data has arrived for a PDU, the PDU is
handled synchronously in the kthread. PDUs such as R2T or data
related PDUs are handled internally, with callbacks invoked if a data
transfer encounters an error, or once the data transfer has completed.
Received capsule PDUs invoke the upper layer's capsule_received
callback.
struct nvmf_tcp_command_buffer manages a TCP command buffer for data
transfers that do not use in-capsule-data as described in the NVMeoF
spec. Data related PDUs such as R2T, C2H, and H2C are associated with
a command buffer except in the case of the send_controller_data
transport method which simply constructs one or more C2H PDUs from the
caller's mbuf chain.
John Baldwin [Thu, 2 May 2024 23:28:32 +0000 (16:28 -0700)]
nvmf: Add infrastructure kernel module for NVMe over Fabrics
nvmf_transport.ko provides routines for managing NVMeoF queue pairs
and capsules. It provides a glue layer between transports (such as
TCP or RDMA) and an NVMeoF host (initiator) and controller (target).
Unlike the synchronous API exposed to the host and controller by
libnvmf, the kernel's transport layer uses an asynchronous API built
on callbacks. Upper layers provide callbacks on queue pairs that are
invoked for transport errors (error_cb) or anytime a capsule is
received (receive_cb).
Data transfers for a command are usually associated with a callback
that is invoked once a transfer has finished either due to an error
or successful completion.
For an upper layer that is a host, command capsules are allocated and
populated with an NVMe SQE by calling nvmf_allocate_command. A data
buffer (described by a struct memdesc) can be associated with a
command capsule before it is transmitted via nvmf_capsule_append_data.
This function accepts a direction (send vs receive) as well as the
data transfer callback. The host then transmits the command via
nvmf_transmit_capsule. The host must ensure that the data buffer
described by the 'struct memdesc' remains valid until the data
transfer callback is called. The queue pair's receive_cb callback
should match received response capsules up with previously transmitted
commands.
For the controller, incoming commands are received via the queue
pair's receive_cb callback. nvmf_receive_controller_data is used to
retrieve any data from a command (e.g. the data for a WRITE command).
It can be called multiple times to split the data transfer into
smaller sizes. This function accepts an I/O completion callback that
is invoked once the data transfer has completed.
nvmf_send_controller_data is used to send data to a remote host in
response to a command. In this case a callback function is not used
but the status is returned synchronously. Finally, the controller can
allocate a response capsule via nvmf_allocate_response populated with
a supplied CQE and send the response via nvmf_transmit_capsule.
John Baldwin [Thu, 2 May 2024 23:28:16 +0000 (16:28 -0700)]
libnvmf: Add internal library to support NVMe over Fabrics
libnvmf provides APIs for transmitting and receiving Command and
Response capsules along with data associated with NVMe commands.
Capsules are represented by 'struct nvmf_capsule' objects.
Capsules are transmitted and received on queue pairs represented by
'struct nvmf_qpair' objects.
Queue pairs belong to an association represented by a 'struct
nvmf_association' object.
libnvmf provides additional helper APIs to assist with constructing
command capsules for a host, response capsules for a controller,
connecting queue pairs to a remote controller and optionally
offloading connected queues to an in-kernel host, accepting queue pair
connections from remote hosts and optionally offloading connected
queues to an in-kernel controller, constructing controller data
structures for local controllers, etc.
libnvmf also includes an internal transport abstraction as well as an
implementation of a userspace TCP transport.
libnvmf is primarily intended for ease of use and low-traffic use cases
such as establishing connections that are handed off to the kernel.
As such, it uses a simple API built on blocking I/O.
For a host, a consumer first populates an 'struct
nvmf_association_params' with a set of parameters shared by all queue
pairs for a single association such as whether or not to use SQ flow
control and header and data digests and creates a 'struct
nvmf_association' object. The consumer is responsible for
establishing a TCP socket for each queue pair. This socket is
included in the 'struct nvmf_qpair_params' passed to 'nvmf_connect' to
complete transport-specific negotiation, send a Fabrics Connect
command, and wait for the Connect reply. Upon success, a new 'struct
nvmf_qpair' object is returned. This queue pair can then be used to
send and receive capsules. A command capsule is allocated, populated
with an SQE and optional data buffer, and transmitted via
nvmf_host_transmit_command. The consumer can then wait for a reply
via nvmf_host_wait_for_response. The library also provides some
wrapper functions such as nvmf_read_property and nvmf_write_property
which send a command and wait for a response synchronously.
For a controller, a consumer uses a single association for a set of
incoming connections. A consumer can choose to use multiple
associations (e.g. a separate association for connections to a
discovery controller listening on a different port than I/O
controllers). The consumer is responsible for accepting TCP sockets
directly, but once a socket has been accepted it is passed to
nvmf_accept to perform transport-specific negotiation and wait for the
Connect command. Similar to nvmf_connect, nvmf_accept returns a newly
construct nvmf_qpair. However, in contrast to nvmf_connect,
nvmf_accept does not complete the Fabrics negotiation. The consumer
must explicitly send a response capsule before waiting for additional
command capsules to arrive. In particular, in the kernel offload
case, the Connect command and data are provided to the kernel
controller and the Connect response capsule is sent by the kernel once
it is ready to handle the new queue pair.
For userspace controller command handling, the consumer uses
nvmf_controller_receive_capsule to wait for a command capsule.
nvmf_receive_controller_data is used to retrieve any data from a
command (e.g. the data for a WRITE command). It can be called
multiple times to split the data transfer into smaller sizes.
nvmf_send_controller_data is used to send data to a remote host in
response to a command. It also sends a response capsule indicating
success, or an error if an internal error occurs. nvmf_send_response
is used to send a response without associated data. There are also
several convenience wrappers such as nvmf_send_success and
nvmf_send_generic_error.
John Baldwin [Thu, 2 May 2024 23:27:53 +0000 (16:27 -0700)]
nvmft: Add NVMeoF controller routines shared between kernel and userland
This includes functions to validate NVMe Qualified Names, compute an
initial value of the CAP property, validate changes to the CC
property, and populate the Identify Controller data structure for an
I/O controller.
John Baldwin [Thu, 2 May 2024 23:26:16 +0000 (16:26 -0700)]
nvmf_proto.h: NVMe over Fabrics protocol definitions
This is a copy of spdk/include/spdk/nvmf_spec.h as of commit 470e851852bb948334a272c9f8de495020fa082f from Intel's SPDK.
Subsequent commits will modify it to be suitable header for the
kernel, but importing the stock file first makes it easier to see
how the resulting header is derived from the original.
Rob N [Thu, 2 May 2024 22:18:35 +0000 (08:18 +1000)]
vdev_disk: disable flushes if device does not support it
If the underlying device doesn't have a write-back cache, the kernel
will just return a successful response. This doesn't hurt anything, but
it's extra work on the IO taskqs that are unnecessary. So, detect this
when we open the device for the first time.
Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16148
Warner Losh [Thu, 2 May 2024 21:58:55 +0000 (15:58 -0600)]
cam/iosched: Document latency buckets correctly.
Document how latency buckets are actually computed: They are a doubling
from 20us to 10.485s by default, but based at
kern.cam.iosched.bucket_base_us and increase with a ratio of
kern.cam.iosched.bucket_ration / 100 from one to the next.
Warner Losh [Sun, 28 Apr 2024 17:01:24 +0000 (11:01 -0600)]
nvmecontrol: Allow optional /dev/ for device names
nvmecontrol operates on devices. Allow a user to specify the /dev/ if
they want. Any device that starts with / will be treated as if it was a
full path for maximum flexbility.
Mark Johnston [Wed, 1 May 2024 11:57:56 +0000 (07:57 -0400)]
sysctl: Make sysctl_ctx_free() a bit safer
Clear the list before returning so that sysctl_ctx_free() can be called
more than once on the same list without side effects. This simplifies
error handling in drivers; previously, drivers would have to be careful
to call sysctl_ctx_free() at most once to avoid a use-after-free.
While here, use TAILQ_FOREACH_SAFE in the loop which unregisters OIDs.
Mike Karels [Thu, 2 May 2024 15:24:37 +0000 (10:24 -0500)]
in6.h: expose s6_addr* definitions to user level
The only element of of in6_addr that is specified in RFC 3493 or
in POSIX.1-2017 is s6_addr, implemented via a #define to a union
member. However, FreeBSD and other BSD systems have additional
definitions for the other union members, s6_addr{8,16,32} which
are defined for the kernel and loader. Some Linux applications
also use them, and they seem to be allowed by the RFC and POSIX.
Remove the current ifdefs, exposing the additional fields to user
level, and replace with #if __BSD_VISIBLE. Add an explanatory
comment expanding on the previous "nonstandard" comment.
Alexander Motin [Wed, 1 May 2024 18:07:20 +0000 (14:07 -0400)]
Improve write issue taskqs utilization
- Reduce number of allocators on small system down to one per 4
CPU cores, keeping maximum at 4 on 16+ core systems. Small systems
should not have the lock contention multiple allocators supposed
to solve, while having several metaslabs open and modified each
TXG is not free.
- Reduce number of write issue taskqs down to one per 16 CPU
cores and an integer fraction of number of allocators. On mid-
sized systems, where multiple allocators already make sense, too
many write issue taskqs may reduce write speed on single-file
workloads, since single file is handled by only one taskq to
reduce fragmentation. On large systems, that can actually benefit
from many taskq's better IOPS, the bottleneck is less important,
since in worst case there will be at least 16 cores to handle it.
- Distribute dnodes between allocators (and taskqs) in a round-
robin fashion instead of relying on sync taskqs to be balanced.
The last is not guarantied and may depend on scheduling.
- Remove io_wr_iss_tq from struct zio. io_allocator is enough.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #16130
Alexander Motin [Wed, 1 May 2024 17:59:32 +0000 (13:59 -0400)]
Slightly improve dnode hash
As I understand just for being less predictable dnode hash includes
8 bits of objset pointer, starting at 6. But since objset_t is
more than 1KB in size, its allocations are likely aligned to 2KB,
that means 11 lower bits provide no entropy. Just take the 8 bits
starting from 11.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #16131
Rob Norris [Tue, 30 Apr 2024 00:37:29 +0000 (10:37 +1000)]
libspl/assert: use libunwind for backtrace when available
libunwind seems to do a better job of resolving a symbols than
backtrace(), and is also useful on platforms that don't have backtrace()
(eg musl). If it's available, use it.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rob Norris <robn@despairlabs.com> Sponsored-by: https://despairlabs.com/sponsor/
Closes #16140
Rob Norris [Sun, 28 Apr 2024 02:49:58 +0000 (12:49 +1000)]
libspl/assert: add lock around assertion output
If multiple threads trip an assertion at the same moment (quite common),
they can be printing at the same time, and their output gets messy.
This adds a simple lock around the whole thing, to prevent a second task
printing assert output before the first has finished.
Additionally, if libspl_assert_ok is not set, abort() is called without
dropping the lock, so that any other asserting tasks will be killed
before starting any output, rather than only getting part-way through.
This is a tradeoff; it's assumed that multiple threads asserting at the
same moment are likely the same fault in different instances of a
thread, and so there won't be any more useful information from the other
tasks anyway.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rob Norris <robn@despairlabs.com> Sponsored-by: https://despairlabs.com/sponsor/
Closes #16140
Rob Norris [Sun, 21 Apr 2024 11:43:53 +0000 (21:43 +1000)]
libspl/assert: show process/task details in assert output
Makes it much easier to see what thing complained.
Getting thread id, program name and thread name vary wildly between
Linux and FreeBSD, so those are set up in macros. pthread_getname_np()
did not appear in musl until very recently, but the same info has always
been available via prctl(PR_GET_NAME), so we use that instead.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rob Norris <robn@despairlabs.com> Sponsored-by: https://despairlabs.com/sponsor/
Closes #16140
Rob Norris [Tue, 30 Apr 2024 02:35:30 +0000 (12:35 +1000)]
find_system_library: fix var cleanup when library not found
The "not found" path is attempting to clear SOMELIB_CFLAGS and
SOMELIB_LIBS by resetting them in AC_SUBST(). However, the second arg to
AC_SUBST is expanded in autoconf with `m4_ifvaln([$2], [[$1]=$2])`,
which is defined as "if the first arg is non-empty". The m4 "empty"
construction is [], therefore, the existing AC_SUBST calls never modify
the variables at all.
The effect of this is that leftovers from the library test can leak out.
At least, if a library header is found in the first stage, but the
library itself is not, -lsomelib is added to SOMELIB_LIBS and further
tests done. If that library is not found, SOMELIB_LIBS will not be
cleared.
For most of our library tests this hasn't been a problem, as they're
either always found properly via pkg-config or set directly, or the
calling test immediately aborts configure. For an optional dependency
however, an apparent "partial" result where the header is found but no
corresponding library causes link errors later.
I think a complete fix should probably not be setting SOMELIB_xxx until
the final result is known, but for now, adjusting the AC_SUBST calls to
explictly set the empty shell string (which is not "empty" to m4) at
least restores the intent.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rob Norris <robn@despairlabs.com> Sponsored-by: https://despairlabs.com/sponsor/
Closes #16140
In the error path during allocating an in_pcb, the credentials
associated with the new struct get their reference count
increased early on, but not decremented when the allocation
fails.
Reported by: cmiller_netapp.com
MFC after: 3 days
Reviewed by: jhb, tuexen
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D45033
guestrpc module to handle VMware backdoor port GuestRPC functionality
Convert existing FreeBSD vmware_hvcall function to take a channel
and parameter arguments.
Added vmware_guestrpc_cmd() to send GuestRPC commands to the VMware
hypervisor. The sbuf argument is used for both the command to send
and to store the data to return to the caller.
The following KPIs can be used to get and set FreeBSD-specific guest
information in key/value pairs:
* vmware_guestrpc_set_guestinfo
- set a value into the guestinfo.fbsd.<keyword> key
* vmware_guestrpc_get_guestinfo
- get the value stored in the guestinfo.fbsd.<keyword> key
Colin Percival [Wed, 1 May 2024 17:56:51 +0000 (10:56 -0700)]
release: Stage non-UFS images in vm-images-stage
When the VM image building code was updated to support building
non-UFS images, the vm-images-stage target was not updated to
install those newly built images to the FTP site. As a result, we
have been sending weekly snapshot announcements since August claiming
that ZFS VM images are available when they are not in fact present
anywhere publicly accessible.
Fixes: 32ae9a6b3937 "release: Build UFS and ZFS VM images"
Reported by: Michael Dexter
MFC after: 5 days
SHENG-YI HONG [Wed, 1 May 2024 15:09:31 +0000 (15:09 +0000)]
bhyve: Move lock of uart frontend to uart backend
Currently, lock of uart in bhyve is placed in frontend. There are some
problems about it:
1. If every frontend should has a lock, why not move it inside backend
as they all have same uart_softc.
2. If backend needs to modify the information of uart after initialize,
it will be impossible as backend cannot use lock. For example, if we
want implement a telnet support for uart in backend, It should wait
for connection when initialize. After some remote process connect it,
it needs to modify rfd and wfd in backend.
Mark Johnston [Wed, 1 May 2024 12:36:30 +0000 (08:36 -0400)]
vmrun.sh: Add arm64 support
For now, we enumerate disk devices before network devices. This is to
work around a problem wherein u-boot remaps BARs during boot in a way
that bhyve does not handle. Some discussion and experiments suggest
that this can be handled by having bhyve not map BARs during boot on
arm64; until a solution is implemented, however, this workaround is
sufficient for simple usage and doesn't have any real downsides.
The console and bootrom are specified slightly differently versus amd64,
and a few of vmrun.sh's command-line options are amd64-only.
fattime: fix fattime to timespec conversion of dates beyond 2106-02-06
It turns out that the only conversion issue was in fattime2timespec, where
multiplying the number of seconds in a day by the number of days overflowed
32-bit unsigned int for dates beyond 2106-02-07 06:28:15.
Casting one of the multiplicands as time_t forces a 64-bit multiplication on
systems where time_t is 64-bits and produces no binary changes on the one
remaining system with 32-bit time_t (namely i386).
Since the code is now tested & fixed, this change removes the fixme comments.
fattime: make the test code check beyond 32-bit time_t limits
On systems that have a 64-bit time_t, the test code now exercises the whole
range of fattime. To do this, this commit...
1. replaces the call to random() with two calls to arc4random() to
generate a 33-bit number of seconds in order to cover the entire range of
fattime [1970,2107]. (32-bits stops just short - in January 2106.)
On systems with 32-bit time_t, the extra bits are discarded and only the
time_t expressible range is tested.
2. casts time_t values passed to printf as longs and changes the format
string to match.
Now, the test code builds, runs, and exercises what it can (i.e., the whole
fattime range or the 32-bit time_t subset of it) on both 32-bit and 64-bit
time_t systems.
1. replaces calls to timet2fattime/fattime2timet with calls to
timespec2fattime/fattime2timespec. The functions got renamed shortly
after they landed in the kernel but the test code wasn't updated (see 7ea93e912bf0ef).
2. adds a utc_offset stub.
With this, the test code builds and runs as a 32-bit binary (cc -Wall -O2
-m32 subr_fattime.c).
It is the equivalent of tx_chan but for receive so rx_chan is a better
name. Initialize both using helper functions and make sure both are
displayed in the sysctl MIB.
Mark Johnston [Tue, 30 Apr 2024 21:32:38 +0000 (17:32 -0400)]
clang-format: Minor tweaks
Invert KeepEmptyLinesAtTheStartOfBlocks. We used to require an empty
line at the beginning of functions with no local variables, which I
believe is the reason for this setting. Now it is discouraged in new
code.
Tell clang-format to align consecutive macros, since we tend to do that.
clang-format's output isn't quite what we want here. Typically we have
a tab after a #define for some reason, and clang-format doesn't appear
to have an option for that. clang-format will also use a mix of tabs
and spaces to minimize indentation, which is also against our
convention. However, the result looks better with this setting than
without.