John Baldwin [Thu, 24 Mar 2022 18:36:19 +0000 (11:36 -0700)]
x86: Add a NT_X86_SEGBASES register set.
This register set contains the values of the fsbase and gsbase
registers. Note that these registers can already be controlled
individually via ptrace(2) via MD operations, so the main reason for
adding this is to include these register values in core dumps. In
particular, this will enable looking up the value of TLS variables
from core dumps in gdb.
The value of NT_X86_SEGBASES was chosen to match the value of
NT_386_TLS on Linux. The notes serve similar purposes, but FreeBSD
will never dump a note equivalent to NT_386_TLS (which dumps a single
segment descriptor rather than a pair of addresses) and picking a
currently-unused value in the NT_X86_* range could result in a future
conflict.
John Baldwin [Wed, 23 Mar 2022 20:33:06 +0000 (13:33 -0700)]
arm,arm64: Add a NT_ARM_TLS read-only register set.
This register set exposes the per-thread TLS register. It matches the
layout used by Linux on arm64. Linux does not implement this note for
32-bit arm.
Reviewed by: andrew, markj
Sponsored by: University of Cambridge, Google, Inc.
Differential Revision: https://reviews.freebsd.org/D34595
John Baldwin [Wed, 23 Mar 2022 20:33:06 +0000 (13:33 -0700)]
Use a regset for NT_ARM_VFP.
This includes adding support for NT_ARM_VFP for 32-bit binaries
running under aarch64 kernels both for ptrace(), and coredumps via the
kernel and gcore.
Reviewed by: andrew, markj
Sponsored by: University of Cambridge, Google, Inc.
Differential Revision: https://reviews.freebsd.org/D34448
John Baldwin [Thu, 10 Mar 2022 23:40:44 +0000 (15:40 -0800)]
gcore: Use PT_GETREGSET to fetch NT_PRSTATUS and NT_FPREGSET.
Add a elf_putregnote() helper to build the ELF note for a register
set. Once nice result of this approach is that this reuses the
kernel's support for generating 32-bit register sets for 32-bit
processes avoiding the need to duplicate that logic in elf32core.c.
Reviewed by: markj
Sponsored by: University of Cambridge, Google, Inc.
Differential Revision: https://reviews.freebsd.org/D34447
John Baldwin [Thu, 10 Mar 2022 23:40:19 +0000 (15:40 -0800)]
Store core dump notes for all valid register sets for FreeBSD processes.
In particular, use a generic wrapper around struct regset rather than
requiring per-regset helpers. This helper replaces the MI
__elfN(note_prstatus) and __elfN(note_fpregset) helpers. It also
removes the need to explicitly dump NT_ARM_ADDR_MASK in the arm64
__elfN(dump_thread).
Reviewed by: markj, emaste
Sponsored by: University of Cambridge, Google, Inc.
Differential Revision: https://reviews.freebsd.org/D34446
Andrew Turner [Mon, 24 Jan 2022 11:24:17 +0000 (11:24 +0000)]
Add PT_GETREGSET
This adds the PT_GETREGSET and PT_SETREGSET ptrace types. These can be
used to access all the registers from a specified core dump note type.
The NT_PRSTATUS and NT_FPREGSET notes are initially supported. Other
machine-dependant types are expected to be added in the future.
The ptrace addr points to a struct iovec pointing at memory to hold the
registers along with its length. On success the length in the iovec is
updated to tell userspace the actual length the kernel wrote or, if the
base address is NULL, the length the kernel would have written.
Because the data field is an int the arguments are backwards when
compared to the Linux PTRACE_GETREGSET call.
Reviewed by: kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D19831
Add infrastructure required for Linux coredump support
This adds `sv_elf_core_osabi`, `sv_elf_core_abi_vendor`,
and `sv_elf_core_prepare_notes` fields to `struct sysentvec`,
and modifies imgact_elf.c to make use of them instead
of hardcoding FreeBSD-specific values. It also updates all
of the ABI definitions to preserve current behaviour.
This makes it possible to implement non-native ELF coredump
support without unnecessary code duplication. It will be used
for Linux coredumps.
This makes it possible to use core_write(), core_output(),
and sbuf_drain_core_output(), in Linux coredump code. Moving
them out of imgact_elf.c is necessary because of the weird way
it's being built.
Alan Somers [Thu, 28 Apr 2022 21:13:09 +0000 (15:13 -0600)]
fusefs: fix FUSE_CREATE with file handles and fuse protocol < 7.9
Prior to fuse protocol version 7.9, the fuse_entry_out structure had a
smaller size. But fuse_vnop_create did not take that into account when
working with servers that use older protocols. The bug does not matter
for servers which don't use file handles or open flags (the only fields
affected).
PR: 263625
Submitted by: Ali Abdallah <ali.abdallah@suse.com>
Alan Somers [Wed, 21 Apr 2021 22:56:48 +0000 (16:56 -0600)]
ctlstat: add prometheus output
When invoked by inetd, ctlstat -P will now produce output suitable for
ingestion into Prometheus.
It's a drop-in replacement for https://github.com/Gandi/ctld_exporter,
except that it doesn't report the number of initiators per target, and
it does report time and dma_time.
Alan Somers [Mon, 18 Apr 2022 21:29:37 +0000 (15:29 -0600)]
prometheus_sysctl_exporter: fix metric aliasing
When exporting sysctls to Prometheus, the exporter replaces "." with
"_". This caused several metrics to alias, confusing the Prometheus
server. Fix it by:
* Renaming the "tcp_log_bucket" UMA zone to "tcp_log_id_bucket". Also,
rename "tcp_log_node" to "tcp_log_id_node" for consistency.
* Not exporting sysctls with "(LEGACY)" in the description. That is
used by ZFS sysctls that have been replaced by others, many of which
alias to the same Prometheus metric name (like "vfs.zfs.arc_max" and
"vfs.zfs.arc.max").
Alan Somers [Mon, 18 Apr 2022 23:03:53 +0000 (17:03 -0600)]
fusefs: correctly handle servers that report too much data written
During a FUSE_WRITE, the kernel requests the server to write a certain
amount of data, and the server responds with the amount that it actually
did write. It is obviously an error for the server to write more than
it was provided, and we always treated it as such, but there were two
problems:
* If the server responded with a huge amount, greater than INT_MAX, it
would trigger an integer overflow which would cause a panic.
* When extending the file, we wrongly set the file's size before
validing the amount written.
Alan Somers [Fri, 15 Apr 2022 19:04:24 +0000 (13:04 -0600)]
fusefs: validate servers' error values
Formerly fusefs would pass up the stack any error value returned by the
fuse server. However, some values aren't valid for userland, but have
special meanings within the kernel. One of these, EJUSTRETURN, could
cause a kernel page fault if the server returned it in response to
FUSE_LOOKUP. Fix by validating all errors returned by the server.
Also, fix a data lifetime bug in the FUSE_DESTROY test.
John Baldwin [Thu, 10 Mar 2022 23:50:52 +0000 (15:50 -0800)]
cxgbei: Support unmapped I/O requests.
- Add icl_pdu_append_bio and icl_pdu_get_bio methods.
- Add new page pod routines for allocating and writing page pods for
unmapped bio requests. Use these new routines for setting up DDP
for iSCSI tasks with a SCSI I/O CCB which uses CAM_DATA_BIO.
- When ICL_NOCOPY is used to append data from an unmapped I/O request
to a PDU, construct unmapped mbufs from the relevant pages backing
the struct bio. This also requires changes in the t4_push_pdus path
to support unmapped mbufs.
John Baldwin [Thu, 10 Mar 2022 23:50:26 +0000 (15:50 -0800)]
iscsi: Support unmapped I/O requests in the default initiator.
- Add icl_pdu_append_bio and icl_pdu_get_bio methods.
- When ICL_NOCOPY is used to append data from an unmapped I/O request
to a PDU, construct unmapped mbufs from the relevant pages backing
the struct bio.
- Use m_apply with a helper to compute crc32 digests on mbuf chains
to handle unmapped mbufs. Since m_apply requires PMAP_HAS_DMAP
for unmapped mbufs, only support unmapped requests when PMAP_HAS_DMAP
is true.
John Baldwin [Thu, 10 Mar 2022 23:49:53 +0000 (15:49 -0800)]
iscsi: Handle unmapped I/O requests.
Don't assume that csio->data_ptr is pointer to a data buffer that can
be passed to icl_get_pdu_data and icl_append_data. For unmapped I/O
requests, csio->data_ptr is instead a pointer to a struct bio as
indicated by CAM_DATA_BIO. To support these requests, add
icl_pdu_append_bio and icl_pdu_get_bio methods which pass a pointer to
the bio and an offset and length relative to the bio's buffer.
Note that only backends supporting unmapped requests need to implement
these hooks.
Implement simple no-op hooks for the iser backend.
John Baldwin [Thu, 10 Mar 2022 23:48:20 +0000 (15:48 -0800)]
iscsi: Use ICL_NOCOPY for SCSI command immediate data and R2T.
The associated csio ccb will not be completed via xpt_done() until
after the associated PDUs are transmitted to the other side and either
the original PDU is acked with a SCSI response, or a response is
received for a subsequent abort CCB (which means the earlier PDU has
also been sent since it would have been sent before the abort PDU).
This does assume that once an I/O request has been aborted, no further
PDUs with data payload are queued for that I/O request.
John Baldwin [Thu, 10 Mar 2022 23:39:53 +0000 (15:39 -0800)]
libpmcstat: Fix a few ARM-specific issues with function symbols.
- Refine the checks for ARM mapping symbols and apply them on arm64 as
well as 32-bit arm. In particular, mapping symbols can have
additional characters and are not strictly limited to just "$a" but
can append additional characters (e.g. "$a.1"). Add "$x" to the
list of mapping symbol prefixes.
- Clear the LSB of function symbol addresses. Thumb function
addresses set the LSB to enable Thumb mode. However, the actual
function starts at the aligned address with LSB clear. Not clearing
the LSB can cause pmcannotate to pass misaligned addresses to
objdump when extracting disassembly.
Reviewed by: andrew
Obtained from: CheriBSD
Sponsored by: University of Cambridge, Google, Inc.
Differential Revision: https://reviews.freebsd.org/D34416
John Baldwin [Wed, 9 Mar 2022 23:38:58 +0000 (15:38 -0800)]
bhyve: Make the MADT dynamically sized.
Use basl_ncpu instead of VM_MAXCPU in MADT_SIZE. Since several of the
offsets are no longer compile time constants, unroll the loop
generating ACPI tables.
Alexander Motin [Sat, 12 Mar 2022 16:49:37 +0000 (11:49 -0500)]
GEOM: Introduce partial confxml API
Traditionally the GEOM's primary channel of information from kernel to
user-space was confxml, fetched by libgeom through kern.geom.confxml
sysctl. It is convenient and informative, representing full state of
GEOM in a single XML document. But problems start to arise on systems
with hundreds of disks, where the full confxml size reaches many
megabytes, taking significant time to first write it and then parse.
This patch introduces alternative solution, allowing to fetch much
smaller XML document, subset of the full confxml, limited to 64KB and
representing only one specified geom and optionally its parents. It
uses existing GEOM control interface, extended with new "getxml" verb.
In case of any error, such as the buffer overflow, it just transparently
falls back to traditional full confxml. This patch uses the new API in
user-space GEOM tools where it is possible.
John Baldwin [Mon, 14 Feb 2022 19:48:47 +0000 (11:48 -0800)]
Disable -Wreturn-type on GCC.
GCC is more pedantic than clang about warning when a function doesn't
handle undefined enum values (see GCC bug 87950). Clang's warning
gives a more pragmatic coverage and should find any real bugs, so
disable the warning for GCC rather than adding __unreachable
annotations to appease GCC.
John Baldwin [Sat, 12 Feb 2022 00:04:52 +0000 (16:04 -0800)]
Cast pointer to uintptr_t to avoid alignment warnings.
Both struct ip and struct udphdr both have an aligment of 2, but the
cast from struct ip to a uint32_t pointer confused GCC 9 into raising
the required alignment to 4 and then raising a
-Waddress-of-packed-member error when casting to struct udphdr.
John Baldwin [Fri, 11 Feb 2022 23:16:25 +0000 (15:16 -0800)]
ktls: Write-lock the INP when changing a transmit TLS session.
The TCP rate pacing code relies on being able to read this pointer
safely while holding an INP lock. The initial TLS session pointer is
set while holding the write lock already.
John Baldwin [Wed, 2 Feb 2022 20:18:43 +0000 (12:18 -0800)]
stand/efi: Pass --no-dynamic-linker to ld.bfd >= 2.34.
ld.bfd in binutils 2.34+ now reports an error in more cases for custom
ldscripts that do not place PHDRs in a LOAD segment. However, EFI
binaries are not dynamic binaries which need PHDRs, so pass
--no-dynamic-linker to disable this check.
John Baldwin [Tue, 1 Feb 2022 01:33:31 +0000 (17:33 -0800)]
fstyp: Remove __packed from struct exfat_de_label.
This fixes a -Waddress-of-packed-member warning about a possibly
unaligned pointer from GCC 9 when calling convert_label().
__packed has to be removed from struct exfat_dirent as well to fix an
alignment warning when casting from a struct exfat_dirent pointer to a
struct exfat_de_label pointer.
Reviewed by: cem
Differential Revision: https://reviews.freebsd.org/D32144
John Baldwin [Tue, 1 Feb 2022 01:11:27 +0000 (17:11 -0800)]
hyperv storvsc: Don't abuse struct sglist to hold virtual addresses.
struct sglist is intended for holding S/G lists of physical address
ranges, not virtual address ranges. GCC 9.x issues several warnings
due to casts between pointers and integers of different sizes as a
result (vm_paddr_t is 64-bits on i386). Instead, add a local 'struct
hv_sglist' which uses an array of 'struct iovec' to hold the S/G list
of virtual address ranges.
John Baldwin [Sat, 25 Sep 2021 18:24:35 +0000 (11:24 -0700)]
kernel: Disable errors for -Walloca-larger-than for GCC.
GCC complains about the use of alloca() with variable sizes (for XSAVE
state len) in sendsig() for i386. Modern XSAVE state is probably
getting a bit large for the i386 kstack, but downgrade the error to a
warning.
John Baldwin [Wed, 15 Sep 2021 16:03:18 +0000 (09:03 -0700)]
infiniband: Disable -Wredundant-decl warnings.
ib_uverbs_flow_resources_free() is declard in two header files in
upstream OFED. Disable the warning to avoid introducing diffs to fix
the build on GCC 9.
While here, fix the ibcore module to disable the same warnings
disabled in OFED_CFLAGS.
John Baldwin [Wed, 15 Sep 2021 16:03:17 +0000 (09:03 -0700)]
<linux/overflow.h>: Don't use __has_builtin().
GCC only added support for __has_builtin in GCC 10. However, all
supported versions of GCC and clang include these builtins so just use
them unconditionally.
John Baldwin [Thu, 18 Feb 2021 00:32:11 +0000 (16:32 -0800)]
Add a VA_IS_CLEANMAP() macro.
This macro returns true if a provided virtual address is contained
in the kernel's clean submap.
In CHERI kernels, the buffer cache and transient I/O map are allocated
as separate regions. Abstracting this check reduces the diff relative
to FreeBSD. It is perhaps slightly more readable as well.
A different exception is raised when we hit a 32bits breakpoint, rather than
a 64bits one, so handle those as well when COMPAT_FREEBSD32 is defined.
This should fix SIGBUS at least when using breakpoints with thumb2 code.
mlx5en(4): Use hard-coded 4K page size for RQ/SQ/CQ.
The page size specified for RQ, SQ and CQ is always in units of 4KBytes.
Make sure we subtract MLX5_ADAPTER_PAGE_SHIFT, 12, instead of PAGE_SHIFT
which may vary. This fixes support for using the mlx5en driver on systems
having non-4K page size.
Mark Johnston [Mon, 25 Apr 2022 13:13:03 +0000 (09:13 -0400)]
linuxkpi: Mitigate a seqlock livelock
Disable preemption in seqlock write sections when using the _irqsave
variant. This ensures that a writer can't be preempted and subsequently
starved by a reader running in a callout handler on the same CPU.
This fixes occasional display hangs seen when using the i915 driver.
Tested by: emaste, wulf
Reviewed by: wulf, hselasky
Sponsored by: The FreeBSD Foundation
Alex Richardson [Mon, 13 Sep 2021 09:16:05 +0000 (10:16 +0100)]
Add a test for https://reviews.freebsd.org/D31858 (PR 258310)
This test (based on https://github.com/jiixyj/epoll-shim/pull/32#issuecomment-891276654)
fails reproducibly on QEMU with KVM and `-smp 2` prior to D31858 (committed
as 98168a6e6c12dab8f608f6b5f5b0b175d2b87ef0) and passes with the patch applied.
Alex Richardson [Thu, 9 Sep 2021 10:46:53 +0000 (11:46 +0100)]
Export _mmap and __sys_mmap from libc.so
Unlike the other syscalls these two symbols were missing from the
version script. I noticed this while looking into the compiler-rt
runtime libraries for CHERI.
Reviewed by: brooks
Obtained from: https://github.com/CTSRD-CHERI/cheribsd/pull/1063
MFC after: 3 days
Ed Maste [Sat, 30 Apr 2022 19:25:35 +0000 (15:25 -0400)]
Update WITH_PROFILE description as it is yet removed
GCC still wants to link against (for example) libc_p.a when -pg is in
use, and it's unclear when and how this will be addressed. Change the
WITH_PROFILE option description to claim that it may be removed from an
unspecified future version of FreeBSD, rather than FreeBSD 14.
Reported by: Steve Kargl
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Mark Johnston [Thu, 20 Jan 2022 13:23:38 +0000 (08:23 -0500)]
clockcalib: Fix an overflow bug
tc_counter_mask is an unsigned int and in the TSC timecounter is equal
to UINT_MAX, so the addition tc->tc_counter_mask + 1 can overflow to 0,
resulting in a hang during boot.
Fixes: c2705ceaeb09 ("x86: Speed up clock calibration")
Reviewed by: cperciva
Sponsored by: The FreeBSD Foundation
Implement ieee80211_is_data_present() and a subset of
ieee80211_is_bufferable_mmpdu() which hopefully is good enough in
the compat code for now.
This is partly in preparation for some TXQ changes coming up soon.
In ieee80211_beacon_loss() call into net80211::ieee80211_beacon_miss()
rather than manually bouncing our state. That should give us the
ability to send a probereq and see if the AP is till there rather than
right away going to scan.