Ed Maste [Mon, 14 Feb 2022 20:14:02 +0000 (15:14 -0500)]
Cirrus-CI: use qemu-nox11
We use -nographic for the smoke test and there is no need to pull in all
of the x11 deps. This saves some time and bandwidth during package
installation.
When I originally added Cirrus-CI support the -nox11 package was not
available.
Andrew Turner [Mon, 7 Feb 2022 11:47:04 +0000 (11:47 +0000)]
Fix the signal code on 32-bit breakpoints on arm64
When debugging 32-bit programs a debugger may insert a instruction that
will raise the undefined instruction trap. The kernel handles these
by raising a SIGTRAP, however the code was incorrect.
Fix this by using the expected TRAP_BRKPT signal code.
Andrew Turner [Tue, 1 Feb 2022 11:43:13 +0000 (11:43 +0000)]
Add the Arm SPE interrupt to acpidump
To support the Arm Statistical Profiling Extension (SPE) ACPI 6.3 added
a place to hold the SPE interrupt. Add to acpidump to show when printing
the Arm Generic Interrupt data.
Andrew Turner [Wed, 22 Dec 2021 17:26:33 +0000 (17:26 +0000)]
Teach DTrace about BTI on arm64
The Branch Target Identification (BTI) Armv8-A extension adds new
instructions that can be placed where we may indirrectly branch to,
e.g. at the start of a function called via a function pointer. We can't
emulate these in DTrace as the kernel will have raised a different
exception before the DTrace handler has run.
Skip over the BTI instruction if it's used as the first instruction in
a function.
Andrew Turner [Wed, 29 Dec 2021 12:10:49 +0000 (12:10 +0000)]
Fix undefined behaviour in the USB controllers
The USB controller drivers assume they can cast a NULL pointer to a
struct and find the address of a member. KUBSan complains about this so
replace with the __offsetof and __containerof macros that use either a
builtin function where available, or the same NULL pointer on older
compilers without the builtin.
Reviewers: hselasky
Subscribers: imp
Reviewed by: hselasky
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D33865
Previously it depended on sysctl, which itself has no dependencies,
so rcorder(8) had a bit too much flexibility when choosing when to run
it. Make sure it runs just between 'fsck' and 'root'.
It is a bit more user-friendly if before running mount(8) the service
checks if there are any file systems left to be mounted. This patch
implements this behavior.
Also, while here, create mount points directories (as suggested by
otis).
Rick Macklem [Tue, 15 Feb 2022 22:18:23 +0000 (14:18 -0800)]
gssd: Modify /etc/rc.d/gssd so that it starts after NETWORKING
Arno Tuber reported via email that he needed to restart the gssd daemon
after booting, to get his Kerberized NFS mount to work.
Without this patch, rcorder shows that the gssd starts before NETWORKING
and kdc. The gssd will need NETWORKING to connect to the KDC and, if
the kdc is running on the same system, it does not make sense to start it
before the kdc. This fixed the problem for Arno.
While here, I also added a "# BEFORE: mountcritremote".
It does not affect ordering at this time, but I felt
it should be added, since the gssd needs to be running
when remote NFS mounts are done.
If the NVMe Controller doesn't support Namespace Management, it should
return "Invalid Namespace or Format" when the Host request Identify
Namespace with the global NSID value.
Chuck Tuffli [Sun, 30 Jan 2022 07:10:59 +0000 (23:10 -0800)]
bhyve nvme: Fix Set Features, AEN
NVMe Controllers which do not support Endurance Groups must return an
error when the Endurance Group Event Aggregate Log Change Notices bit is
set in Set Features, Asynchronous Event Configuration.
Chuck Tuffli [Sun, 30 Jan 2022 07:09:57 +0000 (23:09 -0800)]
bhyve nvme: Fix LBA out-of-range calculation
The function which checks for a valid LBA range mistakenly named an
input value as NLB ("Number of Logical Blocks") instead of "number of
blocks". The NVMe specification defines NLB as a zero-based value (i.e.
NLB=0x0 represents 1 block, 0x1 is 2 blocks, etc.), but the passed
parameter is a 1's-based value.
Fix is to rename the variable to avoid future confusion.
While in the neighborhood, also check that the starting LBA is less than
the size of the backing storage to avoid an integer overflow.
Chuck Tuffli [Sun, 30 Jan 2022 07:09:10 +0000 (23:09 -0800)]
bhyve nvme: Update v1.4 Identify Controller data
Compliant v1.4 Controllers must report a Controller Type (CNTRLTYPE).
Also, do not advertise secure erase functionality in the Format NVM
Attributes field of the Identify Controller data structure as the
Controller does not implement secure erase.
Chuck Tuffli [Sun, 30 Jan 2022 07:08:47 +0000 (23:08 -0800)]
bhyve nvme: Add Temperature Threshold support
This adds the ability for a guest OS to send Set / Get Feature,
Temperature Threshold commands. The implementation assumes a constant
temperature and will generate an Asynchronous Event Notification if the
specified threshold is above/below this value. Although the
specification allows 9 temperature values, this implementation only
implements the Composite Temperature.
While in the neighborhood, move the clear of the CSTS register in the
reset function after all other cleanup. This avoids a race with the
guest thinking the reset is complete (i.e. CSTS.RDY = 0) before the NVMe
emulation is actually complete with the reset.
Chuck Tuffli [Sun, 30 Jan 2022 07:07:29 +0000 (23:07 -0800)]
bhyve nvme: Remove redundant AER Limit checks
The NVMe emulation checked if the Asynchronous Event Request Limit
(a.k.a AERL) would be exceeded in pci_nvme_aer_add(), but this function
is only called from nvme_opc_async_event_req() which also checks for
exceeding the AERL.
Chuck Tuffli [Sun, 30 Jan 2022 07:05:58 +0000 (23:05 -0800)]
bhyve nvme: Fix NVM Format completion status
The NVM Format command is unique among the Admin commands in that it
needs to finish asynchronously. For this reason, the emulation code
invented a synthetic completion status (NVME_NO_STATUS) to indicate that
the command was still in progress and the command processing loop should
not generate a completion message. The implementation used the value
0xffff for the synthetic value as this set both the Status Code and
Status Code Type fields to reserved values.
Format initialized the completion status to this value and expected
error cases to override it with a status code/type appropriate to the
situation. The macros used to set the NVMe status are careful not to
modify bit 0 (i.e. the phase bit), which with the synthetic completion
status, causes the phase bit to get out of sync. When running tests in a
guest with illegal NVM Format commands, Admin commands would eventually
hang because it appeared there were no completions due to the incorrect
phase bit value.
Fix is to only set NVME_NO_STATUS if the blockif delete command
succeeds. While in the neighborhood, add a missing break statement when
NVM Format is not supported.
Chuck Tuffli [Wed, 15 Dec 2021 07:17:55 +0000 (23:17 -0800)]
bhyve nvme: Inform guests of namespace resize
Register a "block resize" callback to be notified of changes to the
backing storage for the Namespace. Use this to generate an Asynchronous
Event Notification, Namespace Attributes Changed when the guest OS
provides an Asynchronous Event Request.
Chuck Tuffli [Wed, 1 Dec 2021 05:07:32 +0000 (21:07 -0800)]
bhyve blockif: fix blockif_candelete with Capsicum
NVMe conformance tests for the Format command failed if the
backing-storage for the bhyve device was a file instead of a Zvol. The
tests (and the specification) expect a Format to destroy all previously
written data. The bhyve NVMe emulation implements this by trimming /
deallocating all data from the backing-storage.
The blockif_candelete() function indicated the file did not support
deallocation (i.e. fpathconf(..., _PC_DEALLOC_PRESENT) returned FALSE)
even though the kernel supported file hole punching. This occurs on
builds with Capsicum enabled because blockif did not allow the
fpathconf(2) right.
Fix is to add CAP_FPATHCONF to the cap_rights_init(3) call.
Mark Johnston [Fri, 21 Jan 2022 19:54:05 +0000 (14:54 -0500)]
Fix handling of errors from dmu_write_uio_dbuf() on FreeBSD
FreeBSD's implementation of zfs_uio_fault_move() returns EFAULT when a
page fault occurs while copying data in or out of user buffers. The VFS
treats such errors specially and will retry the I/O operation (which may
have made some partial progress).
When the FreeBSD and Linux implementations of zfs_write() were merged,
the handling of errors from dmu_write_uio_dbuf() changed such that
EFAULT is not handled as a partial write. For example, when appending
to a file, the z_size field of the znode is not updated after a partial
write resulting in EFAULT.
Restore the old handling of errors from dmu_write_uio_dbuf() to fix
this. This should have no impact on Linux, which has special handling
for EFAULT already.
Reviewed-by: Andriy Gapon <avg@FreeBSD.org> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #12964
(cherry picked from commit 063daa8350d4a78f96d1ee6550910363fd3756fb)
Mark Johnston [Fri, 21 Jan 2022 18:28:13 +0000 (13:28 -0500)]
Avoid memory allocations in the ARC eviction thread
When the eviction thread goes to shrink an ARC state, it allocates a set
of marker buffers used to hold its place in the state's sublists.
This can be problematic in low memory conditions, since
1) the allocation can be substantial, as we allocate NCPU markers;
2) on at least FreeBSD, page reclamation can block in
arc_wait_for_eviction()
In particular, in stress tests it's possible to hit a deadlock on
FreeBSD when the number of free pages is very low, wherein the system is
waiting for the page daemon to reclaim memory, the page daemon is
waiting for the ARC eviction thread to finish, and the ARC eviction
thread is blocked waiting for more memory.
Try to reduce the likelihood of such deadlocks by pre-allocating markers
for the eviction thread at ARC initialization time. When evicting
buffers from an ARC state, check to see if the current thread is the ARC
eviction thread, and use the pre-allocated markers for that purpose
rather than dynamically allocating them.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: George Amanakis <gamanakis@gmail.com> Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #12985
(cherry picked from commit 6e2a59181e286a397d260fa9f140b58688d60c58)
Mark Johnston [Thu, 10 Feb 2022 20:32:23 +0000 (15:32 -0500)]
libctf: Rip out CTFv1 support
CTFv1 was obsolete before libctf was imported into FreeBSD, and
ctfconvert/ctfmerge can emit only CTFv2. Make ctf.h a bit easier to
maintain by ripping v1 support out. No functional change intended.
Mark Johnston [Thu, 10 Feb 2022 20:31:26 +0000 (15:31 -0500)]
tcp: Avoid conditionally defined fields in union lro_address
The layout of the structure ends up depending on whether the including
file includes opt_inet.h and opt_inet6.h, so different compilation units
can end up seeing different versions of the structure. Fix this by
unconditionally defining the address fields.
As a side effect, this eliminates some duplication in the kernel's CTF
type graph.
Reviewed by: rscheff, tuexen
Sponsored by: The FreeBSD Foundation
linux: Add additional ptracestop only if the debugger is Linux
In 6e66030c4c0, additional ptracestop was added in order
to implement PTRACE_EVENT_EXEC. Make it only apply to cases
where the debugger is a Linux processes; native FreeBSD
debuggers can trace Linux processes too, but they don't
expect that additonal ptracestop.
Reimplement bdf0f24bb16d556a5b by checking for the caller' ABI in
the implementation of PT_GET_SC_ARGS, and copying out everything if
it is Linuxolator.
Also fix a minor information leak: if PT_GET_SC_ARGS_ALL is done on the
thread reused after other process, it allows to read some number of that
thread last syscall arguments. Clear td_sa.args in thread_alloc().
Reviewed by: jhb
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D31968
Bjoern A. Zeeb [Sun, 20 Feb 2022 18:10:45 +0000 (18:10 +0000)]
Bump __FreeBSD_version to 1300526 for LinuxKPI changes.
This successfully builds against drm-fbsd13-kmod-5.4.144.g20220128
so no conflicting changes on the MFC. Given there are overlaps, bump
__FreeBSD_version so they can be detected and removed as pleases.
Bjoern A. Zeeb [Thu, 17 Feb 2022 00:58:12 +0000 (00:58 +0000)]
LinuxKPI: 802.11 simplify beacon checks in rx path
In linuxkpi_ieee80211_rx() check if the frame is a beacon once upfront
and use the result for enhanced debugging and further checks.
This was done intially for rx_status->device_timestamp debugging.
Bjoern A. Zeeb [Fri, 1 Oct 2021 10:51:50 +0000 (10:51 +0000)]
LinuxKPI: implement dma_sync_single_for_*, apply to (un)map single/sg
Implement dma_sync_single_for_{cpu,device} translating the Linux
DMA_ flags to BUS_DMASYNC_ combinations. Make map_single/unmap_single*
functions call the respective sync function. Apply the same logic to
the scatter-gather list map/unmap functions.
Sponsored by: The FreeBSD Foundation
Reviewed by: hselasky
Differential Revision: https://reviews.freebsd.org/D32255
Bjoern A. Zeeb [Wed, 16 Feb 2022 23:57:27 +0000 (23:57 +0000)]
LinuxKPI: 802.11: disable ic_headroom for the moment
There is a problem with some drivers, such as rtw88, asking for more
headroom than we currently can handle throughout the stack (we have
other legacy wireless driver in the tree with similar problems).
This may trigger an assertion in the TCP syncache where we are checking
for a reply to fit in MHLEN.
While for the moment we still copy data from mbufs to skbs,
we can simply disable the extra headroom request in ic_headroom and
deal with it ourselves (which we already did anyway).
Leave a link to the thread on freebsd-transport detailing more of the
problem so we can find it again and solve it here or there.
Bjoern A. Zeeb [Thu, 17 Feb 2022 00:15:56 +0000 (00:15 +0000)]
LinuxKPI: 802.11 advertise full offload scanning based on hw_scan only
We disabled hw_scan for drivers not advertising SINGLE_SCAN_ON_ALL_BANDS.
Do not depend on this hw flag to set IEEE80211_FEXT_SCAN_OFFLOAD for
net80211 as otherwise scanning will never work.
Long-term we probably want to re-think how we do/integrate hw_scan
better in net80211.
Bjoern A. Zeeb [Tue, 15 Feb 2022 23:45:15 +0000 (23:45 +0000)]
LinuxKPI: 802.11 header updates and add/adjust source dependencies.
This update is for more/newer versions of drivers:
- add and properly place more structs, enums, defines needed by drivers.
- correct types of struct fields.
- make various function arguments const.
- move REG_RULE() macro to its own file regulatory.h and
use macros for calculations.
- add linuxkpi_ieee80211_get_channel() implementation.
- change linuxkpi_ieee80211_ifattach() to return int for error checking.
Bjoern A. Zeeb [Wed, 16 Feb 2022 02:10:10 +0000 (02:10 +0000)]
LinuxKPI: skbuff updates
Various updates to skbuff for new/updated drivers and some housekeeping:
- update types and struct members, add new (stub) functions
- improve freeing of frags.
- fix an issue with sleeping during alloc for dev_alloc_skb().
- Adjust a KASSERT for skb_reserve() which apparently can be called
multiple times if no data was put into the skb yet.
- move the sysctl from linux_8022.c (which may be in a different module)
to linux_skbuff.c so in case we turn debugging on we do not run into
unresolved symbols. Rename the sysctl variable to be less conflicting
and update debugging macros along with that; also add IMPROVE().
- add DDB support to show an skbuff.
- adjust comments/whitespace.
Bjoern A. Zeeb [Wed, 16 Feb 2022 03:20:29 +0000 (03:20 +0000)]
LinuxKPI: 802.11: defer workq allocation until we have a name
Turned out all the workq's taskqueues were named "wlanNA" if you had
more then one card in a machine as by the time we called wiphy_name()
the device name was not set yet and we returned the fallback.
Move the alloc_ordered_workqueue() from linuxkpi_ieee80211_alloc_hw()
to linuxkpi_ieee80211_ifattach() at which time the device name has
to be set to give us a unique name.
Bjoern A. Zeeb [Wed, 16 Feb 2022 03:48:54 +0000 (03:48 +0000)]
LinuxKPI: 802.11 assign an(y) early chandef
The Realtek driver assumes an early chandef to be set. At the time
of linuxkpi_ieee80211_ifattach() we do not really know one yet so
try to find the first one which is available and set that.
This prevents a NULL-deref panic.
Bjoern A. Zeeb [Wed, 16 Feb 2022 03:00:34 +0000 (03:00 +0000)]
LinuxKPI: 802.11 scan update
Realtek's rtw88 is returning a hard-coded 1 in case they cannot
hw_scan (fw not advertising it). In that case if we want any scan
to run we need to fall-back to sw scan. Start dealing with this.
Long-term we probably need to keep internal state.
Bjoern A. Zeeb [Wed, 9 Feb 2022 11:58:40 +0000 (11:58 +0000)]
LinuxKPI: add kstrtoint_from_user() and DECLARE_FLEX_ARRAY()
Add an implementation of kstrtoint_from_user() based on the other
implementations and an attempt at DECLARE_FLEX_ARRAY() which works
for the driver needing it.
Bjoern A. Zeeb [Mon, 14 Feb 2022 22:29:38 +0000 (22:29 +0000)]
LinuxKPI: 802.11: get rid of lkpi_ic_getradiocaps warnings
Users are seeing warnings about 2 channels (1 per band)
triggered by an ioctl from wpa_supplicant usually:
lkpi_ic_getradiocaps: Adding chan ... returned error 55
This was an early FAQ.
Check the current number of channels against maxchans and the return
code from net80211. In case net80211 reports that we reached the limit
do not print the warning and do not try to add further channels.
Bjoern A. Zeeb [Wed, 9 Feb 2022 12:07:44 +0000 (12:07 +0000)]
LinuxKPI: add linux/pm_qos.h
Add a linux/pm_qos.h with three dummy functions and a struct as needed
by a driver and drm-kmod [1] in main with no intend to support this for
the moment.
Bjoern A. Zeeb [Tue, 8 Feb 2022 23:47:15 +0000 (23:47 +0000)]
TCP syncache: enhance KASSERT output
Improve the "syncache: mbuf too small" assertion message with various
variables (some not actually needed) but enough that it will be obvious
if (a) we use IPv4 or IPv6, (b) if UDP tunneling is on, (c) what
max_linkhdr is, and (d) what MHLEN is.
This should help diagnostics in the future.
The case was hit with wireless drivers setting a large ic_headroom
and using IPv6.
John Baldwin [Thu, 10 Feb 2022 20:47:08 +0000 (12:47 -0800)]
libthr: Disable stack unwinding on ARM.
When a thread exits, _Unwind_ForcedUnwind() is used to walk up stack
frames executing pending cleanups pushed by pthread_cleanup_push().
The cleanups are popped by thread_unwind_stop() which is passed as a
callback function to _Unwind_ForcedUnwind().
LLVM's libunwind uses a different function type for the callback on
32-bit ARM relative to all other platforms. The previous unwind.h
header (as well as the unwind.h from libcxxrt) use the non-ARM type on
all platforms, so this has likely been broken on 32-bit arm since it
switched to using LLVM's libunwind.
For now, just disable stack unwinding on 32-bit arm to unbreak the
build until a proper fix is tested.
John Baldwin [Thu, 27 Jan 2022 22:42:40 +0000 (14:42 -0800)]
Change the return value of _Unwind_GetCFA in include/unwind.h.
I tested the original commit as part of a series that culminates in
removing this header and installing LLVM libunwind's unwind.h in its
place so missed updating this header as was done in b84693501af6.
Pointy hat to: jhb
Reported by: kevans
Fixes: 3a502289d316 Use uintptr_t for return type of _Unwind_GetCFA.
Kristof Provost [Tue, 1 Feb 2022 17:25:57 +0000 (18:25 +0100)]
pf: deal with tables gaining or losing counters
When we create a table without counters, add an entry and later
re-define the table to have counters we wound up trying to read
non-existent counters.
We now cope with this by attempting to add them if needed, removing them
when they're no longer needed and not trying to read from counters that
are not present.
Kenneth D. Merry [Mon, 24 Jan 2022 21:19:25 +0000 (16:19 -0500)]
Fix non-printable characters in NVMe model and serial numbers.
The NVMe 1.4 spec simply says that Model and Serial numbers are
ASCII strings. Unlike SCSI, it doesn't prohibit non-printable
characters or say that the strings should be padded with spaces.
Since 2014, we have had cam_strvis_sbuf(), which gives additional
options for handling non-ASCII characters. That behavior hasn't
been available for non-sbuf consumers, so users of cam_strvis()
were left with having octal ASCII codes inserted.
So, to avoid having garbage or octal chracters in the strings, use
cam_strvis_sbuf() to create a new function, cam_strvis_flag(), and
re-implement cam_strvis() using cam_strvis_flag().
Now, for the NVMe drives, we can use cam_strvis_flag with the
CAM_STRVIS_FLAG_NONASCII_SPC flag. This transforms non-printable
characters into spaces.
sys/cam/cam.c:
Add a new function, cam_strvis_flag(), that creates an sbuf
on the stack with the user's destination buffer, and calls
cam_strvis_sbuf() with the given flag argument.
Re-implement cam_strvis() to call cam_strvis_flag with the
CAM_STRVIS_FLAG_NONASCII_ESC argument. This should be the
equivalent of the old cam_strvis() function, except for the
overhead of creating the sbuf and calling sbuf_putc/printf.
sys/cam/cam.h:
Declaration for cam_strvis_flag.
sys/cam/nvme/nvme_all.c:
In nvme_print_ident, use the NONASCII_SPC flag with
cam_strvis_flag().
sys/cam/nvme/nvme_da.c:
In ndaregister(), use cam_strvis_flag() with the
NONASCII_SPC flag for the disk description and serial
number we report to GEOM.
sys/cam/nvme/nvme_xpt.c:
In nvme_probe_done(), use cam_strvis_flag with the
NONASCII_SPC flag when storing the drive serial number
in the CAM EDT.