Michael Tuexen [Tue, 28 Sep 2021 03:25:58 +0000 (05:25 +0200)]
sctp: fix usage of stream scheduler functions
sctp_ss_scheduled() should only be called for streams that are
scheduled. So call sctp_ss_remove_from_stream() before it.
This bug was uncovered by the earlier cleanup.
Michael Tuexen [Tue, 21 Sep 2021 15:13:57 +0000 (17:13 +0200)]
sctp: Simplify stream scheduler usage
Callers are getting the stcb send lock, so just KASSERT that.
No need to signal this when calling stream scheduler functions.
No functional change intended.
Mark Johnston [Sat, 11 Sep 2021 14:15:21 +0000 (10:15 -0400)]
sctp: Tighten up locking around sctp_aloc_assoc()
All callers of sctp_aloc_assoc() mark the PCB as connected after a
successful call (for one-to-one-style sockets). In all cases this is
done without the PCB lock, so the PCB's flags can be corrupted. We also
do not atomically check whether a one-to-one-style socket is a listening
socket, which violates various assumptions in solisten_proto().
We need to hold the PCB lock across all of sctp_aloc_assoc() to fix
this. In order to do that without introducing lock order reversals, we
have to hold the global info lock as well.
So:
- Convert sctp_aloc_assoc() so that the inp and info locks are
consistently held. It returns with the association lock held, as
before.
- Fix an apparent bug where we failed to remove an association from a
global hash if sctp_add_remote_addr() fails.
- sctp_select_a_tag() is called when initializing an association, and it
acquires the global info lock. To avoid lock recursion, push locking
into its callers.
- Introduce sctp_aloc_assoc_connected(), which atomically checks for a
listening socket and sets SCTP_PCB_FLAGS_CONNECTED.
There is still one edge case in sctp_process_cookie_new() where we do
not update PCB/socket state correctly.
Reviewed by: tuexen
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31908
Michael Tuexen [Mon, 9 Aug 2021 07:23:42 +0000 (09:23 +0200)]
ipv6: Fix getsockopt() for some IPPROTO_IPV6 level socket options
Fix getsockopt() for the IPPROTO_IPV6 level socket options with the
following names: IPV6_HOPOPTS, IPV6_RTHDR, IPV6_RTHDRDSTOPTS,
IPV6_DSTOPTS, and IPV6_NEXTHOP.
Reviewed by: markj
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D31458
Ed Maste [Mon, 14 Feb 2022 20:14:02 +0000 (15:14 -0500)]
Cirrus-CI: use qemu-nox11
We use -nographic for the smoke test and there is no need to pull in all
of the x11 deps. This saves some time and bandwidth during package
installation.
When I originally added Cirrus-CI support the -nox11 package was not
available.
Andrew Turner [Mon, 7 Feb 2022 11:47:04 +0000 (11:47 +0000)]
Fix the signal code on 32-bit breakpoints on arm64
When debugging 32-bit programs a debugger may insert a instruction that
will raise the undefined instruction trap. The kernel handles these
by raising a SIGTRAP, however the code was incorrect.
Fix this by using the expected TRAP_BRKPT signal code.
Andrew Turner [Tue, 1 Feb 2022 11:43:13 +0000 (11:43 +0000)]
Add the Arm SPE interrupt to acpidump
To support the Arm Statistical Profiling Extension (SPE) ACPI 6.3 added
a place to hold the SPE interrupt. Add to acpidump to show when printing
the Arm Generic Interrupt data.
Andrew Turner [Wed, 22 Dec 2021 17:26:33 +0000 (17:26 +0000)]
Teach DTrace about BTI on arm64
The Branch Target Identification (BTI) Armv8-A extension adds new
instructions that can be placed where we may indirrectly branch to,
e.g. at the start of a function called via a function pointer. We can't
emulate these in DTrace as the kernel will have raised a different
exception before the DTrace handler has run.
Skip over the BTI instruction if it's used as the first instruction in
a function.
Andrew Turner [Wed, 29 Dec 2021 12:10:49 +0000 (12:10 +0000)]
Fix undefined behaviour in the USB controllers
The USB controller drivers assume they can cast a NULL pointer to a
struct and find the address of a member. KUBSan complains about this so
replace with the __offsetof and __containerof macros that use either a
builtin function where available, or the same NULL pointer on older
compilers without the builtin.
Reviewers: hselasky
Subscribers: imp
Reviewed by: hselasky
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D33865
Previously it depended on sysctl, which itself has no dependencies,
so rcorder(8) had a bit too much flexibility when choosing when to run
it. Make sure it runs just between 'fsck' and 'root'.
It is a bit more user-friendly if before running mount(8) the service
checks if there are any file systems left to be mounted. This patch
implements this behavior.
Also, while here, create mount points directories (as suggested by
otis).
Rick Macklem [Tue, 15 Feb 2022 22:18:23 +0000 (14:18 -0800)]
gssd: Modify /etc/rc.d/gssd so that it starts after NETWORKING
Arno Tuber reported via email that he needed to restart the gssd daemon
after booting, to get his Kerberized NFS mount to work.
Without this patch, rcorder shows that the gssd starts before NETWORKING
and kdc. The gssd will need NETWORKING to connect to the KDC and, if
the kdc is running on the same system, it does not make sense to start it
before the kdc. This fixed the problem for Arno.
While here, I also added a "# BEFORE: mountcritremote".
It does not affect ordering at this time, but I felt
it should be added, since the gssd needs to be running
when remote NFS mounts are done.
If the NVMe Controller doesn't support Namespace Management, it should
return "Invalid Namespace or Format" when the Host request Identify
Namespace with the global NSID value.
Chuck Tuffli [Sun, 30 Jan 2022 07:10:59 +0000 (23:10 -0800)]
bhyve nvme: Fix Set Features, AEN
NVMe Controllers which do not support Endurance Groups must return an
error when the Endurance Group Event Aggregate Log Change Notices bit is
set in Set Features, Asynchronous Event Configuration.
Chuck Tuffli [Sun, 30 Jan 2022 07:09:57 +0000 (23:09 -0800)]
bhyve nvme: Fix LBA out-of-range calculation
The function which checks for a valid LBA range mistakenly named an
input value as NLB ("Number of Logical Blocks") instead of "number of
blocks". The NVMe specification defines NLB as a zero-based value (i.e.
NLB=0x0 represents 1 block, 0x1 is 2 blocks, etc.), but the passed
parameter is a 1's-based value.
Fix is to rename the variable to avoid future confusion.
While in the neighborhood, also check that the starting LBA is less than
the size of the backing storage to avoid an integer overflow.
Chuck Tuffli [Sun, 30 Jan 2022 07:09:10 +0000 (23:09 -0800)]
bhyve nvme: Update v1.4 Identify Controller data
Compliant v1.4 Controllers must report a Controller Type (CNTRLTYPE).
Also, do not advertise secure erase functionality in the Format NVM
Attributes field of the Identify Controller data structure as the
Controller does not implement secure erase.
Chuck Tuffli [Sun, 30 Jan 2022 07:08:47 +0000 (23:08 -0800)]
bhyve nvme: Add Temperature Threshold support
This adds the ability for a guest OS to send Set / Get Feature,
Temperature Threshold commands. The implementation assumes a constant
temperature and will generate an Asynchronous Event Notification if the
specified threshold is above/below this value. Although the
specification allows 9 temperature values, this implementation only
implements the Composite Temperature.
While in the neighborhood, move the clear of the CSTS register in the
reset function after all other cleanup. This avoids a race with the
guest thinking the reset is complete (i.e. CSTS.RDY = 0) before the NVMe
emulation is actually complete with the reset.
Chuck Tuffli [Sun, 30 Jan 2022 07:07:29 +0000 (23:07 -0800)]
bhyve nvme: Remove redundant AER Limit checks
The NVMe emulation checked if the Asynchronous Event Request Limit
(a.k.a AERL) would be exceeded in pci_nvme_aer_add(), but this function
is only called from nvme_opc_async_event_req() which also checks for
exceeding the AERL.
Chuck Tuffli [Sun, 30 Jan 2022 07:05:58 +0000 (23:05 -0800)]
bhyve nvme: Fix NVM Format completion status
The NVM Format command is unique among the Admin commands in that it
needs to finish asynchronously. For this reason, the emulation code
invented a synthetic completion status (NVME_NO_STATUS) to indicate that
the command was still in progress and the command processing loop should
not generate a completion message. The implementation used the value
0xffff for the synthetic value as this set both the Status Code and
Status Code Type fields to reserved values.
Format initialized the completion status to this value and expected
error cases to override it with a status code/type appropriate to the
situation. The macros used to set the NVMe status are careful not to
modify bit 0 (i.e. the phase bit), which with the synthetic completion
status, causes the phase bit to get out of sync. When running tests in a
guest with illegal NVM Format commands, Admin commands would eventually
hang because it appeared there were no completions due to the incorrect
phase bit value.
Fix is to only set NVME_NO_STATUS if the blockif delete command
succeeds. While in the neighborhood, add a missing break statement when
NVM Format is not supported.
Chuck Tuffli [Wed, 15 Dec 2021 07:17:55 +0000 (23:17 -0800)]
bhyve nvme: Inform guests of namespace resize
Register a "block resize" callback to be notified of changes to the
backing storage for the Namespace. Use this to generate an Asynchronous
Event Notification, Namespace Attributes Changed when the guest OS
provides an Asynchronous Event Request.
Chuck Tuffli [Wed, 1 Dec 2021 05:07:32 +0000 (21:07 -0800)]
bhyve blockif: fix blockif_candelete with Capsicum
NVMe conformance tests for the Format command failed if the
backing-storage for the bhyve device was a file instead of a Zvol. The
tests (and the specification) expect a Format to destroy all previously
written data. The bhyve NVMe emulation implements this by trimming /
deallocating all data from the backing-storage.
The blockif_candelete() function indicated the file did not support
deallocation (i.e. fpathconf(..., _PC_DEALLOC_PRESENT) returned FALSE)
even though the kernel supported file hole punching. This occurs on
builds with Capsicum enabled because blockif did not allow the
fpathconf(2) right.
Fix is to add CAP_FPATHCONF to the cap_rights_init(3) call.
Mark Johnston [Fri, 21 Jan 2022 19:54:05 +0000 (14:54 -0500)]
Fix handling of errors from dmu_write_uio_dbuf() on FreeBSD
FreeBSD's implementation of zfs_uio_fault_move() returns EFAULT when a
page fault occurs while copying data in or out of user buffers. The VFS
treats such errors specially and will retry the I/O operation (which may
have made some partial progress).
When the FreeBSD and Linux implementations of zfs_write() were merged,
the handling of errors from dmu_write_uio_dbuf() changed such that
EFAULT is not handled as a partial write. For example, when appending
to a file, the z_size field of the znode is not updated after a partial
write resulting in EFAULT.
Restore the old handling of errors from dmu_write_uio_dbuf() to fix
this. This should have no impact on Linux, which has special handling
for EFAULT already.
Reviewed-by: Andriy Gapon <avg@FreeBSD.org> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #12964
(cherry picked from commit 063daa8350d4a78f96d1ee6550910363fd3756fb)
Mark Johnston [Fri, 21 Jan 2022 18:28:13 +0000 (13:28 -0500)]
Avoid memory allocations in the ARC eviction thread
When the eviction thread goes to shrink an ARC state, it allocates a set
of marker buffers used to hold its place in the state's sublists.
This can be problematic in low memory conditions, since
1) the allocation can be substantial, as we allocate NCPU markers;
2) on at least FreeBSD, page reclamation can block in
arc_wait_for_eviction()
In particular, in stress tests it's possible to hit a deadlock on
FreeBSD when the number of free pages is very low, wherein the system is
waiting for the page daemon to reclaim memory, the page daemon is
waiting for the ARC eviction thread to finish, and the ARC eviction
thread is blocked waiting for more memory.
Try to reduce the likelihood of such deadlocks by pre-allocating markers
for the eviction thread at ARC initialization time. When evicting
buffers from an ARC state, check to see if the current thread is the ARC
eviction thread, and use the pre-allocated markers for that purpose
rather than dynamically allocating them.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: George Amanakis <gamanakis@gmail.com> Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #12985
(cherry picked from commit 6e2a59181e286a397d260fa9f140b58688d60c58)
Mark Johnston [Thu, 10 Feb 2022 20:32:23 +0000 (15:32 -0500)]
libctf: Rip out CTFv1 support
CTFv1 was obsolete before libctf was imported into FreeBSD, and
ctfconvert/ctfmerge can emit only CTFv2. Make ctf.h a bit easier to
maintain by ripping v1 support out. No functional change intended.
Mark Johnston [Thu, 10 Feb 2022 20:31:26 +0000 (15:31 -0500)]
tcp: Avoid conditionally defined fields in union lro_address
The layout of the structure ends up depending on whether the including
file includes opt_inet.h and opt_inet6.h, so different compilation units
can end up seeing different versions of the structure. Fix this by
unconditionally defining the address fields.
As a side effect, this eliminates some duplication in the kernel's CTF
type graph.
Reviewed by: rscheff, tuexen
Sponsored by: The FreeBSD Foundation
linux: Add additional ptracestop only if the debugger is Linux
In 6e66030c4c0, additional ptracestop was added in order
to implement PTRACE_EVENT_EXEC. Make it only apply to cases
where the debugger is a Linux processes; native FreeBSD
debuggers can trace Linux processes too, but they don't
expect that additonal ptracestop.
Reimplement bdf0f24bb16d556a5b by checking for the caller' ABI in
the implementation of PT_GET_SC_ARGS, and copying out everything if
it is Linuxolator.
Also fix a minor information leak: if PT_GET_SC_ARGS_ALL is done on the
thread reused after other process, it allows to read some number of that
thread last syscall arguments. Clear td_sa.args in thread_alloc().
Reviewed by: jhb
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D31968
Bjoern A. Zeeb [Sun, 20 Feb 2022 18:10:45 +0000 (18:10 +0000)]
Bump __FreeBSD_version to 1300526 for LinuxKPI changes.
This successfully builds against drm-fbsd13-kmod-5.4.144.g20220128
so no conflicting changes on the MFC. Given there are overlaps, bump
__FreeBSD_version so they can be detected and removed as pleases.
Bjoern A. Zeeb [Thu, 17 Feb 2022 00:58:12 +0000 (00:58 +0000)]
LinuxKPI: 802.11 simplify beacon checks in rx path
In linuxkpi_ieee80211_rx() check if the frame is a beacon once upfront
and use the result for enhanced debugging and further checks.
This was done intially for rx_status->device_timestamp debugging.
Bjoern A. Zeeb [Fri, 1 Oct 2021 10:51:50 +0000 (10:51 +0000)]
LinuxKPI: implement dma_sync_single_for_*, apply to (un)map single/sg
Implement dma_sync_single_for_{cpu,device} translating the Linux
DMA_ flags to BUS_DMASYNC_ combinations. Make map_single/unmap_single*
functions call the respective sync function. Apply the same logic to
the scatter-gather list map/unmap functions.
Sponsored by: The FreeBSD Foundation
Reviewed by: hselasky
Differential Revision: https://reviews.freebsd.org/D32255
Bjoern A. Zeeb [Wed, 16 Feb 2022 23:57:27 +0000 (23:57 +0000)]
LinuxKPI: 802.11: disable ic_headroom for the moment
There is a problem with some drivers, such as rtw88, asking for more
headroom than we currently can handle throughout the stack (we have
other legacy wireless driver in the tree with similar problems).
This may trigger an assertion in the TCP syncache where we are checking
for a reply to fit in MHLEN.
While for the moment we still copy data from mbufs to skbs,
we can simply disable the extra headroom request in ic_headroom and
deal with it ourselves (which we already did anyway).
Leave a link to the thread on freebsd-transport detailing more of the
problem so we can find it again and solve it here or there.
Bjoern A. Zeeb [Thu, 17 Feb 2022 00:15:56 +0000 (00:15 +0000)]
LinuxKPI: 802.11 advertise full offload scanning based on hw_scan only
We disabled hw_scan for drivers not advertising SINGLE_SCAN_ON_ALL_BANDS.
Do not depend on this hw flag to set IEEE80211_FEXT_SCAN_OFFLOAD for
net80211 as otherwise scanning will never work.
Long-term we probably want to re-think how we do/integrate hw_scan
better in net80211.
Bjoern A. Zeeb [Tue, 15 Feb 2022 23:45:15 +0000 (23:45 +0000)]
LinuxKPI: 802.11 header updates and add/adjust source dependencies.
This update is for more/newer versions of drivers:
- add and properly place more structs, enums, defines needed by drivers.
- correct types of struct fields.
- make various function arguments const.
- move REG_RULE() macro to its own file regulatory.h and
use macros for calculations.
- add linuxkpi_ieee80211_get_channel() implementation.
- change linuxkpi_ieee80211_ifattach() to return int for error checking.
Bjoern A. Zeeb [Wed, 16 Feb 2022 02:10:10 +0000 (02:10 +0000)]
LinuxKPI: skbuff updates
Various updates to skbuff for new/updated drivers and some housekeeping:
- update types and struct members, add new (stub) functions
- improve freeing of frags.
- fix an issue with sleeping during alloc for dev_alloc_skb().
- Adjust a KASSERT for skb_reserve() which apparently can be called
multiple times if no data was put into the skb yet.
- move the sysctl from linux_8022.c (which may be in a different module)
to linux_skbuff.c so in case we turn debugging on we do not run into
unresolved symbols. Rename the sysctl variable to be less conflicting
and update debugging macros along with that; also add IMPROVE().
- add DDB support to show an skbuff.
- adjust comments/whitespace.
Bjoern A. Zeeb [Wed, 16 Feb 2022 03:20:29 +0000 (03:20 +0000)]
LinuxKPI: 802.11: defer workq allocation until we have a name
Turned out all the workq's taskqueues were named "wlanNA" if you had
more then one card in a machine as by the time we called wiphy_name()
the device name was not set yet and we returned the fallback.
Move the alloc_ordered_workqueue() from linuxkpi_ieee80211_alloc_hw()
to linuxkpi_ieee80211_ifattach() at which time the device name has
to be set to give us a unique name.
Bjoern A. Zeeb [Wed, 16 Feb 2022 03:48:54 +0000 (03:48 +0000)]
LinuxKPI: 802.11 assign an(y) early chandef
The Realtek driver assumes an early chandef to be set. At the time
of linuxkpi_ieee80211_ifattach() we do not really know one yet so
try to find the first one which is available and set that.
This prevents a NULL-deref panic.