The IBTA specification has new speed - NDR. That speed supports signaling
rate of 100Gb. mlx5 IB driver translates link modes reported by ConnectX
device to IB speed and width. Added translation of new 100Gb, 200Gb and
400Gb link modes to NDR IB type and width of x1, x2 or x4 respectively.
Arka Sharma [Fri, 18 Feb 2022 15:34:15 +0000 (09:34 -0600)]
mmap map_at_zero test: handle W^X
Use kern.elfXX.allow_wx to decide whether to map W+X or W-only memory.
Future work could expand this test to add an "allow_wx" axis to the
test matrix, but I would argue that a separate test should be written,
since that's orthogonal to map_at_zero.
Austin Zhang [Thu, 13 Jan 2022 17:13:25 +0000 (11:13 -0600)]
atrtc: reads Century field from FADT table
The ACPI spec describes the FADT->Century field as:
The RTC CMOS RAM index to the century of data value (hundred and
thousand year decimals). If this field contains a zero, then the
RTC centenary feature is not supported. If this field has a non-zero
value, then this field contains an index into RTC RAM space that
OSPM can use to program the centenary field.
Use this field to decide whether to program the CENTURY register
of the CMOS RTC device.
Yinlong Lu [Thu, 29 Apr 2021 10:04:36 +0000 (05:04 -0500)]
ipmi: support getting address from EFI
The original implementation only supports getting the address from legacy
BIOS (by searching for the SMBIOS_SIG pattern in a fixed address space).
Try to get the SMBIOS table from EFI through efirt (EFI Runtime Services)
firstly. Continue to search in the legacy BIOS if a NULL address is
returned from EFI.
By this way the ipmi function supports both legacy BIOS and UEFI systems.
Jessica Clarke [Thu, 3 Mar 2022 03:22:30 +0000 (03:22 +0000)]
fuse: Fix build on 32-bit architectures
MFC 3d721de049be ("Fix NFS exports of FUSE file systems for big
directories") missed a case of a uint64_t from HEAD that should be a
u_long in 13 due to KPI differences. Specifically, HEAD has b214fcceacad
("Change VOP_READDIR's cookies argument to a **uint64_t"), but stable/13
does not.
Alan Somers [Sun, 2 Jan 2022 17:18:47 +0000 (10:18 -0700)]
Fix NFS exports of FUSE file systems for big directories
The FUSE protocol does not require that a directory entry's d_off field
outlive the lifetime of its directory's file handle. Since the NFS
server must reopen the directory on every VOP_READDIR call, that means
it can't pass uio->uio_offset down to the FUSE server. Instead, it must
read the directory from 0 each time. It may need to issue multiple
FUSE_READDIR operations until it finds the d_off field that it's looking
for. That was the intention behind SVN r348209 and r297887, but a logic
bug prevented subsequent FUSE_READDIR operations from ever being issued,
rendering large directories incompletely browseable.
fusefs: optimize NFS readdir for FUSE_NO_OPENDIR_SUPPORT
In its lowest common denominator, FUSE does not require that a directory
entry's d_off field is valid outside of the lifetime of the directory's
FUSE file handle. But since NFS is stateless, it must reopen the
directory on every call to VOP_READDIR. That means reading the
directory all the way from the first entry. Not only does this create
an O(n^2) condition for large directories, but it can also result in
incorrect behavior if either:
* The file system _does_ change the d_off field for the last directory
entry previously seen by NFS, or
* The file system deletes the last directory entry previously seen by
NFS.
Handily, for file systems that set FUSE_NO_OPENDIR_SUPPORT d_off is
guaranteed to be valid for the lifetime of the directory entry, there is
no need to read the directory from the start.
fusefs: require FUSE_NO_OPENDIR_SUPPORT for NFS exporting
FUSE file systems that do not set FUSE_NO_OPENDIR_SUPPORT do not
guarantee that d_off will be valid after closing and reopening a
directory. That conflicts with NFS's statelessness, that results in
unresolvable bugs when NFS reads large directories, if:
* The file system _does_ change the d_off field for the last directory
entry previously returned by VOP_READDIR, or
* The file system deletes the last directory entry previously seen by
NFS.
Rather than doing a poor job of exporting such file systems, it's better
just to refuse.
Even though this is technically a breaking change, 13.0-RELEASE's
NFS-FUSE support was bad enough that an MFC should be allowed.
Justin Hibbits [Thu, 3 Feb 2022 23:20:36 +0000 (17:20 -0600)]
powerpc/atomic: Fix atomic_testand_*_long on powerpc64
After b5d227b0 FreeBSD was panicking on boot with "Duplicate free" in
UMA. Analyzing the asm, the '1' mask was treated as an integer, rather
than a long, causing 'slw' (shift left word) to be used for the shifting
instruction, not 'sld' (shift left double). This means the upper bits
of the bitfield were not getting used, resulting in corruption of the
bitfield.
While fixing this, the 'and' check of the mask does not need to be
recorded, so don't record (drop the '.').
Add machine-optimized implementations for the following:
* atomic_testandset_int
* atomic_testandclear_int
* atomic_testandset_long
* atomic_testandclear_long
This fixes the build with ISA_206_ATOMICS enabled.
Add the associated atomic_testandset_32, atomic_testandclear_32, so
that ice(4) can potentially build.
Navdeep Parhar [Fri, 4 Feb 2022 21:16:35 +0000 (13:16 -0800)]
cxgbe(4): Changes to the fatal error handler.
* New error_flags that can be used from the error ithread and elsewhere
without a synch_op.
* Stop the adapter immediately in t4_fatal_err but defer most of the
rest of the handling to a task. The task is allowed to sleep, unlike
the ithread. Remove async_event_task as it is no longer needed.
* Dump the devlog, CIMLA, and PCIE_FW exactly once on any fatal error
involving the firmware or the CIM block. While here, dump some
additional info (see dump_cim_regs) for these errors.
* If both reset_on_fatal_err and panic_on_fatal_err are set then attempt
a reset first and do not panic the system if it is successful.
Dimitry Andric [Sun, 27 Feb 2022 12:53:19 +0000 (13:53 +0100)]
Apply lld fixes for internal errors building perl on 32-bit PowerPC
Merge commit f457863ae345 from llvm git (by Fangrui Song):
[ELF] Support REL-format R_AARCH64_NONE relocation
-fprofile-use=/-fprofile-sample-use= compiles may produce REL-format
.rel.llvm.call-graph-profile even if the prevailing format is RELA on AArch64.
Add R_AARCH64_NONE to getImplicitAddend to fix this linker error:
```
ld.lld: error: internal linker error: cannot read addend for relocation R_AARCH64_NONE
PLEASE submit a bug report to https://crbug.com and run tools/clang/scripts/process_crashreports.py (only works inside Google) which will upload a report and include the crash backtrace.
```
Merge commit 53fc5d9b9a01 from llvm git (by Fangrui Song):
[ELF] Support R_PPC_NONE/R_PPC64_NONE in getImplicitAddend
Support R_PPC_REL32 to fix `ld.lld: error: drti.c:(.SUNW_dof+0x4E4): internal linker error: cannot read addend for relocation R_PPC_REL32`.
While here, add some common relocation types for AArch64, PPC, and PPC64.
We perform minimum tests.
Warner Losh [Fri, 9 Apr 2021 22:35:17 +0000 (16:35 -0600)]
efivar: Attempt to fix setting/printing/deleting EFI vars with '-' in their name
Due to how we're parsing UUIDs, we were disallowing setting, printing or
deleting any UEFI variable with a '-' in it when you attempted to do that
operation with the exact name (wildcard reporting was unaffected). Fix the
parser to loop over all the dashes in the name and only give up when all
possible matches are exhausted.
Eric van Gyzen [Thu, 24 Feb 2022 22:53:03 +0000 (16:53 -0600)]
sendfile_test: fix copy-paste bug
Require the newly opened file descriptor to be good, instead of
re-requiring the one that was required three lines earlier.
Thankfully, opening /dev/null is really unlikely to fail.
Eric van Gyzen [Thu, 17 Feb 2022 15:53:48 +0000 (09:53 -0600)]
elfdump: handle small files more gracefully
elfdump -E on an empty file would complain "Invalid argument" because
it tried to mmap zero bytes. With the -E flag, elfdump should
simply exit non-zero. For tiny files, the code would reference off
the end of the mapped region.
Ensure the file is large enough to contain an ELF header before mapping it.
Eric van Gyzen [Fri, 14 Jan 2022 14:12:57 +0000 (08:12 -0600)]
Allow downstream projects to easily add private and internal libs
Allow projects based on the FreeBSD tree to append to _PRIVATELIBS
and _INTERNALLIBS by simply maintaining their own lists of
LOCAL_PRIVATELIBS and LOCAL_INTERNALLIBS, respectively.
Eric van Gyzen [Fri, 1 Oct 2021 11:25:48 +0000 (06:25 -0500)]
sem_clockwait_np test: fix usage of ATF API
ATF_REQUIRE_ERRNO requires the given errno iff the given expression is
true. These test cases used it incorrectly, potentially allowing
sem_clockwait_np to succeed when it was expected to fail. Use separate
ATF calls to require failure and the expected errno.
Eric van Gyzen [Fri, 1 Oct 2021 10:37:17 +0000 (05:37 -0500)]
sem_clockwait_np test: relax time constraint on VMs
In a guest on a busy hypervisor, the time remaining after an
interrupted sleep could be much lower than other environments.
Relax the lower bound on VMs.
Eric van Gyzen [Fri, 23 Jul 2021 13:49:55 +0000 (08:49 -0500)]
aio_md_test: NUL-terminate result of readlink
readlink does not NUL-terminate the output buffer. This led to spurious
failures to destroy the md device because the unit number was garbage.
NUL-terminate the output buffer.
Eric van Gyzen [Fri, 23 Jul 2021 13:24:52 +0000 (08:24 -0500)]
aio_md_test: fix cleanup
ATF cleanup functions cannot use functions such as ATF_REQUIRE
and atf_tc_fail. These functions assert that a test case is
currently running, which is not true during cleanup, so the
process aborts. Change the cleanup function to simply print
to stderr and return.
Eric van Gyzen [Thu, 27 May 2021 16:33:22 +0000 (11:33 -0500)]
libprocstat kstack: fix race with thread creation
When collecting kernel stacks for a target process, if the process
adds a thread between the two calls to sysctl, ignore the additional
threads. Previously, procstat would print only a useless error
message. Now, it prints a consistent snapshot of the stacks.
We know that snapshot is already stale, but it could still be stale
even with a more complex fix to reallocate and retry, so such a fix
is hardly worth the effort.
Eric van Gyzen [Mon, 26 Apr 2021 15:01:17 +0000 (10:01 -0500)]
Wait longer for a previous IPI to be sent
When sending an IPI, if a previous IPI is still pending delivery,
native_lapic_ipi_vectored() waits for the previous IPI to be sent.
We've seen a few inexplicable panics with the current timeout of 50 ms.
Increase the timeout to 1 second and make it tunable.
No hardware specification mentions a timeout in this case; I checked
the Intel SDM, Intel MP spec, and Intel x2APIC spec. Linux and illumos
wait forever. In Linux, see __default_send_IPI_shortcut() in
arch/x86/kernel/apic/ipi.c. In illumos, see apic_send_ipi() in
usr/src/uts/i86pc/io/pcplusmp/apic_common.c. However, misbehaving hardware
could hang the system if we wait forever.
Eric van Gyzen [Tue, 6 Apr 2021 14:36:52 +0000 (09:36 -0500)]
uefisign: fix handling of errors from child proc
Close the unused pipe file descriptors so the parent will notice if
the child exits prematurely. Previously, the parent would block
forever on a read from the pipe.
Mark Johnston [Tue, 22 Feb 2022 14:26:33 +0000 (09:26 -0500)]
riscv: Fix another race in pmap_pinit()
Commit c862d5f2a789 ("riscv: Fix a race in pmap_pinit()") did not really
fix the race. Alan writes,
Suppose that N entries in the L1 tables are in use, and we are in the
middle of the memcpy(). Specifically, we have read the zero-filled
(N+1)st entry from the kernel L1 table. Then, we are preempted. Now,
another core/thread does pmap_growkernel(), which fills the (N+1)st
entry. Finally, we return to the original core/thread, and overwrite
the valid entry with the zero that we earlier read.
Try to fix the race properly, by copying kernel L1 entries while holding
the allpmaps lock. To avoid doing unnecessary work while holding this
global lock, copy only the entries that we expect to be valid.
Fixes: c862d5f2a789 ("riscv: Fix a race in pmap_pinit()")
Reported by: alc, jrtc27
Reviewed by: alc
Sponsored by: The FreeBSD Foundation
Mark Johnston [Tue, 8 Feb 2022 18:15:54 +0000 (13:15 -0500)]
riscv: Fix a race in pmap_pinit()
All pmaps share the top half of the address space. With 3-level page
tables, the top-level kernel map entries are not static: they might
change if the kernel map is extended (via pmap_growkernel()) or a 1GB
mapping in the direct map is demoted (not implemented yet). Thus the
riscv pmap maintains the allpmaps list to synchronize updates to
top-level entries.
When a pmap is created, it is inserted into this list after copying
top-level entries from the kernel pmap. The copying is done without
holding the allpmaps lock, and it is possible for pmap_pinit() to race
with kernel map updates. In particular, if a thread is modifying L1
entries, and a concurrent pmap_pinit() copies the old version of the
entries, it might not receive the update.
Fix the problem by copying the kernel map entries after inserting the
pmap into the list. This ensures that the nascent pmap always receives
updates, though pmap_distribute_l1() may race with the page copy.
Reviewed by: mhorne, jhb
Sponsored by: The FreeBSD Foundation
Kristof Provost [Sat, 19 Feb 2022 15:34:31 +0000 (16:34 +0100)]
bridge: Don't share broadcast packets
if_bridge duplicates broadcast packets with m_copypacket(), which
creates shared packets. In certain circumstances these packets can be
processed by udp_usrreq.c:udp_input() first, which modifies the mbuf as
part of the checksum verification. That may lead to incorrect packets
being transmitted.
Kristof Provost [Tue, 15 Feb 2022 10:49:39 +0000 (11:49 +0100)]
netinet: allow UDP tunnels to be removed
udp_set_kernel_tunneling() rejects new callbacks if one is already set.
Allow callbacks to be cleared. The use case for this is OpenVPN DCO,
where the socket is opened by userspace and then adopted by the kernel
to run the tunnel. If the DCO interface is removed but userspace does
not close the socket (something the kernel cannot prevent) the installed
callbacks could be called with an invalidated context.
Allow new functions to be set, but only if they're NULL (i.e. allow the
callback functions to be cleared).
Mark Johnston [Fri, 14 Jan 2022 20:03:53 +0000 (15:03 -0500)]
vm_pageout: Print a more accurate message to the console before an OOM kill
Previously we'd always print "out of swap space." This can be
misleading, as there are other reasons an OOM kill can be triggered. In
particular, it's entirely possible to trigger an OOM kill on a system
with plenty of free swap space.
Reviewed by: kib
Sponsored by: The FreeBSD Foundation
Navdeep Parhar [Sat, 12 Feb 2022 00:58:46 +0000 (16:58 -0800)]
cxgbe(4): Fix illegal hardware access in cxgbe_refresh_stats.
cxgbe_refresh_stats takes into account VI_SKIP_STATS but not
VI_INIT_DONE when deciding whether to read the hardware stats. But
before this change VI_SKIP_STATS was set only for VIs with VI_INIT_DONE.
That meant that cxgbe_refresh_stats always accessed the hardware for
uninitialized VIs, and this is a problem if the adapter is suspended or
in the middle of a reset.
Fix this by setting VI_SKIP_STATS on all VIs during suspend. While
here, ignore VI_INIT_DONE in vi_refresh_stats too to be consistent with
cxgbe_refresh_stats.
Navdeep Parhar [Thu, 13 Jan 2022 22:21:49 +0000 (14:21 -0800)]
cxgbe(4): Fix bad races between sysctl and driver detach.
The default sysctl context setup by newbus for a device is eventually
freed by device_sysctl_fini, which runs after the device driver's detach
routine. sysctl nodes associated with this context must not use any
resources (like driver locks, hardware access, counters, etc.) that are
released by driver detach.
There are a lot of sysctl nodes like this in cxgbe(4) and the fix is to
hang them off a context that is explicitly freed by the driver before it
releases any resource that might be used by a sysctl.
This fixes panics when running "sysctl dev.t6nex dev.cc" in a tight loop
and loading/unloading the driver in parallel.
Navdeep Parhar [Wed, 5 Jan 2022 18:45:06 +0000 (10:45 -0800)]
cxgbe(4): Do not request an FEC that is invalid for the requested speed.
This eliminates error messages like this from the driver when running at
50Gbps with 100G cables:
[3726] cc0: l1cfg failed: 71
[4407] cc0: l1cfg failed: 71
Note that link comes up anyway with or without this change.
Navdeep Parhar [Mon, 3 Jan 2022 22:35:45 +0000 (14:35 -0800)]
cxgbe(4): Update firmwares to 1.26.6.0.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
CHANGES
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Version : 1.26.6.0
Date : 01/03/2022
================================================================================
Fixes
-----
BASE:
- Fixed one module eeprom read failure.
- Fixed an issue with speed selection when 40G and 25G are advertised and
supported.
- Fixed a random traffic hang when T5 receives invalid ets BW in dcbx
messages from a switch.
- Fixed very long link up time with few switches.
================================================================================
Navdeep Parhar [Mon, 3 Jan 2022 21:31:46 +0000 (13:31 -0800)]
cxgbe(4): Fix stats collection for ports with port_id != tx_chan
This fixes a driver panic during stats collection when a port's id does
not match its tx channel. The bug affected only the T580 card running
with a non-default VPD.
Navdeep Parhar [Thu, 9 Dec 2021 19:10:48 +0000 (11:10 -0800)]
cxgbe(4): Update firmwares to 1.26.4.0
(Rest is from the README that came with the firmware)
Version : 1.26.4.0
Date : 12/02/2021
Fixes
-----
BASE:
- Fixed error on setting 25G speed on 100G copper with multiple FEC set in
firmware commands.
- Handle link of unknown optics modules by enabling module tx unconditionally.
- Fixed link not coming up for 25G CRS phys. Firmware incorrectly tried to
bring up the link in RS-FEC but as per IEEE spec, it must be BASER FEC.
- Fixed an issue where firmware doesn't automatically retry next FEC if driver
asks to bring up the link using RS-FEC and link doesn't come up.
Navdeep Parhar [Mon, 15 Nov 2021 18:55:04 +0000 (10:55 -0800)]
cxgbe(4): Change the way t4_shutdown_adapter brings the link(s) down.
Modify the GPIO pins only on the Base-T cards and even there drive all
of them low instead of putting them in hi-z state. For the rest (this
is the common case), directly power off the PLLs of the high speed
serdes. This is the simplest method that does not involve or conflict
with the firmware but still works with all T4-T6 cards regardless of
what's plugged into the port.
This fixes a problem where the peer wouldn't always see a link down if
it is connected to the device using a -CR4 copper cable.
Navdeep Parhar [Wed, 10 Nov 2021 19:38:54 +0000 (11:38 -0800)]
cxgbe(4): internal knob for flexible control over FEC selection.
Recent firmwares have support for autonomous FEC selection and a "force"
knob to let the driver control this behavior (or not) in a fine grained
manner. This change adds a driver knob so that all the different ways of
configuring the link FEC can be exercised. Note that this controls the
internal driver/firmware interaction for link configuration and is not
meant for general use.
Navdeep Parhar [Wed, 10 Nov 2021 18:54:53 +0000 (10:54 -0800)]
cxgbe(4): separate sysctls for user-requested and in-use FEC.
Recent firmwares have more leeway in FEC selection and there is a need
to track the FECs requested by the driver separately from the FEC in use
on the link. The existing dev.<port>.<inst>.fec sysctl can read both but
its behavior depends on the link state and it is sometimes hard to find
out what was requested when the link is up.
Split the fec sysctl into two (requested_fec and link_fec) to get access
to both pieces of information regardless of the link state.
Bjoern A. Zeeb [Thu, 24 Feb 2022 21:38:27 +0000 (21:38 +0000)]
iwlwifi: enhance debug information
Add a string of the debug type to the output of the debug message so it
is easier to search for specific events in a trace with lots of debugging
on. While here remove superflous ().
Bjoern A. Zeeb [Tue, 22 Feb 2022 22:48:08 +0000 (22:48 +0000)]
LinuxKPI: change DECLARE_FLEX_ARRAY()
DECLARE_FLEX_ARRAY can be used inside a structure. On FreeBSD due to
-Wgnu-variable-sized-type-not-at-end this yields an error. Use [0]
instead of [] to overcome this.
Sponsored by: The FreeBSD Foundation
Reviewed by: hselasky
Differential Revision: https://reviews.freebsd.org/D34350
Chuck Tuffli [Wed, 23 Feb 2022 15:18:54 +0000 (07:18 -0800)]
bhyve nvme: Advertise Namespace changed AEN
Advertise Namespace Attribute Notices events in the Optional
Asynchronous Events Supported (OAES) field of the Identify Controller
data structure. Additionally, rename the enums and macros to clarify
these are AEN's related to Notices and not generic information.
Chuck Tuffli [Mon, 21 Feb 2022 18:34:14 +0000 (10:34 -0800)]
nvme: Add OAES bit-field definitions
Create definitions for the Optional Asynchronous Events Supported (OAES)
values. Also adds a helper macro for the common use case of "mask and
shift". E.g.
value = NVME_CTRLR_DATA_OAES_NS_ATTR_MASK << NVME_CTRLR_DATA_OAES_NS_ATTR_SHIFT;
becomes
value = NVMEB(NVME_CTRLR_DATA_OAES_NS_ATTR);
1. A delayed_work struct in the WORK_ST_TIMER state.
2. Thread A calls mod_delayed_work()
3. Thread B (a callout thread) simultaneously calls
linux_delayed_work_timer_fn()
The following sequence of events is possible:
A: Call linux_cancel_delayed_work()
A: Change state from TIMER TO CANCEL
B: Change state from CANCEL to TASK
B: taskqueue_enqueue() the task
A: taskqueue_cancel() the task
A: Call linux_queue_delayed_work_on(). This is a no-op because the
state is WORK_ST_TASK.
As a result, the delayed_work struct will never be invoked. This is
causing address resolution in ib_addr.c to stop permanently, as it
never tries to reschedule a task that it thinks is already scheduled.
Fix this by introducing locking into the cancel path (which
corresponds with the lock held while the callout runs). This will
prevent the callout from changing the state of the task until the
cancel is complete, preventing the race.
ip6_setpktopts() can look up ifnets via ifnet_by_index(), which
is only safe in the net epoch. Ensure that callers are in the net
epoch before calling this function.
If the UCL ctld parser encountered a port that used the CTL
ioctl device, it fell into a special case that had an erroneous
early return. This caused all configuration in the target
following the port attribute to be skipped. Fix this by replacing
the return with a continue so that the rest of the config is
parsed correctly.
Ed Maste [Mon, 21 Feb 2022 04:09:36 +0000 (23:09 -0500)]
vt: fix double-click word selection for first/last word on line
Previously when double-clicking on the first word on a line we would
select from the cursor position to the end of the word, not from the
beginning of the line. Similarly, when double-clicking on the last word
on a line we would select from the beginning of the word to the cursor
position rather than the end of the line.
This is because we searched backward or forward for a space character to
mark the beginning or end of a word. Now, use the beginning or end of
the line if we do not find a space.