- Use appropriate mdoc macros
- Document that tcp= is a synonym to rfb= (tcp is used in the examples,
but never mentioned)
- Clarify the IP address specification
bhyve.8: Improve emulation description of the -s flag
- Set width of the list to the longest key word for readability.
- Separate descriptions of amd_hostbridge and hostbridge emulations.
Also, wordsmith their descriptions for consistency with other entries.
- Use Cm instead of Li for command modifiers.
- Do not stylize AMD with Li, there's no need to do it.
- Fix a typo in the definition of ahci-hd ("hard drive" instead of
"hard-drive").
Also, remove the macros of the nested list which contained slot,
emulation and conf. This decreases the indention of the -s description.
It was necessary to clean up the slot description.
bhyve.8: Improve the description and synopsis of -l
- Describe "-l help" separately for readability.
- List all the supported comX devices explicitly
- Use Cm instead of Ar for command modifiers (i.e., literal values a
user can specify as an argument to the command).
- Explain where to get more information about the possible values of the
conf argument.
In particular:
- Sort short options to align with style(9)
- Add two missing flags: -G and -r
- Drop unnecessary angle brackets for consistency
- Rename the "vm" argument to vmname for consistency with the manual
page
It seems that the number of lines is no longer an optional parameter to
the -C flag. Document it accordingly both in the manual page and the
usage message.
Dimitry Andric [Thu, 5 Aug 2021 18:57:22 +0000 (20:57 +0200)]
Add ElfW() macro for compatibility with Linux
Some Linux software using ELF headers assumes the existence of an
ElfW(type) macro, which concatenates 'Elf', the default ELF word size,
and the given type. This is identical to our __ElfN(x) macro in
<sys/elf_generic.h>. Add the macro for compatibility, with a comment
that we prefer the __ElfN() macro for FreeBSD.
Add check that ifp supports IPv6 multicasts in in6_getmulti.
This fixes panic when user application tries to join into multicast
group on an interface that doesn't support IPv6 multicasts, like
IFT_PFLOG interfaces.
Alexander Motin [Fri, 30 Jul 2021 03:16:22 +0000 (23:16 -0400)]
coretemp(4): Switch to smp_rendezvous_cpus().
Use of smp_rendezvous_cpus() instead of sched_bind() allows to not
block indefinitely if target CPU is running some thread with higher
priority, while all we need is single rdmsr/wrmsr instruction call.
I guess it should also be much cheaper than full thread migration.
Alexander Motin [Fri, 30 Jul 2021 03:39:04 +0000 (23:39 -0400)]
ipmi(4): Add more watchdog error checks.
Add request submission status checks before checking req->ir_compcode,
otherwise it may be zero just because of initialization.
Add checks for req->ir_compcode errors in ipmi_reset_watchdog() and
ipmi_set_watchdog(). In first case explicitly check for 0x80, which
means timer was not previously set, that I found happening after BMC
cold reset. This change makes watchdog timer to recover instead of
permanently ignoring reset errors after BMC reset or upgraded.
Mitchell Horne [Wed, 4 Aug 2021 17:31:36 +0000 (14:31 -0300)]
hwpmc: disable uncore class on Sandy Bridge and newer
It was written for Nehalem and Westmere, with minor but incomplete
updates for Sandy Bridge in 78d763a29b15. The uncore architecture
changed significantly with this generation, bringing new layouts and
locations for some MSRs.
Misprogramming these MSRs in ucp_start_pmc() may panic the system, and
this is trivially reproducible via pmcstat(8) on at least Broadwell and
Haswell. Disable the class on these CPUs until it can be updated more
completely and leave a TODO comment detailing some of the work required.
Note that the nclasses value for Broadwell was already incorrect and
doesn't need changing.
The result is that any uncore events listed by pmcstat -L will no longer
be allocatable, but this is already the case for newer generations of
Intel CPUs.
Numerous counters got migrated from straight uint64_t to the counter(9)
API. Unfortunately the implementation comes with a significiant
performance hit on some platforms and cannot be easily fixed.
Work around the problem by implementing a pf-specific variant.
Roy Marples [Wed, 28 Jul 2021 15:46:59 +0000 (08:46 -0700)]
socket: Implement SO_RERROR
SO_RERROR indicates that receive buffer overflows should be handled as
errors. Historically receive buffer overflows have been ignored and
programs could not tell if they missed messages or messages had been
truncated because of overflows. Since programs historically do not
expect to get receive overflow errors, this behavior is not the
default.
This is really really important for programs that use route(4) to keep
in sync with the system. If we loose a message then we need to reload
the full system state, otherwise the behaviour from that point is
undefined and can lead to chasing bogus bug reports.
Alexander Motin [Wed, 28 Jul 2021 20:15:43 +0000 (16:15 -0400)]
Do not expose to scheduler caches of single CPU.
Before this change my dual-Xeon(R) Gold 6242R always reported 3 levels
or topology (root, package/L3 and core/L2). But with SMT disabled
core/L2 matches thread, so additional topology level only causes more
traversal work. With this change SMT case is reported same as before,
while non-SMT is reported with only 2 much more simple levels.
Kristof Provost [Mon, 2 Aug 2021 07:46:33 +0000 (09:46 +0200)]
pf: bound DIOCGETSTATES memory use
Similar to what we did earlier for DIOCGETSTATESV2 we only allocate
enough memory for a handful of states and copy those out, bit by bit,
rather than allocating memory for all states in one go.
pf: locally originating connections with 'route-to' fail
Similar to the REPLY_TO shortcut (6d786845cf) we also can't shortcut
ROUTE_TO. If we do we will fail to apply transformations or update the
state, which can lead to premature termination of the connections.
While nvlists are very useful in maximising flexibility for future
extensions their performance is simply unacceptably bad for the
getstates feature, where we can easily want to export a million states
or more.
The DIOCGETSTATESNV call has been MFCd, but has not hit a release on any
branch, so we can still remove it everywhere.
Conrad Meyer [Fri, 26 Oct 2018 21:03:57 +0000 (21:03 +0000)]
Fortuna: Add failpoints to simulate initial seeding conditions
Set debug.fail_point.random_fortuna_pre_read=return(1) and
debug.fail_point.random_fortuna_seeded=return(1) to return to unseeded
status (sort of). See the Differential URL for more detail.
The goal is to reproduce e.g. Lev's recent CURRENT report[1] about failing
newfs arc4random(3) usage (fixed in r338542).
Conrad Meyer [Fri, 26 Oct 2018 20:55:01 +0000 (20:55 +0000)]
Fortuna: fix a correctness issue in reseed (fortuna_pre_read)
'i' counts the number of pools included in the array 's'. Passing 'i+1' to
reseed_internal() as the number of blocks in 's' is a bogus overrun of the
initialized portion of 's' -- technically UB.
I found this via code inspection, referencing §9.5.2 "Pools" of the Fortuna
chapter, but I would expect Coverity to notice the same issue.
Unfortunately, it doesn't appear to.
Conrad Meyer [Sat, 20 Oct 2018 21:09:12 +0000 (21:09 +0000)]
Fortuna: Fix a race to prevent reseed spamming
If multiple threads enter fortuna_pre_read contemporaneously, such as via
read(2) or getrandom(2), they could race to check how long it has been since
the last update due to a TOCTOU problem with 'now'.
To resolve the race, simply check the current time after we win the lock
race.
If getsbinuptime is perceived to be expensive, another option might be to
just accept the race and validate that fs_lasttime isn't "in the future."
(It should be within the last ~2^31 seconds out of ~2^32 seconds
representable duration.)
virtio: enable VTNET_LEGACY_TX when ALTQ is enabled.
ALTQ only works on network drivers which use if_start (rather than
if_transmit). vtnet uses if_start if built with VTNET_LEGACY_TX. Default
to that the kernel is built with ALTQ enabled, to reduce user surprise.
Alexander Motin [Thu, 1 Jul 2021 19:28:55 +0000 (15:28 -0400)]
mrsas(4): Report more correct maximum I/O size.
Subtract one SGE for the case of misaligned address. Also take into
account maximum number of sectors reported by firmware, that gives
nicer 256KB limit instead of 276KB calculated from the SGE limit.
While there, remove number of I/O size checks, duplicating what is
already checked by CAM and busdma(9).
Warner Losh [Sat, 31 Jul 2021 17:57:21 +0000 (11:57 -0600)]
awk: merge fixes to metamode
This is a partial MFC of c63c5ab001106/r349062. The whole thing doesn't
apply cleanly, but this bit, at least, is needed to fix metamode on
stable/12 after the changes to awk were merged from head. Rather than
risk breaking other things, I'm just merging the bit I know that's
needed. All build tools need to be in DEPENDOBJ so the dependency order
is correct and they get built with host tools.
smartpqi: Maintenance commit of Microchip smartpqi
Newly added features and bug fixes in latest Microchip SmartPQI driver.
1) Newly added TMF feature.
2) Added newly Huawei & Inspur PCI ID's
3) Fixed smartpqi driver hangs in Z-Pool while running on FreeBSD12.1
4) Fixed flooding dmesg in kernel while the controller is offline during
in ioctls.
5) Avoided unnecessary host memory allocation for rcb sg buffers.
6) Fixed race conditions while accessing internal rcb structure.
7) Fixed where Logical volumes exposing two different names to the OS
it's due to the system memory is overwritten with DMA stale data.
8) Fixed dynamically unloading a smartpqi driver.
9) Added device_shutdown callback instead of deprecated shutdown_final
kernel event in smartpqi driver.
10) Fixed where Os is crashed during physical drive hot removal during
heavy IO.
11) Fixed OS crash during controller lockup/offline during heavy IO.
12) Fixed coverity issues in smartpqi driver
13) Fixed system crash while creating and deleting logical volume in a
continuous loop.
14) Fixed where the volume size is not exposing to OS when it expands.
15) Added HC3 pci id's.
16) Fixed compiler issues in 12.2 kernel.
Note: this is a direct commit, submitted by the vendor to support
stable/12
Reviewed by: imp, Murthy Bhat, Scott Benesh
Differential Revision: https://reviews.freebsd.org/D24428
Warner Losh [Mon, 12 Jul 2021 03:26:08 +0000 (21:26 -0600)]
awk: remove proctab.c
proctab.c is a generated file and never should have been committed to
the tree. This file has been added and removed a couple of times, most
recently added by me in my 2019 updates.
Kristof Provost [Tue, 2 Mar 2021 15:57:27 +0000 (16:57 +0100)]
pf tests: Test the match keyword
The new match keyword can currently only assign queues, so we can only
test it with ALTQ.
Set up a basic scenario where we use 'match' to assign ICMP traffic to a
slow queue, and confirm that it's really getting slowed down.
Kristof Provost [Tue, 2 Mar 2021 15:01:04 +0000 (16:01 +0100)]
pf: match keyword support
Support the 'match' keyword.
Note that support is limited to adding queuing information, so without
ALTQ support in the kernel setting match rules is pointless.
For the avoidance of doubt: this is NOT full support for the match
keyword as found in OpenBSD's pf. That could potentially be built on top
of this, but this commit is NOT that.
The general style in sbin/nvmecontrol apppears to print uint64_t types
using %j, so I'm using that instead of the more general (but admittedly
ugly) PRIu64.
Warner Losh [Fri, 2 Jul 2021 22:00:42 +0000 (16:00 -0600)]
nvme: coherently read status of completion records
Coherently read the phase bit of the status completion record. We loop
over the completion record array, looking for all the transactions in
the same phase that have been completed. In doing that, we have to be
careful to read the status field first, and if it indicates a complete
record, we need to read and process that record. Otherwise, the host
might be overtaken by device when reading this completion record,
leading to a mistaken belief that the record is in phase. This leads to
the code using old values and looking at an already completed entry, which
has no current tracker.
To work around this problem, we read the status and make sure it is in
phase, we then re-read the entire completion record guaranteeing it's
complete, valid, and consistent . In addition we resync the dmatag to
reflect changes since the prior loop for the bouncing dma case.
Reviewed by: jrtc27@, chuck@
Found by: jrtc27 (this fix is based in part on her D30995 fix)
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D31002
Warner Losh [Fri, 2 Jul 2021 21:58:19 +0000 (15:58 -0600)]
nvme: Fix alignment on nvme structures
Remove __packed from nvme_command, nvme_completion and
nvme_dsm_trim. Add super-alignment to nvme_completion since it's always
at least that aligned in hardware (and in our existing uses of it
embedded in structures). It generates better code in
nvme_qpair_process_completions on riscv64 because otherwise the ABI
assumes a 4-byte alignment, and the same on all other platforms.
Warner Losh [Sat, 29 May 2021 05:01:52 +0000 (23:01 -0600)]
nvme: fix a race between failing the controller and failing requests
Part of the nvme recovery process for errors is to reset the
card. Sometimes, this results in failing the entire controller. When nda
is in use, we free the sim, which will sleep until all the I/O has
completed. However, with only one thread, the request fail task never
runs once the reset thread sleeps here. Create two threads to allow I/O
to fail until it's all processed and the reset task can proceed.
This is a temporary kludge until I can work out questions that arose
during the review, not least is what was the race that queueing to a
failure task solved. The original commit is vague and other error paths
in the same context do a direct failure. I'll investigate that more
completely before committing changing that to a direct failure. mav@
raised this issue during the review, but didn't otherwise object.
Multiple threads, though, solve the problem in the mean time until other
such means can be perfected.
Warner Losh [Thu, 11 Mar 2021 15:42:44 +0000 (08:42 -0700)]
nvme: use config_intrhook_drain to avoid removable card races
nvme drives are configured early in boot. However, a number of the configuration
steps takes which take a while, so we defer those to a config intrhook that runs
before the root filesystem is mounted. At the same time, the PCI hot plug wakes
up and tests the status of the card. It may decide that the card has gone away
and deletes the child. As part of that process nvme_detach is called. If this
call happens after the config_intrhook starts to run, but before it is finished,
there's a race where we can tear down the device's soft state while the
config_intrhook is still using it. Use the new config_intrhook_drain to
disestablish the hook. Either it will be removed w/o running, or the routine
will wait for it to finish. This closes the race and allows safe hotplug at any
time, even very early in boot.
The NVMe byte-swap routines for big-endian platforms used memcpy() to
move the unaligned 64-bit value into a temp register to byte swap it.
Instead of introducing a dependency, manually byte-swap the values in
place.