Alexander Motin [Mon, 9 Aug 2021 01:34:33 +0000 (21:34 -0400)]
Optimize res_find().
When the device name is provided, we can simply run strncmp() for each
line to quickly skip unrelated ones, that is much faster than sscanf()
and only then strcmp().
Mark Johnston [Mon, 16 Aug 2021 17:15:25 +0000 (13:15 -0400)]
sigtimedwait: Use a unique wait channel for sleeping
When a sigtimedwait(2) caller goes to sleep, it uses a wait channel of
p->p_sigacts with the proc lock as the interlock. However, p_sigacts
can be shared between processes if a child is created with
rfork(RFSIGSHARE | RFPROC). Thus we can end up with two threads
sleeping on the same wait channel using different locks, which is not
permitted.
Fix the problem simply by using a process-unique wait channel, following
the example of sigsuspend. The actual wait channel value is irrelevant
here, sleeping threads are awoken using sleepq_abort().
Reported by: syzbot+8c417afabadb50bb8827@syzkaller.appspotmail.com
Reported by: syzbot+1d89fc2a9ef92ef64fa8@syzkaller.appspotmail.com
Reviewed by: kib
Sponsored by: The FreeBSD Foundation
Alan Somers [Wed, 21 Jul 2021 21:11:00 +0000 (15:11 -0600)]
Escape any '.' characters in sysctl node names
ZFS creates some sysctl nodes that include a pool name, and '.' is an
allowed character in pool names. But it's the separator in the sysctl
tree, so it can't be included in a sysctl name. Replace it with "%25".
Handily, "%" is illegal in ZFS pool names, so there's no ambiguity
there.
Alexander Motin [Sun, 8 Aug 2021 22:19:08 +0000 (18:19 -0400)]
kbdmux(4): Make callout handler mpsafe.
Both callout and taskqueue now have drain() routines not requiring
external locking. It allows to remove TASK flag and manual drain,
so the only thing remaining for lock to protect inside the callout
handler is ks_inq_length zero comparison, that can be lockless.
kern: ether_gen_addr: randomize on default hostuuid, too
Currently, this will still hash the default (all zero) hostuuid and
potentially arrive at a MAC address that has a high chance of collision
if another interface of the same name appears in the same broadcast
domain on another host without a hostuuid, e.g., some virtual machine
setups.
Instead of using the default hostuuid, just treat it as a failure and
generate a random LA unicast MAC address.
Kevin Bowling [Tue, 10 Aug 2021 19:47:22 +0000 (12:47 -0700)]
e1000: rctl/srrctl buffer size init, rfctl fix
Simplify the setup of srrctl.BSIZEPKT on igb class NICs.
Improve the setup of rctl.BSIZE on lem and em class NICs.
Don't try to touch rfctl on lem class NICs.
Manipulate rctl.BSEX correctly on lem and em class NICs.
Marius Strobl [Sat, 23 Jan 2021 18:18:28 +0000 (19:18 +0100)]
e1000: consistently use the hw variables
It's rather confusing when adapter->hw and hw are mixed and matched
within a particular function.
Some of this was missed in cd1cf2fc1d49c509ded05dcd41b7600a5957fb9a
and r353778 respectively.
- Use appropriate mdoc macros
- Document that tcp= is a synonym to rfb= (tcp is used in the examples,
but never mentioned)
- Clarify the IP address specification
bhyve.8: Improve emulation description of the -s flag
- Set width of the list to the longest key word for readability.
- Separate descriptions of amd_hostbridge and hostbridge emulations.
Also, wordsmith their descriptions for consistency with other entries.
- Use Cm instead of Li for command modifiers.
- Do not stylize AMD with Li, there's no need to do it.
- Fix a typo in the definition of ahci-hd ("hard drive" instead of
"hard-drive").
Also, remove the macros of the nested list which contained slot,
emulation and conf. This decreases the indention of the -s description.
It was necessary to clean up the slot description.
bhyve.8: Improve the description and synopsis of -l
- Describe "-l help" separately for readability.
- List all the supported comX devices explicitly
- Use Cm instead of Ar for command modifiers (i.e., literal values a
user can specify as an argument to the command).
- Explain where to get more information about the possible values of the
conf argument.
In particular:
- Sort short options to align with style(9)
- Add two missing flags: -G and -r
- Drop unnecessary angle brackets for consistency
- Rename the "vm" argument to vmname for consistency with the manual
page
It seems that the number of lines is no longer an optional parameter to
the -C flag. Document it accordingly both in the manual page and the
usage message.
Dimitry Andric [Thu, 5 Aug 2021 18:57:22 +0000 (20:57 +0200)]
Add ElfW() macro for compatibility with Linux
Some Linux software using ELF headers assumes the existence of an
ElfW(type) macro, which concatenates 'Elf', the default ELF word size,
and the given type. This is identical to our __ElfN(x) macro in
<sys/elf_generic.h>. Add the macro for compatibility, with a comment
that we prefer the __ElfN() macro for FreeBSD.
Add check that ifp supports IPv6 multicasts in in6_getmulti.
This fixes panic when user application tries to join into multicast
group on an interface that doesn't support IPv6 multicasts, like
IFT_PFLOG interfaces.
Alexander Motin [Fri, 30 Jul 2021 03:16:22 +0000 (23:16 -0400)]
coretemp(4): Switch to smp_rendezvous_cpus().
Use of smp_rendezvous_cpus() instead of sched_bind() allows to not
block indefinitely if target CPU is running some thread with higher
priority, while all we need is single rdmsr/wrmsr instruction call.
I guess it should also be much cheaper than full thread migration.
Alexander Motin [Fri, 30 Jul 2021 03:39:04 +0000 (23:39 -0400)]
ipmi(4): Add more watchdog error checks.
Add request submission status checks before checking req->ir_compcode,
otherwise it may be zero just because of initialization.
Add checks for req->ir_compcode errors in ipmi_reset_watchdog() and
ipmi_set_watchdog(). In first case explicitly check for 0x80, which
means timer was not previously set, that I found happening after BMC
cold reset. This change makes watchdog timer to recover instead of
permanently ignoring reset errors after BMC reset or upgraded.
Mitchell Horne [Wed, 4 Aug 2021 17:31:36 +0000 (14:31 -0300)]
hwpmc: disable uncore class on Sandy Bridge and newer
It was written for Nehalem and Westmere, with minor but incomplete
updates for Sandy Bridge in 78d763a29b15. The uncore architecture
changed significantly with this generation, bringing new layouts and
locations for some MSRs.
Misprogramming these MSRs in ucp_start_pmc() may panic the system, and
this is trivially reproducible via pmcstat(8) on at least Broadwell and
Haswell. Disable the class on these CPUs until it can be updated more
completely and leave a TODO comment detailing some of the work required.
Note that the nclasses value for Broadwell was already incorrect and
doesn't need changing.
The result is that any uncore events listed by pmcstat -L will no longer
be allocatable, but this is already the case for newer generations of
Intel CPUs.
Numerous counters got migrated from straight uint64_t to the counter(9)
API. Unfortunately the implementation comes with a significiant
performance hit on some platforms and cannot be easily fixed.
Work around the problem by implementing a pf-specific variant.
Roy Marples [Wed, 28 Jul 2021 15:46:59 +0000 (08:46 -0700)]
socket: Implement SO_RERROR
SO_RERROR indicates that receive buffer overflows should be handled as
errors. Historically receive buffer overflows have been ignored and
programs could not tell if they missed messages or messages had been
truncated because of overflows. Since programs historically do not
expect to get receive overflow errors, this behavior is not the
default.
This is really really important for programs that use route(4) to keep
in sync with the system. If we loose a message then we need to reload
the full system state, otherwise the behaviour from that point is
undefined and can lead to chasing bogus bug reports.
Alexander Motin [Wed, 28 Jul 2021 20:15:43 +0000 (16:15 -0400)]
Do not expose to scheduler caches of single CPU.
Before this change my dual-Xeon(R) Gold 6242R always reported 3 levels
or topology (root, package/L3 and core/L2). But with SMT disabled
core/L2 matches thread, so additional topology level only causes more
traversal work. With this change SMT case is reported same as before,
while non-SMT is reported with only 2 much more simple levels.
Kristof Provost [Mon, 2 Aug 2021 07:46:33 +0000 (09:46 +0200)]
pf: bound DIOCGETSTATES memory use
Similar to what we did earlier for DIOCGETSTATESV2 we only allocate
enough memory for a handful of states and copy those out, bit by bit,
rather than allocating memory for all states in one go.
pf: locally originating connections with 'route-to' fail
Similar to the REPLY_TO shortcut (6d786845cf) we also can't shortcut
ROUTE_TO. If we do we will fail to apply transformations or update the
state, which can lead to premature termination of the connections.
While nvlists are very useful in maximising flexibility for future
extensions their performance is simply unacceptably bad for the
getstates feature, where we can easily want to export a million states
or more.
The DIOCGETSTATESNV call has been MFCd, but has not hit a release on any
branch, so we can still remove it everywhere.
Conrad Meyer [Fri, 26 Oct 2018 21:03:57 +0000 (21:03 +0000)]
Fortuna: Add failpoints to simulate initial seeding conditions
Set debug.fail_point.random_fortuna_pre_read=return(1) and
debug.fail_point.random_fortuna_seeded=return(1) to return to unseeded
status (sort of). See the Differential URL for more detail.
The goal is to reproduce e.g. Lev's recent CURRENT report[1] about failing
newfs arc4random(3) usage (fixed in r338542).
Conrad Meyer [Fri, 26 Oct 2018 20:55:01 +0000 (20:55 +0000)]
Fortuna: fix a correctness issue in reseed (fortuna_pre_read)
'i' counts the number of pools included in the array 's'. Passing 'i+1' to
reseed_internal() as the number of blocks in 's' is a bogus overrun of the
initialized portion of 's' -- technically UB.
I found this via code inspection, referencing ยง9.5.2 "Pools" of the Fortuna
chapter, but I would expect Coverity to notice the same issue.
Unfortunately, it doesn't appear to.
Conrad Meyer [Sat, 20 Oct 2018 21:09:12 +0000 (21:09 +0000)]
Fortuna: Fix a race to prevent reseed spamming
If multiple threads enter fortuna_pre_read contemporaneously, such as via
read(2) or getrandom(2), they could race to check how long it has been since
the last update due to a TOCTOU problem with 'now'.
To resolve the race, simply check the current time after we win the lock
race.
If getsbinuptime is perceived to be expensive, another option might be to
just accept the race and validate that fs_lasttime isn't "in the future."
(It should be within the last ~2^31 seconds out of ~2^32 seconds
representable duration.)
virtio: enable VTNET_LEGACY_TX when ALTQ is enabled.
ALTQ only works on network drivers which use if_start (rather than
if_transmit). vtnet uses if_start if built with VTNET_LEGACY_TX. Default
to that the kernel is built with ALTQ enabled, to reduce user surprise.