Michael Tuexen [Tue, 9 Feb 2021 22:35:55 +0000 (23:35 +0100)]
libsysdecode: fix decoding of TCP_NOPUSH and TCP_MD5SIG
TCP_FASTOPEN_MIN_COOKIE_LEN was incorrectly registered as a name of
a IPPROTO_TCP level socket option, which overwrote TCP_NOPUSH.
TCP_FASTOPEN_PSK_LEN was incorrectly registered as a name of an
IPPROTO_TCP level socket option, which overwrote TCP_MD5SIG.
Emmanuel Vadot [Thu, 25 Feb 2021 15:34:28 +0000 (16:34 +0100)]
mkimg: We always want the last block of the last inserted partition
Even with an absolute offset we want to know the last block the partition
otherwise we endup with an image the size of the metadata.
This allow to create image with the ESP placed at a specific position which
is useful on arm/arm64 where u-boot have always a hard time to read the ESP
if it's not aligned on 512k.
mkimg -v -o sdcard -s gpt -p efi::54M:1M -p freebsd-ufs::1G
now works.
Warner Losh [Thu, 11 Feb 2021 23:06:30 +0000 (16:06 -0700)]
efibootmgr: Check for efi supported after parsing args
Move the check for efi variables being supported to after parsing the args. This
allows '-h' to produce both as a normal user as well as on all systems.
Warner Losh [Wed, 17 Feb 2021 22:08:19 +0000 (15:08 -0700)]
uart: only use MSI on devices that advertise 1 MSI vector
This updates r311987/fb1d9b7f4113d which allowed any number of vectors to be
used. Since we're just attaching one instance, the meaning of more than one
vector is not clear and seems to cause problems. Fall back to old methods for
these cards.
Warner Losh [Fri, 19 Feb 2021 22:34:25 +0000 (15:34 -0700)]
boot: remove gptboot.efifat, it never should have been
conical hat reduction: Make sure we also remove gotboot.efifat. It was created,
briefly, and shouldn't have existed in the first place. Kill it at the same
place we kill boot1.efifat.
Warner Losh [Mon, 8 Feb 2021 19:29:20 +0000 (12:29 -0700)]
hid: bump HID_ITEM_MAXUSAGES to 8
My YOGA requires a minimum of 7 to parse w/o an error. Since the memory savings
are trivial and the yoga a popular system, bump the default up to 8. There's no
API/ABI issues in doing this. This hid_item struct isn't exported to userland
and the one libusbhid has is different and only shares a name...
Warner Losh [Mon, 8 Feb 2021 21:43:25 +0000 (14:43 -0700)]
acpi: limit the AMDI0020/AMDI0010 workaround to an option
It appears that production versions of EPYC firmware get the _STA method right
for these nodes. In fact, this workaround breaks on production hardware by
including too many uart nodes. This work around was for pre-release hardware
that wound up not having a large deployment. Move this work around to a kernel
option since the machines that needed it have been powered off and are difficult
to resurrect. Should there be a more significant deployment than is understood,
we can restrict it based on smbios strings.
Mark Johnston [Thu, 25 Feb 2021 23:49:47 +0000 (18:49 -0500)]
pmap: Fix largemap restart checks in the kernel_maps sysctl handler
The purpose of these checks is to ensure that the address of the
next-level page table page is valid, since nothing is synchronizing with
a concurrent update of the large map and large map PTPs are freed to the
system. However, if PG_PS is set, there is no next level.
Approved by: re (gjb)
Reported by: rpokala
Reviewed by: kib
Tested by: rpokala
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D28922
Mark Johnston [Wed, 24 Feb 2021 15:08:53 +0000 (10:08 -0500)]
iflib: Avoid double counting in rxeof
iflib_rxeof() was counting everything twice. This was introduced when
pfil hooks were added to the iflib receive path. We want to count rx
packets/bytes before the pfil hooks are executed, so remove the counter
adjustments that are executed after.
Approved by: re (gjb)
PR: 253583
Reviewed by: gallatin, erj
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D28900
Handle partial data re-sending on ktls/sendfile on FreeBSD
Add a handler for EBUSY sendfile error in addition to
EAGAIN. With EBUSY returned the data still can be partially
sent and user code has to be notified about it, otherwise it
may try to send data multiple times.
Robert Watson [Tue, 16 Feb 2021 15:19:05 +0000 (15:19 +0000)]
Reimplement FreeBSD/arm64 dtrace_gethrtime() to use the system timer.
dtrace_gethrtime() is the high-resolution nanosecond timestemp used
for the DTrace 'timestamp' built-in variable. The new implementation
uses the EL0 cycle counter and frequency registers in ARMv8-A. This
replaces a previous implementation that relied on an
instrumentation-safe implementation of getnanotime(), which provided
only timer resolution.
Mitchell Horne [Thu, 28 Jan 2021 17:49:47 +0000 (13:49 -0400)]
arm64: extend struct db_reg to include watchpoint registers
The motivation is to provide access to these registers from userspace
via ptrace(2) requests PT_GETDBREGS and PT_SETDBREGS.
This change breaks the ABI of these particular requests, but is
justified by the fact that the intended consumers (debuggers) have not
been taught to use them yet. Making this change now enables active
upstream work on lldb to begin using this interface, and take advantage
of the hardware debugging registers available on the platform.
PR: 252860
Reported by: Michał Górny (mgorny@gentoo.org)
Reviewed by: andrew, markj (earlier version)
Tested by: Michał Górny (mgorny@gentoo.org)
Sponsored by: The FreeBSD Foundation
Approved by: re (gjb)
Mitchell Horne [Fri, 5 Feb 2021 21:46:48 +0000 (17:46 -0400)]
arm64: handle watchpoint exceptions from EL0
This is a prerequisite to allowing the use of hardware watchpoints for
userspace debuggers.
This is also a slight departure from the x86 behaviour, since `si_addr`
returns the data address that triggered the watchpoint, not the
address of the instruction that was executed. Otherwise, there is no
straightforward way for the application to determine which watchpoint
was triggered. Make a note of this in the siginfo(3) man page.
Reviewed by: jhb, markj (earlier version)
Tested by: Michał Górny (mgorny@gentoo.org)
Sponsored by: The FreeBSD Foundation
Approved by: re (gjb)
Mitchell Horne [Tue, 9 Feb 2021 18:29:38 +0000 (14:29 -0400)]
arm64: validate breakpoint registers
In particular, we want to disallow setting breakpoints on kernel
addresses from userspace. The control register fields are validated or
ignored as appropriate.
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
Approved by: re (gjb)
Allan Jude [Sun, 14 Feb 2021 18:39:09 +0000 (18:39 +0000)]
Use iflib_if_init_locked() during media change instead of iflib_init_locked().
iflib_init_locked() assumes that iflib_stop() has been called, however,
it is not called for media changes.
iflib_if_init_locked() calls stop then init, so fixes the problem.
PR: 253473
Sponsored by: Juniper Networks, Inc., Klara, Inc.
Approved by: re (gjb)
The previous implementation was reported to try to coalesce packets
in situations when it should not, that resulted in assertion later.
This implementation better checks the first packet of the chain for
the coallescing elligibility.
Martin Matuska [Mon, 22 Feb 2021 17:37:47 +0000 (18:37 +0100)]
zfs: disable use of hardware crypto offload drivers
From openzfs-master e7adccf7f commit message:
First, the crypto request completion handler contains a bug in that it
fails to reset fs_done correctly after the request is completed. This
is only a problem for asynchronous drivers. Second, some hardware
drivers have input constraints which ZFS does not satisfy. For
instance, ccp(4) apparently requires the AAD length for AES-GCM to be a
multiple of the cipher block size, and with qat(4) the AES-GCM AAD
length may not be longer than 240 bytes. FreeBSD's generic crypto
framework doesn't have a mechanism to automatically fall back to a
software implementation if a hardware driver cannot process a request,
and ZFS does not tolerate such errors.
Martin Matuska [Mon, 22 Feb 2021 17:05:07 +0000 (18:05 +0100)]
zfs: fix panic if scrubbing after removing a slog device
From openzfs-master 11f2e9a4 commit message:
vdev_ops: don't try to call vdev_op_hold or vdev_op_rele when NULL
This prevents a panic after a SLOG add/removal on the root pool followed
by a zpool scrub.
When a SLOG is removed, a hole takes its place - the vdev_ops for a hole
is vdev_hole_ops, which defines the handler functions of vdev_op_hold
and vdev_op_rele as NULL.
Patch Author: Patrick Mooney <pmooney@pfmooney.com>
Toomas Soome [Sun, 21 Feb 2021 10:32:18 +0000 (12:32 +0200)]
loader: autoload_font will hung loader when there is no local console
If we start with console set to comconsole, the local
console (vidconsole, efi) is never initialized and attempt to
use the data can render the loader hung.
Mark Johnston [Mon, 22 Feb 2021 20:50:09 +0000 (15:50 -0500)]
vm_kern: Avoid sign extension in the KVA_QUANTUM definition
Otherwise, on a powerpc64 NUMA system with hashed page tables, the
first-level superpage reservation size is large enough that the value of
the kernel KVA arena import quantum, KVA_NUMA_IMPORT_QUANTUM, is
negative and gets sign-extended when passed to vmem_set_import(). This
results in a boot-time hang on such platforms.
Toomas Soome [Thu, 28 Jan 2021 07:45:47 +0000 (09:45 +0200)]
loader: unload command should reset tg_kernel_supported in gfx_state
While loading kernel, we check if vt/vbe backend support is included in
kernel and set the tg_kernel_supported flag in gfx_state. unload
command needs to reset this flag to allow next load to perform
this check with new kernel.
Dimitry Andric [Wed, 27 Jan 2021 21:28:43 +0000 (22:28 +0100)]
Fix loader detection of vbefb support on !amd64
On i386, after 6c7a932d0b8baaaee16eca0ba061bfa6e0e57bfd, the vbefb vt
driver was no longer detected by the loader, if any kernel module was
loaded after the kernel itself.
This was caused by the parse_vt_drv_set() function being called multiple
times, resetting the detection flag. (It was called multiple times,
becuase i386 .ko files are shared objects like the kernel proper, while
this is not the case on amd64.)
Fix this by skipping the set_vt_drv_set lookup if vbefb was already
detected.
Roger Pau Monné [Wed, 20 Jan 2021 18:40:51 +0000 (19:40 +0100)]
xen-blkback: fix leak of grant maps on ring setup failure
Multi page rings are mapped using a single hypercall that gets passed
an array of grants to map. One of the grants in the array failing to
map would lead to the failure of the whole ring setup operation, but
there was no cleanup of the rest of the grant maps in the array that
could have likely been created as a result of the hypercall.
Add proper cleanup on the failure path during ring setup to unmap any
grants that could have been created.
This is part of XSA-361.
Approved by: re (implicit, so)
Approved by: so
Security: CVE-2021-26932
Security: FreeBSD-SA-21:06.xen
Security: XSA-361
Sponsored by: Citrix Systems R&D
Alexander Motin [Wed, 17 Feb 2021 02:15:28 +0000 (21:15 -0500)]
cxgbe(4): Save proper zone index on low memory in refill_fl().
When refill_fl() fails to allocate large (9/16KB) mbuf cluster, it
falls back to safe (4KB) ones. But it still saved into sd->zidx
the original fl->zidx instead of fl->safe_zidx. It caused problems
with the later use of that cluster, including memory and/or data
corruption.
While there, make refill_fl() to use the safe zone for all following
clusters for the call, since it is unlikely that large succeed.
Mark Johnston [Wed, 17 Feb 2021 15:49:38 +0000 (10:49 -0500)]
libdtrace: Stop relying on lex compatibility
It does not appear to be required, and as of commit 6b7e592c215f
("lex: Do not let input() return 0 when end-of-file is reached") it
causes input to return 0 instead of EOF when end-of-input is reached.
Approved by: re (gjb)
PR: 253440
Sponsored by: The FreeBSD Foundation
Jung-uk Kim [Wed, 17 Feb 2021 07:22:47 +0000 (02:22 -0500)]
lex: Do not let input() return 0 when end-of-file is reached
Importing flex 2.6.4 has introduced a regression: input() now returns 0
instead of EOF to indicate that the end of input was reached, just like
traditional AT&T and POSIX lex. Note the behavior contradicts flex(1).
See "INCOMPATIBILITIES WITH LEX AND POSIX" section for information.
This incompatibility traces back to the original version and documented
in its manual page by the Vern Paxson.
Apparently, it has been reported in a few places, e.g.,
Unfortunately, this also breaks the scanner used by libdtrace and
dtrace is unable to resolve some probe argument types as a result. See
PR253440 for more information.
Note the regression was introduced by the following upstream commit
without any explanation or documentation change:
Now we restore the traditional flex behavior unless lex-compatibility
mode is set with "-l" option because I believe the author originally
wanted to make it more lex and POSIX compatible.
PR: 253440
Reported by: markj
Approved by: re (gjb)
Jamie Gritton [Fri, 19 Feb 2021 22:13:35 +0000 (14:13 -0800)]
MFS jail: Change both root and working directories in jail_attach(2)
jail_attach(2) performs an internal chroot operation, leaving it up to
the calling process to assure the working directory is inside the jail.
Add a matching internal chdir operation to the jail's root. Also
ignore kern.chroot_allow_open_directories, and always disallow the
operation if there are any directory descriptors open.
Reported by: mjg
Approved by: re (gjb), markj, kib
Kyle Evans [Fri, 12 Feb 2021 00:58:27 +0000 (18:58 -0600)]
pkg(7): address minor nits (mostly clang-analyze complaints)
- One (1) spurious whitespace.
- One (1) occurrence of "random(3) bad, arc4random(3)" good.
- Three (3) writes that will never be seen.
The latter two points are complaints from clang-analyze. Switching to
arc4random(3) is decidedly a good idea because we weren't doing any kind
of PRNG seeding anyways. The discarded assignments are arguably good
for future-proofing, but it's better to improve the S/N ratio from
clang-analyze.
Michal Krawczyk [Thu, 18 Feb 2021 09:00:58 +0000 (10:00 +0100)]
MFC 1c808fcd859f: Allocate BAR for ENA MSIx vector table
In the new ENA-based instances like c6gn, the vector table moved to a
new PCIe bar - BAR1. Previously it was always located on the BAR0, so
the resources were already allocated together with the registers.
As the FreeBSD isn't doing any resource allocation behind the scenes,
the driver is responsible to allocate them explicitly, before other
parts of the OS (like the PCI code allocating MSIx) will be able to
access them.
To determine dynamically BAR on which the MSIx vector table is present
the pci_msix_table_bar() is being used and the new BAR is allocated if
needed.
Approved by: re (gjb)
Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc
Alan Somers [Mon, 15 Feb 2021 22:51:31 +0000 (15:51 -0700)]
libpmc: fix linking with C programs
Revision r334749 Added some C++ code to libpmc. It didn't change the ABI,
but it did introduce a dependency on libc++. Nobody noticed because every
program that in the base system that uses libpmc is also C++.
Approved by: re (gjb)
Reported-by: Dom Dwyer <dom@itsallbroken.com>
Reviewed By: vangyzen
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D28550
Jamie Gritton [Tue, 16 Feb 2021 19:19:13 +0000 (11:19 -0800)]
MFS jail: Handle a possible race between jail_remove(2) and fork(2)
jail_remove(2) includes a loop that sends SIGKILL to all processes
in a jail, but skips processes in PRS_NEW state. Thus it is possible
the a process in mid-fork(2) during jail removal can survive the jail
being removed.
Add a prison flag PR_REMOVE, which is checked before the new process
returns. If the jail is being removed, the process will then exit.
Also check this flag in jail_attach(2) which has a similar issue.
Approved by: re (gjb)
Reported by: trasz
Approved by: kib
Toomas Soome [Mon, 8 Feb 2021 18:49:09 +0000 (20:49 +0200)]
loader: remove BORDER_PIXELS
BORDER_PIXELS is left over from picking up the source from illumos
port. Since FreeBSD VT does not use border in terminal size
calculation, there is no reason why should loader use it.
MFC 12148d4300db:
Fix for locking order reversal in USB audio driver, when using mmap().
Locking the second lock which causes the LOR, can be skipped because
the code updating the shared variables is always executing from the
same USB thread.
lock order reversal:
1st 0xfffff80005cc3840 pcm7:play:dsp7.p0 (pcm play channel, sleep mutex)
@ usb_transfer.c:2342
2nd 0xfffff80005cc3860 pcm7:record:dsp7.r0 (pcm record channel, sleep mutex)
@ uaudio.c:2317
lock order pcm record channel -> pcm play channel established at:
witness_checkorder+0x461
__mtx_lock_flags+0x98
dsp_mmap_single+0x151
vm_mmap_cdev+0x65
devfs_mmap_f+0x143
kern_mmap_req+0x594
sys_mmap+0x46
amd64_syscall+0x12e
fast_syscall_common+0xf8
lock order pcm play channel -> pcm record channel attempted at:
witness_checkorder+0xd82
__mtx_lock_flags+0x98
uaudio_chan_play_callback+0xeb
usbd_callback_wrapper+0x7ec
usb_command_wrapper+0x7e
usb_callback_proc+0x8e
usb_process+0xf3
fork_exit+0x80
fork_trampoline+0xe
Approved by: re (gjb)
Found by: Stefan Ehmann <shoesoft@gmx.net>
Sponsored by: Mellanox Technologies // NVIDIA Networking
Martin Matuska [Mon, 15 Feb 2021 08:10:01 +0000 (09:10 +0100)]
zfs: Avoid updating the L2ARC device header unnecessarily
From openzfs-master 0ae184a6b commit message:
If we do not write any buffers to the cache device and the evict hand
has not advanced do not update the cache device header.
Martin Matuska [Mon, 15 Feb 2021 07:40:27 +0000 (08:40 +0100)]
zfs: fix RAIDZ2/3 not healing parity with 2+ bad disks
From openzfs-master 62d4287f2 commit message:
When scrubbing, (non-sequential) resilvering, or correcting a checksum
error using RAIDZ parity, ZFS should heal any incorrect RAIDZ parity by
overwriting it. For example, if P disks are silently corrupted (P being
the number of failures tolerated; e.g. RAIDZ2 has P=2), `zpool scrub`
should detect and heal all the bad state on these disks, including
parity. This way if there is a subsequent failure we are fully
protected.
With RAIDZ2 or RAIDZ3, a block can have silent damage to a parity
sector, and also damage (silent or known) to a data sector. In this
case the parity should be healed but it is not.