Ed Maste [Tue, 2 Mar 2021 22:35:48 +0000 (17:35 -0500)]
growfs: allow operation on RW-mounted filesystems
growfs supports growing mounted filesystems (writes are temporarily
suspended while the grow happens). Drop the check for fs_clean == 0
to restore this case. Leave fs_flags check for FS_UNCLEAN or
FS_NEEDSFSCK which represent the state of the filesystem when it was
mounted, and fsck should be run first if they are set.
PR: 253754
Reviewed by: mckusick
Approved by: re (gjb)
Fixes: 6eb925f8450f ("Filesystem utilities that modify the...")
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D29021
Peter Grehan [Sat, 27 Feb 2021 04:15:04 +0000 (14:15 +1000)]
Import wireguard fixes from pfSense 2.5
Merge the following fixes from https://github.com/pfsense/FreeBSD-src 1940e7d3 Save address of ingress packets to allow wg to work on HA 8f5531f1 Fix connection to IPv6 endpoint 825ed9ee Fix tcpdump for wg IPv6 rx tunnel traffic 2ec232d3 Fix issue with replying to INITIATION messages in server mode ec77593a Return immediately in wg_init if in DETACH'd state 0f0dde6f Remove unnecessary wg debug printf on transmit 2766dc94 Detect and fix case in wg_init() where sockets weren't cleaned up b62cc7ac Close the UDP tunnel sockets when the interface has been stopped
linux: fix handling of flags for 32 bit send(2) syscall
Previously the flags were passed as-is, which could resulted
in spurious EAGAIN returned for non-blocking sockets, which
broke some Steam games.
Approved by: re (gjb)
PR: 248065
Reported By: Alex S <iwtcex@gmail.com>
Tested By: Alex S <iwtcex@gmail.com>
Reviewed By: emaste
MFC After: 3 days
Sponsored By: The FreeBSD Foundation
Use compat.linux.emul_path instead of hardcoded path in /etc/rc.d/linux
In /etc/rc.d/linux the mounting paths of procfs, sysfs and devfs
are hardcoded to "/compat/linux". Switching to the content of
compat.linux.emul_path sysctl would allow to switch linuxulator
to different place.
Approved by: re (gjb)
Submitted by: freebsdnewbie_freenet.de
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D27807
Mark Johnston [Thu, 25 Feb 2021 15:04:44 +0000 (10:04 -0500)]
sendfile: Use the pager size to determine the file extent when possible
Previously sendfile would issue a VOP_GETATTR and use the returned size,
i.e., the file size. When paging in file data, sendfile_swapin() will
use the pager to determine whether it needs to zero-fill, most often
because of a hole in a sparse file. An attempt to page in beyond the
end of a file is treated this way, and occurs when the requested page is
past the end of the pager. In other words, both the file size and pager
size were used interchangeably.
With ZFS, updates to the pager and file sizes are not synchronized by
the exclusive vnode lock, at least partially due to its use of
MNTK_SHARED_WRITES. In particular, the pager size is updated after the
file size, so in the presence of a writer concurrently extending the
file, sendfile could incorrectly instantiate "holes" in the page cache
pages backing the file, which manifests as data corruption when reading
the file back from the page cache. The on-disk copy is unaffected.
Fix this by consistently using the pager size when available.
Approved by: re (gjb)
Reported by: dumbbell
Reviewed by: chs, kib
Tested by: dumbbell, pho
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D28811
tcp: various improvements and fixes to PRR
- Ensure cwnd doesn't shrink to zero with PRR
- use accurate rfc6675_pipe when enabled
- Address two incorrect calculations and enhance readability
- address second instance of cwnd potentially becoming zero
- fix sublte bug due to implicit int to uint typecase in max()
- fix bug due to typo in hand-coded CEILING() function by using howmany() macro
- use int instead of long, and add a missing long typecast
- replace if conditionals with easier to read imax/imin (as in pseudocode)
- Avoid accounting left-edge twice in partial ACK.
- Include new data sent in PRR calculation
- Improve PRR initial transmission timing
- Fix prr_out when pipe < ssthresh
Kyle Evans [Fri, 26 Feb 2021 21:46:47 +0000 (15:46 -0600)]
jail: allow root to implicitly widen its cpuset to attach
The default behavior for attaching processes to jails is that the jail's
cpuset augments the attaching processes, so that it cannot be used to
escalate a user's ability to take advantage of more CPUs than the
administrator wanted them to.
This is problematic when root needs to manage jails that have disjoint
sets with whatever process is attaching, as this would otherwise result
in a deadlock. Therefore, if we did not have an appropriate common
subset of cpus/domains for our new policy, we now allow the process to
simply take on the jail set *if* it has the privilege to widen its mask
anyways.
With the new logic, root can still usefully cpuset a process that
attaches to a jail with the desire of maintaining the set it was given
pre-attachment while still retaining the ability to manage child jails
without jumping through hoops.
A test has been added to demonstrate the issue; cpuset of a process
down to just the first CPU and attempting to attach to a jail without
access to any of the same CPUs previously resulted in EDEADLK and now
results in taking on the jail's mask for privileged users.
Thomas Skibo [Mon, 11 Jan 2021 20:58:12 +0000 (16:58 -0400)]
ddb: fix show devmap output on 32-bit arm
The output has been broken since 1b6dd6d772ca. Casting to uintmax_t
before the call to printf is necessary to ensure that 32-bit addresses
are interpreted correctly.
Rick Macklem [Sun, 28 Feb 2021 01:54:05 +0000 (17:54 -0800)]
nfsclient: fix panic in cache_enter_time()
Juraj Lutter (otis@) reported a panic "dvp != vp not true" in
cache_enter_time() called from the NFS client's nfsrpc_readdirplus()
function.
This is specific to an NFSv3 mount with the "rdirplus" mount
option. Unlike NFSv4, NFSv3 replies to ReaddirPlus
includes entries for the current directory.
This trivial patch avoids doing a cache_enter_time()
call for the current directory to avoid the panic.
Mark Johnston [Wed, 24 Feb 2021 02:15:50 +0000 (21:15 -0500)]
rmlock: Add a required compiler membar to the rlock slow path
The tracker flags need to be loaded only after the tracker is removed
from its per-CPU queue. Otherwise, readers may fail to synchronize with
pending writers attempting to propagate priority to active readers, and
readers and writers deadlock on each other. This was observed in a
stable/12-based armv7 kernel where the compiler had reordered the load
of rmp_flags to before the stores updating the queue.
Improve the handling of INIT chunks in specific szenarios and
report and appropriate error cause.
Thanks to Anatoly Korniltsev for reporting the issue for the
userland stack.
Michael Tuexen [Sun, 14 Feb 2021 11:10:31 +0000 (12:10 +0100)]
tcp: improve behaviour when using TCP_NOOPT
Use ISS for SEG.SEQ when sending a SYN-ACK segment in response to
an SYN segment received in the SYN-SENT state on a socket having
the IPPROTO_TCP level socket option TCP_NOOPT enabled.
Approved by: re (gjb)
Reviewed by: rscheff
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D28656
Michael Tuexen [Tue, 9 Feb 2021 22:35:55 +0000 (23:35 +0100)]
libsysdecode: fix decoding of TCP_NOPUSH and TCP_MD5SIG
TCP_FASTOPEN_MIN_COOKIE_LEN was incorrectly registered as a name of
a IPPROTO_TCP level socket option, which overwrote TCP_NOPUSH.
TCP_FASTOPEN_PSK_LEN was incorrectly registered as a name of an
IPPROTO_TCP level socket option, which overwrote TCP_MD5SIG.
Emmanuel Vadot [Thu, 25 Feb 2021 15:34:28 +0000 (16:34 +0100)]
mkimg: We always want the last block of the last inserted partition
Even with an absolute offset we want to know the last block the partition
otherwise we endup with an image the size of the metadata.
This allow to create image with the ESP placed at a specific position which
is useful on arm/arm64 where u-boot have always a hard time to read the ESP
if it's not aligned on 512k.
mkimg -v -o sdcard -s gpt -p efi::54M:1M -p freebsd-ufs::1G
now works.
Warner Losh [Thu, 11 Feb 2021 23:06:30 +0000 (16:06 -0700)]
efibootmgr: Check for efi supported after parsing args
Move the check for efi variables being supported to after parsing the args. This
allows '-h' to produce both as a normal user as well as on all systems.
Warner Losh [Wed, 17 Feb 2021 22:08:19 +0000 (15:08 -0700)]
uart: only use MSI on devices that advertise 1 MSI vector
This updates r311987/fb1d9b7f4113d which allowed any number of vectors to be
used. Since we're just attaching one instance, the meaning of more than one
vector is not clear and seems to cause problems. Fall back to old methods for
these cards.
Warner Losh [Fri, 19 Feb 2021 22:34:25 +0000 (15:34 -0700)]
boot: remove gptboot.efifat, it never should have been
conical hat reduction: Make sure we also remove gotboot.efifat. It was created,
briefly, and shouldn't have existed in the first place. Kill it at the same
place we kill boot1.efifat.
Warner Losh [Mon, 8 Feb 2021 19:29:20 +0000 (12:29 -0700)]
hid: bump HID_ITEM_MAXUSAGES to 8
My YOGA requires a minimum of 7 to parse w/o an error. Since the memory savings
are trivial and the yoga a popular system, bump the default up to 8. There's no
API/ABI issues in doing this. This hid_item struct isn't exported to userland
and the one libusbhid has is different and only shares a name...
Warner Losh [Mon, 8 Feb 2021 21:43:25 +0000 (14:43 -0700)]
acpi: limit the AMDI0020/AMDI0010 workaround to an option
It appears that production versions of EPYC firmware get the _STA method right
for these nodes. In fact, this workaround breaks on production hardware by
including too many uart nodes. This work around was for pre-release hardware
that wound up not having a large deployment. Move this work around to a kernel
option since the machines that needed it have been powered off and are difficult
to resurrect. Should there be a more significant deployment than is understood,
we can restrict it based on smbios strings.
Mark Johnston [Thu, 25 Feb 2021 23:49:47 +0000 (18:49 -0500)]
pmap: Fix largemap restart checks in the kernel_maps sysctl handler
The purpose of these checks is to ensure that the address of the
next-level page table page is valid, since nothing is synchronizing with
a concurrent update of the large map and large map PTPs are freed to the
system. However, if PG_PS is set, there is no next level.
Approved by: re (gjb)
Reported by: rpokala
Reviewed by: kib
Tested by: rpokala
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D28922
Mark Johnston [Wed, 24 Feb 2021 15:08:53 +0000 (10:08 -0500)]
iflib: Avoid double counting in rxeof
iflib_rxeof() was counting everything twice. This was introduced when
pfil hooks were added to the iflib receive path. We want to count rx
packets/bytes before the pfil hooks are executed, so remove the counter
adjustments that are executed after.
Approved by: re (gjb)
PR: 253583
Reviewed by: gallatin, erj
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D28900
Handle partial data re-sending on ktls/sendfile on FreeBSD
Add a handler for EBUSY sendfile error in addition to
EAGAIN. With EBUSY returned the data still can be partially
sent and user code has to be notified about it, otherwise it
may try to send data multiple times.
Robert Watson [Tue, 16 Feb 2021 15:19:05 +0000 (15:19 +0000)]
Reimplement FreeBSD/arm64 dtrace_gethrtime() to use the system timer.
dtrace_gethrtime() is the high-resolution nanosecond timestemp used
for the DTrace 'timestamp' built-in variable. The new implementation
uses the EL0 cycle counter and frequency registers in ARMv8-A. This
replaces a previous implementation that relied on an
instrumentation-safe implementation of getnanotime(), which provided
only timer resolution.
Mitchell Horne [Thu, 28 Jan 2021 17:49:47 +0000 (13:49 -0400)]
arm64: extend struct db_reg to include watchpoint registers
The motivation is to provide access to these registers from userspace
via ptrace(2) requests PT_GETDBREGS and PT_SETDBREGS.
This change breaks the ABI of these particular requests, but is
justified by the fact that the intended consumers (debuggers) have not
been taught to use them yet. Making this change now enables active
upstream work on lldb to begin using this interface, and take advantage
of the hardware debugging registers available on the platform.
PR: 252860
Reported by: Michał Górny (mgorny@gentoo.org)
Reviewed by: andrew, markj (earlier version)
Tested by: Michał Górny (mgorny@gentoo.org)
Sponsored by: The FreeBSD Foundation
Approved by: re (gjb)
Mitchell Horne [Fri, 5 Feb 2021 21:46:48 +0000 (17:46 -0400)]
arm64: handle watchpoint exceptions from EL0
This is a prerequisite to allowing the use of hardware watchpoints for
userspace debuggers.
This is also a slight departure from the x86 behaviour, since `si_addr`
returns the data address that triggered the watchpoint, not the
address of the instruction that was executed. Otherwise, there is no
straightforward way for the application to determine which watchpoint
was triggered. Make a note of this in the siginfo(3) man page.
Reviewed by: jhb, markj (earlier version)
Tested by: Michał Górny (mgorny@gentoo.org)
Sponsored by: The FreeBSD Foundation
Approved by: re (gjb)
Mitchell Horne [Tue, 9 Feb 2021 18:29:38 +0000 (14:29 -0400)]
arm64: validate breakpoint registers
In particular, we want to disallow setting breakpoints on kernel
addresses from userspace. The control register fields are validated or
ignored as appropriate.
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
Approved by: re (gjb)
Allan Jude [Sun, 14 Feb 2021 18:39:09 +0000 (18:39 +0000)]
Use iflib_if_init_locked() during media change instead of iflib_init_locked().
iflib_init_locked() assumes that iflib_stop() has been called, however,
it is not called for media changes.
iflib_if_init_locked() calls stop then init, so fixes the problem.
PR: 253473
Sponsored by: Juniper Networks, Inc., Klara, Inc.
Approved by: re (gjb)
The previous implementation was reported to try to coalesce packets
in situations when it should not, that resulted in assertion later.
This implementation better checks the first packet of the chain for
the coallescing elligibility.
Martin Matuska [Mon, 22 Feb 2021 17:37:47 +0000 (18:37 +0100)]
zfs: disable use of hardware crypto offload drivers
From openzfs-master e7adccf7f commit message:
First, the crypto request completion handler contains a bug in that it
fails to reset fs_done correctly after the request is completed. This
is only a problem for asynchronous drivers. Second, some hardware
drivers have input constraints which ZFS does not satisfy. For
instance, ccp(4) apparently requires the AAD length for AES-GCM to be a
multiple of the cipher block size, and with qat(4) the AES-GCM AAD
length may not be longer than 240 bytes. FreeBSD's generic crypto
framework doesn't have a mechanism to automatically fall back to a
software implementation if a hardware driver cannot process a request,
and ZFS does not tolerate such errors.
Martin Matuska [Mon, 22 Feb 2021 17:05:07 +0000 (18:05 +0100)]
zfs: fix panic if scrubbing after removing a slog device
From openzfs-master 11f2e9a4 commit message:
vdev_ops: don't try to call vdev_op_hold or vdev_op_rele when NULL
This prevents a panic after a SLOG add/removal on the root pool followed
by a zpool scrub.
When a SLOG is removed, a hole takes its place - the vdev_ops for a hole
is vdev_hole_ops, which defines the handler functions of vdev_op_hold
and vdev_op_rele as NULL.
Patch Author: Patrick Mooney <pmooney@pfmooney.com>
Toomas Soome [Sun, 21 Feb 2021 10:32:18 +0000 (12:32 +0200)]
loader: autoload_font will hung loader when there is no local console
If we start with console set to comconsole, the local
console (vidconsole, efi) is never initialized and attempt to
use the data can render the loader hung.
Mark Johnston [Mon, 22 Feb 2021 20:50:09 +0000 (15:50 -0500)]
vm_kern: Avoid sign extension in the KVA_QUANTUM definition
Otherwise, on a powerpc64 NUMA system with hashed page tables, the
first-level superpage reservation size is large enough that the value of
the kernel KVA arena import quantum, KVA_NUMA_IMPORT_QUANTUM, is
negative and gets sign-extended when passed to vmem_set_import(). This
results in a boot-time hang on such platforms.
Toomas Soome [Thu, 28 Jan 2021 07:45:47 +0000 (09:45 +0200)]
loader: unload command should reset tg_kernel_supported in gfx_state
While loading kernel, we check if vt/vbe backend support is included in
kernel and set the tg_kernel_supported flag in gfx_state. unload
command needs to reset this flag to allow next load to perform
this check with new kernel.
Dimitry Andric [Wed, 27 Jan 2021 21:28:43 +0000 (22:28 +0100)]
Fix loader detection of vbefb support on !amd64
On i386, after 6c7a932d0b8baaaee16eca0ba061bfa6e0e57bfd, the vbefb vt
driver was no longer detected by the loader, if any kernel module was
loaded after the kernel itself.
This was caused by the parse_vt_drv_set() function being called multiple
times, resetting the detection flag. (It was called multiple times,
becuase i386 .ko files are shared objects like the kernel proper, while
this is not the case on amd64.)
Fix this by skipping the set_vt_drv_set lookup if vbefb was already
detected.