Ryan Moeller [Sat, 27 Feb 2021 03:05:31 +0000 (03:05 +0000)]
sbin/ifconfig: Get lagg status with libifconfig
Also trimmed an unused block of code that never prints out LAGG_PROTOS.
Reviewed by: kp (earlier version)
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D28961
Ryan Moeller [Fri, 26 Feb 2021 23:40:58 +0000 (23:40 +0000)]
sbin/ifconfig: Get carp status with libifconfig
A trivial change now that ifconfig is already using libifconfig.
Reviewed by: kp (earlier version)
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D28955
Rick Macklem [Sun, 28 Feb 2021 01:54:05 +0000 (17:54 -0800)]
nfsclient: fix panic in cache_enter_time()
Juraj Lutter (otis@) reported a panic "dvp != vp not true" in
cache_enter_time() called from the NFS client's nfsrpc_readdirplus()
function.
This is specific to an NFSv3 mount with the "rdirplus" mount
option. Unlike NFSv4, NFSv3 replies to ReaddirPlus
includes entries for the current directory.
This trivial patch avoids doing a cache_enter_time()
call for the current directory to avoid the panic.
Mateusz Guzik [Sat, 27 Feb 2021 22:23:23 +0000 (22:23 +0000)]
cache: temporarily drop the assert that dvp != vp when adding an entry
Historically it was allowed for any names, but arguably should never be
even attempted. Allow it again since there is a release pending and
allowing it is bug-compatible with previous behavior.
Robert Wing [Sat, 27 Feb 2021 21:07:35 +0000 (12:07 -0900)]
bhyve/snapshot: rename and bump size of MAX_SNAPSHOT_VMNAME
MAX_SNAPSHOT_VMNAME is a macro used to set the size of a character
buffer that stores a filename or the path to a file - this file is used
by the save/restore feature.
Since the file doesn't have anything to do with a vm name, rename
MAX_SNAPSHOT_VMNAME to MAX_SNAPSHOT_FILENAME. Bump the size to PATH_MAX
while here.
Robert Wing [Sat, 27 Feb 2021 21:05:52 +0000 (12:05 -0900)]
bhyvectl: reduce code duplication
Combine send_start_checkpoint() and send_start_suspend() into a
single function named snapshot_request().
snapshot_request() is equivalent to send_start_checkpoint() and
send_start_suspend() except that it takes an additional argument. The
additional argument, enum ipc_opcode, is used to determine the type of
snapshot request being performed. Also, switch to using strlcpy instead
of strncpy.
Robert Watson [Sat, 27 Feb 2021 16:22:26 +0000 (16:22 +0000)]
Add a comment on why the call to mac_vnode_relabel() might be in the wrong
place -- in the VOP rather than vn_setexttr() -- and that it is for historic
reasons. We might wish to relocate it in due course, but this way at least
we document the asymmetry.
Alexander Motin [Sat, 27 Feb 2021 15:14:05 +0000 (10:14 -0500)]
Micro-optimize OOA queue processing.
- Move ctl_get_cmd_entry() calls from every OOA traversal to when
the requests first inserted, storing seridx in struct ctl_scsiio.
- Move some checks out of the loop in ctl_check_ooa().
- Replace checks for errors that can not happen with asserts.
- Transpose ctl_serialize_table, so that any OOA traversal accessed
only one row (cache line). Compact it from enum to uint8_t.
- Optimize static branch predictions in hottest places.
Due to O(n) nature on deep LUN queues this can be the hottest code
path in CTL, and additional 20% of IOPS I see in some 4KB I/O tests
are good to have in reserve. About 50% of CPU time here according
to the profiles is now spent in two memory accesses per traversed
request in OOA.
Martin Matuska [Fri, 26 Feb 2021 21:52:41 +0000 (22:52 +0100)]
zfs: add missing checks for unsupported features
After the merge of OpenZFS master-9312e0fd1 it has become possible to
import ZFS pools witn an active org.illumos:edonr feature on FreeBSD,
leading to a panic.
In addition, "zpool status" reported all pools without edonr as upgradable
and "zpool upgrade -v" lists edonr in the list of upgradable features.
This is an accepted but not yet included bugfix by upstream.
inetd: Add examples from manual page and other sources
The manual page lists a bunch of examples, some of which already exist
in this file. Since it's both easier to remember when all examples are
listed in the same location, move examples so they get installed into
/etc/inetd.conf
This also means users won't have to copy-paste, but can simply
uncomment one or more services to use them.
As such, it also becomes necessary to remove the examples from the
manual page, so instead add a note explaining where the previous
examples as well as others may be found.
Cross-references, including to ports, have also been added where
applicable.
The rsync example has lived in the bug tracker for too long,
considering how useful it can situationally be, for example when
backup jobs on client devices are run through periodic(8) weekly.
The microsoft-ds entry is necessary for Windows 10 compatibility
(this can be confirmed with packet capturing, as it is not readily
documented at time of writing).
While here, remove two examples for which compatible daemons could not
be found in ports.
Submitted by: David Yeske <dyeske at gmail.com> (in part, prev ver)
PR: 122037
Reviewed by: kevans, brueffer, lwhsu, yuripv
Differential Revision: https://reviews.freebsd.org/D28882
Warner Losh [Fri, 26 Feb 2021 18:43:35 +0000 (11:43 -0700)]
cam: add new ASC and ASCQ values related to drive depopulation
Add 04/25 Depopulation restoration in progress, 31/04 Depopulation failed, and
31/05 Depopulation restoration failed.
These are defined in SPC-6r2 (though 31/4 was added in an earlier draft). They
relate to different aspects of in-progress or failed depopulation removal and
restoration commands.
Mark Johnston [Thu, 25 Feb 2021 23:49:47 +0000 (18:49 -0500)]
pmap: Fix largemap restart checks in the kernel_maps sysctl handler
The purpose of these checks is to ensure that the address of the
next-level page table page is valid, since nothing is synchronizing with
a concurrent update of the large map and large map PTPs are freed to the
system. However, if PG_PS is set, there is no next level.
Reported by: rpokala
Reviewed by: kib
Tested by: rpokala
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D28922
Adrian Chadd [Thu, 25 Feb 2021 21:06:03 +0000 (13:06 -0800)]
[ar71xx] Fix routerstation / routerstation pro redboot FIS probing
Some changes back in ye olde times somewhere has changed the default
block size the flash device exposes. So, the default geom redboot
FIS probing (to find the partition table structure in flash!)
is no longer finding it.
So, force it to probe at the last 64k of flash regardless of the
underlying flash block size.
Brandon Bergren [Thu, 25 Feb 2021 18:55:58 +0000 (12:55 -0600)]
[PowerPC64LE] pseries: Fix input buffering logic.
In uart_phyp_get(), when the internal buffer is empty, we make a
hypercall to retrieve up to 16 bytes of input data from the
hypervisor. As this is specified to be returned in BE format, we need
to do a 64-bit byte swap on the first and second half of the data.
If the buffer being passed in was insufficient to return the fetched
data, we store the remainder in the internal buffer and use it to
satisfy the following calls to uart_phyp_get() until it is drained.
However, in this case, we were accidentally byteswapping the internal
buffer again.
Move the byteswapping code to just after the hypercall so it only gets
swapped when we're filling the buffer.
Fixes arrow keys in qemu on pseries, among other console oddities.
Sponsored by: Tag1 Consulting, Inc.
MFC after: 3 days
Ryan Libby [Thu, 25 Feb 2021 20:11:19 +0000 (12:11 -0800)]
Close races in vm object chain traversal for unlock
We were unlocking the vm object before reading the backing_object field.
In the meantime, the object could be freed and reused. This could cause
us to go off the rails in the object chain traversal, failing to unlock
the rest of the objects in the original chain and corrupting the lock
state of the victim chain.
Cy Schubert [Thu, 25 Feb 2021 19:04:50 +0000 (11:04 -0800)]
rc: fix rc script parsing
77e1ccbee3ed6c837929e4e232fd07f95bfc8294 introduced a bug whereby
rc scripts in etc/rc.d and $local_startup failed to parse output
from called commands because IFS was set to " " instead of the
default " \t\n". This caused parsing of output that contains any
whitespace character, such as tabs and newlines, not matching just a
space to fail.
Modify lock_delay() to increase the delay time after spinning
Modify lock_delay() to increase the delay time after spinning,
not before. Previously we would spin at least twice instead of once.
In NetApp's benchmarks this fixes a performance regression compared
to FreeBSD 10, which called cpu_spinwait() directly.
Reviewed By: mjg
Sponsored by: NetApp, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D27331
A TCP server has to take into consideration, if TCP_NOOPT is preventing
the negotiation of TCP features. This affects most TCP options but
adherance to RFC7323 with the timestamp option will prevent a session
from getting established.
PR: 253576
Reviewed By: tuexen, #transport
MFC after: 3 days
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D28652
Address two incorrect calculations and enhance readability of PRR code
- address second instance of cwnd potentially becoming zero
- fix sublte bug due to implicit int to uint typecase in max()
- fix bug due to typo in hand-coded CEILING() function by using howmany() macro
- use int instead of long, and add a missing long typecast
- replace if conditionals with easier to read imax/imin (as in pseudocode)
Reviewed By: #transport, kbowling
MFC after: 3 days
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D28813
Emmanuel Vadot [Thu, 25 Feb 2021 15:34:28 +0000 (16:34 +0100)]
mkimg: We always want the last block of the last inserted partition
Even with an absolute offset we want to know the last block the partition
otherwise we endup with an image the size of the metadata.
This allow to create image with the ESP placed at a specific position which
is useful on arm/arm64 where u-boot have always a hard time to read the ESP
if it's not aligned on 512k.
mkimg -v -o sdcard -s gpt -p efi::54M:1M -p freebsd-ufs::1G
now works.
Mark Johnston [Thu, 25 Feb 2021 15:04:44 +0000 (10:04 -0500)]
buf: Fix the dirtybufthresh check
dirtybufthresh is a watermark, slightly below the high watermark for
dirty buffers. When a delayed write is issued, the dirtying thread will
start flushing buffers if the dirtybufthresh watermark is reached. This
helps ensure that the high watermark is not reached, otherwise
performance will degrade as clustering and other optimizations are
disabled (see buf_dirty_count_severe()).
When the buffer cache was partitioned into "domains", the dirtybufthresh
threshold checks were not updated. Fix this.
Mark Johnston [Thu, 25 Feb 2021 15:04:44 +0000 (10:04 -0500)]
sendfile: Use the pager size to determine the file extent when possible
Previously sendfile would issue a VOP_GETATTR and use the returned size,
i.e., the file size. When paging in file data, sendfile_swapin() will
use the pager to determine whether it needs to zero-fill, most often
because of a hole in a sparse file. An attempt to page in beyond the
end of a file is treated this way, and occurs when the requested page is
past the end of the pager. In other words, both the file size and pager
size were used interchangeably.
With ZFS, updates to the pager and file sizes are not synchronized by
the exclusive vnode lock, at least partially due to its use of
MNTK_SHARED_WRITES. In particular, the pager size is updated after the
file size, so in the presence of a writer concurrently extending the
file, sendfile could incorrectly instantiate "holes" in the page cache
pages backing the file, which manifests as data corruption when reading
the file back from the page cache. The on-disk copy is unaffected.
Fix this by consistently using the pager size when available.
ipfw: make algo name argument optional for some table types
Most of table types currently supported by ipfw have only one
algorithm implementation. When user creates such tables, allow
to omit algo name in arguments. E.g. now it is possible:
ipfw table T1 create type number
ipfw table T2 create type iface
ipfw table T3 create type flow
Along with the termcap database, ncurses will now lookup for the
terminfo database, note that the terminfo database is being looked
up first and then it fallsback on the termcap one.
While here drop our custom reader for the termcap database, over the
time it is needed maintenance to be able to catchup with changes on ncurses
side.
Install the ncurses tools which are needed to deal with the terminfo
database: tic, infocmp, toe
Replace our termcap only aware tools with the ncurses counterpart:
tput, tabs, tset, clear and reset
In particular they can your the extra capabilities described in the
terminfo database, which does not exist in termcap
Note that to add a new terminfo information to the database from ports
the ports will just need to add their extra information into:
/usr/local/share/site-terminfo/<firstletteroftheterm>/<term>
Andrew Turner [Fri, 5 Feb 2021 11:41:17 +0000 (11:41 +0000)]
Limit when we call DELAY from KCSAN on amd64
In some cases the DELAY implementation on amd64 can recurse on a spin
mutex in the i8254 early delay code. Detect when this is going to
happen and don't call delay in this case. It is safe to not delay here
with the only issue being KCSAN may not detect data races.
Andrew Turner [Tue, 23 Feb 2021 12:34:45 +0000 (12:34 +0000)]
Use pmap_qenter in the N1SDP PCIe driver
In the Neoverse N1 SDP PCIe driver we need to map a page shared
between the firmware and the kernel. Previously we would use
pmap_kenter for this, however as this is not standardised between
architectures switch to the common pmap_qenter.
While here fix the error handling code to clean up on failure.
Reviewed by: br
Sponsored by: Innovate UK
Differential Revision: https://reviews.freebsd.org/D28890
Kristof Provost [Wed, 24 Feb 2021 15:40:37 +0000 (16:40 +0100)]
bridge tests: Test that we also forward on some interfaces
Ensure that we not only block on some interfaces, but also forward on
some. Without the previous commit we wound up discarding on all ports,
rather than only on the ports needed to break the loop.
Make sure PD_KILL isn't passed to do_jail_attach, where it might end
up trying to kill the caller's prison (even prison0).
Fix the child jail loop in prison_deref_kill, which was doing the
post-order part during the pre-order part. That's not a system-
killer, but make jails not always die correctly.
Nathan Whitehorn [Thu, 25 Feb 2021 02:16:56 +0000 (21:16 -0500)]
Use makefs(8) in release VM-image generation instead of md(4) and newfs.
Using makefs instead reduces the privileges needed to build VM images,
simplifies the script (no need to copy files to a fresh image at the end),
and improves portability by allowing generation of cross-endian images.
As a result of the last, this patch also adds support for generation of
powerpc64 and powerpc64le VM images.
No other changes to the output. Tested and working for both amd64 and
powerpc64 targets.
Ryan Libby [Wed, 24 Feb 2021 23:56:16 +0000 (15:56 -0800)]
ofed: quiet gcc -Wint-in-bool-context
The int in the argument to the ternary triggered -Wint-in-bool-context
from gcc. Upstream linux has a larger and more entangled patch, 12f727721eee61b3d19dedb95cb893b2baa9fe41, which doesn't apply cleanly.
When we eventually sync that, we can just drop this change.
Ryan Libby [Wed, 24 Feb 2021 23:56:16 +0000 (15:56 -0800)]
ddb: reliably fail with ambiguous commands
db_cmd_match had an even/odd bug, where if a third command was partially
matched (or any odd number greater than one) the search result would be
set back from CMD_AMBIGUOUS to CMD_FOUND, causing the last command in
the list to be executed instead of failing the match.
Max Laier [Wed, 24 Feb 2021 23:56:16 +0000 (15:56 -0800)]
vm pqbatch: move unmanaged page assert under pagequeue lock
This KASSERT is overzealous because of the following race condition:
1) A managed page which is currently in PQ_LAUNDRY is freed.
vm_page_free_prep calls vm_page_dequeue_deferred()
The page state is:
PQ_LAUNDRY, PGA_DEQUEUE|PGA_ENQUEUED
2) The laundry worker comes around and pick up the page and calls
vm_pageout_defer(m, PQ_LAUNDRY, true) to check if page is still in the
queue. We do a vm_page_astate_load and get
PQ_LAUNDRY, PGA_DEQUEUE|PGA_ENQUEUED
as per above.
3) The laundry worker is pre-empted and another thread allocates our page
from the free pool. For example vm_page_alloc_domain_after calls
vm_page_dequeue() and sets VPO_UNMANAGED because we are allocating for
an OBJT_UNMANAGED object.
The page state is:
PQ_NONE, 0 - VPO_UNMANAGED
4) The laundry worker resumes, and processes vm_pageout_defer based on the
stale astate which leads to a call to vm_page_pqbatch_submit, which will
trip on the KASSERT.
Marcin Wojtas [Fri, 22 Jan 2021 12:13:03 +0000 (13:13 +0100)]
Enable PIE by default on 64-bit architectures
This patch adds Position Independent Executables (PIE)
flags for building OS. It allows to enable the ASLR
feature based only on the sysctl knobs, without
need to rebuild the image. Tests showed that
no problems with stability / performance degradation
were seen when using PIEs with ASLR disabled.
The change is limited only for 64-bit architectures.
Use bsd.opts.mk instead of the src.opts.mk in order
to satisfy all build dependencies related to MK_PIE.
Marcin Wojtas [Tue, 23 Feb 2021 12:42:26 +0000 (13:42 +0100)]
Disable PIE for powerpc bootloaders.
Bootloaders for powerpc are not built as position independent
code. Since bsd.prog.mk is used for building, when PIE is enabled,
the PIE flags are added and that causes the build to fail.
Adding MK_PIE=no stops bsd.prog.mk from adding PIE specific flags.
Marcin Wojtas [Fri, 12 Feb 2021 15:41:49 +0000 (16:41 +0100)]
Disable PIE for MIPS ubldr
When performing buildworld for MIPS with PIE enabled, the build fails
with "position-independent code requires '-mabicalls'" message.
-mno-abicalls and -fno-pic flags are explicitly set in MIPS ubldr
makefile, so to work around this problem, set MK_PIE=no for MIPS
ubldr.
Nathan Whitehorn [Wed, 24 Feb 2021 15:31:44 +0000 (10:31 -0500)]
Add GPT PREP-boot type to mkimg(1) from geom_gpt.
This partition type can be used to boot some PowerKVM VMs. We don't
support it well because of some limitations in SLOF, but it's worth at
least have feature parity in geom and mkimg.
Mark Johnston [Wed, 24 Feb 2021 15:08:53 +0000 (10:08 -0500)]
iflib: Avoid double counting in rxeof
iflib_rxeof() was counting everything twice. This was introduced when
pfil hooks were added to the iflib receive path. We want to count rx
packets/bytes before the pfil hooks are executed, so remove the counter
adjustments that are executed after.
PR: 253583
Reviewed by: gallatin, erj
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D28900
Call softdep_prealloc() before taking ffs_lock_ea(), if unlock is committing
softdep_prealloc() must be called to ensure enough journal space is
available, before ffs_extwrite(). Also it must be done before taking
ffs_lock_ea(), because it calls ffs_syncvnode(), potentially dropping
the vnode lock.
Reviewed by: mckusick
Tested by: pho
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
ffs_lock_ea is after the vnode lock, so vnode must not be relocked under
lock_ea. Move ffs_truncate() call in ffs_close_ea() after the lock_ea is
dropped, and only truncate to length zero, since this is the only mode
supported by ffs_truncate() for EAs. Previously code did truncation and
then write.
Zero the part of the ext area that is unused, if truncation is due but not
done because ea area is not zero-length.
Reviewed by: mckusick
Tested by: pho
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
ffs: do not call softdep_prealloc() from UFS_BALLOC()
Do it in ffs_write(), where we can gracefuly handle relock and its
consequences. In particular, recheck the v_data to see if the vnode
reclamation ended, and return EBADF when we cannot proceed with the
write.
Reviewed by: mckusick
Reported by: pho
MFC after: 1 week
Sponsored by: The FreeBSD Foundation