Mitchell Horne [Thu, 6 Jan 2022 19:40:16 +0000 (15:40 -0400)]
livedump: add event handler hooks
Add three hooks to the livedump process: before, after, and for each
block of dumped data. This allows, for example, quiescing the system
before the dump begins or protecting data of interest to ensure its
consistency in the final output.
Mitchell Horne [Tue, 23 Mar 2021 20:47:14 +0000 (17:47 -0300)]
Add new vnode dumper to support live minidumps
This dumper can instantiate and write the dump's contents to a
file-backed vnode.
Unlike existing disk or network dumpers, the vnode dumper should not be
invoked during a system panic, and therefore is not added to the global
dumper_configs list. Instead, the vnode dumper is constructed ad-hoc
when a live dump is requested using the new ioctl on /dev/mem. This is
similar in spirit to a kgdb session against the live system via
/dev/mem.
As described briefly in the mem(4) man page, live dumps are not
guaranteed to result in a usuable output file, but offer some debugging
value where forcefully panicing a system to dump its memory is not
desirable/feasible.
A future change to savecore(8) will add an option to save a live dump.
Reviewed by: markj, Pau Amma <pauamma@gundo.com> (manpages)
Discussed with: kib
MFC after: 3 weeks
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D33813
Mitchell Horne [Mon, 9 Aug 2021 17:21:07 +0000 (14:21 -0300)]
Split out dumper allocation from list insertion
Add a new function, dumper_create(), to allocate a dumper.
dumper_insert() will call this function and retains the existing
behaviour.
This is desirable for performing live dumps of the system. Here, there
is a need to allocate and configure a dumper structure that is invoked
outside of the typical debugger context. Therefore, it should be
excluded from the list of panic-time dumpers.
free_single_dumper() is made public and renamed to dumper_destroy().
Reviewed by: kib, markj
MFC after: 1 week
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D34068
Eric van Gyzen [Fri, 6 Aug 2021 15:38:51 +0000 (10:38 -0500)]
netdump: send key before dump, in case dump fails
Previously, if an encrypted netdump failed, such as due to a timeout or
network failure, the key was not saved, so a partial dump was
completely useless.
Send the key first, so the partial dump can be decrypted, because even a
partial dump can be useful.
The kernel commit cited below restructured ib device management
so that the device kobject is initialized in ib_alloc_device.
As part of the restructuring, the kobject is now initialized in
procedure ib_alloc_device, and is later added to the device hierarchy
in the ib_register_device call stack, in procedure
ib_device_register_sysfs (which calls device_add).
However, in the ib_device_register_sysfs error flow, if an error
occurs following the call to device_add, the cleanup procedure
device_unregister is called. This call results in the device object
being deleted -- which results in various use-after-free crashes.
The correct cleanup call is device_del -- which undoes device_add
without deleting the device object.
The device object will then (correctly) be deleted in the
ib_register_device caller's error cleanup flow, when the caller invokes
ib_dealloc_device.
Doug Moore [Mon, 6 Jun 2022 21:26:01 +0000 (16:26 -0500)]
iommu_gas: restrict tree search to promising paths
In iommu_gas_lowermatch and iommu_gas_uppermatch, a subtree search is
quickly terminated if the largest available free space in the subtree
is below a limit, where that limit is related to the size of the
allocation request. However, that limit is too small; it does not
account for both of the guard pages that will surround the allocated
space, but only for one of them. Consequently, it permits the search
to proceed through nodes that cannot produce a successful allocation
for all the requested space. Fix that limit to improve search
performance.
Warner Losh [Sun, 21 Nov 2021 16:05:07 +0000 (09:05 -0700)]
loader: abstract boot services exiting to libefi function
Move direct call of ExitBootServices to efi_exit_boot_services. This
function sets boot_services_active to false so callers don't have to do
it everywhere (though currently only loader/bootinfo.c is affected).
Warner Losh [Thu, 4 Nov 2021 15:34:20 +0000 (09:34 -0600)]
efi: switch boot_services_gone to boot_services_active
Turn the presence or absence of boot services into a positive bool (and
change its type to bool). Move declaration to efi.h in the global
variables section.
Cy Schubert [Mon, 20 Jun 2022 14:21:55 +0000 (07:21 -0700)]
wpa: Restore missing patch
In December after a failed MFV due to a now understood issue I had with
git -- git aborts with extremely large MFV -- this patch was removed
during the revert. Restore this patch.
Martin Matuska [Sat, 25 Jun 2022 07:13:20 +0000 (09:13 +0200)]
zfs: merge openzfs/zfs@6c3c5fcfb (zfs-2.1-release) into stable/13
OpenZFS release 2.1.5
Notable upstream pull requeset merges:
#12575 Reject zfs send -RI with nonexistent fromsnap
#12687 Skip spacemaps reading in case of pool readonly import
#12746 Default to zfs_dmu_offset_next_sync=1
#13277 FreeBSD: Use NDFREE_PNBUF if available
#13311 Fix error handling in FreeBSD's get/putpages VOPs
#13345 FreeBSD: Fix translation from ABD to physical pages
#13373 zfs: holds: dequadratify
#13375 Corrected edge case in uncompressed ARC->L2ARC handling
#13405 Reduce dbuf_find() lock contention
#13406 FreeBSD: use zero_region instead of allocating a dedicated page
#13499 zed: Take no action on scrub/resilver checksum errors
#13484 FreeBSD: libspl: Add locking around statfs globals
#13513 Remove wrong assertion in log spacemap
#13537 Improve sorted scan memory accounting
Vitaliy Gusev [Thu, 9 Jun 2022 12:57:25 +0000 (08:57 -0400)]
vmm: move bumping VMEXIT_USERSPACE stat to the right place
Statistic for "number of vm exits handled in userspace" should be
increased in vm_run() instead of vmx_run() because in some cases
vm_run() doesn't exit to userspace and keeps entering the guest.
Also svm_run's implementation even wrongly misses that stat.
Yan Ka Chiu [Sun, 22 May 2022 16:33:02 +0000 (12:33 -0400)]
pam_exec: fix segfault when authtok is null
According to pam_exec(8), the `expose_authtok` option should be ignored
when the service function is `pam_sm_setcred`. Currently `pam_exec` only
prevent prompt for anth token when `expose_authtok` is set on
`pam_sm_setcred`. This subsequently led to segfault when there isn't an
existing auth token available.
Bug reported on this: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=263893
After reading https://reviews.freebsd.org/rS349556 I am not sure if the
default behaviour supposed to be simply not prompt for authentication
token, or is it to ignore the option entirely as stated in the man page.
This patch is therefore only adding an additional NULL check on the item
`pam_get_item` provide, and exit with `PAM_SYSTEM_ERR` when such item is
NULL.
Mitchell Horne [Tue, 14 Jun 2022 16:09:11 +0000 (13:09 -0300)]
ddb: namespacing of struct command
'command' is too generic for something specific to the kernel debugger;
change this so it is less likely to collide with local variable names.
Also rename struct command_table to struct db_command_table.
Reviewed by: markj
MFC after: 1 week
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D35367
Mitchell Horne [Thu, 2 Jun 2022 13:14:41 +0000 (10:14 -0300)]
Use KERNEL_PANICKED() in more places
This is slightly more optimized than checking panicstr directly. For
most of these instances performance doesn't matter, but let's make
KERNEL_PANICKED() the common idiom.
Reviewed by: mjg
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D35373
Glen Barber [Wed, 22 Jun 2022 18:23:39 +0000 (14:23 -0400)]
release: arm - increase IMAGE_SIZE
For some reason, while 3072M is sufficient for 14-CURRENT, it is not
for 13-STABLE. Notably, previous investigations suggest that there
are changes to makefs(8) in main that do not exist in stable/13,
in which 14-CURRENT seems perfectly happy to ignore the target image
size is smaller than the data being populated to it.
I have no futher investigative details at the moment, but as this had
caused arm failures for the past three weeks, this is the more hasty
measure, hence the MFC timeframe noted.
Justin Hibbits [Mon, 13 Jun 2022 19:04:29 +0000 (14:04 -0500)]
arm64: Print per-CPU cache summary
Summary:
It can be useful to see a summary of CPU caches on bootup. This is done
for most platforms already, so add this to arm64, in the form of (taken
from Apple M1 pro test):
This is printed out per-CPU, only under bootverbose.
Future refinements could instead determine if a cache level is shared
with other cores (L2 is shared among cores on some SoCs, for instance),
and perform a better calculation to the full true cache sizes. For
instance, it's known that the M1 pro, on which this test was done, has 2
12MB L2 clusters, for a total of 24MB. Seeing each CPU with 12288KB L2
would make one think that there's 12MB * NCPUs, for possibly 120MB
cache, which is incorrect.
Sponsored by: Juniper Networks, Inc.
Reviewed by: #arm64, andrew
Differential Revision: https://reviews.freebsd.org/D35366
John Baldwin [Thu, 13 Jan 2022 22:49:14 +0000 (14:49 -0800)]
Remove usr/lib/libssp.a.
GNU's libssp installed this (in addition to libssp_nonshared.a), but
the libc-based libssp does not.
Reviewed by: kevans, emaste
Fixes: cd0d51baaa45 Provide libssp based on libc
Sponsored by: The University of Cambridge, Google Inc.
Differential Revision: https://reviews.freebsd.org/D33852
Emmanuel Vadot [Thu, 4 Nov 2021 09:42:37 +0000 (10:42 +0100)]
linuxkpi: Add i2c support
Add i2c support to linuxkpi. This is needed by drm-kmod.
For every i2c_adapter added by i2c_add_adapter we add a child to the
device named "lkpi_iic". This child handle the conversion between
Linux i2c_msgs to FreeBSD iic_msgs.
For every i2c_adapter added by i2c_bit_add_bus we add a child to the
device named "lkpi_iicbb". This child handle the conversion between
Linux i2c_msgs to FreeBSD iic_msgs.
With the help of iic(4), this expose the i2c controller to userspace
allowing a user to query DDC information from a monitor.
e.g.: i2c -f /dev/iic0 -a 0x28 -c 128 -d r
will query the standard EDID from the monitor if plugged.
The bitbang part (lkpi_iicbb) isn't tested at all for now as I don't have
compatible hardware (all my hardware have native i2c controller).
Tested on: Intel (SandyBridge, Skylake, ApolloLake)
Tested on: AMD (Picasso, Polaris (amd64 and arm64))
Warner Losh [Fri, 10 Dec 2021 00:04:45 +0000 (17:04 -0700)]
Create wrapper for Giant taken for newbus
Create a wrapper for newbus to take giant and for busses to take it too.
bus_topo_lock() should be called before interacting with newbus routines
and unlocked with bus_topo_unlock(). If you need the topology lock for
some reason, bus_topo_mtx() will provide that.
Corvin Köhne [Mon, 30 May 2022 08:02:52 +0000 (10:02 +0200)]
vmm: add tunable to trap WBINVD
x86 is cache coherent. However, there are special cases where cache
coherency isn't ensured (e.g. when switching the caching mode). In these
cases, WBINVD can be used. WBINVD writes all cache lines back into main
memory and invalidates the whole cache.
Due to the invalidation of the whole cache, WBINVD is a very heavy
instruction and degrades the performance on all cores. So, we should
minimize the use of WBINVD as much as possible.
In a virtual environment, the WBINVD call is mostly useless. The guest
isn't able to break cache coherency because he can't switch the physical
cache mode. When using pci passthrough WBINVD might be useful.
Nevertheless, trapping and ignoring WBINVD is an unsafe operation. For
that reason, we implement it as tunable.
Corvin Köhne [Mon, 30 May 2022 08:01:36 +0000 (10:01 +0200)]
bhyve: use bhyve_config for SMBIOS strings
Some software uses SMBIOS entries to identify the system on which it's
running. In order to make it possible to use such software inside a VM,
SMBIOS entries should be configurable. Therefore, bhyve_config can be
used. While only a few SMBIOS entries might be of interest, it makes
sense that all SMBIOS entries are configurable. This way all SMBIOS
tables are build the same way and there's no special handling for some
tables.
The error in ucma_create_id() left ctx in the list of contexts belong
to ucma file descriptor. The attempt to close this file descriptor causes
to use-after-free accesses while iterating over such list.
Doug Moore [Fri, 10 Jun 2022 21:53:16 +0000 (16:53 -0500)]
rb_tree: drop needless tests from rb_next, rb_prev
In RB_NEXT, when there is no RB_RIGHT node, the search must proceed
through the parent node.
There is code written to handle the case when the parent is non-NULL
and the current element is the left child of that parent. If you
assume that the current element is either the left child of its
parent, or the right child of its parent, but not both, then this test
is not necessary. Instead of assigning RB_PARENT(elm, field) to elm
when elm == RB_LEFT, removing the test has the code assign
RB_PARENT(elm, field) to elm when elm != RB_RIGHT. There's no need to
examine the RB_LEFT field at all.
This change removes that needless RB_LEFT test, and makes a similar
change to the RB_PREV implementation.
Alan Somers [Mon, 16 May 2022 22:32:10 +0000 (16:32 -0600)]
makefs: fix calculation of file sizes
When a new FS image is created we need to calculate how much space each
file is going to consume.
Fix two bugs in that logic:
1) Count the space needed for indirect blocks for large files.
1) Normally the trailing data of a file is written to a block of frag
size, 4 kB by default.
However for files that use indirect blocks a full block is allocated,
32kB by default. Take that into account.
Adjust size calculations to match what is done in ffs_mkfs routine:
* Depending on the UFS version the superblock is stored at a different
offset. Take that into account.
* Add the cylinder group block size.
* All of the above has to be aligned to the block size.
Finally, Remove "ncg" variable. It's always 1 and it was used to
multiply stuff.
Alan Somers [Wed, 4 May 2022 23:36:17 +0000 (17:36 -0600)]
fusefs: handle evil servers that return illegal inode numbers
* If during FUSE_CREATE, FUSE_MKDIR, etc the server returns the same
inode number for the new file as for its parent directory, reject it.
Previously this would triggers a recurse-on-non-recursive lock panic.
* If during FUSE_LINK the server returns a different inode number for
the new name as for the old one, reject it. Obviously, that can't be
a hard link.
* If during FUSE_LOOKUP the server returns the same inode number for the
new file as for its parent directory, reject it. Nothing good can
come of this.
Test pfsync in a more realistic scenario with carp and route_to rules.
Build this topology and initiate a single ping session from client to
server:
┌──────┐
│client│
└───┬──┘
│
┌───┴───┐
│bridge0│
└┬─────┬┘
│ │
┌────────────────┴─┐ ┌─┴────────────────┐
│gw_route_to_master├─┤gw_route_to_backup│
└────────────────┬─┘ └─┬────────────────┘
│ │
┌┴─────┴┐
│bridge1│
└┬─────┬┘
│ │
┌────────────────┴─┐ ┌─┴────────────────┐
│gw_reply_to_master├─┤gw_reply_to_backup│
└────────────────┬─┘ └─┬────────────────┘
│ │
┌┴─────┴┐
│bridge2│
└───┬───┘
│
┌───┴──┐
│server│
└──────┘
gw* jails forward traffic through pf route-to rules, not fib lookups.
If backup_promotion arg is given (as in the pfsync_pbr test case), a
carp failover event occurs during the ping session on both gateways.
Verify that ping messages still go where we expect them to go.
MFC after: 2 weeks
Sponsored by: Orange Business Services
Kristof Provost [Sat, 4 Jun 2022 10:38:40 +0000 (12:38 +0200)]
pf: Improve route-to handling of pfsync'd states
When a state if pfsync’d to a different host it doesn’t get all of the
expected pointers, including the pointer to the struct pfi_kif / struct
ifnet rt_kif pointer. (I.e. the interface to route out on).
That in turn means that pf_route() ends up dropping the packet.
Use the rule's struct pfi_kif pointer so we can still route out of the
expected interface.
MFC after: 2 weeks
Sponsored by: Orange Business Services
Dmitry Chagin [Fri, 17 Jun 2022 21:35:52 +0000 (00:35 +0300)]
Cut out leftover of the sv_transtrap hook from the struct sysentvec on
mips. The hook is deleted by commit eca368ec from the main, but since
mips is retired from main, it is left after merge.
John Baldwin [Tue, 9 Nov 2021 17:42:12 +0000 (09:42 -0800)]
vfs: Consistently validate AT_* flags in kern_* functions.
Some syscalls checked for invalid AT_* flags in sys_* and others in
kern_*.
Reviewed by: kib
Obtained from: CheriBSD
Sponsored by: The University of Cambridge, Google Inc.
Differential Revision: https://reviews.freebsd.org/D32864
Dmitry Chagin [Mon, 30 May 2022 16:53:52 +0000 (19:53 +0300)]
linux(4): Fix the type of a constant in the signal mask macro
Since l_sigset_t is 64-bit unsigned on all Linuxulators, fix the type
of a constant in the signal mask manipulation macro.
The suffix L indicates type long which is 32-bit on i386, therefore,
bitwise operations between a 32-bit constant and 64-bit signal mask
lead to the wrong result.
Dmitry Chagin [Mon, 30 May 2022 16:49:45 +0000 (19:49 +0300)]
linux(4): Microoptimize rt_sendsig(), convert signal mask once
On amd64 Linux saves the thread signal mask in both contexts, in the machine
dependent and in the machine independent. Both contexts are user accessible.
Convert the mask once, then copy it.