Vitaliy Gusev [Thu, 9 Jun 2022 12:57:25 +0000 (08:57 -0400)]
vmm: move bumping VMEXIT_USERSPACE stat to the right place
The statistic for "number of vm exits handled in userspace" should be
incremented in vm_run() rather than in vmx_run(), because in some cases
vm_run() does not exit to userspace and instead keeps re-entering the guest.
Also, svm_run() wrongly omits that stat entirely.
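A minimal sketch of where the counter belongs, assuming a simplified run loop. The exit reasons, struct vcpu_stats, backend_run() and vm_run_sketch() are hypothetical stand-ins, not the actual vmm(4) code. The counter is bumped only at the point where the generic loop really returns to userspace, so neither backend (VMX or SVM) has to remember to do it.

```c
#include <stddef.h>

/* Hypothetical exit reasons; not the real VM_EXITCODE_* values. */
enum exit_reason { EXIT_HANDLED_IN_KERNEL, EXIT_NEEDS_USERSPACE };

/* Hypothetical per-vCPU statistics. */
struct vcpu_stats {
	unsigned long vmexit_total;
	unsigned long vmexit_userspace;
};

/* Stand-in for vmx_run()/svm_run(): replays a scripted exit sequence. */
static enum exit_reason
backend_run(const enum exit_reason *script, size_t *pos)
{
	return (script[(*pos)++]);
}

/*
 * Sketch of vm_run(): keep re-entering the guest while the exit can be
 * handled in the kernel; count a userspace exit only when we actually
 * leave for userspace.
 */
static void
vm_run_sketch(struct vcpu_stats *st, const enum exit_reason *script,
    size_t *pos)
{
	for (;;) {
		enum exit_reason r = backend_run(script, pos);

		st->vmexit_total++;
		if (r == EXIT_HANDLED_IN_KERNEL)
			continue;		/* re-enter the guest */
		st->vmexit_userspace++;	/* the single place it is bumped */
		return;
	}
}
```

With two in-kernel exits followed by one that needs userspace, the sketch records three exits but only one userspace exit.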
Yan Ka Chiu [Sun, 22 May 2022 16:33:02 +0000 (12:33 -0400)]
pam_exec: fix segfault when authtok is null
According to pam_exec(8), the `expose_authtok` option should be ignored
when the service function is `pam_sm_setcred`. Currently `pam_exec` only
prevents prompting for the auth token when `expose_authtok` is set on
`pam_sm_setcred`. This subsequently leads to a segfault when no existing
auth token is available.
Bug reported on this: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=263893
After reading https://reviews.freebsd.org/rS349556 I am not sure whether the
default behaviour is supposed to be simply not prompting for the
authentication token, or ignoring the option entirely as stated in the man
page. This patch therefore only adds an additional NULL check on the item
that `pam_get_item` provides, and exits with `PAM_SYSTEM_ERR` when that item
is NULL.
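A minimal sketch of the added check. The real module calls pam_get_item(3) with PAM_AUTHTOK; here a hypothetical get_authtok callback stands in for it, and the SK_* codes are hypothetical stand-ins for PAM_SUCCESS and PAM_SYSTEM_ERR so the example builds without libpam. The point is that a successful lookup can still hand back a NULL item.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for PAM_SUCCESS / PAM_SYSTEM_ERR. */
enum { SK_SUCCESS = 0, SK_SYSTEM_ERR = 4 };

/* Hypothetical stand-in for pam_get_item(pamh, PAM_AUTHTOK, &item). */
typedef int (*get_authtok_fn)(const void **item);

/*
 * Fetch the auth token to expose. The lookup itself may succeed and
 * still yield NULL when no auth token is available, so check the
 * pointer before using it.
 */
static int
fetch_authtok(get_authtok_fn get_authtok, const char **tok, size_t *len)
{
	const void *item = NULL;
	int rc = get_authtok(&item);

	if (rc != SK_SUCCESS)
		return (rc);
	if (item == NULL)		/* the added NULL check */
		return (SK_SYSTEM_ERR);
	*tok = item;
	*len = strlen(*tok);
	return (SK_SUCCESS);
}

/* Two hypothetical lookups: one with no token, one with a token. */
static int no_token(const void **item) { *item = NULL; return (SK_SUCCESS); }
static int have_token(const void **item) { *item = "secret"; return (SK_SUCCESS); }
```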
Mitchell Horne [Tue, 14 Jun 2022 16:09:11 +0000 (13:09 -0300)]
ddb: namespacing of struct command
'command' is too generic for something specific to the kernel debugger;
change this so it is less likely to collide with local variable names.
Also rename struct command_table to struct db_command_table.
Reviewed by: markj
MFC after: 1 week
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D35367
Mitchell Horne [Thu, 2 Jun 2022 13:14:41 +0000 (10:14 -0300)]
Use KERNEL_PANICKED() in more places
This is slightly more optimized than checking panicstr directly. For
most of these instances performance doesn't matter, but let's make
KERNEL_PANICKED() the common idiom.
Reviewed by: mjg
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D35373
Glen Barber [Wed, 22 Jun 2022 18:23:39 +0000 (14:23 -0400)]
release: arm - increase IMAGE_SIZE
For some reason, while 3072M is sufficient for 14-CURRENT, it is not
for 13-STABLE. Notably, previous investigations suggest that there
are changes to makefs(8) in main that do not exist in stable/13,
because of which 14-CURRENT seems perfectly happy to ignore that the
target image size is smaller than the data being populated to it.
I have no further investigative details at the moment, but as this has
caused arm failures for the past three weeks, this is the more hasty
measure, hence the MFC timeframe noted.
Justin Hibbits [Mon, 13 Jun 2022 19:04:29 +0000 (14:04 -0500)]
arm64: Print per-CPU cache summary
Summary:
It can be useful to see a summary of CPU caches on bootup. This is done
for most platforms already, so add this to arm64, in the form of (taken
from Apple M1 pro test):
This is printed out per-CPU, only under bootverbose.
Future refinements could instead determine whether a cache level is shared
with other cores (L2 is shared among cores on some SoCs, for instance),
and perform a better calculation of the true total cache sizes. For
instance, the M1 Pro, on which this test was done, is known to have two
12MB L2 clusters, for a total of 24MB. Seeing each CPU with 12288KB L2
would make one think that there's 12MB * NCPUs, possibly 120MB of
cache, which is incorrect.
Sponsored by: Juniper Networks, Inc.
Reviewed by: #arm64, andrew
Differential Revision: https://reviews.freebsd.org/D35366
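The miscount described above can be sketched as follows. struct cpu_cache, its cluster id, and both functions are hypothetical; the example assumes each CPU knows which shared cache instance it belongs to, which is exactly the information the future refinement would have to determine. The assignment of 10 CPUs to two clusters is illustrative only, chosen to match the 120MB figure.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-CPU view of one cache level. */
struct cpu_cache {
	unsigned cluster;	/* id of the cache instance this CPU uses */
	unsigned size_kb;	/* size reported for that instance */
};

/* Naive: add up what every CPU reports (overcounts shared caches). */
static unsigned long
naive_total_kb(const struct cpu_cache *c, size_t ncpu)
{
	unsigned long total = 0;

	for (size_t i = 0; i < ncpu; i++)
		total += c[i].size_kb;
	return (total);
}

/* Count each shared cache instance only once, keyed by cluster id. */
static unsigned long
shared_total_kb(const struct cpu_cache *c, size_t ncpu)
{
	unsigned long total = 0;

	for (size_t i = 0; i < ncpu; i++) {
		bool seen = false;

		for (size_t j = 0; j < i; j++) {
			if (c[j].cluster == c[i].cluster) {
				seen = true;
				break;
			}
		}
		if (!seen)
			total += c[i].size_kb;
	}
	return (total);
}
```

With 10 CPUs each reporting 12288KB, the naive sum is 122880KB (120MB), while counting the two clusters once gives 24576KB (24MB).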
John Baldwin [Thu, 13 Jan 2022 22:49:14 +0000 (14:49 -0800)]
Remove usr/lib/libssp.a.
GNU's libssp installed this (in addition to libssp_nonshared.a), but
the libc-based libssp does not.
Reviewed by: kevans, emaste
Fixes: cd0d51baaa45 Provide libssp based on libc
Sponsored by: The University of Cambridge, Google Inc.
Differential Revision: https://reviews.freebsd.org/D33852
Emmanuel Vadot [Thu, 4 Nov 2021 09:42:37 +0000 (10:42 +0100)]
linuxkpi: Add i2c support
Add i2c support to linuxkpi. This is needed by drm-kmod.
For every i2c_adapter added by i2c_add_adapter we add a child to the
device named "lkpi_iic". This child handles the conversion from
Linux i2c_msgs to FreeBSD iic_msgs.
For every i2c_adapter added by i2c_bit_add_bus we add a child to the
device named "lkpi_iicbb". This child handles the conversion from
Linux i2c_msgs to FreeBSD iic_msgs.
With the help of iic(4), this exposes the i2c controller to userspace,
allowing a user to query DDC information from a monitor.
e.g.: i2c -f /dev/iic0 -a 0x28 -c 128 -d r
will query the standard EDID from the monitor if one is plugged in.
The bitbang part (lkpi_iicbb) isn't tested at all for now as I don't have
compatible hardware (all my hardware has native i2c controllers).
Tested on: Intel (SandyBridge, Skylake, ApolloLake)
Tested on: AMD (Picasso, Polaris (amd64 and arm64))
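A rough sketch of the i2c_msg to iic_msg conversion the lkpi_iic child performs. The structs are simplified local mirrors, and lkpi_convert_msgs() is a hypothetical name. It assumes, as FreeBSD's iicbus(4) convention does, that iic_msg carries the 7-bit address shifted left by one, and that only the read flag needs translating; other Linux flags are ignored here.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified mirror of Linux struct i2c_msg. */
struct linux_i2c_msg {
	uint16_t addr;		/* 7-bit slave address */
	uint16_t flags;
	uint16_t len;
	uint8_t *buf;
};
#define LINUX_I2C_M_RD	0x0001

/* Simplified mirror of FreeBSD struct iic_msg. */
struct bsd_iic_msg {
	uint16_t slave;		/* assumed: 7-bit address << 1 */
	uint16_t flags;
	uint16_t len;
	uint8_t *buf;
};
#define BSD_IIC_M_WR	0x0000
#define BSD_IIC_M_RD	0x0001

/* Translate an array of Linux messages into FreeBSD messages. */
static void
lkpi_convert_msgs(const struct linux_i2c_msg *in, struct bsd_iic_msg *out,
    size_t n)
{
	for (size_t i = 0; i < n; i++) {
		out[i].slave = (uint16_t)(in[i].addr << 1);
		out[i].flags = (in[i].flags & LINUX_I2C_M_RD) ?
		    BSD_IIC_M_RD : BSD_IIC_M_WR;
		out[i].len = in[i].len;
		out[i].buf = in[i].buf;	/* buffers are shared, not copied */
	}
}
```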
Warner Losh [Fri, 10 Dec 2021 00:04:45 +0000 (17:04 -0700)]
Create wrapper for Giant taken for newbus
Create a wrapper for newbus to take Giant and for busses to take it too.
bus_topo_lock() should be called before interacting with newbus routines
and unlocked with bus_topo_unlock(). If you need the topology lock for
some reason, bus_topo_mtx() will provide that.
Corvin Köhne [Mon, 30 May 2022 08:02:52 +0000 (10:02 +0200)]
vmm: add tunable to trap WBINVD
x86 is cache coherent. However, there are special cases where cache
coherency isn't ensured (e.g. when switching the caching mode). In these
cases, WBINVD can be used. WBINVD writes all cache lines back into main
memory and invalidates the whole cache.
Due to the invalidation of the whole cache, WBINVD is a very heavy
instruction and degrades the performance on all cores. So, we should
minimize the use of WBINVD as much as possible.
In a virtual environment, the WBINVD call is mostly useless. The guest
isn't able to break cache coherency because it can't switch the physical
cache mode. When using PCI passthrough, WBINVD might be useful.
Nevertheless, trapping and ignoring WBINVD is an unsafe operation. For
that reason, we implement it as a tunable.
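A minimal sketch of the opt-in behaviour, assuming a boolean tunable that is off by default. trap_wbinvd, struct vcpu_ctl and both functions are hypothetical names, not the vmm(4) sysctl or code. When the tunable is off, the guest executes WBINVD natively; when it is on, the instruction causes an exit and is skipped without flushing anything.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the loader tunable; off by default. */
static bool trap_wbinvd = false;

/* Hypothetical view of the per-vCPU execution controls. */
struct vcpu_ctl {
	bool wbinvd_exiting;	/* VM exit when the guest runs WBINVD */
};

static void
setup_controls(struct vcpu_ctl *ctl)
{
	/* Only trap WBINVD when the administrator opted in. */
	ctl->wbinvd_exiting = trap_wbinvd;
}

/*
 * On a WBINVD exit, ignore the instruction: advance the guest RIP past
 * it (WBINVD is encoded as 0F 09, 2 bytes) without flushing host caches.
 * A real handler would use the instruction length reported on exit.
 */
static unsigned long
handle_wbinvd_exit(unsigned long rip)
{
	return (rip + 2);
}
```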
Corvin Köhne [Mon, 30 May 2022 08:01:36 +0000 (10:01 +0200)]
bhyve: use bhyve_config for SMBIOS strings
Some software uses SMBIOS entries to identify the system on which it's
running. In order to make it possible to use such software inside a VM,
SMBIOS entries should be configurable, and bhyve_config can be used for
this. While only a few SMBIOS entries might be of interest, it makes
sense for all SMBIOS entries to be configurable. This way all SMBIOS
tables are built the same way and no table needs special handling.
An error in ucma_create_id() left ctx in the list of contexts belonging
to the ucma file descriptor. Closing this file descriptor then causes
use-after-free accesses while iterating over that list.
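A minimal sketch of the fix pattern: on the error path, unlink the context from the per-file list before freeing it. The structs and functions here are simplified, hypothetical stand-ins for the ucma code, and the fail argument is a hypothetical failure injection. The list uses the TAILQ macros from sys/queue.h.

```c
#include <stdlib.h>
#include <sys/queue.h>

/* Simplified, hypothetical stand-ins for the ucma structures. */
struct ucma_context {
	TAILQ_ENTRY(ucma_context) list;
	int id;
};

struct ucma_file {
	TAILQ_HEAD(, ucma_context) ctx_list;
};

static struct ucma_context *
ucma_alloc_ctx(struct ucma_file *file, int id)
{
	struct ucma_context *ctx = calloc(1, sizeof(*ctx));

	if (ctx == NULL)
		return (NULL);
	ctx->id = id;
	TAILQ_INSERT_TAIL(&file->ctx_list, ctx, list);
	return (ctx);
}

/* Create an id; `fail` simulates a later failure in the create path. */
static int
ucma_create_id_sketch(struct ucma_file *file, int id, int fail)
{
	struct ucma_context *ctx = ucma_alloc_ctx(file, id);

	if (ctx == NULL)
		return (-1);
	if (fail) {
		/* Unlink first so close() never walks a freed entry. */
		TAILQ_REMOVE(&file->ctx_list, ctx, list);
		free(ctx);
		return (-1);
	}
	return (0);
}

static int
ucma_count(struct ucma_file *file)
{
	struct ucma_context *ctx;
	int n = 0;

	TAILQ_FOREACH(ctx, &file->ctx_list, list)
		n++;
	return (n);
}
```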
Doug Moore [Fri, 10 Jun 2022 21:53:16 +0000 (16:53 -0500)]
rb_tree: drop needless tests from rb_next, rb_prev
In RB_NEXT, when there is no RB_RIGHT node, the search must proceed
through the parent node.
There is code written to handle the case when the parent is non-NULL
and the current element is the left child of that parent. If you
assume that the current element is either the left child of its
parent, or the right child of its parent, but not both, then this test
is not necessary. Instead of assigning RB_PARENT(elm, field) to elm
when elm == RB_LEFT, removing the test has the code assign
RB_PARENT(elm, field) to elm when elm != RB_RIGHT. There's no need to
examine the RB_LEFT field at all.
This change removes that needless RB_LEFT test, and makes a similar
change to the RB_PREV implementation.
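The simplified successor/predecessor logic can be sketched on a plain binary tree with parent pointers. This is a standalone illustration using a hypothetical struct node, not the <sys/tree.h> macros themselves: when there is no right subtree, climb while the node is its parent's right child, then return the parent, with no separate left-child test.

```c
#include <stddef.h>

/* Minimal tree node with parent pointers (not <sys/tree.h>). */
struct node {
	struct node *left, *right, *parent;
	int key;
};

/* In-order successor. */
static struct node *
node_next(struct node *elm)
{
	if (elm->right != NULL) {
		elm = elm->right;
		while (elm->left != NULL)
			elm = elm->left;
		return (elm);
	}
	/*
	 * A non-root node that is not its parent's right child must be
	 * its left child, so no explicit left-child test is needed.
	 */
	while (elm->parent != NULL && elm == elm->parent->right)
		elm = elm->parent;
	return (elm->parent);
}

/* In-order predecessor: the mirror image of node_next(). */
static struct node *
node_prev(struct node *elm)
{
	if (elm->left != NULL) {
		elm = elm->left;
		while (elm->right != NULL)
			elm = elm->right;
		return (elm);
	}
	while (elm->parent != NULL && elm == elm->parent->left)
		elm = elm->parent;
	return (elm->parent);
}
```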
Alan Somers [Mon, 16 May 2022 22:32:10 +0000 (16:32 -0600)]
makefs: fix calculation of file sizes
When a new FS image is created we need to calculate how much space each
file is going to consume.
Fix two bugs in that logic:
1) Count the space needed for indirect blocks for large files.
2) Normally the trailing data of a file is written to a block of frag
size, 4 kB by default.
However, for files that use indirect blocks a full block is allocated,
32 kB by default. Take that into account.
Adjust size calculations to match what is done in the ffs_mkfs routine:
* Depending on the UFS version the superblock is stored at a different
offset. Take that into account.
* Add the cylinder group block size.
* All of the above has to be aligned to the block size.
Finally, remove the "ncg" variable. It was always 1 and was only used
as a multiplier.
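A rough sketch of the per-file estimate described above. It assumes UFS2 (8-byte block pointers), 12 direct blocks, the 32 kB / 4 kB defaults, and at most double indirection. file_space() is a hypothetical helper, not makefs's actual function. Small files end in fragments; once a file needs indirect blocks, its tail takes a full block and the indirect blocks themselves are counted.

```c
#include <stdint.h>

#define SK_NDADDR	12	/* direct block pointers in a UFS inode */

static uint64_t
roundup_u64(uint64_t x, uint64_t y)
{
	return ((x + y - 1) / y * y);
}

/* Estimate the data + indirect-block bytes used by a file of `size`. */
static uint64_t
file_space(uint64_t size, uint64_t bsize, uint64_t fsize)
{
	uint64_t nindir = bsize / 8;	/* pointers per indirect block */
	uint64_t nblocks = (size + bsize - 1) / bsize;
	uint64_t space, extra;

	if (nblocks <= SK_NDADDR)
		/* Direct blocks only: the tail goes into fragments. */
		return (roundup_u64(size, fsize));

	/* Indirect blocks in use: the tail occupies a full block. */
	space = roundup_u64(size, bsize);
	space += bsize;			/* the single-indirect block */
	extra = nblocks - SK_NDADDR;
	if (extra > nindir) {
		/* Double indirect: top block plus one per nindir blocks. */
		extra -= nindir;
		space += bsize + ((extra + nindir - 1) / nindir) * bsize;
	}
	return (space);
}
```

For example, a 100-byte file costs one 4 kB fragment, while a file one byte larger than 12 full blocks costs 13 data blocks plus one indirect block.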
Alan Somers [Wed, 4 May 2022 23:36:17 +0000 (17:36 -0600)]
fusefs: handle evil servers that return illegal inode numbers
* If during FUSE_CREATE, FUSE_MKDIR, etc the server returns the same
inode number for the new file as for its parent directory, reject it.
Previously this would trigger a recurse-on-non-recursive lock panic.
* If during FUSE_LINK the server returns a different inode number for
the new name than for the old one, reject it. Obviously, that can't be
a hard link.
* If during FUSE_LOOKUP the server returns the same inode number for the
new file as for its parent directory, reject it. Nothing good can
come of this.
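The three checks can be sketched as a single validation helper. The enum, the function name, and the choice of EIO as the error are assumptions for illustration, not the fusefs code. The lookup check assumes "." is resolved locally, never sent to the server as FUSE_LOOKUP.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical classification of the operations described above. */
enum fuse_op_kind { OP_CREATE_LIKE, OP_LINK, OP_LOOKUP };

/*
 * Validate the inode number a FUSE server returned. parent_ino is the
 * directory operated on; old_ino is the link target for OP_LINK.
 * Returns 0 if acceptable, EIO (an assumed error) if illegal.
 */
static int
validate_entry_ino(enum fuse_op_kind op, uint64_t new_ino,
    uint64_t parent_ino, uint64_t old_ino)
{
	switch (op) {
	case OP_CREATE_LIKE:	/* FUSE_CREATE, FUSE_MKDIR, ... */
	case OP_LOOKUP:		/* assumes "." never reaches the server */
		/* A new entry can't be its own parent directory. */
		if (new_ino == parent_ino)
			return (EIO);
		break;
	case OP_LINK:
		/* A hard link must name the same inode. */
		if (new_ino != old_ino)
			return (EIO);
		break;
	}
	return (0);
}
```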
Test pfsync in a more realistic scenario with carp and route_to rules.
Build this topology and initiate a single ping session from client to
server:
                ┌──────┐
                │client│
                └───┬──┘
                    │
                ┌───┴───┐
                │bridge0│
                └┬─────┬┘
                 │     │
┌────────────────┴─┐ ┌─┴────────────────┐
│gw_route_to_master├─┤gw_route_to_backup│
└────────────────┬─┘ └─┬────────────────┘
                 │     │
                ┌┴─────┴┐
                │bridge1│
                └┬─────┬┘
                 │     │
┌────────────────┴─┐ ┌─┴────────────────┐
│gw_reply_to_master├─┤gw_reply_to_backup│
└────────────────┬─┘ └─┬────────────────┘
                 │     │
                ┌┴─────┴┐
                │bridge2│
                └───┬───┘
                    │
                ┌───┴──┐
                │server│
                └──────┘
gw* jails forward traffic through pf route-to rules, not fib lookups.
If backup_promotion arg is given (as in the pfsync_pbr test case), a
carp failover event occurs during the ping session on both gateways.
Verify that ping messages still go where we expect them to go.
MFC after: 2 weeks
Sponsored by: Orange Business Services
Kristof Provost [Sat, 4 Jun 2022 10:38:40 +0000 (12:38 +0200)]
pf: Improve route-to handling of pfsync'd states
When a state is pfsync’d to a different host it doesn’t get all of the
expected pointers, including the rt_kif pointer (struct pfi_kif / struct
ifnet), i.e. the interface to route out on.
That in turn means that pf_route() ends up dropping the packet.
Use the rule's struct pfi_kif pointer so we can still route out of the
expected interface.
MFC after: 2 weeks
Sponsored by: Orange Business Services
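A minimal sketch of the fallback, with heavily simplified stand-ins for pf's structures. The kif field on struct rule is a hypothetical simplification of where a route-to rule keeps its interface, and route_kif() is a hypothetical name. A state imported via pfsync has no rt_kif, so the rule's interface is used instead of dropping the packet.

```c
#include <stddef.h>

/* Heavily simplified, hypothetical stand-ins for pf structures. */
struct pfi_kif { const char *name; };
struct rule { struct pfi_kif *kif; };	/* route-to interface of the rule */
struct state {
	struct rule *rule;
	struct pfi_kif *rt_kif;		/* NULL for pfsync-imported states */
};

/*
 * Choose the interface to route out on: the state's own pointer if set,
 * otherwise the rule's. NULL means there is nothing to route out on.
 */
static struct pfi_kif *
route_kif(const struct state *s)
{
	if (s->rt_kif != NULL)
		return (s->rt_kif);
	if (s->rule != NULL)
		return (s->rule->kif);
	return (NULL);
}
```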
Dmitry Chagin [Fri, 17 Jun 2022 21:35:52 +0000 (00:35 +0300)]
Remove the leftover sv_transtrap hook from struct sysentvec on mips.
The hook was deleted by commit eca368ec in main, but since mips has been
retired from main, it was left behind after the merge.
John Baldwin [Tue, 9 Nov 2021 17:42:12 +0000 (09:42 -0800)]
vfs: Consistently validate AT_* flags in kern_* functions.
Some syscalls checked for invalid AT_* flags in sys_* and others in
kern_*.
Reviewed by: kib
Obtained from: CheriBSD
Sponsored by: The University of Cambridge, Google Inc.
Differential Revision: https://reviews.freebsd.org/D32864
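A minimal sketch of validating AT_* flags at the kern_* level. The SK_AT_* values, check_at_flags() and kern_example_at() are all hypothetical; the real constants live in <fcntl.h>. Checking in kern_* means callers that bypass sys_*, such as compat layers, get the same validation.

```c
#include <errno.h>

/* Hypothetical flag values for illustration only. */
#define SK_AT_SYMLINK_NOFOLLOW	0x1
#define SK_AT_RESOLVE_BENEATH	0x2
#define SK_AT_EMPTY_PATH	0x4

/* Reject any flag outside the set this operation supports. */
static int
check_at_flags(int flags, int allowed)
{
	if ((flags & ~allowed) != 0)
		return (EINVAL);
	return (0);
}

/* Hypothetical kern_* entry point doing its own validation. */
static int
kern_example_at(int flags)
{
	int error;

	error = check_at_flags(flags,
	    SK_AT_SYMLINK_NOFOLLOW | SK_AT_RESOLVE_BENEATH);
	if (error != 0)
		return (error);
	/* ... perform the operation ... */
	return (0);
}
```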
Dmitry Chagin [Mon, 30 May 2022 16:53:52 +0000 (19:53 +0300)]
linux(4): Fix the type of a constant in the signal mask macro
Since l_sigset_t is 64-bit unsigned on all Linuxulators, fix the type
of a constant in the signal mask manipulation macro.
The suffix L indicates type long, which is 32-bit on i386; therefore,
bitwise operations between a 32-bit constant and a 64-bit signal mask
lead to wrong results.
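A minimal sketch of the corrected pattern, with hypothetical macro and function names (not the Linuxulator's own). The bit constant is cast to the 64-bit unsigned mask type so signals 33 to 64 map to bits 32 to 63. With a 32-bit constant those bits simply cannot be represented: shifting a 32-bit 1 by 32 or more is undefined, and truncating a high bit to 32 bits yields 0.

```c
#include <stdint.h>

/* l_sigset_t is 64-bit unsigned on every Linuxulator. */
typedef uint64_t l_sigset_t;

/* Bit for 1-based signal `sig`, built from a 64-bit unsigned constant. */
#define SK_SIGMASK(sig)	((l_sigset_t)1 << ((sig) - 1))

static l_sigset_t
sk_sigaddset(l_sigset_t set, int sig)
{
	return (set | SK_SIGMASK(sig));
}

static l_sigset_t
sk_sigdelset(l_sigset_t set, int sig)
{
	return (set & ~SK_SIGMASK(sig));
}

static int
sk_sigismember(l_sigset_t set, int sig)
{
	return ((set & SK_SIGMASK(sig)) != 0);
}
```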
Dmitry Chagin [Mon, 30 May 2022 16:49:45 +0000 (19:49 +0300)]
linux(4): Microoptimize rt_sendsig(), convert signal mask once
On amd64, Linux saves the thread signal mask in both contexts, the
machine-dependent one and the machine-independent one. Both contexts are
user accessible. Convert the mask once, then copy it.
Dmitry Chagin [Sat, 28 May 2022 20:46:05 +0000 (23:46 +0300)]
linux(4): Handle SO_TIMESTAMPNS socket option
SO_TIMESTAMPNS enables or disables the receiving of the SCM_TIMESTAMPNS
control message. The cmsg_data field is a struct timespec.
To distinguish between SO_TIMESTAMP and SO_TIMESTAMPNS in recvmsg(),
map the latter to SO_BINTIME and convert the bintime to a timespec.
Otherwise, the implementation is identical to SO_TIMESTAMP.
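The bintime-to-timespec step can be sketched as below. It follows the arithmetic of FreeBSD's bintime2timespec(), but uses local mirror structs (sk_bintime, sk_timespec) so the example stands alone. A bintime stores seconds plus a 64-bit binary fraction of a second. Only the top 32 fraction bits are scaled to nanoseconds, which keeps the product within 64 bits.

```c
#include <stdint.h>

/* Local mirrors of struct bintime and struct timespec. */
struct sk_bintime {
	int64_t sec;
	uint64_t frac;		/* units of 2^-64 seconds */
};

struct sk_timespec {
	int64_t tv_sec;
	long tv_nsec;
};

/* Convert for the SCM_TIMESTAMPNS payload. */
static void
sk_bintime2timespec(const struct sk_bintime *bt, struct sk_timespec *ts)
{
	ts->tv_sec = bt->sec;
	/* (frac / 2^64) * 1e9, using the top 32 bits of the fraction. */
	ts->tv_nsec = (long)(((uint64_t)1000000000 *
	    (uint32_t)(bt->frac >> 32)) >> 32);
}
```

A fraction of 2^63 (half a second) becomes 500000000 ns, and the largest fraction rounds down to 999999999 ns.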
Dmitry Chagin [Sat, 28 May 2022 20:45:39 +0000 (23:45 +0300)]
linux(4): Handle 64-bit SO_TIMESTAMP for 32-bit binaries
To solve the Y2038 problem in the recvmsg syscall, a new SO_TIMESTAMP
constant was added in Linux v5.1. So old 32-bit binaries that know only
32-bit time_t use the old value of the constant, while binaries that
know 64-bit time_t use the new one.
To determine what size of time_t user space expects, store the
requested value (SO_TIMESTAMP) in the process emuldata structure.
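A minimal sketch of recording which variant the process asked for. The option numbers are assumed to match Linux's asm-generic/socket.h (SO_TIMESTAMP_OLD 29, SO_TIMESTAMP_NEW 63). struct sk_emuldata and both helpers are hypothetical stand-ins for the real emuldata handling.

```c
#include <stdbool.h>

/* Assumed to match Linux asm-generic/socket.h. */
#define L_SO_TIMESTAMP_OLD	29
#define L_SO_TIMESTAMP_NEW	63

/* Hypothetical slice of the per-process emulation data. */
struct sk_emuldata {
	int so_timestamp;	/* option value passed to setsockopt() */
};

/* setsockopt(SO_TIMESTAMP*): remember which variant was requested. */
static void
sk_record_timestamp_opt(struct sk_emuldata *em, int linux_optname)
{
	em->so_timestamp = linux_optname;
}

/* recvmsg(): does the control message need a 64-bit tv_sec? */
static bool
sk_wants_64bit_time(const struct sk_emuldata *em)
{
	return (em->so_timestamp == L_SO_TIMESTAMP_NEW);
}
```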
Dmitry Chagin [Sat, 28 May 2022 20:30:22 +0000 (23:30 +0300)]
linux(4): Add a helper to copyout getsockopt value
For getsockopt(), optlen is a value-result argument, which is modified
on return to indicate the actual size of the value returned.
In some cases this was missed; fix them.
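A minimal sketch of a value-result copyout helper. sk_copyout() is a userspace stand-in for copyout(9), and sk_getsockopt_copyout() is a hypothetical name. For simplicity optlen is a plain pointer here rather than a user address. The helper copies at most the caller's buffer size, then writes back how many bytes were actually stored.

```c
#include <stddef.h>
#include <string.h>

/* Userspace stand-in for copyout(9). */
static int
sk_copyout(const void *kaddr, void *uaddr, size_t len)
{
	memcpy(uaddr, kaddr, len);
	return (0);
}

/*
 * optlen is value-result: on entry the size of the user buffer, on
 * return the number of bytes actually stored.
 */
static int
sk_getsockopt_copyout(const void *val, size_t valsize, void *optval,
    unsigned int *optlen)
{
	size_t len = valsize < *optlen ? valsize : *optlen;
	int error;

	error = sk_copyout(val, optval, len);
	if (error != 0)
		return (error);
	*optlen = (unsigned int)len;	/* report the size returned */
	return (0);
}
```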
Dmitry Chagin [Sat, 28 May 2022 20:29:12 +0000 (23:29 +0300)]
linux(4): Check the socket before any others sanity checks
Strictly speaking, this check is performed by kern_recvit(), but in
the Linux emulation layer we do other sanity checks and conversions from
Linux types to native types before calling into the kernel. This changes
the order in which errors are returned, which is critical for some buggy
Linux applications.
For the recvmmsg() syscall, this fixes a panic when the user-supplied
vlen value is 0: error is then not initialized and garbage is passed to
bsd_to_linux_errno().
Split cpuset_getaffinity() into two counterparts: user_cpuset_getaffinity()
is intended to operate on a cpuset_t from user va, while
kern_cpuset_getaffinity() expects the cpuset from kernel va.
Accordingly, the code that clears the high bits is moved to
user_cpuset_getaffinity(). The Linux sched_getaffinity() syscall returns
the size of the set copied to user space, and then the glibc wrapper
clears the high bits.