Mark Johnston [Sat, 25 Sep 2021 14:15:31 +0000 (10:15 -0400)]
amd64: Avoid copying td_frame from kernel procs
When creating a new thread, we unconditionally copy td_frame from the
creating thread. For threads which never return to user mode, this is
unnecessary since td_frame just points to the base of the stack or a
random interrupt frame.
If KASAN is configured this copying may also trigger false positives
since the td_frame region may contain poisoned stack regions. It was
not noticed before since thread0 used a dummy proc0_tf trapframe, and
kernel procs are generally created by thread0. Since commit df8dd6025af88a99d34f549fa9591a9b8f9b75b1, though, we call
cpu_thread_alloc(&thread0) when initializing FPU state, which
reinitializes thread0.td_frame.
Work around the problem by not copying the frame unless the copying
thread came from user mode. While here, de-duplicate the copying and
remove redundant re(initialization) of td_frame.
Reported by: syzbot+2ec89312bffbf38d9aec@syzkaller.appspotmail.com
Reviewed by: kib
Fixes: df8dd6025af8
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32057
Reported by: David Leadbeater, G-Research
Reviewed by: gordon
Relnotes: Yes
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D38648
Zhenlei Huang [Thu, 23 Feb 2023 18:00:09 +0000 (02:00 +0800)]
Delete obsolete Solaris compat header file stdlib.h
This drops function `getexecname()` redirection.
Historically `getexecname()` is a compatibility definition. Since
openzfs has its own implementation of function `getexecname()` in libspl
and has been merged into base, the compat header file stdlib.h is
no longer needed and should not be used.
Also without this fix libspl will end up an incompatible version of
`getprogname()` with libc. In particular, if zfs is enabled, programs
such as pgrep in /rescue can be wrongly statically linked with libspl
and will not function properly.
PR: 269738
Reviewed by: markj
Fixes: 9e5787d2284e Merge OpenZFS support in to HEAD
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D38733
Dimitry Andric [Sat, 25 Feb 2023 15:25:57 +0000 (16:25 +0100)]
Ensure .inc files are regenerated when llvm/clang tblgen binaries change
When doing a fully incremental build (with WITHOUT_CLEAN enabled), from
a commit before llvm 15 was merged (3264f6b88fce), to a commit after
that, a number of .inc files were not regenerated. This could lead to
unexpected compilation errors when these .inc files were included from
llvm-project sources, similar to:
In file included from /usr/src/contrib/llvm-project/clang/lib/CodeGen/CGBuiltin.cpp:8268:
/usr/obj/usr/src/amd64.amd64/lib/clang/libclang/clang/Basic/arm_mve_builtin_cg.inc:279:18: error: no matching constructor for initialization of 'clang::CodeGen::Address'
Address Val2 = Address(Val1, CharUnits::fromQuantity(2));
^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Work around this by making the .inc files dependent on the tblgen binary
used for generating them. E.g., we can relatively safely assume that if
the binary gets updated, the .inc files must also be updated. (Although
this is not 100% optimal, the gain by complicating things even more is
probaby not worth the effort.)
MFC after: 3 days
Reviewed by: emaste
Differential Revision: https://reviews.freebsd.org/D38770
netlink: make the maximum allowed netlink socket buffer runtime tunable.
Dumping large routng tables (>1M paths with multipath) require the socket
buffer which is larger than the currently defined limit.
Allow the limit to be set in runtime, similar to kern.ipc.maxsockbuf.
Reported by: Marek Zarychta <zarychtam@plan-b.pwste.edu.pl>
MFC after: 1 day
Dmitry Chagin [Sun, 26 Feb 2023 13:42:22 +0000 (16:42 +0300)]
linprocfs(4): Fixup process size in the /proc/pid/stat file
According to the Linux sources the kernel exposes a proces virtual
memory size via proc filesystem into the three files - stat, status
and statm. This is the struct mm->total_vm value adjusted to the
corresponding units - bytes, kilobytes and pages.
The fix is based on a fernape@ analysis.
PR: 265937
Reported by: Ray Bellis
MFC after: 3 days
Dmitry Chagin [Tue, 14 Feb 2023 14:46:33 +0000 (17:46 +0300)]
linux(4): Rename linux_timer.h to linux_time.h
To avoid confusing people, rename linux_timer.h to linux_time.h,
as linux_timer.c is the implementation of timer syscalls only,
while linux_time.c contains implementation of all stuff declared
in linux_time.h.
Dmitry Chagin [Tue, 14 Feb 2023 14:46:31 +0000 (17:46 +0300)]
linux(4): Move uselib() to i386
This obsolete system call is not supported by glibc. In ancient libc
versions (before glibc 2.0), uselib() was used to load the shared
libraries with names found in an array of names in the binary.
On Linux, since 3.15, this system call is available only when
the kernel is configured with the CONFIG_USELIB option.
It doesn't look like anyone needs this syscall for others Linuxulators,
so move it to the corresponding MD Linuxulator.
Dmitry Chagin [Tue, 14 Feb 2023 14:46:31 +0000 (17:46 +0300)]
linux(4): Cleanup sys/sysent.h from linux_util
Include sys/sysent.h directly where it needed. The linux_util.h included
in a most source files of the Linuxulator, avoid collecting a rarely used
includes here.
Dmitry Chagin [Tue, 14 Feb 2023 14:46:30 +0000 (17:46 +0300)]
linux(4): Cleanup vm includes from linux_util.h
Include vm headers directly where they needed. The linux_util.h included
in a most source files of the Linuxulator, avoid collecting a rarely used
includes here.
Mark Johnston [Tue, 7 Feb 2023 19:33:27 +0000 (14:33 -0500)]
libdwarf: Add some constants from DWARF 5
This is not exhaustive - DWARF 5 has some new enumeration types not
implemented here - but I think I caught all the ones that are extended
in DWARF 5, plus the new compilation unit type (DW_UT_*), needed when
parsing .debug_info headers.
These were useful when extending libdwarf/ctfconvert/readelf to handle
DWARF generated by gcc 12, which is version 5 by default.
Reviewed by: emaste
MFC after: 3 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D38273
Corvin Köhne [Wed, 11 Aug 2021 07:58:15 +0000 (09:58 +0200)]
bhyve: add basic qemu fwcfg implementation
qemu's fwcfg and bhyve's fwctl are both used to configure ovmf. qemu's
fwcfg is much more powerfull than bhyve's fwctl. For that reason, add
support for qemu's fwcfg.
Mina Galić [Tue, 28 Feb 2023 02:58:45 +0000 (19:58 -0700)]
apic: prevent divide by zero in CPU frequency init
If a CPU for some reason returns 0 as CPU frequency, we currently panic
on the resulting divide by zero when trying to initialize the CPU(s) via
APIC. When this happens, we'll fallback to measuring the frequency
instead.
The reason for this is while looping through loose source routing (LSRR)
and strict source routing (SSRR), hlen will become smaller than the IP
header. It may even become negative. This should terminate the loop.
However, when hlen is unsigned, an integer underflow occurs becoming a
large number causing the loop to continue virtually forever until hlen
is either by chance smaller than the lenghth of an IP header or it
segfaults.
Mark Johnston [Mon, 13 Feb 2023 21:24:40 +0000 (16:24 -0500)]
vm_fault: Fix a race in vm_fault_soft_fast()
When vm_fault_soft_fast() creates a mapping, it release the VM map lock
before unbusying the top-level object. Without the map lock, however,
nothing prevents the VM object from being deallocated while still busy.
Fix the problem by unbusying the object before releasing the VM map
lock. If vm_fault_soft_fast() fails to create a mapping, the VM map
lock is not released, so those cases don't need to change.
Reported by: syzkaller
Reviewed by: kib (previous version)
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D38527
* Make nhop_set_blackhole() set all necessary properties for the
nexthop
* Make nexthops blackhole/reject based on the rtm_type netlink
property instead of using rtflags.
Reported by: Marek Zarychta <zarychtam@plan-b.pwste.edu.pl>
MFC after: 3 days
Joerg Wunsch [Sun, 12 Feb 2023 21:26:52 +0000 (22:26 +0100)]
MFC: ARM release build: enable IPv6 SLAAC by default
When building ARM release images, enable IPv6 SLAAC by default in
addition to IPv4 DHCP.
Unlike amd64 (and other desktop/server) releases, ARM releases on SoC
setups are usually deployed by just using the installation image, so
there is no interactive network configuration. Not having IPv6
included by default is kind of an anachronism these days, given that
FreeBSD with the KAME project once pioneered IPv6 technology.
Dmitry Chagin [Tue, 14 Feb 2023 14:46:32 +0000 (17:46 +0300)]
linux(4): Move use_real_names knob to the linux.c
MI linux.[c|h] are the module independent in terms of the Linux emulation
layer (ie, intended for both ISA - 32 & 64 bit), analogue of MD linux.h.
There must be a code here that cannot be placed into the corresponding by
common sense MI source and header files, i.e., code is machine independent,
but ISA dependent.
For the use_real_names knob, the code must be placed into the
linux_socket.[c|h], however linux_socket is ISA dependent.
netlink: fix IPv6 route addition with link-local gateway
Currently kernel assumes that IPv6 gateway address is in "embedded"
form - that is, for the link-local IPv6 addresses, interface index
is embedded in bytes 2 and 3 of the address.
Fix address embedding in netlink by wrapping nhop_set_gw() in the
netlink-specific nl_set_nexthop_gw(), which does such embedding
automatically.
Reported by: Marek Zarychta <zarychtam@plan-b.pwste.edu.pl>
MFC after: 3 days
testing: handling non-root users with VNETs in pytest-based tests.
Currently isolation and resource requirements are handled directly
by the kyua runner, based on the requirements specified by the test.
It works well for simple tests, but may cause discrepancy with tests
doing complex pre-setups. For example, all tests that perform
VNET setups require root access to properly function.
This change adds additional handling of the "require_user" property
within the python testing framework. Specifically, it requests
root access if the test class signals its root requirements and
drops privileges to the desired user after performing the pre-setup.
Tom Hukins [Fri, 24 Feb 2023 10:25:35 +0000 (10:25 +0000)]
netlink: Fix "version introduced" documentation.
netlink(4) and associated features will exist in FreeBSD 14.0 but they
will also exist in 13.2, an older version, from commits such as 02b958b
and b309249.
Mark Johnston [Thu, 9 Feb 2023 20:52:35 +0000 (15:52 -0500)]
vmm: Fix AP startup compatibility for old bhyve executables
These changes unbreak AP startup when using a 13.1-RELEASE bhyve
executable with a newer kernel:
- Correct the destination mask for the VM_EXITCODE_IPI message generated
by an INIT or STARTUP IPI in vlapic_icrlo_write_handler().
- Only initialize vlapics on active vCPUs. 13.1-RELEASE bhyve activates
AP vCPUs only after the BSP starts them with an IPI, and vmm now
allocates vcpu structures lazily, so the STARTUP handling in
vm_handle_ipi() could trigger a page fault.
- Fix an off-by-one setting the vcpuid in a VM_EXITCODE_SPINUP_AP
message.
Fixes: 7c326ab5bb9a ("vmm: don't lock a mtx in the icr_low write handler")
Reviewed by: jhb, corvink
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D38446
Jessica Clarke [Thu, 30 Jun 2022 20:03:26 +0000 (21:03 +0100)]
.github: Attempt to fix and increase robustness of macOS action
Homebrew has added LLVM 14 and made that the default version, but GitHub
continues to install LLVM 13 for now, so it ends up only accessible via
the versioned name and not the unversioned one. We also add an explicit
installation of llvm@13 so that, if GitHub updates the image to using
LLVM 14, the action continues to work, albeit slightly more slowly. This
also ensures the compiler label remains correct rather than outdated, as
has occurred in the past, and that we don't get new versions of LLVM
before we're ready for them, which is especially relevant for stable
branches. This all mirrors how the Ubuntu jobs are configured.
Rick Macklem [Wed, 8 Feb 2023 22:25:01 +0000 (14:25 -0800)]
nfscl: Fix interaction between mmap'd and VOP_WRITE file updates
asomers@ found a problem with the NFS client, where a write to
an NFS mounted file done via mmap(2) was lost when fspacectl(2)
was done before it. This turned out to be caused by clearing the
dirty bit on pages when the client was doing commit RPCs,
due to the second argument to vfs_busy_pages() being set to 1.
Commit RPCs tell the server to commit previously written data to
stable storage. However, Commit RPCs do not write data from the
client to the server. As such, if the dirty bit on the page has
been set by a mmap'd write to an address in the page, it should
not be cleared. Clearing it causes the mmap'd write to by lost.
This patch fixes the problem by changing the 2nd argument to
vfs_busy_pages() to 0 for this case.
I doubt this bug has affected many, since it was inherited from
the old NFS client and was in 4.3 FreeBSD twenty years ago.
Although fspacectl(2) is FreeBSD 14 specific, a write(2) would
cause the same failure.
- Account for a filter required to enable reception of untagged frames
while registering and unregistering VLANs to avoid trying to add more
filters than HW supports
- While adding MAC/VLAN filters, pre-set matching method field in the
Admin Queue Command response buffer to expected error value to work
around an issue with some FW versions, which do not update that field if
operation fails, and be able correctly track which filters were
configured in HW.
- Remove unused IXL_MAX_FILTERS macro definition
- Update number of available MAC/VLAN filters as in newer FW versions it
was decreased by one.
Extend SFP+ cage crosstalk fix by re-checking link state after 5ms delay
to filter out spurious link up indication by transceiver with no fiber
cable connected.
Piotr Kubaj [Thu, 16 Feb 2023 23:49:43 +0000 (00:49 +0100)]
llvm: make sure to use ELFv2 ABI on powerpc64
Currently LLVM is more or less set up to use ELFv2, but it still defaults to
ELFv1 in some places. This causes lld to generate broken binaries when used
with LTO.
Kyle Evans [Thu, 28 Oct 2021 04:40:08 +0000 (23:40 -0500)]
kern: physmem: improve region coalescing logic
The existing logic didn't take into account newly inserted mappings
wholly contained by an existing region (or vice versa), nor did it
account for weird overlap scenarios. The latter is probably unlikely
to happen, but the former may happen in UEFI: BootServicesData allocated
within a large chunk of ConventionalMemory. This situation blows up vm
initialization.
While we're here, remove the "exact match" logic as it's likely wrong;
if an exact match exists with conflicting flags, for instance, then we
should probably be doing something else. The new logic takes into
account exact matches as part of the overlapping efforts.
Mark Johnston [Fri, 3 Feb 2023 15:54:23 +0000 (10:54 -0500)]
pvclock: Export a vDSO page even without rdtscp available
When the cycle counter is "stable", i.e., synchronized across vCPUs by
the hypervisor, userspace can use a serialized rdtsc instead of relying
on rdtscp, just like the kernel timecounter does. This can be useful
for performance in guests where the hypervisor hides rdtscp for some
reason.
To avoid breaking compatibility with older userspace which expects
rdtscp to be usable when pvclock exports timekeeping info, hide this
feature behind a sysctl.
Reviewed by: kib
Tested by: Shrikanth R Kamath <kshrikanth@juniper.net>
MFC after: 2 weeks
Sponsored by: Klara, Inc.
Sponsored by: Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D38342
Mark Johnston [Fri, 3 Feb 2023 15:53:20 +0000 (10:53 -0500)]
libc: Fall back to rdtsc when using pvclock and rdtscp is not available
In preparation for a follow-up revision wherein kvmclock may export
timekeeping info to userspace even in the absence of AMDID_RDTSCP, fall
back to using rdtsc when rdtscp isn't available. This mimics
pvclock_read_time_info() in the kernel.
Reviewed by: kib
Tested by: Shrikanth R Kamath <kshrikanth@juniper.net>
MFC after: 2 weeks
Sponsored by: Klara, Inc.
Sponsored by: Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D38341
Ed Maste [Mon, 20 Feb 2023 16:19:35 +0000 (09:19 -0700)]
lua: reduce diffs between luaconf.h copies
Upstream luaconf.h is contrib/lua/src/luaconf.h.dist, while userland lua
and loader lua have copies in lib/liblua/luaconf.h and
stand/liblua/luaconf.h.
Adjust whitespace, VCS tags, etc. to match upstream's version, for ease
of comparison.
Reviewed By: imp
Sponsored By: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D38206
Warner Losh [Mon, 20 Feb 2023 16:09:07 +0000 (09:09 -0700)]
stand: Update mfc notes
Using some automation I found a few mistakes in my earlier list and also
a change merged without a cherry-pick -x (and evidentally changed as
well, since git log --cherry didn't know it had been merged). Fix them
so I can use this list with my experimental mfc script.
nd6_resolve_slow() can be called without mbuf. If the LLE entry
is not reachable, nd6_resolve_slow() will add this NULL mbuf to
the holdchain via lltable_append_entry_queue, which will "append"
NULL to the end of the queue (effectively no-op) and bump la_numhold
value. When this entry gets freed, the kernel will panic due to the
inconsistency between the amount of mbufs in the queue and the value
of la_numhold.
Fix the panic by checking of mbuf is not NULL prior to inserting it
into the holdchain.
Arseny Smalyuk [Tue, 31 May 2022 20:04:51 +0000 (20:04 +0000)]
netinet6: Fix mbuf leak in NDP
Mbufs leak when manually removing incomplete NDP records with pending packet via ndp -d.
It happens because lltable_drop_entry_queue() rely on `la_numheld`
counter when dropping NDP entries (lles). It turned out NDP code never
increased `la_numheld`, so the actual free never happened.
Fix the issue by introducing unified lltable_append_entry_queue(),
common for both ARP and NDP code, properly addressing packet queue
maintenance.
The current code missed interface addition when reallocating
temporary buffer.
Tweak the code to perform the reallocation first and add
interface afterwards unconditionally.
Reported by: Marek Zarychta <zarychtam@plan-b.pwste.edu.pl>
MFC after: 3 days
netlink: return optional metadata with the operation result.
Some operations like interface creation may need to return metadata
- in this case, interface name - back to the caller if the operation
is successful.
This change implements attaching an `NLMSGERR_ATTR_COOKIE` nla to the
operation reply message via `nlmsg_report_cookie()`.
Additionally, on successful interface creation, interface index and
interface name are returned in the `IFLA_NEW_IFINDEX` and `IFLA_IFNAME
TLVs, encapsulated in the `NLMSGERR_ATTR_COOKIE`.
Xin LI [Mon, 13 Feb 2023 04:56:17 +0000 (20:56 -0800)]
cleanvar: Be more careful when cleaning up /var.
The cleanvar script uses find -delete to remove stale files under /var,
which could lead to unwanted removal of files in some unusual scenarios.
For example, when a mounted fdescfs(5) is present under /var/run/samba/fd,
find(1) could descend into a directory that is out of /var/run and remove
files that should not be removed.
To mitigate this, modify the script to use find -x, which restricts the
find scope to one file system only instead of descending into mounted
file systems.