Mark Johnston [Mon, 20 Mar 2023 18:16:00 +0000 (14:16 -0400)]
kerneldump: Inline dump_savectx() into its callers
The callers of dump_savectx() (i.e., doadump() and livedump_start())
subsequently call dumpsys()/minidumpsys(), which dump the calling
thread's stack when writing the dump. If dump_savectx() gets its own
stack frame, that frame might be clobbered when its caller later calls
dumpsys()/minidumpsys(), making it difficult for debuggers to unwind the
stack.
Fix this by making dump_savectx() a macro, so that savectx() is always
called directly by the function which subsequently calls
dumpsys()/minidumpsys().
This fixes stack unwinding for the panicking thread from arm64
minidumps. The same happened to work on amd64, but kgdb reports the
dump_savectx() calls as coming from dumpsys(), so in that case it
appears to work by accident.
Fixes: c9114f9f86f9 ("Add new vnode dumper to support live minidumps")
Reviewed by: mhorne, jhb
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D39151
Daniel Kolesa [Mon, 20 Mar 2023 16:42:59 +0000 (17:42 +0100)]
sh(1): fix history file write checking
We cannot just compare histsizeval() against 0, since that returns
a string pointer, which is always non-zero (non-null). The logic
in sethistsize() initializes the history size to 100 with values
that are non-number, and an empty string counts as that. Therefore,
the only time we want to not write into history with HISTSIZE val
set is when it's explicitly 0.
Kristof Provost [Mon, 20 Mar 2023 13:29:55 +0000 (14:29 +0100)]
pfsync: fix pfsync_undefer_state() locking
pfsync_undefer_state() takes the bucket lock, but could get called from
places (e.g. from pfsync_update_state() or pfsync_delete_state()) where
we already held the lock.
As it can also be called from places where we don't yet hold the lock
create new locked variant for use when the lock is already held. Keep
using pfsync_undefer_state() where the lock must still be taken.
Kristof Provost [Mon, 20 Mar 2023 13:26:33 +0000 (14:26 +0100)]
pfsync: add missing unlock in pfsync_defer_tmo()
The callout for pfsync_defer_tmo() is created with
CALLOUT_RETURNUNLOCKED, because while the callout framework takes care
of taking the lock we want to run a few operations outside of the lock,
so we unlock ourselves.
However, if `sc->sc_sync_if == NULL` we return without releasing the
lock, and leak the lock, causing later deadlocks.
Ensure we always release the bucket lock when we exit pfsync_defer_tmo()
Kristof Provost [Wed, 15 Mar 2023 12:31:45 +0000 (13:31 +0100)]
carp: support unicast
Allow users to configure the address to send carp messages to. This
allows carp to be used in unicast mode, which is useful in certain
virtual configurations (e.g. AWS, VMWare ESXi, ...)
Kristof Provost [Thu, 16 Mar 2023 10:16:24 +0000 (11:16 +0100)]
carp tests: test manually switch between backup and master
There's been at least one issue where we failed to correctly enter
NET_EPOCH that was triggered in this scenario.
Add a test case for it to make it easier to detect issues like this in
the future.
Jose Luis Duran [Thu, 9 Feb 2023 15:47:53 +0000 (12:47 -0300)]
ping: Print the IP options of the original packet
When an ICMP packet contains an IP packet in its payload, and that
original IP packet contains options, these options were not displayed
accordingly in pr_iph().
pr_iph() is a function that prints the original "quoted packet" IP
header, with only an IP struct as an argument. The IP struct does not
contain IP options, and it is not guaranteed that the options will be
contiguous in memory to the IP struct after d9cacf605e2ac0f704e1ce76357cbfbe6cb63d52.
Pass the raw ICMP data along with the IP struct, in order to print the
options, if any.
Kirk McKusick [Sat, 18 Mar 2023 22:36:54 +0000 (15:36 -0700)]
Do not panic in case of corrupted UFS/FFS directory.
Historically the system panic'ed when it encountered a corrupt
directory. This change recovers well enough to continue operations.
This change is made in response to a similar change made in the ext2
filesystem as described in the cited Differential Revision.
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D38503
amd64: properly recalculate mitigations knobs after resume
Revision r333125 AKA 986c4ca38772f72 forced clear cpu_stdext_feature3
on suspend, since at that time microcode update was not reloaded
early on resume. Then, revision 050f5a8405c63 started re-reading
cpu_stdext_feature3 again. Since modern CPUs do not require mitigations
from the Skylake era, this went unnoticed for some time.
Keep zeroing cpu_stdext_feature3 on suspend, but re-read it in more
controlled way on resume after microcode is reloaded, and recalculate
active workarounds based on actual microcode capabilities.
Reported and tested by: romain
Reviewed by: emaste, markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D39146
Wei Hu [Tue, 14 Mar 2023 15:49:33 +0000 (15:49 +0000)]
uart: Don't change settings or throttle putc for Hyper-V
Azure setup does not like it when FreeBSD overrides the settings of the
UART device. When Hyper-V is detected, don't do this and also don't
throttle putc() output. This is a workaround for the early boot hang
of FreeBSD on Azure.
Tested on Azure, ESXi (VM with serial port), and SG-8200
Wei Hu [Tue, 14 Mar 2023 15:13:46 +0000 (15:13 +0000)]
amd64 loader: Use efiserialio for Hyper-V booted systems
UEFI provides ConIn/ConOut handles for consoles that it supports,
which include the text-video and serial ports. When the serial port
is available, use the UEFI driver instead of direct io-port accesses
to avoid conflicts between the firmware and direct hardware access, as
happens on Hyper-V (Azure) setups.
This change enables efiserialio to be built for efi-amd64 and has
higher order priority vs comconsole, and only uses efiserialio
if the hypervisor is Hyper-V. When efiserialio successfully
probes, it will set efi_comconsole_avail=true which will prevent
comconsole from probing in this setup.
Fedor Uporov [Thu, 9 Feb 2023 09:34:25 +0000 (12:34 +0300)]
Add root directory entry check.
Add check that directory entry with ino=EXT2_ROOTINO
have correct namelength and name. It is possible to
create malicious image which will cause panic if root
directory entry have incorrect name.
Zhenlei Huang [Fri, 17 Mar 2023 17:20:58 +0000 (01:20 +0800)]
uhci(4): Correct PCI device ID for Zhaoxin USB controller
And minor style fixes.
Tested by: Weitao Wang <WeitaoWang-oc@zhaoxin.com>
Fixes: 986c7be472bd uhci(4): Add new USB IDs
Differential Revision: https://reviews.freebsd.org/D38924
Zhenlei Huang [Fri, 17 Mar 2023 17:24:46 +0000 (01:24 +0800)]
ehci(4): Correct PCI device ID for Zhaoxin USB 2.0 controller
And minor style fixes.
Tested by: Weitao Wang <WeitaoWang-oc@zhaoxin.com>
Fixes: f9237e1937a4 ehci(4): Add new USB IDs
Differential Revision: https://reviews.freebsd.org/D38923
Vitaliy Gusev [Fri, 17 Mar 2023 09:17:22 +0000 (10:17 +0100)]
vmm: fix missing ipi statistic
ipi counters are missing in bhyvectl's output because vm_maxcpu is 0
when initializing them. That's because vmm_stat_register is executed
before vmm_init.
Instead of directly fixing it, there's a better solution in illumos
which is cherry picked:
https://github.com/illumos/illumos-gate/commit/65a3bc83734e5fb0fc2c19df3e5112b87dcdc3f8
It replaces the matrix statistic by two counters per vcpu. One for
counting the ipis to the vcpu and one counting the ipis received by the
vcpu. This has several advantages:
- A matrix statistic becomes huge when using many vcpus.
- A matrix statistic easily reaches the MAX_VMM_STAT_ELEMS limit.
- Two counters are enough in most cases. DTrace can be used for more
advanced debugging purposes.
- A matrix statistic wastes memory. The matrix size is determined by
vm_maxcpu regardless of the number of vcpus assigned to the vm.
Emmanuel Vadot [Wed, 15 Mar 2023 09:29:27 +0000 (10:29 +0100)]
arm: Remove SOCFPGA specific kernel configs
We had GENERIC for a while now so anyone still interested in those boards
should make sure that we can boot on them with it and with upstream DTS files.
Emmanuel Vadot [Thu, 16 Mar 2023 09:48:06 +0000 (10:48 +0100)]
arm: Rename hdmi_if.m to crtc_if.m
There is nothing hdmi related in this interface, it's just a generic interface
for crt controller so rename it.
This also remove the 'hdmi' device used in arm kernel config. 'vt' now controls
if we build this interface (sc(4) isn't supported on arm).
Sponsored by: Beckhoff Automation GmbH & Co. KG
Reviewed by: andrew
Differential Revision: https://reviews.freebsd.org/D39120
Emmanuel Vadot [Wed, 15 Mar 2023 09:39:02 +0000 (10:39 +0100)]
arm: Remove IMX5 specific kernel configs
We had GENERIC for a while now so anyone still interested in those boards
should make sure that we can boot on them with it and with upstream DTS files.
Sponsored by: Beckhoff Automation GmbH & Co. KG
Reviewed by: andrew
Differential Revision: https://reviews.freebsd.org/D39089
Emmanuel Vadot [Wed, 15 Mar 2023 09:26:24 +0000 (10:26 +0100)]
arm: Remove VYBRID specific kernel config
We had GENERIC for a while now so anyone still interested in those boards
should make sure that we can boot on them with it and with upstream DTS files.
Sponsored by: Beckhoff Automation GmbH & Co. KG
Reviewed by: andrew
Differential Revision: https://reviews.freebsd.org/D39087
Emmanuel Vadot [Wed, 15 Mar 2023 09:17:24 +0000 (10:17 +0100)]
arm: Remove kernel config APALIS-IMX6
It reference to a non-existant dts file apalis-imx6.dts so unlikekly to compile.
Aldo IMX6 support is in GENERIC so anyone interested in this board should
make it work with GENERIC kernel (if that's not already the case).
Sponsored by: Beckhoff Automation GmbH & Co. KG
Reviewed by: andrew
Differential Revision: https://reviews.freebsd.org/D39086
Corvin Köhne [Wed, 11 Aug 2021 08:04:36 +0000 (10:04 +0200)]
bhyve: add helper for adding fwcfg files
Fwcfg items without a fixed index are reported by the file_dir. They
have an index of 0x20 and above. This helper simplifies the addition of
such fwcfg items. It selects a new free index, assigns it to the fwcfg
items and creates an proper entry in the file_dir.
cpuid detection may have picked up a more specific guest type already,
and a follow-up check of smbios vendor/product may erroneously blow
away the previously detected type.
This reportedly fixes the boot under Hyper-V, which advertises an
smbios.system.product of "Virtual Machine."
PR: 270239
Reviewed by: imp, kib (both earlier version, same concept)
Fixes: 2fee87562948 ("abstract out the vm detection via smbios..")
Differential Revision: https://reviews.freebsd.org/D39140
Rick Macklem [Thu, 16 Mar 2023 22:55:36 +0000 (15:55 -0700)]
nfscl: Add a new NFSv4.1/4.2 mount option for Kerberized mounts
Without this patch, a Kerberized NFSv4.1/4.2 mount must provide
a Kerberos credential for the client at mount time. This credential
is typically referred to as a "machine credential". It can be
created one of two ways:
- The user (usually root) has a valid TGT at the time the mount
is done and this becomes the machine credential.
There are two problems with this.
1 - The user doing the mount must have a valid TGT for a user
principal at mount time. As such, the mount cannot be put
in fstab(5) or similar.
2 - When the TGT expires, the mount breaks.
- The client machine has a service principal in its default keytab
file and this service principal (typically called a host-based
initiator credential) is used as the machine credential.
There are problems with this approach as well:
1 - There is a certain amount of administrative overhead creating
the service principal for the NFS client, creating a keytab
entry for this principal and then copying the keytab entry
into the client's default keytab file via some secure means.
2 - The NFS client must have a fixed, well known, DNS name, since
that FQDN is in the service principal name as the instance.
This patch uses a feature of NFSv4.1/4.2 called SP4_NONE, which
allows the state maintenance operations to be performed by any
authentication mechanism, to do these operations via AUTH_SYS
instead of RPCSEC_GSS (Kerberos). As such, neither of the above
mechanisms is needed.
It is hoped that this option will encourage adoption of Kerberized
NFS mounts using TLS, to provide a more secure NFS mount.
This new NFSv4.1/4.2 mount option, called "syskrb5" must be used
with "sec=krb5[ip]" to avoid the need for either of the above
Kerberos setups to be done by the client.
Note that all file access/modification operations still require
users on the NFS client to have a valid TGT recognized by the
NFSv4.1/4.2 server. As such, this option allows, at most, a
malicious client to do some sort of DOS attack.
Although not required, use of "tls" with this new option is
encouraged, since it provides on-the-wire encryption plus,
optionally, client identity verification via a X.509
certificate provided to the server during TLS handshake.
Alternately, "sec=krb5p" does provide on-the-wire
encryption of file data.
A mount_nfs(8) man page update will be done in a separate commit.
Discussed on: freebsd-current@
MFC after: 3 months
Andrew Turner [Thu, 16 Mar 2023 15:36:06 +0000 (15:36 +0000)]
Switch the arm64 VM_MEMATTR_DEVICE to nGnRE
Move device memory to a weaker type. The new device memory type allows
the system to acknowledge a write to a device before the write has
completed. This is inline with VM_MEMATTR_DEVICE on armv6/armv7.
Sponsored by: Arm Ltd
Differential Revision: https://reviews.freebsd.org/D38945
Andrew Turner [Thu, 16 Mar 2023 15:35:59 +0000 (15:35 +0000)]
Allow forcing non-posted memory on arm64
To allow for debugging after changing the arm64 VM_MEMATTR_DEVICE
memory type add a new set of tunables to tell the kernel to use
non-posted memory.
This adds the following tunables:
- kern.force_nonposted: When set to non-zero the kernel will use
non-posted memory for all device allocations.
- hint.<dev>.<unit>.force_nonposted: As above, however only forces
non-posted memory on the named device.
Sponsored by: Arm Ltd
Differential Revision: https://reviews.freebsd.org/D38944
Randall Stewart [Thu, 16 Mar 2023 15:43:16 +0000 (11:43 -0400)]
Move access to tcp's t_logstate into inline functions and provide new tracepoint and bbpoint capabilities.
The TCP stacks have long accessed t_logstate directly, but in order to do tracepoints and the new bbpoints
we need to move to using the new inline functions. This adds them and moves rack to now use
the tcp_tracepoints.
Ed Maste [Thu, 16 Mar 2023 14:29:55 +0000 (10:29 -0400)]
ssh: Update to OpenSSH 9.3p1
This release fixes a number of security bugs and has minor new
features and bug fixes. Security fixes, from the release notes
(https://www.openssh.com/txt/release-9.3):
This release contains fixes for a security problem and a memory
safety problem. The memory safety problem is not believed to be
exploitable, but we report most network-reachable memory faults as
security bugs.
* ssh-add(1): when adding smartcard keys to ssh-agent(1) with the
per-hop destination constraints (ssh-add -h ...) added in OpenSSH
8.9, a logic error prevented the constraints from being
communicated to the agent. This resulted in the keys being added
without constraints. The common cases of non-smartcard keys and
keys without destination constraints are unaffected. This problem
was reported by Luci Stanescu.
* ssh(1): Portable OpenSSH provides an implementation of the
getrrsetbyname(3) function if the standard library does not
provide it, for use by the VerifyHostKeyDNS feature. A
specifically crafted DNS response could cause this function to
perform an out-of-bounds read of adjacent stack data, but this
condition does not appear to be exploitable beyond denial-of-
service to the ssh(1) client.
The getrrsetbyname(3) replacement is only included if the system's
standard library lacks this function and portable OpenSSH was not
compiled with the ldns library (--with-ldns). getrrsetbyname(3) is
only invoked if using VerifyHostKeyDNS to fetch SSHFP records. This
problem was found by the Coverity static analyzer.
Kristof Provost [Sun, 12 Mar 2023 15:08:31 +0000 (16:08 +0100)]
pf tests: test IPv6 fragmentation with link-local addresses
We've observed a panic after pf_refragment6() with link-local addresses,
because pf_refragment6() calls ip6_forward() even for a simple output
case.
That results in us entering ip6_forward() with an mbuf with a NULL
m->m_pkthdr.rcvif, which can cause a NULL deref (but seemingly not for
GUAs.
Kristof Provost [Mon, 13 Mar 2023 09:27:59 +0000 (10:27 +0100)]
pf: set scope in pf_refragment6()
Link-local traffic needs to have a scope embedded before it's passed on
to ip6_output(). Do so in pf_refragment6(), because when we end up here
in the output path we may have passed through ip6_output() already
(before being reassembled), where the scope would have been removed.
Re-embed the scope so that link-local traffic is sent correctly.
Kristof Provost [Sun, 12 Mar 2023 17:34:42 +0000 (18:34 +0100)]
pf: distinguish forwarding and output cases for pf_refragment6()
Re-introduce PFIL_FWD, because pf's pf_refragment6() needs to know if
we're ip6_forward()-ing or ip6_output()-ing.
ip6_forward() relies on m->m_pkthdr.rcvif, at least for link-local
traffic (for in6_get_unicast_scopeid()). rcvif is not set for locally
generated traffic (e.g. from icmp6_reflect()), so we need to call the
correct output function.
Michael Tuexen [Thu, 16 Mar 2023 09:45:13 +0000 (10:45 +0100)]
sctp: don't do RTT measurements with cookies
When receiving a cookie, the receiver does not know whether the
peer retransmitted the COOKIE-ECHO chunk or not. Therefore, don't
do an RTT measurement. It might be much too long.
To overcome this limitation, one could do at least two things:
1. Bundle the INIT-ACK chunk with a HEARTBEAT chunk for doing the
RTT measurement. But this is not allowed.
2. Add a flag to the COOKIE-ECHO chunk, which indicates that it
is the initial transmission, and not a retransmission. But
this requires an RFC.