FreeBSD/FreeBSD.git
17 months agoinpcb: better document INP_ANONPORT flag
Gleb Smirnoff [Fri, 3 Feb 2023 19:33:36 +0000 (11:33 -0800)]
inpcb: better document INP_ANONPORT flag

The name is pretty self-explanatory, but it is unclear why we need this
flag, as the kernel only sets it and never reads it.

17 months agonetinet: don't return conflicting inpcb in in_pcbconnect_setup()
Gleb Smirnoff [Fri, 3 Feb 2023 19:33:36 +0000 (11:33 -0800)]
netinet: don't return conflicting inpcb in in_pcbconnect_setup()

Last time this inpcb was actually used was in tcp_connect()
before c94c54e4df9a.

17 months agotcp: bring comment for tcp_connect() up to date
Gleb Smirnoff [Fri, 3 Feb 2023 19:33:36 +0000 (11:33 -0800)]
tcp: bring comment for tcp_connect() up to date

We no longer use in_pcbbind() since 25102351509.  The comment about
truncating old TIME-WAIT describes code that was removed back
in 2004 in c94c54e4df9a.

17 months agoinpcb: use family specific sockaddr argument for connect functions
Gleb Smirnoff [Fri, 3 Feb 2023 19:33:36 +0000 (11:33 -0800)]
inpcb: use family specific sockaddr argument for connect functions

Do the cast from sockaddr to either IPv4 or IPv6 sockaddr in the
protocol's pr_connect method and from there on go down the call
stack with family specific argument.

Reviewed by: markj
Differential revision: https://reviews.freebsd.org/D38356

17 months agonetinet6: require network epoch for in6_pcbconnect()
Gleb Smirnoff [Fri, 3 Feb 2023 19:33:36 +0000 (11:33 -0800)]
netinet6: require network epoch for in6_pcbconnect()

This removes recursive epoch entry in the syncache case.  Fixes
unprotected access to V_in6_ifaddrhead in in6_pcbladdr(), as
well as access to prison IP address lists. It also matches what
IPv4 in_pcbconnect() does.

Reviewed by: markj
Differential revision: https://reviews.freebsd.org/D38355

17 months agoinpcb: merge two versions of in6_pcbconnect() into one
Gleb Smirnoff [Fri, 3 Feb 2023 19:33:35 +0000 (11:33 -0800)]
inpcb: merge two versions of in6_pcbconnect() into one

No functional change.

Reviewed by: markj
Differential revision: https://reviews.freebsd.org/D38354

17 months agotcp: retire net.inet.tcp.tcp_require_unique_port
Gleb Smirnoff [Fri, 3 Feb 2023 19:33:35 +0000 (11:33 -0800)]
tcp: retire net.inet.tcp.tcp_require_unique_port

It was a safety belt in case the new port allocation
behaviour introduced in 25102351509 caused a problem.

Reviewed by: markj, rscheff, tuexen
Differential revision: https://reviews.freebsd.org/D38353

17 months agopcb: Move an assignment into in_pcbdisconnect()
Mark Johnston [Fri, 3 Feb 2023 15:57:37 +0000 (10:57 -0500)]
pcb: Move an assignment into in_pcbdisconnect()

All callers of in_pcbdisconnect() clear the local address, so let's just
do that in the function itself.

Note that the inp's local address is not a parameter to the inp hash
functions.  No functional change intended.

Reviewed by: glebius
MFC after: 2 weeks
Sponsored by: Klara, Inc.
Sponsored by: Modirum MDPay
Differential Revision: https://reviews.freebsd.org/D38362

17 months agoinpcb: Assert against wildcard addrs in in_pcblookup_hash_locked()
Mark Johnston [Fri, 3 Feb 2023 15:57:19 +0000 (10:57 -0500)]
inpcb: Assert against wildcard addrs in in_pcblookup_hash_locked()

No functional change intended.

Reviewed by: glebius
MFC after: 1 week
Sponsored by: Klara, Inc.
Sponsored by: Modirum MDPay
Differential Revision: https://reviews.freebsd.org/D38361

17 months agoinpcb: Deduplicate some assertions
Mark Johnston [Fri, 3 Feb 2023 15:56:26 +0000 (10:56 -0500)]
inpcb: Deduplicate some assertions

It makes more sense to check lookupflags in the function which actually
uses SMR.  No functional change intended.

Reviewed by: glebius
MFC after: 1 week
Sponsored by: Klara, Inc.
Sponsored by: Modirum MDPay
Differential Revision: https://reviews.freebsd.org/D38359

17 months agoshm: Document shm_create_largepage()
Mark Johnston [Fri, 3 Feb 2023 15:55:30 +0000 (10:55 -0500)]
shm: Document shm_create_largepage()

While here, move notes about FreeBSD-specific functionality to the
COMPATIBILITY section, and document the ECAPMODE error for shm_open().

Reviewed by: pauamma, kib
MFC after: 2 weeks
Sponsored by: Klara, Inc.
Sponsored by: Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D38282

17 months agoman4: Add a manual page for kvmclock
Mark Johnston [Fri, 3 Feb 2023 15:54:50 +0000 (10:54 -0500)]
man4: Add a manual page for kvmclock

Reviewed by: pauamma, imp, kib
MFC after: 2 weeks
Sponsored by: Klara, Inc.
Sponsored by: Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D38343

17 months agopvclock: Export a vDSO page even without rdtscp available
Mark Johnston [Fri, 3 Feb 2023 15:54:23 +0000 (10:54 -0500)]
pvclock: Export a vDSO page even without rdtscp available

When the cycle counter is "stable", i.e., synchronized across vCPUs by
the hypervisor, userspace can use a serialized rdtsc instead of relying
on rdtscp, just like the kernel timecounter does.  This can be useful
for performance in guests where the hypervisor hides rdtscp for some
reason.

To avoid breaking compatibility with older userspace which expects
rdtscp to be usable when pvclock exports timekeeping info, hide this
feature behind a sysctl.

Reviewed by: kib
Tested by: Shrikanth R Kamath <kshrikanth@juniper.net>
MFC after: 2 weeks
Sponsored by: Klara, Inc.
Sponsored by: Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D38342

17 months agolibc: Fall back to rdtsc when using pvclock and rdtscp is not available
Mark Johnston [Fri, 3 Feb 2023 15:53:20 +0000 (10:53 -0500)]
libc: Fall back to rdtsc when using pvclock and rdtscp is not available

In preparation for a follow-up revision wherein kvmclock may export
timekeeping info to userspace even in the absence of AMDID_RDTSCP, fall
back to using rdtsc when rdtscp isn't available.  This mimics
pvclock_read_time_info() in the kernel.

Reviewed by: kib
Tested by: Shrikanth R Kamath <kshrikanth@juniper.net>
MFC after: 2 weeks
Sponsored by: Klara, Inc.
Sponsored by: Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D38341

17 months agolinux(4): Microoptimize linux_ipc code to unindent else blocks.
Dmitry Chagin [Fri, 3 Feb 2023 16:17:34 +0000 (19:17 +0300)]
linux(4): Microoptimize linux_ipc code to unindent else blocks.

No functional change.

MFC after: 1 week

17 months agolinux(4): Use designated initializers.
Dmitry Chagin [Fri, 3 Feb 2023 16:17:15 +0000 (19:17 +0300)]
linux(4): Use designated initializers.

MFC after: 1 week

17 months agotest: Add fstab to all ufs images
Warner Losh [Fri, 3 Feb 2023 15:41:18 +0000 (08:41 -0700)]
test: Add fstab to all ufs images

Ensure that we populate /etc/fstab for all the ufs images.  Tweak sizes
while I'm at it.

Note: This file could use a good refactoring... or maybe a rewrite in
python or lua.

Sponsored by: Netflix
Reviewed by: tsoome
Differential Revision: https://reviews.freebsd.org/D38317

17 months agokboot: Keep track of what's used in the segment
Warner Losh [Fri, 3 Feb 2023 15:41:11 +0000 (08:41 -0700)]
kboot: Keep track of what's used in the segment

Keep track of how much is used in the segment as we allocate it to the
application. Set memsz to 0 first, and increment it as used. Adjust the
bufsz before we call kexec so the kernel copies the right amount (it's
an error for bufsz to be bigger than memsz, so we set them == when we
retrieve the segment). Make sure we round to the page size, otherwise
kexec_load gets cranky.

Sponsored by: Netflix
Reviewed by: tsoome
Differential Revision: https://reviews.freebsd.org/D38315
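
The bookkeeping described in this commit can be sketched in C as below. This is a minimal illustration, not the code in the tree: the struct, the function names, and the 4096-byte page size are all assumptions made for the example.

```c
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumed page size, for illustration only */

/* Hypothetical stand-in for one kexec segment's size fields. */
struct seg_sketch {
	size_t bufsz;	/* bytes the host kernel will copy */
	size_t memsz;	/* bytes handed out to the application so far */
};

/* Round n up to the next multiple of the (assumed) page size. */
static size_t
sketch_round_page(size_t n)
{
	return ((n + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1));
}

/* Record that bytes [off, off + len) of the segment are now in use. */
static void
seg_sketch_use(struct seg_sketch *s, size_t off, size_t len)
{
	if (off + len > s->memsz)
		s->memsz = off + len;
}

/*
 * Just before the kexec call: page-align the used size and make bufsz
 * match it, so bufsz is never larger than memsz.
 */
static void
seg_sketch_finalize(struct seg_sketch *s)
{
	s->memsz = sketch_round_page(s->memsz);
	s->bufsz = s->memsz;
}
```

Starting from a zeroed structure, marking 5000 bytes as used and then finalizing yields memsz and bufsz of 8192 under the assumed page size.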

17 months agokboot: Allocate a really big first segment
Warner Losh [Fri, 3 Feb 2023 15:41:03 +0000 (08:41 -0700)]
kboot: Allocate a really big first segment

Allocate a huge segment for the first kexec_load segments. We limit the
allocation to the lesser of:
the size of the remaining memory segment
45% of available memory
95% of the memory we can allocate

This allows us to have really large RAM disks. We likely need to limit
this to the amount we actually used, though, since this can be a lot of
memory.

We have to do this complicated calculation for a few reasons: First, we
need 2 copies of the loaded kernel in the memory: The kernel can copy
everything to a temporary buffer. Next, malloc (via mmap) is limited to
a certain amount due to over commit, so we have to not allocate all we
can (only most of what we can).

Sponsored by: Netflix
Reviewed by: tsoome
Differential Revision: https://reviews.freebsd.org/D38314
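
The three-way limit described in this commit can be sketched as follows. The function and parameter names are hypothetical and are not taken from the kboot sources; only the 45% and 95% factors come from the commit message.

```c
#include <stdint.h>

static uint64_t
sketch_min_u64(uint64_t a, uint64_t b)
{
	return (a < b ? a : b);
}

/*
 * Hypothetical helper: choose the size of the first kexec_load segment.
 *   seg_remaining - bytes left in the physical memory segment
 *   mem_avail     - memory the host reports as available
 *   can_alloc     - memory this process is still allowed to allocate
 * Dividing before multiplying avoids overflow at the cost of rounding
 * each percentage down slightly.
 */
static uint64_t
sketch_first_segment_size(uint64_t seg_remaining, uint64_t mem_avail,
    uint64_t can_alloc)
{
	uint64_t sz = seg_remaining;

	sz = sketch_min_u64(sz, mem_avail / 100 * 45);	/* room for 2 copies */
	sz = sketch_min_u64(sz, can_alloc / 100 * 95);	/* stay under commit limit */
	return (sz);
}
```

The 45% factor reflects the need for two copies of the loaded image, and the 95% factor leaves headroom below what the host will let the process commit.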

17 months agokboot: Remove externs
Warner Losh [Fri, 3 Feb 2023 15:40:56 +0000 (08:40 -0700)]
kboot: Remove externs

kboot_get_phys_load_segment is declared in kboot.h, so remove the extern
declarations from the .c files.

Sponsored by: Netflix
Reviewed by: tsoome
Differential Revision: https://reviews.freebsd.org/D38310

17 months agokboot: Try to read UEFI memory from physical memory on aarch64
Warner Losh [Fri, 3 Feb 2023 15:40:45 +0000 (08:40 -0700)]
kboot: Try to read UEFI memory from physical memory on aarch64

Try to open /dev/mem to read in the UEFI memory map. If we can't, then
we'll read it in the trampoline.

Retain reading in /proc/iomem to find reserved areas in Linux. We need
to know them for good places to put the kernel. These are not reflected
in the UEFI memory map. However, we should not adjust the UEFI memory
map since these reserved areas of the Linux kernel are free to be used
once we enter the kexec trampoline...

Sponsored by: Netflix
Reviewed by: tsoome, kevans, andrew
Differential Revision: https://reviews.freebsd.org/D38264

17 months agokboot: Enable for aarch64
Warner Losh [Fri, 3 Feb 2023 15:40:38 +0000 (08:40 -0700)]
kboot: Enable for aarch64

Enable building loader.kboot for aarch64/arm64.

Sponsored by: Netflix
Reviewed by: tsoome, kevans, andrew
Differential Revision: https://reviews.freebsd.org/D38262

17 months agokboot: Don't need an arch pointer to get segments
Warner Losh [Fri, 3 Feb 2023 15:40:30 +0000 (08:40 -0700)]
kboot: Don't need an arch pointer to get segments

There's no need for an arch pointer to get segments. We can call the
routine directly since we don't need this code to be called from
different context where a pointer is needed.

Sponsored by: Netflix
Reviewed by: kevans, andrew
Differential Revision: https://reviews.freebsd.org/D38266

17 months agokboot: MI fixups to enable aarch64 booting
Warner Losh [Fri, 3 Feb 2023 15:40:22 +0000 (08:40 -0700)]
kboot: MI fixups to enable aarch64 booting

A number of bug fixes to loading kernels and modules on aarch64 and amd64.
Fix offset calculations.
Add a number of debugs, commented out for now (will GC them in the future)

With this, and the MD aarch64 commands, we can linux boot in qemu and on
real hardware.

Sponsored by: Netflix
Reviewed by: kevans
Differential Revision: https://reviews.freebsd.org/D38261

17 months agokboot: Improve amd64 booting
Warner Losh [Fri, 3 Feb 2023 15:40:13 +0000 (08:40 -0700)]
kboot: Improve amd64 booting

Copy more of the necessary state for FreeBSD to boot:
o Copy EFI memory tables
o Create custom page tables needed for the kernel to find itself
o Simplify the passing of args to the trampoline by putting them
  on the stack rather than in dedicated memory.

This is only partially successful... we get only part way through the
amd64 startup code before dying. However, it's much further than before
the changes.

Sponsored by: Netflix
Reviewed by: tsoome, kevans
Differential Revision: https://reviews.freebsd.org/D38259

17 months agokboot: aarch64 trampoline implementation
Warner Losh [Fri, 3 Feb 2023 15:40:04 +0000 (08:40 -0700)]
kboot: aarch64 trampoline implementation

Update exec.c (copied from efi/loader/arch/arm64/exec.c) to allow
execution of aarch64 kernels. This includes a new trampoline code that
handles copying the UEFI memory map, if available from the Linux FDT
provided PA. This is a complete implementation now, able to boot from
the LinuxBoot environment on an aarch64 server that only offers
LinuxBoot (though a workaround for the gicv3 inability to re-init is not
yet in FreeBSD). Many 'fit and finish' issues will be addressed in
subsequent commits.

Sponsored by: Netflix
Reviewed by: tsoome, kevans, andrew
Differential Revision: https://reviews.freebsd.org/D38258

17 months agostand: share bootinfo.c between EFI and KBOOT
Warner Losh [Fri, 3 Feb 2023 15:39:55 +0000 (08:39 -0700)]
stand: share bootinfo.c between EFI and KBOOT

Connect efi's bootinfo.c to the kboot build, and adjust to use
the kboot specific routines.

The getrootmount() call is independent of EFI. Remove ifdefs so it's
called for kboot too.

The differences between the kboot and efi bootinfo.c files are now tiny.
This could use some more refactoring, but this is a working checkpoint.

Sponsored by: Netflix
Reviewed by: tsoome
Differential Revision: https://reviews.freebsd.org/D38350

17 months agokboot: aarch64 bi_loadsmap
Warner Losh [Fri, 3 Feb 2023 15:39:46 +0000 (08:39 -0700)]
kboot: aarch64 bi_loadsmap

Since aarch64 is different, it needs a different smap. We first see if
we have the PA of the table from the FDT info. If so, we copy that and
quit. Otherwise, we do the best we can in translating the /proc/iomem
into EFI Memory Table format.

We also send the system table to the kernel.

Sponsored by: Netflix
Reviewed by: kevans
Differential Revision: https://reviews.freebsd.org/D38255

17 months agokboot: bi_loadsmap for amd64
Warner Losh [Fri, 3 Feb 2023 15:39:39 +0000 (08:39 -0700)]
kboot: bi_loadsmap for amd64

Copy the EFI memory tables we were able to get into the MODINFOMD_SMAP
metadata area for the kernel.

Sponsored by: Netflix
Reviewed by: tsoome, kevans
Differential Revision: https://reviews.freebsd.org/D38254

17 months agokboot: Powerpc provide bi_loadsmap
Warner Losh [Fri, 3 Feb 2023 15:39:31 +0000 (08:39 -0700)]
kboot: Powerpc provide bi_loadsmap

It's just a stub, since the kernel learns of memory via FDT.

Sponsored by: Netflix
Reviewed by: tsoome, kevans
Differential Revision: https://reviews.freebsd.org/D38253

17 months agokboot: Define bi_loadsmap for loading memory maps
Warner Losh [Fri, 3 Feb 2023 15:39:24 +0000 (08:39 -0700)]
kboot: Define bi_loadsmap for loading memory maps

Each architecture will soon be required to provide this to load memory
maps as metadata for the platforms that require it (or a stub function
for those that don't).

Sponsored by: Netflix
Reviewed by: tsoome, kevans
Differential Revision: https://reviews.freebsd.org/D38252

17 months agokboot: Call enumerate_memory_arch()
Warner Losh [Fri, 3 Feb 2023 15:39:16 +0000 (08:39 -0700)]
kboot: Call enumerate_memory_arch()

Now that all architectures provide this, enumerate the platform's memory
before we go to interact(). This needs to be done only once, but relies
on our ability to open host: files on some platforms, so it needs to be
done after devinit().

Sponsored by: Netflix
Reviewed by: tsoome, kevans
Differential Revision: https://reviews.freebsd.org/D38251

17 months agokboot: Update amd64 to use enumerate_memory_arch()
Warner Losh [Fri, 3 Feb 2023 15:39:06 +0000 (08:39 -0700)]
kboot: Update amd64 to use enumerate_memory_arch()

Move memory enumeration to the enumerate_memory_arch(), tweak the code a
bit to make that fit into that framework.

Also fix a bug in the name of the end location. The old code never found
memory (though amd64 doesn't yet work, this led to using fallback
addresses that were good enough for QEMU...).

Sponsored by: Netflix
Reviewed by: kevans
Differential Revision: https://reviews.freebsd.org/D38250

17 months agokboot: aarch64 memory enumeration enumerate_memory_arch()
Warner Losh [Fri, 3 Feb 2023 15:38:36 +0000 (08:38 -0700)]
kboot: aarch64 memory enumeration enumerate_memory_arch()

We have an odd situation with aarch64 memory enumeration. The fdt that
we can get has a PA of the UEFI memory map, as modified by the current
running Linux kernel so it can retain those pages it needs for EFI and
other services. We have to pass in this EFI table, but don't have access
to it in the boot loader. We do in the trampoline code, so a forthcoming
commit will copy it there for the kernel to use. All for want of /dev/mem
in the target environment sometimes.

However, we also have to find a place to load the kernel, so we have to
fall back to /proc/iomem when we can't read the UEFI memory map directly
from /dev/mem. It will give us good enough results to do this task. This
table isn't quite suitable to be converted to the EFI table, so we use
both methods. We'll fall back to this method also if there's no EFI
table advertised in the fdt. There's no /sys file on aarch64 that has
this information, hence using the old-style /proc/iomem. We're unlikely
to work if there's no EFI, though.

Note: The underlying Linux mechanism differs from the amd64 method,
which seems like it should be MI but is unimplemented on aarch64.

Sponsored by: Netflix
Discussed with: kevans
Differential Revision: https://reviews.freebsd.org/D38249

17 months agokboot: Add powerpc stub for enumerate_memory_arch()
Warner Losh [Fri, 3 Feb 2023 15:38:29 +0000 (08:38 -0700)]
kboot: Add powerpc stub for enumerate_memory_arch()

Add stub for new MI interface for enumerating memory. Right now powerpc
looks in the FDT table at a later point in boot since we don't need to
pass a specific memory table to the kernel. Leave it like that for now,
but note plans for the future.

Sponsored by: Netflix
Reviewed by: kevans
Differential Revision: https://reviews.freebsd.org/D38248

17 months agokboot: space_avail -- how much space exists from 'start' to end of segment
Warner Losh [Fri, 3 Feb 2023 15:38:22 +0000 (08:38 -0700)]
kboot: space_avail -- how much space exists from 'start' to end of segment

Sponsored by: Netflix
Reviewed by: tsoome
Differential Revision: https://reviews.freebsd.org/D38313

17 months agokboot: Add parsing of /proc/iomem into seg.c
Warner Losh [Fri, 3 Feb 2023 15:38:14 +0000 (08:38 -0700)]
kboot: Add parsing of /proc/iomem into seg.c

We'll be using this code for most / all of the platforms since iomem is
the only interface that can tell us about the areas reserved to the Linux
kernel: we cannot place the new kernel into them, but we are free to
use them once we hit the trampoline. aarch64 will use this shortly, and
similar code in amd64 will be refactored when I make that platform work.

Sponsored by: Netflix
Reviewed by: tsoome
Differential Revision: https://reviews.freebsd.org/D38309
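
For readers unfamiliar with the format, each /proc/iomem line has the shape "start-end : name", with hexadecimal addresses and optional leading indentation for nested ranges. A minimal parsing sketch follows. The struct and function names are hypothetical and do not reflect what seg.c actually does.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record for one /proc/iomem line. */
struct iomem_sketch {
	uint64_t start;		/* first byte of the range */
	uint64_t end;		/* last byte of the range (inclusive) */
	char	 name[64];	/* e.g. "System RAM" or "reserved" */
};

/*
 * Parse one line of the form "  40000000-bfffffff : System RAM".
 * Returns 1 on success and 0 if the line does not match.
 */
static int
iomem_sketch_parse(const char *line, struct iomem_sketch *r)
{
	unsigned long long s, e;
	char name[64];

	if (sscanf(line, " %llx-%llx : %63[^\n]", &s, &e, name) != 3)
		return (0);
	r->start = s;
	r->end = e;
	strcpy(r->name, name);	/* fits: both buffers are 64 bytes */
	return (1);
}
```

A caller would feed this one line at a time and keep only the ranges whose name marks them as usable or reserved, depending on the question being asked.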

17 months agokboot: Create segment handling code at main level
Warner Losh [Fri, 3 Feb 2023 15:37:53 +0000 (08:37 -0700)]
kboot: Create segment handling code at main level

Create segment handling code up to the top level. Move it all into
seg.c, and make necessary adjustments for it being in a new file,
including inventing print_avail() and first_avail() to print the array
and find the first large enough memory hole.  aarch64 will use this,
and I'll refactor the other platforms to use it as I make them work.

Sponsored by: Netflix
Discussed with: kevans
Differential Revision: https://reviews.freebsd.org/D38308
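
Finding "the first large enough memory hole" is a first-fit search over the array of available ranges. The sketch below illustrates the idea only. The types, the name sketch_first_fit, and the alignment parameter are assumptions, not the signature of first_avail() in seg.c.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical available range, covering [start, end). */
struct avail_sketch {
	uint64_t start;
	uint64_t end;
};

/*
 * Return the aligned start address of the first range that can hold
 * `size` bytes, or 0 if no range is big enough.  `align` must be a
 * power of two.
 */
static uint64_t
sketch_first_fit(const struct avail_sketch *a, size_t n, uint64_t align,
    uint64_t size)
{
	for (size_t i = 0; i < n; i++) {
		/* Round the start of this range up to the alignment. */
		uint64_t s = (a[i].start + align - 1) & ~(align - 1);

		if (s < a[i].end && a[i].end - s >= size)
			return (s);
	}
	return (0);
}
```

Using 0 as the "not found" value assumes that address 0 is never a valid load address, which is a simplification made for this example.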

17 months agokboot: MI part of the memory enumeration code
Warner Losh [Fri, 3 Feb 2023 15:37:45 +0000 (08:37 -0700)]
kboot: MI part of the memory enumeration code

enumerate_memory_arch is called once early in kboot's startup to allow
us to discover the memory layout, reserved areas, etc of the system
memory. Add the MI interface part of this.

Sponsored by: Netflix
Reviewed by: tsoome, kevans
Differential Revision: https://reviews.freebsd.org/D38247

17 months agokboot: Add aarch64 fdt fixup
Warner Losh [Fri, 3 Feb 2023 15:37:39 +0000 (08:37 -0700)]
kboot: Add aarch64 fdt fixup

Sponsored by: Netflix
Reviewed by: kevans
Differential Revision: https://reviews.freebsd.org/D38256

17 months agokboot: Probe all disks and partitions for a kernel
Warner Losh [Fri, 3 Feb 2023 15:37:31 +0000 (08:37 -0700)]
kboot: Probe all disks and partitions for a kernel

Guess where to boot from when bootdev= isn't on the command line or
other config. Search all the disks and partitions for one that looks
like it could be a boot partition (same as we do when probing
zpools). Return the first one we find.

Sponsored by: Netflix
Reviewed by: tsoome
Differential Revision: https://reviews.freebsd.org/D38319

17 months agocp: Minor code cleanup.
Dag-Erling Smørgrav [Fri, 3 Feb 2023 15:37:24 +0000 (16:37 +0100)]
cp: Minor code cleanup.

* Fix includes in utils.c, cf. style(9).
* Fix type mismatch: readlink(2) returns ssize_t, not int.
* It is not necessary to set errno to 0 as fts_read(3) already does it.

MFC after: 1 week
Sponsored by: Klara, Inc.
Reviewed by: allanjude
Differential Revision: https://reviews.freebsd.org/D38369
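
The readlink(2) type fix matters because the return value is both a length and an error indicator. A small sketch of the usual pattern follows. The wrapper name is hypothetical and this is not the code from cp's utils.c.

```c
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical wrapper: read the target of symlink `path` into `buf`
 * and NUL-terminate it.  Returns the target length, or -1 on error.
 */
static ssize_t
sketch_read_link(const char *path, char *buf, size_t bufsz)
{
	ssize_t len;	/* ssize_t, matching readlink(2)'s return type */

	if (bufsz == 0)
		return (-1);
	len = readlink(path, buf, bufsz - 1);
	if (len == -1)
		return (-1);
	buf[len] = '\0';	/* readlink(2) does not NUL-terminate */
	return (len);
}
```

Keeping the result in an ssize_t avoids an implicit narrowing conversion to int and keeps the comparison against -1 in the function's native return type.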

17 months agoMechanically convert wg(4) to IfAPI
Justin Hibbits [Fri, 13 Jan 2023 16:22:11 +0000 (11:22 -0500)]
Mechanically convert wg(4) to IfAPI

Reviewed By: jhb
Sponsored by: Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D38307

17 months agolinsysfs: Use IfAPI accessors
Justin Hibbits [Thu, 2 Feb 2023 21:43:56 +0000 (16:43 -0500)]
linsysfs: Use IfAPI accessors

Replace the only two ifnet member accesses with IfAPI accessor calls.

Sponsored by: Juniper Networks, Inc.

17 months agolinprocfs: Migrate to IfAPI
Justin Hibbits [Thu, 2 Feb 2023 21:48:22 +0000 (16:48 -0500)]
linprocfs: Migrate to IfAPI

Summary:
Migrate linprocfs to use the IfAPI interfaces instead of direct ifnet
accesses.

Reviewed by: dchagin
Sponsored by: Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D38358

17 months agoIfAPI: Add iterator to loop over all interfaces
Justin Hibbits [Wed, 1 Feb 2023 21:28:11 +0000 (16:28 -0500)]
IfAPI: Add iterator to loop over all interfaces

Summary:
Sometimes it's useful to iterate over all interfaces in the current
VNET, as the linuxulator does in several places.

Unlike other iterators in the IfAPI this propagates any error received
up to the caller, instead of returning a count.

Sponsored by: Juniper Networks, Inc.
Reviewed by: glebius, melifaro
Differential Revision: https://reviews.freebsd.org/D38348
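
The difference between a counting iterator and an error-propagating one can be shown with a userspace analogue. Nothing below is the IfAPI itself: the callback type, function names, and the use of plain integers in place of interfaces are all made up for illustration.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical callback type: return 0 to continue, or an errno value. */
typedef int (*sketch_item_cb)(int item, void *arg);

/*
 * Call cb on every item.  Stop at the first nonzero return and hand
 * that error back to the caller instead of returning a count.
 */
static int
sketch_foreach(const int *items, size_t n, sketch_item_cb cb, void *arg)
{
	for (size_t i = 0; i < n; i++) {
		int error = cb(items[i], arg);

		if (error != 0)
			return (error);
	}
	return (0);
}

/* Example callback: add items into *arg, rejecting negative values. */
static int
sketch_sum_cb(int item, void *arg)
{
	if (item < 0)
		return (EINVAL);
	*(int *)arg += item;
	return (0);
}
```

With this shape, a caller that hits an error midway sees both the error code and whatever partial state the callback accumulated before stopping.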

17 months agoefiserialio: use port settings (sio->Mode) for initial setup
Toomas Soome [Thu, 2 Feb 2023 14:01:02 +0000 (16:01 +0200)]
efiserialio: use port settings (sio->Mode) for initial setup

Use serial port setup done by system firmware.
ARM64 Hyper-V hangs if we attempt to override the defaults,
therefore we should default to using the settings from firmware.

Tested by: schakrabarti@microsoft.com
PR: 266248
MFC after: 1 week

17 months agopf tests: improve pfsync:basic_defer test
Kristof Provost [Fri, 3 Feb 2023 03:10:32 +0000 (04:10 +0100)]
pf tests: improve pfsync:basic_defer test

Create state on output only, to ensure we trigger the defer code.

MFC after: 2 weeks

17 months agopfsync: add missing bucket lock
Kristof Provost [Thu, 2 Feb 2023 09:34:57 +0000 (10:34 +0100)]
pfsync: add missing bucket lock

pfsync_q_ins() expects us to hold the bucket lock, but when we enter it
from pfsync_state_import() we don't.

MFC after: 2 weeks

17 months agohastctl: use zlib's crc32 implementation.
Xin LI [Fri, 3 Feb 2023 08:30:08 +0000 (00:30 -0800)]
hastctl: use zlib's crc32 implementation.

X-MFC-with: 6998572a74a
MFC after: 2 weeks

17 months agohastd: use zlib's crc32 implementation.
Xin LI [Fri, 3 Feb 2023 07:14:21 +0000 (23:14 -0800)]
hastd: use zlib's crc32 implementation.

Reviewed by: pjd
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D35767

17 months agokern_prot.c p_candebug(): Remove single-use variable.
Pawel Jakub Dawidek [Tue, 31 Jan 2023 00:15:04 +0000 (16:15 -0800)]
kern_prot.c p_candebug(): Remove single-use variable.

Reviewed by: allanjude, oshogbo
Approved by: allanjude, oshogbo
Differential Revision: https://reviews.freebsd.org/D38288

17 months agonv.9: Improve style in one of the examples.
Pawel Jakub Dawidek [Mon, 7 Nov 2022 08:10:16 +0000 (00:10 -0800)]
nv.9: Improve style in one of the examples.

Reviewed by: allanjude, oshogbo
Approved by: allanjude, oshogbo
Differential Revision: https://reviews.freebsd.org/D38287

17 months agowhitespace: rewrap to match case directly above
Brooks Davis [Fri, 3 Feb 2023 00:37:31 +0000 (00:37 +0000)]
whitespace: rewrap to match case directly above

It's easier to visually diff the two case blocks if there aren't
gratuitous whitespace differences.

Sponsored by: DARPA

17 months agovfs_export: Add checks for correct prison when updating exports
Rick Macklem [Fri, 3 Feb 2023 00:20:58 +0000 (16:20 -0800)]
vfs_export: Add checks for correct prison when updating exports

mountd(8) basically does the following:
    getmntinfo()
    for each mount
        delete_exports
using nmount(2) to do the creation/deletion of individual exports.

For prison0 (and for other prisons if enforce_statfs == 0) getmntinfo()
returns all mount points, including ones being used within other prisons.
This can cause confusion if the same file system is specified in the
exports(5) file for multiple prisons.

This patch adds a permanent identifier to each prison
and marks which prison did the exports in a field of
the mount structure called mnt_exjail.  This field can
then be compared to the permanent identifier for the
prison that the thread's credentials are in.
Also required was a new function called prison_isalive_permid()
which returns whether the prison is alive, so that the check can be
ignored for prisons that have been removed.

This prepares the system to allow mountd(8) to run in multiple
prisons, including prison0.

Future commits will complete the modifications to allow mountd(8)
to run in vnet prisons.  Until then, these changes should not affect
semantics.

Reviewed by: markj
MFC after: 3 months
Differential Revision: https://reviews.freebsd.org/D38144

17 months agotarfs: Remove unused code.
Dag-Erling Smørgrav [Thu, 2 Feb 2023 23:11:38 +0000 (23:11 +0000)]
tarfs: Remove unused code.

Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.

17 months agotarfs: Fix non-ZSTDIO build.
Dag-Erling Smørgrav [Thu, 2 Feb 2023 22:23:52 +0000 (23:23 +0100)]
tarfs: Fix non-ZSTDIO build.

Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.

17 months agosctp: improve delivery of stream reset notifications
Michael Tuexen [Thu, 2 Feb 2023 13:46:10 +0000 (14:46 +0100)]
sctp: improve delivery of stream reset notifications

Two functions are not called via sctp_ulp_notify() and therefore
need additional checks when being called.

Reported by: syzbot+eb888d3a5a6c54413de5@syzkaller.appspotmail.com
MFC after: 3 days

17 months agokboot: Remove kboot_loadaddr
Warner Losh [Thu, 2 Feb 2023 21:08:15 +0000 (14:08 -0700)]
kboot: Remove kboot_loadaddr

Turns out that the loadaddr interface is not sufficiently expressive to
do the loading we need to do. Instead, we'll emulate some of its
features with inline math in copyin/copyout.

Sponsored by: Netflix
Reviewed by: kevans
Differential Revision: https://reviews.freebsd.org/D38260

17 months agokboot: Assert errno is negative
Warner Losh [Thu, 2 Feb 2023 21:08:03 +0000 (14:08 -0700)]
kboot: Assert errno is negative

When converting from a Linux error to a FreeBSD errno, assert that the
value passed in is negative, as is Linux's custom.

Suggested by: brooks
Sponsored by: Netflix
Reviewed by: tsoome, brooks
Differential Revision: https://reviews.freebsd.org/D38357

17 months agotarfs: Fix 32-bit build.
Dag-Erling Smørgrav [Thu, 2 Feb 2023 20:36:01 +0000 (21:36 +0100)]
tarfs: Fix 32-bit build.

Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.

17 months agokboot: Parse memory usage
Warner Losh [Thu, 2 Feb 2023 20:11:57 +0000 (13:11 -0700)]
kboot: Parse memory usage

To properly size segments, we have to know how much memory we have in
the system, as well as how much this process can allocate.  Due to our
inability to overcommit, we need to know how much memory is
available. commit_limit is the grand total allowed. committed_as is the
current memory used. mem_avail is what Linux tells us is available. Find
these from /proc/meminfo. We'll use them later to allocate the biggest
possible segment sizes, but for now print the raw numbers.

Sponsored by: Netflix
Reviewed by: kevans (earlier version)
Differential Revision: https://reviews.freebsd.org/D38267
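
On Linux, /proc/meminfo reports these values as "Name:  value kB" lines. The relevant fields are CommitLimit, Committed_AS, and MemAvailable. A minimal sketch of pulling one field out of an already-read buffer follows. The function name is hypothetical and this is not the parser used in kboot.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: look for a line starting with "key:" in a
 * NUL-terminated copy of /proc/meminfo and return its value in bytes
 * (the file reports kB).  Returns 0 if the key is not present.
 */
static uint64_t
sketch_meminfo_field(const char *buf, const char *key)
{
	size_t klen = strlen(key);
	const char *p = buf;
	unsigned long long kb;

	while (p != NULL && *p != '\0') {
		if (strncmp(p, key, klen) == 0 && p[klen] == ':' &&
		    sscanf(p + klen + 1, "%llu", &kb) == 1)
			return ((uint64_t)kb * 1024);
		p = strchr(p, '\n');	/* advance to the next line */
		if (p != NULL)
			p++;
	}
	return (0);
}
```

Given those three fields, the amount the process can still allocate is roughly CommitLimit minus Committed_AS, which is the quantity the commit message says will later drive segment sizing.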

17 months agokboot: For hostfs, return better errors from read, where possible.
Warner Losh [Thu, 2 Feb 2023 20:06:24 +0000 (13:06 -0700)]
kboot: For hostfs, return better errors from read, where possible.

Translate the Linux error return from read to a FreeBSD errno. We use a
simplified translation: 1-34 are the same between the systems, so any of
those will be returned directly. All other errno map to EINVAL. This
will suffice for some code that reads /dev/mem in producing the right
diagnostic.

A fully generalized version is much harder. Linux has a number of errno
that don't translate well and has architecture dependent
encodings. Avoid this mess with a simple macro for now. Add comment
explaining why we use the simple method we do.

Sponsored by: Netflix
Reviewed by: kevans, andrew
Differential Revision: https://reviews.freebsd.org/D38265
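
The simplified translation can be sketched as a small function. The name below is hypothetical (the commit describes a macro, whose name is not given here), and the assertion reflects the related "kboot: Assert errno is negative" change rather than this commit.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical helper: convert a Linux-style negative error return
 * into a positive FreeBSD errno.  Values 1-34 are numerically the same
 * on both systems; anything else collapses to EINVAL.
 */
static int
sketch_host_to_stand_errno(long e)
{
	assert(e < 0);		/* Linux reports errors as negative values */
	e = -e;
	return (e >= 1 && e <= 34 ? (int)e : EINVAL);
}
```

For example, a host return of -2 maps to ENOENT unchanged, while a value outside the shared range, such as -110, is reported as EINVAL.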

17 months agokboot: Fix hostdisk fmtdev
Warner Losh [Thu, 2 Feb 2023 20:03:39 +0000 (13:03 -0700)]
kboot: Fix hostdisk fmtdev

The device name was totally wrong. It should be "/dev/mumble:" not just
"mumble".

Sponsored by: Netflix
Reviewed by: tsoome
Differential Revision: https://reviews.freebsd.org/D38318

17 months agokboot: Trim initial allocation to 64MB
Warner Losh [Thu, 2 Feb 2023 20:03:28 +0000 (13:03 -0700)]
kboot: Trim initial allocation to 64MB

We only need 64MB to read off ZFS pools. Since Linux doesn't do
overcommit by default, the extra 64MB is 64MB less we can allocate for
things like RAM disks.

Sponsored by: Netflix
Reviewed by: kevans, andrew
Differential Revision: https://reviews.freebsd.org/D38268

17 months agostand: only compute symidx on x86
Warner Losh [Thu, 2 Feb 2023 20:03:10 +0000 (13:03 -0700)]
stand: only compute symidx on x86

We only use symidx on x86, so only compute it on x86 to fix a set but
not used warning on aarch64.

Sponsored by: Netflix
Reviewed by: kevans
Differential Revision: https://reviews.freebsd.org/D38246

17 months agogh-bc: don't force CFLAGS to -O0 -g
Dimitry Andric [Thu, 2 Feb 2023 18:20:16 +0000 (19:20 +0100)]
gh-bc: don't force CFLAGS to -O0 -g

Otherwise, CFLAGS passed in via bsd.sys.mk or the user's environment are
not respected, and this leads to link errors on riscv64sf.

17 months agoipfilter: Fix use after free on packet with broken lengths
Cy Schubert [Thu, 2 Feb 2023 00:49:08 +0000 (16:49 -0800)]
ipfilter: Fix use after free on packet with broken lengths

A packet with a length of 67 bytes, a header length using the default of
20 bytes, and a TCP data offset (th_off) of 48 causes m_pullup() to fail
to arrange the bytes contiguously. m_pullup() then frees the mbuf chain
and returns NULL. ipfilter stores the resulting mbuf address (here, NULL)
in its fr_info_t structure. Unfortunately, the erroneous packet is not
flagged for drop. This results in a kernel page fault at line 410 of
sys/netinet/ip_fastfwd.c as it tries to use an mbuf that m_pullup() has
already freed.

PR: 266442
Reported by: Robert Morris <rtm@lcs.mit.edu>
MFC after: 1 week

17 months agoipfilter: Correctly type ipf_pullup()
Cy Schubert [Tue, 31 Jan 2023 19:09:00 +0000 (11:09 -0800)]
ipfilter: Correctly type ipf_pullup()

ipf_pullup() outputs a pointer to ip_t. Though returning a pointer to
void does work, it is imprecise and not completely correct.

MFC after: 1 week

17 months agotimeout: Move from /usr/bin to /bin
Mateusz Piotrowski [Wed, 1 Feb 2023 15:24:59 +0000 (16:24 +0100)]
timeout: Move from /usr/bin to /bin

timeout(1) is used by /etc/rc.d/zfskeys. Unfortunately, having
timeout(1) installed in /usr/bin causes problems when /usr is an
encrypted ZFS partition.

Implementing timeout(1) in sh(1) is not trivial. A more elegant solution
is to move timeout(1) to /bin so that it is available to early services
in the boot process.

PR: 265221
Reviewed by: allanjude, des, imp
Approved by: allanjude, des, imp
Reported by: Ivan <r4@sovserv.ru>
Fixes: 33ff39796ffe Add zfskeys rc.d script for auto-loading encryption keys
MFC after: 1 week
Relnotes: yes
Sponsored by: Modirum MDPay
Sponsored by: Klara Inc.
Differential Revision: https://reviews.freebsd.org/D38344

17 months agorescue: Fix link order of SSL libraries and fetch.
John Baldwin [Thu, 2 Feb 2023 17:23:02 +0000 (09:23 -0800)]
rescue: Fix link order of SSL libraries and fetch.

ld.bfd requires libraries to be linked in order.  libssl requires
libcrypto.  libfetch requires libssl.  To fix the latter, move fetch
up above tar rather than listing the ssl libraries twice.

Reviewed by: delphij
Fixes: ea34aa4780e5 rescue: Add fetch(1) to the rescue tool.
Differential Revision: https://reviews.freebsd.org/D38304

17 months agolinux(4): Remove stale comment that no longer applies.
Dmitry Chagin [Thu, 2 Feb 2023 17:21:37 +0000 (20:21 +0300)]
linux(4): Remove stale comment that no longer applies.

MFC after: 1 week

17 months agolinux(4): Microoptimize rt_sendsig() on amd64.
Dmitry Chagin [Thu, 2 Feb 2023 17:21:37 +0000 (20:21 +0300)]
linux(4): Microoptimize rt_sendsig() on amd64.

Drop proc lock earlier, before copying user stuff.

Pointed out by: kib
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D38326
MFC after: 1 week

17 months agolinux(4): Preserve fpu fxsave state across signal delivery on amd64.
Dmitry Chagin [Thu, 2 Feb 2023 17:21:37 +0000 (20:21 +0300)]
linux(4): Preserve fpu fxsave state across signal delivery on amd64.

PR: 240768
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D38302
MFC after: 1 week

17 months agoAdd tarfs, a filesystem backed by tarballs.
Dag-Erling Smørgrav [Thu, 2 Feb 2023 17:18:41 +0000 (18:18 +0100)]
Add tarfs, a filesystem backed by tarballs.

Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Reviewed by: pauamma, imp
Differential Revision: https://reviews.freebsd.org/D37753

17 months agoRead the arm64 far early in el0 exceptions
Andrew Turner [Wed, 25 Jan 2023 17:47:39 +0000 (17:47 +0000)]
Read the arm64 far early in el0 exceptions

When handling userspace exceptions on arm64 we need to dereference the
current thread pointer. If this is being promoted/demoted there is a
small window where it will cause another exception to be hit. As this
second exception will set the fault address register we will read the
incorrect value in the userspace exception handler.

Fix this by always reading the fault address before dereferencing the
current thread pointer.

Reported by: olivier@
Reviewed by: markj
Sponsored by: Arm Ltd
Differential Revision: https://reviews.freebsd.org/D38196

17 months agoLimit where we disable the Arm generic timer
Andrew Turner [Thu, 2 Feb 2023 16:26:25 +0000 (16:26 +0000)]
Limit where we disable the Arm generic timer

Only disable the Arm generic timer on arm64 when entering the kernel
through EL2. There is no guarantee it will be enabled if we are running
under a hypervisor.

Sponsored by: Arm Ltd

17 months agoCheck for the IORT before adding the ITS driver
Andrew Turner [Mon, 19 Dec 2022 14:19:26 +0000 (14:19 +0000)]
Check for the IORT before adding the ITS driver

Before adding the ITS interrupt controller driver to handle MSI/MSI-X
interrupts check if it is present in the IO Remapping Table (IORT).
If not don't attach as devices expect to use this table to find the
correct MSI interrupt controller.

Sponsored by: Arm Ltd
Differential Revision: https://reviews.freebsd.org/D37772

17 months agoixgbe: Do not count L3/L4 checksum errors as input errors
Andrew Gallatin [Thu, 2 Feb 2023 15:02:44 +0000 (10:02 -0500)]
ixgbe: Do not count L3/L4 checksum errors as input errors

NIC input errors have traditionally indicated problems at the link
level (crc errors, runts, etc).  People tend to build monitoring
infrastructure around such errors in order to monitor for bad network
hardware. When L3/L4 checksum errors are included in the category of
input errors, it breaks such monitoring, as these errors can originate
anywhere on the internet, and do not necessarily indicate faulty
local network hardware.

Reviewed by: erj, glebius
Differential Revision: https://reviews.freebsd.org/D38346
Sponsored by: Netflix
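
The accounting change can be pictured with a small model. This is a sketch only: struct model_rx_stats and its field names are hypothetical and do not correspond to the ixgbe driver's actual counters.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical receive counters; not the driver's real statistics. */
struct model_rx_stats {
	uint64_t crc_errors;	/* link level */
	uint64_t runts;		/* link level */
	uint64_t oversize;	/* link level */
	uint64_t csum_errors;	/* L3/L4 checksum failures */
};

/*
 * Input errors cover link-level problems only.  L3/L4 checksum errors
 * are still counted in csum_errors but are left out of this sum.
 */
uint64_t
model_input_errors(const struct model_rx_stats *s)
{
	assert(s != NULL);
	return (s->crc_errors + s->runts + s->oversize);
}
```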

17 months agolinux(4): Deduplicate linux_trans_osrel().
Dmitry Chagin [Thu, 2 Feb 2023 14:58:07 +0000 (17:58 +0300)]
linux(4): Deduplicate linux_trans_osrel().

MFC after: 1 week

17 months agolinux(4): Deduplicate linux_copyout_strings().
Dmitry Chagin [Thu, 2 Feb 2023 14:58:07 +0000 (17:58 +0300)]
linux(4): Deduplicate linux_copyout_strings().

It is still present in the 32-bit Linuxulator on amd64.

MFC after: 1 week

17 months agolinux(4): Deduplicate linux_fixup_elf().
Dmitry Chagin [Thu, 2 Feb 2023 14:58:07 +0000 (17:58 +0300)]
linux(4): Deduplicate linux_fixup_elf().

Use native routines to fix up the initial process stack. On arm64,
linux_elf_fixup() is a no-op, as the stack fixup (room for argc) is done
in linux_copyout_strings().

MFC after: 1 week

17 months agolinux(4): Add coredump support to i386.
Dmitry Chagin [Thu, 2 Feb 2023 14:58:06 +0000 (17:58 +0300)]
linux(4): Add coredump support to i386.

MFC after: 1 week

17 months agolinux(4): Use COMPAT_LINUX32 instead of __ELF_WORD_SIZE.
Dmitry Chagin [Thu, 2 Feb 2023 14:58:06 +0000 (17:58 +0300)]
linux(4): Use COMPAT_LINUX32 instead of __ELF_WORD_SIZE.

The COMPAT_LINUX32 option is defined when building the 32-bit Linuxulator
for a 64-bit host. Using __ELF_WORD_SIZE is wrong here, as it is equal
to 32 on i386 too.

MFC after: 1 week

17 months agolinux(4): Microoptimize linux_elf.h for future use.
Dmitry Chagin [Thu, 2 Feb 2023 14:58:06 +0000 (17:58 +0300)]
linux(4): Microoptimize linux_elf.h for future use.

In order to reduce code duplication move coredump support definitions
into the appropriate header and hide private definitions.

MFC after: 1 week

17 months agocp: Simplify the common case.
Dag-Erling Smørgrav [Wed, 1 Feb 2023 20:06:28 +0000 (21:06 +0100)]
cp: Simplify the common case.

* The allocated buffer is only used in the fallback case, so move it
  there.  The argument for passing it in from the caller was that if
  malloc(3) were to fail, we'd want it to fail before we started
  copying anything, but firstly, it was already not in the right place
  to ensure that, and secondly, malloc(3) never fails (except in very
  contrived circumstances, such as an unreasonable RLIMIT_AS or
  RLIMIT_DATA).

* Remove the mmap(2) option.  It is almost never beneficial,
  especially when the alternative is copy_file_range(2), and it adds
  needless complexity and indentation.

MFC after: 1 week
Sponsored by: Klara, Inc.
Reviewed by: rmacklem, mav
Differential Revision: https://reviews.freebsd.org/D38291
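
The resulting shape, with the buffer allocated only inside the fallback path, might look roughly like the sketch below. It is not the code from cp(1): model_copy_fd(), model_copy_fallback(), FALLBACK_BUFSIZE and the set of errno values that trigger the fallback are assumptions made for illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <unistd.h>

#define FALLBACK_BUFSIZE (64 * 1024)

/* Fallback: read(2)/write(2) loop.  The buffer lives only here. */
int
model_copy_fallback(int from_fd, int to_fd)
{
	char *buf, *p;
	ssize_t nread = 0, nwritten;
	int rv = 0;

	buf = malloc(FALLBACK_BUFSIZE);
	if (buf == NULL)
		return (-1);
	while (rv == 0 && (nread = read(from_fd, buf, FALLBACK_BUFSIZE)) > 0) {
		/* Handle short writes by retrying with the remainder. */
		for (p = buf; nread > 0; p += nwritten, nread -= nwritten) {
			nwritten = write(to_fd, p, (size_t)nread);
			if (nwritten <= 0) {
				rv = -1;
				break;
			}
		}
	}
	if (nread < 0)
		rv = -1;
	free(buf);
	return (rv);
}

/* Common case: let the kernel copy; fall back only if it cannot. */
int
model_copy_fd(int from_fd, int to_fd)
{
	ssize_t n;
	int copied_any = 0;

	assert(from_fd >= 0 && to_fd >= 0);
	while ((n = copy_file_range(from_fd, NULL, to_fd, NULL,
	    SSIZE_MAX, 0)) > 0)
		copied_any = 1;
	if (n == 0)
		return (0);
	/* Assumed set of "not supported here" errors. */
	if (!copied_any && (errno == EINVAL || errno == ENOSYS ||
	    errno == EXDEV || errno == EOPNOTSUPP))
		return (model_copy_fallback(from_fd, to_fd));
	return (-1);
}
```

Because the fallback is only attempted when nothing has been copied yet, the two paths never interleave on the same pair of descriptors.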

17 months agocp: Add tests involving sparse files.
Dag-Erling Smørgrav [Wed, 1 Feb 2023 20:06:24 +0000 (21:06 +0100)]
cp: Add tests involving sparse files.

MFC after: 1 week
Sponsored by: Klara, Inc.
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D38290

17 months agolibc: Add tests for strchrnul(3).
Dag-Erling Smørgrav [Wed, 1 Feb 2023 20:06:20 +0000 (21:06 +0100)]
libc: Add tests for strchrnul(3).

MFC after: 1 week
Sponsored by: Klara, Inc.
Reviewed by: allanjude
Differential Revision: https://reviews.freebsd.org/D38286

17 months agoRevert "sys/kbio.h: support Unicode key codes in vt keymap files"
Stefan Eßer [Thu, 2 Feb 2023 08:05:43 +0000 (09:05 +0100)]
Revert "sys/kbio.h: support Unicode key codes in vt keymap files"

It has been pointed out that this change causes ABI breakage for
[GP]IO_DEADKEYMAP. I'll create a review on Phabricator.

Since the 8 bit limit on keycodes causes issues for certain keymaps,
a fix should be committed in time to allow a MFC to 13.2.

This reverts commit 1e0853ee84031e4131a0b8cc8737696f199d3d4c.

Reported by:      Jessica Clarke

17 months agoatrtc: expose power loss as sysctl
Corvin Köhne [Thu, 8 Dec 2022 07:28:42 +0000 (08:28 +0100)]
atrtc: expose power loss as sysctl

Exposing a power loss of the RTC as a sysctl makes it easier to
detect an empty CMOS battery.

Reviewed by: manu
MFC after: 1 week
Sponsored by: Beckhoff Automation GmbH & Co. KG
Differential Revision: https://reviews.freebsd.org/D38325

17 months agoprison_check_nfsd: Add check for enforce_statfs != 0
Rick Macklem [Thu, 2 Feb 2023 00:02:20 +0000 (16:02 -0800)]
prison_check_nfsd: Add check for enforce_statfs != 0

Since mountd(8) will not be able to do exports
when running in a vnet prison if enforce_statfs is
set to 0, add a check for this to prison_check_nfsd().

Reviewed by: jamie, markj
MFC after: 2 months
Differential Revision: https://reviews.freebsd.org/D38189

17 months agolibthr pshared: correct a bug in allocation
Konstantin Belousov [Wed, 1 Feb 2023 20:12:45 +0000 (22:12 +0200)]
libthr pshared: correct a bug in allocation

When __thr_pshared_offpage() is called for allocation, it must not use
the cached offpage for the key.  Instead, the cached offpage must be
unmapped and removed from the cache, if any.

It is legitimate for the user code to unmap the shared lock object without
destroying it, and then map something over the freed VA to carry
another shared lock.  In this case the cached offpage must be un-cached.

PR: 269277
Reported by: rau8344@gmail.com
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D38345
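
The rule can be illustrated with a toy cache. This is a userspace model under stated assumptions, not libthr code: struct offpage, the fixed-size cache array, model_offpage() and the gen counter are hypothetical, and malloc(3)/free(3) stand in for mapping and unmapping the off-page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define CACHE_SLOTS 8

/* Hypothetical stand-in for an off-page holding a shared lock. */
struct offpage {
	unsigned gen;	/* distinguishes successive allocations */
};

struct cache_entry {
	const void *key;	/* user address of the lock object */
	struct offpage *page;	/* cached off-page, NULL if slot is free */
};

static struct cache_entry cache[CACHE_SLOTS];
static unsigned next_gen;

static struct cache_entry *
cache_find(const void *key)
{
	for (size_t i = 0; i < CACHE_SLOTS; i++)
		if (cache[i].page != NULL && cache[i].key == key)
			return (&cache[i]);
	return (NULL);
}

/*
 * Look up the off-page for key.  With doalloc set, a cached entry is
 * treated as stale: it is released and replaced by a fresh page.
 */
struct offpage *
model_offpage(const void *key, int doalloc)
{
	struct cache_entry *e;

	assert(key != NULL);
	e = cache_find(key);
	if (!doalloc)
		return (e != NULL ? e->page : NULL);
	if (e != NULL) {
		free(e->page);		/* models unmapping the stale page */
		e->page = NULL;		/* and removes it from the cache */
	}
	for (size_t i = 0; i < CACHE_SLOTS; i++) {
		if (cache[i].page == NULL) {
			cache[i].page = malloc(sizeof(struct offpage));
			if (cache[i].page == NULL)
				return (NULL);
			cache[i].page->gen = ++next_gen;
			cache[i].key = key;
			return (cache[i].page);
		}
	}
	return (NULL);	/* cache full in this toy model */
}
```

A second allocation for the same key yields a page with a new generation, and later lookups return that page rather than the stale one.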

17 months agolibthr: add pshared_destroy() helper
Konstantin Belousov [Wed, 1 Feb 2023 21:06:04 +0000 (23:06 +0200)]
libthr: add pshared_destroy() helper

Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D38345

17 months agokstack_contains(): account for struct pcb on stack
Konstantin Belousov [Tue, 31 Jan 2023 23:49:54 +0000 (01:49 +0200)]
kstack_contains(): account for struct pcb on stack

for arm64, arm, powerpc, and riscv

Reviewed by: jhb
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D38320
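
A bounds check that excludes a reserved area at the top of the stack could be sketched as follows. The function model_kstack_contains() and its parameters are hypothetical: the real kstack_contains() takes a thread pointer, and reserved_top stands in here for the space occupied by struct pcb.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Returns true when [va, va + len) lies within the usable part of the
 * stack, i.e. below the reserved_top bytes kept at the top of it.
 */
bool
model_kstack_contains(uintptr_t kstack, size_t kstack_size,
    size_t reserved_top, uintptr_t va, size_t len)
{
	uintptr_t usable_end;

	assert(reserved_top <= kstack_size);
	usable_end = kstack + kstack_size - reserved_top;
	/* The middle comparison rejects ranges that wrap around. */
	return (va >= kstack && va + len >= va && va + len <= usable_end);
}
```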

17 months agoi386 trap_check_kstack(): use kstack_contains()
Konstantin Belousov [Wed, 1 Feb 2023 00:30:20 +0000 (02:30 +0200)]
i386 trap_check_kstack(): use kstack_contains()

Reviewed by: jhb
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D38320

17 months agoi386 kstack_contains(): account for pcb/fpu save area
Konstantin Belousov [Tue, 31 Jan 2023 23:43:23 +0000 (01:43 +0200)]
i386 kstack_contains(): account for pcb/fpu save area

Reviewed by: jhb
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D38320

17 months agoMove kstack_contains() and GET_STACK_USAGE() to MD machine/stack.h
Konstantin Belousov [Tue, 31 Jan 2023 22:47:40 +0000 (00:47 +0200)]
Move kstack_contains() and GET_STACK_USAGE() to MD machine/stack.h

Reviewed by: jhb
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D38320

17 months agoarm_smcc_1_2_*: Don't trash SP and X19 if no return value structure.
John Baldwin [Wed, 1 Feb 2023 21:54:09 +0000 (13:54 -0800)]
arm_smcc_1_2_*: Don't trash SP and X19 if no return value structure.

Jumping directly to ret was not restoring the saved value of x19 and was
also not adjusting sp to discard the two saved registers.

Reviewed by: andrew
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D37922

17 months agosockets: in sousrsend() pass down the error to aio(4)
Gleb Smirnoff [Wed, 1 Feb 2023 21:03:10 +0000 (13:03 -0800)]
sockets: in sousrsend() pass down the error to aio(4)

This somewhat undermines the initial goal of sousrsend() to have all
the special error handling for a write on a socket in a single place.
The aio(4) needs to see EWOULDBLOCK to re-schedule the job.  Because
aio(4) handles return from soreceive() and sousrsend() with the same
code, we can't check for (error == 0 && done < job_nbytes).  Keeping
this exclusion for aio(4) seems a lesser evil.

Fixes: 7a2c93b86ef75390a60a4b4d6e3911b36221dfbe

17 months agolinux(4): Deduplicate MI futex structures.
Dmitry Chagin [Wed, 1 Feb 2023 18:57:04 +0000 (21:57 +0300)]
linux(4): Deduplicate MI futex structures.

MFC after: 1 week