CyberLeo.Net >> Repos - FreeBSD/FreeBSD.git/log

zfs: bring per_txg_dirty_frees_percent back to 30

This cherry-picks upstream eb9bec0a5d19abf9404f52081424fbb814e6188a

    The current value causes significant artificial slowdown during mass
    parallel file removal, which can be observed both on FreeBSD and Linux
    when running real workloads.

    Sample results from Linux doing make -j 96 clean after an allyesconfig
    modules build:

    before: 4.14s user 6.79s system 48% cpu 22.631 total
    after:  4.17s user 6.44s system 153% cpu 6.927 total

    FreeBSD results in the ticket.

See https://github.com/openzfs/zfs/issues/13932

Apply llvm fix for assertion/crash building math/vtk

Merge commit 307ace7f20d5 from llvm git (by David Sherwood):

  [LoopVectorize] Ensure the VPReductionRecipe is placed after all it's inputs

  When vectorising ordered reductions we call a function
  LoopVectorizationPlanner::adjustRecipesForReductions to replace the
  existing VPWidenRecipe for the fadd instruction with a new
  VPReductionRecipe. We attempt to insert the new recipe in the same
  place, but this is wrong because createBlockInMask may have
  generated new recipes that VPReductionRecipe now depends upon. I
  have changed the insertion code to append the recipe to the
  VPBasicBlock instead.

  Added a new RUN with tail-folding enabled to the existing test:

    Transforms/LoopVectorize/AArch64/scalable-strict-fadd.ll

  Differential Revision: https://reviews.llvm.org/D129550

Reported by: yuri
PR: 264834
MFC after: 3 days

name2oid: use find_oidname

In name2oid, use sysctl _find_oidname instead of re-implementing it.
Reviewed by: hselasky
Differential Revision: https://reviews.freebsd.org/D36765

usb: musb_otg_allwinner: de-constify bus_space_tags

The SAN interceptors simply take a bus_space_tag_t, so we're
dropping qualifiers here. const semantics with a ptr typedef mean we'd
have to drop or change the bus_space_tag_t abstraction used in the SAN
sanitizers in order to make a compatible change there, which likely
isn't worth it.

Reviewed by: andrew, markj
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D36764

nfsclient: access v_mount only after the vnode is locked

and we checked that it is not reclaimed.

Reviewed by: markj, rmacklem
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D36722

sysctl_search_oid: remove useless tests

sysctl_search_old makes several tests in a loop that can be removed.

The first test in the loop is only ever true on the first loop
iteration, and is always true on that iteration, so its work can be
done before the loop begins.

The upper and lower bounds on the loop variable 'indx' are each tested
on each iteration, but 'indx' is changed in one direction or the other
only once within the loop, so only one bound needs to be checked.

Two ways remain in the loop that nodes[indx] can change (after one of
them is put before the loop start), and one of them applies exactly
when indx has been incremented, so no separate test for that case
requires testing.

Restructure and add comments that makes clearer that this is a basic
depth-first search.

Reviewed by: hselasky
Differential Revision: https://reviews.freebsd.org/D36741

snd_uaudio(4): Add some examples accessing USB MIDI devices.

Differential Revision: https://reviews.freebsd.org/D36195
MFC after: 1 week
Sponsored by: NVIDIA Networking

Tcp progress timeout

Rack has had the ability to timeout connections that just sit idle automatically. This
feature of course is off by default and requires the user set it on (though the socket option
has been missing in tcp_usrreq.c). Lets get the progress timeout fully supported in
the base stack as well as rack.

Reviewed by: tuexen
Sponsored by: Netflix Inc
Differential Revision: https://reviews.freebsd.org/D36716

Fix CVE-2020-10188

Reviewed by: emaste
Obtained from: NetBSD 6cc1539c8028b
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D36732

register_oid: fix duplicate oid after d3f96f661050

sysctl_register_oid must check the uniqueness of any newly computed
oid_number in sysctl_register_oid.

Reviewed by: asomers
MFC with: d3f96f661050
Differential Revision: https://reviews.freebsd.org/D36743

sysctl(3): Implement SYSCTL_FOREACH() to iterate all OIDs in a sysctl list.

To avoid using the sysctl list macros directly in external kernel modules.

Reviewed by: asomers, manu and asiciliano
Differential Revision: https://reviews.freebsd.org/D36748
MFC after: 1 week
Sponsored by: NVIDIA Networking

kasan: disable kasan_mark() after a violation

Specifically, when we receive a violation and we're configured to panic,
kasan_enabled gets unset before we descend into panic(). At this point,
there's no longer any reason to allow marking as kasan_shadow_check() is
disabled -- we have some inherent risk of faulting or panicking if the
system's in a bad enough state with no benefit.

Reviewed by: markj
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D36742

When taking a snapshot on a UFS/FFS filesystem, it must be mounted.

The "update" mount option must be specified when the "snapshot"
mount option is used. Return EINVAL if the "snapshot" option is
specified without the "update" option also requested.

Reported by:  Robert Morris
Reviewed by:  kib
PR:           265362
MFC after:    2 weeks
Sponsored by: The FreeBSD Foundation

arm64 pmap: per-domain pv chunk list

As with amd64 use a per-domain pv chunk lock to reduce contention as
chunks get created and removed all the time.

Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D36307

arm64 pmap: batch chunk removal in pmap_remove_pages

As with amd64 batch chunk removal in pmap_remove_pages to move it out
of the pv list lock. This is one of the main contested locks when
running poudriere on a 160 core Ampere Altra server.

Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D36305

vt(4): Make sure vt_switch_timer() has a sleepable context.

Fixes the following panic backtrace:

panic()
usbhid_sync_xfer()
usbhid_set_report()
hid_set_report()
hidbus_write()
hid_write()
hkbd_set_leds()
hkbd_ioctl_locked()
hkbd_ioctl_locked()
hkbd_ioctl()
kbdmux_ioctl()
vt_window_switch()
vt_switch_timer()

Differential Revision: https://reviews.freebsd.org/D36715
MFC after: 1 week
Sponsored by: NVIDIA Networking

Reorder pmap_bootstrap_state to reduce holes

Reduce holes in pmap_bootstrap_state by moving freemempos after the
pointers as they are more likely to change size in any future ABI.

Sponsored by: The FreeBSD Foundation

Remove unneeded variables in the arm64 pmap bootstrap

These are now unneeded after cleaning up the pmap bootstrap process.
Remove them and the variables that set them.

Sponsored by: The FreeBSD Foundation

if_epair: refactor interface creation and enqueue code.

* Factor out queue selection (epair_select_queue()) and mbuf
preparation (epair_prepare_mbuf()) from epair_menq(). It simplifies
epair_menq() implementation and reduces the amount of dependencies
on the neighbouring epair.
* Use dedicated epair_set_state() instead of 2-lines copy-paste
* Factor out unit selection code (epair_handle_unit()) from
epair_clone_create(). It simplifies the clone creation logic.

Reviewed By: kp
Differential Revision: https://reviews.freebsd.org/D36689

pf: fix memory leak retrieving Ethernet rules

Remember to free the nvlist we've added to our main nvlist.

Sponsored by: Rubicon Communications, LLC ("Netgate")

netinet6: factor interface addition code to the dedicated function

Summary:
Move SIOCAIFADDR_IN6 (current "primary" ioctl to add an IPv6
interface address) handling code to the dedicated in6_addifaddr()
function and make it a part of KPI. This allows in-kernel users to
add/delete interfaces addresses without relying on ioctl interface.

Subscribers: imp, ae, glebius

Differential Revision: https://reviews.freebsd.org/D36713

Remove duplicate arm64 pmap bootstrap code

The table bootstrap functions can now be used for non-DMAP uses. Use
them for all early page table creation.

Sponsored by: The FreeBSD Foundation

Allow changing arm64 table attributes when bootstrapping pmap

Only the DMAP region can be mapped with PXN. Allow the table attributes
to be changed for other mappings.

Sponsored by: The FreeBSD Foundation

Allow the arm64 pmap table bootstrap to work in more places

Rework the pmap_bootstrap table generation so we can use it with
partially filled out page tables after the DMAP has been bootstrapped.
This allows it to be reused by the later bootstrap code.

Sponsored by: The FreeBSD Foundation

Make the arm64 pmap bootstrap state global

So it can be reused by KASAN

Sponsored by: The FreeBSD Foundation

Rename arm64 pmap bootstrap to not be dmap specific

This will be used by KASAN and possibly the kernel in general, rename
so we aren't confused by it.

Sponsored by: The FreeBSD Foundation

bsdinstall: replace ntpdate by ntpd_sync_on_start

  * change current NTP services offered by the FreeBSD Installer;
  * no longer offer ntpdate to be enabled and started on boot;
  * start offering the option to make ntpd set the date and time on boot itself.

The motivation for this change comes from the ntpdate(8) manpage:

  Note: The functionality of this program is now available in the ntpd(8)
  program. See the -q command line option in the ntpd(8) page. After a
  suitable period of mourning, the ntpdate utility is to be retired from
  this distribution.

Approved by: cy (src), dteske (src)
Differential Revision: https://reviews.freebsd.org/D36206

stress2: Fixed double word in comment
Reported by: kib

Fix the build with SCHED_STATS after d3f96f661050

MFC with: d3f96f661050e9bd21fe29931992a8b9e67ff189
Sponsored by: Axcient

Fix O(n^2) behavior in sysctl

Sysctl OIDs were internally stored in linked lists, triggering O(n^2)
behavior when userland iterates over many of them.  The slowdown is
noticeable for MIBs that have > 100 children (for example, vm.uma).  But
it's unignorable for kstat.zfs when a pool has > 1000 datasets.

Convert the linked lists into RB trees.  This produces a ~25x speedup
for listing kstat.zfs with 4100 datasets, and no measurable penalty for
small dataset counts.

Bump __FreeBSD_version for the KPI change.

Sponsored by: Axcient
Reviewed by: mjg
Differential Revision: https://reviews.freebsd.org/D36500

cxgbe: Use secq(9) to manage the timestamp generations.

This is mostly cosmetic, but it also doesn't leave a gap of time where
no structures are valid. Instead, we permit the ISR to continue to
use the previous structure if the write to update cal_current isn't
yet visible.

Reviewed by: gallatin
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D36669

cxgbe: Compute timestamps via sbintime_t.

This uses fixed-point math already used elsewhere in the kernel for
sub-second time values.  To avoid overflows this does require updating
the calibration once a second rather than once every 30 seconds.  Note
that the cxgbe driver already queries multiple registers once a second
for the statistics timers.  This version also uses fewer instructions
with no branches (for the math portion) in the per-packet fast path.

Reviewed by: np
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D36663

copy_file_range: truncate write if it would exceed RLIMIT_FSIZE

PR: 266611
MFC after: 2 weeks
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D36706

LinuxKPI: add DMA_MAPPING_ERROR

While we deal with 0 returned, some drivers directly use and check for
DMA_MAPPING_ERROR. Add the case and check for both in dma_mapping_error().

MFC after: 1 week
Reviewed by: hselasky
Differential Revision: https://reviews.freebsd.org/D36686

LinuxKPI: add struct dmi_header and unsupported dmi_walk()

Add a structure definition as well as a dummy dmi_walk for now
which returns an error as not supported. Our current dmi implementation
is special but does not give access to all details but rather only
information from kenv which does not suffice all use cases.

MFC after: 1 week
Reviewed by: hselasky
Differential Revision: https://reviews.freebsd.org/D36687

LinuxKPI: add the "dummy" includes directory to builds

While we could add the dummy includes directory manually to only the
drivers needing it, it seems a lot easier to simply add it to all
without any expected harm.

This is needed for more drivers (and to remove some #ifdef in current
ones) with empty header files being present not yielding errors.

Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Reviewed by: hselasky, imp
Differential Revision: https://reviews.freebsd.org/D36684

LinuxKPI: define LINUXKPI_INCLUDES for module builds as well

While for in-kernel we already have LINUXKPI_INCLUDES in kern.pre.mk
for kmod builds we've not had a common define to use leading to various
spellings of include paths.

In order for the include list to be expanded more easily in the future,
e.g., adding the "dummy" includes (for all) and to harmonize code,
duplicate LINUXKPI_INCLUDES to kmod.mk and use it for all module Makefiles.

MFC after: 1 week
Reviewed by: hselasky
Differential Revision: https://reviews.freebsd.org/D36683

pci_host_generic: stop address translation in bus_alloc_resource

Translating the provided range prior to rman_reserve_resource(9) is
decidedly wrong; the caller may be trying to do a wildcard allocation,
for which the implementation is expected to DTRT and clamp the range to
what's actually feasible.

We don't use the resulting translation here anyways, so just remove it
entirely -- the rman in the default implementation is derived from
sc->ranges, so the translation should trivially succeed every time as
long as the reservation succeeded. If something has gone awry in a
derived driver, we'll detect it when we translate prior to activation,
so there's likely no diagnostic value in retaining the translation after
reservation either.

Reviewed by: andrew
Noticed by: jhb
Differential Revision: https://reviews.freebsd.org/D36618

kasan: provide bus peek/poke definitions

Reviewed by: andrew, markj
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D36700

arm64: bus: provide bus_space_set_{multi,region}_stream definitions

Reviewed by: andrew
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D36719

arm64: bus: unhide bus_space definition with sanitizers included

We'll only be redefining the various bus_* macros, not the definition of
struct bus_space.

Reviewed by: andrew
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D36718

TCP complete end status work.

The ending of a connection can tell us a lot about what happened i.e. did
it fail to setup, did it timeout, was it a normal close. Often times this is
useful information to help analyze and debug issues. Rack has had
end status for some time but the base stack as not. Lets go a ahead
and add in the missing bits to populate the end status.

Reviewed by: tuexen, rscheff
Sponsored by: Netflix Inc
Differential Revision: https://reviews.freebsd.org/D36712

TCP rack does not work properly with cubic.

Right now if you use rack with cubic (the new default cc) you will have
improper results. This is because rack uses different variables than
the base stack (or bbr) and thus tcp_compute_pipe() always returns
so that cubic will choose a 30% backoff not the 50% backoff it should
when it is newreno compatibility mode. The fix is to allow a stack (rack)
to override its own compute_pipe.

Reviewed by: tuexen, rscheff
Sponsored by: Netflix Inc
Differential Revision: https://reviews.freebsd.org/D36711

telnetd: fix two-byte input crash

Move initialization of the slc table earlier so it doesn't get
accessed before that happens.

For details on the issue, see:
https://pierrekim.github.io/blog/2022-08-24-2-byte-dos-freebsd-netbsd-telnetd-netkit-telnetd-inetutils-telnetd-kerberos-telnetd.html

Reviewed by: cy
Obtained from: NetBSD via cy
Differential Revision: https://reviews.freebsd.org/D36680

rb_tree: add augmentation comments

Add comments to better explain why augmentation is done in several places.
Reviewed by: alc
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D36646

netinet: pass cred instead of the curthread to ifaddr manipulation funcs.

Pass the credentials directly to the functions, so non-ioctl kernel
users can also performan address manipulations.

MFC after: 2 weeks

man9: Add MLINKs for bus_space_{peek,poke}

MFC after: 1 week

posixshm tests: Map the large pages in the madvise test

This improves test coverage and was unintentionally omitted when the
tests were written.

MFC after: 1 week

arm64: Fix an assertion in pmap_enter_largepage()

The intent is to assert that either no mapping exists at the given VA,
or that the existing L1 block mapping maps the same PA.

Fixes: 36f1526a598c ("Add experimental 16k page support on arm64")
Reviewed by: alc
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D36698

arm64: Handle 1GB mappings in pmap_enter_quick_locked()

Reviewed by: alc, kib
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D36697

if_ovpn: fix address family check when traffic class bits are set

When the tunneled (IPv6) traffic had traffic class bits set (but only >=
16) the packet got lost on the receive side.

This happened because the address family check in ovpn_get_af() failed
to mask correctly, so the version check didn't match, causing us to drop
the packet.

While here also extend the existing 6-in-6 test case to trigger this
issue.

PR: 266598
Sponsored by: Rubicon Communications, LLC ("Netgate")

tcp: make RACK loadable again using the default configuration

Without this patch, loading the RACK stack required the newreno
CC module to be compiled into the kernel. This is not the case
anymore since CUBIC is the default now.

Reviewed by: rscheff@
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D36707

stress2: Added regression tests

arm64: Enabling new hypercalls using HvCallSetVpRegisters and HvCallGetVpRegisters

Enabling HvCallSetVpRegisters and HvCallGetVpRegisters for hypercalls to
read and write to specific MSRs. This is required for implementing wrmsr
and rdmsr, which is required for Hyper-V vmbus driver for ARM64.

Also we need to use arm smccc hvc 1.2 version as we need to access
registers beyond X0-X3 for HvCallGetVpRegisters. Currently scoping
it only for Hyper-V.

Reviewed by: lwhsu, andrew, whu
Tested by: Souradeep Chakrabarti <schakrabarti@microsoft.com>
Signed-off-by: Souradeep Chakrabarti <schakrabarti@microsoft.com>
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D36256

scandir(3): Rename alphasort_thunk to scandir_thunk_cmp to
reflect that it is not alphasort-specific.

Reported by: emaste
Reviewed by: emaste
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D36708

lindebugfs,rtw88,rtw89: correct module dependencies

In f697b9432d9c7aa4c5ab5f5445ef5dc1bd40ce00 the name of the PSEUDOFS
was changed from debugsfs to lindebugfs but the in-tree consumers
were not updated now leaving the drivers not loading if compiled
with debugfs support due to missing dependencies.

MFC after: 3 days
X-MFC-with-after: f697b9432d9c7aa4c5ab5f5445ef5dc1bd40ce00

iwlwifi: enforce FreeBSD specific (expected) behaviour

iwlwifi can return early from probe (in FreeBSD attach) while a separate
thread is still grinding loading the firmware and setting things up.
For us this means that kldload succeeded but we may not have a physical
wireless interface (com) yet but the rc framework might already try to
configure a vap on one.

Wait until we get a firmware completion event from the other thread
(on success or error) and block returning. That way we can ensure that
the "hw" (or com in net80211 terms) is there when we return from attach
matching the expected FreeBSD driver behaviour.

Reported by: J.R. Oldroyd (jr opal.com)
Reported by: probably inderectly showing as other problem
Tested by: J.R. Oldroyd (jr opal.com)
Sponsored by: The FreeBSD Foundation
MFC after: 3 days

fusefs: truncate write if it would exceed RLIMIT_FSIZE

PR: 164793
MFC after: 2 weeks
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D36703

fusefs: respect RLIMIT_FSIZE during truncate

PR: 164793
MFC after: 2 weeks
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D36703

snp(4): implement detach

PR: 257964
Reported by: Bertrand Petit <bsdpr@phoe.frmug.org>
Reviewed by: imp, markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D36690

snp(4): properly report detached/revoked ttys

PR: 257964
Reported by: Bertrand Petit <bsdpr@phoe.frmug.org>
Reviewed by: imp, markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D36690

Do not delete objdump when WITH_LLVM_BINUTILS is true

WITH_LLVM_BINUTILS links /usr/bin/objdump to llvm-objdump, and similarly
for the man page. Do not delete them in `make delete-old`.

PR: 266603
Sponsored by: The FreeBSD Foundation

clang: remove as(1) cross-reference from man page

PR: 265232
Reviewed by: dim
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differntial Revision: https://reviews.freebsd.org/D36634

contrib/bsddialog: Import version 0.4

Improvements and changes to integrate bsddialog(1) with scripts in BASE.
Overview:

* New options. --and-widget, --keep-tite, --calendar.
* Change output format. Menus and --print-maxsize.
* Redefine sizing. Fixed rows, cols and menurows became at the most.
* Add DIAGNOSTICS. Error messages for bad arguments and options.
* Add keys. Space for --menu, fast keys for --msgbox and --yesno.
* Text. Change default text modification, add --cr-wrap.

See /usr/src/contrib/bsddialog/CHANGELOG '2022-09-24 Version 0.4'
for more detailed information.

Merge commit '9f24fda5a8e7ab8243e71473c7e2dc98b4877e64'

contrib/bsddialog: Import version 0.4

Improvements and changes to integrate bsddialog(1) with scripts in BASE.
Overview:

* New options. --and-widget, --keep-tite, --calendar.
* Change output format. Menus and --print-maxsize.
* Redefine sizing. Fixed rows, cols and menurows became at the most.
* Add DIAGNOSTICS. Error messages for bad arguments and options.
* Add keys. Space for --menu, fast keys for --msgbox and --yesno.
* Text. Change default text modification, add --cr-wrap.

See /usr/src/contrib/bsddialog/CHANGELOG '2022-09-24 Version 0.4'
for more detailed information.

ObsoleteFiles.inc: fix a silly typo

Remove /usr/share/zoneinfo/SystemV, not /usr/share/zoneinfo.
The former was clearly wishful thinking. :-)

MFC after: 3 days
Pointy hat to: philip

share/zoneinfo: don't build obsolete SystemV zones

The /usr/share/zoneinfo/SystemV directory has been empty on FreeBSD
since 2006. The upstream source file was removed in 2020. Also stop
passing yearisdate to zic(8). This has not been necessary for years.
The script has been removed upstream since 2020.

MFC after: 3 days

contrib/tzdata: import tzdata 2022d

Changes: https://github.com/eggert/tz/blob/2022d/NEWS

MFC after: 3 days

Import tzdata 2022d

file: upgrade to 5.43.

MFC after: 3 days

mount_nfs.8: Fix the RFC number now that it exists

The RFC for this finally got published and, therefore,
now has a number. This patch puts this RFC number
in the man page.

This is a content change.

MFC after: 1 week

ifp: add if_setdescr() / if_freedesrt() methods

Add methods for setting and removing the description from the interface,
so the external users can manage it without using ioctl API.

MFC after: 2 weeks

if_clone: add ifc_link_ifp() / ifc_unlink_ifp() to the KPI

Factor cloner ifp addition/deletion into separate functions and
make them public. This change simlifies the current cloner code
and paves the way to the other upcoming cloner / epair changes.

MFC after: 2 weeks

tmpfs: truncate write if it would exceed the fs max file size or RLIMIT_FSIZE

PR: 164793
Reviewed by: asomers, jah, markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D36625

msdosfs: truncate write if it would exceed the fs max file size or RLIMIT_FSIZE

PR: 164793
Reviewed by: asomers, jah, markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D36625

FFS: truncate write if it would exceed the fs max file size or RLIMIT_FSIZE

PR: 164793
Reviewed by: asomers, jah, markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D36625

Add vn_rlimit_fsizex() and vn_rlimit_fsizex_res()

The vn_rlimit_fsizex() function:
- checks that the write does not exceed RLIMIT_FSIZE limit and fs
maximum supported file size
- truncates write length if it exceeds the RLIMIT_FSIZE or max file size,
but there are some bytes to write
- sends SIGXFSZ if RLIMIT_FSIZE would be exceed otherwise

POSIX mandates the truncated write in case when some bytes can be
written but whole write request fails the RLIMIT_FSIZE check.

The function is supposed to be used from VOP_WRITE()s. Due to
pecularity in the VFS generic write syscall layer, uio_resid must
correctly reflect the written amount (noted by markj). Provide the dual
vn_rlimit_fsizex_res() function to correct uio_resid after the clamp
done in vn_rlimit_fsizex() on VOP_WRITE() return.

PR: 164793
Reviewed by: asomers, jah, markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D36625

tmpfs: disallow truncation to set file size past RLIMIT_FSIZE

PR: 164793
Reviewed by: asomers, jah, markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D36625

msdosfs: disallow truncation to set file size past RLIMIT_FSIZE

PR: 164793
Reviewed by: asomers, jah, markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D36625

UFS: disallow truncation to set file size past RLIMIT_FSIZE

This is mandated by POSIX.

PR: 164793
Reviewed by: asomers, jah, markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D36625

Add vn_rlimit_trunc()

Reviewed by: asomers, jah, markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D36625

filesystems: return error from vn_rlimit_fsize() instead of EFBIG

Reviewed by: asomers, jah, markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D36625

tmpfs_subr.c: some style

Use 'td' as the local thread name.
Wrap long lines.
Remove unneeded blank lines.

Reviewed by: asomers, jah, markj
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D36625

arm64: Ignore 1GB mappings in pmap_advise()

For the same reason as commit 4c224f8e5f36cfad5a9af8db7c7acdecc3d4c7b5.

MFC after: 1 week

amd64: Ignore 1GB mappings in pmap_advise()

This assertion can be triggered by usermode since vm_map_madvise()
doesn't force advice to be applied to an entire largepage mapping. I
can't see any reason not to permit it, however, since MADV_DONTNEED and
_FREE are advisory and we can simply do nothing when a 1GB mapping is
encountered.

Reviewed by: alc, kib
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D36675

amd64: Handle 1GB mappings in pmap_enter_quick_locked()

This code path can be triggered by applying MADV_WILLNEED to a 1GB
mapping.

Reviewed by: alc, kib
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D36674

amd64: Make it possible to grow the KERNBASE region of KVA

pmap_growkernel() may be called when mapping a region above KERNBASE,
typically for a kernel module.  If we have enough PTPs left over from
bootstrap, pmap_growkernel() does nothing.  However, it's possible to
run out, and in this case pmap_growkernel() will try to grow the kernel
map all the way from kernel_vm_end to somewhere past KERNBASE, which can
easily run the system out of memory.  This happens with large kernel
modules such as the nvidia GPU driver.  There is also a WIP dtrace
provider which needs to map KVA in the region above KERNBASE (to provide
trampolines which allow a copy of traced kernel instruction to be
executed), and its allocations could potentially trigger this scenario.

This change modifies pmap_growkernel() to manage the two regions
separately, allowing them to grow independently.  The end of the
KERNBASE region is tracked by modifying "nkpt".

PR: 265019
Reviewed by: alc, imp, kib
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D36673

smr: Fix synchronization in smr_enter()

smr_enter() must publish its observed read sequence number before
issuing any subsequent memory operations. The ordering provided by
atomic_add_acq_int() is insufficient on some platforms, at least on
arm64, because it permits reordering of subsequent loads with the store
to c_seq.

Thus, use atomic_thread_fence_seq_cst() to issue a store-load barrier
after publishing the read sequence number.

On x86, take advantage of the fact that memory operations are not
reordered with locked instructions to improve code density: we can store
the observed read sequence and provide a store-load barrier with a
single operation.

Based on a patch from Pierre Habouzit <pierre@habouzit.net>.

PR: 265974
Reviewed by: alc
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D36370

Vendor import of file 5.43.

sched_4bsd: Fix a racy thread state modification

When a thread switching off-CPU is migrating to a remote CPU,
sched_switch() may trigger a rescheduling of the thread currently
running on that CPU. When doing so, it must ensure that that thread is
locked before modifying thread state. If the thread's lock is not the
scheduler lock, then the thread is in the process of switching off-CPU
and no extra effort is needed, and the initiator does not hold the
thread's lock and thus should not modify any thread state.

Reported and tested by: Steve Kargl
MFC after: 1 week

rpc.tlsclntd.8: Fix the RFC number now that it exists

The RFC for this finally got published and, therefore,
now has a number. This patch puts this RFC number
in the man page.

This is a content change.

MFC after: 1 week

rpc.tlsservd.8: Fix the RFC number now that it exists

The RFC for this finally got published and, therefore,
now has a number. This patch puts this RFC number
in the man page.

This is a content change.

MFC after: 1 week

nfscl: Fix parameter order in the calls to MGET().

Reviewed by: imp, rmacklem
Differential Revision: https://reviews.freebsd.org/D36644

arm64: don't loop forever if first option in kern.cfg.order not available

strchr returns a pointer to the ',', so if the first option in the list
isn't available, we need to step over the , to look at the next
option. So if kern.cfg.order="acpi,fdt" and we have no acpi, we'd loop
forever with order=',fdt'.

Sponsored by: Netflix
Reviewed by: andrew, jhb
Differential Revision: https://reviews.freebsd.org/D36682

cpuset(9): Refer to CPU_SETSIZE not MAXCPU

The maximum CPU number of a cpuset_t is determined by CPU_SETSIZE. In
the kernel this is MAXCPU, but in userspace it is CPU_MAXSIZE unless
CPU_SETSIZE is defined before including sys/_cpuset.h. CPU_MAXSIZE is
256 and in userspace MAXCPU is generally 1 because it being set to a
larger MD value is gated on SMP being defined (not generally the case in
userspace).

Reported by: Nathaniel Wesley Filardo <nwfilardo@gmail.com>
Reviewed by: cem, jhb
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D36679

release: Use standard mount points for arm MBR boot images

Traditionally, we've used /boot/msdos for the MBR mount point for the SD
images that we produced. For GPT and bsdinstall, we've used
/boot/efi. Migrate to using /boot/efi for MBR as well and add a
/boot/msdos -> /boot/efi symlink for compatibility (which may disappear
before 14.0, but will remain on the stable branches).

When we first created the arm images, there was no EFI booting and the
FAT partion on an MBR image was used to hold the firmware, uboot.bin,
SoC config files and ubldr. When we transitioned to uboot with EFI, we
put the loader files in the same partition. Later we standardized on
/boot/efi at about the same time we added GPT support to the RE produced
images. We left the MRB case as /boot/msdos for legacy reasons and since
it wasn't always EFI. Later, we dropped support of non-EFI booting on
the RE produced images, so the duality of /boot/msdos diminished even
more. Since so little secondary meaning remains, putting it all in
/boot/efi standardizes the location and reflects the RE images
better as using efi-only booting.

In addition, always label the msdosfs partion 'efi'. While a small
misnomer on some systems that store other files in the ESP, it was
requested in review for more consistency for similar reasons to the
mountpoint rename. There was no way to have an 'alias' or 'second label'
here, so this breaks compatibility. Since the images are self-contained,
this was judged to be an acceptable change.

Sponsored by: Netflix
Reviewed by: manu, allanjude, emaste, gjb
Differential Revision: https://reviews.freebsd.org/D36635

Fix the spelling of interrupt in the GICv3 driver

Reported by: jrtc27
Sponsored by: Innovate UK

Make adding children consistant in the GICv3 drivers

Reorder statements to make them consistant between the ACPI and FDT
GICv3 attachments.

Reported by: jrtc27
Sponsored by: Innovate UK

Teach the GICv3 driver to translate memory ranges

As with the GICv1/2 driver teach the GICv3 driver to translate memory
ranges of children. This allows us to create a common
bus_alloc_resource implementation for bot hACPI and FDT attachments.

Sponsored by: The FreeBSD Foundation

Move the GICv3 bus_print_child function to the parent

This should be common for both ACPI and FDT. Move this to the common
part of the driver.

Sponsored by: The FreeBSD Foundation