Wojciech Macek [Fri, 9 Apr 2021 07:28:44 +0000 (09:28 +0200)]
pci_dw: Trim ATU windows bigger than 4GB
The size of the ATU MEM/IO windows is implicitly cast to uint32_t.
Because of that, some window sizes were silently truncated to 0 and ignored.
Check the size; if it is too large, trim it to 4GB and print a warning message.
Rick Macklem [Thu, 8 Apr 2021 21:04:22 +0000 (14:04 -0700)]
nfsd: fix replies from session cache for retried RPCs
Recent testing of network partitioning a FreeBSD NFSv4.1
server from a Linux NFSv4.1 client identified problems
with both the FreeBSD server and Linux client.
The FreeBSD server failed to reply using the cached
reply in the session slot when an RPC was retried on
the session slot, as indicated by the same slot sequence#.
This patch fixes this. It should also fix a similar
failure for NFSv4.0 mounts, when the sequence# in
the open/lock_owner requires a reply be done from
an entry locked into the DRC.
This fix affects the fairly rare case where an NFSv4
client retries a non-idempotent RPC, such as a lock
operation. Note that retries only occur after the
client has needed to create a new TCP connection.
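As a rough illustration of the session slot behaviour described above, the
sketch below replays the cached reply when a request carries the same
sequence# as the last RPC executed on the slot. It is a simplified model and
not the nfsd code; struct session_slot and slot_handle() are hypothetical
names, and the reply is modelled as a short string.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified session slot. */
struct session_slot {
	uint32_t	seq;		/* sequence# of the last RPC executed */
	char		reply[64];	/* cached reply for that RPC */
};

/*
 * Handle one request on a slot. fresh_reply stands in for the result of
 * actually executing the RPC. Returns the reply to send, or NULL when the
 * sequence# is neither a retry nor the next one in order.
 */
static const char *
slot_handle(struct session_slot *slot, uint32_t seq, const char *fresh_reply)
{
	size_t len;

	assert(slot != NULL && fresh_reply != NULL);
	if (seq == slot->seq)
		return (slot->reply);	/* retry: replay the cached reply */
	if (seq != slot->seq + 1)
		return (NULL);		/* misordered request */

	/* New request: execute it and cache the reply in the slot. */
	len = strlen(fresh_reply);
	if (len >= sizeof(slot->reply))
		len = sizeof(slot->reply) - 1;
	memcpy(slot->reply, fresh_reply, len);
	slot->reply[len] = '\0';
	slot->seq = seq;
	return (slot->reply);
}
```

The point of replaying rather than re-executing is that a non-idempotent
operation, such as a lock, must not be performed a second time.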
Gleb Smirnoff [Mon, 22 Mar 2021 20:51:42 +0000 (13:51 -0700)]
tcp_hostcache: add a bool argument to tcp_hc_lookup() that tells whether we
are looking to only read from the result, or to update it as well.
For now this doesn't affect locking, but it allows pushing the stats and
expire updates into a single place.
tcp: Prepare PRR to work with NewReno LossRecovery
Add proper PRR vnet declarations for consistency.
Also add pointer to tcpopt struct to tcp_do_prr_ack, in preparation
for it to deal with non-SACK window reduction (after loss).
Avoid -pedantic warnings about using _Generic in __fp_type_select
When compiling parts of math.h with clang using a C standard before C11,
and using -pedantic, it will result in warnings similar to:
bug254714.c:5:11: warning: '_Generic' is a C11 extension [-Wc11-extensions]
return !isfinite(1.0);
^
/usr/include/math.h:111:21: note: expanded from macro 'isfinite'
^
/usr/include/math.h:82:39: note: expanded from macro '__fp_type_select'
^
This is because the block that enables use of _Generic is conditional
not only on C11, but also on whether the compiler advertises support for
C generic selections via __has_extension(c_generic_selections).
To work around the warning without having to pessimize the code, use the
__extension__ keyword, which is supported by both clang and gcc. While
here, remove the check for __clang__, as _Generic has been supported for
a long time by gcc too now.
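A minimal sketch of the workaround follows. It is not the math.h macro
itself; fp_kind_select() and enum fp_kind are hypothetical names used only to
show __extension__ placed in front of a _Generic selection.

```c
#include <assert.h>

enum fp_kind { FP_KIND_FLOAT, FP_KIND_DOUBLE, FP_KIND_LDOUBLE };

/*
 * Select a value based on the type of x. The __extension__ keyword marks
 * the use of _Generic as a deliberate extension when compiling for a
 * standard older than C11.
 */
#define	fp_kind_select(x) (__extension__ _Generic((x),	\
    float: FP_KIND_FLOAT,					\
    double: FP_KIND_DOUBLE,					\
    long double: FP_KIND_LDOUBLE))

/* Usage example: each literal selects the entry for its own type. */
void
fp_kind_examples(void)
{
	assert(fp_kind_select(1.0f) == FP_KIND_FLOAT);
	assert(fp_kind_select(1.0) == FP_KIND_DOUBLE);
	assert(fp_kind_select(1.0L) == FP_KIND_LDOUBLE);
}
```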
bhyve: fix regression in legacy virtio-9p config parsing
Commit 621b5090487de9fed1b503769702a9a2a27cc7bb introduced a regression
in legacy virtio-9p config parsing by not initializing *sharename to
NULL. As a result, "sharename != NULL" check in the first iteration fails
and bhyve exits with "virtio-9p: more than one share name given".
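The failure mode can be sketched as follows. This is not the bhyve parser;
parse_legacy_share() is a hypothetical function, and the option format
"name=path[,flag...]" is an assumption made for the example. What it shows is
why the share name pointer has to start out as NULL for the duplicate check
to be meaningful on the first iteration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Parse an option string of the assumed form "name=path[,flag...]".
 * The string is modified in place. Returns 0 on success and -1 if no
 * share name, or more than one, is given.
 */
static int
parse_legacy_share(char *opts, char **namep, char **pathp)
{
	char *sharename = NULL;	/* must start out NULL, see the check below */
	char *path = NULL;
	char *tok, *next, *eq;

	assert(opts != NULL && namep != NULL && pathp != NULL);
	for (tok = opts; tok != NULL; tok = next) {
		next = strchr(tok, ',');
		if (next != NULL)
			*next++ = '\0';
		eq = strchr(tok, '=');
		if (eq == NULL)
			continue;	/* a bare flag such as "ro"; ignored here */
		if (sharename != NULL)
			return (-1);	/* more than one share name given */
		*eq = '\0';
		sharename = tok;
		path = eq + 1;
	}
	if (sharename == NULL)
		return (-1);
	*namep = sharename;
	*pathp = path;
	return (0);
}
```

With an uninitialized pointer, the "more than one share name" branch could be
taken before any name had been seen at all.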
Andrew Turner [Thu, 1 Apr 2021 14:38:09 +0000 (14:38 +0000)]
arm64: Fix finding the pmc event ID
The lower pmc event bits were masked off to find the PMC event ID.
This doesn't work when there are more events. Switch it to use the
offset relative to the first event while also checking the ID is
in the expected range.
Reviewed by: gnn, ray
Sponsored by: Innovate UK
Differential Revision: https://reviews.freebsd.org/D29600
Andrew Turner [Tue, 23 Mar 2021 18:23:47 +0000 (18:23 +0000)]
Discard the arm64 VFP state before resetting it
When resetting the VFP state we need to discard any old state so we don't
try to save it on a context switch. Move this first so resetting the pcb
is safe to perform outside a critical section.
Reviewed by: arichardson
Sponsored by: Innovate UK
Differential Revision: https://reviews.freebsd.org/D29401
Ryan Libby [Wed, 7 Apr 2021 19:39:05 +0000 (12:39 -0700)]
shared shadow vm object invalidation regression test
Add a regression test for a scenario where a shadow vm object is shared
by multiple mappings. If a page COW occurs through one of the mappings,
then the virtual-to-physical mapping may become invalidated.
In situations when the current file name wasn't the first element on
the list we were cleaning the current name too early.
This might cause us to pre-cache the same file twice.
Yongbo Yao [Wed, 7 Apr 2021 18:33:22 +0000 (13:33 -0500)]
Loader: support booting OS from memory disk (MD)
Until now, the boot image could be embedded into the loader with
/sys/tools/embed_mfs.sh, and memory disk (MD) is already supported
in the loader source. But because the memory disk (MD) driver isn't
registered with the loader yet, the boot image can't be booted from the
embedded memory disk.
Mark Johnston [Wed, 7 Apr 2021 18:19:52 +0000 (14:19 -0400)]
capsicum: Limit socket operations in capability mode
Capsicum did not prevent certain privileged networking operations,
specifically creation of raw sockets and network configuration ioctls.
However, these facilities can be used to circumvent some of the
restrictions that capability mode is supposed to enforce.
Add capability mode checks to disallow network configuration ioctls and
creation of sockets other than PF_LOCAL and SOCK_DGRAM/STREAM/SEQPACKET
internet sockets.
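The socket creation rule stated above can be written down as a small
predicate. This is only a userland sketch of the policy as described in this
message, not the kernel check; capmode_socket_allowed() is a hypothetical
name, and AF_UNIX is used as the portable spelling of PF_LOCAL.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/socket.h>

/*
 * Return true if a socket of this domain and type may be created under
 * the policy described above: local sockets, plus internet sockets of
 * type SOCK_DGRAM, SOCK_STREAM or SOCK_SEQPACKET.
 */
static bool
capmode_socket_allowed(int domain, int type)
{
	switch (domain) {
	case AF_UNIX:
		return (true);
	case AF_INET:
	case AF_INET6:
		return (type == SOCK_DGRAM || type == SOCK_STREAM ||
		    type == SOCK_SEQPACKET);
	default:
		return (false);
	}
}

/* Usage example. */
void
capmode_socket_examples(void)
{
	assert(capmode_socket_allowed(AF_UNIX, SOCK_STREAM));
	assert(capmode_socket_allowed(AF_INET, SOCK_DGRAM));
	assert(!capmode_socket_allowed(AF_INET, SOCK_RAW));
}
```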
Reviewed by: oshogbo
Discussed with: emaste
Reported by: manu
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D29423
When we find a state for packets that was created by a reply-to rule we
still need to process the packet. The state may require us to modify the
packet (e.g. in rdr or nat cases), which we won't do with the shortcut.
Kristof Provost [Thu, 25 Mar 2021 12:59:14 +0000 (13:59 +0100)]
libnv: Allow use in non-sleepable contexts
44c125c4cebc2fd87c6260b90eddae11201f5232 switched the nvlist allocations
to be M_WAITOK, but this precludes the use in non-sleepable contexts.
(E.g. with a nonsleepable lock held).
All callers of these allocation functions already cope with memory
allocation failures, so there's no reason to allow sleeping during
allocations.
pf tests: make synproxy and nat work correctly even if inetd is running
tests/sys/netpfil/pf/synproxy fails if inetd has been running
outside of the jail because pidfile_open() fails with EEXIST.
tests/sys/netpfil/pf/nat has the same problem but the test succeeds
because whether inetd is running is not so important.
Fix the problem by changing the pidfile path from the default
location.
Ka Ho Ng [Wed, 7 Apr 2021 11:00:31 +0000 (19:00 +0800)]
Document vnode_pager_setsize(9)
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Reviewed by: bcr
Approved by: philip (mentor)
Differential Revision: https://reviews.freebsd.org/D29408
When the FreeBSD testsuite runs the libarchive tests it checks that stderr
is empty. Since #1382 this is no longer the case. This change restores
the behaviour of silencing bunzip2 stderr but doesn't bring back the
output text check.
Alexander Motin [Tue, 6 Apr 2021 21:27:16 +0000 (17:27 -0400)]
Introduce "soft" serseq variant.
With the new ZFS prefetcher improvements it is no longer necessary to fully
serialize reads to reach a decent prediction hit rate. The softer variant
only creates a small time window to reduce races instead of completely
blocking following reads while the previous one is running. It hurts
performance much less in case of a prediction miss.
Allocate extra inodes in makefs when leaving free space in UFS images.
By default, makefs(8) has very few spare inodes in its output images,
which is fine for static filesystems, but not so great for VM images
where many more files will be added. Make makefs(8) use the same
default settings as newfs(8) when creating images with free space --
there isn't much point to leaving free space on the image if you
can't put files there. If no free space is requested, use current
behavior of a minimal number of available inodes.
Reviewed by: manu
MFC after: 3 weeks
Differential Revision: https://reviews.freebsd.org/D29492
Eric van Gyzen [Tue, 6 Apr 2021 14:36:52 +0000 (09:36 -0500)]
uefisign: fix handling of errors from child proc
Close the unused pipe file descriptors so the parent will notice if
the child exits prematurely. Previously, the parent would block
forever on a read from the pipe.
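The effect of closing the unused descriptors can be reproduced with a small
POSIX sketch. It is not the uefisign code; read_from_exiting_child() is a
hypothetical function in which the child exits without writing anything.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child that exits prematurely, then read from the pipe in the
 * parent. Returns what read(2) returned, or -1 if setup failed.
 */
static ssize_t
read_from_exiting_child(void)
{
	int fds[2];
	char buf[16];
	ssize_t n;
	pid_t pid;

	if (pipe(fds) != 0)
		return (-1);
	pid = fork();
	if (pid < 0) {
		close(fds[0]);
		close(fds[1]);
		return (-1);
	}
	if (pid == 0) {
		close(fds[0]);	/* child does not read */
		_exit(1);	/* exit without writing anything */
	}

	/*
	 * Parent: close the unused write end. Without this the parent
	 * itself keeps the pipe open for writing and read(2) never sees
	 * end-of-file.
	 */
	close(fds[1]);
	n = read(fds[0], buf, sizeof(buf));
	close(fds[0]);
	(void)waitpid(pid, NULL, 0);
	return (n);
}

/* Usage example: the parent sees end-of-file instead of blocking. */
void
pipe_eof_example(void)
{
	assert(read_from_exiting_child() == 0);
}
```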
Marcin Wojtas [Tue, 6 Apr 2021 16:50:36 +0000 (18:50 +0200)]
pci_user: fix build for 32-bit platforms
Commit: f2f1ab39c040 ("pci_user: call bus_translate_resource before BAR mmap")
broke the build for 32-bit platforms because rman_res_t and vm_paddr_t
are incompatible types there. Fix that.
Marcin Wojtas [Tue, 6 Apr 2021 15:10:04 +0000 (17:10 +0200)]
pci_user: call bus_translate_resource before BAR mmap
On some armv8 machines it is possible that the mapping between CPU
and PCI bus BAR base addresses is not 1:1. In case a BAR is allocated
in kernel using bus_alloc_resource_any this translation is handled in
ofw_pci_activate_resource.
Do the same in pci_user.c by calling the bus_translate_resource devmethod.
This fixes mmapping BARs to userspace on Marvell SoCs (Armada 7k8k/CN913x)
and possibly many other platforms.
Marcin Wojtas [Tue, 6 Apr 2021 15:00:05 +0000 (17:00 +0200)]
pciconf: Use VM_MEMATTR_DEVICE on supported architectures
Some architectures - armv7, armv8 and riscv use VM_MEMATTR_DEVICE
when mapping device registers in kernel. Do the same in pciconf.
On the armada8k SoC, all reads from BARs mapped with the previously used
attribute (VM_MEMATTR_UNCACHEABLE) return 0xff's.
Radix MMU code was missing TLB invalidations when some Level 3 PDEs were
modified. This caused TLB multi-hit machine check interrupts when
superpages were enabled.
Reviewed by: jhibbits
MFC after: 2 weeks
Sponsored by: Eldorado Research Institute (eldorado.org.br)
Differential Revision: https://reviews.freebsd.org/D29511
Stefan Eßer [Tue, 6 Apr 2021 09:57:07 +0000 (11:57 +0200)]
[bc] Update to version 4.0.0
This version fixes an issue (missing pop of top-of-stack value in the
"P" command of the dc program).
This issue did not affect the bc program, since it does not use dc as
a back-end to actually perform the calculations as was the case with
the traditional bc and dc programs.
The major number has been bumped due to Windows support that has been
added to this version. It does not correspond to a major change that
might affect FreeBSD.
Warner Losh [Tue, 6 Apr 2021 05:55:08 +0000 (23:55 -0600)]
gptboot.efi: Add man page
Add a man page for gptboot.efi. Describe when and how to use this as it differs
from the BIOS cases. Include cross reference for the preferred method described
in efibootmgr(8) as well as cross links in both gptboot(8) and gptboot.efi(8) to
the other.
This man page was heavily copied from the gptboot.8 man page by Warren Block.
They are different enough to need separate man pages for clarity, but there's
enough similarity that I worry about the duplication. In the really long term,
gptboot(8) will disappear, so having the same info here will help when that
day comes. In the short to medium term, the information is likely to not
change in gptboot(8) and any changes to gptboot.efi(8) will be easier to
make in a separate copy.
loader.efi(8) needs a complete rewrite from scratch, otherwise I'd have
referenced gptboot.efi(8) from there.
ixl(4): Add tunable to override Flow Control settings
Add flow_control to the hw.ixl tunables tree to allow overriding the
initial flow control configuration for all interfaces.
Keep using the configuration set by NVM by default.
Ed Maste [Sun, 4 Apr 2021 00:57:26 +0000 (20:57 -0400)]
freebsd-update: improve mandoc db generation
freebsd-update compares the dates on man pages with mandoc.db, and if
any newer pages are found it regenerates mandoc.db.
Previously, if mandoc.db did not already exist the check failed and
freebsd-update then failed to create one. Now, check that mandoc.db
exists before performing the check for newer pages.
Reported by: bdrewery (in D10482)
Reviewed by: gordon
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D29575
Ed Maste [Mon, 5 Apr 2021 17:16:01 +0000 (13:16 -0400)]
release: move installworld before installkernel
To support -DNO_ROOT work. The top-level installworld target creates a
new METALOG starting with `#mtree 2.0` so it needs to be first, to avoid
overwriting installkernel METALOG entries.
Reviewed by: gjb
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D29582
Chuck Tuffli [Fri, 5 Mar 2021 16:13:23 +0000 (08:13 -0800)]
wait for device mounts in zpool and dumpon
If the root file system is composed from multiple devices, wait for
devices to be ready before running zpool and dumpon rc scripts.
An example of this is if the bulk of the root file system exists on a
fast device (e.g. NVMe) but the /var directory comes from a ZFS dataset
on a slower device (e.g. SATA). In this case, it is possible that the
zpool import may run before the slower device has finished being probed,
leaving the system in an intermediate state.
The fix is to add root_hold_wait to the zpool and dumpon (which has a
similar issue) rc scripts.
This fixes a problem where ctld(8) would refuse to start on boot
with a specific IP address to listen on configured in ctl.conf(5).
It also fixes a problem where ctld(8) would fail to start with
some network interfaces which require a sysctl.conf(5) tweak
to configure them, eg to switch them from InfiniBand to IP mode.
PR: 232397
Reported By: Mahmoud Al-Qudsi <mqudsi at neosmart.net>
Submitted By: Jeremy Faulkner <gldisater at gmail.com>
Reviewed By: mav
Differential Revision: https://reviews.freebsd.org/D29578
Alexander Motin [Mon, 5 Apr 2021 14:34:40 +0000 (10:34 -0400)]
Set PCIe device's Max_Payload_Size to match PCIe root's.
Usually on boot the MPS is already configured by the BIOS. But we've
found that on hot-plug this is not true, at least for our Supermicro
X11 boards. As a result, the mismatch between the root's configuration of
256 bytes and the device's default of 128 bytes causes problems for some
devices, while others seem to work fine.
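For reference, Max_Payload_Size is a 3-bit field in bits 7:5 of the PCIe
Device Control register, encoding 128 << n bytes. The sketch below shows only
the bit manipulation needed to copy the root's setting into a device's
register value. The macro and function names are hypothetical, and a real
implementation would also have to respect the payload size the device
reports as supported, which this sketch does not model.

```c
#include <assert.h>
#include <stdint.h>

/* Max_Payload_Size field of the PCIe Device Control register: bits 7:5. */
#define	DEVCTL_MPS_MASK		0x00e0u
#define	DEVCTL_MPS_SHIFT	5

/* Decode the payload size in bytes: 0 -> 128, 1 -> 256, ... 5 -> 4096. */
static unsigned int
devctl_mps_bytes(uint16_t ctl)
{
	unsigned int field = (ctl & DEVCTL_MPS_MASK) >> DEVCTL_MPS_SHIFT;

	assert(field <= 5);	/* encodings 6 and 7 are reserved */
	return (128u << field);
}

/* Return dev_ctl with its MPS field replaced by the root's MPS field. */
static uint16_t
devctl_match_mps(uint16_t dev_ctl, uint16_t root_ctl)
{
	return ((uint16_t)((dev_ctl & ~DEVCTL_MPS_MASK) |
	    (root_ctl & DEVCTL_MPS_MASK)));
}
```

All other bits of the device's register value are preserved; only the
payload size field is taken from the root.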
pf: change pf_route so pf only runs when packets enter and leave the stack.
before this change pf_route operated on the semantic that pf runs
when packets go over an interface, so when pf_route changed which
interface the packet was on it would run pf_test again. this change
changes (restores) the semantic that pf is only supposed to run
when packets go in or out of the network stack, even if route-to
is responsible for short circuiting past the network stack.
just to be clear, for normal packets (ie, those not touched by
route-to/reply-to/dup-to), there isn't a difference between running
pf when packets enter or leave the stack, or having pf run when a
packet goes over an interface.
the main reason for this change is that running the same packet
through pf multiple times creates confusion for the state table.
by default, pf states are floating, meaning that packets are matched
to states regardless of which interface they're going over. if a
packet leaving on em0 is rerouted out em1, both traversals will end
up using the same state, which at best will make the accounting
look weird, or at worst fail some checks in the state and get
dropped.
another reason for this commit is to make handling of the changes
that route-to makes consistent with other changes that are made to
packets. eg, when nat is applied to a packet, we don't run pf_test
again with the new addresses.
the main caveat with this diff is you can't have one rule that
pushes a packet out a different interface, and then have a rule on
that second interface that NATs the packet. i'm not convinced this
ever worked reliably or was used much anyway, so we don't think
it's a big concern.
discussed with many, with special thanks to bluhm@, sashan@ and
sthen@ for weathering most of that pain.
ok claudio@ sashan@ jmatthew@
Follow-up change to a6d768d845c173823785c71bb18b40074e7a8998.
This change adds iflib support for netmap offsets, enabling
applications to use offsets on any driver backed by iflib.
Rick Macklem [Mon, 5 Apr 2021 01:15:54 +0000 (18:15 -0700)]
nfsd: make the server repeat CB_RECALL every couple of seconds
Commit 01ae8969a9ee stopped the NFSv4.1/4.2 server from implicitly
binding the back channel to a new TCP connection so that it
conforms to RFC5661, for NFSv4.1/4.2. An effect of this
for the Linux NFS client is that it will do a
BindConnectionToSession when it sees NFSV4SEQ_CBPATHDOWN
set in a sequence reply. This will fix the back channel, but the
first attempt at a callback like CB_RECALL will already have
failed. Without this patch, a CB_RECALL will not be retried
and that can result in a 5 minute delay until the delegation
times out.
This patch modifies the code so that it will retry the
CB_RECALL every couple of seconds, often avoiding the
5 minute delay.
This is not critical for correct behaviour, but avoids
the 5 minute delay for the case where the Linux client
re-binds the back channel via BindConnectionToSession.
Ed Maste [Mon, 5 Apr 2021 01:01:28 +0000 (21:01 -0400)]
readelf: return error in case of invalid file
GNU readelf exits with an error for a number of invalid file cases.
Previously ELF Tool Chain readelf always exited with 0. Now we exit 1
upon detecting an error with one or more input files, but in any case
all of them are processed.
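The exit status handling described above amounts to remembering that an error
occurred while still visiting every input file. The sketch below is not the
readelf code; check_elf_magic() and process_files() are hypothetical, and the
only validity check performed is the 4-byte ELF magic.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Return 0 if the file starts with the ELF magic, -1 otherwise. */
static int
check_elf_magic(const char *path)
{
	unsigned char magic[4];
	size_t n;
	FILE *fp;

	fp = fopen(path, "rb");
	if (fp == NULL)
		return (-1);
	n = fread(magic, 1, sizeof(magic), fp);
	fclose(fp);
	if (n != sizeof(magic) || memcmp(magic, "\177ELF", 4) != 0)
		return (-1);
	return (0);
}

/*
 * Check every file, even after a failure. Returns the exit status:
 * 0 if all files were fine, 1 if at least one was not. *nchecked
 * receives the number of files visited.
 */
static int
process_files(const char *const *paths, size_t npaths, size_t *nchecked)
{
	int status = 0;
	size_t i;

	assert(nchecked != NULL && (paths != NULL || npaths == 0));
	*nchecked = 0;
	for (i = 0; i < npaths; i++) {
		if (check_elf_magic(paths[i]) != 0)
			status = 1;	/* remember the error, keep going */
		(*nchecked)++;
	}
	return (status);
}
```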
This should catch common failure cases. We still do not report an error
for some types of malformed ELF files, but this is consistent with GNU
readelf.
PR: 252727
Reviewed by: jkoshy, markj
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D29377
Rick Macklem [Sun, 4 Apr 2021 22:05:39 +0000 (15:05 -0700)]
nfsd: fix BindConnectionToSession so that it clears "cb path down"
Commit 01ae8969a9ee stopped the NFSv4.1/4.2 server from implicitly
binding the back channel to a new TCP connection so that it
conforms to RFC5661, for NFSv4.1/4.2. An effect of this
for the Linux NFS client is that it will do a
BindConnectionToSession when it sees NFSV4SEQ_CBPATHDOWN
set in a sequence reply. It will do this for every RPC
reply until it no longer sees the flag.
Without this patch, this will happen until the client does
an Open, which will clear LCL_CBDOWN.
This patch clears LCL_CBDOWN right away, so that
NFSV4SEQ_CBPATHDOWN will no longer be sent to the client
in Sequence replies and the Linux client will not repeat
the BindConnectionToSession RPCs.
This is not critical for correct behaviour, but reduces
RPC overheads for cases where the Open will not be done
for a while.