Mark Johnston [Mon, 1 Nov 2021 13:27:35 +0000 (09:27 -0400)]
uma: Improve M_USE_RESERVE handling in keg_fetch_slab()
M_USE_RESERVE is used in a couple of places in the VM to avoid unbounded
recursion when the direct map is not available, as is the case on 32-bit
platforms or when certain kernel sanitizers (KASAN and KMSAN) are
enabled. For example, to allocate KVA, the kernel might allocate a
kernel map entry, which might require a new slab, which requires KVA.
For these zones, we use uma_prealloc() to populate a reserve of items,
and then in certain serialized contexts M_USE_RESERVE can be used to
guarantee a successful allocation. uma_prealloc() allocates the
requested number of items, distributing them evenly among NUMA domains.
Thus, in a first-touch zone, to satisfy an M_USE_RESERVE allocation we
might have to check the slab lists of other domains than the current one
to provide the semantics expected by consumers.
So, try harder to find an item if M_USE_RESERVE is specified and the keg
doesn't have anything for current (first-touch) domain. Specifically,
fall back to a round-robin slab allocation. This change fixes boot-time
panics on NUMA systems with KASAN or KMSAN enabled.[1]
Alternately we could have uma_prealloc() allocate the requested number
of items for each domain, but for some existing consumers this would be
quite wasteful. In general I think keg_fetch_slab() should try harder
to find free slabs in other domains before trying to allocate fresh
ones, but let's limit this to M_USE_RESERVE for now.
Also fix a separate problem that I noticed: in a non-round-robin slab
allocation with M_WAITOK, rather than sleeping after a failed slab
allocation we simply try again. Call vm_wait_domain() before retrying.
Reported by: mjg, tuexen [1]
Reviewed by: alc
Sponsored by: The FreeBSD Foundation
Rick Macklem [Sat, 30 Oct 2021 03:35:02 +0000 (20:35 -0700)]
nfscl: Use NFSMNTP_DELEGISSUED in two more functions
Commit 5e5ca4c8fc53 added a NFSMNTP_DELEGISSUED flag to indicate when
a delegation has been issued to the mount. For the common case
where an NFSv4 server is not issuing delegations, this flag
can be checked to avoid acquisition of the NFSCLSTATEMUTEX.
This patch adds checks for NFSMNTP_DELEGISSUED being set
to two more functions.
This change appears to be performance neutral for a small number
of opens, but should reduce lock contention for a large number of opens
for the common case where server is not issuing delegations.
Rick Macklem [Sat, 30 Oct 2021 23:35:02 +0000 (16:35 -0700)]
nfscl: Fix race between Lookup and Setattr of size
PR#259071 provides a test program that fails for the NFS client.
Testing with it, there appears to be a race between Lookup
and VOPs like Setattr-of-size, where Lookup ends up loading
stale attributes (including what might be the wrong file size)
into the NFS vnode's attribute cache.
The race occurs when the modifying VOP (which holds a lock
on the vnode), blocks the acquisition of the vnode in Lookup,
after the RPC (with now potentially stale attributes).
Here's what seems to happen:
Child Parent
does stat(), which does
VOP_LOOKUP(), doing the Lookup
RPC with the directory vnode
locked, acquiring file attributes
valid at this point in time
blocks waiting for locked file does ftruncate(), which
vnode does VOP_SETATTR() of Size,
changing the file's size
while holding an exclusive
lock on the file's vnode
releases the vnode lock
acquires file vnode and fills in
now stale attributes including
the old wrong Size
does a read() which returns
wrong data size
This patch fixes the problem by saving a timestamp in the NFS vnode
in the VOPs that modify the file (Setattr-of-size, Allocate).
Then lookup/readdirplus compares that timestamp with the time just
before starting the RPC after it has acquired the file's vnode.
If the modifying RPC occurred during the Lookup, the attributes
in the RPC reply are discarded, since they might be stale.
With this patch the test program works as expected.
Note that the test program does not fail on a July stable/12,
although this race is in the NFS client code. I suspect a
fairly recent change to the name caching code exposed this
bug.
Rick Macklem [Sun, 31 Oct 2021 00:08:28 +0000 (17:08 -0700)]
nfscl: Add setting n_localmodtime to the Write RPC code
Similar to commit 2be417843a, I believe there could be a race between
the NFS client VOP_LOOKUP() and file Writing that could result in stale
file attributes being loaded into the NFS vnode by VOP_LOOKUP().
I have not been able to reproduce a failure due to this race, but
I believe that there are two possibilities:
The Lookup RPC happens while VOP_WRITE() is being executed and loads
stale file attributes after VOP_WRITE() returns when it has already
completed the Write/Commit RPC(s).
--> For this case, setting the local modify timestamp at the end of
VOP_WRITE() should ensure that stale file attributes are not loaded.
The Lookup RPC occurs after VOP_WRITE() has returned, while
asynchronous Write/Commit RPCs are in progress and then is
blocked by the vnode held by VOP_OPEN/VOP_CLOSE/VOP_FSYNC which
will flush writes via ncl_flush() or ncl_vinvalbuf(), clearing the
NMODIFIED flag (which indicates Writes-in-progress). The VOP_LOOKUP()
then acquires the NFS vnode lock and fills in stale file attributes.
--> Setting the local modify timestamp in ncl_flsuh() and ncl_vinvalbuf()
when they clear NMODIFIED should ensure that stale file attributes
are not loaded.
Rick Macklem [Sun, 31 Oct 2021 23:31:31 +0000 (16:31 -0700)]
nfscl: Do pNFS layout return_on_close synchronously
For pNFS servers that specify that Layouts are to be returned
upon close, they may expect that LayoutReturn to happen before
the associated Close.
This patch modifies the NFSv4.1/4.2 pNFS client so that this
is done. This only affects a pNFS mount against a non-FreeBSD
NFSv4.1/4.2 server that specifies return_on_close in LayoutGet
replies.
Found during a recent IETF NFSv4 working group testing event.
pf: ensure we have the correct source/destination IP address in ICMP errors
When we route-to a packet that later turns out to not fit in the
outbound interface MTU we generate an ICMP error.
However, if we've already changed those (i.e. we've passed through a NAT
rule) we have to undo the transformation first.
Andriy Gapon [Thu, 4 Nov 2021 11:56:12 +0000 (13:56 +0200)]
htu21: allow configuration via hints on FDT-based systems
On-board devices should be configured via the FDT and overlays.
Hints are primarily useful for external and temporarily attached devices.
Adding hints is much easier and faster than writing and compiling
an overlay.
Andriy Gapon [Sat, 6 Nov 2021 16:47:32 +0000 (18:47 +0200)]
htu21: don't needlessly bother hardware when measurements are not needed
sysctl(8) first queries a sysctl to get a size of its value even if the
sysctl is of a fixed size, e.g. it has an integer type.
Only after that sysctl(8) queries an actual value of the sysctl.
Previosuly the driver would needlessly read a sensor in the first step.
Peter Grehan [Mon, 1 Nov 2021 13:35:43 +0000 (23:35 +1000)]
igc: Use hardware routine for PHY reset
Summary:
The previously used software reset routine wasn't sufficient
to reset the PHY if the bootloader hadn't left the device in
an initialized state. This was seen with the onboard igc port
on an 11th-gen Intel NUC.
The software reset isn't used in the Linux driver so all related
code has been removed.
Tested on: Netgate 6100 onboard ports, a discrete PCIe I225-LM card,
and an 11th-gen Intel NUC.
ip_divert: calculate delayed checksum for IPv6 adress family
Before passing an IPv6 packet to application apply delayed checksum
calculation. Mbuf flags will be lost when divert listener will return a
packet back, so we will not be able to do delayed checksum calculation
later. Also an application will get a packet with correct checksum.
Felix Johnson [Mon, 8 Nov 2021 06:14:58 +0000 (01:14 -0500)]
find(1): Update date format reference and remove cvs(1) references
cvs(1) is not installed by default. Change the date format reference to
note that find(1) understands ISO8601 and RFC822 date formats. Also
remove references to cvs(1).
Mark Johnston [Wed, 3 Nov 2021 19:09:17 +0000 (15:09 -0400)]
scsi_cd: Improve TOC access validation
1. During CD probing, we read the TOC header to find the number of
entries, then read the TOC itself. The header determines the number
of entries, which determines the amount of data to read from the
device into the softc in the CD_STATE_MEDIA_TOC_FULL state. We
hard-code a limit of 99 tracks (plus one for the lead-out) in the
softc, but were not validating that the size reported by the media
would fit in this hard-coded limit. Kernel memory corruption could
occur if not.[1] Add validation to check this, and refuse to cache
the TOC if it would not fit.
2. The CDIOCPLAYTRACKS ioctl uses caller provided track numbers to index
into the TOC, but we only validate the starting index. Add
validation of the ending index.
Also, raise the hard-coded limit from 100 tracks to 170, per a
suggestion from Ken.
Reported by: C Turt <ecturt@gmail.com> [1]
Reviewed by: ken, avg
Sponsored by: The FreeBSD Foundation
Rick Macklem [Tue, 26 Oct 2021 02:09:14 +0000 (19:09 -0700)]
nfscl: Add a missing delegation lock release
There was a case in nfscl_doiods() where the function would return
without releasing the delegation shared lock, if it was aquired by
the call to nfscl_getstateid(). This patch adds that release.
I have never observed a failure due to this missing release, so I
do not know if it ever happens in practice. However, since the pNFS
client is not yet heavily used, it might be the case.
Found by code inspection during a recent NFSv4 IETF working group
testing event.
Ed Maste [Thu, 2 Sep 2021 20:43:59 +0000 (16:43 -0400)]
openssh: restore local change to gssapi include logic
/usr/include/gssapi.h claims that it is deprecated, and gssapi/gssapi.h
should be used instead. So, test HAVE_GSSAPI_GSSAPI_H first falling
back to HAVE_GSSAPI_H.
This will be submitted upstream.
Fixes: 6eac665c8126 ("openssh: diff reduction against...")
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31810
Dimitry Andric [Fri, 5 Nov 2021 21:26:16 +0000 (22:26 +0100)]
Partially revert ac76bc1145dd because it is no longer necessary
In ac76bc1145dd, I added a few volatiles to work around ctrig_test
failures with {inf,inf}. This is not necessary anymore now, since in 3b00222f156d we added -fp-exception-behavior=maytrap for clang >= 10 in
libm's Makefile. (The flag tells clang to use stricter floating point
semantics, which libm depends on.)
Ed Maste [Mon, 25 Oct 2021 21:25:26 +0000 (17:25 -0400)]
strip/objcopy: handle empty file as unknown
Previously strip reported a somewhat cryptic error for empty files:
strip: elf_begin() failed: Invalid argument
Add a special case to treat empty files as with an unknown file format.
This is consistent with llvm-strip. GNU strip produces no output which
does not seem like useful behaviour (but it does exit with status 1).
Reported by: andrew
Reviewed by: markj
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32648
This makes it consistent with other date(1) implementations. Also, it
feels more consistent since hours and minutes are already represented as
HH and MM respectively.
- Use Cm instead of Ar or Sq for command modifiers of the -v flag.
- Remove unnecessary "Ar ..." from the synopsis. It's not clear what it
was referring to.
- Add missing arguments to the -f and -v flags.
- Stylize the dot before "ss" with Cm in the default format in the -f
flag description.
- Set LC_ALL=C in the last example so that the output format of
date(1) always matches the specified format of the -f flag not matter
the locale.
- List the -f flag as optional in all usage lines in the synopsis.
Leandro Lupori [Wed, 20 Oct 2021 18:48:33 +0000 (15:48 -0300)]
powerpc64le: stand fixes
Fix boot1 and loader on PowerPC64 little-endian (LE).
Due to endian issues, boot1 couldn't find the UFS boot partition
and loader wasn't able to load the kernel. Most of the issues
happened because boot1 and loader were BE binaries trying to access
LE UFS partitions and because loader expects the kernel ELF image
to use the same endian as itself.
To fix these issues, boot1 and loader are now built as LE binaries
on PPC64LE. To support this, the functions that call OpenFirmware
were enhanced to correctly perform endian conversion on its input
and output arguments and to change the CPU into BE mode before
making the calls, as OpenFirmware always runs in BE. Besides that,
some other small fixes were needed.
Submitted by: bdragon (initial version)
Reviewed by: alfredo, jhibbits
Sponsored by: Instituto de Pesquisas Eldorado (eldorado.org.br)
Differential Revision: https://reviews.freebsd.org/D32160
Leandro Lupori [Thu, 14 Oct 2021 13:39:52 +0000 (10:39 -0300)]
powerpc64: fix OFWFB with Radix MMU
Current implementation of Radix MMU doesn't support mapping
arbitrary virtual addresses, such as the ones generated by
"direct mapping" I/O addresses. This caused the system to hang, when
early I/O addresses, such as those used by OpenFirmware Frame Buffer,
were remapped after the MMU was up.
To avoid having to modify mmu_radix_kenter_attr just to support this
use case, this change makes early I/O map use virtual addresses from
KVA area instead (similar to what mmu_radix_mapdev_attr does), as
these can be safely remapped later.
Reviewed by: alfredo (earlier version), jhibbits (in irc)
Sponsored by: Instituto de Pesquisas Eldorado (eldorado.org.br)
Differential Revision: https://reviews.freebsd.org/D31232
5fcdc19a8111 didn't fully resolve the issue. There remains a report
that an ifconfig wlan0 up by itself is insufficient. Ifconfig down
must precede it.
Reported by: Filipe da Silva Santos <contact _ shiori_com_br>
Fixes: 5fcdc19a8111
Some installations may experience CTRL-EVENT-SCAN-FAILED when
associating to an AP. Installations that specify
ifconfig_wlan0="WPA ... up" in rc.conf do not experience
the problem whereas those which specify ifconfig_wlan0="WPA" without
the "up" will experience CTRL-EVENT-SCAN_FAILED.
However those that specify "up" in ifconfig_wlan0 will be able to
reproduce this problem by service netif stop wlan0;
service netif start wlan0. Interestingly The service netif stop/start
problem is reproducible on the older wpa 2.9 as well.
Reported by: dhw
Reported by: "Oleg V. Nauman" <oleg _ theweb_org_ua>
Reported by: Filipe da Silva Santos <contact _ shiori_com_br>
Reported by: Jakob Alvermark <jakob _ alvermark_net>
RSN Preauthentication allows a station autnetnicate to an AP that
it is not associated with yet while associated with a different AP.
This allows athentication to multiple APs simulteneously.
Ed Maste [Thu, 21 Oct 2021 15:09:58 +0000 (11:09 -0400)]
iscsid: set max_recv_data_segment_length to what we advertise
Previously we updated the conection's conn_max_recv_data_segment_length
only when we received a response containing MaxRecvDataSegmentLength
from the target. If the target did not send MaxRecvDataSegmentLength
then we left conn_max_recv_data_segment_length at the default (i.e.,
8192). A target could then send more data than that defult (up to our
advertised maximum), and we would drop the connection.
RFC 7143 specifies that MaxRecvDataSegmentLength is Declarative, not
negotiated. Just set conn_max_recv_data_segment_length to our
advertised value in login_negotiate().
PR: 259355
Reviewed by: mav
MFC after: 1 week
Fixes: a15fbc904a4d ("Alike to r312190 decouple iSCSI...")
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32605
Mark Johnston [Thu, 21 Oct 2021 15:46:25 +0000 (11:46 -0400)]
vm_page: Break reservations to handle noobj allocations
vm_reserv_reclaim_*() will release pages to the default freepool, not
the direct freepool from which noobj allocations are drawn. But if both
pools are empty, the noobj allocator variants must break reservations to
make progress.
Reported by: cy
Reviewed by: kib (previous version)
Fixes: b498f71bc56a ("vm_page: Add a new page allocator interface for unnamed pages")
Sponsored by: The FreeBSD Foundation
Mark Johnston [Wed, 20 Oct 2021 00:24:21 +0000 (20:24 -0400)]
Introduce vm_page_alloc_noobj_contig()
This is the same as vm_page_alloc_noobj(), but allocates physically
contiguous runs of memory. For now it is implemented in terms of
vm_page_alloc_contig(), with the difference that
vm_page_alloc_noobj_contig() implements VM_ALLOC_ZERO by zeroing the
page.
Reviewed by: alc, kib
Sponsored by: The FreeBSD Foundation
Mark Johnston [Wed, 20 Oct 2021 00:23:39 +0000 (20:23 -0400)]
Convert vm_page_alloc() callers to use vm_page_alloc_noobj().
Remove page zeroing code from consumers and stop specifying
VM_ALLOC_NOOBJ. In a few places, also convert an allocation loop to
simply use VM_ALLOC_WAITOK.
Mark Johnston [Wed, 20 Oct 2021 00:22:12 +0000 (20:22 -0400)]
vm_page: Add a new page allocator interface for unnamed pages
The diff adds vm_page_alloc_noobj() and vm_page_alloc_noobj_domain().
These mostly correspond to vm_page_alloc() and vm_page_alloc_domain()
when no VM object is specified, with the exception that they handle
VM_ALLOC_ZERO by zeroing the page, rather than by preserving PG_ZERO.
This simplifies callers and will permit simplification of the
vm_page_alloc_domain() definition.
Since the new allocator variant is similar to vm_page_alloc_freelist(),
implement both of them using a common backend allocator function. No
functional change intended.
Reviewed by: alc, kib
Sponsored by: The FreeBSD Foundation
Ryan Stone [Fri, 29 Jan 2021 21:13:57 +0000 (16:13 -0500)]
Add a VM flag to prevent reclaim on a failed contig allocation
If a M_WAITOK contig alloc fails, the VM subsystem will try to
reclaim contiguous memory twice before actually failing the
request. On a system with 64GB of RAM I've observed this take
400-500ms before it finally gives up, and I believe that this
will only be worse on systems with even more memory.
In certain contexts this delay is extremely harmful, so add a flag
that will skip reclaim for allocation requests to allow those
paths to opt-out of doing an expensive reclaim.
Mark Johnston [Wed, 20 Oct 2021 00:50:06 +0000 (20:50 -0400)]
vlapic: Schedule callouts on the local CPU
The virtual LAPIC driver uses callouts to implement the LAPIC timer.
Callouts are armed using callout_reset_sbt(), which currently puts
everything on CPU 0. On systems running many bhyve VMs this results in
a large amount of contention for CPU 0's callout lock.
Modify vlapic to schedule callouts on the local CPU instead. This
allows timer interrupts to be scheduled more evenly among CPUs where
bhyve is running.
Reviewed by: grehan, jhb
Sponsored by: The FreeBSD Foundation