Mark Johnston [Wed, 3 Nov 2021 19:09:17 +0000 (15:09 -0400)]
scsi_cd: Improve TOC access validation
1. During CD probing, we read the TOC header to find the number of
entries, then read the TOC itself. The header determines the number
of entries, which determines the amount of data to read from the
device into the softc in the CD_STATE_MEDIA_TOC_FULL state. We
hard-code a limit of 99 tracks (plus one for the lead-out) in the
softc, but were not validating that the size reported by the media
would fit in this hard-coded limit. Kernel memory corruption could
occur if not.[1] Add validation to check this, and refuse to cache
the TOC if it would not fit.
2. The CDIOCPLAYTRACKS ioctl uses caller provided track numbers to index
into the TOC, but we only validate the starting index. Add
validation of the ending index.
Also, raise the hard-coded limit from 100 tracks to 170, per a
suggestion from Ken.
Reported by: C Turt <ecturt@gmail.com> [1]
Reviewed by: ken, avg
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32803
Gleb Smirnoff [Thu, 28 Oct 2021 07:07:02 +0000 (00:07 -0700)]
udp_input: remove a BSD stack relict
I should had removed it 9 years ago in 8ad458a471ca. That commit
left save_ip as a write-only variable.
With save_ip removed we got one case when IP header can be modified:
the calculation of IP checksum with zeroed out header. This place
already has had a header saver char b[9]. However, the b[9] saver
didn't cover the ip_sum field, which we explicitly overwrite aliased
as (struct ipovly *)->ih_len. This was fine in cb34210012d4e, since
checksum doesn't need to be restored if packet is consumed. Now we
need to extend up to ip_sum field.
In collaboration with: ae
Differential revision: https://reviews.freebsd.org/D32719
Mark Johnston [Wed, 3 Nov 2021 16:28:48 +0000 (12:28 -0400)]
kasan: Disable validation of function parameters passed by value
It appears that the emitted code in the caller does not update shadow
state for values passed on the stack to the callee, which it seemingly
ought to do after pushing values on the stack and prior to the call
itself. This leaves open a window where an interrupt handler can cause
regions of the stack containing these values to be poisoned, resulting
in rare false positive reports. This happens particularly in the amd64
TLB invalidation code, where we liberally pass cpuset_t's around by
value.
LLVM has a flag to disable validation of accesses of function parameters
passed by value. Such validation is itself a relatively new feature.
Turn it off for now.
Reported by: pho, syzkaller
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
ofwfb: fix vga/hdmi console with ASMEDIA during boot on powerpc64(le)
On recent OpenBMC firmware, the onboard ASMEDIA video card framebuffer
address was removed from device tree for security purposes (value is set
to zero to avoid leaking the address).
This patch works around the problem by taking framebuffer base address
from the "ranges" property of a parent node.
Reviewed by: luporl, jhibbits (on IRC)
MFC after: 2 weeks
Sponsored by: Instituto de Pesquisas Eldorado (eldorado.org.br)
Differential Revision: https://reviews.freebsd.org/D30626
fexecve(2): restore the attempts to calculate the executable path
vn_fullpath() call was not converted to pass newtextvp, instead it used
imgp->vp which is still NULL there. As result vn_fullpath() always
returned EINVAL and execpath was recorded from the value of arg0.
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
This makes it consistent with other date(1) implementations. Also, it
feels more consistent since hours and minutes are already represented as
HH and MM respectively.
- Use Cm instead of Ar or Sq for command modifiers of the -v flag.
- Remove unnecessary "Ar ..." from the synopsis. It's not clear what it
was referring to.
- Add missing arguments to the -f and -v flags.
- Stylize the dot before "ss" with Cm in the default format in the -f
flag description.
- Set LC_ALL=C in the last example so that the output format of
date(1) always matches the specified format of the -f flag not matter
the locale.
- List the -f flag as optional in all usage lines in the synopsis.
Zhenlei Huang [Wed, 3 Nov 2021 11:46:48 +0000 (12:46 +0100)]
devfs.rules: Correctly unhide pf in vnet jails
Revision 9e9be081d8 introduced a new devfs rule devfsrules_jail_vnet. It
includes rule devfsrules_jail which include other rules. Unfortunately
devfs could not recursively parse the action include and thus
devfsrules_jail_vnet will expose all nodes.
ip_divert: calculate delayed checksum for IPv6 adress family
Before passing an IPv6 packet to application apply delayed checksum
calculation. Mbuf flags will be lost when divert listener will return a
packet back, so we will not be able to do delayed checksum calculation
later. Also an application will get a packet with correct checksum.
In preparation for machine-independent sys/compat/linux/linux_ptrace.c,
rename the i386-specific Linux ptrace(2) implementation. No functional
changes.
Kyle Evans [Thu, 28 Oct 2021 04:40:08 +0000 (23:40 -0500)]
kern: physmem: improve region coalescing logic
The existing logic didn't take into account newly inserted mappings
wholly contained by an existing region (or vice versa), nor did it
account for weird overlap scenarios. The latter is probably unlikely
to happen, but the former may happen in UEFI: BootServicesData allocated
within a large chunk of ConventionalMemory. This situation blows up vm
initialization.
While we're here, remove the "exact match" logic as it's likely wrong;
if an exact match exists with conflicting flags, for instance, then we
should probably be doing something else. The new logic takes into
account exact matches as part of the overlapping efforts.
Rick Macklem [Wed, 3 Nov 2021 00:28:13 +0000 (17:28 -0700)]
nfscl: Check for a forced dismount in nfscl_getref()
The nfscl_getref() function is called within nfscl_doiods() when
the NFSv4.1/4.2 pNFS client is doing I/O on a DS. As such,
nfscl_getref() needs to check for a forced dismount.
This patch adds that check.
Found during a recent IETF NFSv4 working group testing event.
Michal Meloun [Sun, 17 Oct 2021 17:36:33 +0000 (19:36 +0200)]
arm: Fix handling of undefined instruction aborts in THUMB2 mode.
Correctly recognize NEON/SIMD and VFP instructions in THUMB2 mode and pass
these to the appropriate handler. Note that it is not necessary to filter
all undefined instruction variant or register combinations, this is a job
for given handler.
Bjoern A. Zeeb [Sat, 9 Oct 2021 14:09:04 +0000 (16:09 +0200)]
if_epair: rework
Rework if_epair(4) to no longer use netisr and dpcpu.
Instead use mbufq and swi_net.
This simplifies the code and seems to make it work better and
no longer hang.
Rick Macklem [Tue, 2 Nov 2021 00:21:31 +0000 (17:21 -0700)]
nfscl: Use a smaller initial delay time for NFSERR_DELAY
For NFS RPCs that receive a NFSERR_DELAY reply, the delay time
is initially 1sec and then increases exponentially to NFS_TRYLATERDEL.
It was found that this delay time is excessive for some NFSv4
servers, which work well with a 1msec delay.
A 1sec delay resulted in very slow performance for Remove and
Rename when delegations and pNFS were enabled.
This patch decreases the initial delay time to 1msec.
Found during a recent IETF NFSv4 working group testing event.
Ed Maste [Wed, 27 Oct 2021 20:45:35 +0000 (16:45 -0400)]
ssh: disble internal security key support in generated config.h
We want to set ENABLE_SK_INTERNAL only when building with USB support.
We'll leave it off in config.h and enble it via our bespoke build's
Makefile.inc.
John Baldwin [Mon, 1 Nov 2021 18:28:10 +0000 (11:28 -0700)]
ktls: Add simple transmit tests of kernel TLS.
Note that these tests test the kernel TLS functionality directly.
Rather than using OpenSSL to perform negotiation and generate keys,
these tests generate random keys send data over a pair of TCP sockets
manually decrypting the TLS records generated by the kernel.
Marius Halden [Sun, 31 Oct 2021 20:18:42 +0000 (21:18 +0100)]
carp: deal with negative net.inet.carp.demotion
Given nodes 1 and 2, where node 1 has an advskew of 0 and node 2 has an
advskew of 100, making them master and backup respectively.
If net.inet.carp.demotion is set to a negative value on node 1, node 2
might become master while node 1 still retains it master status. Wether
or not node 2 becomes master seems to depend on the nodes advskew and
what the demotion sysctl was set to on node 1.
The reason for node 2 becoming master seems to be that the calculated
advskew taking demotion into account is truncated to a single unsigned
byte when copied into the carp header for sending, and node 1 stays
master since it takes uses the whole non-truncated calculated advskew
when deciding wether to stay master.
Mark Johnston [Mon, 1 Nov 2021 13:27:52 +0000 (09:27 -0400)]
uma: Fix handling of reserves in zone_import()
Kegs with no items reserved have uk_reserve = 0. So the check
keg->uk_reserve >= dom->ud_free_items will be true once all slabs are
depleted. Then, rather than go and allocate a fresh slab, we return to
the cache layer.
The intent was to do this only when the keg actually has a reserve, so
modify the check to verify this first. Another approach would be to
make uk_reserve signed and set it to -1 until uma_zone_reserve() is
called, but this requires a few casts elsewhere.
Fixes: 1b2dcc8c54a8 ("uma: Avoid depleting keg reserves when filling a bucket")
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32516
Mark Johnston [Mon, 1 Nov 2021 13:27:35 +0000 (09:27 -0400)]
uma: Improve M_USE_RESERVE handling in keg_fetch_slab()
M_USE_RESERVE is used in a couple of places in the VM to avoid unbounded
recursion when the direct map is not available, as is the case on 32-bit
platforms or when certain kernel sanitizers (KASAN and KMSAN) are
enabled. For example, to allocate KVA, the kernel might allocate a
kernel map entry, which might require a new slab, which requires KVA.
For these zones, we use uma_prealloc() to populate a reserve of items,
and then in certain serialized contexts M_USE_RESERVE can be used to
guarantee a successful allocation. uma_prealloc() allocates the
requested number of items, distributing them evenly among NUMA domains.
Thus, in a first-touch zone, to satisfy an M_USE_RESERVE allocation we
might have to check the slab lists of other domains than the current one
to provide the semantics expected by consumers.
So, try harder to find an item if M_USE_RESERVE is specified and the keg
doesn't have anything for current (first-touch) domain. Specifically,
fall back to a round-robin slab allocation. This change fixes boot-time
panics on NUMA systems with KASAN or KMSAN enabled.[1]
Alternately we could have uma_prealloc() allocate the requested number
of items for each domain, but for some existing consumers this would be
quite wasteful. In general I think keg_fetch_slab() should try harder
to find free slabs in other domains before trying to allocate fresh
ones, but let's limit this to M_USE_RESERVE for now.
Also fix a separate problem that I noticed: in a non-round-robin slab
allocation with M_WAITOK, rather than sleeping after a failed slab
allocation we simply try again. Call vm_wait_domain() before retrying.
Reported by: mjg, tuexen [1]
Reviewed by: alc
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32515
Andrew Turner [Mon, 1 Nov 2021 11:19:57 +0000 (11:19 +0000)]
Print the correct register for the arm64 elr
In 7ec86b6609912 ("Also print symbols when printing arm64 registers")
a new function was created to print most registers. Unfortunately the
Link Register (LR) was being printed when we should have printed the
Exception Link Register (ELR).
Steve Kargl [Sun, 31 Oct 2021 22:26:20 +0000 (00:26 +0200)]
sinpi[fl] etc: Fix the ld128 implementations
Mark Murray graciously provided access to an aarch64 system
to test the ld128 implementations. This patch address
* Misuses of copysignl() in sinpil() and tanpil().
* Redo the splitting of argument 'x' into an integer part and
remainder. The remainder must satify 0 <= r < 1.
* Update the reduction of the integer part to something that can
easily be seen as even or odd, e.g., sin(pi*x) = (-1)^n*sin(pi*r)
with n <= 2^112 and we an reduce n by subtracting integer powers
of 2.
* In s_cospil.c, fix typos where 'x' is used where 'ax', the
remainder, is required.
* In tanpil(), fix the use of an uninitialized variable, ax = fabsl(ax),
ax should be x in fabsl().
One item of note, in the limited tested on aarch64, the max ULP
for sinpil() and cospil() were less than 1.1 ULP, which is higher
that the desired max ULP less than 1. This was traced to the
kernel for cosl() in the fundamental interval [0,pi/4].
The coefficients in the minmax polynomial likely need refinement.
Rick Macklem [Sun, 31 Oct 2021 23:31:31 +0000 (16:31 -0700)]
nfscl: Do pNFS layout return_on_close synchronously
For pNFS servers that specify that Layouts are to be returned
upon close, they may expect that LayoutReturn to happen before
the associated Close.
This patch modifies the NFSv4.1/4.2 pNFS client so that this
is done. This only affects a pNFS mount against a non-FreeBSD
NFSv4.1/4.2 server that specifies return_on_close in LayoutGet
replies.
Found during a recent IETF NFSv4 working group testing event.
Rick Macklem [Sun, 31 Oct 2021 00:08:28 +0000 (17:08 -0700)]
nfscl: Add setting n_localmodtime to the Write RPC code
Similar to commit 2be417843a, I believe there could be a race between
the NFS client VOP_LOOKUP() and file Writing that could result in stale
file attributes being loaded into the NFS vnode by VOP_LOOKUP().
I have not been able to reproduce a failure due to this race, but
I believe that there are two possibilities:
The Lookup RPC happens while VOP_WRITE() is being executed and loads
stale file attributes after VOP_WRITE() returns when it has already
completed the Write/Commit RPC(s).
--> For this case, setting the local modify timestamp at the end of
VOP_WRITE() should ensure that stale file attributes are not loaded.
The Lookup RPC occurs after VOP_WRITE() has returned, while
asynchronous Write/Commit RPCs are in progress and then is
blocked by the vnode held by VOP_OPEN/VOP_CLOSE/VOP_FSYNC which
will flush writes via ncl_flush() or ncl_vinvalbuf(), clearing the
NMODIFIED flag (which indicates Writes-in-progress). The VOP_LOOKUP()
then acquires the NFS vnode lock and fills in stale file attributes.
--> Setting the local modify timestamp in ncl_flsuh() and ncl_vinvalbuf()
when they clear NMODIFIED should ensure that stale file attributes
are not loaded.
Rick Macklem [Sat, 30 Oct 2021 23:46:14 +0000 (16:46 -0700)]
nfscl: Set n_localmodtime in Deallocate
Commit 2be417843a04 added n_localmodtime, which is used by Lookup
and ReaddirPlus to check to see if the file attributes in an RPC
reply might be stale. This patch sets n_localmodtime in Deallocate.
Done as a separate commit, since Deallocate is not in stable/13.
Rick Macklem [Sat, 30 Oct 2021 23:35:02 +0000 (16:35 -0700)]
PR#259071 provides a test program that fails for the NFS client.
Testing with it, there appears to be a race between Lookup
and VOPs like Setattr-of-size, where Lookup ends up loading
stale attributes (including what might be the wrong file size)
into the NFS vnode's attribute cache.
The race occurs when the modifying VOP (which holds a lock
on the vnode), blocks the acquisition of the vnode in Lookup,
after the RPC (with now potentially stale attributes).
Here's what seems to happen:
Child Parent
does stat(), which does
VOP_LOOKUP(), doing the Lookup
RPC with the directory vnode
locked, acquiring file attributes
valid at this point in time
blocks waiting for locked file does ftruncate(), which
vnode does VOP_SETATTR() of Size,
changing the file's size
while holding an exclusive
lock on the file's vnode
releases the vnode lock
acquires file vnode and fills in
now stale attributes including
the old wrong Size
does a read() which returns
wrong data size
This patch fixes the problem by saving a timestamp in the NFS vnode
in the VOPs that modify the file (Setattr-of-size, Allocate).
Then lookup/readdirplus compares that timestamp with the time just
before starting the RPC after it has acquired the file's vnode.
If the modifying RPC occurred during the Lookup, the attributes
in the RPC reply are discarded, since they might be stale.
With this patch the test program works as expected.
Note that the test program does not fail on a July stable/12,
although this race is in the NFS client code. I suspect a
fairly recent change to the name caching code exposed this
bug.
linux: Add additional ptracestop only if the debugger is Linux
In 6e66030c4c0, additional ptracestop was added in order
to implement PTRACE_EVENT_EXEC. Make it only apply to cases
where the debugger is a Linux processes; native FreeBSD
debuggers can trace Linux processes too, but they don't
expect that additonal ptracestop.
Rick Macklem [Sat, 30 Oct 2021 03:35:02 +0000 (20:35 -0700)]
nfscl: Use NFSMNTP_DELEGISSUED in two more functions
Commit 5e5ca4c8fc53 added a NFSMNTP_DELEGISSUED flag to indicate when
a delegation has been issued to the mount. For the common case
where an NFSv4 server is not issuing delegations, this flag
can be checked to avoid acquisition of the NFSCLSTATEMUTEX.
This patch adds checks for NFSMNTP_DELEGISSUED being set
to two more functions.
This change appears to be performance neutral for a small number
of opens, but should reduce lock contention for a large number of opens
for the common case where server is not issuing delegations.
Randall Stewart [Fri, 29 Oct 2021 21:37:49 +0000 (17:37 -0400)]
tcp: Rack might retransmit forever.
If we get a Sacked peer with an MTU change we can retransmit forever if the
last bytes are sacked and the client goes away (think power off). Then we
never see the end condition and continually retransmit.
Reviewed by: Michael Tuexen
Sponsored by: Netflix Inc.
Differential Revision: https://reviews.freebsd.org/D32671
Ed Maste [Fri, 29 Oct 2021 00:49:12 +0000 (20:49 -0400)]
Don't build sanitizer runtimes under WITHOUT_CXX
In the past we built the sanitizer runtimes when building Clang
(and using Clang as the compiler) but 7676b388adbc changed this to
be conditional only on using Clang, to make the runtimes available
for external Clang.
They fail to build when WITHOUT_CXX is set though, so add MK_CXX
as part of the condition.
Reported by: Michael Dexter, Build Option Survey
Reviewed by: imp, jrtc27
Fixes: 7676b388adbc ("Always build the sanitizer runtimes...")
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32731
The pic_* interface was used.
Only edge interrupts are supported by this controller.
Driver mutex had to be converted to a spin lock so that it can
be used in the interrupt filter context.
Two types of intr_map_data are supported - INTR_MAP_DATA_GPIO and
INTR_MAP_DATA_FDT. This way interrupts can be allocated using the
userspace gpio interrupt allocation method, as well as directly from
simplebus. The latter can be used by devices that have its irq routed
to a GPIO pin.
Obtained from: Semihalf
Sponsored by: Alstom Group
Differential revision: https://reviews.freebsd.org/D32587