David Chisnall [Sat, 10 Jul 2021 16:19:52 +0000 (17:19 +0100)]
Pass the syscall number to capsicum permission-denied signals
The syscall number is stored in the same register as the syscall return
on amd64 (and possibly other architectures) and so it is impossible to
recover in the signal handler after the call has returned. This small
tweak delivers it in the `si_value` field of the signal, which is
sufficient to catch capability violations and emulate them with a call
to a more-privileged process in the signal handler.
Randall Stewart [Fri, 16 Jul 2021 10:07:13 +0000 (06:07 -0400)]
tcp: Lro needs to validate that it does not go beyond the end of the mbuf as it parses.
Currently the LRO parser, if given a packet that say has ETH+IP header but the TCP header
is in the next mbuf (split), would walk garbage. Lets make sure we keep track as we
parse of the length and return NULL anytime we exceed the length of the mbuf.
Kevin Bowling [Fri, 16 Jul 2021 06:50:14 +0000 (23:50 -0700)]
ixgbe: Print FW NVM and Option ROM versions
It can be useful for system operators to see this kind of information
when correlating issues or requesting support from the OEM or Intel for
hardware and firmware issues.
Dave Fullard [Fri, 16 Jul 2021 04:02:48 +0000 (23:02 -0500)]
freebsd-update: create a ZFS boot environment on install
Updated freebsd-update to allow it to create boot environments using
bectl should the system support it. The bectl utility was updated in
r352211 (490e13c1403f) to support a 'check' to determine if the system
supports boot environments. If UFS is used, the bectl check will fail
then no attempt will be made to create the boot environment.
If freebsd-update is run inside a jail, no attempt will be made to
create a boot environment.
The boot environment function will create a new environment using the
format: current FreeBSD kernel version and date/timestamp, example:
12.0-RELEASE-p10_2019-10-03_185233
This functionality can be disabled by setting 'CreateBootEnv' in
freebsd-update.conf to 'no'.
Mark Johnston [Fri, 16 Jul 2021 02:38:46 +0000 (22:38 -0400)]
lio_listio: Don't post a completion notification if none was requested
One is allowed to use LIO_NOWAIT without specifying a sigevent. In this
case, lj->lioj_signal is left uninitialized, but several code paths
examine liov_signal.sigev_notify to figure out which notification to
post. Unconditionally initialize that field to SIGEV_NONE.
Add a dumb test case which triggers the bug.
Reported by: KMSAN+syzkaller
Reviewed by: asomers
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31197
Mark Johnston [Fri, 16 Jul 2021 02:35:28 +0000 (22:35 -0400)]
libc: Use the initial-exec TLS model
This permits more efficient accesses of thread-local variables, which
are heavily used at least by jemalloc and locale-aware code. Note that
on amd64 and i386, jemalloc's thread-local variables already have their
TLS model overridden by defining JEMALLOC_TLS_MODEL.
For now the change is applied only to tested platforms, but should in
principle be enabled everywhere.
PR: 255840
Suggested by: jrtc27
Reviewed by: kib
MFC after: 2 months
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31070
Mark Johnston [Fri, 16 Jul 2021 02:26:25 +0000 (22:26 -0400)]
rtld/arm64: Remove checks for undefined symbols when processing TPREL64
lld emits several GOT relocations referencing the null sumbol in libc.so
when compiled with -ftls-model=initial-exec. This symbol is specified
to be undefined.
We generally do not handle dynamic TLS relocations against weak,
undefined symbols, so avoid printing a warning here. This makes it
possible to compile libc.so using the initial-exec TLS model on arm64.
Reviewed by: jrtc27, kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31069
This fixes a kernel panic when probing for vmd_bus on Intel TigerLake on
14-CURRENT. Apparently, vmd_bus is a type of PCI bus, but was registered
as a separate device class.
Warner Losh [Fri, 16 Jul 2021 00:30:53 +0000 (18:30 -0600)]
UPDATING: Not unusual side effect of the awk bug fixed in d4d252c49976
You might not be able to build the kernel if you have an awk between
Jul 7th and today. It does not affect all platforms due to the nature
of the bug (so amd64 is unaffected in stable/13 or current, but
is affected in stable/12. i386 seems to be affected everywhere).
Warner Losh [Thu, 15 Jul 2021 22:46:06 +0000 (16:46 -0600)]
awk: revert upstream's attempt to disallow hex strings
Upstream one-true-awk decided to disallow hex strings as numbers. This
is in line with awk's behavior prior to C99, and allowed by the POSIX
standard. The standard, however, allows them to be treated as numbers
because that's what the standard said in the 2001 through 2004 editions.
Since 2001, the nawk in FreeBSD has treated them as numbers, so restore
that behavior, allowed by the standard.
A number of scripts in the FreeBSD tree depend on this interpretation,
including scripts to build the kernel which had mysteriously started
failing for some people and not others. By re-allowing 0x hex numbers,
this fixes those scripts and restores POLA.
Upstream issue: https://github.com/onetrueawk/awk/issues/126
Sponsored by: Netflix
Reviewed by: kevans
MFC After: asap due to regression alrady merged to stable
Differential Revision: https://reviews.freebsd.org/D31199
Warner Losh [Thu, 15 Jul 2021 22:17:23 +0000 (16:17 -0600)]
nvme: Enable interrupts after qpair fully constructed
To guard against the ill effects of a spurious interrupt during
construction (or one that was bogusly pending), enable interrupts after
the qpair is completely constructed. Otherwise, we can die with null
pointer dereferences in nvme_qpair_process_completions. This has been
observed in at least one pre-release NVMe drive where the MSIX interrupt
fired while the queue was being created, before we'd started the NVMe
controller card.
The alternative of only turning on the interrupts after the rest was
tried, but was insufficient to work around this bug and made the code
more complicated w/o benefit.
nanobsd: Use gpart and create code image before full disk image
The attached patch brings two main changes to the nanobsd script:
1- gpart is used instead of fdisk;
2- the code image is created first, and then used to ``assemble'' the
full disk image.
The patch was first proposed on the freebsd-embedded list:
http://lists.freebsd.org/pipermail/freebsd-embedded/2012-June/001580.html
and is currently under discussion:
http://lists.freebsd.org/pipermail/freebsd-embedded/2014-January/002216.html
Another effect is that the -f option ("suppress code slice extraction")
now imples the -i option ("suppress disk image build").
imp@ applied Patch by hand to new legacy.sh, plus tweaked for NANO_LOG vs
NANO_OBJ confusion in original.
Mark Johnston [Thu, 15 Jul 2021 16:23:04 +0000 (12:23 -0400)]
eli: Zero pad bytes that arise when certain auth algorithms are used
When authentication is configured, GELI ensures that the amount of data
per sector is a multiple of 16 bytes. This is done in
eli_metadata_softc(). When the digest size is not a multiple of 16
bytes, this leaves some extra pad bytes at the end of every sector, and
they were not being zeroed before being written to disk. In particular,
this happens with the HMAC/SHA1, HMAC/RIPEMD160 and HMAC/SHA384 data
authentication algorithms.
This change ensures that they are zeroed before being written to disk.
Reported by: KMSAN
Reviewed by: delphij, asomers
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31170
Mark Johnston [Thu, 15 Jul 2021 16:18:17 +0000 (12:18 -0400)]
nfsclient: Avoid copying uninitialized bytes into statfs
hst will be nul-terminated but the remaining space in the buffer is left
uninitialized. Avoid copying the entire buffer to ensure that
uninitialized bytes are not leaked via statfs(2).
Reported by: KMSAN
Reviewed by: rmacklem
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31167
This can be used for variables which are only used with either
INVARIANTS or WITNESS. Without any annotation they run into dead store
warnings from cc --analyze and always annotating with __unused may hide
bad vars when it should not.
Send a zero-length-packet first when opening a BULK endpoint for USB serial
port devices. If it gets eaten it is fine. Many USB device side implementations
don't properly support the clear endpoint halt command and if they do, data is lost
because the transmit FIFO is typically reset when this command is received.
Andrew Turner [Thu, 8 Jul 2021 12:14:56 +0000 (12:14 +0000)]
Update the SCTLR_EL1 register definitions
They are valid as of the ARMv8.7 XML.
While here remove SCTLR_RES0 as it's unused and depends on which CPU
the kernel is running on and switch to shifted values as they are
easier to compare with the documentation.
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31120
Warner Losh [Thu, 15 Jul 2021 03:06:08 +0000 (21:06 -0600)]
loader: make sure CPUTYPE is ignored when building
CPUTYPE?=native causes -march=native to be added to the command
line. When the host machine is haswell, this causes some versions of
clang to generate code that can't execute in the efi boot loader
environment. Set _CPUCFLAGS= to undo what's done bsd.cpu.mk. bsd.cpu.mk
is included too early to control with NO_CPU_CFLAGS here. The only other
option is to put that in all the Makefiles, and this is less tedious and
error prone.
Warner Losh [Wed, 14 Jul 2021 22:43:25 +0000 (16:43 -0600)]
loader: Create loader_simp(8) to document simple version of loader
loader_simp is a much simplified version of loader that will process a
linear sequence of commands from loader.rc. It has neither Forth nor Lua
built in and is much smaller. Document it. This is largely copied from
loader.8 since it implements those built-in commands. Future revisions
will fix this duplication.
Alfonso Gregory [Wed, 14 Jul 2021 21:48:35 +0000 (15:48 -0600)]
Remove incorrect __restricted labels from strcspn
strcspn should never have had the __restrict keywords. While both of
these strings are const, it may have unindended side effects. While this
is the kernel, the POSIX definition also omits restrict.
Rick Macklem [Wed, 14 Jul 2021 20:33:37 +0000 (13:33 -0700)]
nfscl: Avoid KASSERT() panic in cache_enter_time()
Commit 844aa31c6d87 added cache_enter_time_flags(), specifically
so that the NFS client could specify that cache enter replace
any stale entry for the same name. Doing so avoids a KASSERT()
panic() in cache_enter_time(), as reported by the PR.
This patch uses cache_enter_time_flags() for Readdirplus, to
avoid the panic(), since it is impossible for the NFS client
to know if another client (or a local process on the NFS server)
has replaced a file with another file of the same name.
This patch only affects NFS mounts that use the "rdirplus"
mount option.
There may be other places in the NFS client where this needs
to be done, but no panic() has been observed during testing.
Alan Cox [Mon, 12 Jul 2021 23:25:37 +0000 (18:25 -0500)]
pmap: Micro-optimize pmap_remove_pages() on amd64 and arm64
Reduce the live ranges for three variables so that they do not span the
call to PHYS_TO_VM_PAGE(). This enables the compiler to generate
slightly smaller machine code.
Mark Johnston [Tue, 13 Jul 2021 21:49:39 +0000 (17:49 -0400)]
uart: Fix an out-of-bounds read in ns8250_bus_probe()
The problem is that ns8250_bus_probe() accesses a field from the
ns8250_softc, which embeds the generic UART softc, but the ns8250_softc
hasn't yet been allocated because we're still probing.
This is a regression from commit 0aefb0a63c50. This fixed a problem
where one of the upper four IER bits, which are usually reserved, needs
to be set in order to get RX interrupts before the RX FIFO is full. At
the same time, we avoid clearing those reserved bits (see commit 58957d87173, though other UART drivers I looked at do not bother with
this).
So, copy what ns8250_init() does to disable interrupts, since we don't
know what the "right" mask is at this point.
Reported by: syzbot+f256beefd0df9eb796e7@syzkaller.appspotmail.com
Reviewed by: imp
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31124
Mark Johnston [Tue, 13 Jul 2021 21:47:27 +0000 (17:47 -0400)]
blist: Correct the node count computed in blist_create()
Commit bb4a27f927a1 added the ability to allocate a span of blocks
crossing a meta node boundary. To ensure that blst_next_leaf_alloc()
does not walk past the end of the tree, an extra all-zero meta node
needs to be present at the end of the allocation, and
blst_next_leaf_alloc() is implemented such that the presence of this
node terminates the search.
blist_create() computes the number of nodes required. It had two
problems:
1. When the size of the blist is a power of BLIST_RADIX, we would
unnecessarily allocate an extra level in the tree.
2. When the size of the blist is a multiple of BLIST_RADIX, we would
fail to allocate a terminator node. In this case,
blst_next_leaf_alloc() could scan beyond the bounds of the
allocation. This was found using KASAN.
Modify blist_create() to handle these cases correctly.
Mark Johnston [Tue, 13 Jul 2021 21:45:59 +0000 (17:45 -0400)]
gconcat: Zero the metadata block before writing
Ensure that string buffers and pad bytes are zero-filled before writing
gconcat metadata. Also make sure to zero the full block buffer before
encoding the metadata and writing.
Fix some style bugs in g_concat_write_metadata() while here.
Reported by: KMSAN
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Mark Johnston [Tue, 13 Jul 2021 21:45:57 +0000 (17:45 -0400)]
gmirror: Zero the metadata block before writing
The mirror metadata fields contain string buffers and pad bytes, neither
were being zeroed before metadata was written to disk. Also, the
metadata structure is smaller than the sector size, and in one case
gmirror was failing to zero-fill the full buffer before writing.
Fix these problems by pre-zeroing the metadata structure and the sector
buffer.
Reported by: KMSAN
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Mark Johnston [Tue, 13 Jul 2021 21:45:49 +0000 (17:45 -0400)]
fifo: Explicitly initialize generation numbers when opening
The fi_rgen and fi_wgen fields are generation numbers used when sleeping
waiting for the other end of the fifo to be opened. The fields were not
explicitly initialized after allocation, but this was harmless. To
avoid false positives from KMSAN, though, ensure that they get
initialized to zero.
Reported by: KMSAN
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
These issues have low impact because they require precise circumstances
to trigger one of them. The disk must be > 2 TiB in size and either:
- The primary GPT header is dammaged.
- The freebsd-boot partiton is located farther than the first 2 TiB of
the disc and one of its sectors takes place at a lba value that makes
the higher 32 bits of this very value change.
Errors and corrections folow:
- decl and incl don't affect CF, so replace with subl/addl $1
- repe uses %cx, so move size to it with movw
- moving a 64-bit value with %cx of 2 (should be 4) so addresses
> 2TB will work.
PR: 233180
Reviewed by: imp@ (applied patch using description in bug)
Differential Revision: https://reviews.freebsd.org/D31100
Warner Losh [Tue, 13 Jul 2021 06:00:33 +0000 (00:00 -0600)]
cam_iosched: use tunable flag and make a bool really a bool
kern.cam.do_dynamic_iosched is really a bool, so change its type to
bool. While I'm here, also use the CTLFLAG_TUN flag instead of a
separate tunable line for it and kern.cam.iosched_alpha_bits.
Young Xiao [Tue, 21 May 2019 07:36:29 +0000 (15:36 +0800)]
Fix potential NULL pointer dereference of device physical path
In ata_dev_advinfo() and nvme_dev_advinfo(), if the physical path is
being stored and there is a malloc failure (malloc(9) is called with
M_NOWAIT), we could wind up in a situation where the device's
physpath_len is set to the length the user provided, but the physpath
itself is NULL.
If another context then comes in to fetch the physical path value, we
would wind up trying to memcpy a NULL pointer into the caller's buffer.
So, set the physpath_len to 0 when we free the physpath on entry into
the store case for the physical path. Reset the length to a non-zero
value only after we've successfully malloced a buffer to hold it.
This code mirrors scsi_xpt.c does already as well.
Signed-off-by: Young Xiao <92siuyang@gmail.com>
Reviewed by: imp
PR: 238014
Ka Ho Ng [Tue, 13 Jul 2021 17:53:10 +0000 (01:53 +0800)]
vmm: Fix AMD-vi using wrong rid range
The ACPI parsing code around rid range was wrong on assuming there is
only one pair of start/end device id range. Besides, ivhd_dev_parse()
never work as supposed. The start/end rid info was always zero.
Restructure the code to build dynamic-sized tables for each IOMMU softc
holding device entries. The device entries are enumerated to find a
suitable IOMMU unit. Operations on devices not governed (e.g. the IOMMU
unit itself) are no-op from now on. There are also a minor fix on wrong
%b formatting string usage.
Tested on my EPYC 7282.
Sponsored by: The FreeBSD Foundation
Reviewed by: grehan
Differential Revision: https://reviews.freebsd.org/D30827
Randall Stewart [Tue, 13 Jul 2021 16:45:15 +0000 (12:45 -0400)]
tcp: TCP_LRO getting bad checksums and sending it in to TCP incorrectly.
In reviewing tcp_lro.c we have a possibility that some drives may send a mbuf into
LRO without making sure that the checksum passes. Some drivers actually are
aware of this and do not call lro when the csum failed, others do not do this and
thus could end up sending data up that we think has a checksum passing when
it does not.
This change will fix that situation by properly verifying that the mbuf
has the correct markings (CSUM VALID bits as well as csum in mbuf header
is set to 0xffff).
Brian Behlendorf [Mon, 12 Jul 2021 20:05:50 +0000 (13:05 -0700)]
Update bug report template
- Remove the "SPL Version" line, the repositories have been merged
since the 0.8 release and we no longer need to ask about this.
- Simply ask for the kernel version / patch level and add a hint
about how to get this information on Linux and FreeBSD.
- Remove "Status: Triage Needed" from the template, in practice
we really haven't been using this label so let's step setting it.
Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: John Kennedy <john.kennedy@delphix.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes: #12340
Fix bsd.subdir.mk-related issues after 0a0f7486413c
Since bsd.prog.mk includes bsd.obj.mk, and thus bsd.subdir.mk, we must
ensure all our bsd.subdir.mk-affecting variables are set before
including bsd.prog.mk. Since sbin's various Makefile.arch files add to
SUBDIR this results in those not taking effect, and presumably we also
end up not having buildworld as parallel as it should be due to the fact
that SUBDIR_PARALLEL was not being set before including bsd.prog.mk.
mmc_cam_sim_default_action: do not touch the ccb after dispatching it
If MMC_SIM_CAM_REQUEST() is successful the ccb could be running or being
completed as the method returns. Modifying the ccb status could override
whatever status was already set by a MMC driver.
I am not sure what was the purpose of setting the status to CAM_REQ_INVALID
in the success path. I assume that it was to catch a possibility that the
ccb could be completed without its status explicitly set. So, I am keeping
the code, it's just moved to before the MMC_SIM_CAM_REQUEST call.
Without this change I was getting random and phantom EIO errors on Rock64
running off an SD card (dwmmc driver) plus occasional panics like:
Memory modified after free 0xffffa00003985800(2040) val=6 @ 0xffffa00003985854
panic: Most recently used by CAM CCB
ipoib: Fix for accessing uninitialized pointers and freed memory during attach and detach.
Call infiniband_ifdetach() early to stop ifioctl(9) calls from user-space
during device removal. Also make sure that ifioctl(9) calls are blocked from
executing until the device is fully initialized. Ideally we would delay the
infiniband_ifattach() call, but because part of the initialization is to update
the link level address, that is not possible without more significant changes.
mlx4: Map core_clock page to user space only when allowed
Currently when we map the hca_core_clock page to the user space,
there are vulnerable registers, one of which is semaphore, on
this page as well. If user read the wrong offset, it can modify the
above semaphore and hang the device.
Hence, mapping the hca_core_clock page to the user space only when
user required it specifically.
After this patch, mlx4 core_clock won't be mapped to user space by
default. Oppose to current state, where mlx4 core_clock is always mapped
to user space.
mlx4ib and mlx5ib: Set slid to zero in Ethernet completion struct
IB spec says that a lid should be ignored when link layer is Ethernet,
for example when building or parsing a CM request message (CA17-34).
However, since ib_lid_be16() and ib_lid_cpu16() validates the slid,
not only when link layer is IB, we set the slid to zero to prevent
false warnings in the kernel log.