Justin Hibbits [Sat, 3 Oct 2020 02:26:38 +0000 (02:26 +0000)]
MFC r366162,r366169,r366188
Fix compat32 on mips64:
* Elf32_Auxinfo is broken, using pointers in the union, which are 64-bits not
32.
* freebsd32_sysarch() doesn't update the 'user local' register when handling
MIPS_SET_TLS, leading to a NULL pointer dereference in the 32-bit
application.
Alan Somers [Thu, 1 Oct 2020 19:55:52 +0000 (19:55 +0000)]
zfs: fix "zfs receive" of interrupted stream without "-F" after r366180
The OpenZFS test suite revealed a regression with the changes made by
r366180 and r364974: "zfs recv" resuming and interrupted stream without -F
would refuse to run, complaining that "dataset exists".
Direct commit to stable/12 because head has moved to the OpenZFS code base.
Michael Tuexen [Thu, 1 Oct 2020 16:53:16 +0000 (16:53 +0000)]
MFC r366248:
Improve the input validation and processing of cookies.
This avoids setting the association in an inconsistent
state, which could result in a use-after-free situation.
This can be triggered by a malicious peer, if the peer
can modify the cookie without the local endpoint recognizing
it.
Thanks to Ned Williamson for reporting the issue.
Michael Tuexen [Thu, 1 Oct 2020 16:23:35 +0000 (16:23 +0000)]
MFC r366198:
Improve the handling of receiving unordered and unreliable user
messages using DATA chunks. Don't use fsn_included when not being
sure that it is set to an appropriate value. If the default is
used, which is -1, this can result in SCTP associaitons not
making any user visible progress.
Thanks to Yutaka Takeda for reporting this issue for the the
userland stack in https://github.com/pion/sctp/issues/138.
John Baldwin [Wed, 30 Sep 2020 18:09:50 +0000 (18:09 +0000)]
MFC 366175: Revert most of r360179.
I had failed to notice that sgsendccb() was using cam_periph_mapmem()
and thus was not passing down user pointers directly to drivers. In
practice this broke requests submitted from userland.
There are some tests available in the NetBSD test suite, but we don't
currently pass all of those; further investigation will go into that. For
now, just add a basic test as well as a test that copies from /dev/null to a
file.
The /dev/null test confirms that the file gets created if it's empty, then
that it truncates the file if it's non-empty. This matches some usage that
was previously employed in the build and was replaced in r366042 by a
simpler shell construct.
I will also plan on coming back to expand these in due time.
Note that this does not add any functionality as IFCAP_NOMAP doesn't
exist on 12. This is mainly an attempt to get t4_sge.c closer to what's
in head (to help with other MFCs).
r349533:
Add support for IFCAP_NOMAP to cxgbe(4).
Since cxgbe(4) uses sglist instead of bus_dma, this required updates
to the code that generates scatter/gather lists for packets. Also,
unmapped mbufs are always sent via DMA and never as immediate data in
the payload of a work request.
David Bright [Tue, 29 Sep 2020 16:38:56 +0000 (16:38 +0000)]
MFC r365948:
Honor the FWUG value of some drives in nvmecontrol
nvmecontrol tries to upload firmware in chunks as large as it thinks
the device permits. It fails to take into account the FWUG value used
by some drives to advertise the size and alignment limits for firmware
chunks.
- Use the firwmare update granularity value from the
identify-controller response to set the max transfer size.
- If the granularity is not reported or not restricted, fall back to
the previously existing logic that calculates the max transfer
size based on MDTS.
- Add firmware update granularity to the identify-controller output.
Rick Macklem [Tue, 29 Sep 2020 01:52:53 +0000 (01:52 +0000)]
MFC: r366189
Bjorn reported a problem where the Linux NFSv4.1 client is
using an open_to_lock_owner4 when that lock_owner4 has already
been created by a previous open_to_lock_owner4. This caused the NFS server
to reply NFSERR_INVAL.
For NFSv4.0, this is an error, although the updated NFSv4.0 RFC7530 notes
that the correct error reply is NFSERR_BADSEQID (RFC3530 did not specify
what error to return).
For NFSv4.1, it is not obvious whether or not this is allowed by RFC5661,
but the NFSv4.1 server can handle this case without error.
This patch changes the NFSv4.1 (and NFSv4.2) server to handle multiple
uses of the same lock_owner in open_to_lock_owner so that it now correctly
interoperates with the Linux NFS client.
It also changes the error returned for NFSv4.0 to be NFSERR_BADSEQID.
Thanks go to Bjorn for diagnosing this and testing the patch.
He also provided a program that I could use to reproduce the problem.
- Remove trailing whitespace
- Address igor and mandoc warnings
- Sort options
- Use macros consistently (e.g., Fl for flags, Dq for quoting, Bd for code
blocks)
- Add a history section
- Fix incorrect use of macros in various places
Ed Maste [Sun, 27 Sep 2020 22:48:43 +0000 (22:48 +0000)]
MFC r356615: src.opts.mk: force KERBEROS_SUPPORT off where KERBEROS forced off
Explicitly setting WITHOUT_KERBEROS implies WITHOUT_KERBEROS_SUPPORT,
but previously other cases that forced KERBEROS off (such as
WITHOUT_CRYPT) did not also set KERBEROS_SUPPORT off. Because the
_SUPPORT dependent options (KERBEROS/KERBEROS_SUPPORT) are processed
before other dependencies (CRYPT/KERBEROS) it's not easy to make this
happen automatically. Instead just explicitly set KERBEROS_SUPPORT
off where we set KERBEROS off.
Alan Somers [Sun, 27 Sep 2020 02:59:28 +0000 (02:59 +0000)]
MFC r366121:
fusefs: fix mmap'd writes in direct_io mode
If a FUSE server returns FOPEN_DIRECT_IO in response to FUSE_OPEN, that
instructs the kernel to bypass the page cache for that file. This feature
is also known by libfuse's name: "direct_io".
However, when accessing a file via mmap, there is no possible way to bypass
the cache completely. This change fixes a deadlock that would happen when
an mmap'd write tried to invalidate a portion of the cache, wrongly assuming
that a write couldn't possibly come from cache if direct_io were set.
Arguably, we could instead disable mmap for files with FOPEN_DIRECT_IO set.
But allowing it is less likely to cause user complaints, and is more in
keeping with the spirit of open(2), where O_DIRECT instructs the kernel to
"reduce", not "eliminate" cache effects.
MFC r365918:
Fix for use of the XHCI driver on Cortex-A72 by adding a missing cache
flush operation before writing to the XHCI_ERSTBA_LO/HI register(s).
Alan Somers [Sat, 26 Sep 2020 02:50:28 +0000 (02:50 +0000)]
zfs: Fix resuming receive stream to dataset with mounted clone
My fix for bug 248606 (zfs receive: Input/output error accessing dataset
after resuming interrupted receive), r364412, introduced a regression:
attempting to resume a receive into a dataset with a mounted clone would
fail if that clone were in-use. This change reverts r364412 and fixes it in
a better way.
Background:
When ZFS receives a stream, it may decide to unmount and remount the
destination and all of its children. However, ever since resumable
send/receive was implemented, ZFS has skipped the unmount/remount step when
resuming a stream. I don't know why.
That let to bug 248606. When resuming the stream, ZFS didn't unmount and
remount the destination, leaving a destroyed dataset mounted.
My original fix was to always unmount and remount when resuming a receive,
but that caused other problems, like bug 249579. A better solution is to
unmount and remount when resuming a receive of a stream that would've
unmounted and remounted when it was new.
Direct commit to stable/12 because head has moved to OpenZFS. The bug
exists there, too, but a change to the OpenZFS code can't be merged to the
old ZFS code.
Colin Percival [Sat, 26 Sep 2020 00:58:27 +0000 (00:58 +0000)]
MFC r360483,360484: Make nvmecontrol work with nda like it does with
nvd, and associated bits.
This commit changes the size of 'struct ccb_pathinq_settings_nvme', which
would normally risk breaking kernel ABI; however, that structure is only
ever used as part of a union with larger structures -- so nothing really
changes size.
r360483:
Return the nvmeX device associated with the ndaX device.
Add the nvmeX device to the XPT_PATH_INQ nvme specific
information. while one could figure this out by looking up the
domain:bus:slot:function, it's a lot easier to have the SIM set it
directly since the sim knows this.
r360484:
Implement the NVME_GET_NSID and NVME_PASSTHROUGH_CMD ioctls
With these two ioctls implemented in the nda driver, nvmecontrol now
works with nda just like it does with nvd. It eliminates the need to
jump through odd hoops to get this data.
MFC r365547: Add -z "TOS" option to ping6, to test DSCP/ECN values
ping has the option to add the (deprecated) TOS byte
using the -z option. Adding the same option, with the
same (deprecated) Traffic Class Byte (nowadays actually
DSCP and ECN fields) to ping6 to validate proper QoS
processing in network switches.
During the DCTCP improvements, use of CCF_DELACK was
removed. This change is just to rename the unused flag
bit to prevent use of it, without also re-implementing
the tcp_input and tcp_output interfaces.
Rick Macklem [Thu, 24 Sep 2020 15:34:47 +0000 (15:34 +0000)]
MFC: r365895
Fix a LOR between the NFS server and server side krpc.
Recent testing of the NFS-over-TLS code found a LOR between the mutex lock
used for sessions and the sleep lock used for server side krpc socket
structures in nfsrv_checksequence(). This was fixed by r365789.
A similar bug exists in nfsrv_bindconnsess(), where SVC_RELEASE() is called
while mutexes are held.
This patch applies a fix similar to r365789, moving the SVC_RELEASE() call
down to after the mutexes are released.
This patch fixes the problem by moving the SVC_RELEASE() call in
nfsrv_bindconnsess() down a few lines to below where the mutexes are released.
John Baldwin [Wed, 23 Sep 2020 21:57:29 +0000 (21:57 +0000)]
MFC 365278: Don't assume objects in program sections have a size of a pointer.
The size of the object at 'addr' is unknown and might be smaller than
the size of a pointer (e.g. some x86 instructions are smaller than a
pointer). Instead, just check that the address is in the bounds of
the program header.
r347996:
Replace uses of `foo.(de|en)code('hex')` with `binascii.(un)?hexlify(foo)`
Python 3 no longer doesn't support encoding/decoding hexadecimal numbers using
the `str.format` method. The backwards compatible new method (using the
binascii module/methods) is a comparable means of converting to/from
hexadecimal format.
In short, the functional change is the following:
* `foo.decode('hex')` -> `binascii.unhexlify(foo)`
* `foo.encode('hex')` -> `binascii.hexlify(foo)`
While here, move the dpkt import in `cryptodev.py` down per PEP8, so it comes
after the standard library provided imports.
PR: 237403
r348024:
Followup to r347996
Replace uses of `foo.encode("hex")` with `binascii.hexlify(foo)` for forwards
compatibility between python 2.x and python 3.
PR: 237403
r348031:
Squash deprecation warning related to array.array(..).tostring()
In version 3.2+, `array.array(..).tostring()` was renamed to
`array.array(..).tobytes()`. Conditionally call `array.array(..).tobytes()` if
the python version is 3.2+.
PR: 237403
r348042:
Fix encoding issues with python 3
In python 3, the default encoding was switched from ascii character sets to
unicode character sets in order to support internationalization by default.
Some interfaces, like ioctls and packets, however, specify data in terms of
non-unicode encodings formats, either in host endian (`fcntl.ioctl`) or
network endian (`dpkt`) byte order/format.
This change alters assumptions made by previous code where it was all
data objects were assumed to be basestrings, when they should have been
treated as byte arrays. In order to achieve this the following are done:
* str objects with encodings needing to be encoded as ascii byte arrays are
done so via `.encode("ascii")`. In order for this to work on python 3 in a
type agnostic way (as it anecdotally varied depending on the caller), call
`.encode("ascii")` only on str objects with python 3 to cast them to ascii
byte arrays in a helper function name `str_to_ascii(..)`.
* `dpkt.Packet` objects needing to be passed in to `fcntl.ioctl(..)` are done
so by casting them to byte arrays via `bytes()`, which calls
`dpkt.Packet__str__` under the covers and does the necessary str to byte array
conversion needed for the `dpkt` APIs and `struct` module.
In order to accomodate this change, apply the necessary typecasting for the
byte array literal in order to search `fop.name` for nul bytes.
This resolves all remaining python 2.x and python 3.x compatibility issues on
amd64. More work needs to be done for the tests to function with i386, in
general (this is a legacy issue).
Rick Macklem [Wed, 23 Sep 2020 01:49:50 +0000 (01:49 +0000)]
MFC: r365789
Fix a LOR between the NFS server and server side krpc.
Recent testing of the NFS-over-TLS code found a LOR between the mutex lock
used for sessions and the sleep lock used for server side krpc socket
structures.
The code in nfsrv_checksequence() would call SVC_RELEASE() with the mutex
held. Normally this is ok, since all that happens is SVC_RELEASE()
decrements a reference count. However, if the socket has just been shut
down, SVC_RELEASE() drops the reference count to 0 and acquires a sleep
lock during destruction of the server side krpc structure.
This patch fixes the problem by moving the SVC_RELEASE() call in
nfsrv_checksequence() down a few lines to below where the mutex is released.
The pidx argument of isc_rxd_flush() indicates which is the last valid
receive descriptor to be used by the NIC. However, current code has
multiple issues:
- Intel drivers write pidx to their RDT register, which means that
NICs will only use the descriptors up to pidx-1 (modulo ring size N),
and won't actually use the one pointed by pidx. This does not break
reception, but it is anyway confusing and suboptimal (the NIC will
actually see only N-2 descriptors as available, rather than N-1).
Other drivers (if_vmx, if_bnxt, if_mgb) adhere to this semantic.
- The semantic used by Intel (RDT is one descriptor past the last
valid one) is used by most (if not all) NICs, and it is also used
on the TX side (also in iflib). Since iflib is not currently
using this semantic for RX, it must decrement fl->ifl_pidx
(modulo N) before calling isc_rxd_flush(), and then the
per-driver callback implementation must increment the index
again (to match the real semantic). This is confusing and suboptimal.
- The iflib refill function is also called at initialization.
However, in case the ring size is smaller than 128 (e.g. if_mgb),
the refill function will actually prepare all the receive
descriptors (N), without leaving one unused, as most of NICs assume
(e.g. to avoid RDT to overrun RDH). I can speculate that the code
looks like this right now because this issue showed up during
testing (e.g. with if_mgb), and it was easy to workaround by
decrementing pidx before isc_rxd_flush().
The goal of this change is to simplify the code (removing a bunch
of instructions from the RX fast path), and to make the semantic of
isc_rxd_flush() consistent across drivers. To achieve this, we:
- change the semantics of the pidx argument to the usual one (that
is the index one past the last valid one), so that both iflib and
drivers avoid the decrement/increment dance.
- fix the initialization code to prepare at most N-1 descriptors.
This changeset introduces the new libnetmap library for writing
netmap applications.
Before libnetmap, applications could either use the kernel API
directly (e.g. NIOCREGIF/NIOCCTRL) or the simple header-only-library
netmap_user.h (e.g. nm_open(), nm_close(), nm_mmap() etc.)
The new library offers more functionalities than netmap_user.h:
- Support for complex netmap options, such as external memory
allocators or per-buffer offsets. This opens the way to future
extensions.
- More flexibility in the netmap port bind options, such as
non-numeric names for pipes, or the ability to specify the netmap
allocator that must be used for a given port.
- Automatic tracking of the netmap memory regions in use across the
open ports.
At the moment there is no man page, but the libnetmap.h header file
has in-depth documentation.
Ian Lepore [Tue, 22 Sep 2020 14:59:05 +0000 (14:59 +0000)]
MFC r365729:
Add product ID strings for a couple Microchip usb hubs. Also, update the
vendor ID string to say just "Microchip Technology" -- the buyout of
Standard Microsystems happened in 2012 and the SMC/SMSC names are pretty
much retired at this point.
MFC r365829, r365837, r365852: certctl rehash upon install/distribute
r365829:
installworld: run `certctl rehash` after installation completes
This was originally introduced back in r360833, and subsequently reverted
because it was broken for -DNO_ROOT builds and it may not have been the
correct place for it.
While debatably this may still not be 'the correct place,' it's much cleaner
than scattering rehashes all throughout the tree. brooks has fixed the issue
with -DNO_ROOT by properly writing to the METALOG in r361397.
Do note that this is different than what was originally committed; brooks
had revisions in D24932 that made it actually use the revised unprivileged
mode and write to METALOG, along with being a little more friendly to
foreign crossbuilds and just using the certctl in-tree.
With this change, I believe we should now have a populated /etc/ssl/certs in
the VM images.
r365837:
Promote the installworld `certctl rehash` to distributeworld
Contrary to my belief, installworld is not sufficient for getting certs
installed into VM images. Promote the rehash to both installworld and
distributeworld (notably: not stageworld) and rehash the base distdir so we
end up with /etc/ssl/certs populated in the base dist archive. A future
commit will remove the rehash from bsdinstall, which doesn't really need to
happen if they're installed into base.txz.
While here, fix a minor typo: s/CERTCLTFLAGS/CERTCTLFLAGS/
r365852:
Revert r361257: bsdinstall: do a `certctl rehash` upon installation [...]
As of r365829, any given base distribution set will now include the /etc/ssl
symlinks that this rehash would've otherwise installed. This extra step is
no longer required.
MFC r365237:
Micro optimise _callout_stop_safe() by removing dead code.
The CS_DRAIN flag cannot be set at the same time like the async-drain function
pointer is set. These are orthogonal features. Assert this at the beginning
of the function.
Before:
if (flags & CS_DRAIN) {
/* FALLTHROUGH */
} else if (xxx) {
return yyy;
}
if (drain) {
zzz = drain;
}
After:
if (flags & CS_DRAIN) {
/* FALLTHROUGH */
} else if (xxx) {
return yyy;
} else {
if (drain) {
zzz = drain;
}
}
r363887:
Use the right factor when finding the best frequency and compare the
absolute value of the result.
Submitted by: kibab
r365395:
aw_clk_nm: fix incorrect use of abs()
abs() takes a (signed) int as input.
Instead, it was used with unsigned 64-bit integers.
So, add and use a new helper function to calculate a difference between
two uint64_t-s.
Rick Macklem [Mon, 21 Sep 2020 00:50:32 +0000 (00:50 +0000)]
MFC: r365703
Fix a case where the NFSv4.0 server might crash if delegations are enabled.
asomers@ reported a crash on an NFSv4.0 server with a backtrace of:
kdb_backtrace
vpanic
panic
nfsrv_docallback
nfsrv_checkgetattr
nfsrvd_getattr
nfsrvd_dorpc
nfssvc_program
svc_run_internal
svc_thread_start
fork_exit
fork_trampoline
where the panic message was "docallb", which indicates that a callback
was attempted when the ClientID is unconfirmed.
This would not normally occur, but it is possible to have an unconfirmed
ClientID structure with delegation structure(s) chained off it if the
client were to issue a SetClientID with the same "id" but different
"verifier" after acquiring delegations on the previously confirmed ClientID.
The bug appears to be that nfsrv_checkgetattr() failed to check for
this uncommon case of an unconfirmed ClientID with a delegation structure
that no longer refers to a delegation the client knows about.
This patch adds a check for this case, handling it as if no delegation
exists, which is the case when the above occurs.
Although difficult to reproduce, this change should avoid the panic().
MFC r365830: make it possible recovering broken GPT after some LBAs cut off
If pre-formatted device has GPT and a partition covering
last available LBAs and the device is attached using
a bridge reducing amount of LBAs, then it could be not enough
forcing GEOM to use primary GPT. Also, we should make it possible
to recover GPT and this requires either deleting or resizing the partition.
This change enables "gpart delete" and "gpart resize" commands
on corrupted GPT with following "gpart recover".
It still does not allow modifying corrupted GPT without
preliminary setting sysctl kern.geom.part.check_integrity=0