It turns out that currently mandoc(1) is not handling Fl in Ss
correctly (maybe it never was). Let's just replace "Fl S \&Ss ..."
with "-S ...". After all, this subsection title is stylized anyway, so Fl
is not that helpful.
Mark Johnston [Fri, 24 Apr 2020 21:21:49 +0000 (21:21 +0000)]
Fix a race in pmap_emulate_modified().
pmap_emulate_modify() was assuming that no changes to the pmap could
take place between the TLB signaling the fault and
pmap_emulate_modify()'s acquisition of the pmap lock, but that's clearly
not even true in the uniprocessor case, nevermind the SMP case.
Mark Johnston [Fri, 24 Apr 2020 21:21:23 +0000 (21:21 +0000)]
Fix a race between _pmap_unwire_ptp() and MipsDoTLBMiss().
MipsDoTLBMiss() will load a segmap entry or pde, check that it isn't
zero, and then chase that pointer to a physical page. If that page has
been freed in the interim, it will read garbage and go on to populate
the TLB with it.
This can happen because pmap_unwire_ptp zeros out the pde and
vm_page_free_zero()s the ptp (or, recursively, zeros out the segmap
entry and vm_page_free_zero()s the pdp) without interlocking against
MipsDoTLBMiss(). The pmap is locked, and pvh_global_lock may or may not
be held, but this is not enough. Solve this issue by inserting TLB
shootdowns within _pmap_unwire_ptp(); as MipsDoTLBMiss() runs with IRQs
deferred, the IPIs involved in TLB shootdown are sufficient to ensure
that MipsDoTLBMiss() sees either a zero segmap entry / pde or a non-zero
entry and the pointed-to page still not freed.
Mark Johnston [Fri, 24 Apr 2020 18:47:42 +0000 (18:47 +0000)]
Remove an obsolete TODO comment from several minidump implementations.
The comment referenced a non-existent function, and these minidump
implementations already buffer discontiguous physical data pages by
mapping them into a single VA range that gets passed to the dump device,
so there is no real advantage in batching calls to blk_write().
The RISC-V and MIPS minidump implementations still write a page at a
time and so would benefit from some form of batching.
MFC after: 2 weeks
Sponsored by: Juniper Networks, Klara Inc.
Mark Johnston [Fri, 24 Apr 2020 13:53:40 +0000 (13:53 +0000)]
Stop setting PG_U in bootstrap mappings.
These mappings are never visible to userspace as they get replaced when
the amd64 pmap is bootstrapped, but there is no need to set PG_U in the
first place.
Reviewed by: alc, kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D24547
UPDATING: add a note about closefrom(2) marked COMPAT12
Some of the consumers in-base may make it enticing enough to ensure that
COMPAT_FREEBSD12, which is notably a fairly light option at the moment, is
included in custom kernel configs.
acpi_iicbus: set device description in the probe method
Kernel prints the device announcement before the attach method is
called, so if the correct description is not set by the probe method,
then the announcement would have an incorrect one.
ig4: ensure that drivers always attach in correct order
Use DRIVER_MODULE_ORDERED(SI_ORDER_ANY) so that ig4's ACPI attachment
happens after iicbus and acpi_iicbus drivers are registered.
I have seen a problem where iicbus attached under ig4 instead of
acpi_iicbus when ig4.ko was loaded with kldload. I believe that that
happened because ig4 driver was a first driver to register, it attached
and created an iicbus child. Then iicbus driver was registered and,
since it was the only driver that could attach to the iicbus child
device, it did exactly that. After that acpi_iicbus driver was
registered. It would be able to attach to the iicbus device, but it was
already attached, so nothing happened.
Eric van Gyzen [Thu, 23 Apr 2020 23:57:43 +0000 (23:57 +0000)]
Update jemalloc to version 5.2.1
Revert r354606 to restore r354605.
Apply one line from jemalloc commit d01b425e5d1e1 in hash_x86_128()
to fix the build with gcc, which only allows a fallthrough attribute
to appear before a case or default label.
Submitted by: jasone in r354605
Discussed with: jasone
Reviewed by: bdrewery
MFC after: never, due to gcc 4.2.1
Relnotes: yes
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D24522
If the index we're trying to convert is 0 we can avoid a potentially
expensive call to getifaddrs(). No interface has an ifindex of zero, so
we can handle this as an error: set the errno to ENXIO and return NULL.
Submitted by: Nick Rogers
Reviewed by: lutz at donnerhacke.de
MFC after: 2 weeks
Sponsored by: RG Nets
Differential Revision: https://reviews.freebsd.org/D24524
Allan Jude [Thu, 23 Apr 2020 19:20:58 +0000 (19:20 +0000)]
Add VIRTIO_BLK_T_DISCARD (TRIM) support to the bhyve virtio-blk backend
This will advertise support for TRIM to the guest virtio-blk driver and
perform the DIOCGDELETE ioctl on the backing storage if it supports it.
Thanks to Jason King and others at Joyent and illumos for expanding on
my original patch, adding improvements including better error handling
and making sure to following the virtio spec.
Submitted by: Jason King <jason.king@joyent.com> (improvements)
Reviewed by: jhb
Obtained from: illumos-joyent (improvements)
MFC after: 1 month
Relnotes: yes
Sponsored by: Klara Inc.
Differential Revision: https://reviews.freebsd.org/D21707
The TSADC familiy is a little bit more complex than V2 and V3.
Early revision do not use syscon and do not use qsel (RK3288).
Next revision still do not use syscon but uses qsel (RK3328).
Final revision use both.
dumpon(8) has not accepted 1024-bit RSA keys since prior to r339784 (2018-10).
The manual page language was not updated at that time (oops). The minimum
accepted is 2048 bits, which is also a good default choice.
EKCD: Preload error strings, PRNG seed; use OAEP padding
Preload OpenSSL ERR string data so that the formatted error messages are
vaguely meaningful. Add OpenSSL error information to the RSA_public_encrypt()
operation failure case in one-time key generation.
For obsolescent OpenSSL versions (*cough* FIPS *cough*), pre-seed the PRNG
before entering Cap mode, as old versions of OpenSSL are unaware of kernel
RNG interfaces aside from /dev/random (such as the long-supported kern.arnd, or
the slightly more recent getentropy(3) or getrandom(2)). (RSA_public_encrypt()
wants a seeded PRNG to randomize the "PS" portion of PKCS 1.5 padding or the
"MGF" pseudo-random function in OAEP padding.)
Switch dumpon to encrypt the one-time key with OAEP padding (recommended since
1998; RFC2437) rather than the obsolescent PKCS 1.5 padding (1993; RFC2313).
Switch decryptcore to attempt OAEP decryption first, and try PKCS 1.5
decryption on failure. This is intended only for transition convenience, and
we should obsolete support for non-OAEP padding in a release or two.
acpi_ec(4): Don't probe erroneously if success occurred
In r360131, acpi_ec probe was changed to not clobber an error status prior to
several error cases that did not explicitly set the error variable before
goto'ing the exit path. However, I did not notice that the error variable was
not set to success in the success path. That caused all successful probes to
fail, which is obviously undesirable.
PR: 245778
Reported by: Neel Chauhan <neel AT neelc.org>, Evilham <contact AT evilham.com>
Tested by: Evilham
X-MFC-With: r360131
The segfault fix was originally developed by our upstream, sqlite.org,
to address S/390 and Sparc segfaults, both of which are big endian.
Our PowerPC is also big endian, which this patch also fixes.
Reported by: Mark Millard <marklmi at yahoo.com>
Tested by: Mark Millard <marklmi at yahoo.com>
Obtained from: https://www.sqlite.org/src/vinfo/04885763c4cd00cb?diff=1
https://sqlite.org/forum/forumpost/672291a5b2
MFC after: 1 month
X-MFC with: r360221, 360221
Convert rtentry field accesses into nhop field accesses.
One of the goals of the new routing KPI defined in r359823 is to entirely
hide`struct rtentry` from the consumers. It will allow to improve routing
subsystem internals and deliver more features much faster.
This commit is mostly mechanical change to eliminate direct struct rtentry
field accesses.
The only notable difference is AF_LINK gateway encoding.
AF_LINK gw is used in routing stack for operations with interface routes
and host loopback routes.
In the former case it indicates _some_ non-NULL gateway, as the interface
is the same as in rt_ifp in kernel and rtm_ifindex in rtsock reporting.
In the latter case the interface index inside gateway was used by the IPv6
datapath to verify address scope for link-local interfaces.
Kernel uses struct sockaddr_dl for this type of gateway. This structure
allows for specifying rich interface data, such as mac address and interface
name. However, this results in relatively large structure size - 52 bytes.
Routing stack fils in only 2 fields - sdl_index and sdl_type, which reside
in the first 8 bytes of the structure.
In the new KPI, struct nhop_object tries to be cache-efficient, hence
embodies gateway address inside the structure. In the AF_LINK case it
stores stortened version of the structure - struct sockaddr_dl_short,
which occupies 16 bytes. After D24340 changes, the data inside AF_LINK
gateway will not be used in the kernel at all, leaving rtsock as the only
potential concern.
(new)
got message of size 200 on Sun Apr 19 09:46:32 2020
RTM_ADD: Add Route: len 200, pid: 0, seq 0, errno 0, flags:<UP,DONE,PINNED>
locks: inits:
sockaddrs: <DST,GATEWAY,NETMASK>
10.0.0.0 link#5 255.255.255.0
Note 40 bytes different (52-16 + alignment).
However, gateway is still a valid AF_LINK gateway with proper data filled in.
It is worth noting that these particular messages (interface routes) are mostly
ignored by routing daemons:
* bird/quagga/frr uses RTM_NEWADDR and ignores prefix route addition messages.
* quagga/frr ignores routes without gateway
More detailed overview on how rtsock messages are used by the
routing daemons to reconstruct the kernel view, can be found in D22974.
r360139 made compiling with NO_HISTORY work. This #define does not remove
the fc and bind builtins completely but makes them always write an error
message.
However, there was also some code in builtins.def and mkbuiltins to remove
the fc builtin entirely (but not the bind builtin). The additional build
system complication to make this work seems not worth it, so remove that
code.
Rick Macklem [Wed, 22 Apr 2020 21:00:14 +0000 (21:00 +0000)]
Make the NFSv4.n client's recovery from NFSERR_BADSESSION RFC5661 conformant.
RFC5661 specifies that a client's recovery upon receipt of NFSERR_BADSESSION
should first consist of a CreateSession operation using the extant ClientID.
If that fails, then a full recovery beginning with the ExchangeID operation
is to be done.
Without this patch, the FreeBSD client did not attempt the CreateSession
operation with the extant ClientID and went directly to a full recovery
beginning with ExchangeID. I have had this patch several years, but since
no extant NFSv4.n server required the CreateSession with extant ClientID,
I have never committed it.
I an committing it now, since I suspect some future NFSv4.n server will
require this and it should not negatively impact recovery for extant NFSv4.n
servers, since they should all return NFSERR_STATECLIENTID for this first
CreateSession.
The patched client has been tested for recovery against both the FreeBSD
and Linux NFSv4.n servers and no problems have been observed.
John Baldwin [Wed, 22 Apr 2020 20:43:18 +0000 (20:43 +0000)]
Update blake2 accelerated software tests to work after OCF refactoring.
- Lookup device drivers to test by name instead of assuming that the
software / hardware flags will select specific drivers.
- Set the sysctl to permit software /dev/crypto requests when testing
the accelerated software blake2 driver.
John Baldwin [Wed, 22 Apr 2020 19:44:33 +0000 (19:44 +0000)]
Deprecate 3des support in IPsec for FreeBSD 13.
RFC 8221 does not outright ban 3des as the algorithms deprecated for
13 in r348205, but it is listed as a SHOULD NOT and will likely be a
MUST NOT by the time 13 ships.
For PIE binaries, ldd(1) performs dlopen(RTLD_TRACE) on the binary.
It is legal for binary to use initial exec TLS mode, but when such
binary (actually dso) is dlopened, we might not have enough free space
in the finalized static TLS segment. Make ldd operational by skipping
TLS space allocation, we are not going to execute any code from the
dso anyway.
Reported by: tobik
PR: 245677
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Factor code in LinuxKPI to allow attach and detach using any BSD device.
This allows non-LinuxKPI based infiniband device drivers to attach
correctly to ibcore.
In the below-referenced PR, a case is attached of a simple reproducer that
exhibits suboptimal behavior: EVFILT_READ and EVFILT_WRITE being set in the
same kevent(2) call will only honor the first one. This is, in-fact, how
it's supposed to work.
A read of the manpage leads me to believe we could be more clear about this;
right now there's a logical leap to make in the relevant statement: "When
passed as input, it forces EV_ERROR to always be returned." -- the logical
leap being that this indicates the caller should have allocated space for
the change to be returned with EV_ERROR indicated in the events, or
subsequent filters will get dropped on the floor.
Another possible workaround that accomplishes similar effect without needing
space for all events is just setting EV_RECEIPT on the final change being
passed in; if any errored before it, the kqueue would not be drained. If we
made it to the final change with EV_RECEIPT set, then we would return that
one with EV_ERROR and still not drain the kqueue. This would seem to not be
all that advisable.
Mike Karels [Wed, 22 Apr 2020 00:42:10 +0000 (00:42 +0000)]
Add genet driver for Raspberry Pi 4B Ethernet
Add driver for Broadcom "GENET" version 5, as found in BCM-2711 on
Raspberry Pi 4B. The driver is derived in part from the bcmgenet.c
driver in NetBSD, along with bcmgenetreg.h.
Reviewed by: manu
Obtained from: in part from NetBSD
Relnotes: yes, note addition
Differential Revision: https://reviews.freebsd.org/D24436
John Baldwin [Tue, 21 Apr 2020 23:38:54 +0000 (23:38 +0000)]
Don't pass a user buffer pointer as the data pointer in a CCB.
Allocate a temporary buffer in the kernel to serve as the CCB data
pointer for a pass-through transaction and use copyin/copyout to
shuffle the data to/from the user buffer.
John Baldwin [Tue, 21 Apr 2020 17:47:05 +0000 (17:47 +0000)]
Don't access a user buffer directly from the kernel.
The handle_string callback for the ENCIOC_SETSTRING ioctl was passing
a user pointer to memcpy(). Fix by using copyin() instead.
For ENCIOC_GETSTRING ioctls, the handler was storing the user pointer
in a CCB's data_ptr field where it was indirected by other code. Fix
this by allocating a temporary buffer (which ENCIOC_SETSTRING already
did) and copying the result out to the user buffer after the CCB has
been processed.
John Baldwin [Tue, 21 Apr 2020 17:42:32 +0000 (17:42 +0000)]
Retire two unused background fsck sysctls.
These two sysctls were added to support UFS softupdates journalling
with snapshots. However, the changes to fsck to use them were never
committed and there have never been any in-tree uses of these sysctls.
More details from Kirk:
When journalling got added to soft updates, its journal rollback freed
blocks that it thought were no longer in use. But it does not take
snapshots into account (i.e., if a snapshot is still using it, then it
cannot be freed). So I added the needed logic to fsck by having the
free go through the kernel's blkfree code so it could grab blocks that
were still needed by snapshots. That is done using the setbufoutput
hack. I never got that code working reliably, so it is still sitting
in my work directory. Which also explains why you still cannot take
snapshots on filesystems running with journalling...
In looking over my use of this feature, and in particular the troubles
I was having with it, I conclude that it may be better to extract the
code from the kernel that handles freeing blocks claimed by snapshots
and putting it into fsck directly. My original intent was that it is
complex and at the time changing, so only having to maintain it in one
place was appealing. But at this point it has not changed in years and
the hacks like setinode and setbufoutput to be able to use the kernel
code is sufficiently ugly, that I am leaning towards just extracting
it.
John Baldwin [Tue, 21 Apr 2020 17:38:07 +0000 (17:38 +0000)]
Handle non-dtrace-triggered kernel breakpoint traps in mips.
If DTRACE is enabled at compile time, all kernel breakpoint traps are
first given to dtrace to see if they are triggered by a FBT probe.
Previously if dtrace didn't recognize the trap, it was silently
ignored breaking the handling of other kernel breakpoint traps such as
the debug.kdb.enter sysctl. This only returns early from the trap
handler if dtrace recognizes the trap and handles it.
The current situation results in intermittent breakage if data gets split up
with the sign bit set on the data1 half of it, as PAIR32TO64 will then:
data1 | (data2 << 32) -> resulting in data1 getting sign-extended when it's
implicitly widened and clobbering the result. AFAICT, there's no compelling
reason for these to be signed.
This was most exposed by flakiness in the kqueue timer tests under compat32
after the ABSTIME test got switched over to using a better clock and
microseconds.
Reviewed by: kib
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D24518
Mark Johnston [Tue, 21 Apr 2020 16:01:44 +0000 (16:01 +0000)]
Factor out the kmem contig page alloc and reclamation code.
kmem_alloc_attr_domain() and kmem_alloc_contig_domain() duplicated each
other's page allocation and reclamation logic. Place it in a single
function to make it easier to add additional consumers. No functional
change intended.
Reviewed by: jeff, kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D24475
Correctly set up the initial TCP congestion window
in all cases, by adjust snd_una right after the
connection initialization, to include the one byte
in sequence space occupied by the SYN bit.
This does not change the regular ACK processing,
while making the BYTES_THIS_ACK macro to work properly.
This unbreaks the i386 kqueue timer tests after a recent change switched
NOTE_ABSTIME over to using microseconds. Notably, the data argument (which
holds useconds) is an int64_t, but we were passing it to timer2sbintime
which takes an intptr_t. Perhaps in a previous incarnation, intptr_t would
have made sense, but now it just leads to the timestamp getting truncated
and subsequently rejected when it no longer fits in an intptr_t.
John Baldwin [Mon, 20 Apr 2020 22:57:15 +0000 (22:57 +0000)]
Update comments about IVs used in IPsec ESP.
Add some prose and a diagram describing the layout of the cipher IV
for AES-CTR and AES-GCM and how it relates to the ESP IV stored in the
packet after the ESP header. Also, remove an XXX comment about the
initial block counter value used for AES-CTR in esp_output as the
current code matches the RFC (and the equivalent code in esp_input
didn't have the XXX comment).
John Baldwin [Mon, 20 Apr 2020 22:24:49 +0000 (22:24 +0000)]
Retire the CRYPTO_F_IV_GENERATE flag.
The sole in-tree user of this flag has been retired, so remove this
complexity from all drivers. While here, add a helper routine drivers
can use to read the current request's IV into a local buffer. Use
this routine to replace duplicated code in nearly all drivers.
Reviewed by: cem
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D24450
John Baldwin [Mon, 20 Apr 2020 22:20:26 +0000 (22:20 +0000)]
Generate IVs directly in esp_output.
This is the only place that uses CRYPTO_F_IV_GENERATE. All crypto
drivers currently duplicate the same boilerplate code to handle this
case. Doing the generation directly removes complexity from drivers.
It also simplifies support for separate input and output buffers.
Reviewed by: cem
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D24449
Merge commit 64b31d96d from llvm git (by Nemanja Ivanovic):
[PowerPC] Do not attempt to reuse load for 64-bit FP_TO_UINT without
FPCVT
We call the function that attempts to reuse the conversion without
checking whether the target matches the constraints that the callee
expects. This patch adds the check prior to the call.
This should fix 'Assertion failed: ((Op.getOpcode() == ISD::FP_TO_SINT
|| Subtarget.hasFPCVT()) && "i64 FP_TO_UINT is supported only with
FPCVT"), function LowerFP_TO_INTForReuse, file
/usr/src/contrib/llvm/lib/Target/PowerPC/PPCISelLowering.cpp, line 7276'
when building the devel/libslang2 port (and a few others) for PowerPC64.
acpi_ec(4): Do not probe "successfully" if an error occurred
All of the 'goto out;' cases in this probe routine without explicit
initialization of 'ret' indicate error cases and were clearly intended
to use the initial definition of 'ret' with ENXIO. However, 'ret' was
accidentally squashed by reuse for a subroutine call near the beginning
of probe.
Use a different variable for the subroutine status to preserve ENXIO ret
for the 'goto out's as a minimal solution to the panic reported at attach
for now.