This commits add option 'M' which attempts to forcibly unmount the
dataset. Thanks to this we can enforce receiving snapshots in a
single step.
Note that this functionality is not supported on Linux because the
VFS will prevent active mounted filesystems from being unmounted,
even with the force option. This is the intended VFS behavior.
Discussed-with: Pawel Jakub Dawidek <pjd@FreeBSD.org> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Allan Jude <allanjude@freebsd.org>
Differential Revision: https://reviews.freebsd.org/D22306
Tested on stable r359316 @ Sleep mode on custom hw, Power off on BBB and PB
OFF bit [1] in status register control the pmic behaviour when PWR_EN pin
is pulled low.
On most AM335x hardware [beaglebone *] the desired behaviour are in fact
power off due to some hardware designs - read more in the comments around
pmic in sys/gnu/dts/arm/am335x-bone-common.dtsi
This patch let the device-tree decide with ti,pmic-shutdown-controller[2]
the state of off bit in status register.
GPIO drivers who supports interrupts report them in the caps
(obtain via the getcaps method) but gpioctl doesn't know
how to interpret this and print "UNKNOWN" for each one of them.
Even if we don't have userland gpio interrupts support for now
let gpioctl print the supported caps.
Split their functionality by moving random seed allocation
to SYSINIT and calling (new) generic multipath function from
standard IPv4/IPv5 RIB init handlers.
powerpc/booke: Use power-of-two mappings in 64-bit pmap_mapdev
Summary:
This reduces the precious TLB1 entry consumption (64 possible in
existing 64-bit cores), by adjusting the size and alignment of a device
mapping to a power of 2, to encompass the full mapping and its
surroundings.
One caveat with this: If a mapping really is smaller than a power of 2,
it's possible to get a machine check or hang if the 'missing' physical
space is accessed. In practice this should not be an issue for users,
as devices overwhelmingly have physical spaces on power-of-two sizes and
alignments, and any design that includes devices which don't follow this
can be addressed by undefining the POW2_MAPPINGS guard.
powerpc/booke: Add pte_find_next() to find the next in-use PTE
Summary:
Iterating over VM_MIN_ADDRESS->VM_MAXUSER_ADDRESS can take a very long
time iterating one page at a time (2**(log_2(SIZE)-12) operations),
yielding possibly several days or even weeks on 64-bit Book-E, even for
a largely empty, which can happen when swapping out a process by
vmdaemon. Speed this up by instead finding the next PTE at or equal to
the given VA.
powerpc/booke: Change Book-E 64-bit pmap to 4-level table
Summary:
The existing page table is fraught with errors, since it creates a hole
in the address space bits. Fix this by taking a cue from the POWER9
radix pmap, and make the page table 4 levels, 52 bits.
Inode check-hash errors were being reported after system crashes.
Trace the cause down to journalled soft updates recovery code in
fsck failing to recompute the check-hash after updating an inode.
As inode check-hash was first introduced to UFS in FreeBSD 13,
there is no need to MFC this commit.
Rick Macklem [Fri, 10 Apr 2020 22:42:14 +0000 (22:42 +0000)]
Replace mbuf macros with the code they would generate in the NFS code.
When the code was ported to Mac OS/X, mbuf handling functions were
converted to using the Mac OS/X accessor functions. For FreeBSD, they
are a simple set of macros in sys/fs/nfs/nfskpiport.h.
Since porting to Mac OS/X is no longer a consideration, replacement of
these macros with the code generated by them makes the code more
readable.
When support for external page mbufs is added as needed by the KERN_TLS,
the patch becomes simpler if done without the macros.
This patch should not result in any semantic change.
This conversion will be committed one file at a time.
John Baldwin [Fri, 10 Apr 2020 22:27:45 +0000 (22:27 +0000)]
Use both crypto engines on a T6.
A T6 adapter contains two crypto engines on separate channels. This
commit distributes sessions between the two engines. Previously, only
the first engine was used.
Rick Macklem [Fri, 10 Apr 2020 21:25:35 +0000 (21:25 +0000)]
Replace mbuf macros with the code they would generate in the NFS code.
When the code was ported to Mac OS/X, mbuf handling functions were
converted to using the Mac OS/X accessor functions. For FreeBSD, they
are a simple set of macros in sys/fs/nfs/nfskpiport.h.
Since porting to Mac OS/X is no longer a consideration, replacement of
these macros with the code generated by them makes the code more
readable.
When support for external page mbufs is added as needed by the KERN_TLS,
the patch becomes simpler if done without the macros.
This patch should not result in any semantic change.
This conversion will be committed one file at a time.
Mark Johnston [Fri, 10 Apr 2020 20:42:11 +0000 (20:42 +0000)]
sbappendcontrol() needs to avoid clearing M_NOTREADY on data mbufs.
If LOCAL_CREDS is set on a unix socket and sendfile() is called,
sendfile will call uipc_send(PRUS_NOTREADY), prepending a control
message to the M_NOTREADY mbufs. uipc_send() then calls
sbappendcontrol() instead of sbappend(), and sbappendcontrol() would
erroneously clear M_NOTREADY.
Pass send flags to sbappendcontrol(), like we do for sbappend(), to
preserve M_READY when necessary.
Reported by: syzkaller
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D24333
Mark Johnston [Fri, 10 Apr 2020 20:41:59 +0000 (20:41 +0000)]
Properly handle disconnected sockets in uipc_ready().
When transmitting over a unix socket, data is placed directly into the
receiving socket's receive buffer, instead of the transmitting socket's
send buffer. This means that when pru_ready is called during
sendfile(), the passed socket does not contain M_NOTREADY mbufs in its
buffers; uipc_ready() must locate the linked socket.
Currently uipc_ready() frees the mbufs if the socket is disconnected,
but this is wrong since the mbufs may still be present in the receiving
socket's buffer after a disconnect. This can result in a use-after-free
and potentially a double free if the receive buffer is flushed after
uipc_ready() frees the mbufs.
Fix the problem by trying harder to locate the correct socket buffer and
calling sbready(): use the global list of SOCK_STREAM unix sockets to
search for a sockbuf containing the now-ready mbufs. Only free the
mbufs if we fail this search.
Reviewed by: jah, kib
Reported and tested by: pho
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D24332
Ed Maste [Fri, 10 Apr 2020 18:38:42 +0000 (18:38 +0000)]
hdac: show which command timed out
There are several reports of "hdac0: Command timeout on address 2"
messages emitted during playback on a variety of contemporary machines.
Show the command that timed out in case it might provide a clue in
finding the cause.
PR: 229190
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Ed Maste [Fri, 10 Apr 2020 18:13:29 +0000 (18:13 +0000)]
hdac: update comment on reset duration
The comment previously stated the delay must be at least 250us but that
was insufficient and so should be doubled, but the delay was actually
1000. The HDA spec actually says the delay must be 521 us (25 frames)
so update the comment to match.
Split rtrequest1_fib() into smaller manageable chunks.
No functional changes.
* Move route addition / route deletion code from rtrequest1_fib()
to add_route() and del_route() respectively.
* Rename rtrequest1_fib_change() to change_route() for consistency.
* Shrink the scope of ugly info #defines.
userland build: replace -fno-common with ${CFCOMMONFLAG}
This change allows any downstream or otherwise consumer to easily override
the new -fno-common default on a temporary basis without having to hack into
src.sys.mk, and also makes it a bit easier to search for these specific
cases where -fno-common must be overridden with -fcommon or else the build
will fail.
The gdb build, the only program requiring -fcommon on head/, is switched
over as an example usage. It will need it on all branches, so this does not
harm future mergability.
Forced rw unmounts and remounts from rw to ro already suspend
filesystem, which closes races with writers instantiating new vnodes
while unmount flushes the queue. Original intent of not including
non-forced unmounts into this regime was to allow such unmounts to
fail if writer was active, but this did not worked well.
Similar change, but causing all unmount, even involving only ro
filesystem, were proposed in D24088, but I believe that suspending ro
is undesirable, and definitely spends CPU time.
Reported by: markj
Discussed with: chs, mckusick
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Fixing the soft update macros in -r359612 triggered a previously
hidden bug in the file truncation code. Until that bug is tracked
down and fixed, revert to the old behavior.
Reported by: Peter Holm
Reviewed by: kib, Chuck Silvers
When running with a kernel compiled with DEBUG_LOCKS, before
panic'ing for recusing on a non-recursive lock, print out the
kernel stack where the lock was originally acquired.
This is an application of the kernel overflow fix from r357948 to
userspace, based on the algorithm developed by Bruce Evans. To keep
the ABI of the vds_timekeep stable, instead of adding the large_delta
member, MSB of both multipliers are added to quickly estimate the overflow.
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Rick Macklem [Thu, 9 Apr 2020 23:11:19 +0000 (23:11 +0000)]
Replace mbuf macros with the code they would generate in the NFS code.
When the code was ported to Mac OS/X, mbuf handling functions were
converted to using the Mac OS/X accessor functions. For FreeBSD, they
are a simple set of macros in sys/fs/nfs/nfskpiport.h.
Since porting to Mac OS/X is no longer a consideration, replacement of
these macros with the code generated by them makes the code more
readable.
When support for external page mbufs is added as needed by the KERN_TLS,
the patch becomes simpler if done without the macros.
This patch should not result in any semantic change.
This conversion will be committed one file at a time.
Rick Macklem [Thu, 9 Apr 2020 14:44:46 +0000 (14:44 +0000)]
Remove the old NFS lock device driver that uses Giant.
This NFS lock device driver was replaced by the kernel NLM around FreeBSD7 and
has not normally been used since then.
To use it, the kernel had to be built without "options NFSLOCKD" and
the nfslockd.ko had to be deleted as well.
Since it uses Giant and is no longer used, this patch removes it.
With this device driver removed, there is now a lot of unused code
in the userland rpc.lockd. That will be removed on a future commit.
Rick Macklem [Wed, 8 Apr 2020 01:12:54 +0000 (01:12 +0000)]
Fix an interoperability issue w.r.t. the Linux client and the NFSv4 server.
Luoqi Chen reported a problem on freebsd-fs@ where a Linux NFSv4 client
was able to open and write to a file when the file's permissions were
not set to allow the owner write access.
Since NFS servers check file permissions on every write RPC, it is standard
practice to allow the owner of the file to do writes, regardless of
file permissions. This provides POSIX like behaviour, since POSIX only
checks permissions upon open(2).
The traditional way NFS clients handle this is to check access via the
Access operation/RPC and use that to determine if an open(2) on the
client is allowed.
It appears that, for NFSv4, the Linux client expects the NFSv4 Open (not a
POSIX open) operation to fail with NFSERR_ACCES if the file is not being
created and file permissions do not allow owner access, unlike NFSv3.
Since both the Linux and OpenSolaris NFSv4 servers seem to exhibit this
behaviour, this patch changes the FreeBSD NFSv4 server to do the same.
A sysctl called vfs.nfsd.v4openaccess can be set to 0 to return the
NFSv4 server to its previous behaviour.
Since both the Linux and FreeBSD NFSv4 clients seem to exhibit correct
behaviour with the access check for file owner in Open enabled, it is enabled
by default.
In the past changes have been made to smbios->minor without updating the
smbios->bcdrev value.
Correct that by calculating bcdrev from the major/minor values.
Warner Losh [Tue, 7 Apr 2020 22:23:22 +0000 (22:23 +0000)]
Now that we don't have special-case geom hacking defined in md_var.h, stop
including it. sparc64 was the last straggler here, but these weren't removed at
the time.
David Bright [Tue, 7 Apr 2020 20:26:42 +0000 (20:26 +0000)]
Add a basic test for nvmecontrol
I recently made some bug fixes in nvmecontrol. It occurred to me that
since nvmecontrol lacks any kyua tests, I should convert the informal
testing I did into a more formal automated test. The test in this
change should be considered just a starting point; it is neither
complete nor thorough. While converting the test to ATF/kyua, I
discovered a small bug in nvmecontrol; the nvmecontrol devlist command
would always exit with an unsuccessful status. So I included the fix
for that, too, so that the test won't fail.
Although PPC OFW loader already had a LOADER_MSDOS_SUPPORT option, a few lines
were missing in conf.c, in order to support FAT filesystems.
This is useful when running FreeBSD under QEMU, to be able to easily change the
kernel and modules when running on hosts without UFS read/write support.
Reviewed by: jhibbits
Sponsored by: Eldorado Research Institute (eldorado.org.br)
Differential Revision: https://reviews.freebsd.org/D24328
Add VIRTIO_NET_F_MTU flag support for the bhyve virtio-net device.
The flag can be enabled using the new 'mtu' option:
bhyve -s X:Y:Z,virtio-net,[tapN|valeX:N],mtu=9000
-fno-common will become the default in GCC10/LLVM11. Plenty of work has been
put in to make sure our world builds are no -fno-common clean, so let's slap
the build with this until it becomes the compiler default to ensure we don't
regress.
At this time, we will not be enforcing -fno-common on ports builds. I
suspect most ports will be or quickly become -fno-common clean as they're
naturally built against compilers that default to it, so this will hopefully
become a non-issue in due time. The exception to this, which is actually the
status quo, is that kmods built from ports will continue to build with
-fno-common.
As of the time of writing, I intend to also make stable/12 -fno-common
clean. What's been done will be MFC'd to stable/11 if it's easily applicable
and/or not much work to massage it into being functional, but I anticipate
adding -fcommon to stable/11 builds to maintain its ability to be built with
newer compilers for the rest of its lifetime instead of putting in a third
branch's worth of effort.
We must wrap C declarations in __BEGIN / __END_DECLS to avoid C++ name-mangling
of the declaration when including the C header; name-mangling causes the linker
to attempt to locate the wrong (C++ ABI) symbol name.
Reviewed by: markj, oshogbo (earlier version both)
Differential Revision: https://reviews.freebsd.org/D24323
Brooks Davis [Tue, 7 Apr 2020 15:32:08 +0000 (15:32 +0000)]
Allow the kernel to build with a compiler that sets -fno-common.
The mechanism that generates assym.inc and offset.inc depends on the
symbols in question being common. For now, simply force the object files
to be created with -fcommon.
Recently added/changed lines in various kernel configs have caused some
buffer overflows that went undetected. These were detected with a config
built using -fno-common as these line buffers smashed one of our arrays,
then further triaged with ASAN.
Double the sizes; this is really not a great fix, but addresses the
immediate need until someone rewrites config. While here, add some bounds
checking so that we don't need to detect this by random bus errors or other
weird failures.
- beriloader: archsw is declared extern and defined elsewhere
- ofwloader: ofw_elf{,64} are defined in elf_freebsd.c and
ppc64_elf_freebsd.c respectively
- ubldr: syscall_ptr is defined in start.S for whichever ubldr platform is
building
-fno-common will become the default in GCC10/LLVM11.
OpenFirmware (OF) method instantiate-rtas was being called with a wrong
rtas-base-address argument. It must use the memory that is already being
allocated to this end instead. This issue was causing QEMU netboot to hang
when building the FDT from OF DT.
Reviewed by: jhibbits
Sponsored by: Eldorado Research Institute (eldorado.org.br)
Differential Revision: https://reviews.freebsd.org/D24313
Normalize deployment tools usage and definitions by putting into one place
instead of sprinkling them out over many disjoint files. This is a follow-up
to achieve the same goal in an incomplete rev.348521.
Brooks Davis [Mon, 6 Apr 2020 23:38:46 +0000 (23:38 +0000)]
Fix compilation with upstream clang builtin headers.
By using -nobuiltininc and adding the clang builtin headers resource dir
to the end of the compiler header search path, we can still find headers
such as immintrin.h but find the FreeBSD version of stddef.h/stdarg.h/..
first.
This is a workaround until we are able to settle on and complete a plan
to harmonize guard macros with LLVM. We've mostly worked out this on
FreeBSD systems by removing select headers from the installed set of
devel/llvm*, but that isn't a good solution for cross build.
The ugly stick here is this bit in the respective headers:
#ifndef EXTERN
#define EXTERN extern
#endif
with a follow-up #define EXTERN in a single .c file to push all of their
definitions into one spot. A pass should be made over these three later to
push these definitions into the correct files instead, but this will suffice
for now and at a more leisurely pace.
Rick Macklem [Mon, 6 Apr 2020 23:21:39 +0000 (23:21 +0000)]
Fix noisy NFSv4 server printf.
Peter reported that his dmesg was getting cluttered with
nfsrv_cache_session: no session
messages when he rebooted his NFS server and they did not seem useful.
He was correct, in that these messages are "normal" and expected when
NFSv4.1 or NFSv4.2 are mounted and the server is rebooted.
This patch silences the printf() during the grace period after a reboot.
It also adds the client IP address to the printf(), so that the message
is more useful if/when it occurs. If this happens outside of the
server's grace period, it does indicate something is not working correctly.
Instead of adding yet another nd_XXX argument, the arguments for
nfsrv_cache_session() were simplified to take a "struct nfsrv_descript *".
This is mostly two problems spread out far and wide:
- ypldap_process should be declared properly
- debug is defined differently in many programs
For the latter, just extern it and define it everywhere that actually needs
it. This mostly works out nicely for ^/libexec/ypxfr, which can remove the
assignment at the beginning of main in favor of defining it properly.
-fno-common will become the default in GCC10/LLVM11.
The location of the device-tree blob is passed to the kernel by the
previous booting stage (i.e. BBL or OpenSBI). Currently, we leave it
untouched and mark the 1MB of memory holding it as unavailable.
Instead, do what is done by other fake_preload_metadata() routines and
copy to the DTB to KVA space. This is more in line with what loader(8)
will provide us in the future, and it allows us to reclaim the hole in
physical memory.
riscv: Make sure local hart's icache is synced in pmap_sync_icache
The only way to flush the local hart's icache is with a FENCE.I (or an
equivalent SBI call); a normal FENCE is insufficient and, for the
single-hart case, unnecessary.
Summary:
The parentheses being in the wrong place means that, for L3 pages,
oldpte has all bits except PTE_V cleared, and so all the subsequent
checks against oldpte will fail, causing us to bail out and not retry
the faulting instruction after an SFENCE.VMA. This causes a WITNESS +
INVARIANTS kernel to fault on the "Chisel P3" (BOOM-based) DARPA SSITH
GFE SoC in pmap_init when writing to pv_table and, being a nofault
entry, subsequently panic with:
panic: vm_fault_lookup: fault on nofault entry, addr: 0xffffffc004e00000
Marcin Wojtas [Mon, 6 Apr 2020 19:45:26 +0000 (19:45 +0000)]
Add hwpmc support for Intel Atom Goldmont microarchitecture
Recognize new micro-architecture in hwpmc_intel driver. Based on Intel
document 325462-071US. Tested with tools/test/hwpmc/pmctest.py
on Atom E3930 SoC.
Don't drop packets having too many TCP option headers in mlx5en(4).
When using SACK it can happen there are multiple option headers.
Don't drop these packets, but instead limit the amount of inlining
to the maximum supported.
For head/, this will remain eternally default-on to maintain the status quo.
For stable/ branches, it should be flipped to default-off to maintain the
status quo.
There's value in being able to flip it one way or the other easily on head
or stable branches, whether you want to gain some performance back on head/
(for machines there's little chance you'll actually hit an assertion) or
potentially diagnose a problem with the version of llvm on an older branch.
Currently, stable branches get the CFLAGS+= -ndebug line uncommented; going
forward, they will instead have the default of LLVM_ASSERTIONS flipped.
Reviewed by: dim, emaste, re (gjb)
MFC after: 1 week
MFC note: flip the default of LLVM_ASSERTIONS
Differential Revision: https://reviews.freebsd.org/D24264
Rick Macklem [Sun, 5 Apr 2020 21:08:17 +0000 (21:08 +0000)]
Change the xid for client side krpc over UDP to a global value.
Without this patch, the xid used for the client side krpc requests over
UDP was initialized for each "connection". A "connection" for UDP is
rather sketchy and for the kernel NLM a new one is created every 2minutes.
A problem with client side interoperability with a Netapp server for the NLM
was reported and it is believed to be caused by reuse of the same xid.
Although this was never completely diagnosed by the reporter, I could see
how the same xid might get reused, since it is initialized to a value
based on the TOD clock every two minutes.
I suspect initializing the value for every "connection" was inherited from
userland library code, where having a global xid was not practical.
However, implementing a global "xid" for the kernel rpc is straightforward
and will ensure that an xid value is not reused for a long time. This
patch does that and is hoped it will fix the Netapp interoperability
problem.
adduser: allow standard IFS characters in passwords
Notably, the default IFS contains space/tab, thus any leading/trailing
whitespace characters tend to be removed.
Set IFS= for just the read lines to mitigate this, allowing the user to be
less surprised when their leading/trailing spaces weren't actually captured
in the password as they are with other means of setting a user's password.
bridge: Change lists to CK_LIST as a peparation for epochification
Prepare the ground for a rework of the bridge locking approach. We will
use an epoch-based approach in the datapath and making it safe to
iterate over the interface, span and rtnode lists without holding the
BRIDGE_LOCK. Replace the relevant lists by their ConcurrencyKit
equivalents.
No functional change in this commit.
Reviewed by: emaste, ae, philip (previous version)
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D24249
Make p_vaddr % p_align == p_offset % p_align for (some) TLS segments.
See https://sourceware.org/bugzilla/show_bug.cgi?id=24606 for the test case.
See https://reviews.llvm.org/D64930 for the background and more discussion.
Also this fixes another bug in malloc_aligned() where total size of
the allocated memory might be not enough to fit the aligned requested
block after the initial pointer is incremented by the pointer size.
Reviewed by: bdragon
Tested by: antoine (exp-run PR 244866), bdragon, emaste
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D21163
Ed Maste [Sat, 4 Apr 2020 00:31:30 +0000 (00:31 +0000)]
vt: avoid overrun when stride is not a multiple of bytes per pixel
The reporter is developing a frame buffer driver for hardware using
3 bytes per pixel, but a stride that's a multiple of 256. Previously
this resulted in writing beyond the end of each stride. On the last
row this attempted to write past the end of the frame buffer, triggering
the assertion in vt_fb_mem_wr1().
PR: 243533
MFC after: 2 weeks
Submitted by: Thomas Skibo