close_range(min, max, flags) allows for a range of descriptors to be
closed. The Python folk have indicated that they would much prefer this
interface to closefrom(2), as the case may be that they/someone have special
fds dup'd to higher in the range and they can't necessarily closefrom(min)
because they don't want to hit the upper range, but relocating them to lower
isn't necessarily feasible.
sys_closefrom has been rewritten to use kern_close_range() using ~0U to
indicate closing to the end of the range. This was chosen rather than
requiring callers of kern_close_range() to hold FILEDESC_SLOCK across the
call to kern_close_range for simplicity.
The flags argument of close_range(2) is currently unused, so any flags set
is currently EINVAL. It was added to the interface in Linux so that future
flags could be added for, e.g., "halt on first error" and things of this
nature.
This patch is based on a syscall of the same design that is expected to be
merged into Linux.
cem [Sun, 12 Apr 2020 18:04:20 +0000 (18:04 +0000)]
Add queue(2) debug macros as build options
Add QUEUE_MACRO_DEBUG_TRACE and QUEUE_MACRO_DEBUG_TRASH as proper kernel
options. While here, alpha-sort the debug section of sys/conf/options.
Enable QUEUE_MACRO_DEBUG_TRASH in amd64 GENERIC (but not GENERIC-NODEBUG)
kernels. It is similar in nature and cost to other use-after-free pointer
trashing we do in GENERIC. It is probably reasonable to enable in any arch
GENERIC kernel that defines INVARIANTS.
dim [Sun, 12 Apr 2020 16:06:59 +0000 (16:06 +0000)]
Merge commit 30588a739 from llvm git (by Erich Keane):
Make target features check work with ctor and dtor-
The problem was reported in PR45468, applying target features to an
always_inline constructor/destructor runs afoul of GlobalDecl
construction assert when checking for target-feature compatibility.
The core problem is fixed by using the version of the check that
takes a FunctionDecl rather than the GlobalDecl. However, while
writing the test, I discovered that source locations weren't properly
set for this check on ctors/dtors. This patch also fixes constructors
and CALLED destructors.
Unfortunately, it doesn't seem too possible to get a meaningful
source location for a 'cleanup' destructor, so those are still
'frontend' level errors unfortunately. A fixme was added to the test
to cover that situation.
This should fix 'Assertion failed: (!isa<CXXConstructorDecl>(D) && "Use
other ctor with ctor decls!"), function Init, file
/usr/src/contrib/llvm-project/clang/include/clang/AST/GlobalDecl.h, line
45' when compiling the security/botan2 port.
This is the foundational change for the routing subsytem rearchitecture.
More details and goals are available in https://reviews.freebsd.org/D24141 .
This patch introduces concept of nexthop objects and new nexthop-based
routing KPI.
Nexthops are objects, containing all necessary information for performing
the packet output decision. Output interface, mtu, flags, gw address goes
there. For most of the cases, these objects will serve the same role as
the struct rtentry is currently serving.
Typically there will be low tens of such objects for the router even with
multiple BGP full-views, as these objects will be shared between routing
entries. This allows to store more information in the nexthop.
These 2 function are intended to replace all all flavours of
<in_|in6_>rtalloc[1]<_ign><_fib>, mpath functions and the previous
fib[46]-generation functions.
Upon successful lookup, they return nexthop object which is guaranteed to
exist within current NET_EPOCH. If longer lifetime is desired, one can
specify NHR_REF as a flag and get a referenced version of the nexthop.
Reference semantic closely resembles rtentry one, allowing sed-style conversion.
Additionally, another 2 functions are introduced to support uRPF functionality
inside variety of our firewalls. Their primary goal is to hide the multipath
implementation details inside the routing subsystem, greatly simplifying
firewalls implementation:
All functions have a separate scopeid argument, paving way to eliminating IPv6 scope
embedding and allowing to support IPv4 link-locals in the future.
Structure changes:
* rtentry gets new 'rt_nhop' pointer, slightly growing the overall size.
* rib_head gets new 'rnh_preadd' callback pointer, slightly growing overall sz.
Old KPI:
During the transition state old and new KPI will coexists. As there are another 4-5
decent-sized conversion patches, it will probably take a couple of weeks.
To support both KPIs, fields not required by the new KPI (most of rtentry) has to be
kept, resulting in the temporary size increase.
Once conversion is finished, rtentry will notably shrink.
More details:
* architectural overview: https://reviews.freebsd.org/D24141
* list of the next changes: https://reviews.freebsd.org/D24232
The intended change was
sp->next.tqe_next = NULL;
sp->next.tqe_prev = NULL;
which doesn't fix the issue I'm seeing and the committed fix is
not the intended fix due to copy-and-paste.
Thanks a lot to Conrad Meyer for making me aware of the problem.
Replace mbuf macros with the code they would generate in the NFS code.
When the code was ported to Mac OS/X, mbuf handling functions were
converted to using the Mac OS/X accessor functions. For FreeBSD, they
are a simple set of macros in sys/fs/nfs/nfskpiport.h.
Since porting to Mac OS/X is no longer a consideration, replacement of
these macros with the code generated by them makes the code more
readable.
When support for external page mbufs is added as needed by the KERN_TLS,
the patch becomes simpler if done without the macros.
This patch should not result in any semantic change.
This is the final patch of this series and the macros should now be
able to be deleted from the .h files in a future commit.
Replace mbuf macros with the code they would generate in the NFS code.
When the code was ported to Mac OS/X, mbuf handling functions were
converted to using the Mac OS/X accessor functions. For FreeBSD, they
are a simple set of macros in sys/fs/nfs/nfskpiport.h.
Since porting to Mac OS/X is no longer a consideration, replacement of
these macros with the code generated by them makes the code more
readable.
When support for external page mbufs is added as needed by the KERN_TLS,
the patch becomes simpler if done without the macros.
This patch should not result in any semantic change.
This commits add option 'M' which attempts to forcibly unmount the
dataset. Thanks to this we can enforce receiving snapshots in a
single step.
Note that this functionality is not supported on Linux because the
VFS will prevent active mounted filesystems from being unmounted,
even with the force option. This is the intended VFS behavior.
Discussed-with: Pawel Jakub Dawidek <pjd@FreeBSD.org> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Allan Jude <allanjude@freebsd.org>
Differential Revision: https://reviews.freebsd.org/D22306
Tested on stable r359316 @ Sleep mode on custom hw, Power off on BBB and PB
OFF bit [1] in status register control the pmic behaviour when PWR_EN pin
is pulled low.
On most AM335x hardware [beaglebone *] the desired behaviour are in fact
power off due to some hardware designs - read more in the comments around
pmic in sys/gnu/dts/arm/am335x-bone-common.dtsi
This patch let the device-tree decide with ti,pmic-shutdown-controller[2]
the state of off bit in status register.
manu [Sat, 11 Apr 2020 15:25:40 +0000 (15:25 +0000)]
gpioctl: Print interrupts capabilities
GPIO drivers who supports interrupts report them in the caps
(obtain via the getcaps method) but gpioctl doesn't know
how to interpret this and print "UNKNOWN" for each one of them.
Even if we don't have userland gpio interrupts support for now
let gpioctl print the supported caps.
Split their functionality by moving random seed allocation
to SYSINIT and calling (new) generic multipath function from
standard IPv4/IPv5 RIB init handlers.
powerpc/booke: Use power-of-two mappings in 64-bit pmap_mapdev
Summary:
This reduces the precious TLB1 entry consumption (64 possible in
existing 64-bit cores), by adjusting the size and alignment of a device
mapping to a power of 2, to encompass the full mapping and its
surroundings.
One caveat with this: If a mapping really is smaller than a power of 2,
it's possible to get a machine check or hang if the 'missing' physical
space is accessed. In practice this should not be an issue for users,
as devices overwhelmingly have physical spaces on power-of-two sizes and
alignments, and any design that includes devices which don't follow this
can be addressed by undefining the POW2_MAPPINGS guard.
powerpc/booke: Add pte_find_next() to find the next in-use PTE
Summary:
Iterating over VM_MIN_ADDRESS->VM_MAXUSER_ADDRESS can take a very long
time iterating one page at a time (2**(log_2(SIZE)-12) operations),
yielding possibly several days or even weeks on 64-bit Book-E, even for
a largely empty, which can happen when swapping out a process by
vmdaemon. Speed this up by instead finding the next PTE at or equal to
the given VA.
powerpc/booke: Change Book-E 64-bit pmap to 4-level table
Summary:
The existing page table is fraught with errors, since it creates a hole
in the address space bits. Fix this by taking a cue from the POWER9
radix pmap, and make the page table 4 levels, 52 bits.
Inode check-hash errors were being reported after system crashes.
Trace the cause down to journalled soft updates recovery code in
fsck failing to recompute the check-hash after updating an inode.
As inode check-hash was first introduced to UFS in FreeBSD 13,
there is no need to MFC this commit.
Replace mbuf macros with the code they would generate in the NFS code.
When the code was ported to Mac OS/X, mbuf handling functions were
converted to using the Mac OS/X accessor functions. For FreeBSD, they
are a simple set of macros in sys/fs/nfs/nfskpiport.h.
Since porting to Mac OS/X is no longer a consideration, replacement of
these macros with the code generated by them makes the code more
readable.
When support for external page mbufs is added as needed by the KERN_TLS,
the patch becomes simpler if done without the macros.
This patch should not result in any semantic change.
This conversion will be committed one file at a time.
A T6 adapter contains two crypto engines on separate channels. This
commit distributes sessions between the two engines. Previously, only
the first engine was used.
Replace mbuf macros with the code they would generate in the NFS code.
When the code was ported to Mac OS/X, mbuf handling functions were
converted to using the Mac OS/X accessor functions. For FreeBSD, they
are a simple set of macros in sys/fs/nfs/nfskpiport.h.
Since porting to Mac OS/X is no longer a consideration, replacement of
these macros with the code generated by them makes the code more
readable.
When support for external page mbufs is added as needed by the KERN_TLS,
the patch becomes simpler if done without the macros.
This patch should not result in any semantic change.
This conversion will be committed one file at a time.
sbappendcontrol() needs to avoid clearing M_NOTREADY on data mbufs.
If LOCAL_CREDS is set on a unix socket and sendfile() is called,
sendfile will call uipc_send(PRUS_NOTREADY), prepending a control
message to the M_NOTREADY mbufs. uipc_send() then calls
sbappendcontrol() instead of sbappend(), and sbappendcontrol() would
erroneously clear M_NOTREADY.
Pass send flags to sbappendcontrol(), like we do for sbappend(), to
preserve M_READY when necessary.
Reported by: syzkaller
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D24333
Properly handle disconnected sockets in uipc_ready().
When transmitting over a unix socket, data is placed directly into the
receiving socket's receive buffer, instead of the transmitting socket's
send buffer. This means that when pru_ready is called during
sendfile(), the passed socket does not contain M_NOTREADY mbufs in its
buffers; uipc_ready() must locate the linked socket.
Currently uipc_ready() frees the mbufs if the socket is disconnected,
but this is wrong since the mbufs may still be present in the receiving
socket's buffer after a disconnect. This can result in a use-after-free
and potentially a double free if the receive buffer is flushed after
uipc_ready() frees the mbufs.
Fix the problem by trying harder to locate the correct socket buffer and
calling sbready(): use the global list of SOCK_STREAM unix sockets to
search for a sockbuf containing the now-ready mbufs. Only free the
mbufs if we fail this search.
Reviewed by: jah, kib
Reported and tested by: pho
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D24332
There are several reports of "hdac0: Command timeout on address 2"
messages emitted during playback on a variety of contemporary machines.
Show the command that timed out in case it might provide a clue in
finding the cause.
PR: 229190
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
The comment previously stated the delay must be at least 250us but that
was insufficient and so should be doubled, but the delay was actually
1000. The HDA spec actually says the delay must be 521 us (25 frames)
so update the comment to match.
Split rtrequest1_fib() into smaller manageable chunks.
No functional changes.
* Move route addition / route deletion code from rtrequest1_fib()
to add_route() and del_route() respectively.
* Rename rtrequest1_fib_change() to change_route() for consistency.
* Shrink the scope of ugly info #defines.
userland build: replace -fno-common with ${CFCOMMONFLAG}
This change allows any downstream or otherwise consumer to easily override
the new -fno-common default on a temporary basis without having to hack into
src.sys.mk, and also makes it a bit easier to search for these specific
cases where -fno-common must be overridden with -fcommon or else the build
will fail.
The gdb build, the only program requiring -fcommon on head/, is switched
over as an example usage. It will need it on all branches, so this does not
harm future mergability.
Forced rw unmounts and remounts from rw to ro already suspend
filesystem, which closes races with writers instantiating new vnodes
while unmount flushes the queue. Original intent of not including
non-forced unmounts into this regime was to allow such unmounts to
fail if writer was active, but this did not worked well.
Similar change, but causing all unmount, even involving only ro
filesystem, were proposed in D24088, but I believe that suspending ro
is undesirable, and definitely spends CPU time.
Reported by: markj
Discussed with: chs, mckusick
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Fixing the soft update macros in -r359612 triggered a previously
hidden bug in the file truncation code. Until that bug is tracked
down and fixed, revert to the old behavior.
Reported by: Peter Holm
Reviewed by: kib, Chuck Silvers
When running with a kernel compiled with DEBUG_LOCKS, before
panic'ing for recusing on a non-recursive lock, print out the
kernel stack where the lock was originally acquired.
This is an application of the kernel overflow fix from r357948 to
userspace, based on the algorithm developed by Bruce Evans. To keep
the ABI of the vds_timekeep stable, instead of adding the large_delta
member, MSB of both multipliers are added to quickly estimate the overflow.
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Replace mbuf macros with the code they would generate in the NFS code.
When the code was ported to Mac OS/X, mbuf handling functions were
converted to using the Mac OS/X accessor functions. For FreeBSD, they
are a simple set of macros in sys/fs/nfs/nfskpiport.h.
Since porting to Mac OS/X is no longer a consideration, replacement of
these macros with the code generated by them makes the code more
readable.
When support for external page mbufs is added as needed by the KERN_TLS,
the patch becomes simpler if done without the macros.
This patch should not result in any semantic change.
This conversion will be committed one file at a time.
Remove the old NFS lock device driver that uses Giant.
This NFS lock device driver was replaced by the kernel NLM around FreeBSD7 and
has not normally been used since then.
To use it, the kernel had to be built without "options NFSLOCKD" and
the nfslockd.ko had to be deleted as well.
Since it uses Giant and is no longer used, this patch removes it.
With this device driver removed, there is now a lot of unused code
in the userland rpc.lockd. That will be removed on a future commit.
Fix an interoperability issue w.r.t. the Linux client and the NFSv4 server.
Luoqi Chen reported a problem on freebsd-fs@ where a Linux NFSv4 client
was able to open and write to a file when the file's permissions were
not set to allow the owner write access.
Since NFS servers check file permissions on every write RPC, it is standard
practice to allow the owner of the file to do writes, regardless of
file permissions. This provides POSIX like behaviour, since POSIX only
checks permissions upon open(2).
The traditional way NFS clients handle this is to check access via the
Access operation/RPC and use that to determine if an open(2) on the
client is allowed.
It appears that, for NFSv4, the Linux client expects the NFSv4 Open (not a
POSIX open) operation to fail with NFSERR_ACCES if the file is not being
created and file permissions do not allow owner access, unlike NFSv3.
Since both the Linux and OpenSolaris NFSv4 servers seem to exhibit this
behaviour, this patch changes the FreeBSD NFSv4 server to do the same.
A sysctl called vfs.nfsd.v4openaccess can be set to 0 to return the
NFSv4 server to its previous behaviour.
Since both the Linux and FreeBSD NFSv4 clients seem to exhibit correct
behaviour with the access check for file owner in Open enabled, it is enabled
by default.
In the past changes have been made to smbios->minor without updating the
smbios->bcdrev value.
Correct that by calculating bcdrev from the major/minor values.
Now that we don't have special-case geom hacking defined in md_var.h, stop
including it. sparc64 was the last straggler here, but these weren't removed at
the time.
dab [Tue, 7 Apr 2020 20:26:42 +0000 (20:26 +0000)]
Add a basic test for nvmecontrol
I recently made some bug fixes in nvmecontrol. It occurred to me that
since nvmecontrol lacks any kyua tests, I should convert the informal
testing I did into a more formal automated test. The test in this
change should be considered just a starting point; it is neither
complete nor thorough. While converting the test to ATF/kyua, I
discovered a small bug in nvmecontrol; the nvmecontrol devlist command
would always exit with an unsuccessful status. So I included the fix
for that, too, so that the test won't fail.
Although PPC OFW loader already had a LOADER_MSDOS_SUPPORT option, a few lines
were missing in conf.c, in order to support FAT filesystems.
This is useful when running FreeBSD under QEMU, to be able to easily change the
kernel and modules when running on hosts without UFS read/write support.
Reviewed by: jhibbits
Sponsored by: Eldorado Research Institute (eldorado.org.br)
Differential Revision: https://reviews.freebsd.org/D24328
Add VIRTIO_NET_F_MTU flag support for the bhyve virtio-net device.
The flag can be enabled using the new 'mtu' option:
bhyve -s X:Y:Z,virtio-net,[tapN|valeX:N],mtu=9000
-fno-common will become the default in GCC10/LLVM11. Plenty of work has been
put in to make sure our world builds are no -fno-common clean, so let's slap
the build with this until it becomes the compiler default to ensure we don't
regress.
At this time, we will not be enforcing -fno-common on ports builds. I
suspect most ports will be or quickly become -fno-common clean as they're
naturally built against compilers that default to it, so this will hopefully
become a non-issue in due time. The exception to this, which is actually the
status quo, is that kmods built from ports will continue to build with
-fno-common.
As of the time of writing, I intend to also make stable/12 -fno-common
clean. What's been done will be MFC'd to stable/11 if it's easily applicable
and/or not much work to massage it into being functional, but I anticipate
adding -fcommon to stable/11 builds to maintain its ability to be built with
newer compilers for the rest of its lifetime instead of putting in a third
branch's worth of effort.
cem [Tue, 7 Apr 2020 16:40:41 +0000 (16:40 +0000)]
libcasper(3): Export functions to C++
We must wrap C declarations in __BEGIN / __END_DECLS to avoid C++ name-mangling
of the declaration when including the C header; name-mangling causes the linker
to attempt to locate the wrong (C++ ABI) symbol name.
Reviewed by: markj, oshogbo (earlier version both)
Differential Revision: https://reviews.freebsd.org/D24323
Allow the kernel to build with a compiler that sets -fno-common.
The mechanism that generates assym.inc and offset.inc depends on the
symbols in question being common. For now, simply force the object files
to be created with -fcommon.
Recently added/changed lines in various kernel configs have caused some
buffer overflows that went undetected. These were detected with a config
built using -fno-common as these line buffers smashed one of our arrays,
then further triaged with ASAN.
Double the sizes; this is really not a great fix, but addresses the
immediate need until someone rewrites config. While here, add some bounds
checking so that we don't need to detect this by random bus errors or other
weird failures.
- beriloader: archsw is declared extern and defined elsewhere
- ofwloader: ofw_elf{,64} are defined in elf_freebsd.c and
ppc64_elf_freebsd.c respectively
- ubldr: syscall_ptr is defined in start.S for whichever ubldr platform is
building
-fno-common will become the default in GCC10/LLVM11.
OpenFirmware (OF) method instantiate-rtas was being called with a wrong
rtas-base-address argument. It must use the memory that is already being
allocated to this end instead. This issue was causing QEMU netboot to hang
when building the FDT from OF DT.
Reviewed by: jhibbits
Sponsored by: Eldorado Research Institute (eldorado.org.br)
Differential Revision: https://reviews.freebsd.org/D24313
Normalize deployment tools usage and definitions by putting into one place
instead of sprinkling them out over many disjoint files. This is a follow-up
to achieve the same goal in an incomplete rev.348521.
Fix compilation with upstream clang builtin headers.
By using -nobuiltininc and adding the clang builtin headers resource dir
to the end of the compiler header search path, we can still find headers
such as immintrin.h but find the FreeBSD version of stddef.h/stdarg.h/..
first.
This is a workaround until we are able to settle on and complete a plan
to harmonize guard macros with LLVM. We've mostly worked out this on
FreeBSD systems by removing select headers from the installed set of
devel/llvm*, but that isn't a good solution for cross build.
The ugly stick here is this bit in the respective headers:
#ifndef EXTERN
#define EXTERN extern
#endif
with a follow-up #define EXTERN in a single .c file to push all of their
definitions into one spot. A pass should be made over these three later to
push these definitions into the correct files instead, but this will suffice
for now and at a more leisurely pace.
Peter reported that his dmesg was getting cluttered with
nfsrv_cache_session: no session
messages when he rebooted his NFS server and they did not seem useful.
He was correct, in that these messages are "normal" and expected when
NFSv4.1 or NFSv4.2 are mounted and the server is rebooted.
This patch silences the printf() during the grace period after a reboot.
It also adds the client IP address to the printf(), so that the message
is more useful if/when it occurs. If this happens outside of the
server's grace period, it does indicate something is not working correctly.
Instead of adding yet another nd_XXX argument, the arguments for
nfsrv_cache_session() were simplified to take a "struct nfsrv_descript *".
This is mostly two problems spread out far and wide:
- ypldap_process should be declared properly
- debug is defined differently in many programs
For the latter, just extern it and define it everywhere that actually needs
it. This mostly works out nicely for ^/libexec/ypxfr, which can remove the
assignment at the beginning of main in favor of defining it properly.
-fno-common will become the default in GCC10/LLVM11.
The location of the device-tree blob is passed to the kernel by the
previous booting stage (i.e. BBL or OpenSBI). Currently, we leave it
untouched and mark the 1MB of memory holding it as unavailable.
Instead, do what is done by other fake_preload_metadata() routines and
copy to the DTB to KVA space. This is more in line with what loader(8)
will provide us in the future, and it allows us to reclaim the hole in
physical memory.
riscv: Make sure local hart's icache is synced in pmap_sync_icache
The only way to flush the local hart's icache is with a FENCE.I (or an
equivalent SBI call); a normal FENCE is insufficient and, for the
single-hart case, unnecessary.
Summary:
The parentheses being in the wrong place means that, for L3 pages,
oldpte has all bits except PTE_V cleared, and so all the subsequent
checks against oldpte will fail, causing us to bail out and not retry
the faulting instruction after an SFENCE.VMA. This causes a WITNESS +
INVARIANTS kernel to fault on the "Chisel P3" (BOOM-based) DARPA SSITH
GFE SoC in pmap_init when writing to pv_table and, being a nofault
entry, subsequently panic with:
panic: vm_fault_lookup: fault on nofault entry, addr: 0xffffffc004e00000
Add hwpmc support for Intel Atom Goldmont microarchitecture
Recognize new micro-architecture in hwpmc_intel driver. Based on Intel
document 325462-071US. Tested with tools/test/hwpmc/pmctest.py
on Atom E3930 SoC.
Don't drop packets having too many TCP option headers in mlx5en(4).
When using SACK it can happen there are multiple option headers.
Don't drop these packets, but instead limit the amount of inlining
to the maximum supported.