kib [Sat, 22 Feb 2020 20:50:30 +0000 (20:50 +0000)]
Fix NFS client deadlock when read reports truncated node.
If the node attributes returned in the reply to a read RPC indicate
truncation, and the vnode happens to be exclusively locked, the
attribute update tries to shrink the vnode size. Since some vnode
pages were busied by the reading thread during the read,
vnode_pager_setsize() deadlocks waiting for a busy state owned by
the caller.
Use a thread-local flag to indicate that the NFS read owns the (s)busy
state of some pages, and postpone the call to vnode_pager_setsize()
until the thread relinquishes that ownership.
Diagnosed by: rlibby
Tested by: pho, rlibby
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
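The deferral can be modeled in userspace as a minimal sketch. All names are hypothetical (`demo_td_owns_busy`, `demo_attr_update_size()`, `demo_read_release_pages()`, and `demo_pager_setsize()` as a stand-in for vnode_pager_setsize()). A C11 `_Thread_local` flag replaces the kernel's per-thread state, and the real pager call, which can sleep on busy pages, is reduced to a plain assignment. Only the control flow is modeled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-thread state for the read path. */
static _Thread_local bool demo_td_owns_busy;	/* read holds (s)busied pages */
static _Thread_local bool demo_td_size_pending;	/* a setsize was deferred */
static _Thread_local uint64_t demo_td_pending_size;

/* Stand-in for the pager's idea of the file size. */
static uint64_t demo_vnode_size = 8192;

/* Stand-in for vnode_pager_setsize(), which may wait on busy pages. */
static void
demo_pager_setsize(uint64_t size)
{
	demo_vnode_size = size;
}

/* Apply a size from the RPC reply's attributes, or defer it. */
static void
demo_attr_update_size(uint64_t newsize)
{
	if (demo_td_owns_busy) {
		/* Calling the pager now would wait on our own busy pages. */
		demo_td_size_pending = true;
		demo_td_pending_size = newsize;
		return;
	}
	demo_pager_setsize(newsize);
}

/* Called after the read path has unbusied its pages. */
static void
demo_read_release_pages(void)
{
	demo_td_owns_busy = false;
	if (demo_td_size_pending) {
		demo_td_size_pending = false;
		demo_pager_setsize(demo_td_pending_size);
	}
}
```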
kevans [Sat, 22 Feb 2020 16:20:04 +0000 (16:20 +0000)]
vm_radix: prefer __builtin_unreachable() to an unreachable panic()
This provides the needed hint to GCC and offers an annotation for readers to
observe that it's in fact impossible to hit this point. We'll get hit with a
-Wswitch error if the enum applicable to the switch above is ever
expanded without the new value(s) being handled.
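The pattern is shown in the following small sketch. The enum and accessor are hypothetical and are not the vm_radix code. Every enumerator is handled and the switch has no `default:`, so building with -Wall and -Werror turns a newly added, unhandled enumerator into a -Wswitch error. `__builtin_unreachable()` (a GCC/Clang builtin) tells the compiler that control never falls out of the switch. `__atomic_load_n` is also a GCC/Clang builtin.

```c
#include <stdint.h>

/* Hypothetical access modes, all handled below. */
enum demo_access { DEMO_SMR, DEMO_LOCKED, DEMO_UNSERIALIZED };

static uintptr_t
demo_node_load(uintptr_t *slot, enum demo_access access)
{
	switch (access) {
	case DEMO_UNSERIALIZED:
	case DEMO_LOCKED:
		return (*slot);
	case DEMO_SMR:
		return (__atomic_load_n(slot, __ATOMIC_ACQUIRE));
	}
	/* Every enumerator returns above; tell the compiler so. */
	__builtin_unreachable();
}
```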
kp [Sat, 22 Feb 2020 13:23:27 +0000 (13:23 +0000)]
riscv: Set MACHINE_ARCH correctly
MACHINE_ARCH sets the hw.machine_arch sysctl in the kernel. In userspace
it sets MACHINE_ARCH in bmake, which bsd.cpu.mk uses to configure the
target ABI for ports.
For riscv64sf builds (i.e. soft-float) that needs to be riscv64sf, but
the sysctl didn't reflect that, because the value was hardcoded.
Set the define from the riscv makefile so that we correctly reflect our
actual build (i.e. riscv64 or riscv64sf), depending on what TARGET_ARCH
we were built with.
That still doesn't satisfy userspace builds (e.g. bmake), so check if
we're building with a soft-float toolchain there. That
check doesn't work in the kernel, because it never uses floating point.
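The following sketch shows one way the header side could look. `DEMO_MACHINE_ARCH` is a hypothetical name standing in for MACHINE_ARCH. A kernel build could predefine it from the makefile. Otherwise, the compiler's `__riscv_float_abi_soft` macro, which GCC and Clang define for soft-float RISC-V targets, selects the name. On a non-RISC-V host the fallback yields "riscv64", so the sketch only illustrates the selection logic.

```c
#include <string.h>

/* A makefile may predefine this; otherwise infer it from the toolchain. */
#ifndef DEMO_MACHINE_ARCH
#if defined(__riscv_float_abi_soft)
#define DEMO_MACHINE_ARCH "riscv64sf"
#else
#define DEMO_MACHINE_ARCH "riscv64"
#endif
#endif

static const char *
demo_machine_arch(void)
{
	return (DEMO_MACHINE_ARCH);
}
```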
jeff [Sat, 22 Feb 2020 03:44:10 +0000 (03:44 +0000)]
Add an atomic-free tick moderated lazy update variant of SMR.
This enables very cheap read sections with free-to-use latencies and memory
overhead similar to epoch. On a recent AMD platform a read section costs
1 ns vs 5 ns for the default SMR. On Xeon the numbers should be more like 1
ns vs 11 ns. The memory consumption should be proportional to the product
of the free rate and 2/hz, while normal SMR consumption is proportional
to the product of the free rate and the maximum read section time.
While here refactor the code to make future additions more
straightforward.
Name the overall technique Global Unbound Sequences (GUS) and adjust some
comments accordingly. This helps distinguish discussions of the general
technique (SMR) vs this specific implementation (GUS).
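The tick-moderated idea can be illustrated with a deliberately simplified, single-threaded model. All names are hypothetical (`demo_ticks`, `demo_lazy_enter()`, `demo_lazy_advance()`, `demo_lazy_poll()`). This is not the GUS implementation, and it omits the memory barriers and per-CPU layout a real one needs. Entering a section is a plain store of the current tick, and a free picks a goal two ticks ahead instead of bumping a shared counter, which mirrors the 2/hz term above.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_NCPU	4
#define DEMO_SEQ_IDLE	UINT32_MAX	/* CPU is not in a read section */

/* Advanced only by the (simulated) clock tick. */
static uint32_t demo_ticks;
static uint32_t demo_reader_seq[DEMO_NCPU] = {
	DEMO_SEQ_IDLE, DEMO_SEQ_IDLE, DEMO_SEQ_IDLE, DEMO_SEQ_IDLE
};

/* Entering a read section is a plain store of the current tick. */
static void
demo_lazy_enter(int cpu)
{
	demo_reader_seq[cpu] = demo_ticks;
}

static void
demo_lazy_exit(int cpu)
{
	demo_reader_seq[cpu] = DEMO_SEQ_IDLE;
}

/* A free records a goal two ticks ahead; no shared counter is bumped. */
static uint32_t
demo_lazy_advance(void)
{
	return (demo_ticks + 2);
}

/*
 * Memory freed with `goal` may be reused once the clock has reached the
 * goal and no CPU is still in a section that began before it.
 */
static bool
demo_lazy_poll(uint32_t goal)
{
	if ((int32_t)(demo_ticks - goal) < 0)
		return (false);
	for (int i = 0; i < DEMO_NCPU; i++)
		if (demo_reader_seq[i] != DEMO_SEQ_IDLE &&
		    (int32_t)(demo_reader_seq[i] - goal) < 0)
			return (false);
	return (true);
}
```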
kevans [Sat, 22 Feb 2020 03:14:05 +0000 (03:14 +0000)]
sh: fix read builtin on 32-bit systems
Specifically, any system with a 32-bit size_t: -residue is computed as a
32-bit value and only *then* promoted to the 64-bit off_t, so the result is
wrong. This resulted in what appeared to be truncated output, as only
the first line would be read.
Correct it by making residue an off_t to begin with, since this is what
lseek takes anyway.
Reported by: antoine, dim
Triaged by: cem
Tested by: kevans
X-MFC-With: r358152
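The arithmetic can be reproduced in isolation. In this sketch `uint32_t` plays the role of a 32-bit size_t and `int64_t` that of FreeBSD's 64-bit off_t. The function names are hypothetical. Negating the unsigned 32-bit value wraps before the widening conversion, so a backward seek of 5 bytes becomes a forward seek of about 4 GiB.

```c
#include <stdint.h>

/* Buggy pattern: -residue wraps in 32 bits, then widens. */
static int64_t
demo_back_offset_u32(uint32_t residue)
{
	return (-residue);
}

/* Fixed pattern: residue is already 64-bit signed, so -residue is negative. */
static int64_t
demo_back_offset_off(int64_t residue)
{
	return (-residue);
}
```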
imp [Fri, 21 Feb 2020 22:44:22 +0000 (22:44 +0000)]
We pass a pointer to the flags to dabitsysctl, not an integer. Adjust the
handler to accept a pointer to a u_int. To make the type of the softc flags
stable and defined, make it a u_int. Cast the enum types to u_int for arg2 so
that the value passed to dabitsysctl is a u_int.
kevans [Fri, 21 Feb 2020 18:21:57 +0000 (18:21 +0000)]
fetch(3): plug some leaks
In the successful case, sockshost is not freed prior to return.
The failure case can now be hit after fetch_reopen(), which was not true
before. Thus, we need to make sure to clean up all of the conn resources,
which will also close sd. For all of the points prior to fetch_reopen(), we
continue to just close sd.
kaktus [Fri, 21 Feb 2020 16:32:17 +0000 (16:32 +0000)]
Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (7 of many)
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.
This is a non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.
kaktus [Fri, 21 Feb 2020 16:23:00 +0000 (16:23 +0000)]
Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (6 of many)
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.
manu [Fri, 21 Feb 2020 09:28:45 +0000 (09:28 +0000)]
linuxkpi: Move shmem related functions into their own file
For drmkpi (D23085) we don't want the Linux struct file as we don't emulate
everything. Also the prototypes should be in shmem_fs.h to have 100%
compatibility with Linux.
glebius [Fri, 21 Feb 2020 04:10:41 +0000 (04:10 +0000)]
Revert one half of the previous change, r357558: don't enter the epoch on
sends to the control socket. Control socket messages can run node constructors
and other code that is allowed to sleep with M_WAITOK.
mjg [Fri, 21 Feb 2020 01:44:31 +0000 (01:44 +0000)]
vfs: stop duplicating vnode work in audit during path lookup
Duplicating the work was putting an avoidable requirement that the filedesc
lock is held across the entire operation (otherwise, by the time audit reads
vnode pointers, another thread in the same process can chdir somewhere else,
making audit log things using a different vnode than the one which will be
used for the actual lookup).
Do the obvious thing and pass down vnodes which will be used.
vangyzen [Thu, 20 Feb 2020 23:53:48 +0000 (23:53 +0000)]
clamp kernel dump compression level when using gzip
If the configured compression level for kernel dumps
is outside the supported range, clamp it to the closest
supported level. Previously, dumpon would fail.
zstd already does this internally, so the compressor
needs no change.
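The following sketch shows the clamping. It assumes the supported gzip range is 1 through 9, matching the values of zlib's Z_BEST_SPEED and Z_BEST_COMPRESSION. The constant and function names are hypothetical, and how the actual change treats a "default" level is not described above.

```c
/* Assumed bounds; same values as zlib's Z_BEST_SPEED / Z_BEST_COMPRESSION. */
#define DEMO_GZ_MIN_LEVEL	1
#define DEMO_GZ_MAX_LEVEL	9

/* Clamp a configured level into the supported range instead of failing. */
static int
demo_gzip_clamp_level(int level)
{
	if (level < DEMO_GZ_MIN_LEVEL)
		return (DEMO_GZ_MIN_LEVEL);
	if (level > DEMO_GZ_MAX_LEVEL)
		return (DEMO_GZ_MAX_LEVEL);
	return (level);
}
```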
bcran [Thu, 20 Feb 2020 21:29:59 +0000 (21:29 +0000)]
dtc: remove unknown option printf, since getopt will print it
Since we don't set opterr to 0, getopt prints a message when it
encounters an unknown/invalid option. We therefore don't need to
print our own message in the default handler.
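The resulting option loop is sketched below, with a hypothetical `demo_parse_opts()` and a single `-v` option rather than dtc's real option set. Because `opterr` keeps its default nonzero value, getopt(3) reports the unknown option itself, and the `default:` case only has to fail.

```c
#define _POSIX_C_SOURCE 200809L
#include <unistd.h>

/* Hypothetical option parser: returns 0 on success, -1 on a bad option. */
static int
demo_parse_opts(int argc, char *argv[], int *verbose)
{
	int ch;

	/*
	 * opterr keeps its default nonzero value, so getopt(3) itself
	 * prints a diagnostic when it returns '?'.
	 */
	while ((ch = getopt(argc, argv, "v")) != -1) {
		switch (ch) {
		case 'v':
			*verbose = 1;
			break;
		default:
			/* No message of our own; just fail. */
			return (-1);
		}
	}
	return (0);
}
```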
csjp [Thu, 20 Feb 2020 21:12:10 +0000 (21:12 +0000)]
- Implement -h (human readable) for the size of the underlying block device.
Currently, the size of the swap device is unconditionally reported in
blocks, even if -h has been used.
- While here, switch to CONVERT_BLOCKS() instead of CONVERT(), which
avoids overflowing size counters in human readable form (see r196244).
- Update the column headers to reflect that a size is being reported instead
of the block size units being used
vmaffione [Thu, 20 Feb 2020 21:07:23 +0000 (21:07 +0000)]
bhyve: enable virtio-net mergeable rx buffers for tap(4)
This patch adds a new netbe_peek_recvlen() function to the net
backend API. The new function allows the virtio-net receive code
to know in advance how many virtio descriptor chains will be
needed to receive the next packet. As a result, the implementation
of the virtio-net mergeable rx buffers feature becomes efficient,
so that we can enable it also with the tap(4) backend. For the
tap(4) backend, a bounce buffer is introduced to implement the
peek_recvlen() callback, which implies an additional packet copy
on the receive datapath. In the future, it should be possible to
remove the bounce buffer (and so the additional copy), by
obtaining the length of the next packet from kevent data.
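The bounce-buffer approach can be sketched in userspace for a tap-like fd. The names (`struct demo_tap_be`, `demo_tap_peek_recvlen()`, `demo_tap_recv()`) are hypothetical rather than bhyve's net backend API. The sketch assumes a non-blocking fd where EAGAIN means "nothing pending" and 65536 bytes as the largest packet. Peeking reads the whole packet into the bounce buffer to learn its length. The following receive copies it out, which is the extra copy mentioned above.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define DEMO_BBUF_SIZE	65536	/* assumed maximum packet size */

/* Hypothetical tap backend state with a bounce buffer. */
struct demo_tap_be {
	int	fd;
	ssize_t	bbuflen;		/* bytes waiting in bbuf, 0 if empty */
	char	bbuf[DEMO_BBUF_SIZE];
};

/* Return the length of the next packet (0 if none), reading it into bbuf. */
static ssize_t
demo_tap_peek_recvlen(struct demo_tap_be *be)
{
	if (be->bbuflen == 0) {
		ssize_t n = read(be->fd, be->bbuf, sizeof(be->bbuf));

		if (n < 0)
			return (errno == EAGAIN ? 0 : -1);
		be->bbuflen = n;
	}
	return (be->bbuflen);
}

/* Deliver the next packet, draining the bounce buffer first. */
static ssize_t
demo_tap_recv(struct demo_tap_be *be, void *buf, size_t len)
{
	if (be->bbuflen > 0) {
		size_t n = (size_t)be->bbuflen;

		if (n > len)
			return (-1);
		memcpy(buf, be->bbuf, n);	/* the additional copy */
		be->bbuflen = 0;
		return ((ssize_t)n);
	}
	return (read(be->fd, buf, len));
}
```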
kp [Thu, 20 Feb 2020 17:26:08 +0000 (17:26 +0000)]
virtio: Pass the interrupt type in mmio mode
When we register an interrupt handler we need to pass the intr_type along in
bus_setup_intr().
The interrupt type matters because it is used to decide if we need to enter
NET_EPOCH. Because vtmmio did not pass it, vtmmio-based if_vtnet did not
enter NET_EPOCH, which led to panics with INVARIANTS set.
mjg [Thu, 20 Feb 2020 16:58:19 +0000 (16:58 +0000)]
vfs: add realpathat syscall
realpath(3) is used a lot (e.g. by clang) and is a major source of getcwd
and fstatat calls. This can be done more efficiently in the kernel.
This works by performing a regular lookup while saving the name and found
parent directory. If the terminal vnode is a directory, we can resolve it using
the usual means. Otherwise, we can use the name saved by lookup and resolve the
parent.
kib [Thu, 20 Feb 2020 15:34:02 +0000 (15:34 +0000)]
Do not read sigfastblock word on syscall entry.
On machines with SMAP, fueword executes two serializing instructions
which can be seen in microbenchmarks.
As a measure to restore microbenchmark numbers, only read the word on
an attempt to deliver a signal in ast(). If the word is set, the signal is
not delivered and the word is kept, preventing interruption of
interruptible sleeps by signals until userspace calls
sigfastblock(UNBLOCK), which clears the word.
This way, userspace in a critical section can see a spurious EINTR only
on the first interruptible sleep, if a signal is pending, and on signal
posting. It is believed that this is not important for rtld
and libthr critical sections. It might be visible for application
code, e.g. for the callback of dl_iterate_phdr(3), but again the belief
is that the non-compliance is acceptable. Most important is that a
retry of the sleeping syscall is not interrupted unless another
signal is posted.
For now, I added the knob kern.sigfastblock_fetch_always to enable the
word read on syscall entry, to be able to diagnose possible issues due
to spurious EINTR.
While there, do some code restructuring so that all sigfastblock()
handling is located in kern_sig.c.
Reviewed by: jeff
Discussed with: mjg
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D23622
bz [Thu, 20 Feb 2020 10:56:12 +0000 (10:56 +0000)]
ip6_output: improve extension header handling
Move IPv6 source address checks from after extension header handling
to the top of the function. If we do not pass these checks, there is
no reason to do a lot of work upfront.
Fold extension header preparations and length calculations together into
a single branch and macro rather than doing them sequentially.
Likewise, move extension header concatenation into a single branch block,
only doing it if we recorded any extension header length.
bapt [Thu, 20 Feb 2020 09:12:07 +0000 (09:12 +0000)]
ncurses: bump shlib number to version 9
The ABI changed between ncurses 5 and 6. While theoretically ncurses 6 can be built with
backward compatibility, I failed to build it in a way where applications linked against
the previous version of ncurses render properly.
Let's move to the new ABI, which provides all the latest features.
A compat12x package is cooking for backward compatibility.
adrian [Thu, 20 Feb 2020 07:12:43 +0000 (07:12 +0000)]
[ath] Attempt to fix epoch handling.
The epoch stuff with taskqueues works fine if the driver never calls
the receive path in other contexts, but this driver does. If there was
a chip reset during active receive then part of the reset will call
the receive path to flush out any active packets before reinitialising
the receive queue and that needs to be done with the epoch held.
So:
* make the receive task a normal task again
* explicitly call epoch enter/exit around the legacy and newer DMA
receive paths
* add a couple of epoch asserts to ensure that the receive packet
path itself is called with epoch held.
This fixes it on my Atom eeepc laptop (circa 2010!), on which I did
all of my initial 802.11n work in this driver and net80211.
Tested:
* AR9285, STA mode
TODO:
* Test on EDMA chipset (AR9380)
* Test in AP/adhoc modes, just to be sure (eg for beacon
receive processing in particular.)
pfg [Thu, 20 Feb 2020 03:54:07 +0000 (03:54 +0000)]
/etc/services: attempt to bring the database into this century.
Better document this file, update the URL to the IANA registry, and closely
match the official services.
For system ports (0 to 1023) we now try to follow the registry closely, noting
some historical differences where applicable.
For user ports (1024-49151), we only keep services that are likely to be
found on FreeBSD/UNIX systems. This attempts to strike a balance between
complexity and usefulness.
As a side effect, drop references to the unofficial Kerberos IV, which was
EOL'ed in Oct 2006[1]. While it is conceivable that some people still use it on
some very old FreeBSD machines that can't be replaced easily, its use is
considered a security risk. Also drop the unofficial netatalk entries; kernel
support for netatalk was removed long ago.
hrs [Thu, 20 Feb 2020 03:01:27 +0000 (03:01 +0000)]
Improve performance of "read" built-in command when using a seekable
fd.
The read built-in command calls read(2) with a 1-byte buffer because
newline characters need to be detected even on a byte stream which
comes from a non-seekable file descriptor. Because of this, the
following script makes more than 6,000 read(2) calls to print a 6 KiB file:
while read IN; do echo "$IN"; done < /COPYRIGHT
When the input byte stream is seekable, it is possible to read a data
block and then reposition the file pointer to where the newline
character was found. This change adds a small buffer to do this and
reduces the number of read(2) calls.
Theoretically, multiple built-in commands reading the same seekable
byte stream in a single pipe chain could share the buffer. For
simplicity, however, this change makes each invocation of the read
built-in allocate its own buffer and free it afterwards.
Although this causes read(2) to read the same regions multiple times,
the performance penalty should be small compared to the reduction in
read(2) calls.
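The seekable fast path is sketched below with a hypothetical `demo_read_line_seekable()` and a fixed 128-byte block. It is not the sh(1) code. After a block is read, any bytes past the newline are handed back with lseek(2). The residue is kept as an off_t so that the negative seek offset is computed in 64 bits.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Read one line from a seekable fd into line[] (newline stripped,
 * NUL-terminated). Returns its length, or -1 on EOF, error, or overflow.
 */
static ssize_t
demo_read_line_seekable(int fd, char *line, size_t linesz)
{
	char buf[128];
	size_t used = 0;

	for (;;) {
		ssize_t n = read(fd, buf, sizeof(buf));

		if (n < 0)
			return (-1);
		if (n == 0)		/* EOF: return a final unterminated line */
			return (used > 0 ? (ssize_t)used : -1);

		char *nl = memchr(buf, '\n', (size_t)n);
		size_t take = nl != NULL ? (size_t)(nl - buf) : (size_t)n;

		if (used + take >= linesz)
			return (-1);
		memcpy(line + used, buf, take);
		used += take;
		line[used] = '\0';

		if (nl != NULL) {
			/* Hand back the bytes after the newline. */
			off_t residue = (off_t)n - (off_t)take - 1;

			if (lseek(fd, -residue, SEEK_CUR) == -1)
				return (-1);
			return ((ssize_t)used);
		}
	}
}
```

Each call costs one read(2) per 128-byte block instead of one per byte, at the price of re-reading the bytes that were handed back.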
imp [Thu, 20 Feb 2020 01:33:01 +0000 (01:33 +0000)]
Don't convert all lower-layer errors to EIO.
Don't convert all lower layer errors to EIO. Instead, pass the actual error up
the stack. This will allow the upper layers that look for ENXIO to react
properly to that signal from the lower layers and, for UFS, unmount the
filesystem.
imp [Thu, 20 Feb 2020 00:46:22 +0000 (00:46 +0000)]
Move smbios.c to libsa.
smbios used to be an i386-only, kinda weird quirk of the x86
architecture. But UEFI picked it up, dusted it off, and now it's in many
other locations. Make it base technology by moving it to libsa and
fixing up the compilation. The code still has issues with unaligned
access, but those will be addressed in a followup commit.
imp [Thu, 20 Feb 2020 00:46:16 +0000 (00:46 +0000)]
Create ptov() function.
Create a ptov() function. It's basically the same as the btx PTOV
macro, but works everywhere. smbios needs this to translate addresses,
but the translation differs between BIOS booting and EFI booting. Make
it a function so one smbios.o can be used everywhere. Provide
definitions for it in the two loaders affected.
imp [Thu, 20 Feb 2020 00:34:46 +0000 (00:34 +0000)]
Don't spam the console with an additional, and useless, error message.
There's no need to spam the console with this error message. If there's an I/O
error, the disk/cam driver will report it at the lower levels. If that's an
actual problem, the upper layers will report that.
jeff [Wed, 19 Feb 2020 22:34:22 +0000 (22:34 +0000)]
Silence a gcc warning about no return from a function that handles every
possible enum in a switch statement. I verified that this emits nothing
as expected on clang. radix relies on constant propagation to eliminate
any branching from these access routines.
dim [Wed, 19 Feb 2020 21:12:59 +0000 (21:12 +0000)]
Take LINKER_FREEBSD_VERSION from numerical field after dash
Summary:
With COMPILER_FREEBSD_VERSION, we use a numeric value that we bump each
time we make a change that requires re-bootstrapping, but with the
linker variant, we instead take the entire part after "FreeBSD", as in
this example version output:
We should only look at the numerical field we append after a dash
instead. This review attempts to make it so.
The only thing I am not happy about is the post-processing of awk output
in Makefile.inc1. I notice that our awk does not have gensub(), so it
can't substitute a numbered sub-regex with \1, \2, etc. Suggestions
welcome. :)
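The intended extraction is sketched in C below. The actual change does it with awk in Makefile.inc1. The function name is hypothetical, and the sketch assumes the number to extract follows the last dash inside the parenthesized "FreeBSD ..." group.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Return the number after the last '-' inside the "(FreeBSD ...)" group
 * of a linker version string, or -1 if there is none.
 */
static long
demo_linker_freebsd_version(const char *ver)
{
	const char *start = strstr(ver, "(FreeBSD ");
	const char *end, *dash = NULL;
	char *stop;
	long v;

	if (start == NULL || (end = strchr(start, ')')) == NULL)
		return (-1);
	for (const char *p = start; p < end; p++)
		if (*p == '-')
			dash = p;
	if (dash == NULL)
		return (-1);
	v = strtol(dash + 1, &stop, 10);
	/* Require digits that run exactly up to the closing parenthesis. */
	return (stop == end && stop != dash + 1 ? v : -1);
}
```

For a made-up string such as `LLD 9.0.1 (FreeBSD 0123abcd-1300007) (compatible with GNU linkers)`, this returns 1300007. Without a dash-suffixed number it returns -1.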
jeff [Wed, 19 Feb 2020 19:58:31 +0000 (19:58 +0000)]
Use SMR to provide a safe unlocked lookup for vm_radix.
The tree is kept correct for readers with store barriers and careful
ordering. The existing object lock serializes writers. Consumers
will be introduced in later commits.
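The publication ordering that lets readers walk a structure without the writer's lock is sketched below. It uses C11 `<stdatomic.h>` as a stand-in for the kernel's atomic(9) primitives and a hypothetical single-slot "tree". The writer fully initializes a node before a release store makes it visible, and a reader's acquire load then sees the initialized fields. The SMR part, deferring reuse of removed nodes until readers have left their sections, is not shown.

```c
#include <stdatomic.h>
#include <stddef.h>

struct demo_node {
	int	key;
	int	value;
};

/* Hypothetical slot that readers access without the writer's lock. */
static _Atomic(struct demo_node *) demo_slot;

/*
 * Writer (serialized by its own lock, not shown): initialize first,
 * then publish with a release store.
 */
static void
demo_publish(struct demo_node *n, int key, int value)
{
	n->key = key;
	n->value = value;
	atomic_store_explicit(&demo_slot, n, memory_order_release);
}

/* Unlocked reader: the acquire load pairs with the release store. */
static int
demo_lookup(int key, int *value)
{
	struct demo_node *n =
	    atomic_load_explicit(&demo_slot, memory_order_acquire);

	if (n == NULL || n->key != key)
		return (0);
	*value = n->value;
	return (1);
}
```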
jeff [Wed, 19 Feb 2020 18:48:46 +0000 (18:48 +0000)]
Use per-domain locks for the bucket cache.
This gives much better concurrency when there are a large number of
cores per-domain and multiple domains. Avoid taking the lock entirely
if it will not be productive. ROUNDROBIN domains will have mixed
memory in each domain and will load balance to all domains.
While here refactor the zone/domain separation and bucket limits to
simplify callers.
kp [Wed, 19 Feb 2020 16:44:16 +0000 (16:44 +0000)]
bridge tests: Remove unneeded 'All rights reserved.'
The FreeBSD Foundation no longer requires this, as per
https://lists.freebsd.org/pipermail/svn-src-all/2019-February/177215.html and
private communications.
kevans [Wed, 19 Feb 2020 14:52:32 +0000 (14:52 +0000)]
libsysdecode: grab shmflags from sys/mman.h, add decode method
Any SHM_* flag here is (and likely will continue to be) a shmflag that may
be passed to shm_open2(), with the exception of SHM_ANON. This is a
prerequisite to adding appropriate support to truss/kdump.
kevans [Wed, 19 Feb 2020 14:32:55 +0000 (14:32 +0000)]
kdump: decode SHM_ANON as first arg to legacy shm_open(2)
The first argument to shm_open(2) as well as shm_open2(2) may be a path or
SHM_ANON. At least decode SHM_ANON; paths will show up as namei results in
kdump output, which may be sufficient. In those cases, we'll have printed an
address.
Future commits will add support for shm_open2() to libsysdecode/truss/kdump.
carlavilla [Wed, 19 Feb 2020 12:49:49 +0000 (12:49 +0000)]
Add some HISTORY sections to manpages
environ(7): Note that it was in AT&T Version 7
ac(8): Add a HISTORY section
sa(8): Add a HISTORY section
sqrt(3): Add the actual sqrt function to the HISTORY section
jeff [Wed, 19 Feb 2020 08:15:20 +0000 (08:15 +0000)]
Type validating smr protected pointer accessors.
This API is intended to provide some measure of safety with SMR
protected pointers. A struct wrapper provides type checking and
a guarantee that all access is mediated by the API unless abused. All
modifying functions take an assert as an argument to guarantee that
the required synchronization is present.
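The struct-wrapper idea is sketched below with hypothetical names (`DEMO_SMR_POINTER`, `demo_smr_store()`, `demo_smr_load()`). This is not the actual SMR pointer API. Wrapping the pointer in a one-member struct makes ordinary pointer expressions on the field fail to compile. Assignments through the accessor still get the compiler's pointer type checking, and the modifying accessor evaluates a caller-supplied assertion about synchronization.

```c
#include <assert.h>
#include <stdbool.h>

/* One-member struct: the field is not usable as a plain pointer. */
#define DEMO_SMR_POINTER(type)	struct { type demo_ptr; }

/* Modifying accessor: `ex` asserts the required synchronization. */
#define demo_smr_store(p, v, ex) do {					\
	assert(ex);							\
	(p)->demo_ptr = (v);						\
} while (0)

#define demo_smr_load(p)	((p)->demo_ptr)

struct demo_item {
	int	val;
};

struct demo_container {
	DEMO_SMR_POINTER(struct demo_item *) head;
	bool	locked;		/* stand-in for a real lock */
};
```

Passing, say, an `int *` to `demo_smr_store()` for this field draws an incompatible-pointer diagnostic, which is the type checking the wrapper is meant to preserve.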
hrs [Wed, 19 Feb 2020 06:28:55 +0000 (06:28 +0000)]
Add _BIX (Battery Information Extended) object support.
ACPI Control Method Batteries have a _BIF and/or _BIX object, which
provides static properties of the battery. The FreeBSD acpi_cmbat module
supported only the _BIF object, which was deprecated as of ACPI 4.0.
_BIX is an extended version of _BIF defined in ACPI 4.0 or later.
As of this writing, _BIX has two revisions: one in ACPI 4.0 (rev.0) and
another in ACPI 6.0 (rev.1). It seems that hardware vendors still
stick to _BIF only, or to _BIX rev.0 + _BIF, for maximum compatibility.
Microsoft requires _BIX rev.0 for Windows machines, so there are some
laptops with _BIX rev.0 only. In this case, FreeBSD does not
recognize the battery information.
After this change, the acpi_cmbat module gets battery information from
_BIX or _BIF object and internally uses _BIX rev.1 data structure as
the primary information store in the kernel. ACPIIO_BATT_GET_BI[FX]
returns an acpi_bi[fx] structure built by using information obtained
from a _BIF or a _BIX object found on the system. The revision number
field can be used to check which field is available. The acpiconf(8)
utility will show additional information if _BIX is available.
Although ABIs of ACPIIO_BATT_* were changed, the existing APIs for
userland utilities are not changed and the backward-compatible ABIs
are provided. This means that older versions of acpiconf(8) can also
work with the new kernel. The (union acpi_battery_ioctl_arg) was
padded to 256 bytes to avoid another ABI change in the future.
A _BIX object with its revision number >1 will be treated as
compatible with the rev.1 _BIX format.
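The padding technique is sketched below with a hypothetical, trimmed-down record. The real battery info structures have more fields. FreeBSD ioctl request codes encode the argument size, so padding the union to a fixed 256 bytes and checking it with `_Static_assert` keeps the request codes stable when members grow. The helper also shows the stated rule that revisions above 1 are read as rev.1.

```c
#include <stdint.h>

/* Hypothetical, trimmed-down battery information record. */
struct demo_bix {
	uint32_t	rev;		/* _BIX revision reported by firmware */
	uint32_t	units;
	uint32_t	dcap;		/* design capacity */
	uint32_t	lfcap;		/* last full charge capacity */
	char		model[32];
	char		serial[32];
};

/* Fixed-size ioctl argument: growing members must not change its size. */
union demo_batt_ioctl_arg {
	struct demo_bix	bix;
	uint8_t		pad[256];
};

_Static_assert(sizeof(union demo_batt_ioctl_arg) == 256,
    "ioctl argument must stay 256 bytes");

/* Revisions newer than the highest known layout are read as rev.1. */
static uint32_t
demo_bix_effective_rev(uint32_t rev)
{
	return (rev > 1 ? 1 : rev);
}
```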
rlibby [Wed, 19 Feb 2020 04:46:41 +0000 (04:46 +0000)]
powerpc: unconditionally mark SLB zones UMA_ZONE_CONTIG
PR: 244118
Reported by: Francis Little <oggy at farscape.co.uk>
Tested by: Francis Little, Mark Millard <marklmi at yahoo.com>
Reviewed by: markj
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D23729
kevans [Wed, 19 Feb 2020 02:34:56 +0000 (02:34 +0000)]
certctl(8): switch to install(1) to fix DESTDIR support
"Oops": ln(1) is fine and dandy, but not when you're using DESTDIR. The
path will almost certainly be invalid once the root you've just
installed to is relocated, perhaps to /.
Switch to install(1) using `-l rs` to calculate the relative symlink between
the two, which should work just fine in all cases.
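The relative-path computation that a relative symbolic link needs is sketched below with a hypothetical `demo_rel_symlink()`. It is not install(1)'s implementation. It assumes both paths are absolute and normalized (no `.`, `..`, or symlinked components). It climbs out of the link's directory with one `../` per component below the common prefix.

```c
#include <stdio.h>
#include <string.h>

/* Write into out the path of target relative to link's directory. */
static int
demo_rel_symlink(const char *target, const char *link, char *out,
    size_t outlen)
{
	const char *slash = strrchr(link, '/');
	size_t dirlen, common = 0, i, pos = 0;
	int ups = 0, n;

	if (slash == NULL || target[0] != '/' || link[0] != '/' || outlen == 0)
		return (-1);
	dirlen = (size_t)(slash - link);

	/* Last '/' at which target and the link's directory still agree. */
	for (i = 0; i < dirlen && target[i] == link[i]; i++)
		if (link[i] == '/')
			common = i;
	/* The link's whole directory is a prefix of target. */
	if (i == dirlen && target[i] == '/')
		common = dirlen;

	/* One "../" per directory component below the common prefix. */
	for (i = common; i < dirlen; i++)
		if (link[i] == '/')
			ups++;
	for (; ups > 0; ups--) {
		if (pos + 3 >= outlen)
			return (-1);
		memcpy(out + pos, "../", 3);
		pos += 3;
	}
	n = snprintf(out + pos, outlen - pos, "%s", target + common + 1);
	if (n < 0 || (size_t)n >= outlen - pos)
		return (-1);
	return (0);
}
```

With the illustrative paths /usr/share/certs/trusted/abc.pem (target) and /etc/ssl/certs/abc.0 (link), the result is ../../../usr/share/certs/trusted/abc.pem. That link stays valid wherever the installed root is later mounted.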