Ian Lepore [Sun, 18 Mar 2018 15:56:10 +0000 (15:56 +0000)]
Do not overwrite the contents of BIO_WRITE buffers. SPI inherently
transfers data in both directions at once. When writing to the device,
use a dummy buffer for the incoming data, not the same buffer as the
outgoing data. Writes are done in FLASH_PAGE_SIZE chunks, which is only
256 bytes, so just put the dummy buffer into the softc.
Here's the new development(7), which removes information that's
no longer relevant (read: most of what was there) and adds some
quick links to point newcomers in the right direction.
Mariusz Zaborski [Sun, 18 Mar 2018 15:24:45 +0000 (15:24 +0000)]
Update libcasper references to all new man pages.
Remove obsolete example. All services has they own example.
This example also show old type of limiting method which is
not recommended to use.
Mateusz Guzik [Sat, 17 Mar 2018 19:26:33 +0000 (19:26 +0000)]
locks: slightly depessimize lockstat
The slow path is always taken when lockstat is enabled. This induces
rdtsc (or other) calls to get the cycle count even when there was no
contention.
Still go to the slow path to not mess with the fast path, but avoid
the heavy lifting unless necessary.
This reduces sys and real time during -j 80 buildkernel:
before: 3651.84s user 1105.59s system 5394% cpu 1:28.18 total
after: 3685.99s user 975.74s system 5450% cpu 1:25.53 total
disabled: 3697.96s user 411.13s system 5261% cpu 1:18.10 total
Jeff Roberson [Sat, 17 Mar 2018 18:14:49 +0000 (18:14 +0000)]
Move the dirty queues inside the per-domain structure. This resolves a bug
where we had not hit global dirty limits but a single queue was starved
for space by dirty buffers. A single buf_daemon is maintained for now.
Add a bd_speedup() when we are low on bufspace. This can happen due to SUJ
keeping many bufs locked until a cg block is written. Document this with
a comment.
Fix sysctls to work with per-domain variables. Add more ddb debugging.
Warner Losh [Sat, 17 Mar 2018 17:18:29 +0000 (17:18 +0000)]
Add EFI to kernel options.
Some parts of MI modules will soon depend on whether EFI is available
or not. Add EFI to the list of kernel options so we can use it in
the modules build.
Fix outgoing TCP/UDP packet drop on arp/ndp entry expiration.
Current arp/nd code relies on the feedback from the datapath indicating
that the entry is still used. This mechanism is incorporated into the
arpresolve()/nd6_resolve() routines. After the inpcb route cache
introduction, the packet path for the locally-originated packets changed,
passing cached lle pointer to the ether_output() directly. This resulted
in the arp/ndp entry expire each time exactly after the configured max_age
interval. During the small window between the ARP/NDP request and reply
from the router, most of the packets got lost.
Fix this behaviour by plugging datapath notification code to the packet
path used by route cache. Unify the notification code by using single
inlined function with the per-AF callbacks.
Warner Losh [Sat, 17 Mar 2018 16:04:06 +0000 (16:04 +0000)]
Only take out the periph lock when we're modifying the flags of the
softc for an async unit attention. CAM locks, sometimes, the periph
lock and other times does not. We were taking the lock always and
running into lock recursion issues on a non-recursive lock. Now we
take it selectively. It's not clear why xpt takes the lock selectively
before calling us, though, and that's still under investigation.
Conrad Meyer [Fri, 16 Mar 2018 22:25:33 +0000 (22:25 +0000)]
elftoolchain nm(1): Initialize allocated memory before use
In out of memory scenarios (where one of these allocations failed but
other(s) did not), nm(1) could reference the uninitialized value of these
allocations (undefined behavior).
Always initialize any successful allocations as the most expedient
resolution of the issue. However, I would encourage upstream elftoolchain
contributors to clean up the error path to just abort immediately, rather
than proceeding sloppily when one allocation fails.
Brooks Davis [Fri, 16 Mar 2018 22:23:04 +0000 (22:23 +0000)]
Add _IOC_NEWLEN() and _IOC_NEWTYPE() macros.
These macros take an existing ioctl(2) command and replace the length
with the specified length or length of the specified type respectively.
These can be used to define commands for 32-bit compatibility with fewer
opportunities for cut-and-paste errors then a whole new definition.
Conrad Meyer [Fri, 16 Mar 2018 21:10:36 +0000 (21:10 +0000)]
libdtrace: Fix another uninitialized dtt_flags UB
Like r331073, eliminate a UB by fully initializing the struct with a designated
initializer. Note that the similar src_dtt is not fully used, so a similar
treatment was not absolutely required. I chose to leave it alone. It
wouldn't hurt to do the same thing, though.
Conrad Meyer [Fri, 16 Mar 2018 18:50:26 +0000 (18:50 +0000)]
random(4): Poll for signals during large reads
Occasionally poll for signals during large reads of the /dev/u?random
devices. This allows cancellation via SIGINT of accidental invocations of
very large reads. (A 2GB /dev/random read, which takes about 10 seconds on
my 2017 AMD Zen processor, can be aborted.)
I believe this behavior was intended since 2014 (r273997), just not fully
implemented.
This is motivated by a potential getrandom(2) interface that may not
explicitly forbid extremely large reads on 64-bit platforms -- even larger
than the 2GB limit imposed on devfs I/O by default. Such reads, if they are
to be allowed, should be cancellable by the user or administrator.
Ian Lepore [Fri, 16 Mar 2018 18:16:27 +0000 (18:16 +0000)]
Use EFI RTC capabilities info when registering, add bootverbose diagnostics.
Make some small improvements to the efirtc driver by obtaining the clock
capabilities (resolution and whether the sub-second counters are reset) and
using the info when registering the clock. When the hardware zeroes out the
subsecond info on clock-set, schedule clock updates to happen just before
top-of-second, so that the RTC time is closely in-sync with kernel time.
Also, in the identify() routine, always add the driver if EFI runtime
services are available, then decide in probe() whether to attach the driver
or not. If not attaching and bootverbose is on, say why. All of this is
basically to avoid "silent failure" -- if someone thinks there should be an
efi rtc and it's not attaching, at least they can set bootverbose and maybe
get a clue from the output.
Dimitry Andric [Fri, 16 Mar 2018 18:04:13 +0000 (18:04 +0000)]
Pull in r321999 from upstream clang trunk (by Ivan A. Kosarev):
[CodeGen] Fix TBAA info for accesses to members of base classes
Resolves:
Bug 35724 - regression (r315984): fatal error: error in backend:
Broken function found (Did not see access type in access path!)
https://bugs.llvm.org/show_bug.cgi?id=35724
Michael Tuexen [Fri, 16 Mar 2018 15:26:07 +0000 (15:26 +0000)]
Set the inp_vflag consistently for accepted TCP/IPv6 connections when
net.inet6.ip6.v6only=0.
Without this patch, the inp_vflag would have INP_IPV4 and the
INP_IPV6 flags for accepted TCP/IPv6 connections if the sysctl
variable net.inet6.ip6.v6only is 0. This resulted in netstat
to report the source and destination addresses as IPv4 addresses,
even they are IPv6 addresses.
PR: 226421
Reviewed by: bz, hiren, kib
MFC after: 3 days
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D13514
Ed Maste [Fri, 16 Mar 2018 14:46:38 +0000 (14:46 +0000)]
Share a single bsd-linux errno table across MD consumers
Three copies of the linuxulator linux_sysvec.c contained identical
BSD to Linux errno translation tables, and future work to support other
architectures will also use the same table. Move the table to a common
file to be used by all. Make it 'const int' to place it in .rodata.
(Some existing Linux architectures use MD errno values, but x86 and Arm
share the generic set.)
This change should introduce no functional change; a followup will add
missing errno values.
Conrad Meyer [Fri, 16 Mar 2018 07:11:53 +0000 (07:11 +0000)]
Garbage collect unused chacha20 code
Two copies of chacha20 were imported into the tree on Apr 15 2017 (r316982)
and Apr 16 2017 (r317015). Only the latter is actually used by anything, so
just go ahead and garbage collect the unused version while it's still only
in CURRENT.
I'm not making any judgement on which implementation is better. If I pulled
the wrong one, feel free to swap the existing implementation out and replace
it with the other code (conforming to the API that actually gets used in
randomdev, of course). We only need one generic implementation.
Warner Losh [Fri, 16 Mar 2018 05:23:48 +0000 (05:23 +0000)]
Try polling the qpairs on timeout.
On some systems, we're getting timeouts when we use multiple queues on
drives that work perfectly well on other systems. On a hunch, Jim
Harris suggested I poll the completion queue when we get a timeout.
This patch polls the completion queue if no fatal status was
indicated. If it had pending I/O, we complete that request and
return. Otherwise, if aborts are enabled and no fatal status, we abort
the command and return. Otherwise we reset the card.
This may clear up the problem, or we may see it result in lots of
timeouts and a performance problem. Either way, we'll know the next
step. We may also need to pay attention to the fatal status bit
of the controller.
PR: 211713
Suggested by: Jim Harris
Sponsored by: Netflix
Brooks Davis [Thu, 15 Mar 2018 21:42:49 +0000 (21:42 +0000)]
Add a request structure and make the implementation use it.
This allows compatibility translation to take place on the stack
(md_ioctl is too big) and is more suitable as a public interface within
the kernel than the kern_ioctl interface.
Except for the initialization of the md_req from the md_ioctl
(including detection of kernel md_file pointers) and the updating
of the md_ioctl prior to return, this is a mechanical replacment
of md_ioctl and mdio with md_req and mdr.
Jeff Roberson [Thu, 15 Mar 2018 19:23:07 +0000 (19:23 +0000)]
Eliminate pageout wakeup races. Take another step towards lockless
vmd_free_count manipulation. Reduce the scope of the free lock by
using a pageout lock to synchronize sleep and wakeup. Only trigger
the pageout daemon on transitions between states. Drive all wakeup
operations directly as side-effects from freeing memory rather than
requiring an additional function call.
David Bright [Thu, 15 Mar 2018 18:29:56 +0000 (18:29 +0000)]
Modify rc.d/fsck to handle new status from fsck/fsck_ffs
r328013 introduced a new error code from fsck_ffs that indicates that
it could not completely fix the file system; this happens when it
prints the message PLEASE RERUN FSCK. However, this status can happen
when fsck is run in "preen" mode and the rc.d/fsck script does not
handle that error code. Modify rc.d/fsck so that if "fsck -p"
("preen") returns the new status code (16) it will run "fsck -y", as
it currently does for a status code of 8 (the "standard error exit").
Brooks Davis [Thu, 15 Mar 2018 18:12:55 +0000 (18:12 +0000)]
Move implementation of ioctls into kern_*() functions.
Move locks from outside ioctl to the individual implementations.
This is the first step of changing the implementations to act on a
kernel-internal request struct rather than on struct md_ioctl and to
removing the use of kern_ioctl in mountroot.
The crash scenario goes like this: there's a thread waiting on "reinstate";
because it doesn't update the timeout counter it gets terminated by the
callout; at this point the maintenance thread starts the termination routine.
The first thread finishes waiting, proceeds to icl_conn_handoff(), and drops
the refcount, which allows the maintenance thread to free its resources. At
this point another thread receives a PDU. Boom.
PR: 222898, 219866
Reported by: Eugene M. Zheganin <emz at norma.perm.ru>
Tested by: Eugene M. Zheganin <emz at norma.perm.ru>
Reviewed by: mav@ (earlier version)
MFC after: 2 weeks
Sponsored by: playkey.net
Hiroki Sato [Thu, 15 Mar 2018 13:46:28 +0000 (13:46 +0000)]
Make getnameinfo(3) salen requirement less strict and
document details of salen in getnameinfo(3) manual page.
getnameinfo(3) returned EAI_FAIL when salen was not equal to
the length corresponding to the value specified by sa->sa_family.
However, POSIX or RFC 3493 does not require it and RFC 4038
Sec.6.2.3 shows an example passing sizeof(struct sockaddr_storage)
to salen.
This change makes the requirement less strict by accepting
salen up to sizeof(struct sockaddr_storage). It also includes
two more changes: one is to fix return values because both SUSv4
and RFC 3493 require EAI_FAMILY when the address length is invalid,
another is to fix sa_len dependency in PF_LOCAL.
Andriy Gapon [Thu, 15 Mar 2018 12:47:34 +0000 (12:47 +0000)]
zfs test suite: support device paths with intermediate directories
The code assumed that disks (devices) used for testing are always named
like /dev/foo, but there is no reason for that restriction and we can
easily support paths like /dev/stripe/bar.
Andriy Gapon [Thu, 15 Mar 2018 12:40:43 +0000 (12:40 +0000)]
zfs test suite: align zfs_destroy_005_neg: with upstream
The change is to account for a different order in which the recursive
destroy may be attempted. If we first try a dataset that can be destroyed
then it will be destroyed, but if we first try a dataset that cannot be
destroyed then we will not attempt to destroy the other dataset.
Andriy Gapon [Thu, 15 Mar 2018 09:16:10 +0000 (09:16 +0000)]
g_access: deal with races created by geoms that drop the topology lock
The problem is that g_access() must be called with the GEOM topology
lock held. And that gives a false impression that the lock is indeed
held across the call. But this isn't always true because many classes,
ZVOL being one of the many, need to drop the lock. It's either to
perform an I/O on the first open or to acquire a different lock (like in
g_mirror_access).
That, of course, can break many assumptions. For example,
g_slice_access() adds an extra exclusive count on the first open. As
described above, an underlying geom may drop the topology lock and that
would open a race with another thread that would also request another
extra exclusive count. In general, two consumers may be granted
incompatible accesses.
To avoid this problem the code is changed to mark a geom with special
flag before calling its access method and clear the flag afterwards. If
another thread sees that flag, then it means that the topology lock has
been dropped (either by the geom in question or downstream from it), so
it is not safe to make another access call. So, the second thread would
use g_topology_sleep() to wait until the flag is cleared and only then
would it proceed with the access.
Also see http://docs.freebsd.org/cgi/mid.cgi?809d9254-ee56-59d8-69a4-08838e985cea
https://www.illumos.org/issues/9164
This issue has been reported by Alan Somers as
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=225877
dmu_objset_refresh_ownership() first disowns a dataset (and releases
it) and then owns it again. There is an assert that the new dataset
object is the same as the old dataset object. When running ZFS Test
Suite on FreeBSD we see this panic from zpool_upgrade_007_pos test:
panic: solaris assert: newds == os->os_dsl_dataset (0xfffff80045f4c000
== 0xfffff80021ab4800)
I see that the old dataset has dsl_dataset_evict_async() pending in
ds_dbu.dbu_tqent and its ds_dbuf is NULL.
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: Don Brady <don.brady@delphix.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Andriy Gapon <avg@FreeBSD.org>
https://www.illumos.org/issues/9164
This issue has been reported by Alan Somers as
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=225877
dmu_objset_refresh_ownership() first disowns a dataset (and releases
it) and then owns it again. There is an assert that the new dataset
object is the same as the old dataset object. When running ZFS Test
Suite on FreeBSD we see this panic from zpool_upgrade_007_pos test:
panic: solaris assert: newds == os->os_dsl_dataset (0xfffff80045f4c000
== 0xfffff80021ab4800)
I see that the old dataset has dsl_dataset_evict_async() pending in
ds_dbu.dbu_tqent and its ds_dbuf is NULL.
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: Don Brady <don.brady@delphix.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Andriy Gapon <avg@FreeBSD.org>
Warner Losh [Wed, 14 Mar 2018 23:01:18 +0000 (23:01 +0000)]
When tearing down a queue pair, also delete the queue entries.
The NVME standard has required in section 7.2.6, since at least 1.1,
that a clean shutdown is signalled by deleting the subission and the
completion queues before setting the shutdown bit in CC. The 1.0
standard, apparently, did not and many of the early Intel cards didn't
care. Some newer cards care, at least one whose beta firmware can
scramble the card on an unclean shutdown. Linux has done this for some
time. To make it possible to move forward with an evaluation of this
pre-release card with wonky firmware, delete the queues on the card
when we delete the qpair structures.
Conrad Meyer [Wed, 14 Mar 2018 22:11:45 +0000 (22:11 +0000)]
vfs_bio.c: Apply cleanups motivated by Coverity analysis
It is believed that the conditions Coverity indicated were actually
impossible to hit. So this patch just adds a cleanup to only compute
v_mount once in brelse(), and in vfs_bio_getpages() always initializes error
to zero to appease the static analyzer.
Steven Hartland [Wed, 14 Mar 2018 21:32:23 +0000 (21:32 +0000)]
Fix mps deadlock when handling panic
During shutdown mps waits for its SSU requests to complete however when
performing a reboot after handling a panic the scheduler is stopped so
getmicrotime which is used can be non-functional.
Switch to using the same method as shutdown_panic to ensure we actually
complete.
In addition reduce the timeout when RB_NOSYNC is set in howto as we expect
this to fail.
John Baldwin [Wed, 14 Mar 2018 20:49:51 +0000 (20:49 +0000)]
Fix the check for an empty send socket buffer on a TOE TLS socket.
Compare sbavail() with the cached sb_off of already-sent data instead of
always comparing with zero. This will correctly close the connection and
send the FIN if the socket buffer contains some previously-sent data but
no unsent data.
John Baldwin [Wed, 14 Mar 2018 20:46:25 +0000 (20:46 +0000)]
Remove TLS-related inlines from t4_tom.h to fix iw_cxgbe(4) build.
- Remove the one use of is_tls_offload() and the function. AIO special
handling only needs to be disabled when a TOE socket is actively doing
TLS offload on transmit. The TOE socket's mode (which affects receive
operation) doesn't matter, so remove the check for the socket's mode and
only check if a TOE socket has TLS transmit keys configured to determine
if an AIO write request should fall back to the normal socket handling
instead of the TOE fast path.
- Move can_tls_offload() into t4_tls.c. It is not used in critical paths,
so inlining isn't that important. Change return type to bool while here.
Devin Teske [Wed, 14 Mar 2018 19:23:17 +0000 (19:23 +0000)]
Fix bad error messages from dpv(3)
Before = dpv: <__func__>: posix_spawnp(3): No such file or directory
After = dpv: <path/cmd>: No such file or directory
Most notably, show the 2nd argument being passed to posix_spawnp(3)
so we know what path/cmd failed.
Also, we don't need to have "posix_spawnp(3)" in the error message
nor the function because that can [a] change and [b] traversed using
a debugger if necessary.