Warner Losh [Tue, 20 Mar 2018 03:37:14 +0000 (03:37 +0000)]
Starting LBA is a 64bit number, so use htole64 instead of htole32. The
latter casts the LBA to a 32-bit number before assigning it to the 64
bit structure entity. This works fine on the first 2TB of TRIMs, but
terrible beyond that due to trucation.
Also, add an assert to make sure we don't end too many DSM TRIM
entries in one request.
Warner Losh [Tue, 20 Mar 2018 03:37:09 +0000 (03:37 +0000)]
Make kern.cam.nda.num_trim tunable to limit the number of BIO_DELETE
requests that we'll collapse into one DSM_TRIM. By default it is a
256, which is the max that will fit into a 4k page.
Warner Losh [Tue, 20 Mar 2018 03:36:51 +0000 (03:36 +0000)]
Note: this isn't a general thing. It only affects u-boot-based arm64
systems. Make sure the note says that specific case only. Also,
provide a recipe to do it.
Justin Hibbits [Tue, 20 Mar 2018 02:01:30 +0000 (02:01 +0000)]
Cast through uintptr_t to narrow the buf domain pointer on 32-bit archs
arg2 is an intmax_t, which on 32-bit architectures is 64 bits, wider than a
pointer. When &bdomain[i] is added to arg2 it widens from uintptr_t to
intmax_t, then gcc whines when it gets cast to a pointer. Casting through
uintptr_t silences this warning.
Conrad Meyer [Tue, 20 Mar 2018 00:16:24 +0000 (00:16 +0000)]
blacklist: Fix minor memory leak in configuration parsing error case
Ordinarily, the continue clause of the for-loop would free 'line.' In this
case we instead return early, missing the free. Add an explicit free to
avoid the leak.
[ofw] fix errneous checks for OF_finddevice(9) return value
OF_finddevices returns ((phandle_t)-1) in case of failure. Some code
in existing drivers checked return value to be equal to 0 or
less/equal to 0 which is also wrong because phandle_t is unsigned
type. Most of these checks were for negative cases that were never
triggered so trhere was no impact on functionality.
Matt Joras [Mon, 19 Mar 2018 22:43:27 +0000 (22:43 +0000)]
Fix initialization of eventhandler mutex.
mtx_init does not do a copy of the name string it is passed. The
eventhandler code incorrectly passed the parameter string directly to
mtx_init instead of using the copy it makes. This was an existing
problem with the code that I dutifully copied over in my changes in r325621.
Reported by: Anton Rang <rang AT acm.org>
Reviewed by: rstone, markj
Approved by: rstone (mentor)
MFC after: 1 week
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D14764
Kristof Provost [Mon, 19 Mar 2018 21:13:25 +0000 (21:13 +0000)]
pf: Fix memory leak in DIOCRADDTABLES
If a user attempts to add two tables with the same name the duplicate table
will not be added, but we forgot to free the duplicate table, leaking memory.
Ensure we free the duplicate table in the error path.
Eric Joyner [Mon, 19 Mar 2018 20:55:05 +0000 (20:55 +0000)]
ixgbe(4): Update shared code, add support for X552 1G, fix bug
This patch will:
- Update ixgbe shared code
- Add support for Intel(R) Ethernet Connection X552 1000BASE-T
- Add error handling for link state check preventing VF from stopping traffic
after changing PF's MTU value
John Baldwin [Mon, 19 Mar 2018 19:09:15 +0000 (19:09 +0000)]
Revert r318180 and re-enable AIO tests on md(4) by default.
The 'physio' fast-path used by AIO requests on md(4) devices, is not
gated on the unsafe_aio knob. Prior to r327755, some AIO requests could
fail the fast-path and fall back to the slow-path (requests for devices
not supporting unmapped I/O and requests which failed with EFAULT during
the fast-path). However, those cases now return a suitable error rather
than using the slow-path.
Lawrence Stewart [Mon, 19 Mar 2018 16:37:47 +0000 (16:37 +0000)]
Add support for the experimental Internet-Draft "TCP Alternative Backoff with
ECN (ABE)" proposal to the New Reno congestion control algorithm module.
ABE reduces the amount of congestion window reduction in response to
ECN-signalled congestion relative to the loss-inferred congestion response.
More details about ABE can be found in the Internet-Draft:
https://tools.ietf.org/html/draft-ietf-tcpm-alternativebackoff-ecn
The implementation introduces four new sysctls:
- net.inet.tcp.cc.abe defaults to 0 (disabled) and can be set to non-zero to
enable ABE for ECN-enabled TCP connections.
- net.inet.tcp.cc.newreno.beta and net.inet.tcp.cc.newreno.beta_ecn set the
multiplicative window decrease factor, specified as a percentage, applied to
the congestion window in response to a loss-based or ECN-based congestion
signal respectively. They default to the values specified in the draft i.e.
beta=50 and beta_ecn=80.
- net.inet.tcp.cc.abe_frlossreduce defaults to 0 (disabled) and can be set to
non-zero to enable the use of standard beta (50% by default) when repairing
loss during an ECN-signalled congestion recovery episode. It enables a more
conservative congestion response and is provided for the purposes of
experimentation as a result of some discussion at IETF 100 in Singapore.
The values of beta and beta_ecn can also be set per-connection by way of the
TCP_CCALGOOPT TCP-level socket option and the new CC_NEWRENO_BETA or
CC_NEWRENO_BETA_ECN CC algo sub-options.
Submitted by: Tom Jones <tj@enoti.me>
Tested by: Tom Jones <tj@enoti.me>, Grenville Armitage <garmitage@swin.edu.au>
Relnotes: Yes
Differential Revision: https://reviews.freebsd.org/D11616
Kyle Evans [Mon, 19 Mar 2018 16:16:12 +0000 (16:16 +0000)]
Move /boot/overlays to /boot/dtb/overlays
The former is fairly vague; these are FDT overlays to be applied to the
running system, so /boot/dtb is a sensible location to put it without
cluttering up /boot/dtb even further if desired.
Kyle Evans [Mon, 19 Mar 2018 15:48:31 +0000 (15:48 +0000)]
lualoader: Setup default color scheme if we're using colors
The console may have been set for different colors before lualoader kicks
in; notably, a black-on-white color scheme is not necessarily what we're
expecting.
While here, make color.default() a composition of color.escape() instead of
rewriting the escape sequence to make it more obvious what it's achieving: a
white-on-black color scheme with no attributes set.
Kyle Evans [Mon, 19 Mar 2018 15:27:53 +0000 (15:27 +0000)]
Add note to UPDATING about UEFI changes requiring loader(8) update
These problems have only been observed with boards using U-Boot (e.g. ARM)
where virtual addresses are already set in the memory map by the firmware
and the firmware is expecting a call to SetVirtualAddressMap to be made.
I refrain from mentioning this in the note because this could also be the
case on some not-yet-tested firmware on amd64 and it's not a bad
recommendation for the general case.
Ed Maste [Mon, 19 Mar 2018 15:11:10 +0000 (15:11 +0000)]
linux*_sysvec.c: rationalize whitespace and comments
There's a fair amount of duplication between MD linuxulator files.
Make indentation and comments consistent between the three versions of
linux_sysvec.c to reduce diffs when comparing them.
Warner Losh [Sun, 18 Mar 2018 18:50:48 +0000 (18:50 +0000)]
Don't add links or cleanfiles for NO_OBJ case, in addition to not
creating them. Move them under the if after the all: target. They are
just defines, so it doesn't really matter where we have them.
Ian Lepore [Sun, 18 Mar 2018 18:37:47 +0000 (18:37 +0000)]
Add support for 4K and 32K erase block sizes. Many of the supported chips
have these flags set in the ident table, but there was no code to support
using the smaller erase sizes.
Ian Lepore [Sun, 18 Mar 2018 17:47:57 +0000 (17:47 +0000)]
Make all internal routines return an int error status, and check the
status at all call points. Combine the get_status and wait_for_ready
routines, since waiting for ready is the only reason to ever get status.
Ian Lepore [Sun, 18 Mar 2018 17:25:23 +0000 (17:25 +0000)]
Add sc_parent to the softc and use it in place of device_get_parent() calls
all over the place. Also pass the softc as the arg to all the internal
functions instead of passing a device_t and calling device_get_softc() in
each function.
Ian Lepore [Sun, 18 Mar 2018 16:52:31 +0000 (16:52 +0000)]
Bugfix: wait for writes/erases to complete after starting them, instead of
before starting them.
Using the wait-before logic would make sense if there was useful time-
consuming work that could be done between the end of one write and the
beginning of the next, but it also requires doing the wait-for-ready before
reading, because a prior write or erase could still be in progress. Reading
is the far more common case, so adding a whole extra bus transaction to
check for ready before each read would soak up any small gains that might be
had from doing async writes.
Mark Johnston [Sun, 18 Mar 2018 16:49:30 +0000 (16:49 +0000)]
Avoid dequeuing the fault page during a soft fault.
Such pages are re-enqueued at the end of the fault handler, preserving
LRU. Rather than performing two separate operations per fault, simply
requeue the page at the end of the fault (or bump its activation count
if it resides in PQ_ACTIVE, avoiding the page queue lock entirely).
This elides some page lock and page queue lock operations in common
cases, e.g., CoW faults.
Note that we must still dequeue the source page for "optimized" CoW
faults since the page may not remain enqueued while it is moved to
another object.
Mark Johnston [Sun, 18 Mar 2018 16:40:56 +0000 (16:40 +0000)]
Have vm_page_{deactivate,launder}() requeue already-queued pages.
In many cases the page is not enqueued so the change will have no
effect. However, the change is needed to support an optimization in
the fault handler and in some cases (sendfile, the buffer cache) it
was being emulated by the caller anyway.
Ian Lepore [Sun, 18 Mar 2018 16:10:14 +0000 (16:10 +0000)]
Remove a pointless KASSERT and reword a comment a bit. The KASSERT tested
for the same condition that the preceeding lines checked for and would have
returned EIO, so the assert could never possibly trigger (sc_sectorsize must
inherently be an integer multiple of FLASH_PAGE_SIZE).
Ian Lepore [Sun, 18 Mar 2018 15:56:10 +0000 (15:56 +0000)]
Do not overwrite the contents of BIO_WRITE buffers. SPI inherently
transfers data in both directions at once. When writing to the device,
use a dummy buffer for the incoming data, not the same buffer as the
outgoing data. Writes are done in FLASH_PAGE_SIZE chunks, which is only
256 bytes, so just put the dummy buffer into the softc.
Here's the new development(7), which removes information that's
no longer relevant (read: most of what was there) and adds some
quick links to point newcomers in the right direction.
Mariusz Zaborski [Sun, 18 Mar 2018 15:24:45 +0000 (15:24 +0000)]
Update libcasper references to all new man pages.
Remove obsolete example. All services has they own example.
This example also show old type of limiting method which is
not recommended to use.
Mateusz Guzik [Sat, 17 Mar 2018 19:26:33 +0000 (19:26 +0000)]
locks: slightly depessimize lockstat
The slow path is always taken when lockstat is enabled. This induces
rdtsc (or other) calls to get the cycle count even when there was no
contention.
Still go to the slow path to not mess with the fast path, but avoid
the heavy lifting unless necessary.
This reduces sys and real time during -j 80 buildkernel:
before: 3651.84s user 1105.59s system 5394% cpu 1:28.18 total
after: 3685.99s user 975.74s system 5450% cpu 1:25.53 total
disabled: 3697.96s user 411.13s system 5261% cpu 1:18.10 total
Jeff Roberson [Sat, 17 Mar 2018 18:14:49 +0000 (18:14 +0000)]
Move the dirty queues inside the per-domain structure. This resolves a bug
where we had not hit global dirty limits but a single queue was starved
for space by dirty buffers. A single buf_daemon is maintained for now.
Add a bd_speedup() when we are low on bufspace. This can happen due to SUJ
keeping many bufs locked until a cg block is written. Document this with
a comment.
Fix sysctls to work with per-domain variables. Add more ddb debugging.
Warner Losh [Sat, 17 Mar 2018 17:18:29 +0000 (17:18 +0000)]
Add EFI to kernel options.
Some parts of MI modules will soon depend on whether EFI is available
or not. Add EFI to the list of kernel options so we can use it in
the modules build.
Fix outgoing TCP/UDP packet drop on arp/ndp entry expiration.
Current arp/nd code relies on the feedback from the datapath indicating
that the entry is still used. This mechanism is incorporated into the
arpresolve()/nd6_resolve() routines. After the inpcb route cache
introduction, the packet path for the locally-originated packets changed,
passing cached lle pointer to the ether_output() directly. This resulted
in the arp/ndp entry expire each time exactly after the configured max_age
interval. During the small window between the ARP/NDP request and reply
from the router, most of the packets got lost.
Fix this behaviour by plugging datapath notification code to the packet
path used by route cache. Unify the notification code by using single
inlined function with the per-AF callbacks.
Warner Losh [Sat, 17 Mar 2018 16:04:06 +0000 (16:04 +0000)]
Only take out the periph lock when we're modifying the flags of the
softc for an async unit attention. CAM locks, sometimes, the periph
lock and other times does not. We were taking the lock always and
running into lock recursion issues on a non-recursive lock. Now we
take it selectively. It's not clear why xpt takes the lock selectively
before calling us, though, and that's still under investigation.
Conrad Meyer [Fri, 16 Mar 2018 22:25:33 +0000 (22:25 +0000)]
elftoolchain nm(1): Initialize allocated memory before use
In out of memory scenarios (where one of these allocations failed but
other(s) did not), nm(1) could reference the uninitialized value of these
allocations (undefined behavior).
Always initialize any successful allocations as the most expedient
resolution of the issue. However, I would encourage upstream elftoolchain
contributors to clean up the error path to just abort immediately, rather
than proceeding sloppily when one allocation fails.
Brooks Davis [Fri, 16 Mar 2018 22:23:04 +0000 (22:23 +0000)]
Add _IOC_NEWLEN() and _IOC_NEWTYPE() macros.
These macros take an existing ioctl(2) command and replace the length
with the specified length or length of the specified type respectively.
These can be used to define commands for 32-bit compatibility with fewer
opportunities for cut-and-paste errors then a whole new definition.
Conrad Meyer [Fri, 16 Mar 2018 21:10:36 +0000 (21:10 +0000)]
libdtrace: Fix another uninitialized dtt_flags UB
Like r331073, eliminate a UB by fully initializing the struct with a designated
initializer. Note that the similar src_dtt is not fully used, so a similar
treatment was not absolutely required. I chose to leave it alone. It
wouldn't hurt to do the same thing, though.
Conrad Meyer [Fri, 16 Mar 2018 18:50:26 +0000 (18:50 +0000)]
random(4): Poll for signals during large reads
Occasionally poll for signals during large reads of the /dev/u?random
devices. This allows cancellation via SIGINT of accidental invocations of
very large reads. (A 2GB /dev/random read, which takes about 10 seconds on
my 2017 AMD Zen processor, can be aborted.)
I believe this behavior was intended since 2014 (r273997), just not fully
implemented.
This is motivated by a potential getrandom(2) interface that may not
explicitly forbid extremely large reads on 64-bit platforms -- even larger
than the 2GB limit imposed on devfs I/O by default. Such reads, if they are
to be allowed, should be cancellable by the user or administrator.
Ian Lepore [Fri, 16 Mar 2018 18:16:27 +0000 (18:16 +0000)]
Use EFI RTC capabilities info when registering, add bootverbose diagnostics.
Make some small improvements to the efirtc driver by obtaining the clock
capabilities (resolution and whether the sub-second counters are reset) and
using the info when registering the clock. When the hardware zeroes out the
subsecond info on clock-set, schedule clock updates to happen just before
top-of-second, so that the RTC time is closely in-sync with kernel time.
Also, in the identify() routine, always add the driver if EFI runtime
services are available, then decide in probe() whether to attach the driver
or not. If not attaching and bootverbose is on, say why. All of this is
basically to avoid "silent failure" -- if someone thinks there should be an
efi rtc and it's not attaching, at least they can set bootverbose and maybe
get a clue from the output.
Dimitry Andric [Fri, 16 Mar 2018 18:04:13 +0000 (18:04 +0000)]
Pull in r321999 from upstream clang trunk (by Ivan A. Kosarev):
[CodeGen] Fix TBAA info for accesses to members of base classes
Resolves:
Bug 35724 - regression (r315984): fatal error: error in backend:
Broken function found (Did not see access type in access path!)
https://bugs.llvm.org/show_bug.cgi?id=35724
Michael Tuexen [Fri, 16 Mar 2018 15:26:07 +0000 (15:26 +0000)]
Set the inp_vflag consistently for accepted TCP/IPv6 connections when
net.inet6.ip6.v6only=0.
Without this patch, the inp_vflag would have INP_IPV4 and the
INP_IPV6 flags for accepted TCP/IPv6 connections if the sysctl
variable net.inet6.ip6.v6only is 0. This resulted in netstat
to report the source and destination addresses as IPv4 addresses,
even they are IPv6 addresses.
PR: 226421
Reviewed by: bz, hiren, kib
MFC after: 3 days
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D13514
Ed Maste [Fri, 16 Mar 2018 14:46:38 +0000 (14:46 +0000)]
Share a single bsd-linux errno table across MD consumers
Three copies of the linuxulator linux_sysvec.c contained identical
BSD to Linux errno translation tables, and future work to support other
architectures will also use the same table. Move the table to a common
file to be used by all. Make it 'const int' to place it in .rodata.
(Some existing Linux architectures use MD errno values, but x86 and Arm
share the generic set.)
This change should introduce no functional change; a followup will add
missing errno values.
Conrad Meyer [Fri, 16 Mar 2018 07:11:53 +0000 (07:11 +0000)]
Garbage collect unused chacha20 code
Two copies of chacha20 were imported into the tree on Apr 15 2017 (r316982)
and Apr 16 2017 (r317015). Only the latter is actually used by anything, so
just go ahead and garbage collect the unused version while it's still only
in CURRENT.
I'm not making any judgement on which implementation is better. If I pulled
the wrong one, feel free to swap the existing implementation out and replace
it with the other code (conforming to the API that actually gets used in
randomdev, of course). We only need one generic implementation.
Warner Losh [Fri, 16 Mar 2018 05:23:48 +0000 (05:23 +0000)]
Try polling the qpairs on timeout.
On some systems, we're getting timeouts when we use multiple queues on
drives that work perfectly well on other systems. On a hunch, Jim
Harris suggested I poll the completion queue when we get a timeout.
This patch polls the completion queue if no fatal status was
indicated. If it had pending I/O, we complete that request and
return. Otherwise, if aborts are enabled and no fatal status, we abort
the command and return. Otherwise we reset the card.
This may clear up the problem, or we may see it result in lots of
timeouts and a performance problem. Either way, we'll know the next
step. We may also need to pay attention to the fatal status bit
of the controller.
PR: 211713
Suggested by: Jim Harris
Sponsored by: Netflix