erj [Wed, 27 Feb 2019 22:26:18 +0000 (22:26 +0000)]
ixgbe(4): Fix panic triggered by assertion from interrupt
r344162 exposed a bug in one of ixgbe's interrupt filters; they are never
supposed to return 0. Fix the interrupt filter to return the proper nonzero
return value.
imp [Wed, 27 Feb 2019 22:16:59 +0000 (22:16 +0000)]
Unconditionally support unmapped BIOs. This was another shim for
supporting older kernels. However, all supported versions of FreeBSD
have unmapped I/Os (as do several that have gone EOL), remove it. It's
unlikely the driver would work on the older kernels anyway at this
point.
imp [Wed, 27 Feb 2019 22:05:01 +0000 (22:05 +0000)]
Remove #ifdef code to support FreeBSD versions that haven't been
supported in years. A number of changes have been made to the driver
that likely wouldn't work on those older versions that aren't properly
ifdef'd and it's project policy to GC such code once it is stale.
mav [Wed, 27 Feb 2019 21:29:21 +0000 (21:29 +0000)]
Refactor command ordering/blocking mechanism in CTL.
Replace long per-LUN queue of blocked commands, scanned on each command
completion and sometimes even twice, causing up to O(n^^2) processing cost,
by much shorter per-command blocked queues, scanned only when respective
command completes, and check only commands before the previous blocker,
reducing cost to O(n).
While there, unblock aborted commands to make them "complete" ASAP to be
removed from the OOA queue and so not waste time ordering other commands
against them. Aborted commands that were not sent to execution yet should
have no visible side effects, so this is safe and easy optimization now,
comparing to commands already in processing, which are a still pain.
Together those two optimizations should fix quite pathological case, when
due to backend slowness CTL accumulated many thousands of blocked requests,
partially aborted by initiator and so supposedly not even existing, but
still wasting CTL CPU time.
manu [Wed, 27 Feb 2019 21:04:40 +0000 (21:04 +0000)]
xhci_mv: Move the driver to generic_xhci
Marvell XHCI is in fact generic-xhci, so move the driver and
add the compatible string.
While here, get and enable the phy if the dtb provide one.
The xhci bindings state that phys should be in a 'phys' property but
Marvell DTS uses 'usb-phy', only add support for 'usb-phy' for now.
jhb [Wed, 27 Feb 2019 20:24:23 +0000 (20:24 +0000)]
Various cleanups to the management of multiple TCP stacks.
- Use strlcpy() with sizeof() instead of strncpy().
- Simplify initialization of TCP functions structures.
init_tcp_functions() was already called before the first call to
register a stack. Just inline the work in the SYSINIT and remove
the racy helper variable. Instead, KASSERT that the rw lock is
initialized when registering a stack.
- Protect the default stack via a direct pointer comparison.
The default stack uses the name "freebsd" instead of "default" so
this protection wasn't working for the default stack anyway.
manu [Wed, 27 Feb 2019 17:30:28 +0000 (17:30 +0000)]
mmc: dwmmc: Match on "rockchip,rk3288-dw-mshc" compatible
This is the common denominator for rockchip compatible from RK3288 to RK3399.
The other compatible are generally present in the DTS but the controllers
are the same.
manu [Wed, 27 Feb 2019 14:20:28 +0000 (14:20 +0000)]
arm64: rockchip: clk_pll: Multiple improvement
Remove the mode_val from the clock definition as it's a bit unreadable.
Use mode_shift to represent which bit control the mode in the register.
Simplify some case where we can avoid a register read before changing it.
Set the PLL back to normal mode after the PLL have stabilized.
bapt [Wed, 27 Feb 2019 13:49:41 +0000 (13:49 +0000)]
Fix a regression introduced in r344569
Import a fix from illumos (thanks Toomas Soomas for pointing at it)
See https://www.illumos.org/issues/10205 for more details
Illumos commit: https://github.com/illumos/illumos-gate/commit/247b7da039fd88350c50e3d7fef15bdab6bef215
luporl [Wed, 27 Feb 2019 13:24:42 +0000 (13:24 +0000)]
Fix kldxref on PowerPC64
When using kldxref on kernel modules built with clang8 + lld8,
kldxref would be unable to find the modules metadata information,
because PowerPC64 was using the ef_nop.c implementation of
ef_reloc().
When GNU LD was used, it was also relocating the metadata section of
the .ko file. LLD does not do this, but only generate dynamic
relocations for it. With minor changes, ef_powerpc.c can now work
for PowerPC64 too.
ian [Wed, 27 Feb 2019 04:58:18 +0000 (04:58 +0000)]
Add support to fdt_slicer for the new style partition data documented in
devicetree/bindings/mtd/partition.txt.
In the old style, all the children of the device node which did not have a
compatible property were the partitions. In the new style, there is a child
node of the device which has a compatible string of "fixed-partitions", and
its children are the individual partitions.
Also, support the read-only property by setting the corresponding slice flag.
ian [Wed, 27 Feb 2019 04:16:32 +0000 (04:16 +0000)]
Rename some functions and variables to have shorter names, which allows
unwrapping multiple lines of code. Also, convert some short multiline
comments into single-line comments. Change old-school FALSE to false.
All in all, no functional changes, it's just more compact and readable.
jhibbits [Wed, 27 Feb 2019 03:30:49 +0000 (03:30 +0000)]
powerpc/mpc85xx: Synchronize timebase the platform correct way
Summary:
To safely synchronize timebase we need to disable the timebase on all
cores, set timebase, and resynchronize. This adds two new devices, mutually
exclusive, which attach on the SoC simplebus, to freeze and unfreeze the
timebase. The devices are singletons, and platform-specific, so no reason
to make them optional and in separate files.
This was found to be necessary for top(1) to work correctly on an AmigaOne
X5000 (P5020 SoC). It also fixes bufdaemon and bufspacedaemon hangs at
shutdown.
Test Plan: Regression test on various Book-E hardware.
kib [Tue, 26 Feb 2019 19:55:03 +0000 (19:55 +0000)]
Modularize xz.
Embedded lzma decompression library becomes a module usable by other
consumers, in addition to geom_uzip.
Most important code changes are
- removal of XZ_DEC_SINGLE define, we need the code to work
with XZ_DEC_DYNALLOC;
- xz_crc32_init() call is removed from geom_uzip, xz module handles
initialization on its own.
xz is no longer embedded into geom_uzip, instead the depend line for
the module is provided, and corresponding kernel option is added to
each MIPS kernel config file using geom_uzip.
The commit also carries unrelated cleanup by removing excess "device geom_uzip"
in places which were missed in r344479.
markj [Tue, 26 Feb 2019 16:34:43 +0000 (16:34 +0000)]
Remove illumos-specific code from the x86 fasttrap_isa.c.
The file has not been touched upstream in over a decade, and the nature
of the code means that a lot of FreeBSD-specific bits are required. Remove
the dead code to improve readability. No functional change intended.
Discussed with: cem
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
markj [Tue, 26 Feb 2019 16:31:47 +0000 (16:31 +0000)]
Remove stub fasttrap implementations.
No platforms except i386, amd64 and powerpc implement fasttrap; the
fasttrap files for other arches do not contain any code and bloat
the output from cscope, so just remove them.
manu [Tue, 26 Feb 2019 13:18:14 +0000 (13:18 +0000)]
arm64: rockchip: rk805: Map the regulator
No map function was provided before so every regulator lookup resolved
the regulator with id 1, as it uses the default mapper, which is wrong.
Correctly map the regulators.
While here remove some debug printfs and make them disable by default.
manu [Tue, 26 Feb 2019 13:16:05 +0000 (13:16 +0000)]
arm64: rockchip: rk3328_pll: Multiple improvement
RockChip clocks register have a write mask in the upper 16 bits, if a 1
is present the corresponding bit in the lower 16 ones is set.
Use this instead of always setting the mask to 0xFFFF0000.
This avoids a read of the register.
While here, when switching PLL frequency, first switch it to slow mode.
When set to slow mode the PLL clock will be the external oscillator.
Changing the PLL parameters while its output is used can cause hang (sometimes).
manu [Tue, 26 Feb 2019 13:15:31 +0000 (13:15 +0000)]
arm64: rockchip: clk: ARM CLK improvement
RockChip clocks register have a write mask in the upper 16 bits, if a 1
is present the corresponding bit in the lower 16 ones is set.
Use this instead of always setting the mask to 0xFFFF0000.
This avoids a read of the register.
While here set the parent after changing its freqeuncy, this reduce the time
between changing the parent and changing the divider for the arm clock.
manu [Tue, 26 Feb 2019 13:14:49 +0000 (13:14 +0000)]
arm64: rockchip: clk: rk_clk_composite: Properly use the mask bits
RockChip clocks register have a write mask in the upper 16 bits, if a 1
is present the corresponding bit in the lower 16 ones is set.
Use this instead of always setting the mask to 0xFFFF0000.
This avoids a read of the register.
While here add some debug printf useful for debuging clock problems
kib [Tue, 26 Feb 2019 09:45:44 +0000 (09:45 +0000)]
i386 PAE: avoid atomic for pte_store() where possible.
Instead carefully write upper word, and only than the lower word with
PG_V, for previously invalid ptes. It provides some measurable system
time saving on buildworld.
Reviewed by: markj
Tested by: pho
Measured by: bde (early version)
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D19226
bde [Tue, 26 Feb 2019 09:44:10 +0000 (09:44 +0000)]
Attempt to fix build breakage in r344458.
Non-x86 arches use an inconsistently named header for the file containing
"pc" attributes, and the ifdef messes to include the right header were out
of date in the 2 files that I added to the MI files list.
Only amd64, arm, i386, mips, powerpc and sparc64 are supposed to support
syscons. Only arm and mips were out of date in the ifdef. Test
coverage for of syscons in arm is broken (turned off) in NOTES, but
syscons is in some other arm config files which universe detects as broken.
arm64 and riscv remain broken due to the opposite bug of not turning off
sc in NOTES, the same as before r344458 (see r344443).
The header is MD to contain possibly-non-"pc" encodings of attributes, but
since the attributes are essentially virtual in graphics mode and non-x86
arches only support graphics mode, the header has always been the same on
all arches except for different style bugs, so there should be only 1 MI
copy of it for syscons' use. It was used in pcvt and still gives an an
API and an ABI, so it should be public and MI near or in sys/consio.h.
bapt [Tue, 26 Feb 2019 08:18:34 +0000 (08:18 +0000)]
Implement parallel mounting for ZFS filesystem
It was first implemented on Illumos and then ported to ZoL.
This patch is a port to FreeBSD of the ZoL version.
This patch also includes a fix for a race condition that was amended
With such patch Delphix has seen a huge decrease in latency of the mount phase
(https://github.com/openzfs/openzfs/commit/a3f0e2b569 for details).
With that current change Gandi has measured improvments that are on par with
those reported by Delphix.
jah [Tue, 26 Feb 2019 04:56:10 +0000 (04:56 +0000)]
FFS: allow sendfile(2) to work with block sizes greater than the page size
Implement ffs_getpages_async(), which when possible calls the asynchronous
flavor of the generic pager's getpages function. When the underlying
block size is larger than the system page size, however, it will invoke
the (synchronous) buffer cache pager, followed by a call to the client
completion routine. This retains true asynchronous completion in the most
common (block size <= page size) case, which is important for the performance
of the new sendfile(2). The behavior in the larger block size case mirrors
the default implementation of VOP_GETPAGES_ASYNC, which most other
filesystems use anyway as they do not override the getpages_async method.
kevans [Tue, 26 Feb 2019 03:37:12 +0000 (03:37 +0000)]
stand: Remove unused i386 EFI MD bits
r328169 removed the copy of bootinfo that would've made this somewhat
functional. However, this is irrelevant- earlier work in r292338 was done to
exit boot services in the MI bi_load() rather than having N copies of the
GetMemoryMap/ExitBootServices dance.
i386 never quite caught up to that; ldr_enter was still being called but
the prereq for that, ldr_bootinfo, was no longer. As a consequence, this
ExitBootServices() was being called with a mapkey=0, clearly bogus, and
reportedly breaking the boot in some instances.
asomers [Tue, 26 Feb 2019 03:34:47 +0000 (03:34 +0000)]
ifconfig: eliminate trailing whitespace
Eliminate trailing whitespace on inet, inet6, and groups lines. I think the
"list txpower" command will still show some, but I'm not able to test that.
sobomax [Mon, 25 Feb 2019 23:45:36 +0000 (23:45 +0000)]
Improve error handling: bail out if one of the files scheduled
to go to the FS image we are making cannot be read (e.g. EPERM).
Current behaviour when we issue waring but still proceeed and
return success is definitely not correct: masking out error
condition as well as making a slighly inconsistent FS where
attempt to access the file in question ends up in EBADF. See
linked DR for details.
mckusick [Mon, 25 Feb 2019 21:58:19 +0000 (21:58 +0000)]
After a crash, a file that extends into indirect blocks may end up
shorter than its size resulting in a hole as its final block (which
is a violation of the invarients of the UFS filesystem).
Soft updates will always ensure that the file size is correct when
writing inodes to disk for files that contain only direct block
pointers. However soft updates does not roll back sizes for files
with indirect blocks that it has set to unallocated because their
contents have not yet been written to disk. Hence, the file can
appear to have a hole at its end because the block pointer has been
rolled back to zero when its inode was written to disk. Thus,
fsck_ffs calculates the last allocated block in the file. For files
that extend into indirect blocks, fsck_ffs checks for a size past
the last allocated block of the file and if that is found, shortens
the file to reference the last allocated block thus avoiding having
it reference a hole at its end.
markj [Mon, 25 Feb 2019 19:47:27 +0000 (19:47 +0000)]
Fix handling of rights on stdio streams, take two.
Split the rights-limiting code into two cases: if one of the input
files isn't a regular file, use caph_limit_stream(3) instead of
open-coding the same logic; if both input files are regular files,
and the initial attempts to map them succeed, we limit the rights on
those files to CAP_MMAP_R.
Add a regression test for PR 234885.
PR: 234885
Reviewed by: delphij
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D19216
markj [Mon, 25 Feb 2019 19:22:13 +0000 (19:22 +0000)]
Improve vmem tuning for platforms without a direct map.
On platforms without a direct map (i.e., platforms without
UMA_MD_SMALL_ALLOC defined), the boundary tag allocator reserves a
number of tags for use when allocating a new slab of boundary tags,
as such platforms require free boundary tags in order to allocate
boundary tags. r327899 increased the number of boundary tags required
for a KVA allocation in the worst case, and the aforementioned
reservation was not updated accordingly. In some cases, this could
lead to a system hang. Fix the problem by increasing this reservation.
Also reduce KVA_QUANTUM on systems lacking superpage support.
The previous import quantum (4MB with a 4KB page size) was quite large
for systems with limited KVA, and fragmentation in kernel_arena could
cause kernel memory allocation failures even with a substantial amount
of free KVA.
Reported and tested by: jhibbits
Reviewed by: alc, kib
No objections: jeff
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D19337
sef [Mon, 25 Feb 2019 19:14:16 +0000 (19:14 +0000)]
Fix another bug introduced during the review process of r344140:
the tag wasn't being computed properly due to chaning a >= comparison
to an == comparison.
Specifically: CBC-MAC encodes the length of the authorization data
into the the stream to be encrypted/hashed. For short data, this is
two bytes (big-endian 16 bit value); for larger data, it's 6 bytes
(a prefix of 0xff, 0xfe, followed by a 32-bit big-endian length). And
there's a larger size, which is 10 bytes. These extra bytes weren't
being accounted for with the post-review code. The other bit that then came
into play was that OCF only calls the Update code with blksiz=16, which
meant that I had to ignore the length variable. (It also means that it
can't be called with a single buffer containing the AAD and payload;
however, OCF doesn't do this for the software-only algorithsm.)
I tested with this script:
ALG=aes-ccm
DEV=soft
for aad in 0 1 2 3 4 14 16 24 30 32 34 36 1020
do
for dln in 16 32 1024 2048 10240
do
echo "Testing AAD length ${aad} data length ${dln}"
/root/cryptocheck -A ${aad} -a ${ALG} -d ${DEV} ${dln}
done
done
emaste [Mon, 25 Feb 2019 18:22:20 +0000 (18:22 +0000)]
Make libifconfig INTERNALLIB
Instead of PRIVATELIB + NO_PIC. This avoids the need for the wlandebug
PIE special case added in r344211, and provides a stronger guarantee
against 3rd party software coming to depend on the API or ABI.
If / when we declare the API/ABI to be stable we can make it a normal
library.
Discussed with: bapt
Sponsored by: The FreeBSD Foundation
manu [Mon, 25 Feb 2019 17:40:00 +0000 (17:40 +0000)]
arm64: rockchip: clk: Set the write mask when setting the clock mux
RockChip clocks have a write mask in the upper 16bits of the mux register
which wasn't set in the set_mux function.
Also the wrong parent was tested instead of the real current one, when
switch parent, test with the current one before.
ian [Mon, 25 Feb 2019 17:30:01 +0000 (17:30 +0000)]
Resolve a name conflict when both SpiFlash and DataFlash devices are present.
Both SpiFlash (mx25l) and DataFlash (at45d) drivers create a disk device
with a name of /dev/flash/spiN where N is the driver's unit number. If
both types of devices are present in the same system, this creates a fatal
conflict that prevents attachment of whichever device attaches second
(because mx25l0 and at45d0 both try to create a spi0).
This gives each type of device a unique name (mx25lN or at45dN respectively)
and also adds an alias of spiN for compatibility. When both device types
appear in the same system, only the first to attach gets the spiN alias.
When the second device attaches there is a non-fatal warning that the alias
can't be created, but both devices are still accessible via their primary
names (and there is no need for the spiN name to work for backwards
compatibility on such a system, because it has never been possible to use
the spiN names when both devices exist).
ian [Mon, 25 Feb 2019 16:40:10 +0000 (16:40 +0000)]
Add a metadata entry for the AT45DB641E chip. This chip has the same 3-byte
jedec ID as its older cousin the AT45DB642D, but uses a different page size.
The only way to distinguish between the two chips is that the 2D chip has
0 bytes of extended ID info and the new 1E has 1 byte of extended ID. The
actual value of the extended ID byte is all zeroes. In other words, it's
the presence of the extended info that identifies this chip. (Presumably
a future upgrade might define non-zero values for the extended ID byte.)
np [Mon, 25 Feb 2019 16:28:13 +0000 (16:28 +0000)]
cxgbe(4): Updates to the default and hashfilter configurations.
- Do not use nvf = 4 as it is not really supported by the firmware.
Firmwares 1.23.3.0 and above will ignore it silently.
- Increase PF4's share of the VIs and let it use all of the RSS table.
ian [Mon, 25 Feb 2019 16:20:58 +0000 (16:20 +0000)]
Include the jedec "extended device information string" in the criteria used
to match a chip to our table of metadata describing the chips. At least one
new DataFlash chip has a 3-byte jedec ID identical to its predecessors and
differs only in the extended info, and it has different metadata requiring a
unique entry in the table. This paves the way for supporting such chips.
The metadata table now includes two new fields, extmask and extid. The two
bytes of extended info obtained from the chip are ANDed with extmask then
compared to extid, so it's possible to use only a subset of the extended
info in the matching.
We now always read 6 bytes of jedec ID info. Most chips don't return any
extended info, and the values read back for those two bytes may be
indeterminate, but such chips have extmask and extid values of 0x0000 in the
table, so the extid effectively doesn't participate in the matching on those
chips and it doesn't matter what they return in the extended info bytes.
0mp [Mon, 25 Feb 2019 15:03:50 +0000 (15:03 +0000)]
Add missing types to the sysctl(9) manual page
Update the diff to include other missing sysctl types found in sysctl.h.
Some of these sysctls are already documented in other pages (e.g counter(9)
and ZONE(9)), but they should at least be mentioned here for completeness.
This patch now documents all of the following:
- SYSCTL_BOOL/SYSCTL_ADD_BOOL
- SYSCTL_COUNTER_U64/SYSCTL_ADD_COUNTER_U64
- SYSCTL_COUNTER_U64_ARRAY/SYSCTL_ADD_COUNTER_U64_ARRAY
- SYSCTL_SBINTIME_MSEC/SYSCTL_ADD_SBINTIME_MSEC
- SYSCTL_SBINTIME_USEC/SYSCTL_ADD_SBINTIME_USEC
- SYSCTL_UMA_CUR/SYSCTL_ADD_UMA_CUR
- SYSCTL_UMA_MAX/SYSCTL_ADD_UMA_MAX
andrew [Mon, 25 Feb 2019 13:15:34 +0000 (13:15 +0000)]
Check the index hasn't changed after writing the cmp entry.
If an interrupt fires while writing the cmp entry we may have a partial
entry. Work around this by using atomic_cmpset to set the new index. If it
fails we need to set the previous index value and try again as the entry
may be in an inconsistent state.
This fixes messages similar to the following from syzkaller:
bad comp 224 type 2163727253
ian [Mon, 25 Feb 2019 03:29:12 +0000 (03:29 +0000)]
Switch to using config_intrhook_oneshot(). That allows the error handling
in the delayed attach to use early returns, which allows reducing the level
of indentation. So all in all, what looks like a lot of changes is really
no change in behavior, mostly just moving whitespace around.
bz [Sun, 24 Feb 2019 22:49:56 +0000 (22:49 +0000)]
Make arp code return (more) errors.
arprequest() is a void function and in case of error we simply
return without any feedback. In case of any local operation
or *if_output() failing no feedback is send up the stack for the
packet which triggered the arp request to be sent.
arpresolve_full() has three pre-canned possible errors returned
(if we have not yet sent enough arp requests or if we tried
often enough without success) otherwise "no error" is returned.
Make arprequest() an "internal" function arprequest_internal() which
does return a possible error to the caller. Preserve arprequest()
as a void wrapper function for external consumers.
In arpresolve_full() add an extra error checking. Use the
arprequest_internal() function and only return an error if non
of the three ones (mentioend above) are already set.
This will return possible errors all the way up the stack and
allows functions and programs to react on the send errors rather
than leaving them in the dark. Also they might get more detailed
feedback of why packets cannot be sent and they will receive it
quicker.
jilles [Sun, 24 Feb 2019 21:05:13 +0000 (21:05 +0000)]
sh: Add set -o pipefail
The pipefail option allows checking the exit status of all commands in a
pipeline more easily, at a limited cost of complexity in sh itself. It works
similarly to the option in bash, ksh93 and mksh.
Like ksh93 and unlike bash and mksh, the state of the option is saved when a
pipeline is started. Therefore, even in the case of commands like
A | B &
a later change of the option does not change the exit status, the same way
(A | B) &
works.
Since SIGPIPE is not handled specially, more work in the script is required
for a proper exit status for pipelines containing commands such as head that
may terminate successfully without reading all input. This can be something
like
(
cmd1
r=$?
if [ "$r" -gt 128 ] && [ "$(kill -l "$r")" = PIPE ]; then
exit 0
else
exit "$r"
fi
) | head
wulf [Sun, 24 Feb 2019 18:47:04 +0000 (18:47 +0000)]
evdev: export event device properties through sysctl interface
A big security advantage of Wayland is not allowing applications to read
input devices all the time. Having /dev/input/* accessible to the user
account subverts this advantage.
libudev-devd was opening the evdev devices to detect their types (mouse,
keyboard, touchpad, etc). This don't work if /dev/input/* is inaccessible.
With the kernel exposing this information as sysctls (kern.evdev.input.*),
we can work w/o /dev/input/* access, preserving the Wayland security model.