Ian Lepore [Sun, 30 Jul 2017 16:17:06 +0000 (16:17 +0000)]
Switch from using iic_transfer() to iicdev_readfrom/writeto(), mostly so
that transfers will be done with proper ownership of the bus. No
behavioral changes.
Alexander Motin [Sun, 30 Jul 2017 15:19:07 +0000 (15:19 +0000)]
Attach ichwd(4) only to ISA bus of the LPC bridge.
Resource allocation for parent device does not look good by itself, but
attempt to allocate them for unrelated device just does not end up good.
On Asus X99-E WS/USB3.1 system reporting ISA bridge via both PCI and ACPI
this reported to cause kernel panic on shutdown due to messed resources:
https://bugs.freenas.org/issues/25237.
Pull in r309503 from upstream clang trunk (by Richard Smith):
PR33902: Invalidate line number cache when adding more text to
existing buffer.
This led to crashes as the line number cache would report a bogus
line number for a line of code, and we'd try to find a nonexistent
column within the line when printing diagnostics.
This fixes an assertion when building the graphics/champlain port.
Scott Long [Sun, 30 Jul 2017 06:53:58 +0000 (06:53 +0000)]
Split the interrupt setup code into two parts: allocation and configuration.
Do the allocation before requesting the IOCFacts message. This triggers
the LSI firmware to recognize the multiqueue should be enabled if available.
Multiqueue isn't used by the driver yet, but this also fixes a problem with
the cached IOCFacts not matching latter checks, leading to potential problems
with error recovery.
As a side-effect, fetch the driver tunables as early as possible.
Ian Lepore [Sat, 29 Jul 2017 23:45:57 +0000 (23:45 +0000)]
Replace the pcf8563 i2c RTC driver with a new nxprtc driver which handles
all the chips in the NXP PCA212x and PCA/PCF85xx series. In addition to
supporting more chips, this driver uses the countdown timer on the chips as
a fractional seconds counter, giving it a resolution of about 15 milliseconds.
Rick Macklem [Sat, 29 Jul 2017 20:08:25 +0000 (20:08 +0000)]
Add a new "-N" option to umount(8), that does a forced dismount of an NFS mount
point.
The new "-N" option does a forced dismount of an NFS mount point, but avoids
doing any checking of the mounted-on path, so that it will not get hung
when a vnode lock is held by another hung process on the mounted-on vnode.
The most common case of this is a "umount" with the "-f" option.
Other than avoiding checking the mounted-on path, it performs the same
forced dismount as a successful "umount -f" would do.
This commit includes a content change to the man page.
Rick Macklem [Sat, 29 Jul 2017 19:52:47 +0000 (19:52 +0000)]
Add kernel support for the NFS client forced dismount "umount -N" option.
When an NFS mount is hung against an unresponsive NFS server, the "umount -f"
option can be used to dismount the mount. Unfortunately, "umount -f" gets
hung as well if a "umount" without "-f" has already been done. Usually,
this is because of a vnode lock being held by the "umount" for the mounted-on
vnode.
This patch adds kernel code so that a new "-N" option can be added to "umount",
allowing it to avoid getting hung for this case.
It adds two flags. One indicates that a forced dismount is about to happen
and the other is used, along with setting mnt_data == NULL, to handshake
with the nfs_unmount() VFS call.
It includes a slight change to the interface used between the client and
common NFS modules, so I bumped __FreeBSD_version to ensure both modules are
rebuilt.
Ian Lepore [Sat, 29 Jul 2017 17:00:23 +0000 (17:00 +0000)]
Add inline functions to convert between sbintime_t and decimal time units.
Use them in some existing code that is vulnerable to roundoff errors.
The existing constant SBT_1NS is a honeypot, luring unsuspecting folks into
writing code such as long_timeout_ns*SBT_1NS to generate the argument for a
sleep call. The actual value of 1ns in sbt units is ~4.3, leading to a
large roundoff error giving a shorter sleep than expected when multiplying
by the trucated value of 4 in SBT_1NS. (The evil honeypot aspect becomes
clear after you waste a whole day figuring out why your sleeps return early.)
Don't use libc++ when cross-building for gcc arches
Since we imported clang 5.0.0, the version check in Makefile.inc1 which
checks whether to use libc++ fires even when the compiler for the target
architecture is gcc 4.2.1. This is because only X_COMPILER_VERSION is
checked. Also check X_COMPILER_TYPE, so it will only use libc++ when an
external gcc toolchain is used.
Currently in Virtio driver without TSO/GSO features enabled, the max scatter
gather segments for the TX path can be 4, which limits the support for 9K JUMBO
frames. 9K JUMBO frames results in more than 4 scatter gather segments and
virtio driver fails to send the frame down to host OS. With TSO/GSO feature
enabled max scatter gather segments can be 64, then 9K JUMBO frames are fine,
this is making virtio driver to support JUMBO frames only with TSO/GSO.
Increasing the VTNET_MIN_TX_SEGS which is the case for non TSO/GSO to 32 to
support upto 64K JUMBO frames to Host.
Ed Schouten [Sat, 29 Jul 2017 08:35:07 +0000 (08:35 +0000)]
Be a bit more liberal about sysctl naming.
On the systems on which I tested this exporter, I never ran into metrics
that were named in such a way that they couldn't be exported to
Prometheus metrics directly. Now it turns out that on systems with NUMA,
the sysctl tree contains metrics named dev.${driver}.${index}.%domain.
For these metrics, the % in the name is problematic, as Prometheus
doesn't allow this symbol to be used.
Remove the assertions that were originally put in place to prevent the
exporter from generating malformed output and add code to deal with it
accordingly. For metric names, convert any unsupported character to an
underscore. For label values, perform string escaping.
Rick Macklem [Sat, 29 Jul 2017 02:25:49 +0000 (02:25 +0000)]
Fix possible crash for the NFSv4.1 pNFS client.
If the nfsrpc_createlayoutrpc() call in nfsrpc_getcreatelayout() fails,
the code used nfhpp when it might be set NULL. This patch checks for
the error cases (laystat != 0) and avoids using nfhpp for the failure case.
This would only affect NFSv4.1 mounts with the "pnfs" option.
Found while testing the "umount -N" patch not yet in head.
Rick Macklem [Fri, 28 Jul 2017 21:07:57 +0000 (21:07 +0000)]
Modify /etc/rc.d/nfsd so it doesn't force a startup of nfsuserd for NFSv4.
Given that RFC7530 allows uid/gids to be placed in owner/owner_group
strings directly, many NFSv4 environments don't need the nfsuserd.
This small patch modified /etc/rc.d/nfsd so that it does not force
startup of the nfsuserd daemon unless nfs_server_managegids is enabled.
This implies that nfsuserd_enable="YES" must be added to /etc/rc.conf
for NFSv4 server environments that use Kerberos mounts or clients that
do not support the uid/gid in string capability.
Since this could be considered a POLA violation, it will not be MFC'd.
Pull in r308891 from upstream llvm trunk (by Benjamin Kramer):
[CodeGenPrepare] Cut off FindAllMemoryUses if there are too many uses.
This avoids excessive compile time. The case I'm looking at is
Function.cpp from an old version of LLVM that still had the giant
memcmp string matcher in it. Before r308322 this compiled in about 2
minutes, after it, clang takes infinite* time to compile it. With
this patch we're at 5 min, which is still bad but this is a
pathological case.
The cut off at 20 uses was chosen by looking at other cut-offs in LLVM
for user scanning. It's probably too high, but does the job and is
very unlikely to regress anything.
Fixes PR33900.
* I'm impatient and aborted after 15 minutes, on the bug report it was
killed after 2h.
Pull in r308986 from upstream llvm trunk (by Simon Pilgrim):
[X86][CGP] Reduce memcmp() expansion to 2 load pairs (PR33914)
D35067/rL308322 attempted to support up to 4 load pairs for memcmp
inlining which resulted in regressions for some optimized libc memcmp
implementations (PR33914).
Until we can match these more optimal cases, this patch reduces the
memcmp expansion to a maximum of 2 load pairs (which matches what we
do for -Os).
This patch should be considered for the 5.0.0 release branch as well
Sean Bruno [Fri, 28 Jul 2017 18:11:53 +0000 (18:11 +0000)]
binmiscctl should use modfind instead of kldfind
kldfind() only matches kernel modules, so if you link imgact_binmisc directly
into the kernel, binmiscctl can't find it, tries to load it, and errors
out with:
Can't load imgact_binmisc kernel module: File exists
A quick search of other base commands shows that the correct procedure is to
call modfind(), and then try kldload() if that fails.
PR: 218593
Submitted by: Dan Nelson <dnelson_1901@yahoo.com>
MFC after: 1 week
Marcin Wojtas [Fri, 28 Jul 2017 11:51:55 +0000 (11:51 +0000)]
Fix remapping VM attributes on Armada 38x
pmap_remap_vm_attr() function requires indexes to
pte2_attr_tab as the arguments (VM_MEMATTR_).
Mistakenly, instead of them, actual values from the
table were used (PTE2_ATTR_), when applying
work-around for Marvell Armada 38x SoCs.
Rick Macklem [Thu, 27 Jul 2017 20:55:31 +0000 (20:55 +0000)]
Replace the checks for MNTK_UNMOUNTF with a macro that does the same thing.
This patch defines a macro that checks for MNTK_UNMOUNTF and replaces
explicit checks with this macro. It has no effect on semantics, but
prepares the code for a future patch where there will also be a
NFS specific flag for "forced dismount about to occur".
Fix probing FC targets with hard addressing turned on.
This largely reverts FreeBSD SVN change 289937 from October 25th, 2015.
The intent of that change was to keep loop IDs persistent across
chip reinits.
The problem is that the change turned on the PREVLOOP /
PREV_ADDRESS bit (bit 7 in Firmware Options 2), which tells the
Qlogic chip to not participate in the loop if it can't get the
requested loop address. It also turned off soft addressing on 2400
(4Gb) and newer controllers.
The isp(4) driver defaults to loop address 0, and the tape drives
I have tested default to loop address 0 if hard addressing is turned
on. So when hard loop addressing is turned on on the drive, the isp(4)
driver just refuses to participate in the loop.
The solution is to largely revert that change. I left some elements
in place that are related to virtual ports, since they were new.
This does work with IBM tape drives with hard and soft addressing
turned on. I have tested it with 4Gb, 8Gb, and 16Gb controllers.
sys/dev/isp.c:
Largely revert FreeBSD SVN change 289937. I left the
ispmbox.h changes in place.
Don't use the PREV_ADDRESS bit on initialization. It tells
the chip to not participate if it can't get the requested
loop ID.
Do use soft addressing on 2400 and newer chips.
Use hard addressing when the user has requested a specific
initiator ID. (hint.isp.X.iid=N in /boot/loader.conf)
Leave some of the virtual port options from that change in
place, but don't turn on the PREV_ADDRESS bit.
Andrew Turner [Thu, 27 Jul 2017 15:06:34 +0000 (15:06 +0000)]
Always set the receive mask in loader.efi. Some UEFI implementations set
this to be too restrictive. We need to have both broadcast and unicast
enabled for loader to work. Set them in all cases to ensure this is true.
This allows the Cavium ThunderX 2s in the netperf cluster to netboot using
a USB NIC.
After inpcb route caching was put back in place there is no need for
flowtable anymore (as flowtable was never considered to be useful in
the forwarding path).
Ed Maste [Thu, 27 Jul 2017 12:29:31 +0000 (12:29 +0000)]
genericize target exclusion for missing external toolchain
Previously we excluded riscv from make universe / tinderbox if the
required xtoolchain package was not installed. Make that logic generic
so that we can loop over multiple architectures, in preparation to test
patches to have other architectures rely on external toolchain.
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D11652
Marius Strobl [Wed, 26 Jul 2017 22:04:23 +0000 (22:04 +0000)]
- Check the slot type capability, set SDHCI_SLOT_{EMBEDDED,NON_REMOVABLE}
for embedded slots. Fail in the sdhci(4) initialization for slot type
shared, which is completely unsupported by this driver at the moment. [1]
For Intel eMMC controllers, taking the embedded slot type into account
obsoltes setting SDHCI_QUIRK_ALL_SLOTS_NON_REMOVABLE so remove these quirk
entries.
- Hide the 1.8 V VDD capability when the slot is detected as non-embedded,
as the SDHCI specification explicitly states that 1.8 V VDD is applicable
to embedded slots only. [2]
- Define some easy bits of the SDHCI specification v4.20. [3]
- Don't leak bus_dma(9) resources in failure paths of sdhci_init_slot().
Ed Maste [Wed, 26 Jul 2017 21:23:09 +0000 (21:23 +0000)]
cc_cubic: restore braces around if-condition block
r307901 was reverted in r321480, restoring an incorrect block
delimitation bug present in the original cc_cubic commit. Restore
only the bugfix (brace addition) from r307901.
Ian Lepore [Wed, 26 Jul 2017 21:06:26 +0000 (21:06 +0000)]
Add support for tracking nested calls to iicbus_request/release_bus().
Usually it is sufficient to use iicbus_transfer_excl(), or one of the
higher-level convenience functions that use it, to reserve the bus for the
duration of each register access. Occasionally it is important that a
series of accesses or read-modify-write operations must be done without any
other intervening access to the device, to prevent corrupting state.
Without support for nested request/release, slave device drivers would have
to stop using high-level convenience functions and resort to working with
arrays of iic_msg structs just for a few operations (often involving
one-time device setup or infrequent configuration changes).
The changes here appear large from a glance at the diff, but in fact they're
nearly trivial, and the large diff is because of changes in indentation and
the re-wrapping of comments caused by that. One notable change is that
iicbus_release_bus() now ignores the IICBUS_CALLBACK(IIC_RELEASE_BUS) return
value. The old error handling left the bus in a kind of limbo state where
it was still owned at the iicbus layer, but drivers rarely check the return
of the release call, and it's unclear what they would do to recover from an
error return anyway. No existing low-level drivers return any kind of error
from IIC_RELEASE_BUS except one EINVAL for "you don't own the bus", to which
the right response is probably to carry on with the process of releasing the
reference to the bus anyway.
Ian Lepore [Wed, 26 Jul 2017 20:40:24 +0000 (20:40 +0000)]
Add a pair of convenience routines for doing simple "register" read/writes
on i2c devices, where the "register" can be any length.
Many (perhaps most) common i2c devices are organized as a collection of
(usually 1-byte-wide) registers, and are accessed by first writing a 1-byte
register index/offset number, then by reading or writing the data.
Generally there is an auto-increment feature so the when multiple bytes
are read or written, multiple contiguous registers are accessed.
Most existing slave device drivers allocate an array of iic_msg structures,
fill in all the transfer info, and invoke iicbus_transfer(). These new
functions commonize all that and reduce register access to a simple call
with a few arguments.
Suppose that a file on NFS has partially filled last page, and this
page is dirty. NFS VOP_PAGEOUT() method only marks the the page clean
up to the block of the last written byte, leaving other blocks dirty.
Also any page which erronously exists in the vnode vm_object past EOF
is also left marked as dirty.
With the introduction of the buf-cache coherent pager, each pass of
syncer over the object with such page results in creation of B_DELWRI
buffer due to VOP_WRITE() call. This buffer is noted on next syncer
pass, which results e.g. a visible manifestation of shutdown never
finishing vnode sync. Note that before buf-cache coherency commit, a
dirty page might left never synced to server if a partial writes
occur.
Fix this by clearing dirty bits after EOF. Only blocks of the partial
page which are completely after EOF are marked clean, to avoid
possible user data loss.
Andrew Turner [Wed, 26 Jul 2017 17:39:10 +0000 (17:39 +0000)]
Pass the last exception trap frame to kdb_trap. This allows show registers
in ddb to show the traps registers, and not the registers from within the
panic call.
Ed Schouten [Wed, 26 Jul 2017 06:57:15 +0000 (06:57 +0000)]
Upgrade to the latest sources generated from the CloudABI specification.
The CloudABI specification has had some minor changes over the last half
year. No substantial features have been added, but some features that
are deemed unnecessary in retrospect have been removed:
- mlock()/munlock():
These calls tend to be used for two different purposes: real-time
support and handling of sensitive (cryptographic) material that
shouldn't end up in swap. The former use case is out of scope for
CloudABI. The latter may also be handled by encrypting swap.
Removing this has the advantage that we no longer need to worry about
having resource limits put in place.
- SOCK_SEQPACKET:
Support for SOCK_SEQPACKET is rather inconsistent across various
operating systems. Some operating systems supported by CloudABI (e.g.,
macOS) don't support it at all. Considering that they are rarely used,
remove support for the time being.
- getsockname(), getpeername(), etc.:
A shortcoming of the sockets API is that it doesn't allow you to
create socket(pair)s, having fake socket addresses associated with
them. This makes it harder to test applications or transparently
forward (proxy) connections to them.
With CloudABI, we're slowly moving networking connectivity into a
separate daemon called Flower. In addition to passing around socket
file descriptors, this daemon provides address information in the form
of arbitrary string labels. There is thus no longer any need for
requesting socket address information from the kernel itself.
This change also updates consumers of the generated code accordingly.
Even though system calls end up getting renumbered, this won't cause any
problems in practice. CloudABI programs always call into the kernel
through a kernel-supplied vDSO that has the numbers updated as well.
It is full of distracting noise about UPDATING and may confuse
the user about what is actually being deleted. It is also
possible to have directories removed on every run with
use of WITHOUT_ knobs that the mtree files do not
account for and for which the directories are incorrectly
in OLD_DIRS currently.