ian [Thu, 19 Oct 2017 16:07:57 +0000 (16:07 +0000)]
MFC r323392:
Add gpio methods to read/write/configure up to 32 pins simultaneously.
Sometimes it is necessary to combine several gpio pins into an ad-hoc bus
and manipulate the pins as a group. In such cases manipulating the pins
individualy is not an option, because the value on the "bus" assumes
potentially-invalid intermediate values as each pin is changed in turn. Note
that the "bus" may be something as simple as a bi-color LED where changing
colors requires changing both gpio pins at once, or something as complex as
a bitbanged multiplexed address/data bus connected to a microcontroller.
In addition to the absolute requirement of simultaneously changing the
output values of driven pins, a desirable feature of these new methods is to
provide a higher-performance mechanism for reading and writing multiple
pins, especially from userland where pin-at-a-time access incurs a noticible
syscall time penalty.
These new interfaces are NOT intended to abstract away all the ugly details
of how gpio is implemented on any given platform. In fact, to use these
properly you absolutely must know something about how the gpio hardware is
organized. Typically there are "banks" of gpio pins controlled by registers
which group several pins together. A bank may be as small as 2 pins or as
big as "all the pins on the device, hundreds of them." In the latter case, a
driver might support this interface by allowing access to any 32 adjacent
pins within the overall collection. Or, more likely, any 32 adjacent pins
starting at any multiple of 32. Whatever the hardware restrictions may be,
you would need to understand them to use this interface.
In additional to defining the interfaces, two example implementations are
included here, for imx5/6, and allwinner. These represent the two primary
types of gpio hardware drivers. imx6 has multiple gpio devices, each
implementing a single bank of 32 pins. Allwinner implements a single large
gpio number space from 1-n pins, and the driver internally translates that
linear number space to a bank+pin scheme based on how the pins are grouped
into control registers. The allwinner implementation imposes the restriction
that the first_pin argument to the new functions must always be pin 0 of a
bank.
gordon [Tue, 17 Oct 2017 17:30:18 +0000 (17:30 +0000)]
MFC r324696: Update wpa_supplicant/hostapd for 2017-01 vulnerability release.
hostapd: Avoid key reinstallation in FT handshake
Prevent reinstallation of an already in-use group key
Extend protection of GTK/IGTK reinstallation of WNM-Sleep Mode cases
Fix TK configuration to the driver in EAPOL-Key 3/4 retry case
Prevent installation of an all-zero TK
Fix PTK rekeying to generate a new ANonce
TDLS: Reject TPK-TK reconfiguration
WNM: Ignore Key Data in WNM Sleep Mode Response frame if no PMF in use
WNM: Ignore WNM-Sleep Mode Response if WNM-Sleep Mode has not been used
WNM: Ignore WNM-Sleep Mode Response without pending request
FT: Do not allow multiple Reassociation Response frames
TDLS: Ignore incoming TDLS Setup Response retries
ngie [Tue, 17 Oct 2017 15:52:02 +0000 (15:52 +0000)]
MFC r324478:
Check the exit code from fsck_ffs instead of relying on MODIFIED being in the output
^/head@r323923 changed when MODIFIED is printed at exit. It's better to follow the
documented way of determining whether or not a filesystem is clean per fsck_ffs, i.e.,
ensure that the exit code is either 0 or 7.
The pass/fail determination is brittle prior to this commit, and ^/head@r323923 made
the issue apparent -- thus this needs to be fixed independent of ^/head@r323923.
jhb [Tue, 17 Oct 2017 12:45:51 +0000 (12:45 +0000)]
MFC 323579,323585: Add AT_HWCAP and AT_EHDRFLAGS on all platforms.
To preserve KBI on stable/11, a new SV_HWCAP flag is added which
indicates if the sv_hwcap field is present and valid to avoid examining
the field in old modules. Only sysentvec's which wish to use sv_hwcap
need to set the flag in stable/11.
323579:
Add AT_HWCAP and AT_EHDRFLAGS on all platforms.
A new 'u_long *sv_hwcap' field is added to 'struct sysentvec'. A
process ABI can set this field to point to a value holding a mask of
architecture-specific CPU feature flags. If an ABI does not wish to
supply AT_HWCAP to processes the field can be left as NULL.
The support code for AT_EHDRFLAGS was already present on all systems,
just the #define was not present. This is a step towards unifying the
AT_* constants across platforms.
323585:
Add AT_EHDRFLAGS and AT_HWCAP on amd64.
x86 has two separate (but identical) list of AT_* constants and the
earlier commit to add AT_HWCAP only updated the i386 list.
tuexen [Tue, 17 Oct 2017 12:42:17 +0000 (12:42 +0000)]
MFC r322648:
Ensure inp_vflag is consistently set for TCP endpoints.
Make sure that the flags INP_IPV4 and INP_IPV6 are consistently set
for inpcbs used for TCP sockets, no matter if the setting is derived
from the net.inet6.ip6.v6only sysctl or the IPV6_V6ONLY socket option.
For UDP this was already done right.
brooks [Sat, 14 Oct 2017 16:23:25 +0000 (16:23 +0000)]
MFC r324243:
Remove an unneeded and incorrect memset().
On Variant I TLS architectures (aarch64, arm, mips, powerpc, and riscv)
the __libc_allocate_tls function allocates thread local storage memory
with calloc(). It then copies initialization data over the portions with
non-zero initial values. Before this change it would then pointlessly
zero the already zeroed remainder of the storage. Unfortunately the
calculation was wrong and it would zero TLS_TCB_SIZE (2*sizeof(void *))
additional bytes.
In practice, this overflow only matters if the TLS segment is sized such
that calloc() allocates less than TLS_TCB_SIZE extra memory. Even
then, the likely result will be zeroing part of the next bucket. This
coupled with the impact being confined to Tier II platforms means there
will be no security advisory for this issue.
jhb [Fri, 13 Oct 2017 22:40:57 +0000 (22:40 +0000)]
MFC 324039: Don't defer wakeup()s for completed journal workitems.
Normally wakeups() are performed for completed softupdates work items
in workitem_free() before the underlying memory is free()'d.
complete_jseg() was clearing the "wakeup needed" flag in work items to
defer the wakeup until the end of each loop iteration. However, this
resulted in the item being free'd before it's address was used with
wakeup(). As a result, another part of the kernel could allocate this
memory from malloc() and use it as a wait channel for a different
"event" with a different lock. This triggered an assertion failure
when the lock passed to sleepq_add() did not match the existing lock
associated with the sleep queue. Fix this by removing the code to
defer the wakeup in complete_jseg() allowing the wakeup to occur
slightly earlier in workitem_free() before free() is called.
jhb [Fri, 13 Oct 2017 17:11:08 +0000 (17:11 +0000)]
MFC 324072: Add UMA_ALIGNOF().
This is a wrapper around _Alignof() that sets the alignment for a zone
to the alignment required by a given type. This allows the compiler to
determine the proper alignment rather than having the programmer try to
guess.
sephe [Fri, 13 Oct 2017 05:09:56 +0000 (05:09 +0000)]
MFC 324489,324516
324489
hyperv/hn: Workaround erroneous hash type observed on WS2016.
Background:
- UDP 4-tuple hash type is unconditionally enabled in Hyper-V on WS2016,
which is _not_ affected by NDIS_OBJTYPE_RSS_PARAMS.
- Non-fragment UDP/IPv4 datagrams' hash type is delivered to VM as
TCP_IPV4.
Currently this erroneous behavior only applies to WS2016/Windows10.
Force l3/l4 protocol check, if the RXed packet's hash type is TCP_IPV4,
and the Hyper-V is running on WS2016/Windows10. If the RXed packet is
UDP datagram, adjust mbuf hash type to UDP_IPV4.
Sponsored by: Microsoft
324516
hyperv/hn: Workaround erroneous hash type observed on WS2016 for VF.
hselasky [Thu, 12 Oct 2017 08:27:57 +0000 (08:27 +0000)]
MFC r324320:
Add support for new cuse(3) error code, CUSE_ERR_NO_DEVICE.
This error code is useful when emulating Linux input event
devices from userspace.
rmacklem [Wed, 11 Oct 2017 13:20:24 +0000 (13:20 +0000)]
MFC: r324074
Fix a memory leak that occurred in the pNFS client.
When a "pnfs" NFSv4.1 mount was unmounted, it didn't free up the layouts
and deviceinfo structures. This leak only affects "pnfs" mounts and only
when the mount is umounted.
Found while testing the pNFS Flexible File layout client code.
hselasky [Wed, 11 Oct 2017 10:12:22 +0000 (10:12 +0000)]
MFC r315405, r323351 and r323364:
Add helper function similar to ip_dev_find() to the LinuxKPI to lookup
a network device by its IPv6 address in the given VNET.
emaste [Wed, 11 Oct 2017 00:31:54 +0000 (00:31 +0000)]
MFC r309151: Use explicit 0x200000 for the amd64 kernel physaddr
(instead of MAXPAGESIZE)
MAXPAGESIZE is not well defined by the GNU ld documentation.
Different linkers, and different versions of the same linker, use
different MAXPAGESIZE values. Current versions of GNU gold and LLVM's
lld use 4K. When set to 4K the kernel panics at boot due to an issue
with x86bios.
Here we want the kernel physaddr to be the amd64 superpage size, so use
that value (2MB) explicitly. With this change GNU gold and LLVM lld can
link a working amd64 kernel.
PR: 214718 (x86bios)
Sponsored by: The FreeBSD Foundation
sephe [Tue, 10 Oct 2017 05:52:28 +0000 (05:52 +0000)]
MFC 323728,323729
323728
hyperv/hn: Fix MTU setting
- Add size of an ethernet header to the value configured to NVS. This
does not seem to have any effects if MTU is 1500, but fix hypervisor
side's setting if MTU > 1500.
- Override the MTU setting according to the view from the hypervisor
side.
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D12352
323729
hyperv/hn: Incease max supported MTU
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D12365
sephe [Tue, 10 Oct 2017 05:46:57 +0000 (05:46 +0000)]
MFC 323727,324316
323727
hyperv/hn: Apply VF's RSS setting
Since in Azure SYN and SYN|ACK go through the synthetic parts while the
rest of the same TCP flow goes through the VF, apply VF's RSS settings
to synthetic parts to have a consistent hash value/type for the same TCP
flow.
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D12333
sephe [Tue, 10 Oct 2017 05:25:46 +0000 (05:25 +0000)]
MFC 323170
if: Add ioctls to get RSS key and hash type/function.
It will be needed by hn(4) to configure its RSS key and hash
type/function in the transparent VF mode in order to match VF's
RSS settings. The description of the transparent VF mode and
the RSS hash value issue are here:
https://svnweb.freebsd.org/base?view=revision&revision=322299
https://svnweb.freebsd.org/base?view=revision&revision=322485
These are generic enough to promise two independent IOCs instead
of abusing SIOCGDRVSPEC.
Setting RSS key and hash type/function is a different story,
which probably requires more discussion.
Comment about UDP_{IPV4,IPV6,IPV6_EX} were only in the patch
in the review request; these hash types are standardized now.
Reviewed by: gallatin
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D12174
MFC r324098:
Some mbuf related fixes in icmp_error()
* check mbuf length before doing mtod() and accessing to IP header;
* update oip pointer and all depending pointers after m_pullup();
* remove extra checks and extra parentheses, wrap long lines;
rmacklem [Sun, 8 Oct 2017 21:20:25 +0000 (21:20 +0000)]
MFC: r323978
Change a panic to an error return.
There was a panic() in the NFS server's write operation that didn't
need to be a panic() and could just be an error return.
This patch makes that change.
Found by code inspection during development of the pNFS service.
Relevant vendor changes:
PR #905: Support for Zstandard read and write filters
PR #922: Avoid overflow when reading corrupt cpio archive
Issue #935: heap-based buffer overflow in xml_data (CVE-2017-14166)
OSS-Fuzz 2936: Place a limit on the mtree line length
OSS-Fuzz 2394: Ensure that the ZIP AES extension header is large enough
OSS-Fuzz 573: Read off-by-one error in RAR archives (CVE-2017-14502)
alc [Sun, 8 Oct 2017 17:14:45 +0000 (17:14 +0000)]
MFC r324173
When an I/O error occurs on page out, there is no need to dirty the page,
because it is already dirty. Instead, assert that the page is dirty.
ngie [Sat, 7 Oct 2017 22:59:09 +0000 (22:59 +0000)]
MFC r306743,r317712:
r306743 (by markj):
gmirror: Bump the syncid if broken disks are found during startup.
Consider a mirror with two components, m1 and m2. Suppose a hardware error
results in the removal of m2, with m1's genid bumped. Suppose further that
a replacement mirror component m3 is created and synchronized, after which
the system is shut down uncleanly. During a subsequent bootup, if gmirror
tastes m1 and m2 first, m2 will be removed from the mirror because it is
broken, but the mirror will be started without bumping the syncid on m1
because all elements of the mirror are accounted for. Then m3 will be
added to the already-running mirror with the same syncid as m1, so the
components will not be synchronized despite the unclean shutdown.
Handle this scenario by bumping the syncid of healthy components if any
broken mirrors are discovered during mirror startup.
r317712 (by markj):
Synchronize unclean mirrors before adding them to a running gmirror.
During gmirror startup, if component mirrors are found to be dirty as is
typical after a system crash, the mirrors are synchronized to the mirror
with highest priority. However if a gmirror starts without all of its
mirrors present, for example because of some transient delays during
tasting, the remaining mirrors must be synchronized before they may become
active.
alc [Sat, 7 Oct 2017 21:13:54 +0000 (21:13 +0000)]
MFC r305685
Various changes to pmap_ts_referenced()
Move PMAP_TS_REFERENCED_MAX out of the various pmap implementations and
into vm/pmap.h, and describe what its purpose is. Eliminate the archaic
"XXX" comment about its value. I don't believe that its exact value, e.g.,
5 versus 6, matters.
Update the arm64 and riscv pmap implementations of pmap_ts_referenced()
to opportunistically update the page's dirty field.
On amd64, use the PDE value already cached in a local variable rather than
dereferencing a pointer again and again.
alc [Sat, 7 Oct 2017 20:22:04 +0000 (20:22 +0000)]
MFC r321386,321393
Utilize pmap_enter(..., psind=1) in vm_fault_soft_fast() on amd64. (The
Differential Revision discusses the benefits of this change.)
Add a function, vm_reserv_to_superpage(), that returns the superpage
containing the specified base page.
Previously we'd have an assertion failure in cap_rights_is_set if
sysdecode_cap_rights is called with an invalid cap_rights_t, so test for
validity first.
emaste [Sat, 7 Oct 2017 20:18:20 +0000 (20:18 +0000)]
MFC r323405: newvers.sh: speed up failing git-svn revision search
In the case of running newvers.sh on a git tree w/o git-svn-id notes we
previously piped the entire 'git log' to grep. Add --grep to the log
invocation to avoid processing log entries of no interest.
This saves about 2-3 seconds of newvers.sh run time on my SSD laptop.
Later changes will bring further speedups.
emaste [Sat, 7 Oct 2017 20:17:03 +0000 (20:17 +0000)]
MFC r323394: newvers.sh: accept "git-svn-id:" at the start of a line only
This prevents incorrect subversion revision detection when "git svn" is
not being used to get the sources but git is available. Previously old
subversion revisions included in commit messages were favoured over the
more recent and correct revisions in git notes.
For example cf1f35574722 represents r315395 but was treated as r313908
which is referenced in the commit message. Commits following
r315395/cf1f35574722 but before another commit with a git-svn-id
reference in the commit message would be treated as r313908 as well.
Patch from PR updated to accommodate the initial four space indent in
`git log` ouptut.
emaste [Sat, 7 Oct 2017 20:14:30 +0000 (20:14 +0000)]
MFC r323438: make-memstick.sh: use UFSv2
There's not much practical difference as far as install media is
concerned but newfs creates UFSv2 by default and it is sensible to use
the contemporary UFS version.
I also intend to change makefs to create UFSv2 by default (to match
newfs) so we'll want make-memstick.sh to be explicit, rather than
relying on the host tool's default.
alc [Sat, 7 Oct 2017 18:36:42 +0000 (18:36 +0000)]
MFC r319542,321003,321378
Eliminate duplication of the pmap and pv list unlock operations in
pmap_enter() by implementing a single return path. Otherwise, the
duplication will only increase with the upcoming support for psind == 1.
Extract the innermost loop of pmap_remove() out into its own function,
pmap_remove_ptes(). (This new function will also be used by an upcoming
change to pmap_enter() that adds support for psind == 1 mappings.)
Add support for pmap_enter(..., psind=1) to the amd64 pmap. In other words,
add support for explicitly requesting that pmap_enter() create a 2MB page
mapping. (Essentially, this feature allows the machine-independent layer to
create superpage mappings preemptively, and not wait for automatic promotion
to occur.)
Export pmap_ps_enabled() to the machine-independent layer.
Add a flag to pmap_pv_insert_pde() that specifies whether it should fail or
reclaim a PV entry when one is not available.
Refactor pmap_enter_pde() into two functions, one by the same name, that is
a general-purpose function for creating PDE PG_PS mappings, and another,
pmap_enter_2mpage(), that is used to prefault 2MB read- and/or execute-only
mappings for execve(2), mmap(2), and shmat(2).
alc [Sat, 7 Oct 2017 18:08:37 +0000 (18:08 +0000)]
MFC r320980,321377
Generalize vm_page_ps_is_valid() to support testing other predicates on
the (super)page, renaming the function to vm_page_ps_test().
In vm_page_ps_test(), always check that the base pages within the specified
superpage all belong to the same object. To date, that check has not been
needed, but upcoming changes require it.
alc [Sat, 7 Oct 2017 17:32:39 +0000 (17:32 +0000)]
MFC r321015
Style-only change: Consistently use the variable name "pdpg" throughout
this file. Previously, half of the pointers to a vm_page being used as
a page directory page were named "pdpg" and the rest were named "mpde".
alc [Sat, 7 Oct 2017 17:20:31 +0000 (17:20 +0000)]
MFC r323973,324087
Optimize vm_page_try_to_free(). Specifically, the call to pmap_remove_all()
can be avoided when the page's containing object has a reference count of
zero. (If the object has a reference count of zero, then none of its pages
can possibly be mapped.)
Address nearby style issues in vm_page_try_to_free(), and change its
return type to "bool".
Optimize vm_object_page_remove() by eliminating pointless calls to
pmap_remove_all(). If the object to which a page belongs has no
references, then that page cannot possibly be mapped.
alc [Sat, 7 Oct 2017 16:55:44 +0000 (16:55 +0000)]
MFC r323656
Modify blst_leaf_alloc to take only the cursor argument.
Modify blst_leaf_alloc to find allocations that cross the boundary between
one leaf node and the next when those two leaves descend from the same
meta node.
Update the hint field for leaves so that it represents a bound on how
large an allocation can begin in that leaf, where it currently represents
a bound on how large an allocation can be found within the boundaries of
the leaf.
The first phase of blst_leaf_alloc currently shrinks sequences of
consecutive 1-bits in mask until each has been shrunken by count-1 bits,
so that any bits remaining show where an allocation can begin, or until
all the bits have disappeared, in which case the allocation fails. This
change amends that so that the high-order bit is copied, as if, when the
last block was free, it was followed by an endless stream of free
blocks. It also amends the early stopping condition, so that the shrinking
of 1-sequences stops early when there are none, or there is only one
unbounded one remaining.
The search for the first set bit is unchanged, and the code path
thereafter is mostly unchanged unless the first set bit is in a position
that makes some of those copied sign bits matter. In that case, we look
for a next leaf, and at what blocks it can provide, to see if a
cross-boundary allocation is possible.
The hint is updated on a successful allocation that clears the last bit,
but it not updated on a failed allocation that leaves the last bit
set. So, as long as the last block is free, the hint value for the leaf is
large. As long as the last block is free, and there's a next leaf, a large
allocation can begin here, perhaps. A stricter rule than this would mean
that allocations and frees in one leaf could require hint updates to the
preceding leaf, and this change seeks to leave the freeing code
unmodified.
Define BLIST_BMAP_MASK, and use it for bit masking in blst_leaf_free and
blist_leaf_fill, as well as in blst_leaf_alloc.
avg [Thu, 5 Oct 2017 06:54:25 +0000 (06:54 +0000)]
MFC r323578,r323769: dounmount: do not release the mount point's reference
on the covered vnode
As long as mnt_ref is not zero there can be a consumer that might try
to access mnt_vnodecovered. For this reason the covered vnode must not
be freed until mnt_ref goes to zero.
So, move the release of the covered vnode to vfs_mount_destroy.
trasz [Wed, 4 Oct 2017 12:06:24 +0000 (12:06 +0000)]
MFC r320892:
Make fsck_y_enable default to passing pass -R to fsck_ffs(8) in addition
to -y. To me, fsck_y_enable means "try as hard as possible", and without
-R, it... well, doesn't.
trasz [Wed, 4 Oct 2017 12:04:35 +0000 (12:04 +0000)]
MFC r320803:
Fix "mount -uw /" when the filesystem type doesn't match.
This basically makes "mount -uw /" work when the filesystem
mounted on / is NFS, but the one configured in fstab(5) is UFS,
which can happen when you forget to modify fstab.
Note that the whole special case ("else if (argv[0][0] == '/'")
is probably not needed anyway. I'll take a look at removing it
altogether; for now this is a minimally intrusive fix.
trasz [Wed, 4 Oct 2017 12:01:02 +0000 (12:01 +0000)]
MFC r323225:
Reflect realtime and idle priorities in ps(1) state flags, same like
we do for the usual nice values. It could be argued that they should
use another set of indicators, since the underlying mechanism is
different, but they match the description in the manual page, and so
I think it's ok to not overcomplicate things.
trasz [Wed, 4 Oct 2017 11:59:53 +0000 (11:59 +0000)]
MFC r321422:
Improve the ktrace(1) man page to make it slightly more obvious that there
are _two_ options that control its behaviour wrt child processes; slightly
improve the example[1], and add Xrefs.
trasz [Wed, 4 Oct 2017 11:58:52 +0000 (11:58 +0000)]
MFC r323183:
Make root_mount_rel(9) ignore NULL arguments, like it used to before r313351.
It would be better to fix API consumers to not pass NULL there - most of them,
such as gmirror, already contain the neccessary checks - but this is easier
and much less error-prone.
One known user-visible result is that it fixes panic on a failed "graid label".