Andrew Turner [Wed, 15 Aug 2018 13:40:16 +0000 (13:40 +0000)]
Start to remove XScale support from the ARMv4/v5 pmap. Support for XScale
has been removed from the kernel so we can remove it from here to help
simplify the code.
Will Andrews [Wed, 15 Aug 2018 13:05:04 +0000 (13:05 +0000)]
zfs: add ztest to the kyua test suite.
This program is currently failing, and has been for >6 months on HEAD.
Ideally, this should be run 24x7 in CI, to discover hard-to-find bugs that
only manifest with concurrent i/o.
Navdeep Parhar [Wed, 15 Aug 2018 03:03:01 +0000 (03:03 +0000)]
cxgbe(4): Use two hashes instead of a table to keep track of
hashfilters. Two because the driver needs to look up a hashfilter by
its 4-tuple or tid.
A couple of fixes while here:
- Reject attempts to add duplicate hashfilters.
- Do not assume that any part of the 4-tuple that isn't specified is 0.
This makes it consistent with all other mandatory parameters that
already require explicit user input.
Cy Schubert [Tue, 14 Aug 2018 20:24:10 +0000 (20:24 +0000)]
MFV r337818:
WPA: Ignore unauthenticated encrypted EAPOL-Key data
Ignore unauthenticated encrypted EAPOL-Key data in supplicant
processing. When using WPA2, these are frames that have the Encrypted
flag set, but not the MIC flag.
When using WPA2, EAPOL-Key frames that had the Encrypted flag set but
not the MIC flag, had their data field decrypted without first verifying
the MIC. In case the data field was encrypted using RC4 (i.e., when
negotiating TKIP as the pairwise cipher), this meant that
unauthenticated but decrypted data would then be processed. An adversary
could abuse this as a decryption oracle to recover sensitive information
in the data field of EAPOL-Key messages (e.g., the group key).
(CVE-2018-14526)
Cy Schubert [Tue, 14 Aug 2018 20:10:25 +0000 (20:10 +0000)]
WPA: Ignore unauthenticated encrypted EAPOL-Key data
Ignore unauthenticated encrypted EAPOL-Key data in supplicant
processing. When using WPA2, these are frames that have the Encrypted
flag set, but not the MIC flag.
When using WPA2, EAPOL-Key frames that had the Encrypted flag set but
not the MIC flag, had their data field decrypted without first verifying
the MIC. In case the data field was encrypted using RC4 (i.e., when
negotiating TKIP as the pairwise cipher), this meant that
unauthenticated but decrypted data would then be processed. An adversary
could abuse this as a decryption oracle to recover sensitive information
in the data field of EAPOL-Key messages (e.g., the group key).
(CVE-2018-14526)
Warner Losh [Tue, 14 Aug 2018 18:45:20 +0000 (18:45 +0000)]
When the LUA floating point model is INT64, we don't need to do the
overflow dance. This avoids compile errors on latter-day gcc compilers
as well as simplifies the generated code.
Warner Losh [Tue, 14 Aug 2018 18:44:41 +0000 (18:44 +0000)]
Create a loader for each interpreter for x86 BIOS and all EFI
Create loader_{4th,lua,simp}{,.efi}. All of these are installed by
default. Create LOADER_DEFAULT_INTERP to specify the default
interpreter when no other is specified. LOADER_INTERP is the current
interpreter language building. Turn building of lua on by default to
match 4th. simploader is a simplified loader build w/o any interpreter
language (but with a simple loader). This is the historic behavir you
got with WITHOUT_FORTH. Make a hard link to the default loader. This
has to be a hard link rather than the more desirable soft link because
older zfsboot blocks don't support symlinks.
Kyle Evans [Tue, 14 Aug 2018 18:35:33 +0000 (18:35 +0000)]
bectl(8): Check jailparam_* return values
Previous iteration of this assumed that these won't fail because we've
already setup the jail param to this point, but the allocations could still
fail in pretty bad conditions.
Admit that it's possible and return (ENOENT, EINVAL, ENOMEM, or 0) when
deleting arguments. EINVAL shouldn't happen since we're passing optarg;
which may satisfy *optarg == '\0' but never optarg == NULL.
Lower the default limits on the IPv6 reassembly queue.
Currently, the limits are quite high. On machines with millions of
mbuf clusters, the reassembly queue limits can also run into
the millions. Lower these values.
Also, try to ensure that no bucket will have a reassembly
queue larger than approximately 100 items. This limits the cost to
find the correct reassembly queue when processing an incoming
fragment.
Due to the low limits on each bucket's length, increase the size of
the hash table from 64 to 1024.
Lower the default limits on the IPv4 reassembly queue.
In particular, try to ensure that no bucket will have a reassembly
queue larger than approximately 100 items. This limits the cost to
find the correct reassembly queue when processing an incoming
fragment.
Due to the low limits on each bucket's length, increase the size of
the hash table from 64 to 1024.
On the guest entry in bhyve, flush L1 data cache, using either L1D
flush command MSR if available, or by reading enough uninteresting
data to fill whole cache.
Flush is automatically enabled on CPUs which do not report RDCL_NO,
and can be disabled with the hw.vmm.l1d_flush tunable/kenv.
Security: CVE-2018-3646
Reviewed by: emaste. jhb, Tony Luck <tony.luck@intel.com>
Sponsored by: The FreeBSD Foundation
Currently, we process IPv6 fragments with 0 bytes of payload, add them
to the reassembly queue, and do not recognize them as duplicating or
overlapping with adjacent 0-byte fragments. An attacker can exploit this
to create long fragment queues.
There is no legitimate reason for a fragment with no payload. However,
because IPv6 packets with an empty payload are acceptable, allow an
"atomic" fragment with no payload.
Implement a limit on on the number of IPv6 reassembly queues per bucket.
There is a hashing algorithm which should distribute IPv6 reassembly
queues across the available buckets in a relatively even way. However,
if there is a flaw in the hashing algorithm which allows a large number
of IPv6 fragment reassembly queues to end up in a single bucket, a per-
bucket limit could help mitigate the performance impact of this flaw.
Implement such a limit, with a default of twice the maximum number of
reassembly queues divided by the number of buckets. Recalculate the
limit any time the maximum number of reassembly queues changes.
However, allow the user to override the value using a sysctl
(net.inet6.ip6.maxfragbucketsize).
Add a limit of the number of fragments per IPv6 packet.
The IPv4 fragment reassembly code supports a limit on the number of
fragments per packet. The default limit is currently 17 fragments.
Among other things, this limit serves to limit the number of fragments
the code must parse when trying to reassembly a packet.
Add a limit to the IPv6 reassembly code. By default, limit a packet
to 65 fragments (64 on the queue, plus one final fragment to complete
the packet). This allows an average fragment size of 1,008 bytes, which
should be sufficient to hold a fragment. (Recall that the IPv6 minimum
MTU is 1280 bytes. Therefore, this configuration allows a full-size
IPv6 packet to be fragmented on a link with the minimum MTU and still
carry approximately 272 bytes of headers before the fragmented portion
of the packet.)
Users can adjust this limit using the net.inet6.ip6.maxfragsperpacket
sysctl.
Make the IPv6 fragment limits be global, rather than per-VNET, limits.
The IPv6 reassembly fragment limit is based on the number of mbuf clusters,
which are a global resource. However, the limit is currently applied
on a per-VNET basis. Given enough VNETs (or given sufficient customization
on enough VNETs), it is possible that the sum of all the VNET fragment
limits will exceed the number of mbuf clusters available in the system.
Given the fact that the fragment limits are intended (at least in part) to
regulate access to a global resource, the IPv6 fragment limit should
be applied on a global basis.
Note that it is still possible to disable fragmentation for a particular
VNET by setting the net.inet6.ip6.maxfragpackets sysctl to 0 for that
VNET. In addition, it is now possible to disable fragmentation globally
by setting the net.inet6.ip6.maxfrags sysctl to 0.
Implement a limit on on the number of IPv4 reassembly queues per bucket.
There is a hashing algorithm which should distribute IPv4 reassembly
queues across the available buckets in a relatively even way. However,
if there is a flaw in the hashing algorithm which allows a large number
of IPv4 fragment reassembly queues to end up in a single bucket, a per-
bucket limit could help mitigate the performance impact of this flaw.
Implement such a limit, with a default of twice the maximum number of
reassembly queues divided by the number of buckets. Recalculate the
limit any time the maximum number of reassembly queues changes.
However, allow the user to override the value using a sysctl
(net.inet.ip.maxfragbucketsize).
Add a global limit on the number of IPv4 fragments.
The IP reassembly fragment limit is based on the number of mbuf clusters,
which are a global resource. However, the limit is currently applied
on a per-VNET basis. Given enough VNETs (or given sufficient customization
of enough VNETs), it is possible that the sum of all the VNET limits
will exceed the number of mbuf clusters available in the system.
Given the fact that the fragment limit is intended (at least in part) to
regulate access to a global resource, the fragment limit should
be applied on a global basis.
VNET-specific limits can be adjusted by modifying the
net.inet.ip.maxfragpackets and net.inet.ip.maxfragsperpacket
sysctls.
To disable fragment reassembly globally, set net.inet.ip.maxfrags to 0.
To disable fragment reassembly for a particular VNET, set
net.inet.ip.maxfragpackets to 0.
Improve IPv6 reassembly performance by hashing fragments into buckets.
Currently, all IPv6 fragment reassembly queues are kept in a flat
linked list. This has a number of implications. Two significant
implications are: all reassembly operations share a common lock,
and it is possible for the linked list to grow quite large.
Improve IPv6 reassembly performance by hashing fragments into buckets,
each of which has its own lock. Calculate the hash key using a Jenkins
hash with a random seed.
Currently, IPv4 fragments are hashed into buckets based on a 32-bit
key which is calculated by (src_ip ^ ip_id) and combined with a random
seed. However, because an attacker can control the values of src_ip
and ip_id, it is possible to construct an attack which causes very
deep chains to form in a given bucket.
To ensure more uniform distribution (and lower predictability for
an attacker), calculate the hash based on a key which includes all
the fields we use to identify a reassembly queue (dst_ip, src_ip,
ip_id, and the ip protocol) as well as a random seed.
Reserve page at the physical address zero on amd64.
We always zero the invalidated PTE/PDE for superpage, which means that
L1TF CPU vulnerability (CVE-2018-3620) can be only used for reading
from the page at zero.
Note that both i386 and amd64 exclude the page from phys_avail[]
array, so this change is redundant, but I think that phys_avail[] on
UEFI-boot does not need to do that. Eventually the blacklisting
should be made conditional on CPUs which report that they are not
vulnerable to L1TF.
Reviewed by: emaste. jhb
Sponsored by: The FreeBSD Foundation
amd64: ensure that curproc->p_vmspace pmap always matches PCPU
curpmap.
When performing context switch on a machine without PCID, if current
%cr3 equals to the new pmap %cr3, which is typical for kernel_pmap
vs. kernel process, I overlooked to update PCPU curpmap value. Remove
check for %cr3 not equal to pm_cr3 for doing the update. It is
believed that this case cannot happen at all, due to other changes in
this revision.
Also, do not set the very first curpmap to kernel_pmap, it should be
vmspace0 pmap instead to match curproc.
Move the common code to activate the initial pmap both on BSP and APs
into pmap_activate_boot() helper.
Mark Johnston [Tue, 14 Aug 2018 14:02:53 +0000 (14:02 +0000)]
Don't use memcpy() in the early microcode loading code.
At some point memcpy() may be an ifunc, ifunc resolution cannot be done
until CPU identification has been performed, and CPU identification must
be done after loading any microcode updates.
X-MFC with: r337715
Sponsored by: The FreeBSD Foundation
Andrew Turner [Tue, 14 Aug 2018 11:00:54 +0000 (11:00 +0000)]
Support reading from the arm64 ID registers from userspace.
Trap reads to the arm64 ID registers and write a safe value into them. This
will allow us to put more useful values in these later and have userland
check them to find what features the hardware supports.
These are currently safe defaults, but will later be populated with better
values from the hardware.
Restore ability to send ICMP and ICMPv6 redirects.
It was lost when tryforward appeared. Now ip[6]_tryforward will be enabled
only when sending redirects for corresponding IP version is disabled via
sysctl. Otherwise will be used default forwarding function.
Ian Lepore [Mon, 13 Aug 2018 23:53:11 +0000 (23:53 +0000)]
Export the eeprom device size via readonly sysctl. Also export the write
page size and address size, although they are likely to be inherently
less-interesting values outside of the driver.
Warner Losh [Mon, 13 Aug 2018 19:59:42 +0000 (19:59 +0000)]
Port the mps panic-safe shutdown_final handling to mpr
r330951 by smh fixed the mps driver to avoid deadlocks when panicing.
The same code is needed for mpr, so port it here, along with the fix
which allows the CCBs scheduled to complete avoiding at least a scary
message and likely other unintended consequences.
Warner Losh [Mon, 13 Aug 2018 19:59:37 +0000 (19:59 +0000)]
Call xpt_sim_poll in shutdown_final handler.
When we're shutting down, we send a number of start/stop commands to
the known targets. We have to wait for them to complete. During a
panic, the interrupts are off, and using pause to wait for them to
fire and complete won't work: we have to poll after pause returns so
the completion routines of the CCBs run so we decrement work
outstanding counts.
Warner Losh [Mon, 13 Aug 2018 19:59:32 +0000 (19:59 +0000)]
Create xpt_sim_poll and refactor a bit using it.
xpt_sim_poll takes the sim to poll as an argument. It will do the
proper locking protocol, call the SIM polling routine, and then call
camisr_runqueue to process completions on any CCBs the SIM's poll
routine completed. It will be used during late shutdown when a SIM is
waiting for CCBs it sent during shutdown to finish and the scheduler
isn't running because we've panic'd.
This sequence was used twice in cam_xpt, so refactor those to use this
new function.
evdev: Remove evdev.ko linkage dependency on kbd driver
Move evdev_ev_kbd_event() helper from evdev to kbd.c as otherwise evdev
unconditionally requires all keyboard and console stuff to be compiled
into the kernel. This dependency happens as evdev_ev_kbd_event() helper
references kbdsw global variable defined in kbd.c through use of
kbdd_ioctl() macro.
While here make all keyboard drivers respect evdev_rcpt_mask while setting
typematic rate and LEDs with evdev interface.
[ig4] Fix initialization sequence for newer ig4 chips
Newer chips may require assert/deassert after power down for proper
startup. Check respective flag in DEVIDLE_CTRL and perform operation
if neccesssary.
Glen Barber [Mon, 13 Aug 2018 17:23:43 +0000 (17:23 +0000)]
Add lang/python2, lang/python3, and lang/python to GCE images
to help avoid hard-coding 'python<MAJOR>.<MINOR>' in several
scripts in the client-side scripts.
PR: 230248
MFC after: 3 days
Submitted by: gustavo.scalet@collabora.com
Sponsored by: The FreeBSD Foundation
Mark Johnston [Mon, 13 Aug 2018 17:13:09 +0000 (17:13 +0000)]
Implement kernel support for early loading of Intel microcode updates.
Updates in the format described in section 9.11 of the Intel SDM can
now be applied as one of the first steps in booting the kernel. Updates
that are loaded this way are automatically re-applied upon exit from
ACPI sleep states, in contrast with the existing cpucontrol(8)-based
method. For the time being only Intel updates are supported.
Microcode update files are passed to the kernel via loader(8). The
file type must be "cpu_microcode" in order for the file to be recognized
as a candidate microcode update. Updates for multiple CPU types may be
concatenated together into a single file, in which case the kernel
will select and apply a matching update. Memory used to store the
update file will be freed back to the system once the update is applied,
so this approach will not consume more memory than required.
Reviewed by: kib
MFC after: 6 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D16370
Prevent some parallel swap-ins, rate-limit swapper swap-ins.
If faultin() was called outside swapper (from PHOLD()), do not allow
swapper to initiate additional swap-ins. Swapper' initiated swap-ins
are serialized because they are synchronous and executed in the
context of the thread0. With the added limitation, we only allow
parallel swap-ins from PHOLD(), which is up to PHOLD() users to
manage, usually they do not need to.
Rate-limit swapper' swap-ins to one in the MAXSLP / 2 seconds
interval, counting faultin() swapins.
Andrew Gallatin [Mon, 13 Aug 2018 14:13:25 +0000 (14:13 +0000)]
lagg: allow lacp to manage the link state
Lacp needs to manage the link state itself. Unlike other
lagg protocols, the ability of lacp to pass traffic
depends not only on the lagg members having link, but also
on the lacp protocol converging to a distributing state with the
link partner.
If we prematurely mark the link as up, then we will send a
gratuitous arp (via arp_handle_ifllchange()) before the lacp
interface is capable of passing traffic. When this happens,
the gratuitous arp is lost, and our link partner may cache
a stale mac address (eg, when the base mac address for the
lagg bundle changes, due to a BIOS change re-ordering NIC
unit numbers)