brooks [Wed, 30 Nov 2016 01:17:02 +0000 (01:17 +0000)]
MFC r309027:
Allocate a struct ifreq rather than using a (wrong) computed size for
the BIOCSETIF ioctl.
The kernel always copies an entire struct ifreq and IPv4 addresses will
always fit in an ifreq.
On systems with pointers larger than 64-bits, the computed size will be
less than the size of struct ifreq, potentially resulting in the kernel
attempting to copyin memory from outside the allocation.
asomers [Mon, 28 Nov 2016 22:35:10 +0000 (22:35 +0000)]
MFC r308780
Fix "camcontrol rescan" with SATA drives behind a SAS controller
A bug in CAM's serial number hash logic resulted in SATA drives behind a SAS
controller getting removed and readded anytime the drive was rescanned for
any reason
jhb [Mon, 28 Nov 2016 18:36:37 +0000 (18:36 +0000)]
MFC 307756: Define max_align_t for C11.
libc++'s stddef.h includes an existing definition of max_align_t for
C++11, but it is only defined for C++, not for C. In addition, GCC and
clang both define an alternate version of max_align_t that uses a
union of multiple types rather than a plain long double as in libc++.
This adds a __max_align_t to <sys/_types.h> that matches the GCC and
clang definition that is mapped to max_align_t in <stddef.h>.
hselasky [Mon, 28 Nov 2016 17:22:45 +0000 (17:22 +0000)]
MFC r308730:
Make sure MAC address is reprogrammed when if_init() callback is
invoked. Else promiscious mode must be used to pass traffic. While at
it fix a debug print macro.
kib [Sun, 27 Nov 2016 09:10:33 +0000 (09:10 +0000)]
MFC r308618:
Provide simple mutual exclusion between mount point update and unmount.
In the update path in ffs_mount(), drop vfs_busy() reference around namei().
emaste [Sat, 26 Nov 2016 03:39:02 +0000 (03:39 +0000)]
MFC r307003, r307564: makewhatis: make output reproducible
r307003: Instead:
1) provide fts_open() with a comparison function to process directories
and files in a deterministic order
2) in addition to the existing hash, insert pages into a linked list
which will be sorted (by virtue of 1)
3) iterate over pages by the list in 2, instead of hash order
Idea from: des
r307564: makewhatis: avoid skipping another page after one with no mlinks
rstone [Sat, 26 Nov 2016 01:16:33 +0000 (01:16 +0000)]
MFC r308580:
Don't read if_counters with if_addr_lock held
Calling into an ifnet implementation with the if_addr_lock already
held can cause a LOR and potentially a deadlock, as ifnet
implementations typically can take the if_addr_lock after their
own locks during configuration. Refactor a sysctl handler that
was violating this to read if_counter data in a temporary buffer
before the if_addr_lock is taken, and then copying the data
in its final location later, when the if_addr_lock is held.
jhb [Fri, 25 Nov 2016 22:12:03 +0000 (22:12 +0000)]
MFC 307333: Reprogram I/O APIC interrupt pins when registering an I/O APIC.
All I/O APIC pins are masked when an I/O APIC is first probed. The
APIC enumerator (MP Table or MADT) then parses its associated tables to
configure individual pins to set custom delivery modes or alternate
routing (e.g. routing IRQ 0 to intpin 2). Pins for regular interrupt
pins are left masked until the first interrupt is assigned. However,
pins with unusual settings (e.g. NMI or SMI) are never assigned an
interrupt and thus never re-programmed. The I/O APIC code used to
reprogram all interrupt pins during registration but this was lost in
r151979.
In theory, this is mostly a no-op as the ACPI APIC table does not
include a way to enumerate NMI or SMI pins for the I/O APIC, so only
systems using an MP Table would be affected.
araujo [Fri, 25 Nov 2016 05:54:17 +0000 (05:54 +0000)]
MFC r308443, r308459, r308462, r308478, r308786
r308443:
Add -d flag that prints domain only.
PR: 212875
Submitted by: Ben RUBSON <ben.rubson@gmail.com>
Reviewed by: pi
r308459:
Fix missing '-' for the flags -s and -d on both manpage and usage.
Reported by: garga, bde
r308462:
Add flag -B which does the same like batch mode but without exiting after
print. Also add a new flag -s that add blocks size to statistics.
PR: 198347, 212726
Submitted by: Ben RUBSON <ben.rubson@gmail.com>
Tested by: pi
MFC After: 2 weeks.
r308478:
We can't use protect(1) inside a jail(8)!
To avoid have warning for services that are using oomprotect, oomprotect
will only be applied on services that won't run inside jails.
Reported by: allanjude
MFC after: 2 weeks.
r308786:
rc.subr: Swap checks so we only fork sysctl if *_oomprotect is set.
emaste [Fri, 25 Nov 2016 00:25:59 +0000 (00:25 +0000)]
MFC r307969: strings: fix exit status if a file before the last one fails
Previously a command like "strings f1 f2 f3" reported the exit status
based only on processing the last file.
As with GNU strings, report an error exit status if an error was
encountered processing any of the files. While here simplify the
exit status handling to just success (0) / failure (1).
emaste [Thu, 24 Nov 2016 00:45:00 +0000 (00:45 +0000)]
MFC r308772: crunchide: report explicit error for combined string table
Some tools produce objects with a combined strtab and shstrtab.
These objects are not supported by crunchide since it rewrites the
symtab and strtab to "hide" symbols. This invalidates section header
offsets into a combined strtab/shstrtab.
In the future we could support these objects (by ensuring that we retain
unmodified section name strings in the output .strtab, and then rewriting
each section header's sh_name).
jhb [Wed, 23 Nov 2016 23:53:52 +0000 (23:53 +0000)]
MFC 308056: Fix formatting of tables.
Specifically, use .Ta instead of tabs to separate column entries. While
here fix a few other things:
- Use .Sy for all column headers (previously only the first column header
was bold)
- Use .Dv to markup constants used for MIB names.
- Use "1234" and "4321" for the byte order descriptions without
thousands separators.
- Mark up header files in the first table with .In.
jhb [Wed, 23 Nov 2016 23:45:42 +0000 (23:45 +0000)]
MFC 307975: Enable EFER_NXE properly on APs.
EFER_NXE is set in the EFER MSR by initializecpu() and must be set on all
CPUs in the system. When PG_NX support was added to PAE on i386, the
block to enable EFER_NXE was placed in a section of initializecpu() that
only runs if 'cpu == CPU_686'. During early boot, locore does an
initial pass to set cpu that sets it to CPU_686 on all CPUs later than
a Pentium. Later, printcpuinfo() adjusts the 'cpu' variable on
PII and later CPUs to one of CPU_PII, CPU_PIII, or CPU_P4. However,
printcpuinfo() is called after initializecpu() on the BSP, so the BSP
would enable EFER_NXE and pg_nx. The APs execute initializecpu() much
later after printcpuinfo() has run. The end result on a modern CPU was
that cpu was set to CPU_PIII when the APs invoked initializecpu(), so
they did not enable EFER_NXE. As a result, the APs would fault when
trying to access any pages marked with PG_NX set.
When booting a 2 CPU PAE kernel in bhyve this manifested as a hang before
single user mode. The attempt to execute /bin/init tried to copy out
the exec strings (argv, etc.) to a non-executable mapping while running
on the AP. The instruction kept faulting due to invalid bits in the PTE
in an infinite loop.
Fix this by moving the code to enable EFER_NXE out of the switch statement
on 'cpu' and always doing it if 'amd_feature' supports AMDID_NX.
kib [Wed, 23 Nov 2016 09:25:51 +0000 (09:25 +0000)]
MFC r308689:
Pass CPUID[1] %edx (cpu_feature), %ecx (cpu_feature2) and
CPUID[7].%ebx (cpu_stdext_feature), %ecx (cpu_stdext_feature2) to the
ifunc resolvers on x86.
MFC r308925:
Adjust r308689 to make rtld compilable with either in-tree or
(hopefully) stock gcc 4.2.1 on i386 and other arches.
asomers [Tue, 22 Nov 2016 20:28:17 +0000 (20:28 +0000)]
MFC r307584
Fix C++ includability of crypto headers with static array sizes
C99 allows array function parameters to use the static keyword for their
sizes. This tells the compiler that the parameter will have at least the
specified size, and calling code will fail to compile if that guarantee is
not met. However, this syntax is not legal in C++.
This commit reverts r300824, which worked around the problem for
sys/sys/md5.h only, and introduces a new macro: min_size(). min_size(x) can
be used in headers as a static array size, but will still compile in C++
mode.
jhb [Tue, 22 Nov 2016 18:43:04 +0000 (18:43 +0000)]
MFC 308142: Move declarations of invpcid_works and pmap_pcid_enabled to pmap.h.
Previously these were only declared under #ifdef SMP in <machine/smp.h>.
However, these variables are defind in pmap.c unconditionally, and efirt.c
references them unconditionally. This fixes non-SMP kernel builds.
jilles [Sat, 19 Nov 2016 20:02:49 +0000 (20:02 +0000)]
MFC r307755: swapoff: Remove only late devices with -aL.
Currently, '/etc/rc.d/swaplate stop' removes all swap devices. This can be
very slow and may not even be possible if there is a lot of swap space in
use. However, removing swap devices is only needed for late swap devices
that may depend on daemons that subsequent shutdown steps stop. Normal swap
devices such as hard disk partitions will remain available throughout the
shutdown process and need not be removed.
In swapoff, interpret -aL to remove late swap devices only, and use this in
etc/rc.d/swaplate. The meaning of -aL in swapon remains unchanged (add all
swap devices, both normal and late).
bapt [Wed, 16 Nov 2016 07:05:42 +0000 (07:05 +0000)]
MFC r308477:
make pxeboot consistent with common/dev_net.c
Always define boot.netif.server in kenv in pxeboot
Add "boot.tftproot.server" to kenv when pxeboot uses tftpfs
Change the code order when setting env for TFTP or NFS to be the same as
common/dev_net.c
bapt [Wed, 16 Nov 2016 07:01:52 +0000 (07:01 +0000)]
MFC r307238:
Stop closing the network device when netbooting for loaders using the common
dev_net.c code.
The NETIF_OPEN_CLOSE_ONCE flag was added in r201932 to prevent that behaviour
on some architectures (sparc64 and powerpc64) the default was left to always
open and close the device for each open and close of a file by the loader
because it was necessary for u-boot on arm.
Since it has been added, the flag was turned on for every arches including the
u-boot loader for arm.
This also fixes netbooting on RPi3 (tested by gonzo@)
For the loader.efi it greatly speeds up netbooting
mckusick [Wed, 16 Nov 2016 01:03:42 +0000 (01:03 +0000)]
MFC r307978:
Bug 180894 reports that rm -rf on a directory causes kernel panic and reboot.
Return EINVAL rather than panic for low directory link count.
hiren [Tue, 15 Nov 2016 22:18:52 +0000 (22:18 +0000)]
MFC r302474 (By gnn)
On FreeBSD there is a setsockopt option SO_USER_COOKIE which allows
setting a 32 bit value on each socket. This can be used by applications
and DTrace as a rendezvous point so that an applicaton's data can
more easily be captured at run time. Expose the user cookie via
DTrace by updating the translator in tcp.d and add a quick test
program, a TCP server, that sets the cookie on each connection
accepted.
hselasky [Tue, 15 Nov 2016 08:54:03 +0000 (08:54 +0000)]
MFC r308416:
Add timer to watch the RQ when we are out of mbufs.
The firmware/hardware does not generate additional completion
events unless we post new buffers. Use a timer to try to post
more buffers in case we are temporarily out of mbufs. Else
the receive schedule completely stops.
gonzo [Tue, 15 Nov 2016 03:40:22 +0000 (03:40 +0000)]
MFC r308295:
[gpio] Add GPIO driver for Intel Bay Trail SoC
Bay Trail has three banks of GPIOs exposed to userland as /dev/gpiocN,
where N is 1, 2, and 3. Pins in each bank are pre-named to match names
on boards schematics: GPIO_S0_SCnn, GPIO_S0_NCnn, and GPIO_S5_nn.
Controller supports edge-triggered and level-triggered interrupts but
current version of the driver does not have interrupts support
loos [Tue, 15 Nov 2016 01:20:36 +0000 (01:20 +0000)]
Stop abusing from struct ifnet presence to determine the packet direction
for dummynet, use the correct argument for that, remove the false coment
about the presence of struct ifnet.
gonzo [Tue, 15 Nov 2016 00:28:07 +0000 (00:28 +0000)]
MFC r308428:
Refactor FDT part of gpioled driver
- Split driver in two parts: FDT and non-FDT
- Instead of reattach gpioled nodes to GPIO bus use
gpio_pin_get_by_ofw_idx and add ofwbus and simplebus as parrent buses
gonzo [Mon, 14 Nov 2016 21:27:18 +0000 (21:27 +0000)]
MFC r308189:
[psm] Fix choosing wrong mode for synaptic device + trackpoint
With guest trackpoint present trackpoint probing switched synaptics
device to absolute mode with different protocol instead of keeping it
in relative mode.
mav [Sat, 12 Nov 2016 23:53:37 +0000 (23:53 +0000)]
MFC r308173:
Fix ZIL records ordering when ZVOL opened both with and without FSYNC.
Before this an earlier writes to a ZVOL opened without FSYNC could get to
ZIL after later writes to the same ZVOL opened with FSYNC. Fix this by
replicating functionality of ZPL (zv_sync_cnt equivalent to z_sync_cnt),
marking all log records sync if anybody opened the ZVOL with FSYNC.
mav [Sat, 12 Nov 2016 23:39:08 +0000 (23:39 +0000)]
MFC r308055: Add vdev_reopening support to vdev_geom.
It allows to avoid extra GEOM providers flapping without significant need.
Since GEOM got resize support, we don't need to reopen provider to get new
size. If provider was orphaned and no longer valid, ZFS should already
know that, and in such case reopen should be done in full as expected.
mav [Sat, 12 Nov 2016 23:37:26 +0000 (23:37 +0000)]
MFC r308051: Matching GUIDs, handle possible race on vdev detach.
In case of vdev detach, causing top level mirror vdev destruction, leaf
vdev changes its GUID to one of the destroyed mirror, that creates race
condition when GUID in vdev label may not match one in the pool config.
This change replicates logic nuance of vdev_validate() by adding special
exception, matching the vdev GUID against the top level vdev GUID.
Since this exception is not completely reliable (may give false positives
if we fail to erase label on detached vdev), use it only as last resort.
Quick way to reproduce this scenario now is detach vdev from a pool with
enabled autoextend. During vdev detach autoextend logic tries to reopen
remaining vdev, that always fails now since in-memory configuration is
already updated, while on-disk labels are not yet.
mav [Sat, 12 Nov 2016 23:29:09 +0000 (23:29 +0000)]
MFC r307318: MFV r307314:
6988 spa_sync() spends half its time in dmu_objset_do_userquota_updates
Using a benchmark which creates 2 million files in one TXG, I observe
that the thread running spa_sync() is on CPU almost the entire time we
are syncing, and therefore can be a performance bottleneck. About 50% of
the time in spa_sync() is in dmu_objset_do_userquota_updates().
The problem is that dmu_objset_do_userquota_updates() calls
zap_increment_int(DMU_USERUSED_OBJECT) once for every file that was
modified (or created). In this benchmark, all the files are owned by the
same user/group, so all 2 million calls to zap_increment_int() are
modifying the same entry in the zap. The same issue exists for the
DMU_GROUPUSED_OBJECT.
We should keep an in-memory map from user to space delta while we are
syncing, and when we finish, iterate over the in-memory map and modify
the ZAP once per entry. This reduces the number of calls to
zap_increment_int() from "number of objects modified" to "number of
owners/groups of modified files".
This reduced the time spent in spa_sync() in the file create benchmark
by ~33%, from 11 seconds to 7 seconds.
Closes #107
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Steve Gonczi <steve.gonczi@delphix.com>
Reviewed by: Ned Bass <bass6@llnl.gov>
Reviewed by: Jinshan Xiong <jinshan.xiong@intel.com>
Author: Matthew Ahrens <mahrens@delphix.com>