ngie [Sat, 3 Dec 2016 02:55:19 +0000 (02:55 +0000)]
MFC r304797,r305467,r305468,r305483:
r304797 (by jmmv):
Make use of Kyua's work directories.
Change the vnode tests to use the current directory when creating temporary
files, which we can assume is a volatile work directory, and then make the
kqueue_test.sh driver _not_ abandon the directory created by Kyua.
This makes the various kqueue tests independent of each other, and ensures
the temporary file is cleaned up on failure.
Problem spotted by asomers@ when reviewing D4254.
r305467:
Move tests/sys/kqueue/... to tests/sys/kqueue/libkqueue/...
This is being done to clearly distinguish the libkqueue tests
from the (soon to be imported) NetBSD tests.
r305468:
Port contrib/netbsd-tests/kernel/kqueue/... as tests/sys/kqueue/...
proc2_test must be skipped because the invariant tested
(`ke.fflags & NOTE_TRACKERR`) doesn't pass.
r305483:
Fix tests/sys/kqueue NetBSD tests on 32-bit platforms by using proper
format specifier for pointers when printing them out with printf(3)
lib/libc/tests/stdio/open_memstream_test.c was moved to
lib/libc/tests/stdio/open_memstream2_test.c to accomodate
the new open_memstream test from NetBSD.
Tested on: amd64 (VMware fusion VM; various bare metal platforms); i386 (VMware fusion VM); make tinderbox
jhb [Sat, 3 Dec 2016 01:10:45 +0000 (01:10 +0000)]
MFC 303348:
cxgbe(4): Initialize the adapter queues (fwq and mgmtq) instead of
returning EAGAIN if they aren't available when the user tries to program
a filter. Do this after validating the filter so that the driver
doesn't bring up the queues if it doesn't have to.
jhb [Sat, 3 Dec 2016 00:18:38 +0000 (00:18 +0000)]
MFC 304854: cxgbe/iw_cxgbe: Various fixes to the iWARP driver.
- Return appropriate error code instead of ENOMEM when sosend() fails in
send_mpa_req.
- Fix for problematic race during destroy_qp.
- Abortive close in the failure of send_mpa_reject() instead of normal close.
- Remove the unnecessary doorbell flowcontrol logic.
jhb [Fri, 2 Dec 2016 22:27:54 +0000 (22:27 +0000)]
MFC 308564: Don't place threads on the run queue after waking up other CPUs.
The other CPU might resume and see a still-empty runq and go back to
sleep before sched_add() adds the thread to the runq. This results
in a lost wakeup and a potential hang if the system is otherwise
completely idle.
The race originated due to a micro-optimization (my fault) in 4BSD in
that it avoided putting a thread on the run queue if the scheduler was
going to preempt to the new thread. To avoid complexity while fixing
this race, just drop this optimization. 4BSD now always sets the
"owepreempt" flag when a preemption is warranted and defers the actual
preemption to the thread_unlock of the caller the same as ULE.
jhb [Fri, 2 Dec 2016 21:35:14 +0000 (21:35 +0000)]
MFC 308005: Add powerd(8) support for several families of AMD CPUs.
Use the same logic to calculate the nominal CPU frequency from the P-state
MSRs on family 0x12, 0x15, and 0x16 CPUs as is used for family 0x10.
Family 0x14 was included in the original patch in the PR but I left that
out as the BIOS writer's guide for family 0x14 CPUs show a different layout
for the relevant MSR and include a different formulate for calculating the
frequency.
While here, simplify a few expressions and print out the family of
unsupported CPUs in hex rather than decimal.
julian [Fri, 2 Dec 2016 08:24:00 +0000 (08:24 +0000)]
MFH: r309295
bhyve: stability and performance improvement for dbgport
The TCP server implementation in dbgport does not track clients, so it
may try to write to a disconected socket resulting in SIGPIPE.
Avoid that by setting SO_NOSIGPIPE socket option.
Because dbgport emulates an I/O port to guest, the communication is done
byte by byte. Reduce latency of the TCP/IP transfers by using
TCP_NODELAY option. In my tests that change improves performance of
kgdb commands with lots of output (e.g. info threads) by two orders of
magnitude.
A general note. Since we have a uart emulation in bhyve, that can be
used for the console and gdb access to guests. So, bvmconsole and bvmdebug
could be de-orbited now. But there are many existing deployments that
still dependend on those.
julian [Fri, 2 Dec 2016 05:42:49 +0000 (05:42 +0000)]
MFH: r303611
slite style changes. There is an incoming patch that rewrites a
lot of this module and I want to get the style and whitespace changes in
a separate commit (or maybe more).
gonzo [Thu, 1 Dec 2016 22:22:19 +0000 (22:22 +0000)]
MFC r308898, r308940, r308942, r308944, r309112
r308898:
[bytgpio] Fix USB disconnect event after listsing pins on gpioc2
- Do not set input flag when reading value from GPIO pin, it is not
required and for gpioc2(S5 bank) setting both input and output flags
leads to some kind of electric interference (curren drop?) that
causes USB devices to disconnect
- Check pad configuration when attaching device and provide IN/OUT
capabilities only for pads that are configured as GPIO. Do not let
user code to configure or change value of non-GPIO pads. There is
no information for NC bank in intel's datasheet so for now function
check is ignored for pins in it
Reported by: Frank H.
MFC after: 3 days
r308940:
[bytgpio] prepare bytgpio(4) for modularization
- Add detach method
- module should depend on gpiobus, not gpio
r308942:
[bytgpio] Add module for bytgpio(4)
MFC after: 3 days
r308944 by hiren@:
r308942 broke kernel build.
Add acpi_if.h to module makefile to fix it.
Submitted by: peter
r309112:
[bytgpio] Fix pc98 build by disabling bytgpio module for this platform
jhb [Thu, 1 Dec 2016 20:36:48 +0000 (20:36 +0000)]
MFC 308456: Pass the correct flag to find_symdef() from _rtld_bind().
When symbol versioning was added to rtld, the boolean 'in_plt' argument
to find_symdef() was converted to a bitmask of flags. The first flag
added was 'SYMLOOK_IN_PLT' which replaced the 'in_plt' bool. This
happened to still work by accident as SYMLOOK_IN_PLT had the value of 1
which is the same as 'true', so there should be no functional change.
vangyzen [Wed, 30 Nov 2016 21:53:06 +0000 (21:53 +0000)]
MFC r306577 r306652 306830
Add GARP retransmit capability
A single gratuitous ARP (GARP) is always transmitted when an IPv4
address is added to an interface, and that is usually sufficient.
However, in some circumstances, such as when a shared address is
passed between cluster nodes, this single GARP may occasionally be
dropped or lost. This can lead to neighbors on the network link
working with a stale ARP cache and sending packets destined for
that address to the node that previously owned the address, which
may not respond.
To avoid this situation, GARP retransmissions can be enabled by setting
the net.link.ether.inet.garp_rexmit_count sysctl to a value greater
than zero. The setting represents the maximum number of retransmissions.
The interval between retransmissions is calculated using an exponential
backoff algorithm, doubling each time, so the retransmission intervals
are: {1, 2, 4, 8, 16, ...} (seconds).
Due to the exponential backoff algorithm used for the interval
between GARP retransmissions, the maximum number of retransmissions
is limited to 16 for sanity. This limit corresponds to a maximum
interval between retransmissions of 2^16 seconds ~= 18 hours.
Increasing this limit is possible, but sending out GARPs spaced
days apart would be of little use.
Update arp(4) to document the net.link.ether.inet.garp_rexmit_count sysctl.
Submitted by: dab
Relnotes: yes
Sponsored by: Dell EMC
vangyzen [Wed, 30 Nov 2016 20:47:54 +0000 (20:47 +0000)]
MFC r308904
Fix error reporting from wcstof()
When wcstof() skipped initial space and then parsing failed, it set
endptr to the first non-space character. Fix it to correctly report
failure by setting endptr to the beginning of the input string.
The fix is from theraven@, who fixed this bug in wcstod() and
wcstold() in r227753.
While I'm here:
Move assignments out of declarations in wcstod() and wcstold().
This is against my personal preference, but it is our agreed style(9).
Set endptr correctly on malloc() failure in all three functions.
Remove an incorrect comment: This is pointer arithmetic,
so the code was not actually making that assumption.
wcstold() advanced the wcp pointer beyond leading whitespace
and then reset it back to the beginning of the string.
Do not reset it. This seems to have no functional effect,
since strtold_l() also skips leading whitespace. I'm making
the change to keep this function consistent with wcstof() and
wcstod(), and because the C11 spec prescribes the use of iswspace()
to skip leading space.
Reported by: libc++ unit test for std::stof(std::wstring)
Sponsored by: Dell EMC
vangyzen [Wed, 30 Nov 2016 18:26:22 +0000 (18:26 +0000)]
MFC r308824
locale: fix display of "grouping" and "mon_grouping" values
The "grouping" and "mon_grouping" values are arrays of one-byte
integers, not arrays of ASCII characters. Display them in a format
similar to GNU and MacOS.
dexuan [Wed, 30 Nov 2016 07:22:46 +0000 (07:22 +0000)]
MFC 308797-308799, 309082
r308797
update the hv_vmbus(4) manual by adding a dependency on pci
We enhanced the vmbus driver to support PCIe pass-through recently.
Reviewed by: sephe
Approved by: sephe (mentor)
Sponsored by: Microsoft
r308798
remove the hv_ata_pci_disengage(4) manual
A few months ago, we removed the driver, which was not necessary any longer.
Reviewed by: sephe
Approved by: sephe (mentor)
Sponsored by: Microsoft
r308799
fix share/man/man4/Makefile for hv_ata_pci_disengage.4
We need to remove the line since we removed the related manual just now.
Reviewed by: sephe
Approved by: sephe (mentor)
Sponsored by: Microsoft
r309082
share/man/man4/Makefile: Only install Hyper-V man pages on amd64 and i386
We shouldn't install them on the architectures not supported by Hyper-V.
And, hv_ata_pci_disengage.4.gz should be removed from all architectures:
1) It should have only applied to Hyper-V;
2) For Hyper-V platforms (amd64 and i386), the related driver was removed by
r306426 | sephe | 2016-09-29 09:41:52 +0800 (Thu, 29 Sep 2016),
because now we have a better mechanism to disble the ata driver for hard
disks when the VM runs on Hyper-V.
dexuan [Wed, 30 Nov 2016 06:20:43 +0000 (06:20 +0000)]
MFC: 308723-308725,308793-308795,309127
Approved by: sephe (mentor)
r308723
hyperv/vmbus: add a new method to get vcpu_id
vcpu_id is host's representation of guest CPU.
We get the mapping between vcpu_id and FreeBSD kernel's cpu id when VMBus
driver is loaded. Later, when a driver, like the coming pcib driver, talks
to the host and needs to refer to a guest CPU, the driver must use the
vcpu_id.
The feature enables us to pass through physical PCIe devices to FreeBSD VM
running on Hyper-V (Windows Server 2016) to get near-native performance with
low CPU utilization.
The patch implements a PCI bridge driver to support the feature:
1) The pcib driver talks to the host to discover device(s) and presents
the device(s) to FreeBSD's pci driver via PCI configuration space (note:
to access the configuration space, we don't use the standard I/O port
0xCF8/CFC method; instead, we use an MMIO-based method supplied by Hyper-V,
which is very similar to the 0xCF8/CFC method).
2) The pcib driver allocates resources for the device(s) and initialize
the related BARs, when the device driver's attach method is invoked;
3) The pcib driver talks to the host to create MSI/MSI-X interrupt
remapping between the guest and the host;
4) The pcib driver supports device hot add/remove.
brooks [Wed, 30 Nov 2016 01:17:02 +0000 (01:17 +0000)]
MFC r309027:
Allocate a struct ifreq rather than using a (wrong) computed size for
the BIOCSETIF ioctl.
The kernel always copies an entire struct ifreq and IPv4 addresses will
always fit in an ifreq.
On systems with pointers larger than 64-bits, the computed size will be
less than the size of struct ifreq, potentially resulting in the kernel
attempting to copyin memory from outside the allocation.
asomers [Mon, 28 Nov 2016 22:35:10 +0000 (22:35 +0000)]
MFC r308780
Fix "camcontrol rescan" with SATA drives behind a SAS controller
A bug in CAM's serial number hash logic resulted in SATA drives behind a SAS
controller getting removed and readded anytime the drive was rescanned for
any reason
jhb [Mon, 28 Nov 2016 18:36:37 +0000 (18:36 +0000)]
MFC 307756: Define max_align_t for C11.
libc++'s stddef.h includes an existing definition of max_align_t for
C++11, but it is only defined for C++, not for C. In addition, GCC and
clang both define an alternate version of max_align_t that uses a
union of multiple types rather than a plain long double as in libc++.
This adds a __max_align_t to <sys/_types.h> that matches the GCC and
clang definition that is mapped to max_align_t in <stddef.h>.
hselasky [Mon, 28 Nov 2016 17:22:45 +0000 (17:22 +0000)]
MFC r308730:
Make sure MAC address is reprogrammed when if_init() callback is
invoked. Else promiscious mode must be used to pass traffic. While at
it fix a debug print macro.
kib [Sun, 27 Nov 2016 09:10:33 +0000 (09:10 +0000)]
MFC r308618:
Provide simple mutual exclusion between mount point update and unmount.
In the update path in ffs_mount(), drop vfs_busy() reference around namei().
emaste [Sat, 26 Nov 2016 03:39:02 +0000 (03:39 +0000)]
MFC r307003, r307564: makewhatis: make output reproducible
r307003: Instead:
1) provide fts_open() with a comparison function to process directories
and files in a deterministic order
2) in addition to the existing hash, insert pages into a linked list
which will be sorted (by virtue of 1)
3) iterate over pages by the list in 2, instead of hash order
Idea from: des
r307564: makewhatis: avoid skipping another page after one with no mlinks
rstone [Sat, 26 Nov 2016 01:16:33 +0000 (01:16 +0000)]
MFC r308580:
Don't read if_counters with if_addr_lock held
Calling into an ifnet implementation with the if_addr_lock already
held can cause a LOR and potentially a deadlock, as ifnet
implementations typically can take the if_addr_lock after their
own locks during configuration. Refactor a sysctl handler that
was violating this to read if_counter data in a temporary buffer
before the if_addr_lock is taken, and then copying the data
in its final location later, when the if_addr_lock is held.
jhb [Fri, 25 Nov 2016 22:12:03 +0000 (22:12 +0000)]
MFC 307333: Reprogram I/O APIC interrupt pins when registering an I/O APIC.
All I/O APIC pins are masked when an I/O APIC is first probed. The
APIC enumerator (MP Table or MADT) then parses its associated tables to
configure individual pins to set custom delivery modes or alternate
routing (e.g. routing IRQ 0 to intpin 2). Pins for regular interrupt
pins are left masked until the first interrupt is assigned. However,
pins with unusual settings (e.g. NMI or SMI) are never assigned an
interrupt and thus never re-programmed. The I/O APIC code used to
reprogram all interrupt pins during registration but this was lost in
r151979.
In theory, this is mostly a no-op as the ACPI APIC table does not
include a way to enumerate NMI or SMI pins for the I/O APIC, so only
systems using an MP Table would be affected.
araujo [Fri, 25 Nov 2016 05:54:17 +0000 (05:54 +0000)]
MFC r308443, r308459, r308462, r308478, r308786
r308443:
Add -d flag that prints domain only.
PR: 212875
Submitted by: Ben RUBSON <ben.rubson@gmail.com>
Reviewed by: pi
r308459:
Fix missing '-' for the flags -s and -d on both manpage and usage.
Reported by: garga, bde
r308462:
Add flag -B which does the same like batch mode but without exiting after
print. Also add a new flag -s that add blocks size to statistics.
PR: 198347, 212726
Submitted by: Ben RUBSON <ben.rubson@gmail.com>
Tested by: pi
MFC After: 2 weeks.
r308478:
We can't use protect(1) inside a jail(8)!
To avoid have warning for services that are using oomprotect, oomprotect
will only be applied on services that won't run inside jails.
Reported by: allanjude
MFC after: 2 weeks.
r308786:
rc.subr: Swap checks so we only fork sysctl if *_oomprotect is set.
emaste [Fri, 25 Nov 2016 00:25:59 +0000 (00:25 +0000)]
MFC r307969: strings: fix exit status if a file before the last one fails
Previously a command like "strings f1 f2 f3" reported the exit status
based only on processing the last file.
As with GNU strings, report an error exit status if an error was
encountered processing any of the files. While here simplify the
exit status handling to just success (0) / failure (1).
emaste [Thu, 24 Nov 2016 00:45:00 +0000 (00:45 +0000)]
MFC r308772: crunchide: report explicit error for combined string table
Some tools produce objects with a combined strtab and shstrtab.
These objects are not supported by crunchide since it rewrites the
symtab and strtab to "hide" symbols. This invalidates section header
offsets into a combined strtab/shstrtab.
In the future we could support these objects (by ensuring that we retain
unmodified section name strings in the output .strtab, and then rewriting
each section header's sh_name).
jhb [Wed, 23 Nov 2016 23:53:52 +0000 (23:53 +0000)]
MFC 308056: Fix formatting of tables.
Specifically, use .Ta instead of tabs to separate column entries. While
here fix a few other things:
- Use .Sy for all column headers (previously only the first column header
was bold)
- Use .Dv to markup constants used for MIB names.
- Use "1234" and "4321" for the byte order descriptions without
thousands separators.
- Mark up header files in the first table with .In.
jhb [Wed, 23 Nov 2016 23:45:42 +0000 (23:45 +0000)]
MFC 307975: Enable EFER_NXE properly on APs.
EFER_NXE is set in the EFER MSR by initializecpu() and must be set on all
CPUs in the system. When PG_NX support was added to PAE on i386, the
block to enable EFER_NXE was placed in a section of initializecpu() that
only runs if 'cpu == CPU_686'. During early boot, locore does an
initial pass to set cpu that sets it to CPU_686 on all CPUs later than
a Pentium. Later, printcpuinfo() adjusts the 'cpu' variable on
PII and later CPUs to one of CPU_PII, CPU_PIII, or CPU_P4. However,
printcpuinfo() is called after initializecpu() on the BSP, so the BSP
would enable EFER_NXE and pg_nx. The APs execute initializecpu() much
later after printcpuinfo() has run. The end result on a modern CPU was
that cpu was set to CPU_PIII when the APs invoked initializecpu(), so
they did not enable EFER_NXE. As a result, the APs would fault when
trying to access any pages marked with PG_NX set.
When booting a 2 CPU PAE kernel in bhyve this manifested as a hang before
single user mode. The attempt to execute /bin/init tried to copy out
the exec strings (argv, etc.) to a non-executable mapping while running
on the AP. The instruction kept faulting due to invalid bits in the PTE
in an infinite loop.
Fix this by moving the code to enable EFER_NXE out of the switch statement
on 'cpu' and always doing it if 'amd_feature' supports AMDID_NX.
kib [Wed, 23 Nov 2016 09:25:51 +0000 (09:25 +0000)]
MFC r308689:
Pass CPUID[1] %edx (cpu_feature), %ecx (cpu_feature2) and
CPUID[7].%ebx (cpu_stdext_feature), %ecx (cpu_stdext_feature2) to the
ifunc resolvers on x86.
MFC r308925:
Adjust r308689 to make rtld compilable with either in-tree or
(hopefully) stock gcc 4.2.1 on i386 and other arches.
asomers [Tue, 22 Nov 2016 20:28:17 +0000 (20:28 +0000)]
MFC r307584
Fix C++ includability of crypto headers with static array sizes
C99 allows array function parameters to use the static keyword for their
sizes. This tells the compiler that the parameter will have at least the
specified size, and calling code will fail to compile if that guarantee is
not met. However, this syntax is not legal in C++.
This commit reverts r300824, which worked around the problem for
sys/sys/md5.h only, and introduces a new macro: min_size(). min_size(x) can
be used in headers as a static array size, but will still compile in C++
mode.
jhb [Tue, 22 Nov 2016 18:43:04 +0000 (18:43 +0000)]
MFC 308142: Move declarations of invpcid_works and pmap_pcid_enabled to pmap.h.
Previously these were only declared under #ifdef SMP in <machine/smp.h>.
However, these variables are defind in pmap.c unconditionally, and efirt.c
references them unconditionally. This fixes non-SMP kernel builds.
jilles [Sat, 19 Nov 2016 20:02:49 +0000 (20:02 +0000)]
MFC r307755: swapoff: Remove only late devices with -aL.
Currently, '/etc/rc.d/swaplate stop' removes all swap devices. This can be
very slow and may not even be possible if there is a lot of swap space in
use. However, removing swap devices is only needed for late swap devices
that may depend on daemons that subsequent shutdown steps stop. Normal swap
devices such as hard disk partitions will remain available throughout the
shutdown process and need not be removed.
In swapoff, interpret -aL to remove late swap devices only, and use this in
etc/rc.d/swaplate. The meaning of -aL in swapon remains unchanged (add all
swap devices, both normal and late).
bapt [Wed, 16 Nov 2016 07:05:42 +0000 (07:05 +0000)]
MFC r308477:
make pxeboot consistent with common/dev_net.c
Always define boot.netif.server in kenv in pxeboot
Add "boot.tftproot.server" to kenv when pxeboot uses tftpfs
Change the code order when setting env for TFTP or NFS to be the same as
common/dev_net.c
bapt [Wed, 16 Nov 2016 07:01:52 +0000 (07:01 +0000)]
MFC r307238:
Stop closing the network device when netbooting for loaders using the common
dev_net.c code.
The NETIF_OPEN_CLOSE_ONCE flag was added in r201932 to prevent that behaviour
on some architectures (sparc64 and powerpc64) the default was left to always
open and close the device for each open and close of a file by the loader
because it was necessary for u-boot on arm.
Since it has been added, the flag was turned on for every arches including the
u-boot loader for arm.
This also fixes netbooting on RPi3 (tested by gonzo@)
For the loader.efi it greatly speeds up netbooting
mckusick [Wed, 16 Nov 2016 01:03:42 +0000 (01:03 +0000)]
MFC r307978:
Bug 180894 reports that rm -rf on a directory causes kernel panic and reboot.
Return EINVAL rather than panic for low directory link count.
hiren [Tue, 15 Nov 2016 22:18:52 +0000 (22:18 +0000)]
MFC r302474 (By gnn)
On FreeBSD there is a setsockopt option SO_USER_COOKIE which allows
setting a 32 bit value on each socket. This can be used by applications
and DTrace as a rendezvous point so that an applicaton's data can
more easily be captured at run time. Expose the user cookie via
DTrace by updating the translator in tcp.d and add a quick test
program, a TCP server, that sets the cookie on each connection
accepted.
hselasky [Tue, 15 Nov 2016 08:54:03 +0000 (08:54 +0000)]
MFC r308416:
Add timer to watch the RQ when we are out of mbufs.
The firmware/hardware does not generate additional completion
events unless we post new buffers. Use a timer to try to post
more buffers in case we are temporarily out of mbufs. Else
the receive schedule completely stops.