sephe [Mon, 8 Aug 2016 06:33:59 +0000 (06:33 +0000)]
MFC 303737
hyperv/storvsc: Claim SPC-3 conformance, thus enable UNMAP support
The Hyper-V on pre-win10 systems will only report SPC-2 conformance,
but it actually conforms to SPC-3. The INQUIRY response is adjusted
to propagate the SPC-3 version information to CAM.
Submitted by: Hongjiang Zhang <honzhan microsoft com>
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D7405
jhb [Sat, 6 Aug 2016 23:53:33 +0000 (23:53 +0000)]
MFC 303076,303225: Use MTX_SYSINIT for the VESA lock.
303076:
vesa: fix panic on suspend
Fix the following panic seen when migrating a FreeBSD guest on Xen:
panic: mtx_lock() of destroyed mutex @ /usr/src/sys/dev/fb/vesa.c:541
cpuid = 0
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe001d2fa4f0
vpanic() at vpanic+0x182/frame 0xfffffe001d2fa570
kassert_panic() at kassert_panic+0x126/frame 0xfffffe001d2fa5e0
__mtx_lock_flags() at __mtx_lock_flags+0x15b/frame 0xfffffe001d2fa630
vesa_bios_save_restore() at vesa_bios_save_restore+0x78/frame 0xfffffe001d2fa680
vga_suspend() at vga_suspend+0xa3/frame 0xfffffe001d2fa6b0
isavga_suspend() at isavga_suspend+0x1d/frame 0xfffffe001d2fa6d0
bus_generic_suspend_child() at bus_generic_suspend_child+0x44/frame
[...]
This is caused because vga_sub_configure (which is called if the VGA adapter
is attached after VESA tried to initialize), points to vesa_configure, which
doesn't initialize the VESA mutex. In order to fix it, make sure
vga_sub_configure points to vesa_load, so that all the needed vesa
components are properly initialized.
303225:
Use MTX_SYSINIT for the VESA lock.
vesa_init_done isn't a reliable guard for the mutex init. If
vesa_configure() doesn't find valid VESA info it will not set
vesa_init_done, but the lock will remain initialized. Revert r303076
and use MTX_SYSINIT to deterministically init the lock.
jhb [Fri, 5 Aug 2016 17:13:25 +0000 (17:13 +0000)]
MFC 302181,302635: Disable MSI-X migration on older Xen hypervisors.
302181:
Add a tunable to disable migration of MSI-X interrupts.
The new 'machdep.disable_msix_migration' tunable can be set to 1 to
disable migration of MSI-X interrupts.
Xen versions prior to 4.6.0 do not properly handle updates to MSI-X
table entries after the initial write. In particular, the operation
to unmask a table entry after updating it during migration is not
propagated to the "real" table for passthrough devices causing the
interrupt to remain masked. At least some systems in EC2 are
affected by this bug when using SRIOV. The tunable can be set in
loader.conf as a workaround.
If the hypervisor version is smaller than 4.6.0. Xen commits 74fd00 and
70a3cb are required on the hypervisor side for this to be fixed, and those
are only included in 4.6.0, so stay on the safe side and disable MSI-X
interrupt migration on anything older than 4.6.0.
It should not cause major performance degradation unless a lot of MSI-X
interrupts are allocated.
ngie [Wed, 3 Aug 2016 01:19:10 +0000 (01:19 +0000)]
MFstable/11 r303691:
MFC r302550,r302551,r302552,r302553:
r302550:
Deobfuscate cleanup path in clnt_dg_create(..)
Similar to r300836 and r301800, cl and cu will always be non-NULL as they're
allocated using the mem_alloc routines, which always use
`malloc(..., M_WAITOK)`.
Deobfuscating the cleanup path fixes a leak where if cl was NULL and
cu was not, cu would not be free'd, and also removes a duplicate test for
cl not being NULL.
Similar to r300836, r301800, and r302550, cl and ct will always
be non-NULL as they're allocated using the mem_alloc routines,
which always use `malloc(..., M_WAITOK)`.
emaste [Mon, 1 Aug 2016 19:53:18 +0000 (19:53 +0000)]
MFC r303338: vt: lock Giant around kbd calls in CONS_GETINFO
Note that keyboards are stored in an array and are not freed (just
"unregistered" by clearing some fields) so a race would be limited to
obtaining stale information about an unregistered keyboard.
296063:
Lock the NDP default router list and count defrouter references.
This addresses a number of race conditions that can cause crashes as a
result of unsynchronized access to the list.
297397
Modify nd6_llinfo_timer() to acquire the nd6 lock before the LLE lock.
When expiring a neighbour cache entry we may need to look up the associated
default router, which requires the nd6 read lock. To avoid an LOR, the nd6
lock should be acquired first.
299213
Clean up callers of nd6_prelist_add().
nd6_prelist_add() sets *newp if and only if it is successful, so there's no
need for code that handles the case where the return value is 0 and
*newp == NULL. Fix some style bugs in nd6_prelist_add() while here.
MFC 303109: Update crashinfo to work with newer gdb from ports.
If gdb from ports is installed, use it instead of the base system gdb
to extract variables from a kernel. Note that base gdb and ports gdb
do not support the same options for invoking a single command in batch
mode, so a wrapper shell function is used. In addition, prefer kgdb
from ports when generating a backtrace if present.
In tcp_input(), don't acquire the pcbinfo global write lock for SYN
packets targeting a listening socket. Permit to reduce TCP input
processing starvation in context of high SYN load (e.g. short-lived TCP
connections or SYN flood).
MFC 302899: Add documentation for the sigevent structure.
- Add a sigevent(3) manpage to give a general overview of the sigevent
structure and the available notification mechanisms.
- Document that AIO requests contain a nested sigevent structure that can
be used to request completion notification.
- Expand the sigevent details in other manuals to note details such as
the extra values stored in a queued signal's information or in a posted
kevent.
MFC r297871: boot1.efifat: provide a fallback startup.nsh
In case the firmware falls through to executing startup.sh, populate it
with the name of our boot loader. In normal operation this should not be
necessary but may allow the system to boot if it would otherwise just
remain at a shell prompt.
MFC r301974: ar: enable reproducible output by default when invoked as 'ar -s'
ar output is already deterministic by default for ar -q and ar -r, and
when invoked as ranlib. Make ar -s equivalent to ranlib and enable
deterministic output by default in that case too.
MFC r302278: libcxxrt: correct mangled "typeinfo name" symbols in Version.map
r261644 (MFC of r260553) added missing C++ typinfos to libcxxrt's
version script. It appears that a number of duplicate mangled symbols
were added due to a cut and paste error. Switch the second instances to
_ZTS*, typeinfo name for *.
Found by lld, which produces an error or warning for duplicate symbols.
MFC r302567:
In vgonel(), postpone setting BO_DEAD until VOP_RECLAIM() is called,
if vnode is VMIO. For VMIO vnodes, set BO_DEAD in vm_object_terminate().
MFC 299977: Use polling spin loops for timeouts during early boot.
Some ACPI operations such as mutex acquires and event waits accept a
timeout. The ACPI OSD layer implements these timeouts by using regular
sleep timeouts. However, this doesn't work during early boot before
event timers are setup. Instead, use polling combined with DELAY()
to spin.
This fixes booting on upcoming Intel systems with Kaby Lake processors.
MFC 298370,298372,298377,298379,298380,298484:
Abstract out _OSC handling and invoke it for PCI bridges.
r298370:
Add a wrapper for evaluating _OSC methods.
This wrapper does not translate errors in the first word to ACPI
error status returns. Use this wrapper in the acpi_cpu(4) driver in
place of the existing _OSC code. While here, fix a bug where the wrong
count of words was passed when invoking _OSC.
r298372:
Invoke _OSC on Host-PCI bridges.
Tell the firmware that we support PCI-express config space access
and MSI.
r298377:
Remove query flag from acpi_EvaluateOSC(). This function does not support
return buffer (yet).
r298379:
There is no need to use array any more. No functional change.
r298380:
Prefer sizeof(*pointer) over sizeof(type). No funtional change.
r298484:
Optionally return the output capabilities list from _OSC.
Both of the callers were expecting the input cap_set to be modified.
This fixes them to request cap_set to be updated with the returned buffer.
1) Eliminate possibility to call __*collate_range_cmp() with inclomplete
locale (which cause core dump) by removing whole 'table' argument
by which it passed.
2) Restore __collate_range_cmp() in __sccl().
3) Collating [a-z] range in regcomp() works for single byte locales only
(we can't do it for other ones). In previous state only first 256
wide chars are considered and all others are just silently dropped from the
range.
MFC r300612
Reject ioctl commands for FLSHGCHR and FLSHPCHR if the size is greater
than sc->areq. This is a bounds check to ensure we're not just cramming
arbitrarily sized nonsense into the driver and overflowing the heap.
MFC r299188
Since igb_detach() cleans up all the data structures that will be
free'd by the functions following its call, we can simply return instead
of crashing and burning in the event of igb_detach() failing.
Avoid a possible heap overflow in our nlm code by limiting the number
of service to the arbitrary value of 256. Log an appropriate message
that indicates the hard limit.
MFC r303031: clang++: Always use --eh-frame-hdr on FreeBSD, even for -static
FreeBSD uses LLVM's libunwind on FreeBSD/arm64 today (and we expect to
use it more widely in the future) and it requires the EH frame segment
in static binaries.
MFC - r302384 to 10-STABLE
Do not initialize the adapter on MTU change when adapter status is down.
This fixes long-standing problems when changing settings of the adapter.
Decrease lock contention within the TCP accept case by removing
the INP_INFO lock from tcp_usr_accept. As the PR/patch states
this was following the advice already in the code.
See the PR below for a full discussion of this change and its
measured effects.
Fix problems in the FQ-PIE AQM cleanup code that could leak memory or
cause a crash.
Because dummynet calls pie_cleanup() while holding a mutex, pie_cleanup()
is not able to use callout_drain() to make sure that all callouts are
finished before it returns, and callout_stop() is not sufficient to make
that guarantee. After pie_cleanup() returns, dummynet will free a
structure that any remaining callouts will want to access.
Fix these problems by allocating a separate structure to contain the
data used by the callouts. In pie_cleanup(), call callout_reset_sbt()
to replace the normal callout with a cleanup callout that does the cleanup
work for each sub-queue. The instance of the cleanup callout that
destroys the last flow will also free the extra allocated block of memory.
Protect the reference count manipulation in the cleanup callout with
DN_BH_WLOCK() to be consistent with all of the other usage of the reference
count where this lock is held by the dummynet code.
Fix a copy/paste bug introduced during X86_64 Linuxulator work.
FreeBSD support NX bit on X86_64 processors out of the box, for i386 emulation
use READ_IMPLIES_EXEC flag, introduced in r302515.
While here move common part of mmap() and mprotect() code to the files in compat/linux
to reduce code dupcliation between Linuxulator's.
Implement Linux personality() system call mainly due to READ_IMPLIES_EXEC flag.
In Linux if this flag is set, PROT_READ implies PROT_EXEC for mmap().
Linux/i386 set this flag automatically if the binary requires executable stack.
READ_IMPLIES_EXEC flag will be used in the next Linux mmap() commit.
Updates to EC2 loader.conf:
* Use console=comconsole (r301732) since EC2 now has a "VGA" console;
* Enable blkif indirect segment I/O (r302288) since EC2 now consistently
gets better disk performance with this option enabled.
Support checksum offloading for TCP/IPV6 and UDP/IPV6.
Support SCTP checksum offloading for SCTP/IPV6.
Support SCTP checksum offloading on all controllers except 82575.
Don't check the area that the host has not filled.
PR: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=209443
PR: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=210425
Submitted by: Hongjiang Zhang <honzhan microsoft com>
Reviewed by: sephe, Dexuan Cui <decui microsoft com>
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D6955
302605
hyperv/stor: Save the response status and xfer length properly.
The current command response handling discards status and xfer
length unconditionally, so that all of the commands would be
considered successful, even if errors happened. When errors
really happens, this causes all kinds of wiredness, since the
buffer will not be filled on the host side and sense data will
be ignored.
Most of the time, errors do not happen, however, error does
happen for the request sent immediately after the disk resizing.
Discarding the SCSI status (SCSI_STATUS_CHECK_COND) and sense
data (capacity changes) prevents the disk resizing from working
properly.
This commit saves the response status and xfer length properly
for later use.
Submitted by: Dexuan Cui <decui microsoft com>
Noticed by: sephe
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D7181
This library uses macros to define fprintf behvavior for several object
types The compiler will see the non-string literal arguments to the fprintf
calls and omit warnings for them. Quiese these warnings in contrib code:
cddl/contrib/opensolaris/lib/libnvpair/libnvpair.c:743:12: warning: format
string is not a string literal (potentially insecure) [-Wformat-security]
ARENDER(pctl, nvlist_array, nvl, name, val, nelem);
cddl/lib/libavl/Makefile
cddl/lib/libctf/Makefile
cddl/lib/libnvpair/Makefile
cddl/lib/libumem/Makefile
cddl/lib/libuutil/Makefile
Increase WARNS to the highest working level for each of these
libraries
Add the ability to print out the module specific information in likely formats.
Among other things this gives us the ability to find out the syscall number of a dynamically loaded syscall that has a dynamicly allocated vector number.
MFC r302402: Fix ahci(4) driver attach to controller with 32 ports.
Incorrect sign expansion in variables that supposed to be a bit fields
caused infinite loop. Fixing this allows system properly detect maximal
possible 32 devices configured on AHCI HBA of BHyVe. That case did not
happen in a wild before due to lack of hardware AHCI HBAs with 32 ports.