avg [Thu, 11 Aug 2016 20:48:03 +0000 (20:48 +0000)]
MFC r303763,303791,303869: zfs: honour and make use of vfs vnode locking protocol
ZFS POSIX Layer is originally written for Solaris VFS which is very
different from FreeBSD VFS. Most importantly many things that FreeBSD VFS
manages on behalf of all filesystems are implemented in ZPL in a different
way.
Thus, ZPL contains code that is redundant on FreeBSD or duplicates VFS
functionality or, in the worst cases, badly interacts / interferes
with VFS.
The most prominent problem is a deadlock caused by the lock order reversal
of vnode locks that may happen with concurrent zfs_rename() and lookup().
The deadlock is a result of zfs_rename() not observing the vnode locking
contract expected by VFS.
This commit removes all ZPL internal locking that protects parent-child
relationships of filesystem nodes. These relationships are protected
by vnode locks and the code is changed to take advantage of that fact
and to properly interact with VFS.
Removal of the internal locking allowed all ZPL dmu_tx_assign calls to
use TXG_WAIT mode.
Another victim, disputable perhaps, is ZFS support for filesystems with
mixed case sensitivity. That support is not provided by the OS anyway,
so in ZFS it was a buch of dead code.
To do:
- replace ZFS_ENTER mechanism with VFS managed / visible mechanism
- replace zfs_zget with zfs_vget[f] as much as possible
- get rid of not really useful now zfs_freebsd_* adapters
- more cleanups of unneeded / unused code
- fix / replace .zfs support
ae [Thu, 11 Aug 2016 10:41:19 +0000 (10:41 +0000)]
MFC r303842:
Fix constructing of setdscp opcode with tablearg keyword.
setdscp's argument can have zero value that conflicts with IP_FW_TARG
value. Always set high-order bit if parser doesn't find tablearg keyword.
MFC r303845:
Fix formatting of setfib opcode.
Zero fib is correct value and it conflicts with IP_FW_TARG.
Use bprint_uint_arg() only when opcode contains IP_FW_TARG,
otherwise just print numeric value with cleared high-order bit.
tuexen [Thu, 11 Aug 2016 10:14:03 +0000 (10:14 +0000)]
MFC r303792:
Fix various bugs in relation to the I-DATA chunk support
This is joint work with rrs.
MFC r303793:
Mark an unused parameter as such.
MFC r303798:
Don't modify a structure without holding a reference count on it.
MFC r303813:
Remove stream queue entry consistently from wheel.
While there, improve the handling of drain.
MFC r303819:
Consistently check for unsent data on the stream queues.
MFC r303831:
Fix a locking issue found by stress testing with tsctp.
The inp read lock neeeds to be held when considering control->do_not_ref_stcb.
MFC r303834:
Fix the sending of FORWARD-TSN and I-FORWARD-TSN chunks. The
last SID/SSN pair wasn't filled in.
Thanks to Julian Cordes for providing a packetdrill script
triggering the issue and making me aware of the bug.
mjg [Thu, 11 Aug 2016 09:30:25 +0000 (09:30 +0000)]
MFC r303583:
amd64: implement pagezero using rep stos
The current implementation uses non-temporal writes. This turns out to
be detrimental to performance if the page is used shortly after, which
is the typical case with page faults.
rwlock: s/READER/WRITER/ in wlock lockstat annotation
==
sx: increment spin_cnt before cpu_spinwait in xlock
The change is a no-op only done for consistency with the rest of the file.
==
locks: change sleep_cnt and spin_cnt types to u_int
Both variables are uint64_t, but they only count spins or sleeps.
All reasonable values which we can get here comfortably hit in 32-bit range.
==
Implement trivial backoff for locking primitives.
All current spinning loops retry an atomic op the first chance they get,
which leads to performance degradation under load.
One classic solution to the problem consists of delaying the test to an
extent. This implementation has a trivial linear increment and a random
factor for each attempt.
For simplicity, this first thouch implementation only modifies spinning
loops where the lock owner is running. spin mutexes and thread lock were
not modified.
Current parameters are autotuned on boot based on mp_cpus.
Autotune factors are very conservative and are subject to change later.
==
locks: fix up ifdef guards introduced in r303643
Both sx and rwlocks had copy-pasted ADAPTIVE_MUTEXES instead of the correct
define.
==
locks: fix compilation for KDTRACE_HOOKS && !ADAPTIVE_* case
==
locks: fix sx compilation on mips after r303643
The kernel.h header is required for the SYSINIT macro, which apparently
was present on amd64 by accident.
dim [Tue, 9 Aug 2016 19:20:53 +0000 (19:20 +0000)]
MFC r303676:
Fix a segfault in bsdgrep when parsing the invalid extended regexps "?"
or "+" (these are invalid, because there is no preceding operand).
When bsdgrep attempts to emulate GNU grep in discarding and ignoring the
invalid ? or + operators, some later logic in tre_compile_fast() goes
beyond the end of the buffer, leading to a crash.
Fix this by bailing out, and reporting a bad pattern instead.
Approved by: re (gjb, kib)
Reported by: Steve Kargl
gonzo [Mon, 8 Aug 2016 17:53:51 +0000 (17:53 +0000)]
MFC r303726
Fix EHCI driver by excluding first 512K from available memory
On Zynq 256K-512K memory region is not accessible by all bus masters.
EHCI driver fails when trying to use it for DMA transfers. Patching
memory node does not help because ubldr overrides values there with
the ones obtained from u-boot. So as a workaround we just mark first
512K as reserved.
PR: 211484
Submitted by: Thomas Skibo <thoma555-bsd@yahoo.com>
Approved by: re (gjb)
vangyzen [Mon, 8 Aug 2016 15:07:38 +0000 (15:07 +0000)]
MFC r303788
Fix some logic in PCIe HotPlug; display EI status
The interpretation of the Electromechanical Interlock Status was
inverted, so we disengaged the EI if a card was inserted.
Fix it to engage the EI if a card is inserted.
When displaying the slot capabilites/status with pciconf:
- We inverted the sense of the Power Controller Control bit,
saying the power was off when it was really on (according to
this bit). Fix that.
- Display the status of the Electromechanical Interlock:
EI(engaged)
EI(disengaged)
sephe [Mon, 8 Aug 2016 07:09:40 +0000 (07:09 +0000)]
MFC 303737
hyperv/storvsc: Claim SPC-3 conformance, thus enable UNMAP support
The Hyper-V on pre-win10 systems will only report SPC-2 conformance,
but it actually conforms to SPC-3. The INQUIRY response is adjusted
to propagate the SPC-3 version information to CAM.
Submitted by: Hongjiang Zhang <honzhan microsoft com>
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D7405
oshogbo [Mon, 8 Aug 2016 06:33:57 +0000 (06:33 +0000)]
MFC r302966:
Fix nvlist array memory leak.
When we change nvl_array_next to NULL it means that we want to destroy
or take nvlist_array. The nvpair, which stores next nvlist of
nvlist_array element is no longer needed and can be freed.
Submitted by: Adam Starak <starak.adam@gmail.com>
Approved by: re (gjb)
jhb [Sat, 6 Aug 2016 23:52:09 +0000 (23:52 +0000)]
MFC 303076,303225: Use MTX_SYSINIT for the VESA lock.
303076:
vesa: fix panic on suspend
Fix the following panic seen when migrating a FreeBSD guest on Xen:
panic: mtx_lock() of destroyed mutex @ /usr/src/sys/dev/fb/vesa.c:541
cpuid = 0
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe001d2fa4f0
vpanic() at vpanic+0x182/frame 0xfffffe001d2fa570
kassert_panic() at kassert_panic+0x126/frame 0xfffffe001d2fa5e0
__mtx_lock_flags() at __mtx_lock_flags+0x15b/frame 0xfffffe001d2fa630
vesa_bios_save_restore() at vesa_bios_save_restore+0x78/frame 0xfffffe001d2fa680
vga_suspend() at vga_suspend+0xa3/frame 0xfffffe001d2fa6b0
isavga_suspend() at isavga_suspend+0x1d/frame 0xfffffe001d2fa6d0
bus_generic_suspend_child() at bus_generic_suspend_child+0x44/frame
[...]
This is caused because vga_sub_configure (which is called if the VGA adapter
is attached after VESA tried to initialize), points to vesa_configure, which
doesn't initialize the VESA mutex. In order to fix it, make sure
vga_sub_configure points to vesa_load, so that all the needed vesa
components are properly initialized.
303225:
Use MTX_SYSINIT for the VESA lock.
vesa_init_done isn't a reliable guard for the mutex init. If
vesa_configure() doesn't find valid VESA info it will not set
vesa_init_done, but the lock will remain initialized. Revert r303076
and use MTX_SYSINIT to deterministically init the lock.
jhb [Fri, 5 Aug 2016 22:23:04 +0000 (22:23 +0000)]
MFC 303406,303501: Fix panic when using aio_fsync().
303406:
Adjust tests in fsync job scheduling loop to reduce indentation.
303501:
Fix locking issues with aio_fsync().
- Use correct lock in aio_cancel_sync when dequeueing job.
- Add _locked variants of aio_set/clear_cancel_function and use those
to avoid lock recursion when adding and removing fsync jobs to the
per-process sync queue.
- While here, add a basic test for aio_fsync().
jhb [Fri, 5 Aug 2016 18:41:51 +0000 (18:41 +0000)]
MFC 303497,303559,303645: Disable PCI-e hotplug on bridges with power
controllers.
303497:
Add a loader tunable (hw.pci.enable_pcie_hp) to disable PCI-e HotPlug.
Some systems and/or devices (such as riser cards) do not include a
non-compliant implementation of PCI-e HotPlug that can result in devices
not being attached (e.g. the HotPlug code might assume that a card is
being unplugged and will power the slot off and detach it). This
tunable can be set to 0 to disable support for PCI-e HotPlug ignoring
the incorrect HotPlug state on these slots.
303559:
Try to declare _hw_pci for all sysctl cases needed after r303497.
303645:
Disable PCI hotplug support for slots with power controllers.
After further review of the spec, I do not think the current HotPlug
code handles slots with power controllers correctly. In particular,
the power state of the slot is to be inferred from other events, not
from examining the state of the power control bit in SLOT_CTL. For now,
disable PCI hotplug support on such slots.
Mention that basename(3) and dirname(3) will change in the future.
Update the existing manual pages for basename(3) and dirname(3) to
mention that in future versions of FreeBSD, these functions will no
longer use internal buffers for storing the results.
dim [Thu, 4 Aug 2016 17:26:32 +0000 (17:26 +0000)]
MFC r303631:
Fix non-functional bsdinstall services dialog.
The most recent version of bsdinstall does not seem to respect any of
the checkboxes in the "Choose the services you would like to be started
at boot" dialog. None of the chosen services end up in the rc.conf file
that is installed onto the target system.
This is caused by the bsdinstall/scripts/hardening script, which
implements the new hardening options dialog. The script starts by
overwriting the previously written rc.conf.services file:
echo -n > $BSDINSTALL_TMPETC/rc.conf.services
which is obviously incorrect. It should clear out rc.conf.hardening
instead.
MFC r303615:
An old tables implementation had all tables preallocated,
so when user did `ipfw table N flush` it always worked, but now
when table N doesn't exist the kernel returns ESRCH error.
This isn't fatal error for flush and destroy commands. Do not
call err(3) when errno is equal to ESRCH. Also warn only when
quiet mode isn't enabled. This fixes a regression in behavior,
when old rules are loaded from file.
Also use correct value for switch in the table_swap().
ngie [Wed, 3 Aug 2016 01:06:51 +0000 (01:06 +0000)]
MFC r302550,r302551,r302552,r302553:
Approved by: re (gjb)
r302550:
Deobfuscate cleanup path in clnt_dg_create(..)
Similar to r300836 and r301800, cl and cu will always be non-NULL as they're
allocated using the mem_alloc routines, which always use
`malloc(..., M_WAITOK)`.
Deobfuscating the cleanup path fixes a leak where if cl was NULL and
cu was not, cu would not be free'd, and also removes a duplicate test for
cl not being NULL.
Similar to r300836, r301800, and r302550, cl and ct will always
be non-NULL as they're allocated using the mem_alloc routines,
which always use `malloc(..., M_WAITOK)`.
ngie [Wed, 3 Aug 2016 00:03:03 +0000 (00:03 +0000)]
MFC r302571,r302572,r302577,r302841:
Approved by: re (gjb)
r302571:
Remove redundant declaration for radeon_pm_acpi_event_handler(..) to fix
-Wredundant-decls warning
PR: 209924
Tested with: devel/amd64-gcc (5.3.0)
r302572:
Remove redundant declarations for intel_fbc_enabled(..) and
i915_gem_dump_object(..) to fix -Wredundant-decls warning
PR: 209924
Tested with: devel/amd64-gcc (5.3.0)
r302577:
Add missing default case to capable(..) function definition
By definition (enum __drm_capabilities), cases other than CAP_SYS_ADMIN
aren't possible. Add in a KASSERT safety belt and return false in
!INVARIANTS case if an invalid value is passed in, as it would be a
programmer error.
This fixes a -Wreturn-type error with gcc 5.3.0.
r302841:
Always panic if an invalid capability is passed to `capable(..)` instead of
just with INVARIANTS
rwatson's point was valid in the sense that if the data passed at runtime is
invalid, it should always trip the invariant, not just in the debug case.
This is a deterrent against malicious input, or input caused by hardware
errors.
glebius [Tue, 2 Aug 2016 13:57:20 +0000 (13:57 +0000)]
Merge r303263:
Partially revert r257696/r257713, which have an issue with writing to user
controlled address. Restore the old code that emulated OSIOCGIFCONF in if.c.
alc [Mon, 1 Aug 2016 21:21:26 +0000 (21:21 +0000)]
MFC r303356 and r303465
Remove any mention of cache (PG_CACHE) pages from the comments in
vm_pageout_scan(). That function has not cached pages since r284376.
emaste [Mon, 1 Aug 2016 20:02:59 +0000 (20:02 +0000)]
MFC r303396: rename ARM's libunwind.S to to avoid conflict with llvm libunwind
llvm libunwind includes a libunwind.cpp, but on ARM libunwind.S is found
first in .PATH. Rename the latter one, since it is not going to be
updated again.
emaste [Mon, 1 Aug 2016 19:50:28 +0000 (19:50 +0000)]
MFC r303338: vt: lock Giant around kbd calls in CONS_GETINFO
Note that keyboards are stored in an array and are not freed (just
"unregistered" by clearing some fields) so a race would be limited to
obtaining stale information about an unregistered keyboard.
emaste [Mon, 1 Aug 2016 16:18:01 +0000 (16:18 +0000)]
iMFC r303400: libcxxrt: fix demangling of wchar_t
'wchar_t' is 7 characters long, not 6. r303297 (MFC'd in r303398) fixed
this in libelftc, but not the second copy of this file that we have in
libcxxrt.
MFC r302929: Now that potentially buggy versions of Xen are automatically
detected (see r302635, MFCed as r302895), there is no need to force msix
interrupt migration off via loader.conf.
MFC 303164: Add more documentation regarding unsafe AIO requests.
The asynchronous I/O changes made previously result in different
behavior out of the box. Previously all AIO requests failed with
ENOSYS / SIGSYS unless aio.ko was explicitly loaded. Now, some AIO
requests complete and others ("unsafe" requests) fail with EOPNOTSUPP.
Reword the introductory paragraph in aio(4) to add a general
description of AIO before describing the vfs.aio.enable_unsafe sysctl.
Remove the ENOSYS error description from aio_fsync(2), aio_read(2),
and aio_write(2) and replace it with a description of EOPNOTSUPP.
Remove the ENOSYS error description from aio_mlock(2).
Log a message to the system log the first time a process requests an
"unsafe" AIO request that fails with EOPNOTSUPP. This is modeled on
the log message used for processes using the legacy pty devices.
Add new System Hardening menu and options to bsdinstall.
This patch add new 'hardening' file responsible for new bsdinstall
'System Hardening' menu allowing users to set some sane and carefully
picked system security options (like random process id's, hiding
other users/groups processes and others).
All options are OFF by default in this patch due to POLA principle
with intention to turn change some of them to ON by default in future.
MFC 303109: Update crashinfo to work with newer gdb from ports.
If gdb from ports is installed, use it instead of the base system gdb
to extract variables from a kernel. Note that base gdb and ports gdb
do not support the same options for invoking a single command in batch
mode, so a wrapper shell function is used. In addition, prefer kgdb
from ports when generating a backtrace if present.
r303272:
SYSTEM_COMPILER: Rework the logic to allow a 'make test-system-compiler'.
r303273:
Fix empty WANT_COMPILER_TYPE when neither compiler is bootstrapped.
MFC r303034: Include makewhatis in ITOOLS when MK_MAN_UTILS is true
Previously it was conditional on MK_MAN. It's possible to build
FreeBSD with man pages but without man page tools. MK_MAN_UTILS
is the conditional used in share/man/Makefile for determining whether
makewhatis is executed at install time, so it is the proper one for
ITOOLS as well.
MFC r303046:
libc: tag the rune initialization function prototypes visibility as hidden.
It is good practice to export as few symbols as possible from your shared
libraries, so use the GCC visibility attribute in this case, matching what
Apple's libc does.
MFC 302899: Add documentation for the sigevent structure.
- Add a sigevent(3) manpage to give a general overview of the sigevent
structure and the available notification mechanisms.
- Document that AIO requests contain a nested sigevent structure that can
be used to request completion notification.
- Expand the sigevent details in other manuals to note details such as
the extra values stored in a queued signal's information or in a posted
kevent.
_Unwind_Exception is required to be double word aligned. GCC has
interpreted this to mean "use the maximum useful alignment for the
target" so follow that lead.