ngie [Sun, 1 Jan 2017 04:48:38 +0000 (04:48 +0000)]
MFstable/11 r310997:
MFC r310996:
Look for list.h in ${.CURDIR} to unbreak the build with a ports-based copy
of llvm38 on ^/stable/11 (oh, the bugs you find when you set CC,CXX,CPP
manually and it skips the bootstrap stage for the toolchain...)
(by markj)
Don't increment the spin count until after the first attempt to acquire a
rwlock read lock. Otherwise the lockstat:::rw-spin probe will fire
spuriously.
==
rwlock: s/READER/WRITER/ in wlock lockstat annotation
==
sx: increment spin_cnt before cpu_spinwait in xlock
The change is a no-op only done for consistency with the rest of the file.
==
locks: change sleep_cnt and spin_cnt types to u_int
Both variables are uint64_t, but they only count spins or sleeps.
All reasonable values which we can get here comfortably hit in 32-bit range.
==
Implement trivial backoff for locking primitives.
All current spinning loops retry an atomic op the first chance they get,
which leads to performance degradation under load.
One classic solution to the problem consists of delaying the test to an
extent. This implementation has a trivial linear increment and a random
factor for each attempt.
For simplicity, this first thouch implementation only modifies spinning
loops where the lock owner is running. spin mutexes and thread lock were
not modified.
Current parameters are autotuned on boot based on mp_cpus.
Autotune factors are very conservative and are subject to change later.
==
locks: fix up ifdef guards introduced in r303643
Both sx and rwlocks had copy-pasted ADAPTIVE_MUTEXES instead of the correct
define.
==
locks: fix compilation for KDTRACE_HOOKS && !ADAPTIVE_* case
==
locks: fix sx compilation on mips after r303643
The kernel.h header is required for the SYSINIT macro, which apparently
was present on amd64 by accident.
mjg [Sat, 31 Dec 2016 13:23:28 +0000 (13:23 +0000)]
MFC r303583:
amd64: implement pagezero using rep stos
The current implementation uses non-temporal writes. This turns out to
be detrimental to performance if the page is used shortly after, which
is the typical case with page faults.
ngie [Sat, 31 Dec 2016 00:01:21 +0000 (00:01 +0000)]
MFstable/11 r310877:
MFC r310455:
Clarify failure in snmp_output(..) with call to snmp_pdu_decode
- Explicitly test snmp_pdu_encode against SNMP_CODE_OK instead of assuming
any non-zero value is bad.
- Print out the code before calling abort() to give the end-user something
actionable to debug without having to recompile the binary, since the
core might not have these details.
ngie [Fri, 30 Dec 2016 23:59:43 +0000 (23:59 +0000)]
MFstable/11 r310875:
MFC r310458,r310466:
r310458:
Group all loadable modules in the %default section
This will allow new users to uncomment the modules and have things work
with less head scratching, in the event they decide to uncomment any
of the section separators, e.g. %usm or %vcm, as the module loading is
only effective in the %default section.
r310466:
Don't hardcode $(securityModelUSM) (3) in the authPriv example under the %vacm
section
arybchik [Fri, 30 Dec 2016 17:24:28 +0000 (17:24 +0000)]
MFC r310627
sfxge(4): do not limit driver RSS table to RSS channels max
Specification of entire RSS table in the driver allows to spread traffic
more equally across CPUs/RSS channels if number of RSS channels is not
power of 2.
sephe [Thu, 29 Dec 2016 09:10:37 +0000 (09:10 +0000)]
MFC 309320,309726,309728
309320
hyperv/storvsc: Don't use timedwait.
The timeout is unnecessary.
Reviewed by: jhb
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D8656
309726
hyperv/storvsc: Fix the SCSI disk attachment issue.
On pre-WS2016 Hyper-V, if the only LUNs > 7 are used, then all disks
fails to attach. Mainly because those versions of Hyper-V do not set
SRB_STATUS properly and deliver junky INQUERY responses.
Submitted by: Hongjiang Zhang <honzhan microsoft com>
Reported by: Hongxiong Xian <v-hoxian microsoft com>
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D8724
309728
hyperv/storvsc: Minor style changes; no functional changes.
sephe [Thu, 29 Dec 2016 06:48:10 +0000 (06:48 +0000)]
MFC 309085
hyperv/hn: Fix primary channel revocation
Since hypervisor will not drain the TX bufring, once the channels are
revoked:
- Setup vmbus orphan handler properly.
- Make sure that suspension will not wait the TX bufring draining
forever.
- GC the pending TX descs on detach path, before freeing the busdma
stuffs.
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D8559
sephe [Thu, 29 Dec 2016 06:45:36 +0000 (06:45 +0000)]
MFC 309030,309039,309080,309081,309083
309030
hyperv/vmbus: Set a mark on the revoked channel.
This will be used to fix device detach DEVMETHOD for revoked primary
channel.
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D8522
309039
hyperv/vmbus: Merge free/active locks.
These functions are only used by management stuffs, so there are
no needs to introduce extra complexity.
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D8524
309080
hyperv/vmbus: Implement orphan support for transaction API
It will be used to fix the primary channel revocation support.
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D8525
309081
hyperv/vmbus: Fix the primary channel revoking on vmbus side.
Drivers can now use vmbus_chan_{is_revoked,set_orphan,unset_orphan}() and
vmbus_xact_ctx_orphan() to fix their attach/detach DEVMETHODs for revoked
primary channels.
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D8545
309083
hyperv/vmbus: Fix the multi-channel revoking on vmbus side.
- Reference count the sub-channel when channel offer message is
processed, so that immediate rescind message on the same channel
will not race sub-channel open on driver side.
- Drop the above reference when sub-channel is closed, this closely
mimics the hypervisor's reaction when primary channel is closed
on the VM side. No drivers use sub-channel after primary channel
is closed.
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D8546
sephe [Thu, 29 Dec 2016 05:32:34 +0000 (05:32 +0000)]
MFC 308664,308742,308743
308664
hyperv/vss: Add driver and tools for VSS
VSS stands for "Volume Shadow Copy Service". Unlike virtual machine
snapshot, it only takes snapshot for the virtual disks, so both
filesystem and applications have to aware of it, and cooperate the
whole VSS process.
This driver exposes two device files to the userland:
/dev/hv_fsvss_dev
Normally userland programs should _not_ mess with this device file.
It is currently used by the hv_vss_daemon(8), which freezes and
thaws the filesystem. NOTE: currently only UFS is supported, if
the system mounts _any_ other filesystems, the hv_vss_daemon(8)
will veto the VSS process.
If hv_vss_daemon(8) was disabled, then this device file must be
opened, and proper ioctls must be issued to keep the VSS working.
/dev/hv_appvss_dev
Userland application can opened this device file to receive the
VSS freeze notification, hold the VSS for a while (mainly to flush
application data to filesystem), release the VSS process, and
receive the VSS thaw notification i.e. applications can run again.
The VSS will still work, even if this device file is not opened.
However, only filesystem consistency is promised, if this device
file is not opened or is not operated properly.
hv_vss_daemon(8) is started by devd(8) by default. It can be disabled
by editting /etc/devd/hyperv.conf.
Submitted by: Hongjiang Zhang <honzhan microsoft com>
Reviewed by: kib, mckusick
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D8224
308742
hyperv/vss: Nuke unused variables.
Submitted by: markj
Reported by: markj
Sponsored by: Microsoft
308743
hyperv/vss: Install the userland daemon to /usr/sbin instead of /
Submitted by: markj
Reported by: markj
Sponsored by: Microsoft
jhb [Tue, 27 Dec 2016 20:06:26 +0000 (20:06 +0000)]
MFC 309581,309582,310424: Document T6 support.
309581:
Document support for Terminator 6 adapters in cxgbe(4) and cxgbev(4).
309582:
Bump Dd for addition of T6.
310424:
Replace passive voice with active voice and other tweaks.
- Drop uses of 'will'.
- Replace 'to use' with active voice.
- Tidy language around interrupt types and clarify that INTx doesn't
work on VFs.
- Drop leading articles from sysctl/tunable descriptions.
- Tweak the wording of several sysctl/tunable descriptions.
pfg [Mon, 26 Dec 2016 16:43:39 +0000 (16:43 +0000)]
MFC r310367:
pax(1): Fix a bug with archives smaller than 512 bytes.
The problem here is that the archive is too short (< 512 bytes). The
buffer routines, try to read at least 512 bytes, even when we try to
determine what format file we have, which is wrong.
avg [Sat, 24 Dec 2016 13:28:39 +0000 (13:28 +0000)]
define Maxmem for ia64, the only platform that didn't have it
This is a direct commit to stable/10 as the platform was removed
in the newer branches.
Maxmem is required for compiling fwohci(4) on ia64 since commit
r310081, MFC of r277511.
It was easier to add Maxmem than to make a special case for ia64
in fwohci.
ngie [Sat, 24 Dec 2016 13:00:19 +0000 (13:00 +0000)]
MFstable/11 r310506:
MFC r309837:
Change the process limits for RLIMIT_MEMLOCK to RLIM_INFINITY when
executing :mincore_resid
The default process limits in FreeBSD is 64kB for unprivileged users,
which empirically is too low to run the :mincore_resid testcase.
Process limits are inherited, so even though the default limit for
root users is RLIM_INFINITY, the inherited limit with "sudo" with the
default login.conf will be 64kB.
Use setrlimit to set rlim_max for RLIMIT_MEMLOCK to RLIM_INFINITY to
avoid ENOMEM issues when calling mlock to wire the mmap'ed address
space.
setrlimit requires root access to increase rlim_max, so require root
privileges when running the test
Discovered when executing the tests with sudo, e.g.
"sudo kyua test -k /usr/tests/lib/libc/sys/Kyuafile mincore_test"
jhb [Fri, 23 Dec 2016 19:42:17 +0000 (19:42 +0000)]
MFC 309588: Don't attach to Host-PCI bridges with a bad bus number.
If the bus number assigned to a Host-PCI bridge doesn't match the first
bus number in the associated producer range from _CRS, print a warning and
fail to attach rather than panicking due to an assertion failure.
At least one single-socket Dell machine leaves a "ghost" Host-PCI bridge
device in the ACPI namespace that seems to correspond to the I/O hub in
the second socket of a two-socket machine. However, the BIOS doesn't
configure the settings for this "ghost" bridge correctly, nor does it have
any PCI devices behind it.
jhb [Fri, 23 Dec 2016 19:28:15 +0000 (19:28 +0000)]
MFC 308820,308821: Fixes for fatal page faults on x86.
308820:
Report page faults due to reserved bits in PTEs as a separate fault type.
Rather than reporting a page fault due to a bad PTE as a protection
violation with the "rsv" flag, treat these faults as a separate type of
fault altogether.
308821:
MFamd64: Various fatal page fault fixes.
- If a page fault is triggered due to reserved bits in a PTE, treat it
as a fatal fault and panic.
- If PG_NX is in use, report whether a fatal page fault is due to an
instruction fetch or a data access.
- If a fatal page fault is due to reserved bits in a PTE, report that as
the page fault type rather than a protection violation.
ken [Fri, 23 Dec 2016 18:29:10 +0000 (18:29 +0000)]
MFC, r310338:
------------------------------------------------------------------------
r310338 | ken | 2016-12-20 14:17:07 -0700 (Tue, 20 Dec 2016) | 37 lines
Turn on FC-Tape by default in the isp(4) driver.
FC-Tape provides additional link level error recovery, and is
highly recommended for tape devices. It will only be turned on for
a given target if the target supports it.
Without this setting, we default to whatever FC-Tape setting is in
NVRAM on the card.
This can be overridden by setting the following loader tunable, for
example for isp0:
hint.isp.0.nofctape=1
sys/conf/options:
Add a new kernel config option, ISP_FCTAPE_OFF, that
defaults the FC-Tape configuration to off.
sys/dev/isp/isp_pci.c:
If ISP_FCTAPE_OFF is defined, turn off FC-Tape. Otherwise,
turn it on if the card supports it.
share/man/man4/isp.4:
Add a description of FC-Tape to the isp(4) man page.
Add descriptions of the fctape and nofctape options, as well as the
ISP_FCTAPE_OFF kernel configuration option.
Add the ispfw module and kernel drivers to the suggested
configurations at the top of the man page so that users are less
likely to leave it out. The driver works well with the included
firmware, but may not work at all with whatever firmware the user
has flashed on their card.
ed [Tue, 20 Dec 2016 07:50:49 +0000 (07:50 +0000)]
MFC r309650:
Properly sign extend the result of jrand48() and mrand48().
These functions are supposed to return a value between [-2^31, 2^31).
This doesn't seem to work on 64-bit systems, where we return a value
between [0, 3^32). Patch up the function to use proper casts to int32_t.
While there, fix some other style bugs.
rmacklem [Mon, 19 Dec 2016 22:28:28 +0000 (22:28 +0000)]
MFC: r309566
Fix the NFSv4.1 server for Open reclaim after a reboot.
The NFSv4.1 server failed to update the nfs-stablerestart file for
a client when the client was issued its first Open. As such, recovery
of Opens after a server reboot failed with NFSERR_NOGRACE.
This patch fixes this.
It also changes the code so that it malloc()'s the 1024 byte array
instead of allocating it on the kernel stack for both NFSv4.0 and NFSv4.1.
Note that this bug only affected NFSv4.1 and only when clients attempted
to reclaim Opens after a server reboot.
lifanov [Mon, 19 Dec 2016 19:39:02 +0000 (19:39 +0000)]
MFC r310160
retain cc.4.gz man page for Chelsio T6 NICs
This man page was removed in r225583 when cc.4 was renamed to mod_cc.4
With reintroduction of cc.4 "make installworld; make delete-old" was
no longer convergent.
Reviewed by: matthew
Approved by: jhb (implicit), matthew (mentor)
Differential Revision: https://reviews.freebsd.org/D8829
trasz [Mon, 19 Dec 2016 18:26:26 +0000 (18:26 +0000)]
MFC r307774:
Fix libusb20_dev_get_desc(3) to use the "vendor product" order, not
"product vendor". This is consistent with how it's generally done.
The ordering is visible eg in usbconfig(8) output.
kadesai [Mon, 19 Dec 2016 13:14:39 +0000 (13:14 +0000)]
MFC r309284-r309294
r309294
This patch upgrades driver version to 06.712.04.00-fbsd
r309293
This patch will add code to refire IOCTL commands after OCR.
r309292
This patch will unblock SYNCHRONIZE_CACHE command to firmware,
i.e. don't block the SYNCHRONIZE_CACHE command at driver instead of passing it to firmware for all Gen3 controllers.
r309291
Wait for AEN task to be completed(if in queue) before resetting the controller
and return without processing event in AEN thread, if controller reset is in progress.
r309290
This patch will add task management support in driver. Below is high level description:
If a SCSI IO times out, then before initiating OCR, now the driver will try to send a
target reset to the particular target for which the IO is timed out. If that also fails,
then the driver will initiate OCR.
r309289
Process outstanding reply descriptors from all the reply descriptor post queues before initiating OCR.
r309288
Clean up reference to AEN command if abort AEN is succesful as the command is aborted.
Did the same by setting sc->aen_cmd = NULL when aborting AEN is successful.
r309287
Update controller properties(read OCR capability bit) when MR_EVT_CTRL_PROP_CHANGED recieved.
r309286
Add sanity check in IO and IOCTL path not to process command further if controller is in
HW_CRITICAL_ERROR.
r309285
Use a variable to indicate Gen3 controllers and remove all PCI ids based
checks used for gen3 controllers.
r309284
High level description of new solution -
Free MFI and MPT command from same context.
Free both the command either from process (from where mfi-mpt pass-through was called) or from
ISR context. Do not split freeing of MFI and MPT, because it creates the race condition which
will do MFI/MPT list.
hselasky [Mon, 19 Dec 2016 09:52:32 +0000 (09:52 +0000)]
MFC r309400:
Fix for endless recursion in the ACPI GPE handler during boot.
When handling a GPE ACPI interrupt object the EcSpaceHandler()
function can be called which checks the EC_EVENT_SCI bit and then
recurse on the EcGpeQueryHandler() function. If there are multiple GPE
events pending the EC_EVENT_SCI bit will be set at the next call to
EcSpaceHandler() causing it to recurse again via the
EcGpeQueryHandler() function. This leads to a slow never ending
recursion during boot which prevents proper system startup, because
the EC_EVENT_SCI bit never gets cleared in this scenario.
The behaviour is reproducible with the ALASKA AMI in combination with
a newer Skylake based mainboard in the following way:
Enter BIOS and adjust the clock one hour forward. Save and exit the
BIOS. System fails to boot due to the above mentioned bug in
EcGpeQueryHandler() which was observed recursing multiple times.
This patch adds a simple recursion guard to the EcGpeQueryHandler()
function and also also adds logic to detect if new GPE events occurred
during the execution of EcGpeQueryHandler() and then loop on this
function instead of recursing.
hselasky [Mon, 19 Dec 2016 09:45:23 +0000 (09:45 +0000)]
MFC r309404:
Fix return value from ng_uncallout().
callout_stop() recently started returning -1 when the callout is already
stopped, which is not handled by the netgraph code. Properly filter
the return value. Netgraph callers only want to know if the callout
was cancelled and not draining or already stopped.
dim [Sun, 18 Dec 2016 14:31:11 +0000 (14:31 +0000)]
MFC r310013 (by cperciva):
Check that blkfront devices have a non-zero number of sectors and a
non-zero sector size. Such a device would be a virtual disk of zero
bytes; clearly not useful, and not something we should try to attach.
As a fortuitous side effect, checking that these values are non-zero
here results in them not *becoming* zero later on the function. This
odd behaviour began with r309124 (clang 3.9.0) but is challenging to
debug; making any changes to this function whatsoever seems to affect
the llvm optimizer behaviour enough to make the unexpected zeroing of
the sector_size variable cease.
PR: 215209
Security: The potential for variables to unexpectedly become zero
has worrying consequences for security in general, but
not so much in this particular context.
MFC r310086:
In xbd_connect(), use correct scanf conversion specifiers for the
feature_barrier and feature_flush variables. Otherwise, adjacent
variables on the stack, such as sector_size, may be overwritten, with
disastrous results.
Note that I did not see a good reason to revert the addition of zero
checks introduced in r310013. Better safe than sorry.
asomers [Fri, 16 Dec 2016 20:10:55 +0000 (20:10 +0000)]
MFC r308806
Speed up pw operations that edit /etc/group or /etc/passwd
r285050 fixed a bug in pw that could lead to /etc/passwd or /etc/group
corruption on power loss. However, it fixed it by opening those files with
O_SYNC, which is very slow, especially on ZFS. This change replaces O_SYNC
with appropriately placed fsync()s instead, which is much faster. Using a
ZFS tmpdir, the time to run pw's kyua tests drops from 245s to 35s.
jhb [Fri, 16 Dec 2016 01:06:35 +0000 (01:06 +0000)]
MFC 308690: Sync instruction cache's after writing user breakpoints on MIPS.
Add an implementation for pmaps_sync_icache() on MIPS that sync's the
instruction cache on all CPUs via smp_rendezvous() after a debugger
inserts a breakpoint via ptrace(PT_IO).
vangyzen [Thu, 15 Dec 2016 16:52:17 +0000 (16:52 +0000)]
MFC r309676
Export the whole thread name in kinfo_proc
kinfo_proc::ki_tdname is three characters shorter than
thread::td_name. Add a ki_moretdname field for these three
extra characters. Add the new field to kinfo_proc32, as well.
Update all in-tree consumers to read the new field and assemble
the full name, except for lldb's HostThreadFreeBSD.cpp, which
I will handle separately. Bump __FreeBSD_version.
mav [Thu, 15 Dec 2016 08:11:32 +0000 (08:11 +0000)]
MFC 309714: Fix spa_alloc_tree sorting by offset in r305331.
Original commit "7090 zfs should improve allocation order" declares alloc
queue sorted by time and offset. But in practice io_offset is always zero,
so sorting happened only by time, while order of writes with equal time was
completely random. On Illumos this did not affected much thanks to using
high resolution timestamps. On FreeBSD due to using much faster but low
resolution timestamps it caused bad data placement on disks, affecting
further read performance.
This change switches zio_timestamp_compare() from comparing uninitialized
io_offset to really populated io_bookmark values. I haven't decided yet
what to do with timestampts, but on simple tests this change gives the
same peformance results by just making code to work as declared.
vangyzen [Thu, 15 Dec 2016 01:45:31 +0000 (01:45 +0000)]
MFC r309460
thr_set_name(): silently truncate the given name as needed
Instead of failing with ENAMETOOLONG, which is swallowed by
pthread_set_name_np() anyway, truncate the given name to MAXCOMLEN+1
bytes. This is more likely what the user wants, and saves the
caller from truncating it before the call (which was the only
recourse).
The man page changes were not merged because thr_set_name.2
does not exist on stable/10.
dim [Wed, 14 Dec 2016 17:27:44 +0000 (17:27 +0000)]
Merge r309860 from stable/9, as this also applies to stable/10:
Fix libllvmanalysis build failure after r309857: on stable/9, llvm is
compiled by gcc, and without -std=c++11, so the nullptr keyword is
unknown. Use the old-school plain zero syntax instead.