Brooks Davis [Fri, 25 May 2018 20:40:23 +0000 (20:40 +0000)]
Make vadvise compat freebsd11.
The vadvise syscall (aka ovadvise) is undocumented and has always been
implmented as returning EINVAL. Put the syscall under COMPAT11 and
provide a userspace implementation.
Mark Felder [Fri, 25 May 2018 19:36:26 +0000 (19:36 +0000)]
rc.subr: Support loading environmental variables from a file
The current support for setting environment via foo_env="" in rc.conf is
not scalable and does not handle envs with spaces in the value. It seems
a common pattern for some newer software is to skip configuration files
altogether and rely on the env. This is well supported in systemd unit
files and may be the inspiration for this trend.
Ilya Bakulin [Fri, 25 May 2018 19:00:28 +0000 (19:00 +0000)]
Fix building GENERIC-MMCCAM on arm64
Since GENERIC includes quite a few drivers now, all MMC drivers should have
appropriate MMCCAM-related ifdefs and include opt_mmccam.h so that
those ifdefs are actually processed correctly.
Reviewed by: manu
Approved by: imp (mentor)
Differential Revision: https://reviews.freebsd.org/D15569
Ed Maste [Fri, 25 May 2018 17:31:43 +0000 (17:31 +0000)]
if_muge: Use lock assertion instead of broken locking in lan78xx_chip_init
Previously lan78xx_chip_init locked the driver's mutex if not already
locked, but unlocked it only in the case of error. This provided a
fairly clear indication that the function is already called with the
lock held, so just replace it with a lock assertion.
In particular, stop using pmap_pte() to read non-promoted pte while
walking the page table. pmap_pte() needs to shoot down the kernel
mapping globally which causes IPI broadcast. Since
pmap_extract_and_hold() is used for slow copyin(9), it is very
significant hit for the 4/4 kernels.
Instead, create single purpose per-processor page frame and use it to
locally map page table page inside the critical section, to avoid
reuse of the frame by other thread if context switched.
Measurement demostrated very significant improvements in any load that
utilizes copyin/copyout.
Found and benchmarked by: bde
Sponsored by: The FreeBSD Foundation
Roger Pau Monné [Fri, 25 May 2018 08:44:00 +0000 (08:44 +0000)]
xen: remove dead code from gnttab.h
This code was left over when it was imported from Linux. The original
committer thought that those functions would be implemented, so the
prototypes where left in place. Delete them at once.
Andriy Gapon [Fri, 25 May 2018 07:33:20 +0000 (07:33 +0000)]
re-synchronize TSC-s on SMP systems after resume, if necessary
The TSC-s are checked and synchronized only if they were good
originally. That is, invariant, synchronized, etc.
This is necessary on an AMD-based system where after a wakeup from STR I
see that BSP clock differs from AP clocks by a count that roughly
corresponds to one second. The APs are in sync with each other. Not
sure if this is a hardware quirk or a firmware bug.
This is what I see after a resume with this change:
SMP: passed TSC synchronization test after adjustment
acpi_timer0: restoring timecounter, ACPI-fast -> TSC-low
Andriy Gapon [Fri, 25 May 2018 07:29:52 +0000 (07:29 +0000)]
fix zfs_getpages crash when called from sendfile, followup to r329363
It turns out that sendfile_swapin() has an optimization where it may
insert pointers to bogus_page into the page array that it passes to
VOP_GETPAGES. That happens to work with buffer cache, because it
extensively uses bogus_page internally, so it has the necessary checks.
However, ZFS did not expect bogus_page as VOP_GETPAGES(9) does not
document such a (ab)use of bogus_page.
So, this commit adds checks and handling of bogus_page.
I expect that use of bogus_page with VOP_GETPAGES will get documented
sooner rather than later.
Reported by: Andrew Reilly <areilly@bigpond.net.au>, delphij
Tested by: Andrew Reilly <areilly@bigpond.net.au>
Requested by: many
MFC after: 1 week
Alexander Motin [Fri, 25 May 2018 03:34:33 +0000 (03:34 +0000)]
Refactor NVMe CAM integration.
- Remove layering violation, when NVMe SIM code accessed CAM internal
device structures to set pointers on controller and namespace data.
Instead make NVMe XPT probe fetch the data directly from hardware.
- Cleanup NVMe SIM code, fixing support for multiple namespaces per
controller (reporting them as LUNs) and adding controller detach support
and run-time namespace change notifications.
- Add initial support for namespace change async events. So far only
in CAM mode, but it allows run-time namespace arrival and departure.
- Add missing nvme_notify_fail_consumers() call on controller detach.
Together with previous changes this allows NVMe device detach/unplug.
Non-CAM mode still requires a lot of love to stay on par, but at least
CAM mode code should not stay in the way so much, becoming much more
self-sufficient.
Marcelo Araujo [Fri, 25 May 2018 02:07:05 +0000 (02:07 +0000)]
Fix a memory leak on topology_parse().
strdup(3) allocates memory for a copy of the string, does the copy and
returns a pointer to it. If there is no sufficient memory NULL is returned
and the global errno is set to ENOMEM.
We do a sanity check to see if it was possible to allocate enough memory.
Also as we allocate memory, we need to free this memory used. Or it will
going out of scope leaks the storage it points to.
Adrian Chadd [Fri, 25 May 2018 01:27:39 +0000 (01:27 +0000)]
[ath_hal] migrate the shared HAL_RESET_* pieces out into ath_hal.
I'm in the process of reworking how the reset path works with an eye
to better recovery when the chips hang and/or go RF/PHY deaf.
This is the first step in a lot of unification and API changes.
Warner Losh [Thu, 24 May 2018 23:20:10 +0000 (23:20 +0000)]
Protect bzero call against macro expansion
Shortly, we'll be moving to defining bzero and memset in terms of
__builting_memset. To do that, we can't have macro calls to bzero in
the fallback impelmentation of memset. Normal calls to bzero are fine.
All 4 architectures that use this have their own copies of bzero, so
there's no mutual recursion issue between memset and bcopy.
Olivier Houchard [Thu, 24 May 2018 21:38:18 +0000 (21:38 +0000)]
Import CK as of commit 0f017230ccc86929f56bf44ef2dca93d7df8076b.
This brings us the renaming of fields in ck_queue, so that our own
LIST/SLIST/TAILQ/etc won't accidentally work with them.
Olivier Houchard [Thu, 24 May 2018 21:35:52 +0000 (21:35 +0000)]
Import CK as of commit 0f017230ccc86929f56bf44ef2dca93d7df8076b.
This brings us the renaming of fields in ck_queue, so that our own
LIST/SLIST/TAILQ/etc won't accidentally work with them.
Matt Macy [Thu, 24 May 2018 21:22:03 +0000 (21:22 +0000)]
libpmcstat: Don't build pmu tables on !amd64 until the corresponding
util routines have been written and tested. Currently building them
breaks the build on power64
Matt Macy [Thu, 24 May 2018 21:13:46 +0000 (21:13 +0000)]
AF_UNIX: It is possible for UNIX datagram sockets to be connected
to themselves. The updated code assumed that that could not happen
and would try to lock the unp mutex twice.
There may be a lingering issue here but this fixes it for the
reporter.
PR: 228458
Reported by: marieheleneka at gmail.com
Warner Losh [Thu, 24 May 2018 21:11:33 +0000 (21:11 +0000)]
Make memmove and bcopy share code
Make memmove the primary interface, but have bcopy be an alternative
entry point that jumps into memmove. This will slightly pessimize
bcopy calls, but those are about to get much rarer. Return dst always,
but it will be ignored by bcopy callers. We can remove just the alt
entry point if we ever remove bcopy entirely.
Warner Losh [Thu, 24 May 2018 21:11:28 +0000 (21:11 +0000)]
Define memmove and make bcopy alt entry point
Make a memmove entry point just before bcopy and have it swap its args
before continuing into the body of bcopy. Adjust the returns to return
dst (original %o0 swapped to %o1) from both entry points. bcopy users
will ignore them. Since these are in the branch delay slot, it should
take no additional time. I use %o6 for this rather than just move %o1
back to %o2 at the end since my sparc64 assembler knowledge is weak.
Also eliminate wrapper call from memmove to bcopy.
Warner Losh [Thu, 24 May 2018 21:11:24 +0000 (21:11 +0000)]
Make memmove an alias for memcpy
memcpy was an alias for bcopy with arg swap. This code handles
overlapping copies, so making memmove an alias is safe. We can
eliminate the call from libkern's memmove to this bcopy as a result.
Bryan Drewery [Thu, 24 May 2018 18:49:19 +0000 (18:49 +0000)]
rescue: Restore 'make depend' call to fix WITH_META_MODE after r334008.
The rescue/crunchgen build avoids linking binaries for the objects it is
building by doing 'make foo.o bar.o' rather than 'make all'. This breaks the
implicit 'beforebuild: depend' dependency which ensured that all source files
were generated and up-to-date before building the object files. This
manifested as a WITH_META_MODE build problem for bin/sh in the rescue build
with syntax.{c,h} not properly being regenerated or recognized as changed in
the dependency graph.
Conrad Meyer [Thu, 24 May 2018 17:06:00 +0000 (17:06 +0000)]
Yank crufty INTR_FILTER option
It was introduced to the tree in r169320 and r169321 in May 2007.
It never got much use and never became a kernel default. The code
duplicates the default path quite a bit, with slight modifications. Just
yank out the cruft. Whatever goals were being aimed for can probably be met
within the existing framework, without a flag day option.
Ed Maste [Thu, 24 May 2018 16:34:06 +0000 (16:34 +0000)]
if_muge: Add LAN78xx family USB ids but attach only to LAN7800
This driver was developed for the LAN7800 and the register-compatible
LAN7515 and has only been tested on those devices. Adding support for
other members of the family should be straightforward, once we have
devices to test.
With this change the driver will probe but fail to attach due to the
Chip ID check, but will leave a hint leading to the driver that needs
work.
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D15537
Warner Losh [Thu, 24 May 2018 16:31:18 +0000 (16:31 +0000)]
We can't release the refcount outside of the periph lock.
We're dropping the periph lock then dropping the refcount. However,
that violates the locking protocol and is racy. This seems to be
the cause of weird occasional panics with a bogus assert.
Brooks Davis [Thu, 24 May 2018 16:25:18 +0000 (16:25 +0000)]
Avoid two suword() calls per auxarg entry.
Instead, construct an auxargs array and copy it out all at once.
Use an array of Elf_Auxinfo rather than pairs of Elf_Addr * to represent
the array. This is the correct type where pairs of words just happend
to work. To reduce the size of the diff, AUXARGS_ENTRY is altered to act
on this array rather than introducing a new macro.
Return errors on copyout() and suword() failures and handle them in the
caller.
Incidentally fixes AT_RANDOM and AT_EXECFN in 32-bit linux on amd64
which incorrectly used AUXARG_ENTRY instead of AUXARGS_ENTRY_32
(now removed due to the use of proper types).
Andrew Turner [Thu, 24 May 2018 14:55:50 +0000 (14:55 +0000)]
Exclude memory from the /reserved-memory mappings with the no-map property
set. This memory must not be mapped by the operating system other than
under control of the device driver.
Obtained from: ABT Systems Ltd
Sponsored by: Turing Robotic Industries
Mark Johnston [Thu, 24 May 2018 14:16:22 +0000 (14:16 +0000)]
Split the active and inactive queue scans into separate subroutines.
The scans are largely independent, so this helps make the code
marginally neater, and makes it easier to incorporate feedback from the
active queue scan into the page daemon control loop.
Improve some comments while here. No functional change intended.
Roger Pau Monné [Thu, 24 May 2018 10:19:54 +0000 (10:19 +0000)]
xen-blkback: don't unbind the interrupt while holding the lock
There's no need to perform the interrupt unbind while holding the
blkback lock, and doing so leads to the following LOR:
lock order reversal: (sleepable after non-sleepable)
1st 0xfffff8000802fe90 xbbd1 (xbbd1) @ /usr/src/sys/dev/xen/blkback/blkback.c:3423
2nd 0xffffffff81fdf890 intrsrc (intrsrc) @ /usr/src/sys/x86/x86/intr_machdep.c:224
stack backtrace:
#0 0xffffffff80bdd993 at witness_debugger+0x73
#1 0xffffffff80bdd814 at witness_checkorder+0xe34
#2 0xffffffff80b7d798 at _sx_xlock+0x68
#3 0xffffffff811b3913 at intr_remove_handler+0x43
#4 0xffffffff811c63ef at xen_intr_unbind+0x10f
#5 0xffffffff80a12ecf at xbb_disconnect+0x2f
#6 0xffffffff80a12e54 at xbb_shutdown+0x1e4
#7 0xffffffff80a10be4 at xbb_frontend_changed+0x54
#8 0xffffffff80ed66a4 at xenbusb_back_otherend_changed+0x14
#9 0xffffffff80a2a382 at xenwatch_thread+0x182
#10 0xffffffff80b34164 at fork_exit+0x84
#11 0xffffffff8101ec9e at fork_trampoline+0xe
Reported by: Nathan Friess <nathan.friess@gmail.com>
Sponsored by: Citrix Systems R&D
Roger Pau Monné [Thu, 24 May 2018 10:18:31 +0000 (10:18 +0000)]
dev/xenstore: prevent transaction hijacking
The user-space xenstore device is currently lacking a check to make
sure that the caller is only using transaction ids currently assigned
to it. This allows users of the xenstore device to hijack transactions
not started by them, although the scope is limited to transactions
started by the same domain.
Tested by: Nathan Friess <nathan.friess@gmail.com>
Sponsored by: Citrix Systems R&D
Roger Pau Monné [Thu, 24 May 2018 10:17:49 +0000 (10:17 +0000)]
dev/xenstore: add support for watches
Allow user-space applications to register watches using the xenstore
device. This is needed in order to run toolstack operations on
domains different than the one where xenstore is running (in which
case the device is not used, since the connection to xenstore is done
using a plain socket).
Tested by: Nathan Friess <nathan.friess@gmail.com>
Sponsored by: Citrix Systems R&D
Roger Pau Monné [Thu, 24 May 2018 10:17:03 +0000 (10:17 +0000)]
xenstore: don't wait with the PCATCH flag
Due to the current synchronous xenstore implementation in FreeBSD, we
cannot return from xs_read_reply without reading a reply, or else the
ring gets out of sync and the next request will read the previous
reply and crash due to the type mismatch. A proper solution involves
making use of the req_id field in the message and allowing multiple
in-flight messages at the same time on the ring.
Remove the PCATCH flag so that signals don't interrupt the wait.
Tested by: Nathan Friess <nathan.friess@gmail.com>
Sponsored by: Citrix Systems R&D
Roger Pau Monné [Thu, 24 May 2018 10:16:11 +0000 (10:16 +0000)]
xenstore: remove the suspend sx lock
There's no need to prevent suspend while doing xenstore transactions,
callers of transactions are supposed to be prepared for a transaction
to fail.
This fixes a bug that could be triggered from the xenstore user-space
device, since starting a transaction from user-space would result in
returning there with a sx lock held, that causes a WITNESS check to
trigger.
Tested by: Nathan Friess <nathan.friess@gmail.com>
Sponsored by: Citrix Systems R&D
x86: stop unconditionally clearing PSL_T on the trace trap.
We certainly should clear PSL_T when calling the SIGTRAP signal
handler, which is already done by all x86 sendsig(9) ABI code. On the
other hand, there is no obvious reason why PSL_T needs to be cleared
when returning from the signal handler. For instance, Linux allows
userspace to set PSL_T and keep tracing enabled for the desired
period. There are userspace programs which would use PSL_T if we make
it possible, for instance sbcl.
Remember if PSL_T was set by PT_STEP or PT_SETSTEP by mean of TDB_STEP
flag, and only clear it when the flag is set.
Discussed with: Ali Mashtizadeh
Reviewed by: jhb (previous version)
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D15054
Matt Macy [Wed, 23 May 2018 20:50:09 +0000 (20:50 +0000)]
udp: assign flowid to udp sockets round-robin
On a 2x8x2 SKL this increases measured throughput with 64
netperf -H $DUT -t UDP_STREAM -- -m 1
from 590kpps to 1.1Mpps
before:
https://people.freebsd.org/~mmacy/2018.05.11/udpsender.svg
after:
https://people.freebsd.org/~mmacy/2018.05.11/udpsender2.svg