https://www.illumos.org/issues/3871
GCC 4.5.3 on Gentoo Linux did not like a few of the changes made in the issue
3604 patch. It printed an error and a couple of warnings:
../../cmd/zdb/zdb.c: In function 'dump_bpobj':
../../cmd/zdb/zdb.c:1257:3: error: 'for' loop initial declarations are only
allowed in C99 mode
../../cmd/zdb/zdb.c:1257:3: note: use option -std=c99 or -std=gnu99 to compile
your code
../../cmd/zdb/zdb.c: In function 'dump_deadlist':
../../cmd/zdb/zdb.c:1323:8: warning: too many arguments for format
../../cmd/zdb/zdb.c:1323:8: warning: too many arguments for format
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Serapheim Dimitropoulos <serapheim@delphix.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Richard Yao <ryao@gentoo.org>
https://www.illumos.org/issues/6866
Need this for #6865.
To be generally more scripting-friendly, overload this issue with adding '-q'
option which should skip printing any label information.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Yuri Pankov <yuri.pankov@nexenta.com>
https://www.illumos.org/issues/7280
zdb is very handy for diagnosing problems with a pool in a safe and
quick way. When a pool is in a bad shape, we often want to disable some
fail-safes, or adjust some tunables in order to open them. In the
kernel, this is done by changing public variables in mdb. The goal of
this feature is to add the same capability to zdb and ztest, so that
they can change libzpool tuneables from the command line.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Pavel Zakharov <pavel.zakharov@delphix.com>
Michael Zhilin [Mon, 2 Oct 2017 20:33:16 +0000 (20:33 +0000)]
[libthr] revert change of visibility of _thread_keytable to unbreak debugger
Fix regression by r318539. The sysutils/pstack uses library libthread_db to
read information about threads state. The function pt_ta_new makes lookup of
several key symbols including _thread_keytable. But r318539 mades this field
static. It causes silent ignore of libthr library by pstack and as result
sysutils/pstack doesn't output any thread information.
Emmanuel Vadot [Mon, 2 Oct 2017 19:17:09 +0000 (19:17 +0000)]
Allwinner H3 CCU: Fix build on ARM64
ccu_h3.c is also used on ARM64 as it provides clocks for the H5 SoC.
Since ARM64 doesn't have sys/gun/dts/include in it's include path, use
the full name for the sun8i-h3-ccu.h include.
Michael Tuexen [Mon, 2 Oct 2017 18:25:30 +0000 (18:25 +0000)]
Fix a bug which avoided that rules for matching port numbers for SCTP
packets where actually matched.
While there, make clean in the man-page that SCTP port numbers are
supported in rules.
Eugene Grosbein [Mon, 2 Oct 2017 16:33:04 +0000 (16:33 +0000)]
rsh: introduce new option -N disabling shutdown of socket sending path.
This prevents premature disconnection of rsh session with protocol
implementation confused by "end-of-file" condition for standard
input stream. For example, modern Cisco IOS (15.x) versions
can be managed with "rsh -N" cron jobs having /dev/null as stdin.
Emmanuel Vadot [Mon, 2 Oct 2017 16:21:20 +0000 (16:21 +0000)]
Allwinner A31 ccu: Use clock/reset IDs from dt-bindings
Do not redefines resets and clocks ID which are already in the
dt-bindings include directory. Those files are under dual licenced
under GPL2/MIT so use them directly.
Emmanuel Vadot [Mon, 2 Oct 2017 16:12:06 +0000 (16:12 +0000)]
Allwinner A64 ccu: Use clock/reset IDs from dt-bindings
Do not redefines resets and clocks ID which are already in the
dt-bindings include directory. Those files are under dual licenced
under GPL2/MIT so use them directly.
Emmanuel Vadot [Mon, 2 Oct 2017 15:48:39 +0000 (15:48 +0000)]
Allwinner H3 ccu: Use clock/reset IDs from dt-bindings
Do not redefines resets and clocks ID which are already in the
dt-bindings include directory. Those files are under dual licence
GPL2/MIT so use them directly.
Allan Jude [Mon, 2 Oct 2017 14:19:31 +0000 (14:19 +0000)]
bsdinstall(8) hardening menu: Utilize new kern.randompid=1 behaviour
Enabling the PID randomization option in bsdinstall(8)'s hardening menu
now randomizes the effective value of kern.randompid on each boot.
Previous behaviour:
When kern.randompid was enabled via the the bsdinstall(8) hardening menu,
a random value was generated and placed in the systems /etc/sysctl.conf as
kern.randompid=value
This makes the value of kern.randompid static across reboots.
New behaviour:
When kern.randompid is enabled via the bsdinstall(8) hardening menu, the
line kern.randompid=1 is placed in the systems /etc/sysctl.conf.
This takes advantage of a new kernel feature and makes the value of
kern.randompid be randomized by the kernel on each reboot.
Submitted by: Marie Helene Kvello-Aune <marieheleneka@gmail.com>
Reviewed by: des
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D12433
https://www.illumos.org/issues/8600
ZFS channel programs should be able to create snapshots.
In addition to the base snapshot functionality, this will likely entail adding
extra logic to handle edge cases which were formerly not possible, such as
creating then destroying a snapshot in the same transaction sync.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed by: Brad Lewis <brad.lewis@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Chris Williamson <chris.williamson@delphix.com>
https://www.illumos.org/issues/8592
ZFS channel programs should be able to perform a rollback. This logic will
probably look pretty similar to zfs.sync.destroy().
Reviewed by: Chris Williamson <chris.williamson@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Brad Lewis <brad.lewis@delphix.com>
https://www.illumos.org/issues/8604
Every time we want to unmount a snapshot (happens during snapshot deletion or
renaming) we unnecessarily iterate through all the mountpoints in the VFS layer
(see zfs_get_vfs).
Ideally we would just put a hold on the snapshot and access its respective VFS
resource directly.
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Andy Stormont <astormont@racktopsystems.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Serapheim Dimitropoulos <serapheim@delphix.com>
FreeBSD note: I added a FreeBSD specific function getzfsvfs_ref() which
is like getzfsvfs() but returns a filesystem referenced, not busied.
We want a busied filesystem in most cases, because we access its private
data and, thus, we need to prevent the filesystem from being unmounted
and its private data destroyed. But in some cases we can either get
away with just a referenced filesystem or we must not busy the
filesystem. Unmounting the filesystem is one of such cases.
Alan Cox [Mon, 2 Oct 2017 07:14:32 +0000 (07:14 +0000)]
When mdstart_swap() accesses a page that is already in the active queue,
mark the page as referenced rather than calling vm_page_activate(). This
allows the page's act_count to grow beyond ACT_INIT and better reflect
its usage. (See also r324146, which modified a function used by tmpfs,
uiomove_object_page(), to behave in the same way.)
Ian Lepore [Mon, 2 Oct 2017 02:58:28 +0000 (02:58 +0000)]
Define a single instance of ahci_devclass and reference it from all the
attachment code for various SOCs and busses. Remove all the static and
should-have-been-static and named-differently instances of it.
This should eliminate the recently-grown build warnings about multiple
definitions when building arm kernels.
Ian Lepore [Mon, 2 Oct 2017 01:03:18 +0000 (01:03 +0000)]
Enhance the interrupt capabilities of ti_pruss driver.
The existing ti_pruss driver for the PRUSS Hardware provided by the AM335x
ARM CPU has basic interrupt capabilities. This updated driver provides some
more options:
- Sysctl based configuration for the interrupts (for some examples, see the
test plan in the phabricator review cited below).
- A device file (/dev/pruss0.irqN) for each enabled interrupt. This file
can be read and the device blocks if no irq has happened or returns an
uint64_t timestamp based on nanouptime().
- Each interrupt device file provides kqueue-based event notification,
blocking read(), or select().
Submitted by: Manuel Stuhn <freebsdnewbie@freenet.de>
Differential Revision: https://reviews.freebsd.org/D11959
Ian Lepore [Mon, 2 Oct 2017 00:49:33 +0000 (00:49 +0000)]
Allow Raspberry Pi platform and drivers to be configured with upstream DTBs.
- Added more compatibility strings to drivers not yet converted
- Added new RPI platform code compatibility string to match the ones used
upstream
- Adapted RPI and RPI2 DTS to match the new platform code compatibility
string
The goal is to use the upstream DTBs as a replacement for our custom one.
This is now possible with these changes.
Additionally, as the RPI firmware automatically chooses the right DTB for
us, this would allow to have one common armv6 kernel for RPI0 and RPI1
(BCM2835-based), and one common armv7 kernel for RPI2 v1.1 (BCM2836-based),
and RPI2 v1.2 / RPI3 (BCM2837-based).
Patrick Kelsey [Sun, 1 Oct 2017 23:37:17 +0000 (23:37 +0000)]
The soisconnected() call removed from syncache_socket() in r307966 was
not extraneous in the TCP Fast Open (TFO) passive-open case. In the
TFO passive-open case, syncache_socket() is being called during
processing of a TFO SYN bearing a valid cookie, and a call to
soisconnected() is required in order to allow the application to
immediately consume any data delivered in the SYN and to have a chance
to generate response data to accompany the SYN-ACK. The removal of
this call to soisconnected() effectively converted all TFO passive
opens to having the same RTT cost as a standard 3WHS.
This commit adds a call to soisconnected() to syncache_tfo_expand() so
that it is only in the TFO passive-open path, thereby restoring TFO
passve-open RTT performance and preserving the non-TFO connection-rate
performance gains realized by r307966.
Julien Charbon [Sun, 1 Oct 2017 21:20:28 +0000 (21:20 +0000)]
Fix an infinite loop in tcp_tw_2msl_scan() when an INP_TIMEWAIT inp has
been destroyed before its tcptw with INVARIANTS undefined.
This is a symmetric change of r307551:
A INP_TIMEWAIT inp should not be destroyed before its tcptw, and INVARIANTS
will catch this case. If INVARIANTS is undefined it will emit a log(LOG_ERR)
and avoid a hard to debug infinite loop in tcp_tw_2msl_scan().
Andrew Turner [Sun, 1 Oct 2017 19:52:47 +0000 (19:52 +0000)]
To prepare for adding EFI runtime services support on arm64 move the
machine independent parts of the existing code to a new file that can be
shared between amd64 and arm64.
https://www.illumos.org/issues/8605
zfs.exists() in channel programs doesn't return any result, and should have a
man page entry.
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Chris Williamson <chris.williamson@delphix.com>
Ian Lepore [Sun, 1 Oct 2017 16:48:36 +0000 (16:48 +0000)]
Work around bcm283x silicon bugs to make i2c repeat-start work for the most
common case where it's needed -- a write followed by a read to the same slave.
The i2c controller in this chip only performs complete transfers, it does
not provide control over start/repeat-start/stop operations on the bus.
Thus, we have gotten a full stop/start sequence rather than a repeat-start
when doing a typical i2c slave access of "write address, read data". Some
i2c slave devices require a repeat-start to work correctly.
These changes cause the controller to do a repeat-start by pre-staging the
read parameters in the controller registers immediate after the controller
has latched the values for the initial write operation, but before any
bytes are actually written. With the values pre-staged, when the write
portion of the transfer completes, the state machine in the silicon sees
a new start operation already staged and that causes it to perform a
repeat-start. The key to tricking the buggy hardware into doing this is
to avoid prefilling any output data in the transmit FIFO so that it is
possible to catch the silicon in the state where transmit values are
latched but the transmit isn't completed yet.
Reviewed by: Steve Gonczi <steve.gonczi@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Yuri Pankov <yuripv@gmx.com>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Steve Gonczi <steve.gonczi@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Yuri Pankov <yuripv@gmx.com>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: Pavel Zakharov <pavel.zakharov@delphix.com>
https://www.illumos.org/issues/7431
ZFS channel programs (ZCP) adds support for performing compound ZFS
administrative actions via Lua scripts in a sandboxed environment (with time
and memory limits).
This initial commit includes both base support for running ZCP scripts, and a
small initial library of API calls which support getting properties and
listing, destroying, and promoting datasets.
Testing: in addition to the included unit tests, channel programs have been in
use at Delphix for several months for batch destroying filesystems. The
dsl_destroy_snaps_nvl() call has also been replaced with
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Approved by: Garrett D'Amore <garrett@damore.org>
Author: Chris Williamson <chris.williamson@delphix.com>
https://www.illumos.org/issues/8552
In the LUA interpreter used by "zfs program", the lua format() function
accidentally includes support for '%f' and friends, which can cause compilation
problems when building on platforms that don't support floating-point math in
the kernel (e.g. sparc). Support for '%f' friends (%f %e %E %g %G) should be
removed, since there's no way to supply a floating-point value anyway (all
numbers in ZFS LUA are int64_t's).
Reviewed by: Yuri Pankov <yuripv@gmx.com>
Reviewed by: Igor Kozhukhov <igor@dilos.org>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: Matthew Ahrens <mahrens@delphix.com>
https://www.illumos.org/issues/8590
In dsl_destroy_snapshots_nvl(), "snaps_normalized" is not freed after it is
added to "arg".
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Steve Gonczi <steve.gonczi@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: Matthew Ahrens <mahrens@delphix.com>
FreeBSD notes:
- zfs-program.8 manual page is taken almost as is from the vendor repository,
no FreeBSD-ification done
- fixed multiple instances of NULL being used where an integer is expected
- replaced ETIME and ECHRNG with ETIMEDOUT and EDOM respectively
This commit adds a modified version of Lua 5.2.4 under
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/lua, mirroring the
upstream. See README.zfs in that directory for the description of Lua
customizations.
See zfs-program.8 on how to use the new feature.
Use make_dev_s(9) to create device, since the device ioctl interface
needs to access si_drv1 to get softc pointer.
Remove the common but not functional attempt to prevent parallel
accesses by file descriptors by blocking more than one open. Either
threads in one process, or forked siblings, or file descriptors passed
over unix domain sockets all allow to execute parallel requests once
one fd is opened. Since ioctl handler uses smbus_request_bus() to
take the bus ownership, the correct mechanism establishes exclusive
access already.
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Martin Matuska [Sun, 1 Oct 2017 00:40:23 +0000 (00:40 +0000)]
MFV r324145,324147:
Sync libarchive with vendor.
Relevant vendor changes:
PR #905: Support for Zstandard read and write filters
PR #922: Avoid overflow when reading corrupt cpio archive
Issue #935: heap-based buffer overflow in xml_data (CVE-2017-14166)
OSS-Fuzz 2936: Place a limit on the mtree line length
OSS-Fuzz 2394: Ensure that the ZIP AES extension header is large enough
OSS-Fuzz 573: Read off-by-one error in RAR archives (CVE-2017-14502)
Mark Johnston [Sat, 30 Sep 2017 23:41:28 +0000 (23:41 +0000)]
Have uiomove_object_page() keep accessed pages in the active queue.
Previously, uiomove_object_page() would maintain LRU by requeuing the
accessed page. This involves acquiring one of the heavily contended page
queue locks. Moreover, it is unnecessarily expensive for pages in the
active queue.
As of r254304 the page daemon continually performs a slow scan of the
active queue, with the effect that unreferenced pages are gradually
moved to the inactive queue, from which they can be reclaimed. Prior to
that revision, the active queue was scanned only during shortages of
free and inactive pages, meaning that unreferenced pages could get
"stuck" in the queue. Thus, tmpfs was required to use the inactive queue
and requeue pages in order to maintain LRU. Now that this is no longer
the case, tmpfs I/O operations can use the active queue and avoid the
page queue locks in most cases, instead setting PGA_REFERENCED on
referenced pages to provide pseudo-LRU.
Relevant vendor changes:
PR #905: Support for Zstandard read and write filters
PR #922: Avoid overflow when reading corrupt cpio archive
Issue #935: heap-based buffer overflow in xml_data (CVE-2017-14166)
OSS-Fuzz 2936: Place a limit on the mtree line length
OSS-Fuzz 2394: Ensure that the ZIP AES extension header is large enough
OSS-Fuzz 573: Read off-by-one error in RAR archives (CVE-2017-14502)
Michael Tuexen [Sat, 30 Sep 2017 11:40:18 +0000 (11:40 +0000)]
* Update function definitions.
* Ensure that the datalen always describes the length after the IPv6
header consistently, not matter which protocol us used for probes..
* Document that the default length is 20, not 12.
* Don't send inormation in probe packets which is not needed or
even checked when the responses are processed.
* Address CID 978587.
This is mainly a cleanup preparing the addition of SCTP and TCP
as possible probe packet protocols.
Mention new -n flag.
Remove optional -h from the operation list lines, -h would cause the
utility to exit without performing the action.
Explain the default path behavior, list default path.
Correct example of update performed from the non-default path,
it needs -n and the trailing slash is redundand.
Remove useless BUGS section.
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Allow to disable default microcode updates search path with the new
'-n' option.
Look for updates in the default locations only after user-supplied
locations are tried.
If newer microcode files are put into non-standard path, both measures
allow to avoid situation where older update loaded from the default
path first, and then the second update is applied from non-standard
path. Applying intermediate updates might be undesirable.
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Rick Macklem [Fri, 29 Sep 2017 23:13:01 +0000 (23:13 +0000)]
Add support for Flex File Layout to the pNFS client structures.
This patch modifies the pNFS client layout and deviceinfo structures
to add fields and unions for the Flex File Layout. Until a future
commit adds Flex File layout support, these new fields are not used.
This patch should not affect the "pnfs" option for File Layout.
Ian Lepore [Fri, 29 Sep 2017 22:13:26 +0000 (22:13 +0000)]
Enhance mdmfs(8) to work with tmpfs(5).
Existing scripts and associated config such as rc.initdiskless, rc.d/var,
and others, use mdmfs to create memory filesystems. That program accepts a
size argument which allows SI suffixes and treats an unsuffixed number as a
count of 512 byte sectors. That makes it difficult to convert existing
scripts to use tmpfs instead of mdmfs, because tmpfs treats unsuffixed
numbers as a count of bytes. The script logic to deal with existing user
config that might include suffixed and unsuffixed numbers is... unpleasant.
Also, there is no g'tee that tmpfs will be available. It is sometimes
configured out of small-resource embedded systems to save memory and flash
storage space.
These changes enhance mdmfs(8) so that it accepts two new values for the
'md-device' arg: 'tmpfs' and 'auto'. With tmpfs, the program always uses
tmpfs(5) (and fails if it's not available). With 'auto' the program prefers
tmpfs, but falls back to using md(4) if tmpfs isn't available. It also
handles the -s <size> argument so that the mdconfig interpetation of
unsuffixed numbers applies when tmpfs is used as well, so that existing user
config keeps working after a switch to tmpfs.
A new rc setting, mfs_type, is added to etc/defaults/rc.conf to let users
force the use of tmpfs or md; the default value is "auto".
The GCC xmmintrin.h header brokenly includes mm_malloc.h unconditionally.
(The Clang version of xmmintrin.h only includes mm_malloc.h if not compiling
in standalone mode.)
Hack around GCC's broken header by defining the include guard macro ahead of
including xmmintrin.h.
__setrunelocale: Fix asprintf(3) failure not returning an error.
Also fix the style of the asprintf(3) call in __collate_load_tables_l().
Both of these lines were modified away from snprintf(3) during the
import from DragonFly/Illumos.
smb_strdupin() tried to roll a copyin() based strlen to allocate a buffer
and then blindly copyin that size. Of course, a malicious user program
could simultaneously manipulate the buffer, resulting in a non-terminated
string being copied.
Later assumptions in the code rely upon the string being nul-terminated.
Just use copyinstr() and drop the racy sizing.
PR: 222687
Reported by: Meng Xu <meng.xu AT gatech.edu>
Security: possible local DoS
Sponsored by: Dell EMC Isilon
* check mbuf length before doing mtod() and accessing to IP header;
* update oip pointer and all depending pointers after m_pullup();
* remove extra checks and extra parentheses, wrap long lines;
Scott Long [Fri, 29 Sep 2017 04:52:15 +0000 (04:52 +0000)]
Convert sysctl sbuf usage to use a fully dynaic sbuf. This is strictly
needed, but it silences an erroneous Coverity warning and makes the code a
little more logically consistent. Also mark the sysctl as MPSAFE.
Rick Macklem [Thu, 28 Sep 2017 23:05:08 +0000 (23:05 +0000)]
Add the NFS client state flag that enables Flexible File Layout.
This patch adds a NFSSTA_FLEXFILE flag that will be used to enable
Flexible File Layout for the NFSv4.1 pNFS client. It is not yet
used, but will be after a future commit adds Flex File Layout support.
Rick Macklem [Thu, 28 Sep 2017 22:33:01 +0000 (22:33 +0000)]
Change nfsv4_getipaddr() and nfsrpc_fillsa() to not use sockaddr_storage.
This patch changes nfsv4_getipaddr() and nfsrpc_fillsa() to use
a sockaddr_in * and sockaddr_in6 * instead of sockaddr_storage, to
avoid allocating the latter on the stack. It also moves the nfsrpc_fillsa()
call to after the completion of parsing of the DeviceInfo reply from
the server. This patch is in preparation for addition of Flex File
Layout support in a future commit.
It only affects the "pnfs" NFSv4.1 client mount option and should not
have changed its semantics.
Nick Hibma [Thu, 28 Sep 2017 19:57:46 +0000 (19:57 +0000)]
Make this compile if NO_SYSCTL_DESCR is defined.
Defining a variable with the description and then only use it in the
SYSCTL declaration led to an unused variable warning. In the SYSCTL the
passed value is discarded using __DESCR.
Alan Cox [Thu, 28 Sep 2017 17:55:41 +0000 (17:55 +0000)]
Optimize vm_object_page_remove() by eliminating pointless calls to
pmap_remove_all(). If the object to which a page belongs has no
references, then that page cannot possibly be mapped.
Split the handlers for pop of invalid selectors from the trap frame
into usermode and kernel variants. Usermode handler is kept as is, it
restores the already loaded parts of the trap frame and jumps to set
up a signal delivery to the user process.
New kernel part of the handler emulates IRET treatment of the segments
which would violate access right. It loads NUL selector in the
segment register which load causes the fault, and then continues the
return to interrupted kernel code. Since invalid selectors in the
segment registers in the kernel mode can only exist while kernel still
enters or exits from userspace, we only zero invalid userspace
selectors. If userspace tries to use the segment register, it gets a
signal, as if the processor segment descriptor cache was reloaded.
Reported by: Maxime Villard <max@m00nbsd.net>
Suggested and reviewed by: bde
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Do not return from interrupt using the POP_FRAME;iret instruction
sequence, always jump to doreti.
The user segments selectors saved on the stack might become invalid
because userspace manipulated LDT in a parallel thread. trap() is
aware of such issue, but it is only prepared to handle it at iret and
segment registers load operations in doreti path.
Also remove POP_FRAME macro because it is no longer used.
Reviewed by: bde, jhb (as part of r323722)
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Revert r323722. A better fix will be committed shortly, as well as
some still useful bits of the reverted revision.
The problem with the committed fix is that there are still issues with
returning from NMI, when NMI interrupted kernel in a moment where the
kernel segments selectors were still not loaded into registers. If
this happens, the NMI return would loose the userspace selectors
because r323722 does not reload segment registers on return to kernel
mode.
Fixing the problem is complicated. Since an alternative approach to
handle the original bug exists, it makes sence to stop adding more
complexity.
Discussed with: bde
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Warner Losh [Thu, 28 Sep 2017 01:27:00 +0000 (01:27 +0000)]
Tweak performance of nda completions
Use xpt_done_direct in preference to xpt_done when completing a
successful I/O. Continue to use xpt_done when there's an error, or for
completion of the submission of a CCB. This eliminates a context
switch to the cam_doneq thread.
Rick Macklem [Wed, 27 Sep 2017 23:23:41 +0000 (23:23 +0000)]
Fix a memory leak that occurred in the pNFS client.
When a "pnfs" NFSv4.1 mount was unmounted, it didn't free up the layouts
and deviceinfo structures. This leak only affects "pnfs" mounts and only
when the mount is umounted.
Found while testing the pNFS Flexible File layout client code.
John Baldwin [Wed, 27 Sep 2017 23:15:33 +0000 (23:15 +0000)]
Add UMA_ALIGNOF().
This is a wrapper around _Alignof() that sets the alignment for a zone
to the alignment required by a given type. This allows the compiler to
determine the proper alignment rather than having the programmer try to
guess.
bhnd: Add support for supplying bus I/O callbacks when initializing an EROM
parser.
This allows us to use the EROM parser API in cases where the standard bus
space I/O APIs are unsuitable. In particular, this will allow us to parse
the device enumeration table directly from bhndb(4) drivers, prior to
full attach and configuration of the bridge.
Approved by: adrian (mentor)
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D12510
Add bhnd(4) API for explicitly registering BHND platform devices (ChipCommon,
PMU, NVRAM, etc) with the bus, rather than walking the newbus hierarchy to
discover platform devices. These devices are now also refcounted; attempting
to deregister an actively used platform device will return EBUSY.
This resolves a lock ordering incompatibility with bwn(4)'s firmware loading
threads; previously it was necessary to acquire Giant to protect newbus access
when locating and querying the NVRAM device.
Approved by: adrian (mentor)
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D12392
Warner Losh [Wed, 27 Sep 2017 19:22:10 +0000 (19:22 +0000)]
Since the human readable name is actually ignored, and not matching a
'human' pnp string, change it to #, the name reserved for fields that
are ignored.