Martin Matuska [Sun, 1 Oct 2017 00:40:23 +0000 (00:40 +0000)]
MFV r324145,324147:
Sync libarchive with vendor.
Relevant vendor changes:
PR #905: Support for Zstandard read and write filters
PR #922: Avoid overflow when reading corrupt cpio archive
Issue #935: heap-based buffer overflow in xml_data (CVE-2017-14166)
OSS-Fuzz 2936: Place a limit on the mtree line length
OSS-Fuzz 2394: Ensure that the ZIP AES extension header is large enough
OSS-Fuzz 573: Read off-by-one error in RAR archives (CVE-2017-14502)
Mark Johnston [Sat, 30 Sep 2017 23:41:28 +0000 (23:41 +0000)]
Have uiomove_object_page() keep accessed pages in the active queue.
Previously, uiomove_object_page() would maintain LRU by requeuing the
accessed page. This involves acquiring one of the heavily contended page
queue locks. Moreover, it is unnecessarily expensive for pages in the
active queue.
As of r254304 the page daemon continually performs a slow scan of the
active queue, with the effect that unreferenced pages are gradually
moved to the inactive queue, from which they can be reclaimed. Prior to
that revision, the active queue was scanned only during shortages of
free and inactive pages, meaning that unreferenced pages could get
"stuck" in the queue. Thus, tmpfs was required to use the inactive queue
and requeue pages in order to maintain LRU. Now that this is no longer
the case, tmpfs I/O operations can use the active queue and avoid the
page queue locks in most cases, instead setting PGA_REFERENCED on
referenced pages to provide pseudo-LRU.
Relevant vendor changes:
PR #905: Support for Zstandard read and write filters
PR #922: Avoid overflow when reading corrupt cpio archive
Issue #935: heap-based buffer overflow in xml_data (CVE-2017-14166)
OSS-Fuzz 2936: Place a limit on the mtree line length
OSS-Fuzz 2394: Ensure that the ZIP AES extension header is large enough
OSS-Fuzz 573: Read off-by-one error in RAR archives (CVE-2017-14502)
Michael Tuexen [Sat, 30 Sep 2017 11:40:18 +0000 (11:40 +0000)]
* Update function definitions.
* Ensure that the datalen always describes the length after the IPv6
header consistently, not matter which protocol us used for probes..
* Document that the default length is 20, not 12.
* Don't send inormation in probe packets which is not needed or
even checked when the responses are processed.
* Address CID 978587.
This is mainly a cleanup preparing the addition of SCTP and TCP
as possible probe packet protocols.
Mention new -n flag.
Remove optional -h from the operation list lines, -h would cause the
utility to exit without performing the action.
Explain the default path behavior, list default path.
Correct example of update performed from the non-default path,
it needs -n and the trailing slash is redundand.
Remove useless BUGS section.
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Allow to disable default microcode updates search path with the new
'-n' option.
Look for updates in the default locations only after user-supplied
locations are tried.
If newer microcode files are put into non-standard path, both measures
allow to avoid situation where older update loaded from the default
path first, and then the second update is applied from non-standard
path. Applying intermediate updates might be undesirable.
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Rick Macklem [Fri, 29 Sep 2017 23:13:01 +0000 (23:13 +0000)]
Add support for Flex File Layout to the pNFS client structures.
This patch modifies the pNFS client layout and deviceinfo structures
to add fields and unions for the Flex File Layout. Until a future
commit adds Flex File layout support, these new fields are not used.
This patch should not affect the "pnfs" option for File Layout.
Ian Lepore [Fri, 29 Sep 2017 22:13:26 +0000 (22:13 +0000)]
Enhance mdmfs(8) to work with tmpfs(5).
Existing scripts and associated config such as rc.initdiskless, rc.d/var,
and others, use mdmfs to create memory filesystems. That program accepts a
size argument which allows SI suffixes and treats an unsuffixed number as a
count of 512 byte sectors. That makes it difficult to convert existing
scripts to use tmpfs instead of mdmfs, because tmpfs treats unsuffixed
numbers as a count of bytes. The script logic to deal with existing user
config that might include suffixed and unsuffixed numbers is... unpleasant.
Also, there is no g'tee that tmpfs will be available. It is sometimes
configured out of small-resource embedded systems to save memory and flash
storage space.
These changes enhance mdmfs(8) so that it accepts two new values for the
'md-device' arg: 'tmpfs' and 'auto'. With tmpfs, the program always uses
tmpfs(5) (and fails if it's not available). With 'auto' the program prefers
tmpfs, but falls back to using md(4) if tmpfs isn't available. It also
handles the -s <size> argument so that the mdconfig interpetation of
unsuffixed numbers applies when tmpfs is used as well, so that existing user
config keeps working after a switch to tmpfs.
A new rc setting, mfs_type, is added to etc/defaults/rc.conf to let users
force the use of tmpfs or md; the default value is "auto".
The GCC xmmintrin.h header brokenly includes mm_malloc.h unconditionally.
(The Clang version of xmmintrin.h only includes mm_malloc.h if not compiling
in standalone mode.)
Hack around GCC's broken header by defining the include guard macro ahead of
including xmmintrin.h.
__setrunelocale: Fix asprintf(3) failure not returning an error.
Also fix the style of the asprintf(3) call in __collate_load_tables_l().
Both of these lines were modified away from snprintf(3) during the
import from DragonFly/Illumos.
smb_strdupin() tried to roll a copyin() based strlen to allocate a buffer
and then blindly copyin that size. Of course, a malicious user program
could simultaneously manipulate the buffer, resulting in a non-terminated
string being copied.
Later assumptions in the code rely upon the string being nul-terminated.
Just use copyinstr() and drop the racy sizing.
PR: 222687
Reported by: Meng Xu <meng.xu AT gatech.edu>
Security: possible local DoS
Sponsored by: Dell EMC Isilon
* check mbuf length before doing mtod() and accessing to IP header;
* update oip pointer and all depending pointers after m_pullup();
* remove extra checks and extra parentheses, wrap long lines;
Scott Long [Fri, 29 Sep 2017 04:52:15 +0000 (04:52 +0000)]
Convert sysctl sbuf usage to use a fully dynaic sbuf. This is strictly
needed, but it silences an erroneous Coverity warning and makes the code a
little more logically consistent. Also mark the sysctl as MPSAFE.
Rick Macklem [Thu, 28 Sep 2017 23:05:08 +0000 (23:05 +0000)]
Add the NFS client state flag that enables Flexible File Layout.
This patch adds a NFSSTA_FLEXFILE flag that will be used to enable
Flexible File Layout for the NFSv4.1 pNFS client. It is not yet
used, but will be after a future commit adds Flex File Layout support.
Rick Macklem [Thu, 28 Sep 2017 22:33:01 +0000 (22:33 +0000)]
Change nfsv4_getipaddr() and nfsrpc_fillsa() to not use sockaddr_storage.
This patch changes nfsv4_getipaddr() and nfsrpc_fillsa() to use
a sockaddr_in * and sockaddr_in6 * instead of sockaddr_storage, to
avoid allocating the latter on the stack. It also moves the nfsrpc_fillsa()
call to after the completion of parsing of the DeviceInfo reply from
the server. This patch is in preparation for addition of Flex File
Layout support in a future commit.
It only affects the "pnfs" NFSv4.1 client mount option and should not
have changed its semantics.
Nick Hibma [Thu, 28 Sep 2017 19:57:46 +0000 (19:57 +0000)]
Make this compile if NO_SYSCTL_DESCR is defined.
Defining a variable with the description and then only use it in the
SYSCTL declaration led to an unused variable warning. In the SYSCTL the
passed value is discarded using __DESCR.
Alan Cox [Thu, 28 Sep 2017 17:55:41 +0000 (17:55 +0000)]
Optimize vm_object_page_remove() by eliminating pointless calls to
pmap_remove_all(). If the object to which a page belongs has no
references, then that page cannot possibly be mapped.
Split the handlers for pop of invalid selectors from the trap frame
into usermode and kernel variants. Usermode handler is kept as is, it
restores the already loaded parts of the trap frame and jumps to set
up a signal delivery to the user process.
New kernel part of the handler emulates IRET treatment of the segments
which would violate access right. It loads NUL selector in the
segment register which load causes the fault, and then continues the
return to interrupted kernel code. Since invalid selectors in the
segment registers in the kernel mode can only exist while kernel still
enters or exits from userspace, we only zero invalid userspace
selectors. If userspace tries to use the segment register, it gets a
signal, as if the processor segment descriptor cache was reloaded.
Reported by: Maxime Villard <max@m00nbsd.net>
Suggested and reviewed by: bde
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Do not return from interrupt using the POP_FRAME;iret instruction
sequence, always jump to doreti.
The user segments selectors saved on the stack might become invalid
because userspace manipulated LDT in a parallel thread. trap() is
aware of such issue, but it is only prepared to handle it at iret and
segment registers load operations in doreti path.
Also remove POP_FRAME macro because it is no longer used.
Reviewed by: bde, jhb (as part of r323722)
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Revert r323722. A better fix will be committed shortly, as well as
some still useful bits of the reverted revision.
The problem with the committed fix is that there are still issues with
returning from NMI, when NMI interrupted kernel in a moment where the
kernel segments selectors were still not loaded into registers. If
this happens, the NMI return would loose the userspace selectors
because r323722 does not reload segment registers on return to kernel
mode.
Fixing the problem is complicated. Since an alternative approach to
handle the original bug exists, it makes sence to stop adding more
complexity.
Discussed with: bde
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Warner Losh [Thu, 28 Sep 2017 01:27:00 +0000 (01:27 +0000)]
Tweak performance of nda completions
Use xpt_done_direct in preference to xpt_done when completing a
successful I/O. Continue to use xpt_done when there's an error, or for
completion of the submission of a CCB. This eliminates a context
switch to the cam_doneq thread.
Rick Macklem [Wed, 27 Sep 2017 23:23:41 +0000 (23:23 +0000)]
Fix a memory leak that occurred in the pNFS client.
When a "pnfs" NFSv4.1 mount was unmounted, it didn't free up the layouts
and deviceinfo structures. This leak only affects "pnfs" mounts and only
when the mount is umounted.
Found while testing the pNFS Flexible File layout client code.
John Baldwin [Wed, 27 Sep 2017 23:15:33 +0000 (23:15 +0000)]
Add UMA_ALIGNOF().
This is a wrapper around _Alignof() that sets the alignment for a zone
to the alignment required by a given type. This allows the compiler to
determine the proper alignment rather than having the programmer try to
guess.
bhnd: Add support for supplying bus I/O callbacks when initializing an EROM
parser.
This allows us to use the EROM parser API in cases where the standard bus
space I/O APIs are unsuitable. In particular, this will allow us to parse
the device enumeration table directly from bhndb(4) drivers, prior to
full attach and configuration of the bridge.
Approved by: adrian (mentor)
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D12510
Add bhnd(4) API for explicitly registering BHND platform devices (ChipCommon,
PMU, NVRAM, etc) with the bus, rather than walking the newbus hierarchy to
discover platform devices. These devices are now also refcounted; attempting
to deregister an actively used platform device will return EBUSY.
This resolves a lock ordering incompatibility with bwn(4)'s firmware loading
threads; previously it was necessary to acquire Giant to protect newbus access
when locating and querying the NVRAM device.
Approved by: adrian (mentor)
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D12392
Warner Losh [Wed, 27 Sep 2017 19:22:10 +0000 (19:22 +0000)]
Since the human readable name is actually ignored, and not matching a
'human' pnp string, change it to #, the name reserved for fields that
are ignored.
If the filesystem is not exported directly return NULL.
If no address is given and filesystem is exported using some default
one return it directly, if it doesn't have a default one directly
return NULL.
mbuf: Remove UDP_IPV4_EX, which was never defined.
Add comment to explain the IPV6_EX suffix. The confusion about
these RSS hash type probably stems from the facts that they were
never widely implemented by hardwares.
Reviewed by: rwatson
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D12453
IPV6_EXs in RSS never mean fragment. They mean:
"- Home address from the home address option in the IPv6 destination
options header. If the extension header is not present, use the
Source IPv6 Address.
- IPv6 address that is contained in the Routing-Header-Type-2 from
the associated extension header. If the extension header is not
present, use the Destination IPv6 Address."
UDP_IPV4_EX is an invalid RSS hash type, which will be removed.
tid must be equal to curthread and the target routine was already reading
it anyway, which is not a problem. Not passing it as a parameter allows for
a little bit shorter code in callers.
Rick Macklem [Tue, 26 Sep 2017 23:42:44 +0000 (23:42 +0000)]
Add major and minor version arguments to nfscl_reqstart().
This patch adds "vers" and "minorvers" arguments to nfscl_reqstart().
The patch always passes them in as "0" and that implies no change
in semantics. These arguments will be used by a future commit that
adds support for the Flexible File Layout.
John Baldwin [Tue, 26 Sep 2017 23:24:15 +0000 (23:24 +0000)]
Don't defer wakeup()s for completed journal workitems.
Normally wakeups() are performed for completed softupdates work items
in workitem_free() before the underlying memory is free()'d.
complete_jseg() was clearing the "wakeup needed" flag in work items to
defer the wakeup until the end of each loop iteration. However, this
resulted in the item being free'd before it's address was used with
wakeup(). As a result, another part of the kernel could allocate this
memory from malloc() and use it as a wait channel for a different
"event" with a different lock. This triggered an assertion failure
when the lock passed to sleepq_add() did not match the existing lock
associated with the sleep queue. Fix this by removing the code to
defer the wakeup in complete_jseg() allowing the wakeup to occur
slightly earlier in workitem_free() before free() is called.
The main reason I can think of for deferring a wakeup() would be to
avoid waking up a waiter while holding a lock that the waiter would
need. However, no locks are dropped in between the wakeup() in
workitem_free() and the end of the loop in complete_jseg() as far as I
can tell.
In general I think it is not safe to do a wakeup() after free() as one
cannot control how other parts of the kernel that might reuse the
address for a different wait channel will handle spurious wakeups.
Some x86 class CPUs have accelerated intrinsics for SHA1 and SHA256.
Provide this functionality on CPUs that support it.
This implements CRYPTO_SHA1, CRYPTO_SHA1_HMAC, and CRYPTO_SHA2_256_HMAC.
Correctness: The cryptotest.py suite in tests/sys/opencrypto has been
enhanced to verify SHA1 and SHA256 HMAC using standard NIST test vectors.
The test passes on this driver. Additionally, jhb's cryptocheck tool has
been used to compare various random inputs against OpenSSL. This test also
passes.
So ~4.4-4.6x speedup depending on algorithm choice. This is consistent with
the results the Linux folks saw for 4kB buffers.
The driver borrows SHA update code from sys/crypto sha1 and sha256. The
intrinsic step function comes from Intel under a 3-clause BSDL.[0] The
intel_sha_extensions_sha<foo>_intrinsic.c files were renamed and lightly
modified (added const, resolved a warning or two; included the sha_sse
header to declare the functions).
Theoretically, HMACs do not actually have any limit on key sizes.
Transforms should compact input keys larger than the HMAC block size by
using the transform (hash) on the input key.
(Short input keys are padded out with zeros to the HMAC block size.)
Still, not all FreeBSD crypto drivers that provide HMAC functionality
handle longer-than-blocksize keys appropriately, so enforce a "maximum" key
length in the crypto API for auth_hashes that previously expressed a
requirement. (The "maximum" is the size of a single HMAC block for the
given transform.) Unconstrained auth_hashes are left as-is.
I believe the previous hardcoded sizes were committed in the original
import of opencrypto from OpenBSD and are due to specific protocol
details of IPSec. Note that none of the previous sizes actually matched
the appropriate HMAC block size.
The previous hardcoded sizes made the SHA tests in cryptotest.py
useless for testing FreeBSD crypto drivers; none of the NIST-KAT example
inputs had keys sized to the previous expectations.
The following drivers were audited to check that they handled keys up to
the block size of the HMAC safely:
Hardware accelerated HMAC:
* ccr(4)
* hifn(4)
* sec(4) (Only supports up to 64 byte keys despite claiming to
support SHA2 HMACs, but validates input key sizes)
* cryptocteon (MIPS)
* nlmsec (MIPS)
* rmisec (MIPS) (Amusingly, does not appear to use key material at
all -- presumed broken)
MFV r323535: 8585 improve batching done in zil_commit()
FreeBSD notes:
- this MFV reverts FreeBSD commit r314549 to make the merge easier
- at present our emulation of cv_timedwait_hires is rather poor,
so I elected to use cv_timedwait_sbt directly
Please see the differential revision for details.
Unfortunately, I did not get any positive reviews, so there could be
bugs in the FreeBSD-specific piece of the merge.
Hence, the long MFC timeout.
https://www.illumos.org/issues/8585
The current implementation of zil_commit() can introduce significant
latency, beyond what is inherent due to the latency of the underlying
storage. The additional latency comes from two main problems:
1. When there's outstanding ZIL blocks being written (i.e. there's
already a "writer thread" in progress), then any new calls to
zil_commit() will block waiting for the currently oustanding ZIL
blocks to complete. The blocks written for each "writer thread" is
coined a "batch", and there can only ever be a single "batch" being
written at a time. When a batch is being written, any new ZIL
transactions will have to wait for the next batch to be written,
which won't occur until the current batch finishes.
As a result, the underlying storage may not be used as efficiently
as possible. While "new" threads enter zil_commit() and are blocked
waiting for the next batch, it's possible that the underlying
storage isn't fully utilized by the current batch of ZIL blocks. In
that case, it'd be better to allow these new threads to generate
(and issue) a new ZIL block, such that it could be serviced by the
underlying storage concurrently with the other ZIL blocks that are
being serviced.
2. Any call to zil_commit() must wait for all ZIL blocks in its "batch"
to complete, prior to zil_commit() returning. The size of any given
batch is proportional to the number of ZIL transaction in the queue
at the time that the batch starts processing the queue; which
doesn't occur until the previous batch completes. Thus, if there's a
lot of transactions in the queue, the batch could be composed of
many ZIL blocks, and each call to zil_commit() will have to wait for
all of these writes to complete (even if the thread calling
zil_commit() only cared about one of the transactions in the batch).
Reviewed by: Brad Lewis <brad.lewis@delphix.com>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: Prakash Surya <prakash.surya@delphix.com>
Ian Lepore [Mon, 25 Sep 2017 23:50:10 +0000 (23:50 +0000)]
Fix the return value from _Unwind_Backtrace() on arm.
If unwinding stops due to hitting the end of the call chain, the return
value is supposed to be _URC_END_OF_STACK; other values indicate internal
errors. The return value from get_eit_entry() is now returned without
translating it to _URC_FAILURE, so that callers can see _URC_END_OF_STACK
when it happens.
Ian Lepore [Mon, 25 Sep 2017 23:24:41 +0000 (23:24 +0000)]
Fix handling of uncaught exceptions in a std::terminate() handler on arm.
When raising an exception, the unwinder searches for a catch handler and if
none is found it should invoke std::terminate() with the uncaught exception
as the "current" exception. Before this change, the terminate handler was
invoked with no exception as current (abi::__cxa_current_exception_type()
returned NULL), because the return value from the unwinder indicated an
internal failure in unwinding. It turns out that was because all errors
from get_eit_entry() were translated to _URC_FAILURE. Now the error is
returned untranslated, which allows _URC_END_OF_STACK to percolate upwards
to throw_exception() in libcxxrt. When it sees that return status it
properly calls std::terminate() with the uncaught exception installed
as the current exception, allowing custom terminate handlers to work
with it.