mav [Sat, 3 Oct 2015 08:06:29 +0000 (08:06 +0000)]
MFC r286710:
6093 zfsctl_shares_lookup should only VN_RELE() on zfs_zget() success
Reviewed by: Gordon Ross <gwr@nexenta.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Dan McDonald <danmcd@omniti.com>
mav [Sat, 3 Oct 2015 08:05:33 +0000 (08:05 +0000)]
MFC r286708: 5959 clean up per-dataset feature count code
Reviewed by: Toomas Soome <tsoome@me.com>
Reviewed by: George Wilson <george@delphix.com>
Reviewed by: Alex Reece <alex@delphix.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Matthew Ahrens <mahrens@delphix.com>
A ZFS feature flags (large blocks) tracks its refcounts as the number of
datasets that have ever used the feature. Several features of this type
are planned to be added (new checksum functions). This code should be made
common infrastructure rather than duplicating the code for each feature.
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Author: Paul Dagnelie <pcd@delphix.com>
While running 'zfs recv' we noticed that every 128th 8K block required a
read. We were seeing that restore_write() was calling dmu_tx_hold_write()
and the indirect block was not cached. We should prefetch upcoming indirect
blocks to avoid having to go to disk and blocking the restore_write().
Allow an incremental send stream to be received as a clone, even if the
stream does not mark it as a clone.
https://www.illumos.org/issues/5981
When dmu_objset_find_dp gets called with a read lock held, it fans out
the work to the task queue. Each task in turn acquires its own read
lock before calling the callback. If during this process anyone tries
to a acquire a write lock, it will stall all read lock requests.Thus
the tasks will never finish, the read lock of the caller will never
get freed and the write lock never acquired. deadlock.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Dan McDonald <danmcd@omniti.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Arne Jansen <jansen@webgods.de>
https://www.illumos.org/issues/5269
When importing a pool (at boot or with zpool import) with many
filesystem, the process can take minutes. It doesn't matter whether
the pool has been exported cleanly or uncleanly. The problem is that
each dataset has its own log chain. On import, all datasets have to be
checked if there are logs to replay. The idea is to speed up this
process by paralellizing it.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george@delphix.com>
Reviewed by: Dan McDonald <danmcd@omniti.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Arne Jansen <jansen@webgods.de>
mav [Sat, 3 Oct 2015 07:57:32 +0000 (07:57 +0000)]
MFC r286683: 5765 add support for estimating send stream size with
lzc_send_space when source is a bookmark
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: Steven Hartland <killing@multiplay.co.uk>
Reviewed by: Bayard Bell <buffer.g.overflow@gmail.com>
Approved by: Albert Lee <trisk@nexenta.com>
Author: Max Grossman <max.grossman@delphix.com>
https://www.illumos.org/issues/5695
In dmu_sync_ready(), a hole block pointer will have it's logical size
explicitly set as it's necessary for replay purposes. To "undo" this,
dmu_sync_done() will zero out any hole that it finds. This becomes a
problem when using the "hole_birth" feature, as this will also wipe out
any birth time that might have happened to be set on the hole.
...
As a fix, the logic to zero out a hole is only applied to old style
holes with a birth time of zero. Holes created with the "hole_birth"
feature enabled will have a non-zero birth time, and will be skipped
(thus preserving the ltime, type, and level information as well).
In addition, zdb was updated to also print the ltime, type, and level
information for these new style holes. Previously, only the logical
birth time would be printed.
Author: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george@delphix.com>
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: Bayard Bell <buffer.g.overflow@gmail.com>
Approved by: Dan McDonald <danmcd@omniti.com>
mav [Sat, 3 Oct 2015 07:50:15 +0000 (07:50 +0000)]
MFC r286625:
5376 arc_kmem_reap_now() should not result in clearing arc_no_grow
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Steven Hartland <killing@multiplay.co.uk>
Reviewed by: Richard Elling <richard.elling@richardelling.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Matthew Ahrens <mahrens@delphix.com>
mav [Sat, 3 Oct 2015 07:48:25 +0000 (07:48 +0000)]
MFC r286605: 5812 assertion failed in zrl_tryenter(): zr_owner==NULL
Reviewed by: George Wilson <george@delphix.com>
Reviewed by: Alex Reece <alex@delphix.com>
Reviewed by: Will Andrews <will@freebsd.org>
Approved by: Gordon Ross <gwr@nexenta.com>
Author: Matthew Ahrens <mahrens@delphix.com>
mav [Sat, 3 Oct 2015 07:47:33 +0000 (07:47 +0000)]
MFC r286603: 5810 zdb should print details of bpobj
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Alex Reece <alex@delphix.com>
Reviewed by: George Wilson <george@delphix.com>
Reviewed by: Will Andrews <will@freebsd.org>
Reviewed by: Simon Klinkert <simon.klinkert@gmail.com>
Approved by: Gordon Ross <gwr@nexenta.com>
Author: Matthew Ahrens <mahrens@delphix.com>
mav [Sat, 3 Oct 2015 07:46:03 +0000 (07:46 +0000)]
MFC r286600: 5808 spa_check_logs is not necessary on readonly pools
Reviewed by: George Wilson <george@delphix.com>
Reviewed by: Paul Dagnelie <paul.dagnelie@delphix.com>
Reviewed by: Simon Klinkert <simon.klinkert@gmail.com>
Reviewed by: Will Andrews <will@freebsd.org>
Approved by: Gordon Ross <gwr@nexenta.com>
Author: Matthew Ahrens <mahrens@delphix.com>
alc [Sat, 3 Oct 2015 07:43:33 +0000 (07:43 +0000)]
MFC r288281
The conversion of kmem_alloc_attr() from operating on a vm map to a vmem
arena in r254025 introduced a bug in the case when an allocation is only
partially successful. Specifically, the vm object lock was not being
acquired before freeing the allocated pages. To address this bug, replace
the existing code by a call to kmem_unback().
Change the type of a variable in kmem_alloc_attr() so that an allocation
of two or more gigabytes won't fail.
Replace the error handling code in kmem_back() by a call to kmem_unback().
mav [Sat, 3 Oct 2015 07:34:21 +0000 (07:34 +0000)]
MFC r286589: 5820 verify failed in zio_done(): BP_EQUAL(bp, io_bp_orig)
Reviewed by: Alex Reece <alex@delphix.com>
Reviewed by: George Wilson <george@delphix.com>
Reviewed by: Steven Hartland <killing@multiplay.co.uk>
Approved by: Garrett D'Amore <garrett@damore.org>
Author: Matthew Ahrens <mahrens@delphix.com>
mav [Sat, 3 Oct 2015 07:33:27 +0000 (07:33 +0000)]
MFC r286587: 5746 more checksumming in zfs send
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Bayard Bell <buffer.g.overflow@gmail.com>
Approved by: Albert Lee <trisk@omniti.com>
Author: Matthew Ahrens <mahrens@delphix.com>
mav [Sat, 3 Oct 2015 07:32:34 +0000 (07:32 +0000)]
MFC r286579: 5313 Allow I/Os to be aggregated across ZIO priority classes
Reviewed by: Andriy Gapon <avg@FreeBSD.org>
Reviewed by: Will Andrews <willa@SpectraLogic.com>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Justin T. Gibbs <justing@spectralogic.com>
mav [Sat, 3 Oct 2015 07:29:56 +0000 (07:29 +0000)]
MFC r286575: 5056 ZFS deadlock on db_mtx and dn_holds
Reviewed by: Will Andrews <willa@spectralogic.com>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Justin Gibbs <justing@spectralogic.com>
Currently, every buffer cached in the L2ARC is accompanied by a 240-byte
header in memory, leading to very high memory consumption when using very
large cache devices. These changes significantly reduce this overhead.
Trunk Optimized
+-----------------+
L1-only | 176 B | 176 B | (same)
+-----------------+
L1 & L2 | 240 B | 208 B | (saved 32 bytes)
+-----------------+
L2-only | 240 B | 128 B | (saved 116 bytes)
+-----------------+
For an average blocksize of 8KB, this means that for the L2ARC, the ratio
of metadata to data has gone down from about 2.92% to 1.56%. For a
'storage optimized' EC2 instance with 1600GB of SSD and 60GB of RAM, this
means that we expect a completely full L2ARC to use (1600 GB * 0.0156) /
60GB = 41% of the available memory, down from 78%.
mav [Sat, 3 Oct 2015 07:25:05 +0000 (07:25 +0000)]
MFC r286551: 5694 traverse_prefetcher does not prefetch enough
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Alex Reece <alex@delphix.com>
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: Josef 'Jeff' Sipek <josef.sipek@nexenta.com>
Reviewed by: Bayard Bell <buffer.g.overflow@gmail.com>
Approved by: Garrett D'Amore <garrett@damore.org>
Author: George Wilson <george.wilson@delphix.com>
mav [Sat, 3 Oct 2015 07:24:12 +0000 (07:24 +0000)]
MFC r286549:
5693 ztest fails in dbuf_verify: buf[i] == 0, due to dedup and bp_override
Reviewed by: George Wilson <george@delphix.com>
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: Bayard Bell <buffer.g.overflow@gmail.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Matthew Ahrens <mahrens@delphix.com>
mav [Sat, 3 Oct 2015 07:22:24 +0000 (07:22 +0000)]
MFC r286545:
5630 stale bonus buffer in recycled dnode_t leads to data corruption
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george@delphix.com>
Reviewed by: Will Andrews <will@freebsd.org>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Justin T. Gibbs <justing@spectralogic.com>
mav [Sat, 3 Oct 2015 07:21:27 +0000 (07:21 +0000)]
MFC r286543: 5592 NULL pointer dereference in dsl_prop_notify_all_cb()
Reviewed by: Dan McDonald <danmcd@omniti.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george@delphix.com>
Reviewed by: Will Andrews <will@freebsd.org>
Approved by: Robert Mustacchi <rm@joyent.com>
mav [Sat, 3 Oct 2015 07:20:26 +0000 (07:20 +0000)]
MFC r286541: 5531 NULL pointer dereference in dsl_prop_get_ds()
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Dan McDonald <danmcd@omniti.com>
Reviewed by: George Wilson <george@delphix.com>
Reviewed by: Bayard Bell <buffer.g.overflow@gmail.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Justin T. Gibbs <justing@spectralogic.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Robert Mustacchi <rm@fingolfin.org>
Reviewed by: George Wilson <george@delphix.com>
Reviewed by: Rich Lowe <richlowe@richlowe.net>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: Justin T. Gibbs <justing@spectralogic.com>
mav [Fri, 2 Oct 2015 20:09:16 +0000 (20:09 +0000)]
MFC r279996 (by smh): Allow zvol_geom_worker to process BIO_DELETE's
If zvol_geom_start is called with a BIO_DELETE from a thread which can
sleep it queues it for later processing by the zvol_geom_worker. The
zvol_geom_worker didn't have a delete case so would simply loose the bio
hence preventing the original caller from every completing. In addition
an other unknown types would suffer the same fate.
Allow zvol_geom_worker to process BIO_DELETE's via zvol_strategy and
return unsupported for all unknown bio types.
mav [Fri, 2 Oct 2015 19:59:43 +0000 (19:59 +0000)]
MFC r275780 (by delphij):
Add a loader tunable, vfs.zfs.arc_meta_min, which controls how much metadata
ZFS should keep in ARC at minimum.
In arc_evict(), when doing recycle, take more factors into account by
applying the following policy:
1. If no evictable data, evict metadata;
2. If no evictable metadata, evict data;
3. If we hit arc_meta_limit, evict metadata;
4. If we haven't hit arc_meta_min, evict data;
5* (Illumos only, not present in new FreeBSD code, yet) evict the oldest
cached element from data and metadata.
(FreeBSD) evict the data type specified by caller, which is the
existing behavior.
Note that because of our splitted locks (implemented in r205231 to improve
scalability by reducing lock contention), implementing the fifth Illumos
behavior will not be cheap, so for now just implement the 1-4 and fall back
to current behavior for 5.
Illumos issue:
5368 ARC should cache more metadata
vangyzen [Fri, 2 Oct 2015 14:36:41 +0000 (14:36 +0000)]
MFC r283924
Provide vnode in memory map info for files on tmpfs
When providing memory map information to userland, populate the vnode pointer
for tmpfs files. Set the memory mapping to appear as a vnode type, to match
FreeBSD 9 behavior.
This fixes the use of tmpfs files with the dtrace pid provider,
procstat -v, procfs, linprocfs, pmc (pmcstat), and ptrace (PT_VM_ENTRY).
Submitted by: Eric Badger <eric@badgerio.us> (initial revision)
Obtained from: Dell Inc.
PR: 198431
vangyzen [Fri, 2 Oct 2015 14:21:07 +0000 (14:21 +0000)]
MFC r281785
Always send log(9) messages to the message buffer.
It is truer to the semantics of logging for messages to *always*
go to the message buffer, where they can eventually be collected
and, in fact, be put into a log file.
This restores the behavior prior to r70239, which seems to have
changed it inadvertently.
Submitted by: Eric Badger <eric@badgerio.us>
Obtained from: Dell Inc.
gjb [Fri, 2 Oct 2015 02:08:40 +0000 (02:08 +0000)]
MFC r288374:
In addition to the ubldr file, also copy ubldr.bin to the
MS-DOS partition. This will help with transitioning to
a single arm/armv6 userland build which could be used for
all FreeBSD/armv6 images without UBLDR_LOADADDR being set
for each board (ultimately requiring a separate buildworld
for each currently).
jhb [Thu, 1 Oct 2015 22:19:41 +0000 (22:19 +0000)]
MFC 287448:
Add more mmap tests related to character devices.
- Add cdev-related tests for bad args.
- Add two simple tests cases for mapping /dev/zero that test for
MAP_ANON-like behavior.
jhb [Thu, 1 Oct 2015 22:17:27 +0000 (22:17 +0000)]
MFC 286370:
Add various tests to ensure that invalid arguments passed to mmap()
trigger failures.
Note: most of the tests that should provoke an EINVAL error do not
pass on stable/10 as stable/10 does not have the changes that added
more strict parameter checking to mmap(). Those changes will not be
merged to stable/10, so I have disabled them via #if 0 with a comment
explaining why.
jhb [Thu, 1 Oct 2015 21:54:43 +0000 (21:54 +0000)]
MFC 286256:
kgdb uses td_oncpu to determine if a thread is running and should use
a pcb from stoppcbs[] rather than the thread's PCB. However, exited threads
retained td_oncpu from the last time they ran, and newborn threads had their
CPU fields cleared to zero during fork and thread creation since they are
in the set of fields zeroed when threads are setup. To fix, explicitly
update the CPU fields for exiting threads in sched_throw() to reflect the
switch out and reset the CPU fields for new threads in sched_fork_thread()
to NOCPU.
jhb [Thu, 1 Oct 2015 20:54:19 +0000 (20:54 +0000)]
MFC 284175:
Handle X2APIC entries in the MADT for APICs with an ID < 255. At least one
BIOS has been seen to include such entries even though the relevant specs
require that X2APIC entries only be used for CPUs with an APIC ID >= 255.
This was tested on a system with "plain" local APIC entries in the MADT
to ensure no regressions, but it has not yet been tested on a system with
X2APIC entries in the MADT. Currently such systems do not boot at all,
and with this change they might now boot correctly.
jhb [Thu, 1 Oct 2015 17:09:20 +0000 (17:09 +0000)]
MFC 283624,283630:
Export a list of VM objects in the system via a sysctl. The list can be
examined via 'vmstat -o'. It can be used to determine which files are
using physical pages of memory and how much each is using.
MFC 269727:
Update vmstat usage for last-argument count/wait parameters
Correct the usage in both the manpage and in usage() to indicate
that the wait interval and repetition count may be given either
with the respective -w/-c arguments, or as the final positional
arguments. [0]
The corresponding code to implement the positional arguments has
been conditional on the (always-enabled) BACKWARD_COMPATIBILITY
macro since the original 4.4-lite import. It's no longer reasonable
to remove the functionality, so remove the macro and conditional
instead.
Note that multiple disks may be given on the command line.
While here, sort arguments and apply minor mdoc fixes.
MFC 283613,287374:
Use the cpuset API more consistently:
- Fetch the root set from cpuset_getaffinity() instead of assuming all CPUs
from 0 to hw.ncpu are the root set.
- Use CPU_SETSIZE and CPU_FFS.
- The original notion of halted CPUs the manpage and code refers to is gone.
Use the term "available" instead.
The Sun RPC framework uses a netbuf structure to represent the
transport specific form of a universal transport address. The
structure is expected to be opaque to consumers. In the current
implementation, the structure contains a pointer to a buffer
that holds the actual address.
In rpcbind(8), netbuf structures are copied directly, which would
result in two netbuf structures that reference to one shared
address buffer. When one of the two netbuf structures is freed,
access to the other netbuf structure would result in an undefined
result that may crash the rpcbind(8) daemon.
Fix this by making a copy of the buffer that is going to be freed
instead of doing a shallow copy.
MFC r266588
There is no reason to perform the pmap_remove() on the kernel pmap while
the kmem object lock is held. Do the pmap_remove() before acquiring the
kmem object lock.
MFC r288025
Correct a non-fatal error in vm_pageout_worker(). vm_pageout_worker()
should not assume that vm_pages_needed will remain set while it sleeps.
Other threads can clear vm_pages_needed by performing a sufficient
number of vm_page_free() calls, e.g., process termination. The effect
of this error was that vm_pageout_worker() would free and/or launder
pages when, in fact, there was no shortage of free pages.
Rewrite a nearby comment to describe all of the possible cases and not
just the most common case. The problem being that the comment made
the most common case seem like the only case.
MFC r285282
The intention of r254304 was to scan the active queue continuously.
However, I've observed the active queue scan stopping when there are
frequent free page shortages and the inactive queue is steadily refilled
by other mechanisms, such as the sequential access heuristic in vm_fault()
or madvise(2). To remedy this problem, record the time of the last active
queue scan, and always scan a number of pages proportional to the time
since the last scan, regardless of whether that last scan was a
timeout-triggered ("pass == 0") or free-page-shortage-triggered ("pass >
0") scan.
Also, on a timeout-triggered scan, allow a full scan of the active queue
when the system is short of inactive pages.
MFC r286513, r286784
Revise the text about the atomicity of the defined operations across
multiple processors. In particular, clearly state that the operations
are always atomic when they are applied to the default memory type
that is used by the kernel (and applications).
Stop describing an acquire operation as a read barrier and a release
operation as a write barrier. That description has never been correct,
and it has caused confusion.
Also, explicitly say that a thread doesn't see its own accesses being
reordered. The reordering of a thread's accesses is only (potentially)
visible to another thread.
Use strlcpy() in favor of strncpy() as it's defined to have a nul character
at the end of string buffer, and the code context do expects this to behave
correctly (e.g. strchr).