MFC r252325:
The dtmalloc provider uses the short description of a malloc type as the
function name of its corresponding DTrace probes. These descriptions may
contain whitespace, but probe names cannot, so just replace any whitespace
with underscores when creating probes.
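A minimal sketch of the whitespace replacement described above. The helper
name desc_to_probe_func() and its signature are hypothetical and are not
taken from the dtmalloc source; it only illustrates turning a description
into a name without whitespace.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy a malloc type's short description into buf, replacing every
 * whitespace character with an underscore so the result can serve as a
 * probe function name. The output is truncated to fit buflen and is
 * always NUL-terminated. (Hypothetical helper, for illustration only.)
 */
static void
desc_to_probe_func(const char *desc, char *buf, size_t buflen)
{
	size_t i;

	assert(buflen > 0);
	for (i = 0; i + 1 < buflen && desc[i] != '\0'; i++)
		buf[i] = isspace((unsigned char)desc[i]) ? '_' : desc[i];
	buf[i] = '\0';
}
```

A description such as "CAM dev queue" would become "CAM_dev_queue".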
MFC r251238:
SDT probes can directly pass up to five arguments as arguments to
dtrace_probe(). Arguments beyond these five must be obtained in an
architecture-specific way; this can be done through the getargval provider
method, and through dtrace_getarg() if getargval isn't overridden.
This change fixes two off-by-one bugs in the way these arguments are fetched
in FreeBSD's DTrace implementation. First, the SDT provider must set the
aframes parameter to 1 when creating a probe. The aframes parameter controls
the number of frames that dtrace_getarg() will step over in order to find
the frame containing the extra arguments. On FreeBSD, dtrace_getarg() is
called in SDT probe context via
so aframes must be 3 since the arguments are in dtrace_probe()'s frame; it
was previously being called with a value of 2 instead. illumos uses a
different aframes value for SDT probes, but this is because illumos SDT
probes fire by triggering the #UD fault handler rather than calling
dtrace_probe() directly.
The second bug has to do with the way arguments are grabbed out of
dtrace_probe()'s frame on amd64. The code currently jumps over the first
stack argument and retrieves the rest of them using a pointer into the
stack. This works on i386 because all of dtrace_probe()'s arguments will be
on the stack and the first argument is the probe ID, which should be
ignored. However, it is incorrect to ignore the first stack argument on
amd64, so we correct the pointer used to access the arguments.
Poor ZFS send / receive performance due to snapshot
hold / release processing (by smh@)
Illumos ZFS issues:
3740 Poor ZFS send / receive performance due to snapshot
hold / release processing
MFV r252215:
Restore the behavior that existed before r251646: when destroying ZFS
snapshots, the ioctl would return ENOENT if any snapshot in the errlist
hit it (the new behavior was to return ENOENT only when all of them
returned errors).
Illumos ZFS issues:
3829 fix for 3740 changed behavior of zfs destroy/hold/release ioctl
MFC r251636: illumos #3749 zfs event processing should work on R/O root
filesystems
This log is a modified version of the original one written by gibbs@,
to account for changes made during the illumos RTI process.
Allow ZFS asynchronous event handling to proceed even if the root file
system is mounted read-only. This restriction appears to have been put
in place to avoid errors with updating the configuration cache file.
However:
o The majority of asynchronous event handling does not involve
configuration cache file updates.
o The configuration cache file need not be on the root file system,
so the check was not complete.
o Other classes of errors (e.g. file system full) can also prevent
a successful update yet do not prevent asynchronous event processing.
o Configurations such as NanoBSD never have a read-write root,
so ZFS event processing is permanently disabled in these systems.
o Failure to handle asynchronous events promptly can extend the
window of time that a pool is in a critical state.
At worst, a missed configuration cache update will force the operator to
perform a manual "zpool import" (note -f is not required) to inform the
system about a newly created pool. To minimize the likelihood of this
rare occurrence, configuration cache write failures now emit FMA events
(via devctl) so the operator can take corrective action, and the write
is retried every 5 minutes. The retry interval, in seconds, is tunable
via the sysctl "vfs.zfs.ccw_retry_interval".
As a side effect of reporting configuration cache events, other sysevents,
such as re-silver start/stop, are now also reported via devctl.
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c:
o As is done in zfs_fm.c, provide a manual declaration for
devctl_notify(). Both declarations could be combined
into spa_impl.h, but the declaration is fault management
related, not spa specific. sys/fm/fs/zfs.h would be ideal
if it weren't so public and reserved for FMA string
definitions. I'm open to suggestions on how to improve
this nit while minimizing our divergence from Solaris.
o Use devctl_notify() to implement sysevent support in
spa_event_notify(). The subsystem is EC_ZFS so that
these events can never collide with those emitted in
zfs_fm.c.
o Add the sysctl "vfs.zfs.ccw_retry_interval". The value
defaults to 5 minutes and is used to rate limit, on a
per-pool basis, configuration cache file write attempts.
o Modify spa_async_dispatch to honor configuration cache
write limiting. If other events are pending, a configuration
cache write will be attempted at the same time, so the
rate limiting only applies when the asynchronous dispatch
system is otherwise idle. Async events should be rare
(e.g. device arrival/departure) and configuration cache
writes rarer, so a more complicated system to strictly
honor the retry limit seems unwarranted.
o Remove check in spa_async_dispatch() for the root file
system being read-write.
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa_config.c:
Instead of silently ignoring configuration cache write
failures, report them via a new FMA event as well as
to the console. The current zfs_ereport_post() doesn't
allow arbitrary name=value pairs to be appended to the
report, so the configuration cache file name is only
available on the console output. This limitation should
be addressed in a future update.
Note: This error report is only posted once per incident,
to avoid spamming.
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/spa_impl.h:
Add a hrtime_t to the spa data structure to track the
time (via gethrtime()) of the last configuration cache file
write failure. This is referenced in spa_async_dispatch()
to effect the rate limiting.
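The rate limiting described for spa_async_dispatch() can be sketched
roughly as follows. This is an illustration under stated assumptions, not
the actual ZFS code: the function name, the parameter names, and the use
of 0 to mean "no outstanding failure" are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define	CCW_NSEC_PER_SEC	INT64_C(1000000000)

/*
 * Decide whether a configuration cache write may be attempted now.
 * last_failure_ns is the timestamp of the last failed write (0 if none).
 * When other async events are pending the write piggybacks on them, so
 * the retry interval only matters while the dispatcher is otherwise idle.
 */
static bool
ccw_write_allowed(int64_t now_ns, int64_t last_failure_ns,
    int64_t retry_interval_sec, bool other_events_pending)
{
	if (last_failure_ns == 0 || other_events_pending)
		return (true);
	return (now_ns - last_failure_ns >=
	    retry_interval_sec * CCW_NSEC_PER_SEC);
}
```

With the default interval of 300 seconds, a write that failed 299 seconds
ago is not retried on an idle system, but is retried if other events are
pending.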
sys/cddl/contrib/opensolaris/uts/common/sys/fm/fs/zfs.h:
Add FM_EREPORT_ZFS_CONFIG_CACHE_WRITE as an ereport class.
Submitted by: gibbs
Reviewed by: Matthew Ahrens <mahrens@delphix.com>,
Eric Schrock <eric.schrock@delphix.com>,
Christopher Siden <christopher.siden@delphix.com>
Sponsored by: Spectra Logic
MFC r251635: illumos #3747 txg commit callbacks don't work
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/txg.c:
Fix commit callbacks by moving them to the task's list.
Previously, list_move_tail() returned without doing anything because
the task list was passed as the source rather than destination.
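The effect of swapping the two arguments can be shown with a toy list.
The types and functions below are hypothetical stand-ins and not the
kernel's list implementation; they only mirror the "move everything from
source to the tail of destination" behavior described above.

```c
#include <assert.h>
#include <stddef.h>

struct sk_node { struct sk_node *next; int id; };
struct sk_list { struct sk_node *head, *tail; };

/* Append a single node to the tail of the list. */
static void
sk_append(struct sk_list *l, struct sk_node *n)
{
	n->next = NULL;
	if (l->tail != NULL)
		l->tail->next = n;
	else
		l->head = n;
	l->tail = n;
}

/* Move every element of src onto the tail of dst, leaving src empty. */
static void
sk_move_tail(struct sk_list *dst, struct sk_list *src)
{
	if (src->head == NULL)
		return;		/* empty source: nothing happens */
	if (dst->tail != NULL)
		dst->tail->next = src->head;
	else
		dst->head = src->head;
	dst->tail = src->tail;
	src->head = src->tail = NULL;
}
```

Calling sk_move_tail(&callbacks, &task) with an empty task list is a
silent no-op, whereas sk_move_tail(&task, &callbacks) moves the pending
callbacks as intended.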
cddl/contrib/opensolaris/cmd/ztest/ztest.c:
Check the commit callback threshold correctly.
Submitted by: will
Reviewed by: Matthew Ahrens <mahrens@delphix.com>,
Christopher Siden <christopher.siden@delphix.com>
Sponsored by: Spectra Logic
MFC r251634: illumos #3745 zpool create should treat -O mountpoint and -m the same
cddl/contrib/opensolaris/cmd/zpool/zpool_main.c: (change 644608)
This allows specifying a mountpoint using the latter form and having
its value checked and used as it would be using the former form.
As a consequence of this change:
1. The mountpoint property is set in the fsprops nvlist prior
to creating the pool, rather than being set after creating
the pool. To me, this is the proper approach, since it
avoids creating the pool if the mountpoint setting would
cause the command to fail.
2. The mountpoint property, unlike all others, can be specified
more than once. Only the last setting takes effect. This
is to avoid breaking potential existing users that specify
-m more than once.
Submitted by: will
cddl/contrib/opensolaris/lib/libzfs/common/libzfs_pool.c:
Fix "zpool create -R <whatever> -m <whatever>". Ever since
change 644608, this has been broken. The problem is that some
old code in libzfs_pool.c would force a pool's mountpoint to
"/" when creating a pool with an altroot. That probably
implemented some old policy decision regarding altroots, but it
conflicts with the current manpage. It also had no effect
until 644608, because the zpool command would _always_ change
the pool's mountpoint after creating it. The solution is to
delete the old code from libzfs_pool.c.
Submitted by: asomers
Reviewed by: Matthew Ahrens <mahrens@delphix.com>,
Christopher Siden <christopher.siden@delphix.com>
Sponsored by: Spectra Logic
MFC r251632: illumos #3743 zfs needs a refcount audit
Audit zap cursor usage and correct missing calls to zap_cursor_fini().
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa_errlog.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zap.c:
Correct early exit handling of several functions that
previously failed to close a cursor prior to returning.
Submitted by: gibbs
Audit holders of dmu_bufs and correct missing calls to dmu_buf_rele().
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_dataset.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zap.c:
Correct early exit handling of several functions that
previously failed to release a dmu_buf prior to returning.
Submitted by: will
Reviewed by: Matthew Ahrens <mahrens@delphix.com>,
Eric Schrock <eric.schrock@delphix.com>,
George Wilson <george.wilson@delphix.com>,
Christopher Siden <christopher.siden@delphix.com>
Sponsored by: Spectra Logic
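The class of bug fixed by this audit can be illustrated with a small
sketch. The cursor type and functions here are hypothetical stand-ins
(they are not the zap cursor API); the point is only that every exit path
must pass through the matching "fini"/"rele" call.

```c
#include <assert.h>
#include <errno.h>

struct sk_cursor { int open; };	/* hypothetical stand-in for a cursor */
static int sk_open_cursors;	/* counts cursors not yet finalized */

static void
sk_cursor_init(struct sk_cursor *c)
{
	c->open = 1;
	sk_open_cursors++;
}

static void
sk_cursor_fini(struct sk_cursor *c)
{
	if (c->open) {
		c->open = 0;
		sk_open_cursors--;
	}
}

/* Scan entries; a negative entry is treated as an error. */
static int
sk_scan_entries(const int *entries, int n)
{
	struct sk_cursor c;
	int error = 0;

	sk_cursor_init(&c);
	for (int i = 0; i < n; i++) {
		if (entries[i] < 0) {
			error = EINVAL;
			break;	/* not "return": fall through to cleanup */
		}
	}
	sk_cursor_fini(&c);	/* runs on both success and early exit */
	return (error);
}
```

Replacing the break with a direct return would leak the cursor, which is
the pattern the audit looked for.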
MFC r251631: illumos #3742 zfs comments need cleaner, more consistent style
- Make more of ZFS's comments use a natural English writing flow.
- Break up long paragraphs, fix various typos and spelling errors.
- Don't prefix a function description with its name when the function
definition immediately follows.
- Remove useless comments.
- Add extra whitespace where it makes the comments more readable.
New comments were separated from this change and added in r251629.
Submitted by: asomers, gibbs, will
Reviewed by: Matthew Ahrens <mahrens@delphix.com>,
George Wilson <george.wilson@delphix.com>,
Eric Schrock <eric.schrock@delphix.com>,
Christopher Siden <christopher.siden@delphix.com>
Sponsored by: Spectra Logic
Embellish the comments in various components of ZFS. Move some comments
around closer to what they describe. Specifically, answer the questions:
- What are some of the edge cases of the dbuf state machine?
- What does a txg quiesce do?
- When does the DMU notify threads waiting on txg's that they may
proceed?
- How do the calculations for RAIDZ map allocations work?
- What process do the RAIDZ I/O start and done callbacks follow?
While here, adjust the function prototype of dmu_zfetch.c:dmu_zfetch_colinear()
to match its comment which describes its return as a boolean.
Submitted by: asomers, gibbs, will
Reviewed by: Matthew Ahrens <mahrens@delphix.com>,
Eric Schrock <eric.schrock@delphix.com>,
Christopher Siden <christopher.siden@delphix.com>
Sponsored by: Spectra Logic
For ATA_PASSTHROUGH commands, pretend isci(4) supports multiword DMA
by treating it as UDMA.
This fixes a problem introduced in r249933/r249939, where CAM sends
ATA_DSM_TRIM to SATA devices using ATA_PASSTHROUGH_16. scsi_ata_trim()
sets protocol as DMA (not UDMA) which is for multi-word DMA, even
though no such mode is selected for the device. isci(4) would fail
these commands which is the correct behavior but not consistent with
other HBAs, namely LSI's.
smh@ did some further testing on an LSI controller, which rejected
ATA_PASSTHROUGH_16 commands with mode=UDMA_OUT, even though only
a UDMA mode was selected on the device. So this precludes adding
any kind of mode detection in CAM to determine which mode to use on
a per-device basis.
MFC r252471:
Remove forced timeout of in-flight commands from mfi_timeout.
While this prevents commands getting stuck forever there is no way to guarantee
that data from the command hasn't been committed to the device.
In addition, older mfi firmware has a bug that would cause the controller to
frequently stall IO for longer than our timeout value. Combined with a forced
timeout, this often resulted in panics in UFS that would have been avoided had
the command been left alone to complete eventually.
For reference, this timeout issue is resolved in Dell FW package 21.2.1-0000.
The fixed FW package version for non-Dell controllers will likely vary.
Now that the necessary infrastructure is in place to ensure hhook points which
register after a khelp module will get hooked, move khelp module initialisation
to the earlier SI_SUB_KLD stage.
Move hhook's per-vnet initialisation to an earlier SYSINIT SI_SUB stage to
ensure all per-vnet related hhook initialisation is completed prior to any
virtualised hhook points attempting registration.
vnet_register_sysinit() requires that a stage later than SI_SUB_VNET be chosen.
There are no per-vnet initialisers in the source tree at this time which run
earlier than SI_SUB_INIT_IF. A quick audit of non-virtualised SYSINITs indicates
there are no subsystems pre SI_SUB_MBUF that would likely be interested in
registering a virtualised hhook point.
Settle on SI_SUB_MBUF as hhook's per-vnet initialisation stage as it's the first
overtly network-related initialisation stage to run after SI_SUB_VNET. If a
subsystem that initialises earlier than SI_SUB_MBUF ends up wanting to register
virtualised hhook points in future, hhook's use of SI_SUB_MBUF will need to be
revisited and would probably warrant creating a dedicated SI_SUB_HHOOK which
runs immediately after SI_SUB_VNET.
Add a private KPI between hhook and khelp that allows khelp modules to insert
hook functions into hhook points which register after the modules were loaded -
potentially useful during boot or if hhook points are dynamically registered.
Internalise handling of virtualised hook points inside
hhook_{add|remove}_hook_lookup() so that khelp (and other potential API
consumers) do not have to care when they attempt to (un)hook a particular hook
point identified by id and type.
Add support for non-virtualised hhook points, which are uniquely identified by
type and id, as compared to virtualised hook points which are now uniquely
identified by type, id and a vid (which for vimage is the pointer to the vnet
that the hhook resides in).
All hhook_head structs for both virtualised and non-virtualised hook points
coexist in hhook_head_list, and a separate list is maintained for hhook points
within each vnet to simplify some vimage-related housekeeping.
When a previous call to sbsndptr() leaves sb->sb_sndptroff at the start of an
mbuf that was fully consumed by the previous call, the mbuf ptr returned by the
current call ends up being the previous mbuf in the sb chain to the one that
contains the data we want.
This does not cause any observable issues because the mbuf copy routines happily
walk the mbuf chain to get to the data at the moff offset, which in this case
means they effectively skip over the mbuf returned by sbsndptr().
We can't adjust sb->sb_sndptr during the previous call for this case because the
next mbuf in the chain may not exist yet. We therefore need to detect the
condition and make the adjustment during the current call.
Fix by detecting the special case of moff being at the start of the next mbuf in
the chain and adjust the required accounting variables accordingly.
Return ENETDOWN instead of ENOENT when all lagg(4) links are
inactive and the upper layer tries to transmit a packet. This
gives better feedback and more meaningful errors to applications.
Call sshd_precmd instead of sshd_configtest when the operator
requests reload or restart, which, in addition to testing the
configuration, will also generate host keys when they are not
present (previous behavior).
MFC r251482,251733:
r251482:
Correct the setting of the TX random backoff register. This register is
implemented as a 10-bit linear feedback shift register, so only the
lower 10 bits are valid.
Because this register is used to initialize the random backoff interval
register only when the resolved duplex is half-duplex, it is unlikely to
have caused issues these days.
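Restricting a value to the lower 10 bits amounts to a simple mask. The
macro and function names below are hypothetical and do not come from the
driver; this is only a sketch of the masking step.

```c
#include <assert.h>
#include <stdint.h>

/* Only the low 10 bits of the seed are meaningful (hypothetical name). */
#define	SK_BACKOFF_SEED_MASK	0x03FFu

/* Reduce an arbitrary random value to a valid 10-bit seed. */
static uint32_t
sk_backoff_seed(uint32_t random_value)
{
	return (random_value & SK_BACKOFF_SEED_MASK);
}
```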
r251733:
Fix a typo introduced in r213280. IFM_OPTIONS macro should see
current media word.
MFC r251481:
Do not report current link status if driver is not running.
Reporting link status in driver has a side-effect that makes mii(4)
check current link status. mii(4) will call link status change
callback when it sees link state change. Normally this wouldn't
have problems. However, ASF/IPMI firmware can actively access PHY
regardless of driver's running state such that reporting link
status for not-running interface can generate meaningless link
UP/DOWN messages.
This change also makes dhclient think driver got a valid link
regardless of link establishment so it will bypass dhclient's
initial link status check. I think that wouldn't be an issue
though.
MFC r252143:
When RX checksum offloading is active, AX88772B will prepend a
checksum header. The header contains a received frame length but
the defined length for AX88772B is different from other ASIX
controllers. When the RX checksum is off, AX88772B controller does
not prepend a checksum header so driver has to use normal header
length mask.
This change should fix RX errors when RX checksum offloading is
off.
lstewart [Sat, 29 Jun 2013 04:27:04 +0000 (04:27 +0000)]
MFC r251887:
Add new FOREACH_FROM variants of the queue(3) FOREACH macros which can
optionally start the traversal from a previously found element by passing the
element in as "var". Passing a NULL "var" retains the same semantics as the
regular FOREACH macros.
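A small usage sketch of the TAILQ variant. The fallback macro definition
is an assumption that mirrors the semantics described above, provided only
so the example also builds where <sys/queue.h> lacks the _FROM variants;
the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

/* Fallback for systems whose <sys/queue.h> lacks the _FROM variants. */
#ifndef TAILQ_FOREACH_FROM
#define	TAILQ_FOREACH_FROM(var, head, field)				\
	for ((var) = ((var) ? (var) : TAILQ_FIRST((head)));		\
	    (var);							\
	    (var) = TAILQ_NEXT((var), field))
#endif

struct sk_item {
	int value;
	TAILQ_ENTRY(sk_item) link;
};
TAILQ_HEAD(sk_item_list, sk_item);

/*
 * Sum the values from "start" to the end of the list. Passing NULL as
 * "start" traverses the whole list, like the regular FOREACH macro.
 */
static int
sk_sum_from(struct sk_item_list *head, struct sk_item *start)
{
	struct sk_item *it = start;
	int sum = 0;

	TAILQ_FOREACH_FROM(it, head, link)
		sum += it->value;
	return (sum);
}
```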
lstewart [Fri, 28 Jun 2013 03:41:23 +0000 (03:41 +0000)]
MFC r251725:
Fix a potential NULL-pointer dereference that would trigger if the hhook
registration site did not provide storage for a copy of the hhook_head struct.
jhb [Thu, 27 Jun 2013 21:02:26 +0000 (21:02 +0000)]
MFC 233662,233677-233678,250418,252166:
If the firmware/BIOS assigns conflicting ranges to BARs then leaving the
BARs alone could result in one device stealing mmio accesses intended to go
to a second device. Previously the PCI bus driver attempted to handle this
case by clearing the BAR to 0 depending on BARs based at 0 not decoding (which
is not guaranteed to be true). Now when a conflicting BAR is detected the
following steps are taken:
1) If hw.pci.realloc_bars (a new tunable) is enabled (default is disabled),
then ignore the current BAR setting from the firmware and attempt to
allocate a fresh resource range for the BAR.
2) If 1) failed (or was disabled), disable decoding for the relevant
BAR type (e.g. disable mem decoding for a memory BAR) and emit a
warning if booting verbose.
jhb [Mon, 24 Jun 2013 18:37:52 +0000 (18:37 +0000)]
MFC 251470:
Do not compare the existing mask of a cpuset with a new mask when changing
the mask of a cpuset. Also, change the cpuset's mask before updating the
masks of all children. Previously changing a cpuset's mask first required
setting the mask to a super-set of both the old and new masks and then
changing it a second time to the new mask.
jhb [Mon, 24 Jun 2013 17:09:28 +0000 (17:09 +0000)]
MFC 250223:
Similar to 233760 and 236717, export some more useful info about the
kernel-based POSIX semaphore descriptors to userland via procstat(1) and
fstat(1):
- Change sem file descriptors to track the pathname they are associated
with and add a ksem_info() method to copy the path out to a
caller-supplied buffer.
- Use ksem_info() to export the path of a semaphore via struct kinfo_file.
- Teach fstat about semaphores and to display their path, mode, and value.
smh [Mon, 24 Jun 2013 15:35:42 +0000 (15:35 +0000)]
Added ZFS TRIM support which is enabled by default. To disable
ZFS TRIM support set vfs.zfs.trim.enabled=0 in loader.conf.
Creating new ZFS pools and adding new devices to existing pools
first performs a full device level TRIM which can take a significant
amount of time. The sysctl vfs.zfs.vdev.trim_on_init can be set to 0
to disable this behaviour.
ZFS TRIM requires the underlying device support BIO_DELETE which
is currently provided by methods such as ATA TRIM and SCSI UNMAP
via CAM, which are typically supported by SSDs.
Stats for ZFS TRIM can be monitored by looking at the sysctls
under kstat.zfs.misc.zio_trim.
MFC r240868: Add TRIM support
MFC r244155: Renamed zfs trim stats
MFC r244187: Upgrade TRIM free request sizes optimisation
MFC r244188: Added vfs.zfs.vdev.trim_on_init sysctl
MFC r248572: Add TRIM support for L2ARC
MFC r248573: Don't register repair writes in the trim map.
MFC r248574: Improve TXG handling in the TRIM module
MFC r248575: TRIM cache devices based on time instead of TXGs
MFC r248576: Names the ZFS TRIM thread
MFC r248577: Optimisation of TRIM processing
MFC r248602: Fix for building libzpool under i386
MFC r249921: Enabled ZFS TRIM by default
markj [Sat, 22 Jun 2013 05:32:45 +0000 (05:32 +0000)]
MFC r250599:
Add a remark to the effect that a manually started relearn will always
result in the battery being completely drained, even in transparent learning
mode.
markj [Sat, 22 Jun 2013 05:25:12 +0000 (05:25 +0000)]
MFC r246951:
Mark the coretemp(4) sysctls as MPSAFE, ensuring that Giant won't be held
unnecessarily by a user thread waiting to run on a specific CPU after
calling sched_bind().
markj [Sat, 22 Jun 2013 04:52:12 +0000 (04:52 +0000)]
MFC r251166:
Add macros which allow one to define SDT probes with six or seven arguments;
they are needed when porting some of the Solaris providers (ip, iscsi, and
tcp in particular).
dtrace_probe() only takes five arguments from the probe site, so we need to
add the appropriate cast to allow for more than five arguments. The extra
arguments are later copied out of dtrace_probe()'s stack frame by
dtrace_getarg() (or the provider-specific getarg method) as needed.
dteske [Fri, 21 Jun 2013 21:59:58 +0000 (21:59 +0000)]
MFS9->8 r249822:
Update error messages when processing the INDEX file to display the given
path rather than a static string. This makes the error messages consistent
with the rest of the functions which already do the same thing (assumed to
be an oversight of r47055, 13+ years ago). A direct commit to stable/9.
jhb [Fri, 21 Jun 2013 19:30:32 +0000 (19:30 +0000)]
MFC 251637:
Borrow the algorithm from kvm_getprocs() to fix procstat(1) to handle the
case where the process table grows in between the calls to fetch the size
and fetch the table.
Note that this is not a true MFC as the libprocstat library doesn't exist
in 8.x and the relevant code is in the procstat binary instead.
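The "size, then fetch, retry if it grew" approach can be sketched in
userland as below. This is written in the spirit of the description above
and is not the procstat or libkvm code: the callback type, the fake table
used to simulate growth, and the 10% slack are assumptions made for the
example.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical fetch callback. With buf == NULL it reports the size
 * currently needed in *lenp. Otherwise it fills buf and returns 0, or
 * returns ENOMEM if *lenp is too small for the current table.
 */
typedef int (*sk_fetch_fn)(void *buf, size_t *lenp, void *arg);

/* Fetch a table that may grow between the size query and the fetch. */
static void *
sk_fetch_table(sk_fetch_fn fetch, void *arg, size_t *lenp)
{
	void *buf = NULL, *nbuf;
	size_t len;
	int error;

	for (;;) {
		len = 0;
		if ((error = fetch(NULL, &len, arg)) != 0)
			break;
		len += len / 10;		/* leave some slack */
		if ((nbuf = realloc(buf, len != 0 ? len : 1)) == NULL) {
			error = ENOMEM;
			break;
		}
		buf = nbuf;
		if ((error = fetch(buf, &len, arg)) == 0) {
			*lenp = len;
			return (buf);
		}
		if (error != ENOMEM)
			break;
		/* The table outgrew the buffer: query the size again. */
	}
	free(buf);
	errno = error;
	return (NULL);
}

/* Fake table that grows right after each of the first size queries. */
struct sk_fake_table { size_t size, grow_by; int grows_left; };

static int
sk_fake_fetch(void *buf, size_t *lenp, void *arg)
{
	struct sk_fake_table *t = arg;

	if (buf == NULL) {
		*lenp = t->size;
		if (t->grows_left > 0) {
			t->size += t->grow_by;
			t->grows_left--;
		}
		return (0);
	}
	if (*lenp < t->size)
		return (ENOMEM);
	memset(buf, 0, t->size);
	*lenp = t->size;
	return (0);
}
```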
gahr [Thu, 20 Jun 2013 16:50:05 +0000 (16:50 +0000)]
MFC: r249406
- Do not bail out if stat(2) fails with ENOENT in the spool directory. This
happens if another atrm process removes a job while we're scanning through
the directory.
- While at it, optimize the directory scanning a bit, so that we quit
looping as soon as all jobs specified in argv have been dealt with.
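Tolerating a job file that vanishes mid-scan can be sketched as follows.
The function name and return convention are hypothetical and are not
taken from the atrm source.

```c
#include <assert.h>
#include <errno.h>
#include <sys/stat.h>

/*
 * Stat a spool entry. Returns 1 if it exists (st is filled in), 0 if it
 * disappeared (ENOENT, e.g. removed by a concurrent process), and -1 on
 * any other error, which the caller may still treat as fatal.
 */
static int
sk_stat_job(const char *path, struct stat *st)
{
	if (stat(path, st) == 0)
		return (1);
	if (errno == ENOENT)
		return (0);	/* removed concurrently: skip this entry */
	return (-1);
}
```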
jh [Wed, 19 Jun 2013 18:01:37 +0000 (18:01 +0000)]
MFC r251485:
Revert r238399.
The "failok" option doesn't have any effect at all unless specified in
fstab(5) and combined with the -a flag. The "failok" option is already
documented in fstab(5).
mav [Tue, 18 Jun 2013 09:48:56 +0000 (09:48 +0000)]
MFC r251616:
Don't update provider properties and don't set DISKFLAG_OPEN if d_open()
disk method call returned error. GEOM considers devices in such case as
still closed, and won't call symmetric d_close() for them.
mav [Mon, 17 Jun 2013 14:56:49 +0000 (14:56 +0000)]
MFC r251661:
Replicate r242422 from ata(4) to mvs(4):
Only four specific ATA PIO commands transfer several sectors per DRQ block
(interrupt). All other ATA PIO commands transfer one sector or 512 bytes
at one time. Hardcode these exceptions in mvs(4).
This fixes timeout of READ LOG EXT command used by `smartctl -x /dev/adaX`.
Also it fixes timeout of DOWNLOAD_MICROCODE on `camcontrol fwdownload`.
yongari [Mon, 17 Jun 2013 04:42:02 +0000 (04:42 +0000)]
MFC r251600:
Avoid unnecessary controller reinitialization by checking driver
running state. fxp(4) requires controller reinitialization for the
following cases.
o RX lockup condition on i82557
o promiscuous mode change
o multicast filter change
o WOL configuration
o TSO/VLAN hardware tagging/checksum offloading configuration
o MAC reprogramming after speed/duplex/flow-control resolution
o Any events that result in MAC reprogramming(link UP/DOWN,
remote link partner's restart of auto-negotiation etc)
o Microcode loading/unloading
Apart from above cases which come from hardware limitation, upper
stack also blindly reinitializes controller whenever an IP address
is assigned. After r194573, fxp(4) no longer needs to reinitialize
the controller to program multicast filter after upping the
interface. So keeping track of driver running state should remove
all unnecessary controller reinitializations.
This change will also address endless controller reinitialization
triggered by dhclient(8).
rmacklem [Sat, 15 Jun 2013 01:35:52 +0000 (01:35 +0000)]
MFC: r249623
Both NFS clients can deadlock when using the "rdirplus" mount
option. This can occur when an nfsiod thread that already holds
a buffer lock attempts to acquire a vnode lock on an entry in
the directory (a LOR) when another thread holding the vnode lock
is waiting on an nfsiod thread. This patch avoids the deadlock by disabling
readahead for this case, so the nfsiod threads never do readdirplus.
Since readaheads for directories need the directory offset cookie
from the previous read, they cannot normally happen in parallel.
As such, testing by jhb@ and myself didn't find any performance
degradation when this patch is applied. If there is a case where
this results in a significant performance degradation, mounting
without the "rdirplus" option can be done to re-enable readahead
for directories.
jhb [Fri, 14 Jun 2013 22:06:45 +0000 (22:06 +0000)]
MFC 250220:
Fix FIONREAD on regular files. The computed result was being ignored and
it was being passed down to VOP_IOCTL() where it promptly resulted in
ENOTTY due to a missing else for the past 8 years. While here, use a
shared vnode lock while fetching the current file's size.
Merge libzfs_core, zfs deadman thread and other ZFS bugfixes and improvements.
MFC r246619:
Correct spelling of "daemon". No .Dd bump.
Noticed by: Nathan Rich <Nathan.Rich dynastysystems com>
MFC r247187:
Import vendor change to avoid "uninitialized variable" warnings.
Illumos ZFS issues:
3522 zfs module should not allow uninitialized variables
MFC r247265:
Merge the ZFS I/O deadman thread from vendor (illumos).
This feature panics the system on hanging ZFS I/O, helps debugging
and resumes failed service.
The panic behavior can be controlled with the loader-only tunables:
vfs.zfs.deadman_enabled (enable or disable panic on stalled ZFS I/O)
vfs.zfs.deadman_synctime (expiration time for stalled ZFS I/O)
By default, the ZFS I/O deadman is enabled on amd64 and i386,
excluding virtual guest machines.
MFC r247348:
Be more verbose on ZFS deadman I/O panic
Patch suggested upstream.
MFC r247398:
Import metaslab_sync() speedup from vendor (illumos).
Illumos ZFS issues:
3552 condensing one space map burns 3 seconds of CPU in spa_sync() thread
3564 spa_sync() spends 5-10% of its time in metaslab_sync() (when not
condensing)
3578 transferring the freed map to the defer map should be constant time
3579 ztest trips assertion in metaslab_weight()
MFC r247540:
Fix the zfs_ioctl compat layer to support zfs_cmd size change introduced
in r247265 (ZFS deadman thread). New utilities now support the old
kernel, and the new kernel properly detects old utilities.
For future backwards compatibility, the vfs.zfs.version.ioctl read-only
sysctl has been introduced. With this sysctl zfs utilities will be able
to detect the ioctl interface version of the currently loaded zfs module.
MFC r247585:
Merge new read-only zfs properties from vendor (illumos)
Illumos ZFS issues:
3588 provide zfs properties for logical (uncompressed) space used and
referenced
MFC r248265:
Update zfs.8 manpage date (missing in r247585)
MFC r248267:
Import minor ZFS changes from vendor
Illumos ZFS issues:
3604 zdb should print bpobjs more verbosely (fix zdb hang)
3606 zpool status -x shouldn't warn about old on-disk format
MFC r248571:
MFV 238590, 238592:
In the first zfs ioctl restructuring phase, the libzfs_core library was
introduced. It is a new thin library that wraps around kernel ioctl's.
The idea is to provide a forward-compatible way of dealing with new
features. Arguments are passed in nvlists and not random zfs_cmd fields,
new-style ioctls are logged to pool history using a new method of
history logging.
MFV 247580 [1]:
To address issues of several deadlocks and race conditions the locking
code around dsl_dataset was rewritten and the interface to synctasks
was changed.
User-Visible Changes:
"zfs snapshot" can create multiple arbitrary snapshots at once (atomically)
"zfs destroy" destroys multiple snapshots at once
"zfs recv" has improved performance
Backward Compatibility:
I have extended the compatibility layer to support full backward
compatibility by remapping or rewriting the responsible ioctl arguments.
Old utilities are fully supported by the new kernel module.
Forward Compatibility:
New utilities work with old kernels with the following restrictions:
- creating, destroying, holding and releasing of multiple snapshots
at once is not supported, this includes recursive (-r) commands
Illumos ZFS issues:
2882 implement libzfs_core
2900 "zfs snapshot" should be able to create multiple,
arbitrary snapshots at once
3464 zfs synctask code needs restructuring
MFC r248976:
Call dmu_snapshot_list_next() in zvol.c with dsl_pool_config lock held
MFC r249004:
Do not check against uninitialized rc and comment out vendor code
MFC r249042:
Fix possible pool hold leak in dmu_send_impl()
Illumos ZFS issues:
3645 dmu_send_impl: possibilty of pool hold leak
MFC r249047 (avg):
spa_open_common: fix argument to zvol_create_minors
Prior to r248571 spa_open was always called with a bare pool name,
but now it is called with a dataset name instead (spa_lookup handles
that).
So, when a ZFS root is mounted spa_open is called with a name of a root
dataset, which can very well be different from the pool name.
But zvol_create_minors should be called with the pool name, because it
performs a recursive traversal of all datasets under the name to find
all those that are volumes.
MFC r249188:
Import vendor change to reduce diff, no effect on FreeBSD.
Illumos ZFS issues:
3517 importing pool with autoreplace=on and "hole" vdevs crashes
syseventd
MFC r249195:
Merge change from vendor to reduce diff only.
ZFS dtrace probes are not supported on FreeBSD yet.
Illumos ZFS issues:
3598 want to dtrace when errors are generated in zfs
MFC r249196:
Provide a fix for kernel panic if receiving recursive deduplicated
streams. Problem reported to vendor.
Illumos ZFS issues:
3692 Panic on zfs receive of a recursive deduplicated stream
MFC r249206:
Merge vendor change - modify time processing in deadman thread.
Illumos ZFS issues:
3618 ::zio dcmd does not show timestamp data
MFC r249207:
Allow zdb to output a histogram of compressed block sizes.
Illumos ZFS issues:
3641 want a histogram of compressed block sizes
MFC r249319:
ZFS expects a copyout of zfs_cmd_t on an ioctl error. Our sys_ioctl()
doesn't copyout in this case.
To solve this a new struct zfs_iocparm_t is introduced consisting of:
- zfs_ioctl_version (future backwards compatibility purposes)
- user space pointer to zfs_cmd_t (copyin and copyout)
- size of zfs_cmd_t (verification purposes)
The copyin and copyout of zfs_cmd_t is now done the illumos (vendor) way,
which makes porting of new changes easier and ensures correct behavior when
returning an error.
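The three-field wrapper listed above might look roughly like the sketch
below. All type names, field names, field widths, and the specific error
codes are assumptions made for illustration; the real zfs_iocparm_t
layout is not reproduced here.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the command structure (not zfs_cmd_t). */
struct sk_cmd {
	char	 name[64];
	uint64_t cookie;
};

/* Hypothetical wrapper mirroring the three fields listed above. */
struct sk_iocparm {
	uint32_t ioctl_version;	/* interface version of the caller */
	uint64_t cmd;		/* user-space address of the command */
	uint64_t cmd_size;	/* size of the command, for verification */
};

/* Validate a wrapper before copying the command in. */
static int
sk_iocparm_check(const struct sk_iocparm *p, uint32_t supported_version)
{
	if (p->ioctl_version != supported_version)
		return (ENOTSUP);
	if (p->cmd == 0)
		return (EFAULT);
	if (p->cmd_size != sizeof(struct sk_cmd))
		return (EINVAL);
	return (0);
}
```

Because the wrapper carries only a pointer, the command structure can be
copied back out to user space even when the ioctl itself returns an error.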
MFC r249326:
Cast (void *)(uintptr_t) on copyout and copyin of zfs_iocparm_t.zfs_cmd
MFC r249356:
Merge bugfixes accepted and integrated by vendor. Underlying problems
have been reported by us and fixed in r240942 and r249196.
Illumos ZFS issues:
3645 dmu_send_impl: possibilty of pool hold leak
3692 Panic on zfs receive of a recursive deduplicated stream
MFC r249357:
Fix libzfs to report an error instead of returning zero when trying to hold or
release a non-existing snapshot of an existing dataset. In the recursive case,
an error is reported if no snapshots with the requested name have been found.
Illumos ZFS issues:
3699 zfs hold or release of a non-existent snapshot does not output
error
MFC r249787:
The zfs synctask code restructuring introduced a new bug that makes it
impossible to set quota and reservation on pools lower than version 22.
Problem has been reported and a solution discussed with vendor.
Illumos ZFS issues:
3739 cannot set zfs quota or reservation on pool version < 22
MFC r249883:
Respect the enoent_ok flag when reporting an error for holding a non-existing
snapshot.
Related illumos ZFS issue:
3699 zfs hold or release of a non-existent snapshot does not output error
MFC r249858:
Merge vendor bugfix for a possible deadlock related to async destroy
and improve write performance by introducing a new lock protecting
tx_open_txg.
Illumos ZFS issues:
3642 dsl_scan_active() should not issue I/O to determine if async
destroying is active
3643 txg_delay should not hold the tc_lock
jhb [Fri, 14 Jun 2013 18:42:08 +0000 (18:42 +0000)]
MFC 249767:
- Some BIOSes use an Extended IRQ resource descriptor in _PRS for a link
that uses non-ISA IRQs but use a plain IRQ resource in _CRS. However,
a non-ISA IRQ can't fit into a plain IRQ resource. If we encounter a
link like this, build the resource buffer from _PRS instead of _CRS.
- Set the correct size of the end tag in a resource buffer.
rmacklem [Fri, 14 Jun 2013 00:33:55 +0000 (00:33 +0000)]
MFC: r250177
Fix the getpwnam_r() call in the pname_to_uid() kerberos library function so
that it handles the ERANGE error return case. Without this fix, authentication
of users for certain system setups could fail unexpectedly.
rmacklem [Fri, 14 Jun 2013 00:30:11 +0000 (00:30 +0000)]
MFC: r250176
Fix the getpwuid_r() call in the gssd daemon so that it handles
the ERANGE error return case. Without this fix, authentication
of users for certain system setups could fail unexpectedly.
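Handling ERANGE from getpwuid_r() usually means growing the scratch buffer
and retrying. The sketch below shows that pattern; the function name, its
parameters, and the doubling strategy are assumptions for illustration and
are not taken from gssd or the kerberos library.

```c
#include <assert.h>
#include <errno.h>
#include <pwd.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/*
 * Look up the login name for uid, retrying with a larger scratch buffer
 * whenever getpwuid_r() reports ERANGE. Returns 0 on success, ENOENT if
 * no entry exists, or another errno value on failure.
 */
static int
sk_lookup_user_name(uid_t uid, size_t initial_buflen, char *name,
    size_t namelen)
{
	struct passwd pwd, *result = NULL;
	size_t buflen = initial_buflen > 0 ? initial_buflen : 64;
	char *buf = NULL, *nbuf;
	int error;

	for (;;) {
		if ((nbuf = realloc(buf, buflen)) == NULL) {
			error = ENOMEM;
			break;
		}
		buf = nbuf;
		error = getpwuid_r(uid, &pwd, buf, buflen, &result);
		if (error != ERANGE)
			break;
		buflen *= 2;	/* buffer too small: grow and retry */
	}
	if (error == 0 && result == NULL)
		error = ENOENT;
	if (error == 0) {
		if (strlen(pwd.pw_name) >= namelen)
			error = ENAMETOOLONG;
		else
			strcpy(name, pwd.pw_name);
	}
	free(buf);
	return (error);
}
```

Starting with a deliberately small initial_buflen is a convenient way to
exercise the retry path.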
rmacklem [Fri, 14 Jun 2013 00:02:29 +0000 (00:02 +0000)]
MFC: r251089
Add a patch analogous to r248567, r248581, r251079 to the
old NFS client to avoid the panic reported in the PR by
doing the vnode_pager_setsize() call after unlocking the mutex.