The problem here is that if we ever end up in the error
path, we drop the locks protecting access to the zfsvfs_t
prior to forcibly unmounting the filesystem. Because z_os
is NULL, any thread that had already picked up the zfsvfs_t
and was sitting in ZFS_ENTER() when we dropped our locks
in zfs_resume_fs() will now acquire the lock, attempt to
use z_os, and panic.
Illumos ZFS issues:
3875 panic in zfs_root() after failed rollback
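The race above is a check-then-use problem. A thread blocked in ZFS_ENTER() wakes up holding the lock, but the state it relies on (z_os) was torn down while it waited. The sketch below shows the defensive pattern in user space: re-validate after acquiring the lock and fail with EIO instead of dereferencing NULL. It uses a pthread rwlock as a stand-in for the teardown lock. The names fs_t, fs_enter() and fs_exit() are hypothetical, and this is not claimed to be the illumos 3875 fix itself.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for zfsvfs_t: a teardown lock plus the objset. */
typedef struct fs {
	pthread_rwlock_t teardown_lock;	/* stands in for the teardown lock */
	void		*os;		/* stands in for z_os; NULL after teardown */
} fs_t;

/*
 * Hypothetical ZFS_ENTER()-like helper: take the lock as a reader, then
 * re-check the state *after* acquiring it, because a failed resume may
 * have cleared it while we were blocked.  On success the caller holds
 * the read lock and must call fs_exit().
 */
static int
fs_enter(fs_t *fs)
{
	pthread_rwlock_rdlock(&fs->teardown_lock);
	if (fs->os == NULL) {
		pthread_rwlock_unlock(&fs->teardown_lock);
		return (EIO);
	}
	return (0);
}

static void
fs_exit(fs_t *fs)
{
	pthread_rwlock_unlock(&fs->teardown_lock);
}
```

A teardown path would clear os while holding the lock as a writer. Any reader that acquires the lock afterwards then sees NULL and bails out instead of panicking.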
Check for ipmi_attached in ipmi_isa_probe as a suggested alternative to
ipmi_isa_attach. This keeps unintended but harmless noise about "ipmi1"
from appearing in the boot up sequence.
Submitted by: jbh@ (suggested by)
Sponsored by: Yahoo! Inc.
Empirical testing showed that a 3-second timeout is too short for GET_DEVICE_ID
to return on newer Dell hardware. Bump to 6-second timeouts until someone
has a better idea on how to handle this.
Allow three IOCTLs to be used on a suspended pool, restoring the state that
existed before the IOCTL code refactoring that merged change 4445fffb from illumos
at r248571.
This change allows `zpool clear` to be used again to recover a suspended pool.
It seems to be the only way the code provides to restore pool operation after
reconnecting lost disks that were required for data completeness. There
are still cases where the `zpool clear` command can get stuck due to
deadlocks inside the ZFS kernel code, but that is probably better than having
no chance to recover at all.
Add NO_RC16 quirk to make da driver avoid using READ CAPACITY(16) command
if possible. Use it for Kingston JetFlash USB sticks, which are known to
return garbage in response to that command.
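The quirk amounts to a per-device flag consulted when choosing the capacity command. The sketch below rests on stated assumptions. It assumes the driver otherwise starts with READ CAPACITY(16). The quirk bit value is made up, and first_capacity_cmd() and need_rc16_after_rc10() are hypothetical helpers. The fallback rule is standard SCSI behaviour: READ CAPACITY(10) reports 0xFFFFFFFF as the last LBA when the capacity does not fit in 32 bits, so such devices still need (16). That is presumably why the quirk only avoids (16) "if possible".

```c
#include <stdbool.h>
#include <stdint.h>

#define DA_Q_NO_RC16	0x10	/* hypothetical quirk bit value */

enum cap_cmd { CMD_READ_CAPACITY_10, CMD_READ_CAPACITY_16 };

/* Which capacity command to try first for a device with these quirks. */
static enum cap_cmd
first_capacity_cmd(uint32_t quirks)
{
	return ((quirks & DA_Q_NO_RC16) ?
	    CMD_READ_CAPACITY_10 : CMD_READ_CAPACITY_16);
}

/*
 * READ CAPACITY(10) returns 0xFFFFFFFF as the last LBA when the capacity
 * does not fit in 32 bits; only then is READ CAPACITY(16) unavoidable.
 */
static bool
need_rc16_after_rc10(uint32_t rc10_last_lba)
{
	return (rc10_last_lba == UINT32_C(0xFFFFFFFF));
}
```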
dim [Tue, 30 Jul 2013 12:33:21 +0000 (12:33 +0000)]
Pull in r186696 from upstream clang trunk:
This patch implements __get_cpuid_max() as an inline and __cpuid()
and __cpuid_count() as macros to be compatible with GCC's cpuid.h.
It also adds bit_<foo> constants for the various feature bits as
described in version 039 (May 2011) of Intel's SDM Volume 2 in the
description of the CPUID instruction. The list of bit_<foo>
constants is fairly exhaustive (GCC doesn't define nearly this many). More
bits could be added from a newer version of SDM if desired.
Patch by John Baldwin!
This should fix several ports which depend on this functionality being
available.
* Make Yarrow an optional kernel component -- enabled by "YARROW_RNG" option.
The files sha2.c, hash.c, randomdev_soft.c and yarrow.c comprise yarrow.
* random(4) device doesn't really depend on rijndael-*. Yarrow, however, does.
* Add random_adaptors.[ch], which is basically a store of random_adaptors.
random_adaptor is basically an adapter that plugs into random(4).
random_adaptor can only be plugged into random(4) very early in bootup.
Unplugging random_adaptor from random(4) is not supported, and is probably a
bad idea anyway, due to potential loss of entropy pools.
We currently have 3 random_adaptors:
+ yarrow
+ rdrand (ivy.c)
+ nehemiah
* Remove platform-dependent logic from probe.c, and move it into the
corresponding registration routines of each random_adaptor provider.
probe.c doesn't do anything other than picking a specific random_adaptor
from a list of registered ones.
* If the kernel doesn't have any random_adaptor adapters present, then the
creation of /dev/random is postponed until the next random_adaptor is kldload'ed.
* Fix randomdev_soft.c to refer to its own random_adaptor, instead of a
system-wide one.
Various fixes to the mlxen(4) driver:
- Remove an incorrect assertion that can trigger when downing an interface.
- Stop the interface during detach to avoid panics when unloading the
driver.
- A few locking fixes to be more consistent with other FreeBSD drivers:
  - Protect if_drv_flags with the driver lock, not atomic ops.
  - Hold the driver lock when adjusting multicast state.
  - Hold the driver lock while adjusting if_capenable.
PR: kern/180791 [1,2]
Submitted by: Shakar Klein @ Mellanox [1,2]
MFC after: 3 days
Partially close a race between calls of the orphan() method from GEOM and the close()
method from the ZFS core, which reliably causes a use-after-free panic if an SSD vdev
is detached during the initial erase.
Fix returning an incorrect bio_resid value with failed BIO_DELETE requests.
Neither the residual length reported for the ATA/SCSI command nor the one from
another BIO_DELETE request is in any way related to the value to be returned.
- Relax the restriction on the member interfaces with LLAs. Two or more
LLAs on the member interfaces are actually harmless when the parent
interface does not have an LLA.
- Add net.link.bridge.allow_llz_overlap. This is a knob to allow LLAs on
a bridge and the member interfaces at the same time. The default is 0.
marius [Sun, 28 Jul 2013 12:29:10 +0000 (12:29 +0000)]
- Add const-qualifiers to the arguments of isonum_*().
- According to ISO 9660 7.1.2, isonum_712() should return a signed value.
- Try to get isonum_*() closer to style(9).
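For reference, here is a portable sketch of the ISO 9660 numerical-value decoders that the isonum_*() names refer to. It uses const-qualified arguments and returns a signed result for 7.1.2. It is written from the standard's section numbers, not copied from the makefs sources. Returning the little-endian half of the both-byte-order fields (7.2.3, 7.3.3) is an assumption of this sketch.

```c
#include <stdint.h>

/* 7.1.1: 8-bit unsigned number. */
static inline unsigned int
isonum_711(const unsigned char *p)
{
	return (p[0]);
}

/* 7.1.2: 8-bit signed number (two's complement), decoded portably. */
static inline int
isonum_712(const unsigned char *p)
{
	return ((p[0] & 0x80) ? (int)p[0] - 256 : (int)p[0]);
}

/* 7.2.1: 16-bit little-endian. */
static inline uint16_t
isonum_721(const unsigned char *p)
{
	return ((uint16_t)(p[0] | (p[1] << 8)));
}

/* 7.2.2: 16-bit big-endian. */
static inline uint16_t
isonum_722(const unsigned char *p)
{
	return ((uint16_t)((p[0] << 8) | p[1]));
}

/* 7.2.3: both-byte order; the little-endian copy comes first. */
static inline uint16_t
isonum_723(const unsigned char *p)
{
	return (isonum_721(p));
}

/* 7.3.1: 32-bit little-endian. */
static inline uint32_t
isonum_731(const unsigned char *p)
{
	return ((uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	    ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24));
}

/* 7.3.2: 32-bit big-endian. */
static inline uint32_t
isonum_732(const unsigned char *p)
{
	return (((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	    ((uint32_t)p[2] << 8) | (uint32_t)p[3]);
}

/* 7.3.3: both-byte order; read the little-endian copy. */
static inline uint32_t
isonum_733(const unsigned char *p)
{
	return (isonum_731(p));
}
```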
When creation of the v_pollinfo raced and our instance of vpollinfo
must be destroyed, the knlist_clear() and seldrain() calls can be
avoided, since our vpollinfo was never used. Moreover, the knlist_clear()
calling protocol requires the knlist to be locked, which is not the case
at the call site.
Split the destruction into the helper destroy_vpollinfo_free(), and
call it when raced, instead of destroy_vpollinfo().
Reported and tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
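The fix follows a common lazy-initialization pattern. Allocate outside the lock, install under the lock, and if another thread won the race, discard the private copy with a cheaper destructor, since nothing could have attached to it. Below is a minimal pthread sketch. struct object, struct pollinfo, pollinfo_free() and object_get_pollinfo() are hypothetical stand-ins for the vnode, vpollinfo and destroy_vpollinfo_free().

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical per-object poll state (stands in for struct vpollinfo). */
struct pollinfo {
	int	nwaiters;	/* placeholder for knlist/selinfo state */
};

/* Hypothetical object with lazily created poll state (stands in for a vnode). */
struct object {
	pthread_mutex_t	 lock;
	struct pollinfo	*pollinfo;	/* NULL until first needed */
};

/*
 * Cheap destructor, only valid for an instance that was never published:
 * nobody can be waiting on it, so there is nothing to drain or clear.
 */
static void
pollinfo_free(struct pollinfo *pi)
{
	free(pi);
}

/*
 * Allocate outside the lock, then install under the lock.  If another
 * thread installed its instance first, ours was never visible to anyone,
 * so it is discarded with the cheap destructor.
 */
static struct pollinfo *
object_get_pollinfo(struct object *obj)
{
	struct pollinfo *pi, *winner;

	pi = calloc(1, sizeof(*pi));
	if (pi == NULL)
		return (NULL);

	pthread_mutex_lock(&obj->lock);
	if (obj->pollinfo == NULL) {
		obj->pollinfo = pi;	/* we won the race: publish ours */
		pi = NULL;
	}
	winner = obj->pollinfo;
	pthread_mutex_unlock(&obj->lock);

	if (pi != NULL)
		pollinfo_free(pi);	/* we lost the race */
	return (winner);
}
```

The full destructor, which drains selectors and clears the knlist under its lock, is only needed for an instance that was actually published.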
adrian [Sun, 28 Jul 2013 04:53:00 +0000 (04:53 +0000)]
Refactor the VAP transmit path code into a utility function that both
the normal and the mesh transmit paths can use.
The API is a bit horrible because it both consumes the mbuf and frees
the node reference regardless of whether it succeeds or not.
It's a hold-over from how the code behaves; it'd be nice to have it
not free the node reference / mbuf if TX fails and let the caller
decide what to do.
DTrace: re-apply r249426 now that the underlying issues have been solved.
Merge change from illumos:
3519 DTrace fails to resolve const types from fbt
3520 dtrace internal error -- token type 316 is not a valid D
compilation token
3521 clean up dtrace unit tests
DTrace: re-merge remainder of r249367 (original from Illumos).
Bring back some important fixes from Illumos:
3022 DTrace: keys should not affect the sort order when sorting by value
3023 it should be possible to dereference dynamic variables
3024 D integer narrowing needs some work
We particularly avoid the LD_NOLAZYLOAD changes that Illumos made
as those don't apply to FreeBSD and were causing problems in
interactive mode.
Synchronize the device cache on close only if there were some write operations.
While these operations are not really needed otherwise, at least for SCSI
they may cause extra errors if some other initiator holds a write exclusive
reservation on the LUN (SYNCHRONIZE CACHE is handled as a "write" operation).
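The rule is essentially a dirty flag tracked per open session. The sketch below is hypothetical. struct disk_state, send_sync_cache() and the open/write/close helpers are illustrative names, not the da(4)/ada(4) code.

```c
#include <stdbool.h>

/* Hypothetical per-device state. */
struct disk_state {
	bool	dirty;		/* set once any write was issued since open */
	int	sync_count;	/* number of SYNCHRONIZE CACHE commands sent */
};

/* Hypothetical stand-in for issuing SYNCHRONIZE CACHE to the device. */
static void
send_sync_cache(struct disk_state *d)
{
	d->sync_count++;
}

static void
disk_open(struct disk_state *d)
{
	d->dirty = false;
}

static void
disk_write(struct disk_state *d)
{
	/* ... issue the actual write here ... */
	d->dirty = true;
}

static void
disk_close(struct disk_state *d)
{
	/* Read-only sessions never send SYNCHRONIZE CACHE. */
	if (d->dirty) {
		send_sync_cache(d);
		d->dirty = false;
	}
}
```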
Use kern_ioctl() rather than ioctl() for testing the FBT provider, since the
latter doesn't exist as a kernel function in FreeBSD. All the tests under
fbtprovider pass now.
At some point after stable/7 the ACPI and ISA interfaces to the IPMI controller
no longer have the parent in the device tree. This causes the identify
function in ipmi_isa.c to attempt to probe and poke at the ISA IPMI interface.
Move the check for ipmi_attached out of the ipmi_isa_attach function and into
the ipmi_isa_identify function. Remove the check of the device tree for
ipmi devices attached.
This probing appears to make Broadcom management firmware on Dell machines
crash and emit NMI EISA warnings at various times requiring power cycles
of the machines to restore.
Bump MAX_TIMEOUT to 6 seconds as a hack for super slow IPMI interfaces that
need longer to respond to our initial probes on startup.
Tested on Dell R410, R510, R815, HP DL160G6
This is an MFC candidate for 9.2R.
Reviewed by: peter
MFC after: 2 weeks
Sponsored by: Yahoo! Inc.
marius [Sat, 27 Jul 2013 15:28:31 +0000 (15:28 +0000)]
- Set the System Identifier in the Primary Volume Descriptor to FreeBSD
rather than NetBSD.
- Correctly set the Expiration Time in the Primary Volume Descriptor;
according to ISO 9660 8.4.26.1 unspecified date and time are denoted
by the digit 0 in RBP 1 to 16 but the number 0 in RBP 17. [1]
- Merge iso9660_rrip.c rev. 1.11 from NetBSD: name_len should be read
as an unsigned byte. [2]
Note: This is according to ISO 9660 9.1.10.
- Rock Ridge TF entries should use a length of 5, because after the 4
bytes of generic SUSP header there is one byte of flags. See typedef
of ISO_RRIP_TF in iso9660_rrip.h. [1]
Submitted by: Thomas Schmitt [1]
Obtained from: NetBSD [2]
MFC after: 3 days
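The two layout rules above can be written out as a sketch. ISO_DATE_LEN, RRIP_TF_HDR_LEN and iso_set_unspecified_date() are hypothetical names. The byte layout follows ISO 9660 8.4.26.1 and the SUSP header layout described in the message.

```c
#include <string.h>

/* ISO 9660 8.4.26.1: 16 digit characters followed by one numeric byte. */
#define ISO_DATE_LEN	17

/* Rock Ridge TF: 4-byte SUSP header (signature, length, version) + 1 flags byte. */
#define RRIP_TF_HDR_LEN	5

/* Fill a PVD date field with the "date and time not specified" value. */
static void
iso_set_unspecified_date(unsigned char buf[ISO_DATE_LEN])
{
	/* RBP 1-16: the *digit* zero, i.e. the character '0' (0x30). */
	memset(buf, '0', ISO_DATE_LEN - 1);
	/* RBP 17 (offset from GMT): the *number* zero. */
	buf[ISO_DATE_LEN - 1] = 0;
}
```

Filling all 17 bytes with '0', or all with 0, would both be wrong under this rule, which is the distinction the Expiration Time fix is about.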
Introduce a 3-second timeout on the `graid stop` command (mostly with the -f flag).
Since waiting for completion happens in the g_event thread, it may cause a GEOM
deadlock if the consumer on top (for example, ZFS) uses the g_event thread for closing.
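Bounding a wait like this is the classic pthread_cond_timedwait() pattern. The sketch below is a user-space version that assumes a completion flag protected by a mutex. struct completion and its helpers are hypothetical, and the GEOM code uses kernel sleep primitives rather than pthreads.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical completion object for the stop request. */
struct completion {
	pthread_mutex_t	mtx;
	pthread_cond_t	cv;
	bool		done;	/* set under mtx by whoever finishes the stop */
};

static void
completion_signal(struct completion *c)
{
	pthread_mutex_lock(&c->mtx);
	c->done = true;
	pthread_cond_broadcast(&c->cv);
	pthread_mutex_unlock(&c->mtx);
}

/* Returns 0 if the stop completed, ETIMEDOUT if the deadline passed first. */
static int
completion_wait_timed(struct completion *c, time_t timeout_sec)
{
	struct timespec deadline;
	int error = 0;

	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += timeout_sec;

	pthread_mutex_lock(&c->mtx);
	/* Loop over spurious wakeups; stop on completion or any error. */
	while (!c->done && error == 0)
		error = pthread_cond_timedwait(&c->cv, &c->mtx, &deadline);
	error = c->done ? 0 : ETIMEDOUT;
	pthread_mutex_unlock(&c->mtx);
	return (error);
}
```

With the value from this change, a caller would use completion_wait_timed(&c, 3) and handle ETIMEDOUT instead of blocking the event thread indefinitely.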
jeff [Fri, 26 Jul 2013 23:22:05 +0000 (23:22 +0000)]
Improve page LRU quality and simplify the logic.
- Don't short-circuit aging tests for unmapped objects. This biases
against unmapped file pages and transient mappings.
- Always honor PGA_REFERENCED. We can now use this after soft busying
to lazily restart the LRU.
- Don't transition directly from active to cached bypassing the inactive
queue. This frees recently used data much too early.
- Rename actcount to act_delta to be more consistent with use and meaning.
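The sketch below is a greatly simplified model of the aging rules listed above. Every page gets an act_delta from its reference state, regardless of whether it is mapped. A page whose activity decays to zero moves to the inactive queue rather than straight to cache. The struct, the queue names and the ACT_* values are assumptions for illustration; this is not vm_pageout.c.

```c
#include <stdbool.h>

enum pq { PQ_ACTIVE, PQ_INACTIVE, PQ_CACHE };

/* Hypothetical, greatly simplified page state. */
struct page {
	enum pq	queue;
	int	act_count;	/* activity level; 0 means "cold" */
	bool	referenced;	/* stand-in for PGA_REFERENCED / pmap ref bits */
};

#define ACT_ADVANCE	3	/* assumed values, for illustration only */
#define ACT_DECLINE	1
#define ACT_MAX		64

/* One scan of a page on the active queue. */
static void
scan_active_page(struct page *m)
{
	int act_delta = 0;

	/* Same test for mapped and unmapped pages: no short-circuit. */
	if (m->referenced) {
		act_delta = ACT_ADVANCE;
		m->referenced = false;
	}
	if (act_delta != 0) {
		m->act_count += act_delta;
		if (m->act_count > ACT_MAX)
			m->act_count = ACT_MAX;
	} else {
		m->act_count -= (m->act_count < ACT_DECLINE) ?
		    m->act_count : ACT_DECLINE;
	}
	/* Cold pages go to inactive, never directly to cache. */
	if (m->act_count == 0)
		m->queue = PQ_INACTIVE;
}
```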
Add support for packet-sniffing tracers to cxgbe(4). This works with
all T4 and T5 based cards and is useful for analyzing TSO, LRO, TOE, and
for general purpose monitoring without tapping any cxgbe or cxl ifnet
directly.
Tracers on the T4/T5 chips provide access to Ethernet frames exactly as
they were received from or transmitted on the wire. On transmit, a
tracer will capture a frame after TSO segmentation, hw VLAN tag
insertion, hw L3 & L4 checksum insertion, etc. It will also capture
frames generated by the TCP offload engine (TOE traffic is normally
invisible to the kernel). On receive, a tracer will capture a frame
before hw VLAN extraction, runt filtering, other badness filtering,
before the steering/drop/L2-rewrite filters or the TOE have had a go at
it, and of course before sw LRO in the driver.
There are 4 tracers on a chip. A tracer can trace only in one direction
(tx or rx). For now cxgbetool will set up tracers to capture the first
128B of every transmitted or received frame on a given port. This is a
small subset of what the hardware can do. A pseudo ifnet with the same
name as the nexus driver (t4nex0 or t5nex0) will be created for tracing.
The data delivered to this ifnet is an additional copy made inside the
chip. Normal delivery to cxgbe<n> or cxl<n> will be made as usual.
/* watch cxl0, which is the first port hanging off t5nex0. */
# cxgbetool t5nex0 tracer 0 tx0 (watch what cxl0 is transmitting)
# cxgbetool t5nex0 tracer 1 rx0 (watch what cxl0 is receiving)
# cxgbetool t5nex0 tracer list
# tcpdump -i t5nex0 <== all that cxl0 sees and puts on the wire
If you were doing TSO, a tcpdump on cxl0 may have shown you ~64K
"frames" with no L3/L4 checksum but this will show you the frames that
were actually transmitted.
adrian [Fri, 26 Jul 2013 19:41:13 +0000 (19:41 +0000)]
Break out the static, global LACP debug options into a per-lagg unit
sysctl tree.
* Create a net.link.lagg.X.lacp node
* Add a debug node under that for tx_test and rx_test
* Add lacp_strict_mode, defaulting to 1
tx_test and rx_test are still a bitmap of unit numbers for now.
At some point it would be nice to create child nodes of the lagg bundle
for each sub-interface, and then populate those with various knobs
and statistics.
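Because tx_test and rx_test are "a bitmap of unit numbers", bit N selects unit N. The sketch below is a hypothetical version of the membership test. struct lacp_debug and lacp_unit_selected() are illustrative names, and the 32-bit width is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-lagg debug knobs; bit N refers to unit N. */
struct lacp_debug {
	uint32_t tx_test;
	uint32_t rx_test;
};

static bool
lacp_unit_selected(uint32_t bitmap, unsigned int unit)
{
	/* Units beyond the bitmap width can never be selected. */
	if (unit >= 32)
		return (false);
	return ((bitmap & (UINT32_C(1) << unit)) != 0);
}
```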
jeff [Fri, 26 Jul 2013 19:06:14 +0000 (19:06 +0000)]
- Use kmem_malloc() rather than kmem_alloc() for GDT/LDT/tss allocations etc.
This eliminates some unusual uses of that API in favor of more typical
uses of kmem_malloc().
make path matching in devfs rules consistent and sane (and safer)
Before this change path matching had the following features:
- for device nodes the patterns were matched against full path
- in the above case '/' in a path could be matched by a wildcard
- for directories and links only the last component was matched
So, for example, a pattern like 're*' could match the following entries:
- re0 device
- responder/u0 device
- zvol/recpool directory
Although it was possible to work around this behavior (once it was spotted
and understood), it was very confusing and contrary to documentation.
Now we always match a full path for all types of devfs entries (devices,
directories, links) and a '/' has to be matched explicitly.
This behavior follows the shell globbing rules.
This change is originally developed by Jaakko Heinonen.
Many thanks!
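The new semantics match POSIX fnmatch(3) with FNM_PATHNAME, where '/' must be matched by a literal '/'. The sketch below uses that call as a stand-in for the devfs matcher and runs it against the entries from the example above. rule_matches() is a hypothetical helper.

```c
#include <fnmatch.h>
#include <stdbool.h>

/*
 * Match a rule pattern against an entry's full path (relative to the
 * devfs mount point).  With FNM_PATHNAME, wildcards never match '/'.
 */
static bool
rule_matches(const char *pattern, const char *path)
{
	return (fnmatch(pattern, path, FNM_PATHNAME) == 0);
}
```

With this helper, 're*' matches "re0" but neither "responder/u0" nor "zvol/recpool", while "zvol/re*" does match "zvol/recpool". Dropping FNM_PATHNAME reproduces the old device-node behaviour: fnmatch("re*", "responder/u0", 0) returns 0, i.e. a match.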
marius [Fri, 26 Jul 2013 14:23:25 +0000 (14:23 +0000)]
- Once we have shifted arguments thrice, base-bits-dir is $1 rather than $4.
Introduce $BASEBITSDIR for clarity and in order to avoid repeating this
mistake in the future. Fixing this ensures that we pick up the newly built
boot code and loader native to the target, which is especially relevant
when cross-building release images.
- It is pointless to specify an endianness for ISO 9660 images, so strip that.
marius [Fri, 26 Jul 2013 14:22:03 +0000 (14:22 +0000)]
Ensure that makefs.h is included when using ufs_bswap.h so the FFS_EI macro
is picked up when defined. Previously, ffs_subr.c was always built without
support for opposite endianness as it doesn't include makefs.h on its own.
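The missing include mattered because swap helpers defined before FFS_EI is visible compile to the identity. The sketch below shows that conditional-compilation pattern. sketch_bswap32() and sketch_rw32() are hypothetical names modeled on the idea behind ufs_bswap.h, and the #define FFS_EI line stands in for makefs.h.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for makefs.h, which may define FFS_EI; it must come first. */
#define FFS_EI 1

static inline uint32_t
sketch_bswap32(uint32_t x)
{
	return ((x >> 24) | ((x >> 8) & 0x0000ff00u) |
	    ((x << 8) & 0x00ff0000u) | (x << 24));
}

/*
 * Hypothetical read/write helper: it only swaps if FFS_EI was defined
 * *before* this point.  Otherwise it silently degrades to the identity,
 * and opposite-endian images are never produced.
 */
#ifdef FFS_EI
#define sketch_rw32(x, needswap) \
	((needswap) ? sketch_bswap32(x) : (uint32_t)(x))
#else
#define sketch_rw32(x, needswap) ((void)(needswap), (uint32_t)(x))
#endif
```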
Assume that all Apple products using interface class 255, subclass 253
and protocol 1 are USB ethernet adapters. This avoids keeping and updating
the product list every now and then. This patch adds support for the
USB Ethernet interface found in the iPad.
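Matching on the interface triple instead of a product table might look like the sketch below. The struct and apple_eth_match() are hypothetical. 0x05ac is Apple's USB vendor ID, and 255/253/1 are the class, subclass and protocol from the message.

```c
#include <stdbool.h>
#include <stdint.h>

#define USB_VENDOR_APPLE	0x05ac	/* Apple's USB vendor ID */

/* Hypothetical subset of the descriptor fields relevant to matching. */
struct usb_iface_info {
	uint16_t idVendor;
	uint8_t	 bInterfaceClass;
	uint8_t	 bInterfaceSubClass;
	uint8_t	 bInterfaceProtocol;
};

/* Any Apple vendor-specific interface 255/253/1 is treated as Ethernet. */
static bool
apple_eth_match(const struct usb_iface_info *info)
{
	return (info->idVendor == USB_VENDOR_APPLE &&
	    info->bInterfaceClass == 255 &&
	    info->bInterfaceSubClass == 253 &&
	    info->bInterfaceProtocol == 1);
}
```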
GCC can generate bogus dwarf attributes with DW_AT_byte_size
set to 0xFFFFFFFF.
The issue was originally detected in NetBSD but it has been
adapted for portability and to avoid compiler warnings.
Enhance the description of NOTE_TRACK:
- NOTE_TRACK has never triggered a NOTE_TRACK event from the parent pid.
If NOTE_FORK is set, the listener will get a NOTE_FORK event from
the parent pid, but not a separate NOTE_TRACK event.
- Explicitly note that the event added to monitor the child process
preserves the fflags from the original event.
- Move the description of NOTE_TRACKERR under NOTE_TRACK as it is not a
bit for the user to set (which is what this list purports to be).
Also, explicitly note that if an error occurs, the NOTE_CHILD event
will not be generated.