yongari [Sun, 30 Jun 2013 05:12:18 +0000 (05:12 +0000)]
Fix triggering false watchdog timeout when controller is in PAUSE
state. Previously it used to check if controller has sent a
PAUSE frame to the remote peer.
Reported by: David Imhoff via Brad Smith <brad@OpenBSD.org>
Submitted by: davidch (initial version)
Reviewed by: davidch, David Imhoff <dimhoff_devel@xs4all.nl>
gonzo [Sat, 29 Jun 2013 23:51:17 +0000 (23:51 +0000)]
- Fix IMAPx registers values calculation
- Initialize SMAPx registers too although they're unused in QEMU
- Do not pass IO/MEM resources to upper bus for activation, handle them locally.
Previously ACTIVATE method of upper bus was no-op so nothing bad
happened. But now FDT maps physaddr to vaddr and it causes
troubles: fdtbus_activate_resource resource assumes that
bustag/bushandle are already set which in this case is wrong.
delphij [Sat, 29 Jun 2013 22:04:04 +0000 (22:04 +0000)]
- Modify swapon(8) so that it uses most of geli(8) defaults for swap,
which is presently: AES-XTS, no authentication. Create provider
with pagesize as sectorsize by default.
- Rewrite parsing code for geli(8)-backed swap options, now options
are required to be exact match, and unrecognized options will trigger
a warning.
- Don't initialize GELI device if it's already initialized. This
restores previous behavior.
- Don't duplicate file descriptor when working with geli(8) and
gbde(8) as there is no need to communicate with the utility other
than exit status.
- When calling swap_on_off_* routines, which_prog can only be SWAP_ON
or SWAP_OFF. Eliminate unneeded case branches by replacing switch
with if's.
- Plug a few memory leaks.
adrian [Sat, 29 Jun 2013 19:57:57 +0000 (19:57 +0000)]
Don't log anything if npkts == 0.
This occurs at RX DMA start, even though the RX FIFO has plenty of
space. I'll go figure out why, but this shouldn't cause people to
be spammed by these messages.
scottl [Sat, 29 Jun 2013 17:48:59 +0000 (17:48 +0000)]
Introduce accessors for the ccb status word. Convert one (of many more)
modules to use it, will convert the others once the appropriate shed
color is selected by consensus.
pfg [Sat, 29 Jun 2013 01:35:28 +0000 (01:35 +0000)]
Bring some updates from ufs_lookup to ext2fs.
r156418:
Don't set IN_CHANGE and IN_UPDATE on inodes for potentially suspended
file systems. This could cause deadlocks when creating snapshots.
(We can't do snapshots on ext2fs but it is useful to keep things in sync).
r183079:
- Only set i_offset in the parent directory's i-node during a lookup for
non-LOOKUP operations.
- Relax a VOP assertion for a DELETE lookup.
r187528:
Move the code from ufs_lookup.c used to do dotdot lookup, into
the helper function. It is supposed to be useful for any filesystem
that has to unlock dvp to walk to the ".." entry in lookup routine.
davide [Fri, 28 Jun 2013 21:00:08 +0000 (21:00 +0000)]
- Trim an unused and bogus Makefile for mount_smbfs.
- Reconnect with some minor modifications, in particular now selsocket()
internals are adapted to use sbintime units after recent'ish calloutng
switch.
davide [Fri, 28 Jun 2013 20:32:48 +0000 (20:32 +0000)]
Properly use v_data field. This magically worked (even if wrong) until
now because v_data is the first field of the structure, but it's not
something we should rely on.
trociny [Fri, 28 Jun 2013 18:07:41 +0000 (18:07 +0000)]
Rework r252313:
The filedesc lock may not be dropped unconditionally before exporting
fd to sbuf: fd might go away during execution. While it is ok for
DTYPE_VNODE and DTYPE_FIFO because the export is from a vrefed vnode
here, for other types it is unsafe.
Instead, drop the lock in export_fd_to_sb(), after preparing data in
memory and before writing to sbuf.
jhb [Fri, 28 Jun 2013 16:33:45 +0000 (16:33 +0000)]
Make a pass over this page to correct and clarify a few things as well as
some general word-smithing.
- Don't claim that adaptive mutexes have a timeout (they don't).
- Don't treat pool mutexes as a separate primitive in a few places.
- Describe sleepable read-mostly locks as a separate lock type and add
them to the various tables.
- Don't claim that sx locks are less efficient. That hasn't been true in
a few years now.
- Describe lockmanager locks next to sx locks since they are very similar
in terms of rules, etc., and so that all the lock primitives are
grouped together before the non-lock primitives.
- Similarly, move the section on Giant after the description of all the
non-lock primitives to preserve grouping.
- Condition variables work on several types of locks, not just mutexes.
- Add a bit of language to compare/contrast condition variables with
sleep/wakeup.
- Add a note about why pause(9) is unique.
- Add some language to define bounded vs unbounded sleeps and explain
why they are treated separately (bounded sleeps only need CPU time
to make forward progress).
- Don't state that using mtx_sleep() is a bad idea. It is in fact rather
necessary.
- Rework the interaction table a bit. First, it did not include really
include sleepable rmlocks and it left out lockmgr entirely. To get
things to fit, combine similar lock types into the same column / row,
and explicitly state what "sleep" means. The notes about recursion
and lock order were also a bit banal (lock order is always important,
not just in the few places annotated here), so remove them. In
particular, the lock order note would need to be on just about every
cell. If we want to document recursion I think a better approach
would be a separate table summarizing the recursion rules for each
lock as having too many notes clutters the table.
- Tweak the tables to use less indentation so everything still fits with
the added columns.
- Correct a few cells in the context mode table.
- Use mdoc markup instead of explicit markup in a few places.
rstone [Fri, 28 Jun 2013 15:55:30 +0000 (15:55 +0000)]
Correct a bug that prevented deadlkres from (almost) ever firing.
deadlkres was using a reversed test to check whether ticks had rolled over.
This meant that deadlkres could only fire after ticks had rolled over.
This test was actually unnecessary as deadlkres only ever took the
difference of ticks values which is safe even in the presence of ticks
rollover. Remove the tests entirely. Now deadlkres will properly fire
after a lock has been held after the timeout period.
grehan [Fri, 28 Jun 2013 06:05:33 +0000 (06:05 +0000)]
Make sure all CPUID values are handled, instead of exiting the
bhyve process when an unhandled one is encountered.
Hide some additional capabilities from the guest (e.g. debug store).
This fixes the issue with FreeBSD 9.1 MP guests exiting the VM on
AP spinup (where CPUID is used when sync'ing the TSCs) and the
issue with the Java build where CPUIDs are issued from a guest
userspace.
Submitted by: tycho nightingale at pluribusnetworks com
Reviewed by: neel
Reported by: many
jeff [Fri, 28 Jun 2013 03:51:20 +0000 (03:51 +0000)]
- Add a general purpose resource allocator, vmem, from NetBSD. It was
originally inspired by the Solaris vmem detailed in the proceedings
of usenix 2001. The NetBSD version was heavily refactored for bugs
and simplicity.
- Use this resource allocator to allocate the buffer and transient maps.
Buffer cache defrags are reduced by 25% when used by filesystems with
mixed block sizes. Ultimately this may permit dynamic buffer cache
sizing on low KVA machines.
markj [Fri, 28 Jun 2013 03:14:40 +0000 (03:14 +0000)]
The dtmalloc provider uses the short description of a malloc type as the
function name of its corresponding DTrace probes. These descriptions may
contain whitespace, but probe names cannot, so just replace any whitespace
with underscores when creating probes.
andrew [Thu, 27 Jun 2013 22:26:56 +0000 (22:26 +0000)]
Support reading registers r0-r3 when unwinding. There is a seperate
instruction to load these. We only hit it when unwinding past an trap frame
as in C r0-r3 would never have been saved onto the stack.
jhb [Thu, 27 Jun 2013 20:21:54 +0000 (20:21 +0000)]
Make detaching drivers from PCI devices more robust. While here, fix a
bug where a PCI device would be powered down if it failed to probe, but
not when its driver was detached (e.g. via kldunload).
- Add a new helper method resource_list_release_active() which forcefully
releases any active resources of a specified type from a resource list.
- Add a bus_child_detached method for the PCI bus driver which forces any
active resources to be released (and whines to the console if it finds
any) and then powers the device down.
- Call pci_child_detached() if we fail to probe a device when a driver
is kldloaded. This isn't perfect but can avoid leaking resources
from a probe() routine in the kldload case.
hrs [Thu, 27 Jun 2013 18:28:45 +0000 (18:28 +0000)]
- Add vnode-backed swap space specification support. This is enabled when
device names "md" or "md[0-9]*" and a "file" option are specified in
/etc/fstab like this:
md none swap sw,file=/swap.bin 0 0
- Add GBDE/GELI encrypted swap space specification support, which
rc.d/encswap supported. The /etc/fstab lines are like the following:
.eli devices accepts aalgo, ealgo, keylen, and sectorsize as options.
swapctl(8) can understand an encrypted device in the command line
like this:
# swapctl -a /dev/ada2p1.bde
- "-L" flag is added to support "late" option to defer swapon until
rc.d/mountlate runs.
- rc.d script change:
rc.d/encswap -> removed
rc.d/addswap -> just display a warning message if $swapfile is defined
rc.d/swap1 -> renamed to rc.d/swap
rc.d/swaplate -> newly added to support "late" option
These changes alleviate a race condition between device creation/removal
and swapon/swapoff.
jimharris [Thu, 27 Jun 2013 00:08:25 +0000 (00:08 +0000)]
Add firmware replacement and activation support to nvmecontrol(8) through
a new firmware command.
NVMe controllers may support up to 7 firmware slots for storing of
different firmware revisions. This new firmware command supports
firmware replacement (i.e. firmware download) with or without immediate
activation, or activation of a previously stored firmware image. It
also supports selection of the firmware slot during replacement
operations, using IDENTIFY information from the controller to
check that the specified slot is valid.
Newly activated firmware does not take effect until the new controller
reset, either via a reboot or separate 'nvmecontrol reset' command to the
same controller.
Submitted by: Joe Golio <joseph.golio@emc.com>
Obtained from: EMC / Isilon Storage Division
MFC after: 3 days
jimharris [Wed, 26 Jun 2013 23:53:54 +0000 (23:53 +0000)]
Add log page support to nvmecontrol(8) through a new logpage command.
This includes pretty printers for all of the standard NVMe log pages
(Error, SMART/Health, Firmware), as well as hex output for non-standard
or vendor-specific log pages.
Submitted by: Joe Golio <joseph.golio@emc.com>
Obtained from: EMC / Isilon Storage Division
MFC after: 3 days
jimharris [Wed, 26 Jun 2013 23:32:45 +0000 (23:32 +0000)]
Fail any passthrough command whose transfer size exceeds the controller's
max transfer size. This guards against rogue commands coming in from
userspace.
Also add KASSERTS for the virtual address and unmapped bio cases, if the
transfer size exceeds the controller's max transfer size.
jimharris [Wed, 26 Jun 2013 23:27:17 +0000 (23:27 +0000)]
Use MAXPHYS to specify the maximum I/O size for nvme(4).
Also allow admin commands to transfer up to this maximum I/O size, rather
than the artificial limit previously imposed. The larger I/O size is very
beneficial for upcoming firmware download support. This has the added
benefit of simplifying the code since both admin and I/O commands now use
the same maximum I/O size.
jimharris [Wed, 26 Jun 2013 23:20:08 +0000 (23:20 +0000)]
Create #defines for NVME_CTRLR_PREFIX and NVME_NS_PREFIX for the "nvme"
and "ns" strings, rather than hardcoding the string values throughout the
nvmecontrol code base.
jimharris [Wed, 26 Jun 2013 23:11:20 +0000 (23:11 +0000)]
Add an nvme_function structure array, defining the name, C function and
usage message for each nvmecontrol command. This helps reduce some code
clutter both now and for future commits which will add logpage and
firmware support to nvmecontrol(8).
Also move helper function prototypes to the end of the header file, after
the per-command functions.
jimharris [Wed, 26 Jun 2013 22:08:45 +0000 (22:08 +0000)]
For ATA_PASSTHROUGH commands, pretend isci(4) supports multiword DMA
by treating it as UDMA.
This fixes a problem introduced in r249933/r249939, where CAM sends
ATA_DSM_TRIM to SATA devices using ATA_PASSTHROUGH_16. scsi_ata_trim()
sets protocol as DMA (not UDMA) which is for multi-word DMA, even
though no such mode is selected for the device. isci(4) would fail
these commands which is the correct behavior but not consistent with
other HBAs, namely LSI's.
smh@ did some further testing on an LSI controller, which rejected
ATA_PASSTHROUGH_16 commands with mode=UDMA_OUT, even though only
a UDMA mode was selected on the device. So this precludes adding
any kind of mode detection in CAM to determine which mode to use on
a per-device basis.
gibbs [Wed, 26 Jun 2013 20:39:07 +0000 (20:39 +0000)]
In the Xen block front driver, take advantage of backends that
support cache flush and write barrier commands.
sys/dev/xen/blkfront/block.h:
Add per-command flag that specifies that the I/O queue must
be frozen after this command is dispatched. This is used
to implement "single-stepping".
Remove the unused per-command flag that indicates a polled
command.
Add block device instance flags to record backend features.
Add a block device instance flag to indicate the I/O queue
is frozen until all outstanding I/O completes.
Enhance the queue API to allow the number of elements in a
queue to be interrogated.
Prefer "inline" to "__inline".
sys/dev/xen/blkfront/blkfront.c:
Formalize queue freeze semantics by adding methods for both
global and command-associated queue freezing.
Provide mechanism to freeze the I/O queue until all outstanding
I/O completes. Use this to implement barrier semantics
(BIO_ORDERED) when the backend does not support
BLKIF_OP_WRITE_BARRIER commands.
Implement BIO_FLUSH as either a BLKIF_OP_FLUSH_DISKCACHE
command or a 0 byte write barrier. Currently, all publicly
available backends perform a diskcache flush when processing
barrier commands, and this frontend behavior matches what
is done in Linux.
Simplify code by using new queue length API.
Report backend features during device attach and via sysctl.