jimharris [Tue, 26 Mar 2013 21:00:18 +0000 (21:00 +0000)]
Create struct nvme_status.
NVMe error log entries include status, so breaking this out into
its own data structure allows it to be included in both the
nvme_completion data structure as well as error log entry data
structures.
While here, expose nvme_completion_is_error(), and change all of
the places that were explicitly looking at sc/sct bits to use this
macro instead.
jimharris [Tue, 26 Mar 2013 20:32:57 +0000 (20:32 +0000)]
By default, always escalate to controller reset when an I/O times out.
While aborts are typically cleaner than a full controller reset, many times
an I/O timeout indicates other controller-level issues where aborts may not
work. NVMe drivers for other operating systems are also defaulting to
controller reset rather than aborts for timed out I/O.
emaste [Tue, 26 Mar 2013 20:32:46 +0000 (20:32 +0000)]
Always define and use PROGNAME
This avoids having separate cases in the install rule for PROGNAME set and
not set. This is a minor cleanup in advance of further support for
standalone debug files.
adrian [Tue, 26 Mar 2013 20:04:45 +0000 (20:04 +0000)]
Implement the replacement EDMA FIFO code.
(Yes, the previous code temporarily broke EDMA TX. I'm sorry; I should've
actually setup ATH_BUF_FIFOEND on frames so txq->axq_fifo_depth was
cleared!)
This code implements a whole bunch of sorely needed EDMA TX improvements
along with CABQ TX support.
The specifics:
* When filling/refilling the FIFO, use the new TXQ staging queue
for FIFO frames
* Tag frames with ATH_BUF_FIFOPTR and ATH_BUF_FIFOEND correctly.
For now the non-CABQ transmit path pushes one frame into the TXQ
staging queue without setting up the intermediary link pointers
to chain them together, so draining frames from the txq staging
queue to the FIFO queue occurs AMPDU / MPDU at a time.
* In the CABQ case, manually tag the list with ATH_BUF_FIFOPTR and
ATH_BUF_FIFOEND so a chain of frames is pushed into the FIFO
at once.
* Now that frames are in a FIFO pending queue, we can top up the
FIFO after completing a single frame. This means we can keep
it filled rather than waiting for it drain and _then_ adding
more frames.
* The EDMA restart routine now walks the FIFO queue in the TXQ
rather than the pending queue and re-initialises the FIFO with
that.
* When restarting EDMA, we may have partially completed sending
a list. So stamp the first frame that we see in a list with
ATH_BUF_FIFOPTR and push _that_ into the hardware.
* When completing frames, only check those on the FIFO queue.
We should never ever queue frames from the pending queue
direct to the hardware, so there's no point in checking.
* Until I figure out what's going on, make sure if the TXSTATUS
for an empty queue pops up, complain loudly and continue.
This will stop the panics that people are seeing. I'll add
some code later which will assist in ensuring I'm populating
each descriptor with the correct queue ID.
* When considering whether to queue frames to the hardware queue
directly or software queue frames, make sure the depth of
the FIFO is taken into account now.
* When completing frames, tag them with ATH_BUF_BUSY if they're
not the final frame in a FIFO list. The same holding descriptor
behaviour is required when handling descriptors linked together
with a link pointer as the hardware will re-read the previous
descriptor to refresh the link pointer before contiuning.
* .. and if we complete the FIFO list (ie, the buffer has
ATH_BUF_FIFOEND set), then we don't need the holding buffer
any longer. Thus, free it.
Tested:
* AR9380/AR9580, STA and hostap
* AR9280, STA/hostap
TODO:
* I don't yet trust that the EDMA restart routine is totally correct
in all circumstances. I'll continue to thrash this out under heavy
multiple-TXQ traffic load and fix whatever pops up.
jimharris [Tue, 26 Mar 2013 19:58:17 +0000 (19:58 +0000)]
Add handling for controller fatal status (csts.cfs).
On any I/O timeout, check for csts.cfs==1. If set, the controller
is reporting fatal status and we reset the controller immediately,
rather than trying to abort the timed out command.
This changeset also includes deferring the controller start portion
of the reset to a separate task. This ensures we are always performing
a controller start operation from a consistent context.
jimharris [Tue, 26 Mar 2013 19:50:46 +0000 (19:50 +0000)]
Add controller reset capability to nvme(4) and ability to explicitly
invoke it from nvmecontrol(8).
Controller reset will be performed in cases where I/O are repeatedly
timing out, the controller reports an unrecoverable condition, or
when explicitly requested via IOCTL or an nvme consumer. Since the
controller may be in such a state where it cannot even process queue
deletion requests, we will perform a controller reset without trying
to clean up anything on the controller first.
adrian [Tue, 26 Mar 2013 19:46:51 +0000 (19:46 +0000)]
Add per-TXQ EDMA FIFO staging queue support.
Each set of frames pushed into a FIFO is represented by a list of
ath_bufs - the first ath_buf in the FIFO list is marked with
ATH_BUF_FIFOPTR; the last ath_buf in the FIFO list is marked with
ATH_BUF_FIFOEND.
Multiple lists of frames are just glued together in the TAILQ as per
normal - except that at the end of a FIFO list, the descriptor link
pointer will be NULL and it'll be tagged with ATH_BUF_FIFOEND.
For non-EDMA chipsets this is a no-op - the ath_txq frame list (axq_q)
stays the same and is treated the same.
For EDMA chipsets the frames are pushed into axq_q and then when
the FIFO is to be (re) filled, frames will be moved onto the FIFO
queue and then pushed into the FIFO.
So:
* Add a new queue in each hardware TXQ (ath_txq) for staging FIFO frame
lists. It's a TAILQ (like the normal hardware frame queue) rather than
the ath9k list-of-lists to represent FIFO entries.
* Add new ath_buf flags - ATH_TX_FIFOPTR and ATH_TX_FIFOEND.
* When allocating ath_buf entries, clear out the flag value before
returning it or it'll end up having stale flags.
* When cloning ath_buf entries, only clone ATH_BUF_MGMT. Don't clone
the FIFO related flags.
* Extend ath_tx_draintxq() to first drain the FIFO staging queue, _then_
drain the normal hardware queue.
Tested:
* AR9280, hostap
* AR9280, STA
* AR9380/AR9580 - hostap
markj [Tue, 26 Mar 2013 19:43:18 +0000 (19:43 +0000)]
Invert the meaning of -S (added in r247405) and document its meaning. Also,
don't carp about the watchdog command taking too long until after the
watchdog has been patted, and don't carp via warnx(3) unless -S is set
since syslog(3) already logs to standard error otherwise.
Discussed with: alfred
Reviewed by: alfred
Approved by: emaste (co-mentor)
markj [Tue, 26 Mar 2013 18:46:40 +0000 (18:46 +0000)]
Make sure to set OBJS consistently in the cases where SRCS is and isn't
already defined. Setting it with "+=" makes it possible for other make
scripts (e.g. bsd.dtrace.mk) to include additional object files in the
linker arguments.
jimharris [Tue, 26 Mar 2013 18:37:36 +0000 (18:37 +0000)]
Enable asynchronous event requests on non-Chatham devices.
Also add logic to clean up all outstanding asynchronous event requests
when resetting or shutting down the controller, since these requests
will not be explicitly completed by the controller itself.
jimharris [Tue, 26 Mar 2013 18:27:22 +0000 (18:27 +0000)]
Break out the code for completing an nvme_tracker object into a separate
function.
This allows for completions outside the normal completion path, for example
when an ABORT command fails due to the controller reporting the targeted
command does not exist. This is mainly for protection against a faulty
controller, but we need to clean up our internal request nonetheless.
alc [Tue, 26 Mar 2013 17:30:40 +0000 (17:30 +0000)]
Introduce vm_radix_isleaf() and use it in a couple places. As compared to
using vm_radix_node_page() == NULL, the compiler is able to generate one
less conditional branch when vm_radix_isleaf() is used. More use cases
involving the inner loops of vm_radix_insert(), vm_radix_lookup{,_ge,_le}(),
and vm_radix_remove() will follow.
mav [Tue, 26 Mar 2013 07:55:24 +0000 (07:55 +0000)]
geom_slice.c and its consumers like GEOM_LABEL are not touching the data
unless hotspots are used. Pass G_PF_ACCEPT_UNMAPPED flag through except
such rare cases (obsolete GEOM_SUNLABEL and GEOM_BSD).
adrian [Tue, 26 Mar 2013 04:56:54 +0000 (04:56 +0000)]
Remove the mcast path calls to ath_hal_gettxdesclinkptr() for axq_link -
they're no longer needed for the legacy path and they're not wanted
for the EDMA path.
* This code only really gets called when burst beacons are used;
it glues multiple CABQ queues together when sending to the hardware.
* More thorough bursted beacon testing! (first requires some work with
the beacon queue code for bursted beacons, as that currently uses the
link pointer and will fail on EDMA chips.)
kan [Tue, 26 Mar 2013 01:17:06 +0000 (01:17 +0000)]
Do not pass unmapped buffers to drivers that cannot handle them
In physio, check if device can handle unmapped IO and pass an
appropriately mapped buffer to the driver strategy routine. The
only driver in the tree that can handle unmapped buffers is one
exposed by GEOM, so mark it as such with the new flag in the
driver cdevsw structure.
This fixes insta-panics on hosts, running dconschat, as /dev/fwmem
is an example of the driver that makes use of physio routine, but
bypasses the g_down thread, where the buffer gets mapped normally.
pfg [Mon, 25 Mar 2013 20:38:09 +0000 (20:38 +0000)]
Dtrace: Add SUN MDB-like type-aware print() action.
Merge change from illumos:
1694 Add type-aware print() action
This is a very nice feature implemented in upstream Dtrace.
A complete description is available here:
http://dtrace.org/blogs/eschrock/2011/10/26/your-mdb-fell-into-my-dtrace/
This change bumps the DT_VERS_* number to 1.9.0 in
accordance to what is done in illumos.
While here also include some minor cleanups to ease further merging
and appease clang with a fix by Fabian Keil.
trociny [Mon, 25 Mar 2013 19:12:36 +0000 (19:12 +0000)]
hrStorageSize and hrStorageUsed are 32 bit integers, reporting a fs
size and usage in hrStorageAllocationUnits. If the file system has
more than 2^31 allocations it can not be shown correctly and the
meters are useless.
In such cases follow net-snmp behaviour and increase
hrStorageAllocationUnits so the values fit under INT_MAX.
melifaro [Mon, 25 Mar 2013 14:30:34 +0000 (14:30 +0000)]
Unlock IPMI sc while performing requests via KCS and SMIC interfaces.
It is already done in SSIF interface code.
This reduces contention/spinning reported by many users.
PR: kern/172166
Submitted by: Eric van Gyzen <eric at vangyzen.net>
MFC after: 2 weeks
mav [Mon, 25 Mar 2013 13:58:17 +0000 (13:58 +0000)]
Read Asynchronous Notification statuses only if Port Multiplier or ATAPI
device are connected. ATA disks are not using ANs, while the extra register
read operation is quite expensive.
davide [Mon, 25 Mar 2013 09:43:50 +0000 (09:43 +0000)]
Cache the callout precision argument as part of the informations required
for migrating callouts to new CPU. This value is passed to
callout_cc_add() in order to update properly precision field in case of
rescheduling/migration.
mav [Mon, 25 Mar 2013 08:50:51 +0000 (08:50 +0000)]
Depending on combination of running commands (NCQ/non-NCQ) try to avoid
extra read from PxCI/PxSACT registers. If only NCQ commands are running, we
don't really need PxCI. If only non-NCQ commands are running we don't need
PxSACT. Mixed set may happen only on controllers with FIS-based switching
when port multiplier is attached, and then we have to read both registers.
mav [Mon, 25 Mar 2013 05:45:24 +0000 (05:45 +0000)]
In GEOM DISK:
- Replace single done mutex with per-disk ones. On system with several
disks on several HBAs that removes small, but measurable lock congestion.
- Modify disk destruction process to not destroy the mutex prematurely.
- Remove some extra pointer derefences.
jilles [Sun, 24 Mar 2013 22:48:45 +0000 (22:48 +0000)]
sh(1): Mention possible ambiguities with $(( and ((.
In some other shells, things like $((a);(b)) are command substitutions.
Also, there are shells that have an extension ((ARITH)) that evaluates an
arithmetic expression and returns status 1 if the result is zero, 0
otherwise. This extension may lead to ambiguity with two subshells starting
in sequence.
mckusick [Sun, 24 Mar 2013 22:37:10 +0000 (22:37 +0000)]
Note that output is in seconds, not msec.
KNF indentation.
No functional change.
No change to printf strings.
No change to casting of printf arguments.
alc [Sun, 24 Mar 2013 16:43:07 +0000 (16:43 +0000)]
Micro-optimize the control flow in a few places. Eliminate a panic call
that could never be reached in vm_radix_insert(). (If the pointer being
checked by the panic call were ever NULL, the immmediately preceding loop
would have already crashed on a NULL pointer dereference.)
mav [Sun, 24 Mar 2013 10:14:25 +0000 (10:14 +0000)]
Fix long known deadlock between geom dev destruction and d_close() call.
Use destroy_dev_sched_cb() to not wait for device destruction while holding
GEOM topology lock (that actually caused deadlock). Use request counting
protected by mutex to properly wait for outstanding requests completion in
cases of device closing and geom destruction. Unlike r227009, this code
does not block taskqueue thread for indefinite time, waiting for completion.
mav [Sun, 24 Mar 2013 03:15:20 +0000 (03:15 +0000)]
Make g_wither_washer() to not loop by itself, but only when there was some
more topology change done that may require its attention. Add few missing
g_do_wither() calls in respective places to signal it.
This fixes potential infinite loop here when some provider is withered, but
still opened or connected for some reason and so can not be destroyed. For
example, see r227009 and r227510.
dim [Sun, 24 Mar 2013 01:35:37 +0000 (01:35 +0000)]
Compile contrib/tzcode/stdtime/localtime.c with -fwrapv, since it relies
on signed integer overflow wrapping. Otherwise mktime(3) and timegm(3)
can hang, in case the timestamp passed in struct tm is not representable
in a time_t. Specifically, any timestamp after 2038-01-19 03:14:07, in
combination with a 32-bit time_t.
Note that it would be better to change the code to not rely on undefined
behaviour, but it is contributed code, and it is not entirely trivial to
fix the issue properly.
adrian [Sun, 24 Mar 2013 00:03:12 +0000 (00:03 +0000)]
Overhaul the TXQ locking (again!) as part of some beacon/cabq timing
related issues.
Moving the TX locking under one lock made things easier to progress on
but it had one important side-effect - it increased the latency when
handling CABQ setup when sending beacons.
This commit introduces a bunch of new changes and a few unrelated changs
that are just easier to lump in here.
The aim is to have the CABQ locking separate from other locking.
The CABQ transmit path in the beacon process thus doesn't have to grab
the general TX lock, reducing lock contention/latency and making it
more likely that we'll make the beacon TX timing.
The second half of this commit is the CABQ related setup changes needed
for sane looking EDMA CABQ support. Right now the EDMA TX code naively
assumes that only one frame (MPDU or A-MPDU) is being pushed into each
FIFO slot. For the CABQ this isn't true - a whole list of frames is
being pushed in - and thus CABQ handling breaks very quickly.
The aim here is to setup the CABQ list and then push _that list_ to
the hardware for transmission. I can then extend the EDMA TX code
to stamp that list as being "one" FIFO entry (likely by tagging the
last buffer in that list as "FIFO END") so the EDMA TX completion code
correctly tracks things.
Major:
* Migrate the per-TXQ add/removal locking back to per-TXQ, rather than
a single lock.
* Leave the software queue side of things under the ATH_TX_LOCK lock,
(continuing) to serialise things as they are.
* Add a new function which is called whenever there's a beacon miss,
to print out some debugging. This is primarily designed to help
me figure out if the beacon miss events are due to a noisy environment,
issues with the PHY/MAC, or other.
* Move the CABQ setup/enable to occur _after_ all the VAPs have been
looked at. This means that for multiple VAPS in bursted mode, the
CABQ gets primed once all VAPs are checked, rather than being primed
on the first VAP and then having frames appended after this.
Minor:
* Add a (disabled) twiddle to let me enable/disable cabq traffic.
It's primarily there to let me easily debug what's going on with beacon
and CABQ setup/traffic; there's some DMA engine hangs which I'm finally
trying to trace down.
* Clear bf_next when flushing frames; it should quieten some warnings
that show up when a node goes away.
Tested:
* AR9280, STA/hostap, up to 4 vaps (staggered)
* AR5416, STA/hostap, up to 4 vaps (staggered)
TODO:
* (Lots) more AR9380 and later testing, as I may have missed something here.
* Leverage this to fix CABQ hanling for AR9380 and later chips.
* Force bursted beaconing on the chips that default to staggered beacons and
ensure the CABQ stuff is all sane (eg, the MORE bits that aren't being
correctly set when chaining descriptors.)
adrian [Sat, 23 Mar 2013 23:51:11 +0000 (23:51 +0000)]
CABQ calculation changes to try and fix some weird corner cases leading
to stuck beacons.
* Set the cabq readytime (ie, how long to burst for) to 50% of the total
beacon interval time
* fix the cabq adjustment calculation based on how the beacon offset is
calculated (the SWBA/DBA time offset.)
This is all still a bit magic voodoo but it does seem to have further
quietened issues with missed/stuck beacons under my local testing.
In any case, it better matches what the reference HAL implements.
kib [Sat, 23 Mar 2013 22:23:15 +0000 (22:23 +0000)]
Do not call malloc(M_WAITOK) while bodev->fence_lock mutex is
held. The ttm_buffer_object_transfer() does not need the mutex locked
at all, except for the call to the driver sync_obj_ref() method.
Reported and tested by: dumbbell
MFC after: 2 weeks
mm [Sat, 23 Mar 2013 21:34:10 +0000 (21:34 +0000)]
Merge bugfix from vendor master branch:
Limit write requests to at most INT_MAX.
This prevents a certain common programming error (passing -1 to write)
from leading to other problems deeper in the library.
ian [Sat, 23 Mar 2013 17:17:06 +0000 (17:17 +0000)]
Don't check and warn about pmap mismatch on every call to busdma sync.
With some recent busdma refactoring, sometimes it happens that a sync
op gets called when bus_dmamap_load() never got called, which results
in a spurious warning about a map mismatch when no sync operations will
actually happen anyway. Now the check is done only if a sync operation
is actually performed, and the result of the check is a panic, not just
a printf.
Reviewed by: cognet (who prevented me from donning a point hat)
will [Sat, 23 Mar 2013 16:34:56 +0000 (16:34 +0000)]
ZFS: Fix a panic while unmounting a busy filesystem.
This particular scenario was easily reproduced using a NFS export. When the
first 'zfs unmount' occurred, it returned EBUSY via this path, while
vflush() had flushed references on the filesystem's root vnode, which in
turn caused its v_interlock to be destroyed. The next time 'zfs unmount'
was called, vflush() tried to obtain this lock, which caused this panic.
Since vflush() on FreeBSD is a definitive call, there is no need to check
vfsp->vfs_count after it completes. Simply #ifdef sun this check.
will [Sat, 23 Mar 2013 15:11:53 +0000 (15:11 +0000)]
Extend taskqueue(9) to enable per-taskqueue callbacks.
The scope of these callbacks is primarily to support actions that affect the
taskqueue's thread environments. They are entirely optional, and
consequently are introduced as a new API: taskqueue_set_callback().
This interface allows the caller to specify that a taskqueue requires a
callback and optional context pointer for a given callback type.
The callback types included in this commit can be used to register a
constructor and destructor for thread-local storage using osd(9). This
allows a particular taskqueue to define that its threads require a specific
type of TLS, without the need for a specially-orchestrated task-based
mechanism for startup and shutdown in order to accomplish it.
Two callback types are supported at this point:
- TASKQUEUE_CALLBACK_TYPE_INIT, called by every thread when it starts, prior
to processing any tasks.
- TASKQUEUE_CALLBACK_TYPE_SHUTDOWN, called by every thread when it exits,
after it has processed its last task but before the taskqueue is
reclaimed.
While I'm here:
- Add two new macros, TQ_ASSERT_LOCKED and TQ_ASSERT_UNLOCKED, and use them
in appropriate locations.
- Fix taskqueue.9 to mention taskqueue_start_threads(), which is a required
interface for all consumers of taskqueue(9).
mav [Sat, 23 Mar 2013 13:11:54 +0000 (13:11 +0000)]
Make `systat -vmstat` to use suffixes to display big floating point numbers
that are not fitting into the specified field width, same as done for ints.
In particular that allows to properly display disk tps above 100k, that are
reachable with modern SSDs.
avg [Sat, 23 Mar 2013 08:48:44 +0000 (08:48 +0000)]
fbt_typoff_init: fix an off by one in determining required memory size
This issue would be silent most of the time, but if the requested memory
is a multiple of a page size, then accessing one element beyond the end
would lead to a kernel page fault.
Otherwise, the unlucky last type would just be inaccessible.
Reported by: glebius
Tested by: glebius
MFC after: 6 days