MFC r220777:
- Tune different wait loops to cut some more milliseconds from reset time.
- Do not call ahci_start() before the device signature is received. Waiting
is required by the specification, and starting early caused non-fatal reset
timeouts on AMD chipsets.
MFC r220657:
Some changes around hot-plug and interface power-management:
- use ATA_SE_EXCHANGED (SError.DIAG.X) bit to detect hot-plug events when
power management is enabled and ATA_SE_PHY_CHANGED (SError.DIAG.N) can't be
trusted;
- on controllers supporting staggered spin-up (SS) put unused channels
into Listen state instead of Off. It should still save some power, but
allow plug-in events to be detected;
- on controllers supporting cold presence detection (CPD), when power
management is enabled, use CPD events to detect hot-plug in addition to PHY
events.
MFC r220602:
Improve SATA Asynchronous Notification feature support in CAM:
- make SATA SIMs announce their capability to handle SDB with Notification bit;
- make PMP driver honor this SIM capability;
- make SATA XPT negotiate and enable this feature for ATAPI devices.
This feature allows SATA ATAPI devices that support it to inform the system
about events that may require attention. In my case this allows an
LG GH22LS50 SATA DVD-RW drive to report tray open/close events. Events are
reported to CAM in the form of AC_SCSI_AEN async. Further, they could be used
as hints for checking device status and reporting media change to upper
layers, for example, via the spoiling mechanism of GEOM.
MFC r220591:
Since siis_reset() does not wait for device readiness, but only for
controller port readiness (which should be set just after the PHY ready
signal), reduce the wait time from 10s to 1s before trying a more aggressive
reset method. This should improve system responsiveness in some failure
conditions.
MFC r220576:
Refactor hard-reset implementation in ahci(4).
Instead of spinning in a tight loop for up to 15 seconds, polling for device
readiness while it spins up, return reset completion just after PHY reports
"connect well" or 100ms connection timeout. If device was found, use callout
for checking device readiness with 100ms period up to full 31 second timeout.
This fixes a 5-10 second system freeze when a drive is hot-plugged.
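The timer-driven readiness check described above can be sketched roughly as
follows. This is only an illustration of the polling schedule (100ms period,
31 second limit, both taken from the text); struct reset_state and
reset_poll_tick() are hypothetical names and are not part of ahci(4).

```c
#include <assert.h>
#include <stdbool.h>

#define POLL_PERIOD_MS   100    /* polling period from the commit text */
#define READY_TIMEOUT_MS 31000  /* full readiness timeout from the commit text */

/* Hypothetical per-channel reset state; not the ahci(4) structure. */
struct reset_state {
	int  elapsed_ms;  /* time spent waiting for readiness so far */
	bool done;        /* reset sequence finished, one way or the other */
	bool timed_out;   /* finished because the timeout expired */
};

/*
 * One timer tick. Returns true if the caller should schedule another
 * tick, false once the device is ready or the timeout has expired.
 */
static bool
reset_poll_tick(struct reset_state *st, bool device_ready)
{
	assert(!st->done);
	if (device_ready) {
		st->done = true;
		return (false);
	}
	st->elapsed_ms += POLL_PERIOD_MS;
	if (st->elapsed_ms >= READY_TIMEOUT_MS) {
		st->done = true;
		st->timed_out = true;
		return (false);
	}
	return (true);
}
```

The point of the design is that each tick returns immediately, so nothing
spins while the drive is still spinning up.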
MFC r217877, r217883:
Hardware supported by siis(4) allows software control over activity LEDs.
Expose that functionality to led(4) OR-ing it with regular LED activity.
MFC r218596, r218605:
Disable NCQ for multiport Marvell 88SX61XX SATA controllers. Simultaneous
active I/O to several disks (copying a large file on ZFS) causes a timeout
after just a few seconds of running. Single-port 88SX6111 seems not to be
affected.
Skip reading transferred bytes count for these controllers. It works for
88SX6111, but 88SX6145 always returns zero there. Haven't tested others,
but better to be safe.
MFC r220412, r220414, r220454, r220618, r220814:
- Make the ada(4) driver control the device write cache, same as ata(4) does.
Add kern.cam.ada.write_cache sysctl/tunable to control it, like hw.ata.wc.
- Add kern.cam.ada.X.write_cache tunables/sysctls to control write caching
on a per-device basis.
- While adding support for per-device sysctls, merge from the graid branch
support for the ADA_TEST_FAILURE kernel option, which exposes a few more
sysctls that allow simulating read and write errors for testing purposes.
MFC r214989:
When requesting sense data for a SIM that does not do it automatically (such
as ATAPI or USB), request only as much data as the consumer requested.
On the way back, report how much sense data was actually received.
Remove a check in udp6_send() that prevented v4-mapped v6 addresses from
working. We store v4 and v6 addresses as a union but for v4-mapped
addresses only store the 32bits w/o the ::ffff: word. That failed the
check as for example 127.0.0.1 would be ::7f00:1 rather than ::ffff:7f00:1
and the IN6_IS_ADDR_V4MAPPED() never worked here. Given we can hardly get
here with an unbound local address or invalid inp_vflags remove the check.
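Why the check could never pass can be illustrated in userland. The sketch
below builds both forms of 127.0.0.1 and tests them with the standard
IN6_IS_ADDR_V4MAPPED() macro; stored_form(), mapped_form() and is_v4mapped()
are hypothetical helpers written for this illustration and do not exist in
the kernel sources.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>

/*
 * Build an in6_addr that carries only the 32-bit IPv4 value in its last
 * word, without the ::ffff: prefix (the stored form described above).
 */
static struct in6_addr
stored_form(const char *v4)
{
	struct in6_addr a;
	struct in_addr in4;
	int ok;

	ok = inet_pton(AF_INET, v4, &in4);
	assert(ok == 1);
	memset(&a, 0, sizeof(a));
	memcpy(&a.s6_addr[12], &in4, sizeof(in4));
	return (a);
}

/* Same address with the ::ffff: prefix, i.e. a real v4-mapped address. */
static struct in6_addr
mapped_form(const char *v4)
{
	struct in6_addr a = stored_form(v4);

	a.s6_addr[10] = 0xff;
	a.s6_addr[11] = 0xff;
	return (a);
}

static bool
is_v4mapped(const struct in6_addr *a)
{
	return (IN6_IS_ADDR_V4MAPPED(a) != 0);
}
```

With these helpers, is_v4mapped() is false for stored_form("127.0.0.1")
(::7f00:1) and true for mapped_form("127.0.0.1") (::ffff:7f00:1).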
After r219579 and r219779 unbreak v4-mapped v6 sockets for UDP
some more. Similar to what we do for TCP check for v4-mapped
addresses and then handle them or the normal v6 address case.
For either set inp_vflags before calling into the pcb connect
function so that we have an unambiguous view in case we need to
set the local address or port.
In some cases, such as udp6_connect() without an earlier bind(2) to an
address, with v4-mapped sockets allowed and a non-mapped destination
address, we can end up here with both v4 and v6 indicated:
inp_vflag = (INP_IPV4|INP_IPV6|INP_IPV6PROTO)
In that case however laddrp is NULL as the IPv6 path does not
pass in a copy currently.
MFC 220430,220431,220452,220460:
If a system call does not request a full interrupt return, use a fast
path via the sysretq instruction to return from the system call. This
resolves most of the performance regression in system call microbenchmarks
between 7 and 8 on amd64.
While here, trim an instruction (and memory access) from the doreti path
and fix a typo in a comment.
Fix IPv6 ND. After r219562, nd6_ns_input() erroneously always passed the
cached proxydl reference (sockaddr_dl initialized or not) to
nd6_na_output(), which therefore always assumed a proxy NA. Revert to
conditionally passing either &proxydl or NULL if no proxy case is desired.
Tested by: ipv6gw and ref9-i386
Tested by: Pete French (petefrench ingresso.co.uk on stable)
Reported by: Pete French (petefrench ingresso.co.uk on stable)
Reported by: bz, simon on Y! cluster
Reported by: kib
PR: kern/151908
X-Early-MFC: yes
Rework change made at r203146. Instead of reporting all wire errors as
SCSI status errors to CAM (that was wrong, as it too often turned retriable
wire errors into non-retriable REQUEST SENSE errors), do it only for STALL
errors on the control pipe of CBI devices. A STALL on the control pipe is
just one of the ways CBI devices report errors.
When specifying the -t option (send tag in front of message), this tag
should also be forwarded to the remote logging host, not only when the
logging is done locally.
When building (custom) FreeBSD images people tend to patch param.h. If this
happens just before the build is started (within the same second),
CHECK_TIME triggers, thinking param.h is in the future (see the f_Xtime,
c_Xtime logic in the find(1) sources for the details in the !F_EXACTTIME
case). Using -mtime -0s (seconds, rather than no unit) avoids this 1s race.
VNET socket push back:
try to minimize the number of places where we have to switch vnets
and narrow down the time we stay switched. Add assertions to the
socket code to catch possibly unset vnets as seen in r204147.
While this reduces the number of vnet recursions, in some places, such as
NFS, POSIX local sockets and some netgraph code, recursions are
impossible to fix.
The current expectations are documented at the beginning of
uipc_socket.c along with the other information there.
Sponsored by: The FreeBSD Foundation
Sponsored by: CK Software GmbH
Reviewed by: jhb
Tested by: zec
MFC 220156:
Clamp the initial advertised receive window when responding to a SYN/ACK
to the maximum allowed window. Growing the window too large would cause
an underflow in the calculations in tcp_output() to decide if a window
update should be sent which would prevent the persist timer from being
started if data was pending and the other end of the connection advertised
an initial window size of 0.
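The underflow can be modelled with a few lines of unsigned arithmetic. This
is a deliberately simplified sketch and not the tcp_output() code:
SKETCH_TCP_MAXWIN, clamp_initial_window() and window_update_room() are
hypothetical names, and the only assumption carried over from TCP itself is
that the largest unscaled window is 65535 and the largest scale shift is 14.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_TCP_MAXWIN 65535u  /* largest unscaled TCP window */

/* Clamp an initial receive window to the largest advertisable value. */
static uint32_t
clamp_initial_window(uint32_t rcv_wnd, unsigned rcv_scale)
{
	uint32_t maxwin;

	assert(rcv_scale <= 14);
	maxwin = SKETCH_TCP_MAXWIN << rcv_scale;
	return (rcv_wnd < maxwin ? rcv_wnd : maxwin);
}

/*
 * Simplified "how much further could the window be opened" computation.
 * The subtraction is unsigned, so it wraps around to a huge value when
 * more has already been advertised than the clamped window allows.
 */
static uint32_t
window_update_room(uint32_t recwin, unsigned rcv_scale,
    uint32_t already_advertised)
{
	return (clamp_initial_window(recwin, rcv_scale) - already_advertised);
}
```

For example, with a 100000 byte buffer and no window scaling, advertising the
unclamped 100000 makes window_update_room() wrap around, while advertising
clamp_initial_window(100000, 0) leaves it at zero.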
MFC 220126:
- Enable an extra debugging bootverbose printf when probing ISA PNP cards
listing each card as it is found on non-PC98 (PC98 already had this).
- Increase the length of the DELAY() used before timing out while reading
PNP resource data.
MFC 219865:
Add pci_find_cap() as an alias for pci_find_extcap() to ease driver
portability with 9+ where pci_find_extcap() has been renamed to
pci_find_cap().
MFC 219717,220363:
- Add more details to the 'show battery' command including more raw
capacity values, charge cycle count, temperature, and more detailed
status.
- Add the ability to manage the state of write caching when the battery
back-up is missing or dead. The current state of this field is reported
in 'mfiutil cache <volume>' and can be adjusted via
'mfiutil cache <volume> bad-bbu-write-cache <enable|disable>'. This
setting should generally be disabled to avoid data loss.
MFC r220376: Allow strerror(0) and strerror_r(0, ...).
Of course, strerror_r() may still fail with ERANGE.
Although the POSIX specification said this could fail with EINVAL and
doing this likely indicates invalid use of errno, most other
implementations permitted it, various POSIX testsuites require it to
work (matching the older sys_errlist array) and apparently some
applications depend on it.
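A typical caller that relies on this behaviour might look like the sketch
below, where format_status() is a hypothetical helper and the error code is
passed straight to strerror() even when it is 0. The exact message text for
0 differs between implementations, so the sketch does not assume one.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Format a status line for an errno-style code. A code of 0 (no error)
 * is handed to strerror() like any other value.
 */
static int
format_status(char *buf, size_t len, int error)
{
	assert(buf != NULL && len > 0);
	return (snprintf(buf, len, "status: %s", strerror(error)));
}
```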
MFC r220461:
Remove setting of PCB_FULL_IRET at the places where we are going to call
update_gdt_{f,g}sbase. The functions set the flag when td == curthread,
and sysarch is always called with curthread.
* Add the readline(3) API to libedit. The libedit versions of
{readline,history}.h are in /usr/include/edit so as to not conflict with
the GNU libreadline versions. To use the libedit readline(3) one should
add "-I/usr/include/edit" to their Makefile
(spelled "-I${DESTDIR}/${INCLUDEDIR}/edit" within the FreeBSD source tree).
* Enable its use in the BSD licensed utilities that support readline(3).
* histedit.h is moved into libedit's directory
MFC: 220152
This patch fixes the Experimental NFS client to properly deal with 32 bit or
64 bit fileid's in NFSv2 and NFSv3. Without this fix, invalid casting (and sign
extension) was creating problems for any fileid greater than 2^31.
We discovered this because we have test clusters with more than 2 billion
allocated files and 64-bit ino_t's (and friend structures).
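The class of bug described here can be reproduced in isolation. In the
sketch below, fileid_widen_buggy() and fileid_widen_fixed() are hypothetical
helpers, not functions from the NFS client; the buggy variant assumes the
usual two's complement conversion when a large unsigned value is cast to
int32_t.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Buggy widening: the 32-bit value passes through a signed type, so any
 * fileid with the top bit set is sign-extended into the upper 32 bits.
 */
static uint64_t
fileid_widen_buggy(uint32_t wire)
{
	return ((uint64_t)(int64_t)(int32_t)wire);
}

/* Correct widening: stay unsigned all the way, upper 32 bits are zero. */
static uint64_t
fileid_widen_fixed(uint32_t wire)
{
	return ((uint64_t)wire);
}
```

For a fileid of 0x80000001 the buggy variant yields 0xffffffff80000001, while
the fixed variant keeps 0x80000001.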
Handle the special ruleset 0 in devfs_ruleset_use(). An attempt to set the
current ruleset to 0 with the command "devfs ruleset 0" triggered a KASSERT
in devfs_ruleset_create().
In g_eli_read_done() and g_eli_write_done(), for a bio with
bio_children > 1, g_destroy_bio() is never called and the bio
leaks. Fix this by calling g_destroy_bio() earlier, before the check.
Submitted by: Victor Balada Diaz <victor@bsdes.net> (initial version)
Use timeout from configuration file not only when sending and receiving,
but also when establishing connection.
r220007 (pjd):
Add mapsize to the header just before sending the packet.
Before it could change later and we were sending invalid mapsize.
Some time ago I added an optimization: when nodes are connected for the
first time and there have been no writes to them yet, the initial full
synchronization is skipped. This bug prevented it from working.
r220266 (pjd):
Handle the problem described in r220264 by using GEOM GATE queue of unlimited
length. This should fix deadlocks reported by HAST users.
r220270 (pjd):
Allow disabling sends or receives on a socket using shutdown(2) by
interpreting a NULL 'data' argument passed to proto_common_send() or
proto_common_recv() as a request to do so.
r220271 (pjd):
Declare directions for sockets between primary and secondary.
In HAST we use two sockets - one for only sending the data and one for only
receiving the data.
r220272 (pjd):
When we are operating on a blocking socket and get EAGAIN on send(2) or
recv(2), this means that the request timed out. Translate the meaningless
EAGAIN to ETIMEDOUT to give the administrator a hint that the timeout in the
configuration file might need to be increased.
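The translation amounts to a small mapping applied to the error code. The
sketch below only illustrates the idea; translate_blocking_errno() is a
hypothetical name and not the function used in hastd(8).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * On a blocking socket with a send/receive timeout set, EAGAIN means the
 * timeout expired, so report ETIMEDOUT instead. Other errors, and EAGAIN
 * on non-blocking sockets, are passed through unchanged.
 */
static int
translate_blocking_errno(int error, bool blocking)
{
	assert(error >= 0);
	if (blocking && error == EAGAIN)
		return (ETIMEDOUT);
	return (error);
}
```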
r220273 (pjd):
Handle ENOBUFS on send(2) by retrying for a while and logging the problem.
r220274 (pjd):
Increase default timeout from 5 seconds to 20 seconds. 5 seconds is definitely
too short under heavy load and I was experiencing those timeouts in my recent
tests.
GEOM has an internal mechanism to deal with ENOMEM errors returned via
g_io_deliver(). In such a case it increases the 'pace' counter on each ENOMEM
and reschedules the request. The 'pace' counter is decreased for each request
going down, but as long as 'pace' is greater than zero, GEOM will handle at
most 10 requests per second. For GEOM GATE users that proxy to local GEOM
providers
(like ggatel(8) and HAST) we can end up with almost permanent slow down of GEOM
down queue. This is because once we reach GEOM GATE queue limit, we return
ENOMEM to the GEOM. This means that we have, eg. 1024 I/O requests in the GEOM
GATE queue. To make room in the queue and stop returning ENOMEM we need to
process the requests of course, but those requests are handled by userland
daemons that handle them by also reading/writing from/to local GEOM providers.
For example with HAST, a new request comes to /dev/hast/data, which is a GEOM
GATE provider. GEOM GATE passes the request to hastd(8) and hastd(8)
reads/writes from/to /dev/da0. Once we reach GEOM GATE queue limit, to free up
a slot in GEOM GATE queue, hastd(8) has to read/write from/to /dev/da0, but
this request will also be very slow, because GEOM now slows down all the
requests. We end up with full queue that we can unload at the speed of 10
requests per second. This simply looks like a deadlock.
Fix it by allowing userland daemons that work with both GEOM GATE and local
GEOM providers to specify unlimited queue size, so GEOM GATE will never return
ENOMEM to the GEOM.
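To put a number on "looks like a deadlock": with the figures quoted above
(a queue of 1024 requests drained at 10 requests per second), emptying the
queue takes well over a minute and a half. The helper below is a hypothetical
back-of-the-envelope calculation that ignores the extra throttled local I/O
each queued request needs, so real drain times would be longer.

```c
#include <assert.h>

/*
 * Seconds needed to drain 'queued' requests at 'per_second' requests per
 * second, rounded up to a whole second.
 */
static unsigned
drain_seconds(unsigned queued, unsigned per_second)
{
	assert(per_second > 0);
	return ((queued + per_second - 1) / per_second);
}
```

For example, drain_seconds(1024, 10) evaluates to 103.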
MFC 220382:
Correct 'list scan' description in the examples. The previous description
was incorrect - 'list scan' does not actually do a scan, but instead lists
the results of the background 'scan' cache.
Make `make tinderbox` work with MAKEOBJDIRPREFIX set (or in possibly other
combinations) by forcing FAILFILE into .CURDIR as we do for all other
universe output files. [1] Similarly make FAILFILE start with "_." as well.
Push a possible "unbind" in some situations from in6_pcbsetport() to its
callers. This fixes a problem where the prison call could set
inp->in6p_laddr (laddr) and a following priv_check_cred() call would then
return an error. It will also allow us to merge the IPv4 and IPv6
implementations.
Make sure the locally cached value of rt->rt_gateway stays stable,
even after dropping the reference and unlocking. Previously we
could dereference a NULL pointer (after r121765).
Simply unlocking after the block does not work either because of
lock ordering (see r121765) and in addition we would still hold
a pointer to something that might be gone by the time we access it.
Thus take a copy of the value rather than just caching the pointer.
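The difference between caching a pointer and taking a copy can be sketched
with stand-in types. struct gw_addr, struct route and snapshot_gateway() are
hypothetical and only mimic the shape of the problem; they are not the
kernel's rtentry or sockaddr structures.

```c
#include <assert.h>
#include <string.h>

struct gw_addr { unsigned char bytes[16]; };  /* stand-in for a sockaddr */
struct route   { struct gw_addr *gateway; };  /* stand-in for a route entry */

/*
 * Copy the gateway by value while the route is still locked and
 * referenced. After this returns, the caller no longer depends on
 * rt->gateway staying valid. Returns 0 on success, -1 if there is
 * no gateway to copy.
 */
static int
snapshot_gateway(const struct route *rt, struct gw_addr *out)
{
	assert(out != NULL);
	if (rt == NULL || rt->gateway == NULL)
		return (-1);
	*out = *rt->gateway;
	return (0);
}
```

A caller would take the snapshot before unlocking and use only the copy
afterwards, so a later change or removal of the route's gateway cannot
affect it.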
For now remove options FLOWTABLE from the remaining GENERIC kernel
configurations and make it opt-in for those who want it. LINT will
still build it.
While it may be a perfect win in some scenarios, it still troubles users
(see PRs) in general cases. In addition we are still allocating resources
even if disabled by sysctl and still leak arp/nd6 entries in case of
interface destruction.
Discussed with: qingli (2010-11-24, just never executed)
Discussed with: juli (OCTEON1)
PR: kern/148018, kern/155604, kern/144917, kern/146792
MFC r220249,220252:
r220249:
64bit DMA caused data corruption. Unfortunately there is no known
workaround to use 64bit DMA.
Disable 64bit DMA on Attansic L1 controller.
- Remove bsdlabel.5
- Remove bsdlabel test-script that was full of broken assumptions
- Remove dead code depending on __alpha__
- Widen fields that display partition offset/length.