sbruno [Fri, 18 May 2012 19:48:33 +0000 (19:48 +0000)]
MFC of head thunderbolt support for mfi(4)
r233711 -- IFV head_mfi into head for initial thunderbolt support
r233768 -- atomic_t --> mfi_atomic
r233805 -- fix tinderbuild, move megasas_sge to mfivar.h
r233877 -- remove atomic.h from includes
r235014 -- fix reading of sector >= 2^32 or 2^21, repair RAID handling
r235016 -- style(9)
r235040 -- fix returns from mfi_tbolt_sync_map_info()
r235318 -- repair panic on PAE i386
r235321 -- repair the repair of panics on PAE i386
jhb [Fri, 18 May 2012 18:53:28 +0000 (18:53 +0000)]
MFC 234186
If a linker file contains at least one module, but all of the modules
fail to load (the MOD_LOAD event fails) during a kldload(2), unload the
linker file and fail the kldload(2) with ENOEXEC.
sbruno [Fri, 18 May 2012 18:29:44 +0000 (18:29 +0000)]
MFC r235210
Modify the binding of queues to attach to as many CPUs
as possible when using more than one igb(4) adapter. This
means that queues will not be bound to the same CPUs if
there are more CPUs availble.
This is only applicable to a system that has multiple interfaces.
jhb [Thu, 17 May 2012 15:45:00 +0000 (15:45 +0000)]
MFC 233708,234154:
Fix a few issues with transmit handling in em(4) and igb(4):
- Do not define the foo_start() methods or set if_start in the ifnet if
multiq transmit is enabled. Also, set if_transmit and if_qflush before
ether_ifattach rather than after when multiq transmit is enabled. This
helps to ensure that the drivers never try to mix different transmit
methods.
- Properly restart transmit during resume. igb(4) was not restarting it
at all, and em(4) was restarting even if the link was down and was
calling the wrong method if multiq transmit was enabled.
- Remove all the 'more' handling for transmit completions. Transmit
completion processing does not have a processing limit, so it always
runs to completion and never has more work to do when it returns.
Instead, the previous code was returning 'true' anytime there were
packets in the queue that weren't still in the process of being
transmitted. The effect was that the driver would continuously
reschedule a task to process TX completions in effect running at 100%
CPU polling the hardware until it finished transmitting all of the
packets in the ring. Now it will just wait for the next TX completion
interrupt.
- Restart packet transmission when the link becomes active.
- Fix the MSI-X queue interrupt handlers to restart packet transmission if
there are pending packets in the relevant software queue (IFQ or buf_ring)
after processing TX completions. This is the root cause for the OACTIVE
hangs as if the MSI-X queue handler drained all the pending packets from
the TX ring, nothing would ever restart it. As such, remove some
previously-added workarounds to reschedule a task to poll the TX ring
anytime OACTIVE was set.
- Use a dedicated task to handle deferred transmits from the if_transmit
method instead of reusing the existing per-queue interrupt task.
Reusing the per-queue interrupt task could result in both an interrupt
thread and the taskqueue thread trying to handle received packets on a
single queue resulting in out-of-order packet processing.
- Call ether_ifdetach() earlier in igb_detach().
- Drain tasks and free taskqueues during igb_detach().
jhb [Wed, 16 May 2012 21:07:19 +0000 (21:07 +0000)]
MFC 234152:
Allow device_busy() and device_unbusy() to be invoked while a device is
being attached. This is implemented by adding a new DS_ATTACHING state
while a device's DEVICE_ATTACH() method is being invoked. A driver is
required to not fail an attach of a busy device. The device's state will
be promoted to DS_BUSY rather than DS_ACTIVE() if the device was marked
busy during DEVICE_ATTACH()
jhb [Wed, 16 May 2012 20:05:31 +0000 (20:05 +0000)]
MFC 218221,233709,233781,233793:
- Use a dedicated taskqueue with a thread that runs at a software-interrupt
priority for the periodic polling of the machine check registers.
- Don't malloc() new MCA records for machine checks logged due to a
CMCI or MC# exception. Instead, use a pre-allocated pool of records.
When a CMCI or MC# exception fires, schedule a task to refill the pool.
The pool is sized to hold at least one record per available machine
bank, and one record per CPU. This should handle the case of all CPUs
triggering a single bank at once as well as the case a single CPU
triggering all of its banks. The periodic scans still use malloc()
since they are run from a safe context.
- Make machine check exception logging more readable. On newer Intel systems,
an uncorrected ECC error tends to fire on all CPUs in a package
simultaneously and the current printf hacks are not sufficient to make
the messages legible. Instead, use the existing mca_lock spinlock to
serialize calls to mca_log() and change the machine check code to panic
directly when an unrecoverable error is encoutered rather than falling
back to a trap_fatal() call in trap() (which adds nearly a screen-full of
logging messages that aren't useful for machine checks).
delphij [Wed, 16 May 2012 20:02:29 +0000 (20:02 +0000)]
MFC r234244:
The scandir(3) function expects fourth parameter, compar, be in type of:
int (*compar)(const struct dirent **, const struct dirent **)
The current code defines sortq() to accept two void *, then cast them
to const struct dirent **. Because the code does not really need this
cast, we can eliminate the casts by changing the function prototype
to match scandir(3) expectation.
jimharris [Wed, 16 May 2012 00:03:58 +0000 (00:03 +0000)]
MFC r235043:
Fix off-by-one error in sati_inquiry_block_device_translate_data(). Bug would
result in INQUIRY VPD 0x81 to SATA devices to return only 63 bytes of data
instead of 64 during SCSI/ATA translation.
tuexen [Mon, 14 May 2012 10:14:43 +0000 (10:14 +0000)]
MFC r235282:
Only provide the supported features in the SCTP_ASSOC_CHANGE notif
if the state is SCTP_COMM_UP or SCTP_RESTART.
While there, do some cleanups.
tuexen [Mon, 14 May 2012 10:12:03 +0000 (10:12 +0000)]
MFC r235280:
Remove a constant which is only used on non-FreeBSD platform.
(The actual code for the socket option handling has been #ifdefed
out forever...)
rmacklem [Sun, 13 May 2012 20:28:43 +0000 (20:28 +0000)]
MFC: r234742
It was reported via email that some non-FreeBSD NFS servers
do not include file attributes in the reply to an NFS create RPC
under certain circumstances.
This resulted in a vnode of type VNON that was not usable.
This patch adds an NFS getattr RPC to nfs_create() for this case,
to fix the problem. It was tested by the person that reported
the problem and confirmed to fix this case for their server.
emaste [Fri, 11 May 2012 01:28:25 +0000 (01:28 +0000)]
MFC r234138:
Support percent-encoded user and password
RFC 1738 specifies that any ":", "@", or "/" within a user name or
password in a URL is percent-encoded, to avoid ambiguity with the use
of those characters as URL component separators.
daichi [Thu, 10 May 2012 20:37:56 +0000 (20:37 +0000)]
MFC: 234867 and 234944
- fixed a vnode lock hang-up issue.
- fixed an incorrect lock status issue.
- fixed an incorrect lock issue of unionfs root vnode removed.
(pointed out by keith)
- fixed an infinity loop issue.
(pointed out by dumbbell)
- changed to do LK_RELEASE expressly when unlocked.
- fixed a unionfs_readdir math issue
Submitted by: ozawa@ongs.co.jp, Matthew Fleming <mfleming@isilon.com>
kib [Thu, 10 May 2012 11:08:09 +0000 (11:08 +0000)]
MFC r234981:
Move the code to call the callout callback into the helper function
softclock_call_cc(). While there, move some common code to callout_cc_del().
kib [Thu, 10 May 2012 10:56:46 +0000 (10:56 +0000)]
MFC r234952:
Mark the migrating callouts with CALLOUT_DFRMIGRATION flag. The flag is
cleared by callout_stop_safe() when the function detects a migration,
besides returning the success. The softclock() rechecks the flag for
migrating callout and cancels its execution if the flag was cleared
meantime.
kib [Wed, 9 May 2012 15:57:59 +0000 (15:57 +0000)]
MFC r211706:
On shared object unload, in __cxa_finalize, call and clear all installed
atexit and __cxa_atexit handlers that are either installed by unloaded
dso, or points to the functions provided by the dso.
Use _rtld_addr_phdr to locate segment information from the address of
private variable belonging to the dso, supplied by crtstuff.c. Provide
utility function __elf_phdr_match_addr to do the match of address against
dso executable segment.
Call back into libthr from __cxa_finalize using weak
__pthread_cxa_finalize symbol to remove any atfork handler which
function points into unloaded object.
The rtld needs private __pthread_cxa_finalize symbol to not require
resolution of the weak undefined symbol at initialization time. This
cannot work, since rtld is relocated before sym_zero is set up.
MFC r211894:
Do not call __pthread_cxa_finalize with invalid struct dl_phdr_info.
Requested and tested by: Peter Jeremy <peter rulingia com>
kib [Wed, 9 May 2012 15:16:38 +0000 (15:16 +0000)]
MFC r211705:
Introduce implementation-private rtld interface _rtld_addr_phdr, which
fills struct dl_phdr_info for the shared object that contains the
specified address, if any.
Requested and tested by: Peter Jeremy <peter rulingia com>
tuexen [Wed, 9 May 2012 15:11:47 +0000 (15:11 +0000)]
MFC r235064:
Honor SCTP_ENABLE_STREAM_RESET socket option when processing incoming
requests. Fix also the provided result in the response and use names
as specified in RFC 6525.
pho [Wed, 9 May 2012 10:05:02 +0000 (10:05 +0000)]
MFC: r234932
Added D_TRACKCLOSE to sndstat_cdevsw to fix the situation when
another process is in open() or stat() for the device node, then
close() from the owning process does not result in cdevsw close
method call. This fixes the pemanent "Device busy" seen.
Changed the sndstat_lock from mutex to sx. This allows to extend
the region covered by the lock, to include the uiomove() call in
sndstat_read() and bufptr increment. This fixes the "panic:
sbuf_put_byte called with finished or corrupt sbuf" seen.
uqs [Tue, 8 May 2012 08:19:07 +0000 (08:19 +0000)]
Sync traceroute(8) with head.
Merges r215937,216184 and r211062,215880,220968:
- Remove unused traceroute(8) contrib code from head
- make WARNS=3 clean
- fix an operator precedence bug for TCP tracerouting
- Remove unneeded struct timezone passed to gettimeofday().
- Remove clause 3 and 4 from TNF licenses.
- Check return code of setuid() in traceroute.
marius [Mon, 7 May 2012 07:04:41 +0000 (07:04 +0000)]
MFC: r225203 (partial)
Attempt to make break-to-debugger and alternative break-to-debugger more
accessible:
(1) Always compile in support for breaking into the debugger if options
KDB is present in the kernel.
(2) Disable both by default, but allow them to be enabled via tunables
and sysctls debug.kdb.break_to_debugger and
debug.kdb.alt_break_to_debugger.
(3) options BREAK_TO_DEBUGGER and options ALT_BREAK_TO_DEBUGGER continue
to behave as before -- only now instead of compiling in
break-to-debugger support, they change the default values of the
above sysctls to enable those features by default. Current kernel
configurations should, therefore, continue to behave as expected.
(4) Migrate alternative break-to-debugger state machine logic out of
individual device drivers into centralised KDB code. This has a
number of upsides, but also one downside: it's now tricky to release
sio spin locks when entering the debugger, so we don't. However,
similar logic does not exist in other device drivers, including uart.
(5) dcons requires some special handling; unlike other console types, it
allows overriding KDB's own debugger selection, so we need a new
interface to KDB to allow that to work.
GENERIC kernels will now support break-to-debugger as long as appropriate
boot/run-time options are set, which should improve the debuggability of
kernels significantly.
MFC: r225214 (partial)
Follow up to r225203 refining break-to-debugger run-time configuration
improvements:
(1) Implement new model in previously missed at91 UART driver
(2) Move BREAK_TO_DEBUGGER and ALT_BREAK_TO_DEBUGGER from opt_comconsole.h
to opt_kdb.h (spotted by np)
(3) Garbage collect now-unused opt_comconsole.h
Hide kernel option ROUTETABLES evaluations in the implementation
rather than the header file. With this also move RT_MAXFIBS and
RT_NUMFIBS into the implemantion to avoid further usage in other
code. rt_numfibs is all that should be needed.
This allows users to change the number of FIBs from 1..RT_MAXFIBS(16)
dynamically using the tunable without the need to change the kernel
config for the maximum anymore. This means that the multi-FIB
feature is now fully available with GENERIC kernels.
The kernel option ROUTETABLES can still be used to set the default
numbers of FIBs in absence of the tunable.
melifaro [Sat, 5 May 2012 11:34:27 +0000 (11:34 +0000)]
MFC r234572
Do not require radix write lock to be held while dumping route table
via sysctl(4) interface. This permits router not to stop forwarding
packets while route table is being written to user-supplied buffer.
glebius [Sat, 5 May 2012 10:05:13 +0000 (10:05 +0000)]
Merge 234342 from head:
When we receive an ICMP unreach need fragmentation datagram, we take
proposed MTU value from it and update the TCP host cache. Then
tcp_mss_update() is called on the corresponding tcpcb. It finds the
just allocated entry in the TCP host cache and updates MSS on the
tcpcb. And then we do a fast retransmit of what we have in the tcp
send buffer.
This sequence gets broken if the TCP host cache is exausted. In this
case allocation fails, and later called tcp_mss_update() finds nothing
in cache. The fast retransmit is done with not reduced MSS and is
immidiately replied by remote host with new ICMP datagrams and the
cycle repeats. This ping-pong can go up to wirespeed.
To fix this:
- tcp_mss_update() gets new parameter - mtuoffer, that is like
offer, but needs to have min_protoh subtracted.
- tcp_mtudisc() as notification method renamed to tcp_mtudisc_notify().
- tcp_mtudisc() now accepts not a useless error argument, but proposed
MTU value, that is passed to tcp_mss_update() as mtuoffer.
Reported by: az
Reported by: Andrey Zonov <andrey zonov.org>
Reviewed by: andre (previous version of patch)
hselasky [Fri, 4 May 2012 15:55:31 +0000 (15:55 +0000)]
MFC r234803 and r234961:
Add support for Multi-TT mode of modern USB HUBs.
This will give you more bandwidth for isochronous
FULL speed applications connected through a
High Speed HUB.
davidxu [Thu, 3 May 2012 03:05:18 +0000 (03:05 +0000)]
Merge 233103, 233912 from head:
233103:
Some software think a mutex can be destroyed after it owned it, for
example, it uses a serialization point like following:
pthread_mutex_lock(&mutex);
pthread_mutex_unlock(&mutex);
pthread_mutex_destroy(&muetx);
They think a previous lock holder should have already left the mutex and
is no longer referencing it, so they destroy it. To be maximum compatible
with such code, we use IA64 version to unlock the mutex in kernel, remove
the two steps unlocking code.
233912:
umtx operation UMTX_OP_MUTEX_WAKE has a side-effect that it accesses
a mutex after a thread has unlocked it, it event writes data to the mutex
memory to clear contention bit, there is a race that other threads
can lock it and unlock it, then destroy it, so it should not write
data to the mutex memory if there isn't any waiter.
The new operation UMTX_OP_MUTEX_WAKE2 try to fix the problem. It
requires thread library to clear the lock word entirely, then
call the WAKE2 operation to check if there is any waiter in kernel,
and try to wake up a thread, if necessary, the contention bit is set again
by the operation. This also mitgates the chance that other threads find
the contention bit and try to enter kernel to compete with each other
to wake up sleeping thread, this is unnecessary. With this change, the
mutex owner is no longer holding the mutex until it reaches a point
where kernel umtx queue is locked, it releases the mutex as soon as
possible.
Performance is improved when the mutex is contensted heavily. On Intel
i3-2310M, the runtime of a benchmark program is reduced from 26.87 seconds
to 2.39 seconds, it even is better than UMTX_OP_MUTEX_WAKE which is
deprecated now. http://people.freebsd.org/~davidxu/bench/mutex_perf.c
Special code for stable/8:
And add code to detect if the UMTX_OP_MUTEX_WAKE2 is available.
mav [Wed, 2 May 2012 07:22:58 +0000 (07:22 +0000)]
MFC r234415:
Some improvements to GEOM MULTIPATH:
- Implement "configure" command to allow switching operation mode of
running device on-fly without destroying and recreation.
- Implement Active/Read mode as hybrid of Active/Active and Active/Passive.
In this mode all paths not marked FAIL may handle reads same time,
but unlike Active/Active only one path handles write requests at any
point in time. It allows to closer follow original write request order
if above layers need it for data consistency (not waiting for requisite
write completion before sending dependent write).
- Hide duplicate messages about device status change.
- Remove periodic thread wake up with 10Hz rate.
mav [Wed, 2 May 2012 07:05:20 +0000 (07:05 +0000)]
MFC r222643:
When possible, join ranges of subsequest BIO_DELETE requests to handle more
(up to 2048 instead of 256 or even 64) of them with single TRIM request.
OCZ Vertex2/Vertex3 SSDs can handle no more then 64 ranges per TRIM request.
Due to lack of BIO_DELETE clustering now, it means that we could delete no
more then 2MB per request (on FS with 32K block) with limited request rate.
This change increases delete rate on Vertex2 from 250MB/s to 950MB/s.
MFC r222643:
Increase maximum supported number of ranges per TRIM command from 256 to 512
to use full potential of Intel X25-M SSDs. On synthetic test with 32K ranges
it gives about 20% speedup, which probably costs more then 2K of RAM.
mav [Wed, 2 May 2012 06:52:00 +0000 (06:52 +0000)]
Merge ATA_CAM compatibility shims. While 8-STABLE doesn't have ATA_CAM
enabled by default, this should make migration easier for users enabling
it manually.
r221071:
Add shim to simplify migration to the CAM-based ATA. For each new adaX
device in /dev/ create symbolic link with adY name, trying to mimic old ATA
numbering. Imitation is not complete, but should be enough in most cases to
mount file systems without touching /etc/fstab.
r221384:
Do not report legacy unit numbers (do not create legacy aliases) for disks
on port multiplier ports above first two. They don't fit into ATA_STATIC_ID
scheme and so can't be mapped properly. No need to pollute dev.