bz [Sun, 20 May 2012 20:25:57 +0000 (20:25 +0000)]
MFC r231532:
MFp4 204292:
Ignore the NAT_T extension types so we can at least dump the SADB from
the in-base libipsec/setkey without error when NAT_T support is present
in the kernel, though not printing the additional information yet.
However in case there is no NAT_T support in kernel still consider them
to be an error.
bz [Sat, 19 May 2012 22:50:22 +0000 (22:50 +0000)]
MFC r234643:
Do not toggle IFCAP_TSO4 if we would also do TSO6. Given the driver does
not currently announce/support TSO6 that cannot happen. Clean it up anyway
for consistency.
bz [Sat, 19 May 2012 22:16:12 +0000 (22:16 +0000)]
MFC r234620:
If we pass down 64k - L2 hdr size + 1 to 64K L3+ data adding an ether
header will make the data go over the 64k limits announced to busdma as
maxsize and the transaction will fail.
With TSO this can result in a TCP regression due to the lost packet.
According to the data sheets ixgbe(4) 82598 and 82599 can handle up to
256k so increase the maximum.
Reported by: Jon Kåre Hellan, UNINETT (jon.kare.hellan uninett.no)
Tested by: Jon Kåre Hellan, UNINETT (jon.kare.hellan uninett.no)
bz [Sat, 19 May 2012 18:33:08 +0000 (18:33 +0000)]
MFC r231767:
Fix PAWS (Protect Against Wrapped Sequence numbers) in cases when
hz >> 1000 and thus getting outside the timestamp clock frequenceny of
1ms < x < 1s per tick as mandated by RFC1323, leading to connection
resets on idle connections.
Always use a granularity of 1ms using getmicrouptime() making all but
relevant callouts independent of hz.
Use getmicrouptime(), not getmicrotime() as the latter may make a jump
possibly breaking TCP nfsroot mounts having our timestamps move forward
for more than 24.8 days in a second without having been idle for that
long.
bz [Sat, 19 May 2012 14:32:47 +0000 (14:32 +0000)]
MFC r233713:
Remove the magic mfi_array is 288 bytes and just use the
sizeof the array since it is not 288 bytes.
Change reporting of a "SYSTEM" disk to "JBOD" to match
LSI MegaCli and firmware reporting.
This means that mfiutil command to "create jbod" is now a
little confusing since a RAID per drive is not really what
LSI defines JBOD to be. This should be fixed in the future
and support added to really create LSI JBOD and enable that
feature on cards that support it.
sbruno [Fri, 18 May 2012 19:48:33 +0000 (19:48 +0000)]
MFC of head thunderbolt support for mfi(4)
r233711 -- IFV head_mfi into head for initial thunderbolt support
r233768 -- atomic_t --> mfi_atomic
r233805 -- fix tinderbuild, move megasas_sge to mfivar.h
r233877 -- remove atomic.h from includes
r235014 -- fix reading of sector >= 2^32 or 2^21, repair RAID handling
r235016 -- style(9)
r235040 -- fix returns from mfi_tbolt_sync_map_info()
r235318 -- repair panic on PAE i386
r235321 -- repair the repair of panics on PAE i386
jhb [Fri, 18 May 2012 18:53:28 +0000 (18:53 +0000)]
MFC 234186
If a linker file contains at least one module, but all of the modules
fail to load (the MOD_LOAD event fails) during a kldload(2), unload the
linker file and fail the kldload(2) with ENOEXEC.
sbruno [Fri, 18 May 2012 18:29:44 +0000 (18:29 +0000)]
MFC r235210
Modify the binding of queues to attach to as many CPUs
as possible when using more than one igb(4) adapter. This
means that queues will not be bound to the same CPUs if
there are more CPUs availble.
This is only applicable to a system that has multiple interfaces.
jhb [Thu, 17 May 2012 15:45:00 +0000 (15:45 +0000)]
MFC 233708,234154:
Fix a few issues with transmit handling in em(4) and igb(4):
- Do not define the foo_start() methods or set if_start in the ifnet if
multiq transmit is enabled. Also, set if_transmit and if_qflush before
ether_ifattach rather than after when multiq transmit is enabled. This
helps to ensure that the drivers never try to mix different transmit
methods.
- Properly restart transmit during resume. igb(4) was not restarting it
at all, and em(4) was restarting even if the link was down and was
calling the wrong method if multiq transmit was enabled.
- Remove all the 'more' handling for transmit completions. Transmit
completion processing does not have a processing limit, so it always
runs to completion and never has more work to do when it returns.
Instead, the previous code was returning 'true' anytime there were
packets in the queue that weren't still in the process of being
transmitted. The effect was that the driver would continuously
reschedule a task to process TX completions in effect running at 100%
CPU polling the hardware until it finished transmitting all of the
packets in the ring. Now it will just wait for the next TX completion
interrupt.
- Restart packet transmission when the link becomes active.
- Fix the MSI-X queue interrupt handlers to restart packet transmission if
there are pending packets in the relevant software queue (IFQ or buf_ring)
after processing TX completions. This is the root cause for the OACTIVE
hangs as if the MSI-X queue handler drained all the pending packets from
the TX ring, nothing would ever restart it. As such, remove some
previously-added workarounds to reschedule a task to poll the TX ring
anytime OACTIVE was set.
- Use a dedicated task to handle deferred transmits from the if_transmit
method instead of reusing the existing per-queue interrupt task.
Reusing the per-queue interrupt task could result in both an interrupt
thread and the taskqueue thread trying to handle received packets on a
single queue resulting in out-of-order packet processing.
- Call ether_ifdetach() earlier in igb_detach().
- Drain tasks and free taskqueues during igb_detach().
jhb [Wed, 16 May 2012 21:07:19 +0000 (21:07 +0000)]
MFC 234152:
Allow device_busy() and device_unbusy() to be invoked while a device is
being attached. This is implemented by adding a new DS_ATTACHING state
while a device's DEVICE_ATTACH() method is being invoked. A driver is
required to not fail an attach of a busy device. The device's state will
be promoted to DS_BUSY rather than DS_ACTIVE() if the device was marked
busy during DEVICE_ATTACH()
jhb [Wed, 16 May 2012 20:05:31 +0000 (20:05 +0000)]
MFC 218221,233709,233781,233793:
- Use a dedicated taskqueue with a thread that runs at a software-interrupt
priority for the periodic polling of the machine check registers.
- Don't malloc() new MCA records for machine checks logged due to a
CMCI or MC# exception. Instead, use a pre-allocated pool of records.
When a CMCI or MC# exception fires, schedule a task to refill the pool.
The pool is sized to hold at least one record per available machine
bank, and one record per CPU. This should handle the case of all CPUs
triggering a single bank at once as well as the case a single CPU
triggering all of its banks. The periodic scans still use malloc()
since they are run from a safe context.
- Make machine check exception logging more readable. On newer Intel systems,
an uncorrected ECC error tends to fire on all CPUs in a package
simultaneously and the current printf hacks are not sufficient to make
the messages legible. Instead, use the existing mca_lock spinlock to
serialize calls to mca_log() and change the machine check code to panic
directly when an unrecoverable error is encoutered rather than falling
back to a trap_fatal() call in trap() (which adds nearly a screen-full of
logging messages that aren't useful for machine checks).
delphij [Wed, 16 May 2012 20:02:29 +0000 (20:02 +0000)]
MFC r234244:
The scandir(3) function expects fourth parameter, compar, be in type of:
int (*compar)(const struct dirent **, const struct dirent **)
The current code defines sortq() to accept two void *, then cast them
to const struct dirent **. Because the code does not really need this
cast, we can eliminate the casts by changing the function prototype
to match scandir(3) expectation.
jimharris [Wed, 16 May 2012 00:03:58 +0000 (00:03 +0000)]
MFC r235043:
Fix off-by-one error in sati_inquiry_block_device_translate_data(). Bug would
result in INQUIRY VPD 0x81 to SATA devices to return only 63 bytes of data
instead of 64 during SCSI/ATA translation.
tuexen [Mon, 14 May 2012 10:14:43 +0000 (10:14 +0000)]
MFC r235282:
Only provide the supported features in the SCTP_ASSOC_CHANGE notif
if the state is SCTP_COMM_UP or SCTP_RESTART.
While there, do some cleanups.
tuexen [Mon, 14 May 2012 10:12:03 +0000 (10:12 +0000)]
MFC r235280:
Remove a constant which is only used on non-FreeBSD platform.
(The actual code for the socket option handling has been #ifdefed
out forever...)
rmacklem [Sun, 13 May 2012 20:28:43 +0000 (20:28 +0000)]
MFC: r234742
It was reported via email that some non-FreeBSD NFS servers
do not include file attributes in the reply to an NFS create RPC
under certain circumstances.
This resulted in a vnode of type VNON that was not usable.
This patch adds an NFS getattr RPC to nfs_create() for this case,
to fix the problem. It was tested by the person that reported
the problem and confirmed to fix this case for their server.
emaste [Fri, 11 May 2012 01:28:25 +0000 (01:28 +0000)]
MFC r234138:
Support percent-encoded user and password
RFC 1738 specifies that any ":", "@", or "/" within a user name or
password in a URL is percent-encoded, to avoid ambiguity with the use
of those characters as URL component separators.
daichi [Thu, 10 May 2012 20:37:56 +0000 (20:37 +0000)]
MFC: 234867 and 234944
- fixed a vnode lock hang-up issue.
- fixed an incorrect lock status issue.
- fixed an incorrect lock issue of unionfs root vnode removed.
(pointed out by keith)
- fixed an infinity loop issue.
(pointed out by dumbbell)
- changed to do LK_RELEASE expressly when unlocked.
- fixed a unionfs_readdir math issue
Submitted by: ozawa@ongs.co.jp, Matthew Fleming <mfleming@isilon.com>
kib [Thu, 10 May 2012 11:08:09 +0000 (11:08 +0000)]
MFC r234981:
Move the code to call the callout callback into the helper function
softclock_call_cc(). While there, move some common code to callout_cc_del().
kib [Thu, 10 May 2012 10:56:46 +0000 (10:56 +0000)]
MFC r234952:
Mark the migrating callouts with CALLOUT_DFRMIGRATION flag. The flag is
cleared by callout_stop_safe() when the function detects a migration,
besides returning the success. The softclock() rechecks the flag for
migrating callout and cancels its execution if the flag was cleared
meantime.
kib [Wed, 9 May 2012 15:57:59 +0000 (15:57 +0000)]
MFC r211706:
On shared object unload, in __cxa_finalize, call and clear all installed
atexit and __cxa_atexit handlers that are either installed by unloaded
dso, or points to the functions provided by the dso.
Use _rtld_addr_phdr to locate segment information from the address of
private variable belonging to the dso, supplied by crtstuff.c. Provide
utility function __elf_phdr_match_addr to do the match of address against
dso executable segment.
Call back into libthr from __cxa_finalize using weak
__pthread_cxa_finalize symbol to remove any atfork handler which
function points into unloaded object.
The rtld needs private __pthread_cxa_finalize symbol to not require
resolution of the weak undefined symbol at initialization time. This
cannot work, since rtld is relocated before sym_zero is set up.
MFC r211894:
Do not call __pthread_cxa_finalize with invalid struct dl_phdr_info.
Requested and tested by: Peter Jeremy <peter rulingia com>
kib [Wed, 9 May 2012 15:16:38 +0000 (15:16 +0000)]
MFC r211705:
Introduce implementation-private rtld interface _rtld_addr_phdr, which
fills struct dl_phdr_info for the shared object that contains the
specified address, if any.
Requested and tested by: Peter Jeremy <peter rulingia com>
tuexen [Wed, 9 May 2012 15:11:47 +0000 (15:11 +0000)]
MFC r235064:
Honor SCTP_ENABLE_STREAM_RESET socket option when processing incoming
requests. Fix also the provided result in the response and use names
as specified in RFC 6525.
pho [Wed, 9 May 2012 10:05:02 +0000 (10:05 +0000)]
MFC: r234932
Added D_TRACKCLOSE to sndstat_cdevsw to fix the situation when
another process is in open() or stat() for the device node, then
close() from the owning process does not result in cdevsw close
method call. This fixes the pemanent "Device busy" seen.
Changed the sndstat_lock from mutex to sx. This allows to extend
the region covered by the lock, to include the uiomove() call in
sndstat_read() and bufptr increment. This fixes the "panic:
sbuf_put_byte called with finished or corrupt sbuf" seen.
uqs [Tue, 8 May 2012 08:19:07 +0000 (08:19 +0000)]
Sync traceroute(8) with head.
Merges r215937,216184 and r211062,215880,220968:
- Remove unused traceroute(8) contrib code from head
- make WARNS=3 clean
- fix an operator precedence bug for TCP tracerouting
- Remove unneeded struct timezone passed to gettimeofday().
- Remove clause 3 and 4 from TNF licenses.
- Check return code of setuid() in traceroute.
marius [Mon, 7 May 2012 07:04:41 +0000 (07:04 +0000)]
MFC: r225203 (partial)
Attempt to make break-to-debugger and alternative break-to-debugger more
accessible:
(1) Always compile in support for breaking into the debugger if options
KDB is present in the kernel.
(2) Disable both by default, but allow them to be enabled via tunables
and sysctls debug.kdb.break_to_debugger and
debug.kdb.alt_break_to_debugger.
(3) options BREAK_TO_DEBUGGER and options ALT_BREAK_TO_DEBUGGER continue
to behave as before -- only now instead of compiling in
break-to-debugger support, they change the default values of the
above sysctls to enable those features by default. Current kernel
configurations should, therefore, continue to behave as expected.
(4) Migrate alternative break-to-debugger state machine logic out of
individual device drivers into centralised KDB code. This has a
number of upsides, but also one downside: it's now tricky to release
sio spin locks when entering the debugger, so we don't. However,
similar logic does not exist in other device drivers, including uart.
(5) dcons requires some special handling; unlike other console types, it
allows overriding KDB's own debugger selection, so we need a new
interface to KDB to allow that to work.
GENERIC kernels will now support break-to-debugger as long as appropriate
boot/run-time options are set, which should improve the debuggability of
kernels significantly.
MFC: r225214 (partial)
Follow up to r225203 refining break-to-debugger run-time configuration
improvements:
(1) Implement new model in previously missed at91 UART driver
(2) Move BREAK_TO_DEBUGGER and ALT_BREAK_TO_DEBUGGER from opt_comconsole.h
to opt_kdb.h (spotted by np)
(3) Garbage collect now-unused opt_comconsole.h
Hide kernel option ROUTETABLES evaluations in the implementation
rather than the header file. With this also move RT_MAXFIBS and
RT_NUMFIBS into the implemantion to avoid further usage in other
code. rt_numfibs is all that should be needed.
This allows users to change the number of FIBs from 1..RT_MAXFIBS(16)
dynamically using the tunable without the need to change the kernel
config for the maximum anymore. This means that the multi-FIB
feature is now fully available with GENERIC kernels.
The kernel option ROUTETABLES can still be used to set the default
numbers of FIBs in absence of the tunable.
melifaro [Sat, 5 May 2012 11:34:27 +0000 (11:34 +0000)]
MFC r234572
Do not require radix write lock to be held while dumping route table
via sysctl(4) interface. This permits router not to stop forwarding
packets while route table is being written to user-supplied buffer.
glebius [Sat, 5 May 2012 10:05:13 +0000 (10:05 +0000)]
Merge 234342 from head:
When we receive an ICMP unreach need fragmentation datagram, we take
proposed MTU value from it and update the TCP host cache. Then
tcp_mss_update() is called on the corresponding tcpcb. It finds the
just allocated entry in the TCP host cache and updates MSS on the
tcpcb. And then we do a fast retransmit of what we have in the tcp
send buffer.
This sequence gets broken if the TCP host cache is exausted. In this
case allocation fails, and later called tcp_mss_update() finds nothing
in cache. The fast retransmit is done with not reduced MSS and is
immidiately replied by remote host with new ICMP datagrams and the
cycle repeats. This ping-pong can go up to wirespeed.
To fix this:
- tcp_mss_update() gets new parameter - mtuoffer, that is like
offer, but needs to have min_protoh subtracted.
- tcp_mtudisc() as notification method renamed to tcp_mtudisc_notify().
- tcp_mtudisc() now accepts not a useless error argument, but proposed
MTU value, that is passed to tcp_mss_update() as mtuoffer.
Reported by: az
Reported by: Andrey Zonov <andrey zonov.org>
Reviewed by: andre (previous version of patch)
hselasky [Fri, 4 May 2012 15:55:31 +0000 (15:55 +0000)]
MFC r234803 and r234961:
Add support for Multi-TT mode of modern USB HUBs.
This will give you more bandwidth for isochronous
FULL speed applications connected through a
High Speed HUB.