Unify loopback route switching:
* prepare gateway before insertion
* use RTM_CHANGE instead of explicit find/change route
* Remove fib argument from ifa_switch_loopback_route added in r264887:
if old ifp fib differs from new one, then the caller
is doing something wrong
* Make ifa_*_loopback_route call single ifa_maintain_loopback_route().
Mark Johnston [Tue, 15 Sep 2015 23:56:31 +0000 (23:56 +0000)]
Ensure that the MAD agent's delayed taskqueue is completely stopped
before proceeding. Otherwise, nothing prevents it from running after the
MAD agent struct has been freed, and this results in a use-after-free
when the task's ta_pending count is incremented in the callout handler.
John Baldwin [Tue, 15 Sep 2015 22:16:21 +0000 (22:16 +0000)]
Threads holding a read lock of a sleepable rm lock are not permitted
to sleep. The rmlock implementation enforces this by disabling
sleeping when a read lock is acquired. To simplify the implementation,
sleeping is disabled for most of the duration of rm_rlock. However,
it doesn't need to be disabled until the lock is acquired. If a
sleepable rm lock is contested, then rm_rlock may need to acquire the
backing sx lock. This tripped the overly-broad assertion. Fix by
relaxing the assertion around the call to sx_xlock().
In poll mode, check for and wake VBAD vnodes. (Vnodes that are VBAD at
registration will never be woken by the RECLAIM trigger.)
Add post-VOP_RECLAIM hook to trigger notes on vnode reclamation. (Vnodes that
were fine at registration but are vgoned while being monitored should signal
waiters.)
Simplify nd6_cache_lladdr:
* Move isRouter calculation code to separate nd6_is_router() function.
* Make nd6_cache_lladdr() return void: its return value hasn't been used
since r53541 KAME import in 1999.
Zbigniew Bodek [Tue, 15 Sep 2015 11:21:16 +0000 (11:21 +0000)]
Perform I2C transmission in a single burst when mode is "none" or not set
Some more automated I2C controllers cannot explicitly create
START/STOP/etc. conditions on the bus.
Instead, the correct condition is set automatically according
to the pending transfer status.
This particular behavior can cause trouble if some I2C slave
requires sending address offset within the chip followed by
the actual data or command. In that case we cannot assume that
the driver will not STOP immediately after sending
offset.
To avoid that, do not split offset transfer from data transfer
for default transmission modes and do exactly that if requested
in command line (stop-start and repeated-start modes).
This more generic approach should cover special cases like
the one described.
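The single-burst idea can be sketched in C as below. This is only an illustration under stated assumptions: `i2c_pack_burst` and its parameters are hypothetical names, not code from the commit. It shows the offset and the payload being placed back to back so that they can be handed to the controller as one write.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (not the code from this commit): copy the in-chip
 * offset and the payload into one contiguous buffer so that both can be
 * sent as a single transfer, leaving no gap in which an automated
 * controller could issue STOP.
 * Returns the total number of bytes packed, or 0 if "out" is too small.
 */
size_t
i2c_pack_burst(uint8_t *out, size_t outlen,
    const uint8_t *offset, size_t offlen,
    const uint8_t *data, size_t datalen)
{
	assert(out != NULL);
	if (offlen > outlen || datalen > outlen - offlen)
		return (0);
	if (offlen > 0)
		memcpy(out, offset, offlen);
	if (datalen > 0)
		memcpy(out + offlen, data, datalen);
	return (offlen + datalen);
}
```

In the stop-start and repeated-start modes the two parts would instead be sent as separate transfers, as the message describes.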
* Require explicit lle unlink prior to calling llentry_delete().
This one slightly decreases time of holding afdata wlock.
* While here, make nd6_free() return void. No one has used its return value
since r186119.
Mark Johnston [Tue, 15 Sep 2015 05:16:26 +0000 (05:16 +0000)]
Remove an unneeded typedef of ip6_t from the DTrace ip provider library.
It causes an error when ipfilter is enabled, since ipl.ko contains an
identical typedef.
Mark Johnston [Tue, 15 Sep 2015 05:09:17 +0000 (05:09 +0000)]
Preserve the device queue status before retrying a sense request in
chdone(). Previously, the retry could clear the CAM_DEV_QFRZN bit in the
CCB status, leaving the queue frozen.
Submitted by: Jeff Miller <Jeff.Miller@isilon.com>
Reviewed by: ken
MFC after: 2 weeks
Sponsored by: EMC / Isilon Storage Division
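The pattern behind the chdone() fix can be modelled in plain C. Everything in this sketch is an assumption made for illustration: the `ex_` names, the structure, and the constant values are hypothetical stand-ins and are not the CAM definitions. It only shows why the frozen-queue bit has to be read before a retry overwrites the status word.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical values for illustration only; not the real CAM constants. */
#define EX_REQ_INPROG	0x00u	/* request in progress */
#define EX_REQ_ERR	0x04u	/* request completed with an error */
#define EX_DEV_QFRZN	0x40u	/* stand-in for CAM_DEV_QFRZN */

struct ex_ccb {
	uint32_t status;
};

/* Models a retry that resets the status word, dropping any flag bits. */
void
ex_retry(struct ex_ccb *ccb)
{
	ccb->status = EX_REQ_INPROG;
}

/*
 * Record the frozen-queue bit first, then retry. The return value tells
 * the caller whether it still has to release the device queue.
 */
bool
ex_done_with_retry(struct ex_ccb *ccb)
{
	bool frozen = (ccb->status & EX_DEV_QFRZN) != 0;

	ex_retry(ccb);
	return (frozen);
}
```

Testing the bit only after `ex_retry()` would always see it cleared, which corresponds to the queue being left frozen.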
Adrian Chadd [Tue, 15 Sep 2015 03:01:40 +0000 (03:01 +0000)]
Replace the scan event input path hack with the new rx-stats based method.
This allows for arbitrary channel info to be placed in the input call rather
than the totally gross hack of overriding ic_curchan.
Without this I'm sure ic_curchan setting was racing with the scan code
setting the channel itself.
Eric van Gyzen [Mon, 14 Sep 2015 19:17:25 +0000 (19:17 +0000)]
Fix the handling of IPv6 On-Link Redirects.
On receipt of a redirect message, install an interface route for the
redirected destination. On removal of the corresponding Neighbor Cache
entry, remove the interface route.
This requires changes in rtredirect_fib() to cope with an AF_LINK
address for the gateway and with the absence of RTF_GATEWAY.
This fixes the "Redirected On-Link" test cases in the Tahi IPv6 Ready Logo
Phase 2 test suite.
Unrelated to the above, fix a recursion on the radix node head lock
triggered by the Tahi Redirected to Alternate Router test cases.
When I first wrote this patch in October 2012, all Section 2
(Neighbor Discovery) test cases passed on 10-CURRENT, 9-STABLE,
and 8-STABLE. cem@ recently rebased the 10.x patch onto head and reported
that it passes Tahi. (Thanks!)
These other test cases also passed in 2012:
* the RTF_MODIFIED case, with IPv4 and IPv6 (using a
RTF_HOST|RTF_GATEWAY route for the destination)
* the redirected-to-self case, with IPv4 and IPv6
* a valid IPv4 redirect
All testing in 2012 was done with WITNESS and INVARIANTS.
Tested by: EMC / Isilon Storage Division via Conrad Meyer (cem) in 2015,
Mark Kelley <mark_kelley@dell.com> in 2012,
TC Telkamp <terence_telkamp@dell.com> in 2012
PR: 152791
Reviewed by: melifaro (current rev), bz (earlier rev)
Approved by: kib (mentor)
MFC after: 1 month
Relnotes: yes
Sponsored by: Dell Inc.
Differential Revision: https://reviews.freebsd.org/D3602
* Do more fine-grained locking: call eventhandlers/free_entry
without holding afdata wlock
* convert per-af delete_address callback to global lltable_delete_entry() and
more low-level "delete this lle" per-af callback
* fix some bugs/inconsistencies in IPv4/IPv6 ifscrub procedures
Implement callout_drain_async(), inspired by the projects/hps_head
branch.
This function is used to drain a callout via a callback instead of
blocking the caller until the drain is complete. Refer to the
callout_drain_async() manual page for a detailed description.
Limitation: If a lock is used with the callout, the callout can only
be drained asynchronously one time unless the callout_init_mtx()
function is called again. This limitation is not present in
projects/hps_head and will require more invasive changes to the
timeout code, which was not in the scope of this patch.
To make driver programming easier the TSO limits are changed to
reflect the values used in the BUSDMA tag a network adapter driver is
using. The TCP/IP network stack will subtract space for all linklevel
and protocol level headers and ensure that the full mbuf chain passed
to the network adapter fits within the given limits.
Implementation notes:
If a network adapter driver needs to fixup the first mbuf in order to
support VLAN tag insertion, the size of the VLAN tag should be
subtracted from the TSO limit. Otherwise it should not be.
Network adapters which typically inline the complete header mbuf could
technically transmit one more segment. This patch does not implement a
mechanism to recover the last segment for data transmission. It is
believed when sufficiently large mbuf clusters are used, the segment
limit will not be reached and recovering the last segment will not
have any effect.
The current TSO algorithm tries to send MTU-sized packets, where the
MTU typically is 1500 bytes, which gives 1448 bytes of TCP data
payload per packet for IPv4. That means if the TSO length limitation
is set to 65536 bytes, there will be a data payload remainder of
(65536 - 1500) mod 1448 bytes which is equal to 324 bytes. Trying to
recover total TSO length due to inlining mbuf header data will not
have any effect, because adding or removing the ETH/IP/TCP headers
to or from 324 bytes will not cause more or less TCP payload to be
TSO'ed.
Existing network adapter limits will be updated separately.
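The remainder quoted in the TSO note above can be checked with a short C sketch. The function name `tso_payload_remainder` and its parameters are illustrative assumptions, not part of the network stack. The constants come from the commit message.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel API): evaluates the expression
 * (tso_max - mtu) mod mss from the commit message, i.e. the payload
 * bytes left over once whole MSS-sized payloads have been carved out.
 */
uint32_t
tso_payload_remainder(uint32_t tso_max, uint32_t mtu, uint32_t mss)
{
	assert(mss > 0 && tso_max >= mtu);
	return ((tso_max - mtu) % mss);
}
```

With `tso_max = 65536`, `mtu = 1500` and `mss = 1448` this yields 64036 mod 1448 = 324, matching the figure in the message.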
Marius Strobl [Sun, 13 Sep 2015 21:59:56 +0000 (21:59 +0000)]
- Sanity check that the parent ranges given in the "ranges" property
of PCI-EBus-bridges actually match the BARs as specified in and
required by [1, p. 113 f.]. Doing so earlier would have simplified
diagnosing a bug in QEMU/OpenBIOS getting the mapping of child
addresses wrong, which still needs to be fixed there.
In theory, we could try to change the BARs accordingly if we hit
this problem. However, at least with real machines changing the
decoding likely won't work, especially if the PCI-EBus-bridge is
beneath an APB one. So implementing such functionality generally
is rather pointless.
- Actually change the allocation type of EBus resources if they
change from SYS_RES_MEMORY to SYS_RES_IOPORT when mapping them
to PCI ranges in ebus_alloc_resource() and passing them up to
bus_activate_resource(9). This may happen with the QEMU/OpenBIOS
PCI-EBus-bridge but not real ones. Still, this only cleans up
the code and the result of resource allocation and activation is
unchanged.
- Change the remainder of printf(9) to device_printf(9) calls and
canonicalize their wording.
MFC after: 1 week
Peripheral Component Interconnect Input Output Controller,
Part No.: 802-7837-01, Sun Microelectronics, March 1997 [1]
Adrian Chadd [Sun, 13 Sep 2015 19:17:26 +0000 (19:17 +0000)]
Disable mgmt frame sending in if_rsu.
The firmware in this NIC sends management frames. So far I'm not sure which
ones it handles and which ones it doesn't handle - but this is what OpenBSD
does.
The association messages are handled by the firmware; the key negotiation
for 802.1x and WPA are done as raw frames, not management frames.
This successfully allows it to associate to my home networks whereas it didn't
work beforehand.
Tested:
* RTL8712, cut 3, STA mode
TODO:
* The firmware does send a join response with a status code; that should be
logged in a more obvious way to assist with debugging. I.e., the firmware
is the thing that is saying "couldn't join, sorry!", not net80211.
Sean Bruno [Sun, 13 Sep 2015 18:26:05 +0000 (18:26 +0000)]
Update em(4) with D3162 after testing further on hardware that failed
to attach with the last version of this commit. This commit fixes
attach failures on "ICH8" class devices via modifications to
e1000_init_nvm_params_ich8lan()
- Fix compiler warning in 80003es2lan.c
- Add return value handler for e1000_*_kmrn_reg_80003es2lan
- Fix usage of DEBUGOUT
- Remove unnecessary variable initializations.
- Remove unused variables (complaints from gcc).
- Edit defines in 82571.h.
- Add workaround for igb hw errata.
- Shared code changes for Skylake/I219 support.
- Remove unused OBFF and LTR functions.
Tested by some of the folks that reported breakage in previous incarnation.
Thanks to AllanJude, gjb, gnn, tijl for tempting fate with their machines.
Xin LI [Sun, 13 Sep 2015 07:15:14 +0000 (07:15 +0000)]
MFV r287623: 5997 FRU field not set during pool creation and never
updated
ZFS already supports storing the vdev FRU in a vdev property. There
is code in libzfs to work with this property, and there is code in
the zfs-retire FMA module that looks for that information. But there
is no code actually setting or updating the FRU.
To address this, ZFS is changed to send a handful of new events
whenever a vdev is added, attached, cleared, or onlined, as well
as when a pool is created or imported.
Note that syseventd is not currently available on FreeBSD, and thus
some work is needed (e.g. in zfsd) to support the new ZFS events and
actually use this capability. This changeset is mostly a diff
reduction from upstream.
Adrian Chadd [Sun, 13 Sep 2015 04:12:51 +0000 (04:12 +0000)]
* fiddle with some more of the debugging output
* yes, when a "sta disconnect" message comes through we should, like,
disconnect things. We're not currently generating beacon miss messages,
and net80211 isn't disconnecting things via software beacon miss receive.
Marius Strobl [Sun, 13 Sep 2015 00:08:04 +0000 (00:08 +0000)]
Merge r286374 from x86:
Formally pair store_rel(&smp_started) with load_acq(&smp_started).
Similarly to x86, this change is mostly a NOP due to the kernel
being run in total store order.
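The store-release/load-acquire pairing can be illustrated with C11 atomics. This is a sketch under assumptions: C11 `<stdatomic.h>` stands in for the kernel's own atomic primitives, and `started_flag`, `boot_data`, `publish_started` and `observe_started` are hypothetical names, not kernel symbols.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

static int boot_data;		/* hypothetical data set up before the flag */
static atomic_int started_flag;	/* models a flag such as smp_started */

/* Writer: fill in the data, then publish the flag with release semantics. */
void
publish_started(int value)
{
	boot_data = value;
	atomic_store_explicit(&started_flag, 1, memory_order_release);
}

/*
 * Reader: an acquire load that observes the flag as set also makes the
 * earlier write to boot_data visible. Returns false if the flag is clear.
 */
bool
observe_started(int *out)
{
	assert(out != NULL);
	if (atomic_load_explicit(&started_flag, memory_order_acquire) == 0)
		return (false);
	*out = boot_data;
	return (true);
}
```

On a machine running in total store order the two calls typically compile to ordinary stores and loads, which is consistent with the message calling the change mostly a NOP.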
Adrian Chadd [Sat, 12 Sep 2015 23:10:34 +0000 (23:10 +0000)]
if_rsu debug fixes:
* use an ath/iwn style debug bitmap - it's still global rather than per-device,
but it's better than debug levels
* disable bgscan - it just makes things unstable/unpredictable for now.
Marius Strobl [Sat, 12 Sep 2015 22:49:32 +0000 (22:49 +0000)]
- Factor out the common and generic parts of the sparc64 host-PCI-bridge
drivers into the revived sys/sparc64/pci/ofw_pci.c, previously already
serving a similar purpose. This has been done with sun4v in mind, which
explains a) the otherwise not that obvious scheme employed and b) why
reusing sys/powerpc/ofw/ofw_pci.c was even less of an option.
- Add a workaround for QEMU once again not emulating real machines, in
this case by not providing the OFW_PCI_CS_MEM64 range. [1]
Michael Tuexen [Sat, 12 Sep 2015 17:08:51 +0000 (17:08 +0000)]
Clean up the handling of error causes for ERROR chunks. This fixes
an inconsistency of the padding handling. The final padding is
now considered to be a chunk padding.
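For context, SCTP pads chunks to a multiple of 4 bytes. A minimal sketch of that rounding follows. The function name `sctp_example_padded_len` is a hypothetical illustration, not a function from the SCTP stack.

```c
#include <stdint.h>

/*
 * Hypothetical helper: round a length up to the next multiple of 4,
 * which is the alignment SCTP uses when padding chunks.
 */
uint32_t
sctp_example_padded_len(uint32_t len)
{
	return ((len + 3U) & ~3U);
}
```

The padding added is the padded length minus the original length, so it is always between 0 and 3 bytes.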
Xin LI [Sat, 12 Sep 2015 09:56:23 +0000 (09:56 +0000)]
MFV r287699: 6214 zpools going south
In r286570 (MFV of r277426) an unprotected write to b_flags to
set the compression mode was introduced. This would open a race
window where data is partially decompressed, modified, checksummed
and written to the pool, resulting in pool corruption due to the
partial decompression.
How many demand reads didn't have to wait for I/O
because of predictive prefetch. (more is better)
zfetch kstats have been simplified to hits, misses, and max_streams,
with max_streams representing times when we were not able to create
a new stream because we already have the maximum number of sequences
for a file.
The sysctl variable/loader tunable vfs.zfs.zfetch.block_cap has been
replaced by vfs.zfs.zfetch.max_distance, which controls maximum bytes
to prefetch per stream.
Reviewed by: Dan McDonald <danmcd@omniti.com>
Reviewed by: Igor Kozhukhov <ikozhukhov@gmail.com>
Reviewed by: George Wilson <george@delphix.com>
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
Approved by: Matthew Ahrens <mahrens@delphix.com>
Author: Arne Jansen <sensille@gmx.net>