Bjoern A. Zeeb [Fri, 25 May 2012 02:23:26 +0000 (02:23 +0000)]
MFp4 bz_ipv6_fast:
Add code to handle pre-checked TCP checksums as indicated by mbuf
flags to save the entire computation for validation if not needed.
In the IPv6 TCP output path only compute the pseudo-header checksum,
set the checksum offset in the mbuf field along with the appropriate flag
as done in IPv4.
In tcp_respond() just initialize the IPv6 payload length to 0 as
ip6_output() will properly set it.
Sponsored by: The FreeBSD Foundation
Sponsored by: iXsystems
Reviewed by: gnn (as part of the whole)
MFC After: 3 days
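The receive-side shortcut in the entry above can be sketched in plain C. This is a minimal sketch, not the kernel code. struct hyp_pkthdr and the HYP_CSUM_* flags are hypothetical stand-ins for the mbuf packet-header checksum fields. The sketch assumes the convention that a correct segment sums to 0xffff when the NIC includes the pseudo-header. The flag test is cheap, and the full software sum is only needed when needs_sw_cksum() returns true.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the mbuf csum_flags bits. */
#define HYP_CSUM_DATA_VALID 0x1u  /* NIC summed the TCP segment */
#define HYP_CSUM_PSEUDO_HDR 0x2u  /* ...including the pseudo-header */

/* Hypothetical subset of the mbuf packet header. */
struct hyp_pkthdr {
    uint32_t csum_flags;
    uint16_t csum_data;           /* folded sum reported by the NIC */
};

/* Fold a 32-bit accumulator to 16 bits with end-around carry. */
static uint16_t fold16(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* True if nothing was pre-checked and software must sum the segment. */
static bool needs_sw_cksum(const struct hyp_pkthdr *h)
{
    return (h->csum_flags & HYP_CSUM_DATA_VALID) == 0;
}

/*
 * Validate from the NIC-provided value only. 'pseudo' is the folded
 * pseudo-header sum; it is used only if the NIC did not include it.
 */
static bool hw_cksum_ok(const struct hyp_pkthdr *h, uint16_t pseudo)
{
    uint16_t sum;

    if (h->csum_flags & HYP_CSUM_PSEUDO_HDR)
        sum = h->csum_data;
    else
        sum = fold16((uint32_t)h->csum_data + pseudo);
    /* A valid segment sums to 0xffff (one's-complement zero). */
    return sum == 0xffff;
}
```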
Bjoern A. Zeeb [Fri, 25 May 2012 02:19:17 +0000 (02:19 +0000)]
MFp4 bz_ipv6_fast:
Defer checksum calculations on UDP6 output and respect the mbuf
flags set by NICs having done checksum validation for us already,
thus saving the computing time in the input path as well.
Sponsored by: The FreeBSD Foundation
Sponsored by: iXsystems
Reviewed by: gnn (as part of the whole)
MFC After: 3 days
Bjoern A. Zeeb [Fri, 25 May 2012 02:17:16 +0000 (02:17 +0000)]
MFp4 bz_ipv6_fast:
Add support for delayed checksum calculations in the IPv6
output path. We currently cannot offload to the card if we
add extension headers (this includes fragmentation).
Fix two SCTP offload support copy&paste bugs: calculate
checksums if fragmenting and no need to flag IPv4 header
checksums in the IPv6 forwarding path.
Sponsored by: The FreeBSD Foundation
Sponsored by: iXsystems
Reviewed by: gnn (as part of the whole)
MFC After: 3 days
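With a delayed checksum, the output path stores only the (non-inverted) pseudo-header sum in the checksum field and records where that field is. The sum over the rest of the segment is finished later. That is done by the NIC, or in software when offload is not possible, for example when extension headers are added. The following is a minimal sketch of the software fallback, assuming a flat buffer rather than an mbuf chain. ocsum() and finish_delayed_csum() are hypothetical helpers.

```c
#include <stddef.h>
#include <stdint.h>

/* Folded one's-complement sum of big-endian 16-bit words, plus a seed. */
static uint16_t ocsum(const uint8_t *p, size_t len, uint16_t seed)
{
    uint64_t sum = seed;

    while (len > 1) {
        sum += ((uint32_t)p[0] << 8) | p[1];
        p += 2;
        len -= 2;
    }
    if (len)                 /* odd trailing byte is padded with zero */
        sum += (uint32_t)p[0] << 8;
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/*
 * Finish a delayed checksum. The output path already stored the
 * pseudo-header sum in the checksum field at seg + csum_off, so summing
 * the whole segment folds it in; invert and write the result back.
 */
static void finish_delayed_csum(uint8_t *seg, size_t seg_len, size_t csum_off)
{
    uint16_t c = (uint16_t)~ocsum(seg, seg_len, 0);

    seg[csum_off] = (uint8_t)(c >> 8);
    seg[csum_off + 1] = (uint8_t)(c & 0xff);
}
```

A receiver that adds the same pseudo-header sum to the finished segment gets 0xffff, which is the usual validity check.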
Adrian Chadd [Fri, 25 May 2012 02:07:59 +0000 (02:07 +0000)]
Prepare for improved (read: pcie) suspend/resume support.
* Flesh out the pcie disable method for 11n chips, as they were defaulting
to the AR5212 (empty) PCIe disable method.
* Add accessor macros for the HAL PCIe enable/disable calls.
* Call disable on ath_suspend()
* Call enable on ath_resume()
NOTE:
* This has nothing to do with the NIC sleep/run state - the NIC will still
stay in the network-run state rather than supporting the network-sleep
state. This is preparation work for supporting correct suspend/resume
WARs for the 11n PCIe NICs.
TODO:
* It may be feasible at this point to keep the chip powered down during
initial probe/attach and only power it up upon the first configure/reset
pass. This however would require correct (for values of "correct")
tracking of the NIC power configuration state from the driver and that
just isn't attempted at the moment.
Tested:
* AR9280 on my Lenovo T60, but with no suspend/resume pass (yet).
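The method-table pattern behind the first two items can be sketched as follows. Everything here is hypothetical: struct hyp_hal, the method names and the hyp_hal_* macros. It only illustrates the pattern. A default empty method, like the AR5212 one, is overridden for 11n chips, and suspend/resume reach it through accessor macros.

```c
#include <stdbool.h>

/* Hypothetical HAL instance with per-chip method pointers. */
struct hyp_hal {
    void (*disable_pcie)(struct hyp_hal *);
    void (*enable_pcie)(struct hyp_hal *);
    bool pcie_on;                     /* simulated link power state */
};

/* Legacy default: an empty method, like the AR5212 one. */
static void legacy_pcie_nop(struct hyp_hal *ah) { (void)ah; }

/* 11n methods that actually change state. */
static void ar11n_disable_pcie(struct hyp_hal *ah) { ah->pcie_on = false; }
static void ar11n_enable_pcie(struct hyp_hal *ah) { ah->pcie_on = true; }

/* Accessor macros, so driver code never dereferences the pointers itself. */
#define hyp_hal_disable_pcie(ah) ((ah)->disable_pcie(ah))
#define hyp_hal_enable_pcie(ah)  ((ah)->enable_pcie(ah))

/* Attach: start from the defaults, then let 11n chips override them. */
static void hyp_hal_attach(struct hyp_hal *ah, bool is_11n)
{
    ah->disable_pcie = legacy_pcie_nop;
    ah->enable_pcie = legacy_pcie_nop;
    ah->pcie_on = true;
    if (is_11n) {
        ah->disable_pcie = ar11n_disable_pcie;
        ah->enable_pcie = ar11n_enable_pcie;
    }
}

static void hyp_suspend(struct hyp_hal *ah) { hyp_hal_disable_pcie(ah); }
static void hyp_resume(struct hyp_hal *ah) { hyp_hal_enable_pcie(ah); }
```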
Bjoern A. Zeeb [Fri, 25 May 2012 01:48:15 +0000 (01:48 +0000)]
MFp4 bz_ipv6_fast:
Hide the ip6aux functions. The only one referenced outside ip6_input.c
is not compiled in yet (__notyet__) in route6.c (r235954). We do have
accessor functions that should be used.
Sponsored by: The FreeBSD Foundation
Sponsored by: iXsystems
Reviewed by: gnn (as part of the whole)
MFC After: 3 days
X-MFC: KPI?
Bjoern A. Zeeb [Fri, 25 May 2012 01:13:39 +0000 (01:13 +0000)]
MFp4 bz_ipv6_fast:
Factor out the tcp_hc_getmtu() call. As the comments say, it
applies to both v4 and v6, so write it only once, making it easier
to read the protocol-family-specific code.
Sponsored by: The FreeBSD Foundation
Sponsored by: iXsystems
Reviewed by: gnn (as part of the whole)
MFC After: 3 days
Bjoern A. Zeeb [Thu, 24 May 2012 23:03:23 +0000 (23:03 +0000)]
MFp4 bz_ipv6_fast:
Significantly update tcp_lro for mostly two things:
1) introduce basic support for IPv6 without extension headers.
2) try hard to get the incremental checksum updates right as well,
especially in the IPv4 case for the IP and TCP headers.
Move variables around for better locality, factor things out into
functions, allow checksum updates to be compiled out, ...
Leave a few comments on further things to look at in the future,
though that is not the full list.
Update drivers with appropriate #includes as needed for IPv6 data
type in LRO.
Sponsored by: The FreeBSD Foundation
Sponsored by: iXsystems
Reviewed by: gnn (as part of the whole)
MFC After: 3 days
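The incremental update mentioned in item 2 is commonly done with the RFC 1624 form HC' = ~(~HC + ~m + m'). Here HC is the old checksum, and m and m' are the old and new 16-bit field values. The following is a minimal C sketch under that assumption. The function names are illustrative and are not the tcp_lro API. A full recomputation is included so the two results can be compared.

```c
#include <stddef.h>
#include <stdint.h>

/* Full Internet checksum over n 16-bit words (for comparison). */
static uint16_t cksum_words(const uint16_t *w, size_t n)
{
    uint32_t sum = 0;

    for (size_t i = 0; i < n; i++) {
        sum += w[i];
        sum = (sum & 0xffff) + (sum >> 16);  /* end-around carry */
    }
    return (uint16_t)~sum;
}

/* RFC 1624 eqn. 3: HC' = ~(~HC + ~m + m') for one changed 16-bit word. */
static uint16_t cksum_update16(uint16_t hc, uint16_t old_word, uint16_t new_word)
{
    uint32_t sum = (uint32_t)(uint16_t)~hc + (uint16_t)~old_word + new_word;

    sum = (sum & 0xffff) + (sum >> 16);
    sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}
```

This avoids re-summing the whole header each time a single field such as a length changes.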
Bjoern A. Zeeb [Thu, 24 May 2012 22:00:48 +0000 (22:00 +0000)]
MFp4 bz_ipv6_fast:
in_cksum.h required ip.h to be included for struct ip. To be
able to use some general checksum functions like in_addword()
in a non-IPv4 context, limit the IPv4-specific functions (which are
also exported to user space) to cases where the ip.h header is
present and IPVERSION is defined (to 4).
We should consider more general checksum (updating) functions
to also allow easier incremental checksum updates in the L3/4
stack and firewalls, as well as ponder further requirements by
certain NIC drivers needing slightly different pseudo values
in offloading cases. Thinking in terms of a better "library".
Sponsored by: The FreeBSD Foundation
Sponsored by: iXsystems
Reviewed by: gnn (as part of the whole)
MFC After: 3 days
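The header change boils down to a conditional-compilation guard. Generic one's-complement helpers stay visible everywhere. Prototypes that need struct ip are only seen when ip.h has already defined IPVERSION as 4. The following is a minimal sketch with hypothetical names: hyp_addword() mirrors what in_addword() is used for, and hyp_cksum_hdr() is illustrative only.

```c
#include <stdint.h>

/*
 * Generic helper, usable without ip.h: add two 16-bit values in
 * one's-complement arithmetic (end-around carry). Hypothetical name.
 */
static inline uint16_t hyp_addword(uint16_t a, uint16_t b)
{
    uint32_t sum = (uint32_t)a + b;

    /* One fold suffices: a + b <= 0x1fffe, so the result fits in 16 bits. */
    return (uint16_t)((sum & 0xffff) + (sum >> 16));
}

#if defined(IPVERSION) && (IPVERSION == 4)
/*
 * IPv4-only helpers that take a struct ip are declared only when ip.h
 * has been included first (hypothetical prototype).
 */
struct ip;
uint16_t hyp_cksum_hdr(const struct ip *ip);
#endif
```

A file that never includes ip.h still gets hyp_addword(), and the IPv4-only prototype simply drops out.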
Marcel Moolenaar [Thu, 24 May 2012 21:23:13 +0000 (21:23 +0000)]
A few improvements:
1. Define all registers. These definitions are needed to support
the FCM driver for direct-connect NAND.
2. Repurpose lbc_read_reg() and lbc_write_reg() for use by localbus
attached device drivers. Use bus_space functions directly in the
lbc driver itself.
3. Be smarter about programming LAWs and mapping memory. The ranges
defined in the FDT are per bank (= chip select) and since we can
have up to 8 banks, we could easily use more than 8 LAWs or TLB
entries when per-bank memory ranges need multiple LAWs or TLBs
due to alignment or size constraints.
We now combine all memory ranges into the fewest possible set of
contiguous regions and program the hardware for that. Thus, a
cleverly written FDT with 8 devices may still only need 1 LAW or
1 TLB entry. Note that the memory ranges can be assigned randomly
to the banks. We sort as we build to handle that.
4. Support the FCM when programming the OR register. This is mostly
for documentation purposes as we do not have a way to define the
mode for a bank.
5. Remove Semihalf-ism: do not define DEBUG (only to undefine it
again).
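The "combine all memory ranges, sort as we build" step of item 3 can be sketched as an insertion into a sorted array. The insertion merges any neighbours that touch or overlap the new range. This is a minimal sketch with a hypothetical struct hyp_range. It deliberately ignores the power-of-two size and alignment rules that real LAW and TLB programming adds on top.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical physical memory range for one bank. */
struct hyp_range {
    uint64_t base;
    uint64_t size;
};

/*
 * Insert [base, base + size) into r[0..n), kept sorted by base, merging
 * every entry it touches or overlaps. Returns the new count. The caller
 * must provide room for at least n + 1 entries.
 */
static size_t hyp_add_range(struct hyp_range *r, size_t n, uint64_t base, uint64_t size)
{
    size_t i, j;

    if (size == 0)
        return n;
    for (i = 0; i < n && r[i].base < base; i++)
        ;                                   /* find insertion point */
    for (j = n; j > i; j--)
        r[j] = r[j - 1];                    /* make room */
    r[i].base = base;
    r[i].size = size;
    n++;
    /* Start merging from the predecessor if it reaches the new range. */
    if (i > 0 && r[i - 1].base + r[i - 1].size >= r[i].base)
        i--;
    /* Absorb every following entry that touches or overlaps r[i]. */
    while (i + 1 < n && r[i].base + r[i].size >= r[i + 1].base) {
        uint64_t end_a = r[i].base + r[i].size;
        uint64_t end_b = r[i + 1].base + r[i + 1].size;

        r[i].size = (end_a > end_b ? end_a : end_b) - r[i].base;
        for (j = i + 1; j + 1 < n; j++)
            r[j] = r[j + 1];
        n--;
    }
    return n;
}
```

Banks that are adjacent in memory, in whatever order the FDT assigns them, end up as one region.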
Marcel Moolenaar [Thu, 24 May 2012 21:07:10 +0000 (21:07 +0000)]
Just return if the size of the window is 0. This can happen when the
FDT does not define all ranges possible for a particular node (e.g.
PCI).
While here, only update the trgt_mem and trgt_io pointers if there's
no error. This avoids knowingly writing an invalid target (= -1).
Marcel Moolenaar [Thu, 24 May 2012 20:58:40 +0000 (20:58 +0000)]
o Rename kernload_ap to bp_kernelload. This introduces a common prefix
for variables that live in the boot page.
o Add bp_trace (yes, it's in the boot page) that gets zeroed before we
try to wake a core and to which the core being woken can write markers
so that we know where the core was in case it doesn't wake up. The
boot code does not yet write markers (to follow).
o Disable the boot page translation to allow the last 4K page to be used
for whatever we please. It would get mapped otherwise.
o Fix kernstart in the case of SMP. The start argument is typically page
aligned due to the alignment requirements that come with having a boot
page. The point of using trunc_page is that we get the actual load
address given that the entry point is immediately following the ELF
headers. In the SMP case this ended up exactly 4K after the load
address. Hence we subtract 1 from start.
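The arithmetic behind the kernstart fix can be shown as a small C sketch. hyp_trunc_page() is a hypothetical stand-in for trunc_page() with a 4 KiB page. The sketch assumes, as the entry states, that the entry point lies within the first page after the load address or exactly at its end.

```c
#include <stdint.h>

#define HYP_PAGE_SIZE 4096u
/* Round an address down to the start of its page (like trunc_page()). */
#define hyp_trunc_page(x) ((uintptr_t)(x) & ~(uintptr_t)(HYP_PAGE_SIZE - 1))

/*
 * Recover the load address from the entry point. Subtracting 1 first
 * keeps an entry point that sits exactly one page past the load
 * address (the SMP case) inside the first page.
 */
static uintptr_t hyp_kernload(uintptr_t start)
{
    return hyp_trunc_page(start - 1);
}
```

With start = load + 0x1000, trunc_page(start) would give load + 0x1000, whereas trunc_page(start - 1) gives load. The non-SMP case, where start is a small offset past load, is unaffected.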
Marcel Moolenaar [Thu, 24 May 2012 20:45:44 +0000 (20:45 +0000)]
Fix the memory barriers for CPUs that do not like lwsync and wedge or cause
exceptions early enough during boot that the kernel will do the same.
Use lwsync only when compiling for LP64 and revert to the more proven isync
when compiling for ILP32. Note that in the end (i.e. between revision 222198
and this change) ILP32 changed from using sync to using isync. As per Nathan
the isync is needed to make sure I/O accesses are properly serialized with
locks and isync tends to be more efficient than sync.
While here, undefine __ATOMIC_ACQ and __ATOMIC_REL at the end of the file
so as not to leak their definitions.
Marcel Moolenaar [Thu, 24 May 2012 20:24:49 +0000 (20:24 +0000)]
Preset (clear) the ranges we're supposed to fill from the FDT. This way, if a
particular range (either I/O memory or I/O port) is not defined in
the FDT, we don't hand uninitialized structures back to our caller.
Marcel Moolenaar [Thu, 24 May 2012 20:12:46 +0000 (20:12 +0000)]
Allow building for the PowerPC EABI by providing a dummy __eabi()
function. The purpose of the __eabi() function is to set up the
runtime and is called first thing by main(). The runtime is already
set up for us prior to calling main, so there's nothing to do for
us in the EABI case.
Marcel Moolenaar [Thu, 24 May 2012 20:00:58 +0000 (20:00 +0000)]
Fix an inconsistency I just ran into for LDADD and DPADD. The description
for both of them uses different, and presumably wrong, variables in the
example. They set LDFILES and SRCLIB respectively. I guess that's what
DPADD and LDADD were called first ...
Marcel Moolenaar [Thu, 24 May 2012 19:48:15 +0000 (19:48 +0000)]
Work better with how make/bmake works:
1. Avoid a cd back into ${.CURDIR} to run mkbuiltins when we know make
will first cd into ${.OBJDIR}. Keep the cwd at what make sets it to.
2. Don't tell mkbuiltins where to write to (= ${.OBJDIR}), but where to
get sources from (= ${.CURDIR}). This compensates for point 1.
This fixes a problem with bmake's mk files that optimize ${.OBJDIR} to
expand to "." after changing cwd, not taking into account that the
target is pretty much undoing that and not getting the full path to the
object tree anymore.
Bjoern A. Zeeb [Thu, 24 May 2012 18:25:09 +0000 (18:25 +0000)]
MFp4 bz_ipv6_fast:
Introduce a (for now, copied and stripped-down) in6_cksum_pseudo()
function. We should be able to use this from in6_cksum() but
we should also ponder possible MD specific improvements.
It takes an extra csum argument to allow for easy checks as
will be done by the upper layer protocol input paths.
Sponsored by: The FreeBSD Foundation
Sponsored by: iXsystems
Reviewed by: gnn (as part of the whole)
MFC After: 3 days
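The following sketch shows what a pseudo-header sum with an extra csum argument can look like. It follows the IPv6 pseudo-header layout of RFC 8200, section 8.1: source address, destination address, 32-bit upper-layer length, three zero bytes and the next header. The name hyp_cksum_pseudo6() and its signature are hypothetical and are not the in6_cksum_pseudo() prototype. It returns the folded but non-inverted sum, which a caller can use in two ways. It can seed an offloaded checksum. Alternatively, it can pass a NIC-reported partial sum in csum and check for 0xffff in one step.

```c
#include <stdint.h>

/*
 * Sum of the IPv6 pseudo-header plus an extra 16-bit partial sum
 * 'csum'. Returns the folded, NOT inverted, result.
 */
static uint16_t hyp_cksum_pseudo6(const uint8_t src[16], const uint8_t dst[16],
    uint32_t len, uint8_t nxt, uint16_t csum)
{
    uint32_t sum = csum;

    for (int i = 0; i < 16; i += 2) {
        sum += ((uint32_t)src[i] << 8) | src[i + 1];
        sum += ((uint32_t)dst[i] << 8) | dst[i + 1];
    }
    sum += len >> 16;          /* 32-bit length as two 16-bit words */
    sum += len & 0xffff;
    sum += nxt;                /* zero bytes + next header: 0x00NN */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}
```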
Gleb Smirnoff [Thu, 24 May 2012 18:22:57 +0000 (18:22 +0000)]
Revert r220768 for ng_ksocket. This node is special and
when it is cloning, its constructor method may be called
in a context that isn't allowed to sleep.
Bjoern A. Zeeb [Thu, 24 May 2012 18:05:10 +0000 (18:05 +0000)]
MFp4 bz_ipv6_fast:
Optimize in6_cksum(), re-ordering work and limiting variable
initialization, removing a bzero() for mostly re-initialized
struct values, making use of the newly introduced in6_getscope(),
as well as converting an if/panic to a KASSERT().
Sponsored by: The FreeBSD Foundation
Sponsored by: iXsystems
Reviewed by: gnn (as part of the whole)
MFC After: 3 days
Alan Cox [Thu, 24 May 2012 15:25:35 +0000 (15:25 +0000)]
MF amd64 r233097, r233122
With the changes over the past year to how accesses to the page's dirty
field are synchronized, there is no need for pmap_protect() to acquire
the page queues lock unless it is going to access the pv lists or
PMAP1/PADDR1.
Alexander Motin [Thu, 24 May 2012 14:07:44 +0000 (14:07 +0000)]
MFprojects/zfsd:
Revamp the CAM enclosure services driver.
This updated driver uses an in-kernel daemon to track state changes and
publishes physical path location information for disk elements into the
CAM device database.
Alexander Motin [Thu, 24 May 2012 11:07:39 +0000 (11:07 +0000)]
MFprojects/zfsd:
- Add low-level support for SATA Enclosure Management Bridge (SEMB)
devices -- SATA equivalents of the SCSI SES/SAF-TE devices.
- Add some utility functions for SCSI SAF-TE devices access.
Maksim Yevmenkin [Wed, 23 May 2012 18:56:29 +0000 (18:56 +0000)]
Tweak the condition for disabling allocation from per-CPU buckets in
low-memory situations. I've observed a situation where per-CPU
allocations were disabled while there were enough free cached pages.
Basically, cnt.v_free_count was sitting stable at a value lower
than cnt.v_free_min and that caused a massive performance drop.
John Baldwin [Wed, 23 May 2012 13:45:52 +0000 (13:45 +0000)]
Rework the previous change to honor MADT processor IDs when probing
processor objects. Instead of forcing the new-bus CPU objects to use
a unit number equal to pc_cpuid, adjust acpi_pcpu_get_id() to honor the
MADT IDs by default. As with the previous change, setting
debug.acpi.cpu_unordered to 1 in the loader will revert to the old
behavior.
John Baldwin [Wed, 23 May 2012 13:41:12 +0000 (13:41 +0000)]
Only check to see if a memory resource is a PCI ROM BAR when activating
and deactivating PCI resources. Previously, if a device had more than
48 MSI interrupts, then activating message 48 (which has a rid == PCIR_BIOS)
would incorrectly try to enable the PCI ROM BAR.
Tested by: Olivier Cinquin ocinquin uci edu
MFC after: 3 days
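The fix amounts to checking the resource type before the rid. The following is a minimal sketch. The HYP_RES_* types are hypothetical stand-ins for the bus resource types. HYP_PCIR_BIOS uses 0x30, the configuration-space offset of the expansion ROM BAR in a type 0 header, which is 48 decimal.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the bus resource types. */
enum hyp_res_type { HYP_RES_IRQ, HYP_RES_MEMORY, HYP_RES_IOPORT };

/* Expansion ROM BAR offset in a type 0 config header (0x30 == 48). */
#define HYP_PCIR_BIOS 0x30

/*
 * Only a memory resource can be the ROM BAR. Testing the rid alone
 * would also match an IRQ resource with rid 48, i.e. MSI message 48.
 */
static bool hyp_is_rom_bar(enum hyp_res_type type, int rid)
{
    return type == HYP_RES_MEMORY && rid == HYP_PCIR_BIOS;
}
```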
Pyun YongHyeon [Wed, 23 May 2012 03:35:08 +0000 (03:35 +0000)]
Don't force the max payload size to 128. The root complex and the endpoint
negotiate the TLP payload size with each other, so blindly
forcing the size to 128 can cause a completion error, which in turn
stops the device.
Reported by: Geans Pin < geanspin <> broadcom dot com >
MFC after: 5 days
Pyun YongHyeon [Wed, 23 May 2012 01:20:25 +0000 (01:20 +0000)]
Make IPMI work in the bce driver even when the interface is
configured down. Formerly, IPMI communication was lost whenever the
interface was not up. The reason was that the BCE_EMAC_MODE
register was not configured with the correct media settings. There
are two parts to the fix.
First, resetting the chip in bce_reset() causes the BCE_EMAC_MODE
register to be initialized to a default value that does not
necessarily correspond to the actual media settings. The fix
implemented here is a bit of a hack. Ideally, at the end of
bce_reset() we would poll the PHY to determine the negotiated media,
and then we would set the BCE_EMAC_MODE register accordingly. That
is difficult, since the PHY is abstracted behind the MII layer and is
not supposed to be queried directly from the MAC driver. Instead,
we read the BCE_EMAC_MODE register at the beginning of bce_reset()
and then restore its media bits to their original values before
returning. If IPMI is up and running, then the link is already
established and the BCE_EMAC_MODE register is already set appropriately
when bce_reset() is called. If IPMI is not running, no harm is
done by preserving the BCE_EMAC_MODE settings. The driver will set
the register properly once the interface is configured up and link
is established.
Second, bce_miibus_statchg() is sometimes called when the link is
down. In that case, the reported media settings are invalid.
Formerly, the driver used them anyway to set up the BCE_EMAC_MODE
register. We now avoid changing any MAC registers unless link is
active and the reported media settings are valid.
Submitted by: jdp
Tested by: jdp
MFC after: 5 days
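Both parts of the fix can be modelled on a simulated register. This is a sketch only. struct hyp_nic, HYP_EMAC_MEDIA_MASK and the reset default are hypothetical and do not reflect the real BCE_EMAC_MODE bit layout.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mask for the media bits of an EMAC-mode-style register. */
#define HYP_EMAC_MEDIA_MASK 0x0000000cu

/* Simulated device with one mode register. */
struct hyp_nic {
    uint32_t emac_mode;
};

/* Chip reset loads a default that ignores the current media. */
static void hyp_chip_reset(struct hyp_nic *sc)
{
    sc->emac_mode = 0x00000100u;      /* hypothetical default value */
}

/* Part 1: reset, but put the saved media bits back afterwards. */
static void hyp_reset_keep_media(struct hyp_nic *sc)
{
    uint32_t saved = sc->emac_mode;

    hyp_chip_reset(sc);
    sc->emac_mode = (sc->emac_mode & ~HYP_EMAC_MEDIA_MASK) |
        (saved & HYP_EMAC_MEDIA_MASK);
}

/* Part 2: ignore media reported while the link is down. */
static void hyp_statchg(struct hyp_nic *sc, bool link_up, uint32_t media_bits)
{
    if (!link_up)
        return;                       /* reported media is not valid */
    sc->emac_mode = (sc->emac_mode & ~HYP_EMAC_MEDIA_MASK) |
        (media_bits & HYP_EMAC_MEDIA_MASK);
}
```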
Adrian Chadd [Tue, 22 May 2012 19:50:21 +0000 (19:50 +0000)]
Re-up the TX ath_buf limit from 128 to 512.
I'll have to leave this high for now, until I've done some significant
surgery with how ath_bufs (and descriptors) are handled.
This should significantly cut down on the opportunities for a full TX
queue hanging traffic. I'll continue making things work though; I'm
mostly doing this for users. :)
Adrian Chadd [Tue, 22 May 2012 19:37:12 +0000 (19:37 +0000)]
Fix some corner cases in the ieee80211_send_bar() handling.
* If the first call succeeded but failed to transmit, a timer would
reschedule it via bar_timeout(). Unfortunately bar_timeout() didn't
check the return value from the ieee80211_send_bar() reattempt and
if that failed (eg the driver ic_raw_xmit() failed), it would never
re-arm the timer.
* If BARPEND is cleared (which ieee80211_send_bar() will do if it can't
TX), then re-arming the timer isn't enough - once bar_timeout() occurs,
it'll see BARPEND is 0 and not run through the rest of the routine.
So when rearming the timer, also set that flag.
* If the TX wasn't occurring, bar_tx_complete() wouldn't be called and the
driver callback wouldn't be called either. So the driver had no idea
that the BAR TX attempt had failed. In the ath(4) case, TX would stay
paused.
(There's no callback to indicate that BAR TX had failed or not;
only a "BAR TX was attempted". That's a separate, later problem.)
So call the driver callback (ic_bar_response()) before the ADDBA session
is torn down, so it has a chance of being notified that things didn't
quite go to plan.
I've verified that yes, this does suspend traffic for ath(4), retry BAR
TX even if the driver is failing ic_raw_xmit(), and then eventually give
up and send a DELBA. I'll address the "out of ath_buf" issue in ath(4)
in a subsequent commit - this commit just fixes the edge case where any
driver is (way) out of internal buffers/descriptors and fails frame TX.
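The retry logic of the three corner cases above can be sketched as a small state machine. All names are hypothetical. barpend stands in for the BARPEND flag, and HYP_BAR_MAXTRIES is an illustrative retry limit. hyp_send_bar() models a send path that clears the flag and does not arm the timer when the driver cannot transmit.

```c
#include <stdbool.h>

#define HYP_BAR_MAXTRIES 3    /* illustrative retry limit */

/* Hypothetical per-TID block-ack state. */
struct hyp_tap {
    bool barpend;             /* BAR outstanding (stands in for BARPEND) */
    bool timer_armed;
    bool driver_notified;     /* driver callback was invoked */
    bool torn_down;           /* session torn down (DELBA sent) */
    int attempts;
};

/* Models the send path: on TX failure, clear the flag and don't arm the timer. */
static void hyp_send_bar(struct hyp_tap *tap, bool tx_ok)
{
    tap->attempts++;
    tap->barpend = tx_ok;
    tap->timer_armed = tx_ok;
}

/* One expiry of the BAR timer; tx_ok says whether the retried TX works. */
static void hyp_bar_timeout(struct hyp_tap *tap, bool tx_ok)
{
    tap->timer_armed = false;
    if (!tap->barpend)
        return;                       /* nothing outstanding */
    if (tap->attempts >= HYP_BAR_MAXTRIES) {
        tap->driver_notified = true;  /* notify the driver first... */
        tap->torn_down = true;        /* ...then tear the session down */
        tap->barpend = false;
        return;
    }
    hyp_send_bar(tap, tx_ok);
    if (!tap->barpend) {
        /* The retry could not be sent: set the flag again and re-arm,
         * or the next expiry would bail out at the check above. */
        tap->barpend = true;
        tap->timer_armed = true;
    }
}
```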
Fix world after byacc import:
- the old yacc(1) used to magically add stdlib.h, while the new one doesn't
- the new yacc(1) declares yyparse by itself; fix the redundant declaration
of 'yyparse'
Fix panic with RACCT that could occur in low memory (or out of swap)
situations, due to fork1() calling racct_proc_exit() without calling
racct_proc_fork() first.
Submitted by: Mateusz Guzik <mjguzik at gmail dot com> (earlier version)
Reviewed by: Mateusz Guzik <mjguzik at gmail dot com>