alc [Sat, 26 May 2012 06:10:25 +0000 (06:10 +0000)]
Rename pmap_collect() to pmap_pv_reclaim() and rewrite it such that it no
longer uses the active and inactive paging queues. Instead, the pmap now
maintains an LRU-ordered list of pv entry pages, and pmap_pv_reclaim() uses
this list to select pv entries for reclamation.
Note: The old pmap_collect() tried to avoid reclaiming mappings for pages
that have either a hold_count or a busy field that is non-zero. However,
this isn't necessary for correctness, and the locking in pmap_collect() was
insufficient to guarantee that such mappings weren't reclaimed. The new
pmap_pv_reclaim() doesn't even try.
kib [Sat, 26 May 2012 05:28:47 +0000 (05:28 +0000)]
Add a vn_bmap_seekhole(9) vnode helper which can be used by any
filesystem which supports VOP_BMAP(9) to implement SEEK_HOLE/SEEK_DATA
commands for lseek(2).
adrian [Fri, 25 May 2012 17:53:57 +0000 (17:53 +0000)]
Add some AR5416/AR5418 WAR's for power-on and suspend/resume:
* Now that ah_configPCIE is called for both power on and suspend/resume,
make sure the right bit(s) are cleared and set when suspending and
resuming. Specifically:
+ force disable/enable the PCIe PHY upon suspend/resume;
+ reprogram the PCIe WAR register when resuming and upon power-on.
* Add a recipe which powers down any PCIe PHY hardware inside the AR5416
(which is the PCI variant) to save on power. I have (currently) no way
to test exactly how much power is saved, if any.
Tested on:
* AR5416 cardbus - although unfortunately pccard/cbb/cardbus currently
detaches the NIC upon suspend, I don't think it's a proper test case.
* AR5418 PCIe attached to expresscard - since we're not doing PCIe APSM,
it's also not likely a full/good test case.
In both instances I went through a handful of suspend/resume cycles and
ensured that the STA vap reassociated correctly.
TODO:
* Setup a laptop to simply sit in a suspend/resume loop, making sure that
the NIC always correctly comes back;
* Start doing suspend/resume tests with actual traffic going on in the
background, as I bet this process is all quite racy at the present;
* Test adhoc/hostap mode, just to be completely sure it's working correctly;
* See if I can jury rig an external power source to an AR5416 to test out
whether ah_disablePCIE() works.
adrian [Fri, 25 May 2012 16:45:56 +0000 (16:45 +0000)]
* According to the reference code, AR_WA_D3_L1_DISBABLE is bit 14.
* Add some other WAR bits (very usefully described too) in preparation for
porting over some suspend/resume fixes from ath9k/Atheros.
gabor [Fri, 25 May 2012 09:30:16 +0000 (09:30 +0000)]
- Only use multi-threading for large files
- Do not use mmap() by default; it can be enabled by --mmap
- Add some minor optimizations for -u
- Update manual page according to the changes
mav [Fri, 25 May 2012 08:30:09 +0000 (08:30 +0000)]
Add tunable/sysctl kern.cam.pmp.hide_special, controlling whether special
PMP ports such as PMP configuration or SEMB should be exposed or hidden.
These ports were always hidden before as useless and sometimes promatic.
But with updated ses driver supporting SEMB it is no longer so straight.
Keep ports hidden by default to avoid probe request ttimeouts if SEP is
not connected to PMP's SEMB via I2C, that is very often situation.
bz [Fri, 25 May 2012 08:17:59 +0000 (08:17 +0000)]
In case forwarding is turned on for a given address family, refuse to
queue the packet for LRO and tell the driver to directly pass it on.
This avoids re-assembly and later re-fragmentation problems when
forwarding.
It's not the best solution but the simplest and most effective for
the moment.
Should have been done: ages ago
Discussed with and by: many
MFC after: 3 days
mav [Fri, 25 May 2012 07:57:17 +0000 (07:57 +0000)]
Remove sleep() from invalidate call in ses driver, waiting for daemon
process exit. Instead use CAM's standard reference counting to prevent
periph going away until process won't complete. I think that sleep in
single CAM SWI thread is not a good idea and may lead to deadlocks if
daemon process waits for some command completion. Combined with recent
patch avoiding use of CAM SWI for ATA it just causes panics because of
sleeps prohibited in interrupt thread context.
avg [Fri, 25 May 2012 07:32:26 +0000 (07:32 +0000)]
device_add_child: protect against child device with no driver but fixed unit number
This combination doesn't make sense, unit numbers should be hardwired
only in context of a known driver. The wildcard devices should have
wildcard unit numbers.
gber [Fri, 25 May 2012 06:48:42 +0000 (06:48 +0000)]
Fix resolving symbol names on ARM.
On ARM, binutils are adding '$a' symbols in the symbol table for
every function (in addition to normal symbol). When gprof(1) looks
up symbol name, it often reads '$a' instead of proper function name,
because it find it first. With this fix, when read symbol name
begins with '$' and previous symbol has the same address, it will
use previous symbol name (which is proper function name).
alc [Fri, 25 May 2012 05:28:14 +0000 (05:28 +0000)]
Correct an error in pmap_pv_reclaim(). In a rare case, when it should have
returned NULL, it might instead return a pointer to a page that it had just
unmapped.
bz [Fri, 25 May 2012 02:23:26 +0000 (02:23 +0000)]
MFp4 bz_ipv6_fast:
Add code to handle pre-checked TCP checksums as indicated by mbuf
flags to save the entire computation for validation if not needed.
In the IPv6 TCP output path only compute the pseudo-header checksum,
set the checksum offset in the mbuf field along the appropriate flag
as done in IPv4.
In tcp_respond() just initialize the IPv6 payload length to 0 as
ip6_output() will properly set it.
Sponsored by: The FreeBSD Foundation
Sponsored by: iXsystems
Reviewed by: gnn (as part of the whole)
MFC After: 3 days
bz [Fri, 25 May 2012 02:19:17 +0000 (02:19 +0000)]
MFp4 bz_ipv6_fast:
Defer checksum calulations on UDP6 output and respect the mbuf
flags set by NICs having done checksum validation for us already,
thus saving the computing time in the input path as well.
Sponsored by: The FreeBSD Foundation
Sponsored by: iXsystems
Reviewed by: gnn (as part of the whole)
MFC After: 3 days
bz [Fri, 25 May 2012 02:17:16 +0000 (02:17 +0000)]
MFp4 bz_ipv6_fast:
Add support for delayed checksum calculations in the IPv6
output path. We currently cannot offload to the card if we
add extension headers (which incl. fragmentation).
Fix two SCTP offload support copy&paste bugs: calculate
checksums if fragmenting and no need to flag IPv4 header
checksums in the IPv6 forwarding path.
Sponsored by: The FreeBSD Foundation
Sponsored by: iXsystems
Reviewed by: gnn (as part of the whole)
MFC After: 3 days
adrian [Fri, 25 May 2012 02:07:59 +0000 (02:07 +0000)]
Prepare for improved (read: pcie) suspend/resume support.
* Flesh out the pcie disable method for 11n chips, as they were defaulting
to the AR5212 (empty) PCIe disable method.
* Add accessor macros for the HAL PCIe enable/disable calls.
* Call disable on ath_suspend()
* Call enable on ath_resume()
NOTE:
* This has nothing to do with the NIC sleep/run state - the NIC still
will stay in network-run state rather than supporting network-sleep
state. This is preparation work for supporting correct suspend/resume
WARs for the 11n PCIe NICs.
TODO:
* It may be feasible at this point to keep the chip powered down during
initial probe/attach and only power it up upon the first configure/reset
pass. This however would require correct (for values of "correct")
tracking of the NIC power configuration state from the driver and that
just isn't attempted at the moment.
Tested:
* AR9280 on my Lenovo T60, but with no suspend/resume pass (yet).
bz [Fri, 25 May 2012 01:48:15 +0000 (01:48 +0000)]
MFp4 bz_ipv6_fast:
Hide the ip6aux functions. The only one referenced outside ip6_input.c
is not compiled in yet (__notyet__) in route6.c (r235954). We do have
accessor functions that should be used.
Sponsored by: The FreeBSD Foundation
Sponsored by: iXsystems
Reviewed by: gnn (as part of the whole)
MFC After: 3 days
X-MFC: KPI?
bz [Fri, 25 May 2012 01:13:39 +0000 (01:13 +0000)]
MFp4 bz_ipv6_fast:
Factor out the tcp_hc_getmtu() call. As the comments say it
applies to both v4 and v6, so only write it once making it easier
to read the protocol family specifc code.
Sponsored by: The FreeBSD Foundation
Sponsored by: iXsystems
Reviewed by: gnn (as part of the whole)
MFC After: 3 days
bz [Thu, 24 May 2012 23:03:23 +0000 (23:03 +0000)]
MFp4 bz_ipv6_fast:
Significantly update tcp_lro for mostly two things:
1) introduce basic support for IPv6 without extension headers.
2) try hard to also get the incremental checksum updates right,
especially also in the IPv4 case for the IP and TCP header.
Move variables around for better locality, factor things out into
functions, allow checksum updates to be compiled out, ...
Leave a few comments on further things to look at in the future,
though that is not the full list.
Update drivers with appropriate #includes as needed for IPv6 data
type in LRO.
Sponsored by: The FreeBSD Foundation
Sponsored by: iXsystems
Reviewed by: gnn (as part of the whole)
MFC After: 3 days
bz [Thu, 24 May 2012 22:00:48 +0000 (22:00 +0000)]
MFp4 bz_ipv6_fast:
in_cksum.h required ip.h to be included for struct ip. To be
able to use some general checksum functions like in_addword()
in a non-IPv4 context, limit the (also exported to user space)
IPv4 specific functions to the times, when the ip.h header is
present and IPVERSION is defined (to 4).
We should consider more general checksum (updating) functions
to also allow easier incremental checksum updates in the L3/4
stack and firewalls, as well as ponder further requirements by
certain NIC drivers needing slightly different pseudo values
in offloading cases. Thinking in terms of a better "library".
Sponsored by: The FreeBSD Foundation
Sponsored by: iXsystems
Reviewed by: gnn (as part of the whole)
MFC After: 3 days
marcel [Thu, 24 May 2012 21:23:13 +0000 (21:23 +0000)]
A few improvements:
1. Define all registers. These definitions are needed to support
the FCM driver for direct-connect NAND.
2. Repurpose lbc_read_reg() and lbc_write_reg() for use by localbus
attached device drivers. Use bus_space functions directly in the
lbc driver itself.
3. Be smarter about programming LAWs and mapping memory. The ranges
defined in the FDT are per bank (= chip select) and since we can
have up to 8 banks, we could easily use more than 8 LAWs or TLB
enrties when per-bank memory ranges need multiple LAWs or TLBs
due to alignment or size constraints.
We now combine all memory ranges into the fewest possible set of
contiguous regions and program the hardware for that. Thus, a
cleverly written FDT with 8 devices may still only need 1 LAW or
1 TLB entry. Note that the memory ranges can be assigned randomly
to the banks. We sort as we build to handle that.
4. Support the FCM when programming the OR register. This is mostly
for documention purposes as we do not have a way to define the
mode for a bank.
5. Remove Semihalf-ism: do not define DEBUG (only to undefine it
again).
marcel [Thu, 24 May 2012 21:07:10 +0000 (21:07 +0000)]
Just return if the size of the window is 0. This can happen when the
FDT does not define all ranges possible for a particular node (e.g.
PCI).
While here, only update the trgt_mem and trgt_io pointers if there's
no error. This avoids that we knowingly write an invalid target (= -1).
marcel [Thu, 24 May 2012 20:58:40 +0000 (20:58 +0000)]
o Rename kernload_ap to bp_kernelload. This to introduce a common prefix
for variables that live in the boot page.
o Add bp_trace (yes, it's in the boot page) that gets zeroed before we
try to wake a core and to which the core being woken can write markers
so that we know where the core was in case it doesn't wake up. The
boot code does not yet write markers (too follow).
o Disable the boot page translation to allow the last 4K page to be used
for whatever we please. It would get mapped otherwise.
o Fix kernstart in the case of SMP. The start argument is typically page
aligned due to the alignment requirements that come with having a boot
page. The point of using trunc_page is that we get the actual load
address given that the entry point is immediately following the ELF
headers. In the SMP case this ended up exactly 4K after the load
address. Hence subtracting 1 from start.
marcel [Thu, 24 May 2012 20:45:44 +0000 (20:45 +0000)]
Fix the memory barriers for CPUs that do not like lwsync and wedge or cause
exceptions early enough during boot that the kernel will do ithe same.
Use lwsync only when compiling for LP64 and revert to the more proven isync
when compiling for ILP32. Note that in the end (i.e. between revision 222198
and this change) ILP32 changed from using sync to using isync. As per Nathan
the isync is needed to make sure I/O accesses are properly serialized with
locks and isync tends to be more effecient than sync.
While here, undefine __ATOMIC_ACQ and __ATOMIC_REL at the end of the file
so as not to leak their definitions.
marcel [Thu, 24 May 2012 20:24:49 +0000 (20:24 +0000)]
Preset (clear) the ranges we're supposed to fill from the FDT. If a
particular range (either I/O memory or I/O port) is not defined in
the FDT, we're not handing uninitialized structures back to our caller.
marcel [Thu, 24 May 2012 20:12:46 +0000 (20:12 +0000)]
Allow building for the PowerPC EABI by providing a dummy __eabi()
function. The purpose of the __eabi() function is to set up the
runtime and is called first thing by main(). The runtime is already
set up for us prior to caling main, so there's nothing to do for
us in the EABI case.
marcel [Thu, 24 May 2012 20:00:58 +0000 (20:00 +0000)]
Fix an inconsistency I just ran into for LDADD and DPADD. The description
for both of them use different, and presumably wrong, variables in the
example. They set LDFILES and SRCLIB respectively. I guess that's what
DPADD and LDADD were called first ...
marcel [Thu, 24 May 2012 19:48:15 +0000 (19:48 +0000)]
Work better with how make/bmake works:
1. Avoid a cd back into ${.CURDIR} to run mkbuiltins when we know make
will first cd into ${.OBJDIR}. Keep the cwd to what make sets it to.
2. Don't tell mkbuiltins where to write to (= ${.OBJDIR}), but where to
get sources from (= ${.CURDIR}). This to compensate for point 1.
This fixes a problem with bmake's mk files that optimize ${.OBJDIR} to
expand to "." after changing cwd, not taking into account that the
target is pretty much undoing that and not getting the full path to the
object tree anymore.
bz [Thu, 24 May 2012 18:25:09 +0000 (18:25 +0000)]
MFp4 bz_ipv6_fast:
Introduce a (for now copied stripped down) in6_cksum_pseudo()
function. We should be able to use this from in6_cksum() but
we should also ponder possible MD specific improvements.
It takes an extra csum argument to allow for easy checks as
will be done by the upper layer protocol input paths.
Sponsored by: The FreeBSD Foundation
Sponsored by: iXsystems
Reviewed by: gnn (as part of the whole)
MFC After: 3 days
glebius [Thu, 24 May 2012 18:22:57 +0000 (18:22 +0000)]
Revert r220768 for ng_ksocket. This node is special and
when it is cloning, its constructor method may be called
in a context that isn't allowed to sleep.
bz [Thu, 24 May 2012 18:05:10 +0000 (18:05 +0000)]
MFp4 bz_ipv6_fast:
Optimize in6_cksum(), re-ordering work and limiting variable
initialization, removing a bzero() for mostly re-initialized
struct values, making use of the newly introduced in6_getscope(),
as well as converting an if/panic to a KASSERT().
Sponsored by: The FreeBSD Foundation
Sponsored by: iXsystems
Reviewed by: gnn (as part of the whole)
MFC After: 3 days
alc [Thu, 24 May 2012 15:25:35 +0000 (15:25 +0000)]
MF amd64 r233097, r233122
With the changes over the past year to how accesses to the page's dirty
field are synchronized, there is no need for pmap_protect() to acquire
the page queues lock unless it is going to access the pv lists or
PMAP1/PADDR1.
mav [Thu, 24 May 2012 14:07:44 +0000 (14:07 +0000)]
MFprojects/zfsd:
Revamp the CAM enclosure services driver.
This updated driver uses an in-kernel daemon to track state changes and
publishes physical path location information\for disk elements into the
CAM device database.
mav [Thu, 24 May 2012 11:07:39 +0000 (11:07 +0000)]
MFprojects/zfsd:
- Add low-level support for SATA Enclosure Management Bridge (SEMB)
devices -- SATA equivalents of the SCSI SES/SAF-TE devices.
- Add some utility functions for SCSI SAF-TE devices access.