Lawrence Stewart [Sat, 16 Oct 2010 05:37:45 +0000 (05:37 +0000)]
- Switch the "net.inet.tcp.reass.cursegments" and
"net.inet.tcp.reass.maxsegments" sysctl variables to be based on UMA zone
stats. The value returned by the cursegments sysctl is approximate owing to
the way in which uma_zone_get_cur is implemented.
- Discontinue use of V_tcp_reass_qsize as a global reassembly segment count
variable in the reassembly implementation. The variable was used without
proper synchronisation and was duplicating accounting done by UMA already. The
lack of synchronisation was particularly problematic on SMP systems
terminating many TCP sessions, resulting in poor TCP performance for
connections with non-zero packet loss.
Sponsored by: FreeBSD Foundation
Reviewed by: andre, gnn, rpaulo (as part of a larger patch)
MFC after: 2 weeks
Lawrence Stewart [Sat, 16 Oct 2010 04:41:45 +0000 (04:41 +0000)]
Change uma_zone_set_max to return the effective value of "nitems" after
rounding. The same value can also be obtained with uma_zone_get_max, but this
change avoids a caller having to make two back-to-back calls.
Sponsored by: FreeBSD Foundation
Reviewed by: gnn, jhb
Lawrence Stewart [Sat, 16 Oct 2010 04:14:45 +0000 (04:14 +0000)]
- Simplify implementation of uma_zone_get_max.
- Add uma_zone_get_cur which returns the current approximate occupancy of
a zone. This is useful for providing stats via sysctl amongst other things.
Marius Strobl [Fri, 15 Oct 2010 23:34:31 +0000 (23:34 +0000)]
Convert the PHY drivers to honor the mii_flags passed down and convert
the NIC drivers as well as the PHY drivers to take advantage of the
mii_attach() introduced in r213878 to get rid of certain hacks. For
the most part these were:
- Artificially limiting miibus_{read,write}reg methods to certain PHY
addresses; we now let mii_attach() only probe the PHY at the desired
address(es) instead.
- PHY drivers setting MIIF_* flags based on the NIC driver they hang
off from, partly even based on grabbing and using the softc of the
parent; we now pass these flags down from the NIC to the PHY drivers
via mii_attach(). This got us rid of all such hacks except those of
brgphy() in combination with bce(4) and bge(4), which is way beyond
what can be expressed with simple flags.
While at it, I took the opportunity to change the NIC drivers to pass
up the error returned by mii_attach() (previously by mii_phy_probe())
and unify the error message used in this case where and as appropriate
as mii_attach() actually can fail for a number of reasons, not just
because of no PHY(s) being present at the expected address(es).
Jung-uk Kim [Fri, 15 Oct 2010 21:39:51 +0000 (21:39 +0000)]
Move setting power state for children into a separate function as they were
essentially the same. This also restores hw.pci.do_power_resume tunable,
which was broken since r211430.
Andreas Tobler [Fri, 15 Oct 2010 20:08:16 +0000 (20:08 +0000)]
Add three new drivers for fan control and temperature reading on the
PowerMac7,2.
- The fcu driver lets us read and write the fan RPMs for all fans in the
PowerMac7,2. This driver is PowerMac specific.
- The ds1775 is a driver to read the temperature for the drive bay sensor.
- The max6690 is another driver to read temperatures. Here it is used to
read the inlet, the backside and the U3 heatsink temperature.
An additional driver, the ad7417, will follow later.
Thanks to nwhitehorn for guiding me through this driver development.
Marius Strobl [Fri, 15 Oct 2010 15:46:58 +0000 (15:46 +0000)]
Now that all previous users of mii_phy_probe() have been converted
in r213893 and r213894 to use mii_attach() instead remove the former
and along with it the "EVIL HACK".
Currently only opt_compat.h is included by the mps(4) driver. Also
enable /dev/mps0, which was missing from my previous patches enabling
f/w upload and download.
Alan Cox [Fri, 15 Oct 2010 15:23:34 +0000 (15:23 +0000)]
Update pmap_extract() to handle 1GB page mappings. Some device drivers
use pmap_extract() rather than pmap_kextract() on direct map addresses.
Thus, pmap_extract() needs to be able to deal with 1GB page mappings if
we are to use 1GB page mappings for the direct map. (See r197580.)
Marius Strobl [Fri, 15 Oct 2010 15:00:30 +0000 (15:00 +0000)]
Converted the remainder of the NIC drivers to use the mii_attach()
introduced in r213878 instead of mii_phy_probe(). Unlike r213893 these
are only straight forward conversions though.
Marius Strobl [Fri, 15 Oct 2010 14:52:11 +0000 (14:52 +0000)]
Convert the PHY drivers to honor the mii_flags passed down and convert
the NIC drivers as well as the PHY drivers to take advantage of the
mii_attach() introduced in r213878 to get rid of certain hacks. For
the most part these were:
- Artificially limiting miibus_{read,write}reg methods to certain PHY
addresses; we now let mii_attach() only probe the PHY at the desired
address(es) instead.
- PHY drivers setting MIIF_* flags based on the NIC driver they hang
off from, partly even based on grabbing and using the softc of the
parent; we now pass these flags down from the NIC to the PHY drivers
via mii_attach(). This got us rid of all such hacks except those of
brgphy() in combination with bce(4) and bge(4), which is way beyond
what can be expressed with simple flags.
While at it, I took the opportunity to change the NIC drivers to pass
up the error returned by mii_attach() (previously by mii_phy_probe())
and unify the error message used in this case where and as appropriate
as mii_attach() actually can fail for a number of reasons, not just
because of no PHY(s) being present at the expected address(es).
David E. O'Brien [Thu, 14 Oct 2010 23:28:31 +0000 (23:28 +0000)]
Embellish this testcase a little bit to be more clear what the output is
and why. The first case is correct usage which has but one correct output.
The 2nd and 3rd cases are incorrect usage in which the exact output is
not standardized and various shells give various allowable output.
Fixes to mps_user_command():
- fix the leak of command struct on error
- simplify the cleanup logic
- EINPROGRESS is not a fatal error
- buggy comment and error message
Marius Strobl [Thu, 14 Oct 2010 22:01:40 +0000 (22:01 +0000)]
Add a NetBSD-compatible mii_attach(), which is intended to eventually
replace mii_phy_probe() altogether. Compared to the latter the advantages
of mii_attach() are:
- intended to be called multiple times in order to attach PHYs in multiple
passes (f.e. in order to only use sub-ranges of the 0 to MII_NPHY - 1
range)
- being able to pass along the capability mask from the NIC to the PHY
drivers
- being able to specify at which address (phyloc) to probe for a PHY
(instead of always probing at all addresses from 0 to MII_NPHY - 1)
- being able to specify which PHY instance (offloc) to attach
- being able to pass along MIIF_* flags from the NIC to the PHY drivers
(f.e. as required to indicated to the PHY drivers that flow control is
supported by the NIC driver, which actually is the motivation for this
change).
While at it, I used the opportunity to get rid of some hacks in mii(4)
like miibus_probe() generally doing work besides sheer probing and the
"EVIL HACK" (which will vanish entirely along with mii_phy_probe()) by
passing the struct ifnet pointer via an argument of mii_attach() as well
as to fix some resource leaks in mii(4) in case something fails.
Commits which will update the PHY drivers to honor the MII flags passed
down from the NIC drivers and take advantage of mii_attach() to get rid
of certain types of hacks in NIC and PHY drivers as well as a conversion
of the remaining uses of mii_phy_probe() will follow shortly.
Marius Strobl [Thu, 14 Oct 2010 21:46:53 +0000 (21:46 +0000)]
Explicitly lower the PIL to 0 as part of enabling interrupts, similar to
what is done on other platforms. Unlike as with the sched_throw(NULL)
called on BSPs during their startup apparently there's nothing which will
reliably lower it on APs. I'm unsure why this only came up on V215 though,
breaking these with r207248. My best guess is that these are the only
supported ones so far fast enough to loose some race.
Revert most of r197682 (EHCI Hardware BUG workaround). Implement
proper solution which is to not use the TERMINATE pointer, but rather
link to a halted TD. The initial fix was due to a misunderstanding
about how the EHCI hardware works. Thanks to Alan Stern for clearing
this up. This patch can increase mass storage read performance
significantly when the IRQ rate is less than 8000 IRQ/s.
Marius Strobl [Thu, 14 Oct 2010 21:34:53 +0000 (21:34 +0000)]
- In the spirit of r212559 add a comment describing what will eventually
lower the PIL.
- Just as with the AP ensure that the (S)TICK timer(s) are in a known
state when starting BSPs.
Avoid using endless retransmission at EHCI hardware level, hence this hide
errors from the applications. Only use endless retransmission while in the
non-addressed state on a High-Speed device.
- Add missing LibUSB API functions:
* libusb_strerror()
* libusb_get_driver[_np]()
* libusb_detach_kernel_driver[_np]()
- Factor out setting of non-blocking flag inside libusb.
- Add missing NULL check after libusb_get_device() call.
- Correct some wrong error codes due to copy and paste error.
PR: usb/150546
Submitted by: Robert Jenssen, Alexander Leidinger
Approved by: thompsa (mentor)
Rui Paulo [Thu, 14 Oct 2010 19:19:19 +0000 (19:19 +0000)]
Revert r213765. This is required because our build infrastructure uses
the host lex instead of the lex built during buildworld. I will MFC the
lex changes soon and in a few weeks this I'll commit again r213765.
Pyun YongHyeon [Thu, 14 Oct 2010 18:31:40 +0000 (18:31 +0000)]
Make sure to not use stale ip/tcp header pointers. The ip/tcp
header parser uses m_pullup(9) to get access to mbuf chain.
m_pullup(9) can allocate new mbuf chain and free old one if the
space left in the mbuf chain is not enough to hold requested
contiguous bytes. Previously drivers can use stale ip/tcp header
pointer if m_pullup(9) returned new mbuf chain.
Reported by: Andrew Boyer (aboyer <> averesystems dot com)
MFC after: 10 days
Pyun YongHyeon [Thu, 14 Oct 2010 17:22:38 +0000 (17:22 +0000)]
It seems some multi-port dc(4) controllers shares SROM of the first
port such that reading station address from second port always
returned 0xFF:0xFF:0xFF:0xFF:0xFF:0xFF Unfortunately it seems there
is no easy way to know whether SROM is shared or not. Workaround
the issue by traversing dc(4) device list and see whether we're
using second port and use station address of controller 0 as base
station address of second port.
Re-work the internals of adding items to the driver's scatter-gather
list. Use the new internals to simplify adding transaction context
elements, and in future diffs, more complicated SGLs.
Gleb Smirnoff [Thu, 14 Oct 2010 11:20:23 +0000 (11:20 +0000)]
Enable the shared memory reference clock driver. The GPS devices are
getting more and more popular, as source of precise time, and the gpsd
daemon from ports is using the shared memory to synchronize with ntpd.
David Xu [Thu, 14 Oct 2010 08:01:33 +0000 (08:01 +0000)]
In kern_sigtimedwait(), move initialization code out of process lock,
instead of using SIGISMEMBER to test every interesting signal, just
unmask the signal set and let cursig() return one, get the signal
after it returns, call reschedule_signal() after signals are blocked
again.
In kern_sigprocmask(), don't call reschedule_signal() when it is
unnecessary.
In reschedule_signal(), replace SIGISEMPTY() + SIGISMEMBER() with
sig_ffs(), rename variable 'i' to sig.
David E. O'Brien [Wed, 13 Oct 2010 23:29:09 +0000 (23:29 +0000)]
Do not assume in growstackstr() that a "precious" character will be
immediately written into the stack after the call. Instead let the caller
manage the "space left".
Previously, growstackstr()'s assumption causes problems with STACKSTRNUL()
where we want to be able to turn a stack into a C string, and later
pretend the NUL is not there.
This fixes a bug in STACKSTRNUL() (that grew the stack) where:
1. STADJUST() called after a STACKSTRNUL() results in an improper adjust.
This can be seen in ${var%pattern} and ${var%%pattern} evaluation.
2. Memory leak in STPUTC() called after a STACKSTRNUL().
Use a safer mechanism for determining if a task is currently running,
that does not rely on the lifetime of pointers being the same. This also
restores the task KBI.
Pyun YongHyeon [Wed, 13 Oct 2010 22:29:48 +0000 (22:29 +0000)]
Fix a regression introduced in r213710. r213710 removed the use of
auto polling such that it made all controllers obtain link status
information from the state of the LNKRDY input signal. Broadcom
recommends disabling auto polling such that driver should rely on
PHY interrupts for link status change indications. Unfortunately it
seems some controllers(BCM5703, BCM5704 and BCM5705) have PHY
related issues so Linux took other approach to workaround it.
bge(4) didn't follow that and it used to enable auto polling to
workaround it. Restore this old behavior for BCM5700 family
controllers and BCM5705 to use auto polling. For BCM5700 and
BCM5701, it seems it does not need to enable auto polling but I
restored it for safety.
Special thanks to marius who tried lots of patches with patience.
USB network (NCM driver):
- correct the ethernet payload remainder which
must be post-offseted by -14 bytes instead of
0 bytes. This is not very clearly defined in the
NCM specification.
- add development feature about limiting the
maximum datagram count in each NCM payload.
- zero-pad alignment data
- add TX-interval tuning sysctl
Pyun YongHyeon [Wed, 13 Oct 2010 21:53:37 +0000 (21:53 +0000)]
Add more checks for resolved link speed in bge_miibus_statchg().
Link UP state could be reported first before actual completion of
auto-negotiation. This change makes bge(4) reprogram BGE_MAC_MODE,
BGE_TX_MODE and BGE_RX_MODE register only after controller got a
valid link.
Pyun YongHyeon [Wed, 13 Oct 2010 17:55:19 +0000 (17:55 +0000)]
Rewrite interrupt handler to give fairness for both RX and TX.
Previously rl(4) continuously checked whether there are RX events
or TX completions in forever loop. This caused TX starvation under
high RX load as well as consuming too much CPU cycles in the
interrupt handler. If interrupt was shared with other devices which
may be always true due to USB devices in these days, rl(4) also
tried to process the interrupt. This means polling(4) was the only
way to mitigate the these issues.
To address these issues, rl(4) now disables interrupts when it
knows the interrupt is ours and limit the number of iteration of
the loop to 16. The interrupt would be enabled again before exiting
interrupt handler if the driver is still running. Because RX buffer
is 64KB in size, the number of iterations in the loop has nothing
to do with number of RX packets being processed. This change
ensures sending TX frames under high RX load.
RX handler drops a driver lock to pass received frames to upper
stack such that there is a window that user can down the interface.
So rl(4) now checks whether driver is still running before serving
RX or TX completion in the loop.
While I'm here, exit interrupt handler when driver initialized
controller.
With this change, now rl(4) can send frames under high RX load even
though the TX performance is still not good(rl(4) controllers can't
queue more than 4 frames at a time so low TX performance was one of
design issue of rl(4) controllers). It's much better than previous
TX starvation and you should not notice RX performance drop with
this change. Controller still shows poor performance under high
network load but for many cases it's now usable without resorting
to polling(4).
John Baldwin [Wed, 13 Oct 2010 13:17:38 +0000 (13:17 +0000)]
Suggest that DEBUG_FLAGS be used to enable extra debugging rather than
frobbing CFLAGS directly. DEBUG_FLAGS is something that can be specified
on the make command line without having to edit the Makefile directly.
Juli Mallett [Wed, 13 Oct 2010 09:17:44 +0000 (09:17 +0000)]
o) Make it possible to attach a PHY directly to an octe device rather than
using miibus, since for some devices that use multiple addresses on the bus,
going through miibus may be unclear, and for devices that are not standard
MII PHYs, miibus may throw a fit, necessitating complicated interfaces to
fake the interface that it expects during probe/attach.
o) Make the mv88e61xx SMI interface in octe attach a PHY directly and fix some
mistakes in the code that resulted from trying too hard to present a nice
interface to miibus.
o) Add a PHY driver for the mv88e61xx. If attached (it is optional in kernel
compiles so the default behavior of having a dumb switch is preserved) it
will place the switch in a VLAN-tagging mode such that each physical port
has a VLAN associated with it and interfaces for the VLANs can be created to
address or bridge between them.
XXX It would be nice for this to be part of a single module including the
SMI interface, and for it to fit into a generic switch configuration
framework and for it to use DSA rather than VLANs, but this is a start
and gives some sense of the parameters of such frameworks that are not
currently present in FreeBSD. In lieu of a switch configuration
interface, per-port media status and VLAN settings are in a sysctl tree.
XXX There may be some minor nits remaining in the handling of broadcast,
multicast and unknown destination traffic. It would also be nice to go
through and replace the few remaining magic numbers with macros at some
point in the future.
XXX This has only been tested with the MV88E6161, but it should work with
minimal or no modification on related switches, so support for probing
them was included.
Thanks to Pat Saavedra of TELoIP and Rafal Jaworowski of Semihalf for their
assistance in understanding the switch chipset.