John Baldwin [Fri, 9 Dec 2011 19:24:17 +0000 (19:24 +0000)]
- Add a test for PR 151758.
- While here, make this compile and work on non-i386:
- Use CMSG_SPACE(), CMSG_LEN(), and CMSG_FIRSTHDR() instead of ignoring
padding between 'struct cmsghdr' and control message payloads.
- Don't initialize the control message before calling recvmsg().
Instead, check that we get a valid control message on return from
recvmsg().
- Use errx() instead of err() for some errors that don't report failures
that set errno.
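A minimal sketch of the CMSG_*() pattern described above, assuming an
AF_UNIX socket pair. The helper names send_fd() and recv_fd() are
hypothetical and are not taken from the test itself. The receiver leaves
the control buffer uninitialized and validates what recvmsg() returned.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: pass one descriptor over a UNIX domain socket. */
int
send_fd(int sock, int fd)
{
	union {
		struct cmsghdr hdr;	/* forces cmsghdr alignment */
		char buf[CMSG_SPACE(sizeof(int))];
	} ctl;
	struct msghdr msg;
	struct cmsghdr *cmsg;
	struct iovec iov;
	char byte = 'x';

	memset(&msg, 0, sizeof(msg));
	memset(&ctl, 0, sizeof(ctl));
	iov.iov_base = &byte;
	iov.iov_len = 1;
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctl.buf;
	msg.msg_controllen = sizeof(ctl.buf);

	/* CMSG_LEN() covers header plus payload; CMSG_SPACE() adds padding. */
	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(fd));

	return (sendmsg(sock, &msg, 0) == 1 ? 0 : -1);
}

/* Hypothetical helper: receive one descriptor, or -1 on any mismatch. */
int
recv_fd(int sock)
{
	union {
		struct cmsghdr hdr;
		char buf[CMSG_SPACE(sizeof(int))];
	} ctl;
	struct msghdr msg;
	struct cmsghdr *cmsg;
	struct iovec iov;
	char byte;
	int fd;

	memset(&msg, 0, sizeof(msg));
	iov.iov_base = &byte;
	iov.iov_len = 1;
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctl.buf;	/* contents deliberately not pre-filled */
	msg.msg_controllen = sizeof(ctl.buf);

	if (recvmsg(sock, &msg, 0) != 1)
		return (-1);

	/* Check that a valid control message actually came back. */
	cmsg = CMSG_FIRSTHDR(&msg);
	if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS ||
	    cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
		return (-1);
	memcpy(&fd, CMSG_DATA(cmsg), sizeof(fd));
	return (fd);
}
```

Sizing the buffer with CMSG_SPACE() and reading the payload through
CMSG_DATA() keeps the code independent of how much padding a given
architecture places after 'struct cmsghdr'.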
Pyun YongHyeon [Fri, 9 Dec 2011 19:10:38 +0000 (19:10 +0000)]
Announce flow control ability to PHY driver and enable RX flow
control. Controller does not automatically generate pause frames
based on number of available RX buffers so it's very hard to
know when driver should generate XON frame in time. The only
mechanism driver can detect low number of RX buffer condition is
ET_INTR_RXRING0_LOW or ET_INTR_RXRING1_LOW interrupt. This
interrupt is generated whenever controller notices the number of
available RX buffers are lower than pre-programmed value(
ET_RX_RING0_MINCNT and ET_RX_RING1_MINCNT register). This scheme
does not provide a way to detect when controller sees enough number
of RX buffers again such that efficient generation of XON/XOFF
frame is not easy.
While here, add more flow control related register definition.
Pyun YongHyeon [Fri, 9 Dec 2011 18:34:45 +0000 (18:34 +0000)]
Remove unnecessary definition of ET_PCIR_BAR. Controller supports
I/O memory only.
While here, use pci_set_max_read_req(9) rather than directly
manipulating PCIe device control register.
Pyun YongHyeon [Fri, 9 Dec 2011 18:17:02 +0000 (18:17 +0000)]
Do not disable interrupt without knowing whether the raised
interrupt is ours. Note, interrupts are automatically ACKed when
the status register is read.
Add RX/TX DMA error to interrupt handler and do full controller
reset if driver happens to encounter these errors. There is no way
to recover from these DMA errors without controller reset.
Rename local variable name intrs with status to enhance
readability.
While I'm here, rename ET_INTR_TXEOF and ET_INTR_RXEOF to
ET_INTR_TXDMA and ET_INTR_RXDMA respectively. These interrupts
indicate that a frame is successfully DMAed to controller's
internal FIFO and they have nothing to do with EOF(end of frame).
Driver does not need to wait actual end of TX/RX of a frame(e.g.
no need to wait the end signal of TX which is generated when a
frame in TX FIFO is emptied by MAC). Previous names were somewhat
confusing.
John Baldwin [Fri, 9 Dec 2011 17:49:34 +0000 (17:49 +0000)]
Explicitly use curthread while manipulating td_fpop during last close
of a devfs file descriptor in devfs_close_f(). The passed in td argument
may be NULL if the close was invoked by garbage collection of open
file descriptors in pending control messages in the socket buffer of a
UNIX domain socket after it was closed.
Peter Holm [Fri, 9 Dec 2011 17:19:41 +0000 (17:19 +0000)]
Move cpu_set_upcall(newtd, td) up before the first call of
thread_free(newtd). This is to avoid a possible page fault in
cpu_thread_clean() as seen on amd64 with syscall fuzzing.
Glen Barber [Fri, 9 Dec 2011 02:30:56 +0000 (02:30 +0000)]
Update du(1):
- Sort arguments alphabetically where appropriate
- '-B blocksize' is not mutually exclusive of '-h|-k|-m'
- Mention '-t' in synopsis
- Other wording improvements
- Update usage() output to reflect the new synopsis [1]
- Other miscellaneous improvements
Ruslan Ermilov [Thu, 8 Dec 2011 13:45:32 +0000 (13:45 +0000)]
Cherry-pick vendor changes to mdoc:
: 2011-10-23 Ingo Schwarze <schwarze@openbsd.org>
:
: [mdoc] Synchronize string tables with the mandoc(1) utility.
:
: * tmac/doc-common: Add many architecture names used in NetBSD and
: OpenBSD (and "arm" from FreeBSD) and remove the duplicate OS version
: entry for Darwin-10.6.0.
:
: * tmac/doc-syms: Add many library names used in NetBSD and FreeBSD.
:
: * tmac/groff_mdoc.man: Document all supported architecture names, OS
: versions, and library names.
:
: 2011-09-11 Joseph Koshy <jkoshy@users.sourceforge.net>
:
: [mdoc] Add some library strings.
:
: * tmac/doc-syms: Add `libdwarf' and `libelf'.
: * tmac/groff_mdoc.man: Document them.
:
: 2011-07-03 Guillem Jover <guillem@debian.org>
:
: mdoc: Update more OS versions strings.
:
: * tmac/doc-common: Add versions strings for NetBSD, OpenBSD,
: FreeBSD, and DragonFly.
Pyun YongHyeon [Wed, 7 Dec 2011 23:20:14 +0000 (23:20 +0000)]
Disable all clocks and put PHY into COMA before entering into
suspend state. This will save more power.
On resume, make sure to enable all clocks. While I'm here, if
controller is not fast ethernet, enable gigabit PHY.
Pyun YongHyeon [Wed, 7 Dec 2011 22:04:57 +0000 (22:04 +0000)]
Consistently use a tab character instead of using either a space or
tab after #define.
While I'm here consistently use capital letters when it uses
hexadecimal notation.
Pyun YongHyeon [Wed, 7 Dec 2011 21:54:44 +0000 (21:54 +0000)]
Protect SIOCSIFMTU ioctl handler with driver lock.
Don't blindly re-initialize controller whenever MTU is changed.
Now, reinitializing is done only when driver is running.
While here, remove unnecessary assignment of error value since it
was already initialized to 0.
Pyun YongHyeon [Wed, 7 Dec 2011 21:29:51 +0000 (21:29 +0000)]
Rework link state tracking and TX/RX MAC configuration.
o Do not report link status if driver is not running.
o TX/RX MAC configuration should be done with resolved speed,
duplex and flow control after establishing a link so it can't
be done in driver initialization routine.
Move the configuration to miibus_statchg callback which will be
called whenever any link state change is detected.
At this moment, flow-control is not enabled yet mainly because
I was not able to set correct flow control parameters to
generate TX pause frames.
o Now TX/RX MAC is enabled only when a valid link is detected.
Rearrange hardware initialization routine a bit to leave
enabling MAC to miibus_statchg callback. In order to do that,
TX/RX DMA engine is enabled in et_init_locked().
o Introduce ET_FLAG_LINK flag to track current link state.
o Introduce ET_FLAG_FASTETHER flag to mark whether controller is
fast ethernet. This flag is checked in miibus_statchg callback
to know whether PHY established a valid link.
o In et_stop(), TX/RX MAC is explicitly disabled instead of
relying on et_reset(). And move et_reset() from et_stop() to
controller initialization. Controller reset is not required here
and it would also clear critical registers(i.e station address,
RX filter configuration, WOL etc) that are required to make WOL
work.
o Switching to current media is done in et_init_locked() after
setting IFF_DRV_RUNNING flag. This should ensure reliable
auto-negotiation/manual link establishment.
o In et_start_locked(), check whether driver got a valid link
before trying to send frames.
o Remove checking a link in et_tick() as this is done by
miibus_statchg callback.
David Chisnall [Wed, 7 Dec 2011 21:17:50 +0000 (21:17 +0000)]
As per das@'s suggestion, s/__noreturn/_Noreturn/, since the latter is an
identifier reserved for the implementation in C99 and earlier so there is
no sensible reason for introducing yet another reserved identifier when we
could just use the one C1x uses.
Dimitry Andric [Wed, 7 Dec 2011 21:00:33 +0000 (21:00 +0000)]
Make it possible to use the debug versions of std::map and std::multimap
with clang, by removing two unneeded using declarations. Otherwise, you
would get errors similar to:
/usr/include/c++/4.2/debug/map.h:77:20: error: dependent using declaration resolved to type without 'typename'
using _Base::value_compare;
^
N.B.: Take care when you actually use the debug versions of any
libstdc++ header. They are more likely to contain problems, because
they are exercised far less often, and since the standard library
complexity guarantees don't always apply anymore, compile times can
drastically increase.
Pyun YongHyeon [Wed, 7 Dec 2011 19:43:04 +0000 (19:43 +0000)]
Remove et_enable_intrs(), et_disable_intrs() functions and
manipulation of interrupt register access is done through
CSR_WRITE_4 macro. Also add disabling interrupt into et_reset()
because we want interrupt disabled state after controller reset.
While I'm here slightly change interrupt handler to be more
readable one.
Pyun YongHyeon [Wed, 7 Dec 2011 19:08:54 +0000 (19:08 +0000)]
Controller does not require TX start command for every frame. So
send a single TX command after setting up all TX frames. This
removes unnecessary register accesses and bus_dmamap_sync(9) calls.
et(4) uses TX interrupt moderation so it's possible to have TX
buffers that were already transmitted but waiting for TX completion
interrupt. If the number of available TX descriptors is less than
1/3 of the total TX descriptors, try reclaiming first to get enough free
TX descriptors before setting up TX descriptors.
After r228325, et_txeof() no longer tries to send frames after
reclaiming TX buffers. That change was made to give more chance
to transmit frames in main interrupt handler since we can still
send frames in interrupt handler with RX interrupt. So right
before exiting interrupt handler, after enabling interrupt, try to
send more frames. This gives slightly better performance numbers.
While I'm here reduce number of spare TX descriptors from 8 to 4.
Controller does not require reserved TX descriptors, it was just to
reduce TX overhead. After r228325, driver has much lower TX
overhead so it does not make sense to reserve 8 TX descriptors.
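The reclaim threshold described above could be sketched as follows. This
is only an illustration: TX_NDESC is an assumed ring size and
tx_should_reclaim() is a hypothetical name, neither taken from et(4).

```c
#include <stdbool.h>

#define	TX_NDESC	512	/* assumed ring size, not taken from et(4) */

/*
 * Return true when fewer than one third of the TX descriptors are free,
 * i.e. when it is worth reclaiming completed descriptors before queueing
 * more frames.  Multiplying avoids integer-division truncation.
 */
bool
tx_should_reclaim(unsigned int free_desc)
{
	return (free_desc * 3 < TX_NDESC);
}
```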
Pyun YongHyeon [Wed, 7 Dec 2011 18:17:09 +0000 (18:17 +0000)]
Overhaul bus_dma(9) usage in et(4) and clean up TX/RX path. This
change should make et(4) work on any architectures.
o Remove m_getl inline function and replace it with standard mbuf
interfaces. Previous code tried to minimize code duplication
but this came from incorrect use of common DMA tag.
Driver may still use a common RX allocation handler with
additional structure changes but I don't see much point in doing
that as it would make the code hard to understand.
o Remove DragonflyBSD specific constant EVL_ENCAPLEN, use
ETHER_VLAN_ENCAP_LEN instead.
o Add bunch of new RX status definition. It seems controller
supports RX checksum offloading but I was not able to make the
feature work yet. Currently driver checks whether received
frame is good one or not.
o Avoid a typedef ending in '_t' as style(9) says.
o Controller has no restriction on DMA address space, so there
is no reason to limit the DMA address to 32bit. Descriptor
rings, status blocks and TX/RX buffers now use full 64bit DMA
addressing.
o Allocate DMA memory shared between host and controller as
coherent.
o Create 3 separate DMA tags to be used as TX, mini RX ring and
standard RX ring. Previously it created a single DMA tag and it
was used for all three rings.
o et(4) does not support jumbo frame at this moment and I still
don't quite understand how jumbo frame works on this controller
so use two RX rings to handle small sized frame and normal sized
frame respectively. The mini RX ring will be used to receive
frames that are less than or equal to 127 bytes. The second RX
ring is used to receive frames that are not handled by the first
RX ring.
If jumbo frame support is implemented, driver may have to choose
better RX scheme by letting the second RX ring handle jumbo
frames. This scheme will mimic Broadcom's efficient jumbo frame
handling feature. However RAM buffer size(16KB) of the
controller is too small to hold 2 jumbo frames, if 9KB
jumbo frame is used, I'm not sure how good performance would it
have.
o In et_rxeof(), make sure to check whether controller received
good frame or not. Passing corrupted frame to upper layer is
bad idea.
o If driver receives a bad frame or driver fails to allocate RX
buffer due to resource shortage condition, reuse previously
loaded DMA map for RX buffer instead of unloading/loading RX
buffer again.
o et_init_tx_ring() never fails so change return type to void.
o In watchdog handler, show TX DMA write back status of errored
frame which could be used as a clue to debug watchdog timeout.
o Add missing bus_dmamap_sync() in various places such that et(4)
should work with bounce buffers(e.g. PAE).
o TX side bus_dmamap_load_mbuf_sg(9) support.
o RX side bus_dmamap_load_mbuf_sg(9) support.
o Controller has no DMA alignment limit in RX buffer so use
m_adj(9) in RX buffer allocation to make IP header align on 2
bytes boundary. Otherwise it would trigger unaligned access
error in upper layer on strict alignment architectures.
One of down side of controller is it provides limited set of RX
buffer length like most Intel controllers. This is not problem
at this moment because driver does not support jumbo frame yet
but it may require alignment fixup code to support jumbo frame
on strict alignment architectures.
o In et_txeof(), don't zero TX descriptors for transmitted frames.
TX descriptors don't need write access after transmission.
Driver sets IFF_DRV_OACTIVE when the number of available TX
descriptors are less than or equal to ET_NSEG_SPARE. Make sure
to clear IFF_DRV_OACTIVE only when the number of available TX
descriptor is greater than ET_NSEG_SPARE.
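The RX alignment item above relies on simple arithmetic, sketched here
under the assumption that the RX buffer starts on a 4-byte boundary and
is shifted by 2 bytes (the value of ETHER_ALIGN). The names below are
hypothetical and the code is not part of the driver.

```c
#include <stddef.h>

#define	ETH_HDR_LEN	14	/* dst MAC + src MAC + EtherType */
#define	RX_PAD		2	/* assumed shift, same value as ETHER_ALIGN */

/* Offset of the IP header from the start of the RX buffer. */
size_t
ip_hdr_offset(size_t pad)
{
	return (pad + ETH_HDR_LEN);
}

/* Nonzero if the IP header lands on a 4-byte boundary. */
int
ip_hdr_is_aligned(size_t pad)
{
	return (ip_hdr_offset(pad) % 4 == 0);
}
```

Without the shift the IP header starts at offset 14, which is only
2-byte aligned. With the shift it starts at offset 16.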
David Chisnall [Wed, 7 Dec 2011 15:25:48 +0000 (15:25 +0000)]
Implement quick_exit() / at_quick_exit() from C++11 / C1x. Also add a
__noreturn macro and modify the other exiting functions to use it.
The __noreturn macro, unlike __dead2, must be used BEFORE the function.
This is in line with the C and C++ specifications that place _Noreturn (c1x)
and [[noreturn]] (C++11) in front of the functions. As with __dead2, this
macro falls back to using the GCC attribute.
Unfortunately, clang currently sets the same value for the C version macro
in C99 and C1x modes, so these functions are hidden by default. At some
point before 10.0, I need to go through the headers and clean up the C1x /
C++11 visibility.
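A small userland sketch of a "no return" marker placed before the
function, with a fallback to the GCC attribute. MY_NORETURN, die_with()
and child_exit_code() are hypothetical names chosen for illustration.
This is not the macro added to the system headers.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Prefer the C1x keyword, otherwise fall back to the GCC attribute. */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L
#define	MY_NORETURN	_Noreturn
#elif defined(__GNUC__)
#define	MY_NORETURN	__attribute__((__noreturn__))
#else
#define	MY_NORETURN
#endif

/* The marker goes BEFORE the declaration, unlike __dead2. */
MY_NORETURN void die_with(int code);

MY_NORETURN void
die_with(int code)
{
	_exit(code);
}

/* Run die_with() in a child process and report its exit status. */
int
child_exit_code(int code)
{
	pid_t pid;
	int status;

	pid = fork();
	if (pid < 0)
		return (-1);
	if (pid == 0)
		die_with(code);
	if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
		return (-1);
	return (WEXITSTATUS(status));
}
```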
Alan Cox [Wed, 7 Dec 2011 07:03:14 +0000 (07:03 +0000)]
Eliminate the possibility of 32-bit arithmetic overflow in the calculation
of vm_kmem_size that may occur if the system administrator has specified a
vm.vm_kmem_size tunable value that exceeds the hard cap.
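The general technique can be sketched as below. This is not the kernel
code: it assumes, for illustration only, a cap of twice physical memory,
a 4096-byte page, and a hypothetical function name. The comparison
divides the request instead of multiplying the page count, so no
intermediate value can wrap in 32 bits.

```c
#include <stdint.h>

#define	PAGE_SZ	4096u	/* assumed page size */

/*
 * Clamp 'requested' bytes to 2 * mem_pages * PAGE_SZ using only 32-bit
 * arithmetic.  The cap is computed only after the check has shown that
 * it is no larger than 'requested', so it cannot overflow.
 */
uint32_t
clamp_kmem_size(uint32_t requested, uint32_t mem_pages)
{
	if (requested / (2 * PAGE_SZ) >= mem_pages)
		return (2 * PAGE_SZ * mem_pages);
	return (requested);
}
```

With 1000000 pages the naive product 2 * 1000000 * 4096 wraps around in
32 bits and would produce a bogus cap. The division form leaves such a
request untouched.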
Force linker error when created shared library contains a relocation
against text. Provide the override switch to turn off the strict
behaviour. Apparently, openssl libcrypto needs it due to assembler
code not being PIC.
Most users of pipe(2) do not call fstat(2) on the returned pipe descriptors.
Optimize for the case, by lazily allocating the pipe inode number at the
fstat(2) time. If alloc_unr(9) returns failure, do not fail fstat(2), since
uses of inode numbers are even rarer than fstat(2), but provide zero inode
forever. Note that alloc_unr() failure is unlikely due to total number
of pipes in the system limited by the number of file descriptors.
Based on the submission by: gianni
MFC after: 2 weeks
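The lazy allocation scheme might look roughly like this in userland. The
structure, the counter-based allocator standing in for alloc_unr(9), and
all names are hypothetical. A failed allocation is remembered so that the
pipe keeps reporting inode 0 afterwards.

```c
#include <stdint.h>

struct pipe_sketch {
	int32_t ino;	/* 0 = not yet requested, -1 = allocation failed */
};

int32_t ino_remaining = 0;	/* free numbers left in the fake pool */
static int32_t next_ino = 1;

/* Hypothetical stand-in for alloc_unr(9): returns -1 when exhausted. */
static int32_t
alloc_ino(void)
{
	if (ino_remaining <= 0)
		return (-1);
	ino_remaining--;
	return (next_ino++);
}

/* Called from the fstat path: allocate on first use only. */
uint32_t
pipe_stat_ino(struct pipe_sketch *p)
{
	if (p->ino == 0)
		p->ino = alloc_ino();
	/* Do not fail fstat; report inode 0 if allocation failed. */
	return (p->ino == -1 ? 0 : (uint32_t)p->ino);
}
```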
c162502
Drop frame if cannot allocate a vtnet_tx_header.
If we don't, we set OACTIVE, but if there are no
other frames in flight, vtnet_txeof() will never
be called to unset OACTIVE. The interface would
have to be down/up'ed in order to become usable.
We could be cuter here and only do this if the
virtqueue is empty, but it's probably not worth
the complication.
c162501
Start mbuf replacement loop at 1 for clarity
Obtained from: Bryan Venteicher bryanv at daemoninthecloset dot org
Pyun YongHyeon [Tue, 6 Dec 2011 00:58:42 +0000 (00:58 +0000)]
Make et_probe() return BUS_PROBE_DEFAULT so that another
driver with higher precedence for the controller can override et(4).
Add missing callout_drain(9) in device detach and rework detach
routine. While I'm here use rman_get_rid(9) instead of using
cached resource id because bus methods are free to change the
id.
Pyun YongHyeon [Tue, 6 Dec 2011 00:18:37 +0000 (00:18 +0000)]
et(4) supports VLAN oversized frame so correctly set header length.
While I'm here remove initializing if_mtu, it is set by
ether_ifattach(9). Also move callout_init_mtx(9) to the right below
driver lock initialization.
Pyun YongHyeon [Mon, 5 Dec 2011 22:55:52 +0000 (22:55 +0000)]
Fix altq(4) support. Also add check for number of available TX
descriptors before trying to send frames. If we're not able to
send a frame, make sure to prepend it to if_snd queue such that
altq(4) should work.
While I'm here prefer ETHER_BPF_MTAP to BPF_MTAP. ETHER_BPF_MTAP
should be used for controllers that support VLAN hardware tag
insertion. The controller supports VLAN tag insertion but lacks
VLAN tag stripping in RX path though.
Marius Strobl [Mon, 5 Dec 2011 21:38:45 +0000 (21:38 +0000)]
- In mii_attach(9) just set the driver for a newly added miibus(4) instance
before calling bus_enumerate_hinted_children(9) (which is the minimum for
this to work) instead of fully probing it so later on we can just call
bus_generic_attach(9) on the parent of the miibus(4) instance. The latter
is necessary in order to work around what seems to be a bizarre race in
newbus affecting a few machines since r227687, causing no driver being
probed for the newly added miibus(4) instance. Presumably this is the
same race that was the motivation for the work around done in r215348.
Reported and tested by: yongari
- Revert the removal of a static in r221913 in order to help compilers to
produce more optimal code.
Mikolaj Golub [Mon, 5 Dec 2011 19:34:02 +0000 (19:34 +0000)]
Protect kern.proc.auxv and kern.proc.ps_strings sysctls with p_candebug().
Citing jilles:
If we are ever going to do ASLR, the AUXV information tells an attacker
where the stack, executable and RTLD are located, which defeats much of
the point of randomizing the addresses in the first place.
Given that the AUXV information seems to be used by debuggers only anyway,
I think it would be good to move it to p_candebug() now.
The full virtual memory maps (KERN_PROC_VMMAP, procstat -v) are already
under p_candebug().
Alan Cox [Mon, 5 Dec 2011 18:29:25 +0000 (18:29 +0000)]
Introduce vm_reserv_alloc_contig() and teach vm_page_alloc_contig() how to
use superpage reservations. So, for the first time, kernel virtual memory
that is allocated by contigmalloc(), kmem_alloc_attr(), and
kmem_alloc_contig() can be promoted to superpages. In fact, even a series
of small contigmalloc() allocations may collectively result in a promoted
superpage.
Eliminate some duplication of code in vm_reserv_alloc_page().
Change the type of vm_reserv_reclaim_contig()'s first parameter in order
that it be consistent with other vm_*_contig() functions.
Pyun YongHyeon [Mon, 5 Dec 2011 18:10:43 +0000 (18:10 +0000)]
Fix off by one error in mbuf access. Previously it caused panic.
While I'm here use NULL to compare mbuf pointer and add additional
check for zero length mbuf before accessing the mbuf.
Ed Schouten [Mon, 5 Dec 2011 16:08:18 +0000 (16:08 +0000)]
Get rid of kludgy per-descriptor state handling in acpi_apm.
Where i386/bios/apm.c requires no per-descriptor state, the ACPI version
of this device does. Instead of using hackish clone lists that leave
stale device nodes lying around, use the cdevpriv API.
Luigi Rizzo [Mon, 5 Dec 2011 15:33:13 +0000 (15:33 +0000)]
add netmap support for "em", "lem", "igb" and "re".
On my hardware, "em" in netmap mode does about 1.388 Mpps
on one card (on an Asus motherboard), and 1.1 Mpps on another
card (PCIe bus). Both seem to be NIC-limited, because
i have the same rate even with the CPU running at 150 MHz.
On the "re" driver the tx throughput is around 420-450 Kpps
on various (8111C and the like) chipsets. On the Rx side
performance seems much better, and i can receive the full
load generated by the "em" cards.
Luigi Rizzo [Mon, 5 Dec 2011 12:06:53 +0000 (12:06 +0000)]
1. Fix the handling of link reset while in netmap mode.
A link reset now is completely transparent for the netmap client:
even if the NIC resets its own ring (e.g. restarting from 0),
the client will not see any change in the current rx/tx positions,
because the driver will keep track of the offset between the two.
2. make the device-specific code more uniform across different drivers
There were some inconsistencies in the implementation of the netmap
support routines, now drivers have been aligned to a common
code structure.
3. import netmap support for ixgbe . This is implemented as a very
small patch for ixgbe.c (233 lines, 11 chunks, mostly comments:
in total the patch has only 54 lines of new code) , as most of
the code is in an external file sys/dev/netmap/ixgbe_netmap.h ,
following some initial comments from Jack Vogel about making
changes less intrusive.
(Note, i have emailed Jack multiple times asking if he had
comments on this structure of the code; i got no reply so
i assume he is fine with it).
Support for other drivers (em, lem, re, igb) will come later.
"ixgbe" is now the reference driver for netmap support. Both the
external file (sys/dev/netmap/ixgbe_netmap.h) and the device-specific
patches (in sys/dev/ixgbe/ixgbe.c) are heavily commented and should
serve as a reference for other device drivers.
Tested on i386 and amd64 with the pkt-gen program in tools/tools/netmap,
the sender does 14.88 Mpps at 1050 MHz and 14.2 Mpps at 900 MHz
on an i7-860 with 4 cores and 82599 card. Haven't tried yet more
aggressive optimizations such as adding 'prefetch' instructions
in the time-critical parts of the code.
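The offset tracking mentioned in item 1 can be illustrated with a small
sketch. The structure and function names are hypothetical and do not
come from the netmap sources. The idea is that the driver translates
between NIC slot indexes and client-visible indexes through an offset
that it recomputes when the NIC restarts its ring at 0.

```c
/* Mapping between NIC ring slots and client-visible ring slots. */
struct ring_map {
	unsigned int size;	/* number of slots in the ring */
	unsigned int offset;	/* client_idx = (nic_idx + offset) % size */
};

unsigned int
nic_to_client(const struct ring_map *m, unsigned int nic_idx)
{
	return ((nic_idx + m->offset) % m->size);
}

unsigned int
client_to_nic(const struct ring_map *m, unsigned int client_idx)
{
	return ((client_idx + m->size - m->offset) % m->size);
}

/*
 * The NIC has reset and restarts from slot 0.  Pick the offset so that
 * NIC slot 0 corresponds to the client's current position, which leaves
 * the client's view of the ring unchanged.
 */
void
on_nic_reset(struct ring_map *m, unsigned int client_cur)
{
	m->offset = client_cur % m->size;
}
```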
Initialize fifoinfo fi_wgen field on open. The only important thing is the
difference between fi_wgen and f_seqcount, so the change is purely
cosmetic, but it makes the code easier to understand.
Rick Macklem [Sun, 4 Dec 2011 16:33:04 +0000 (16:33 +0000)]
This patch adds a sysctl to the NFSv4 server which optionally disables the
check for a UTF-8 compliant file name. Enabling this sysctl results in
an NFSv4 server that is non-RFC3530 compliant, therefore it is not enabled
by default. However, enabling this sysctl results in NFSv3 compatible
behaviour and fixes the problem reported by "dan at sunsaturn.com"
to freebsd-current@ on Nov. 14, 2011 under the subject "NFSV4 readlink_stat".
Tested by: dan at sunsaturn.com
Reviewed by: zack
MFC after: 2 weeks
The "domain-search" option (option 119) allows a DHCP server to publish
a list of implicit domain suffixes used during name lookup. This option
is described in RFC 3397.
For instance, if the domain-search option says:
".example.org .example.com"
and one wants to resolve "foobar", the resolver will try:
1. "foobar.example.org"
2. "foobar.example.com"
The file /etc/resolv.conf is updated with a "search" directive if the
DHCP server provides "domain-search".
A regression test suite is included in this patch under
tools/regression/sbin/dhclient.
PR: bin/151940
Sponsored by Yakaz (http://www.yakaz.com)
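The lookup order in the example above amounts to appending each suffix
to the unqualified name. A minimal sketch, with a hypothetical helper
name and no relation to the actual dhclient or resolver code:

```c
#include <stdio.h>
#include <string.h>

/*
 * Build "name.suffix" in 'out'.  A leading dot in the suffix, as in
 * ".example.org", is skipped.  Returns 0 on success and -1 if the
 * result does not fit in 'outlen' bytes.
 */
int
qualify_name(const char *name, const char *suffix, char *out, size_t outlen)
{
	int n;

	if (suffix[0] == '.')
		suffix++;
	n = snprintf(out, outlen, "%s.%s", name, suffix);
	return ((n < 0 || (size_t)n >= outlen) ? -1 : 0);
}
```

Calling it once per suffix, in the order the DHCP server supplied them,
yields the candidates "foobar.example.org" and then "foobar.example.com".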
Adrian Chadd [Sun, 4 Dec 2011 11:55:33 +0000 (11:55 +0000)]
Allow the i2c node requirements to be slightly relaxed.
These realtek switch PHYs speak a variant of i2c with some slightly
modified handling.
From the submitter, slightly modified now that some further digging
has been done:
The I2C framework makes an assumption that the read/not-write bit of the first
byte (the address) indicates whether reads or writes are to follow.
The RTL8366 family uses the bus: after sending the address+read/not-write byte,
two register address bytes are sent, then the 16-bit register value is sent
or received. While the register write access can be performed as a 4-byte
write, the read access requires the read bit to be set, but the first two bytes
for the register address then need to be transmitted.
This patch maintains the i2c protocol behaviour but allows it to be relaxed
(for these kinds of switch PHYs, and whatever else Realtek may do with this
almost-but-not-quite i2c bus) - by setting the "strict" hint to 0.
The "strict" hint defaults to 1.
Marius Strobl [Sat, 3 Dec 2011 13:51:57 +0000 (13:51 +0000)]
Revert r225889 a bit. While it's correct that in total store order there's
no need to additionally add CPU memory barriers to the acquire variants of
atomic(9), these are documented to also include compiler memory barriers.
So add the latter, which were previously included by using membar(), back.
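As an illustration of a compiler-only barrier in an acquire-style load,
here is a sketch using GCC/Clang inline assembly syntax. The macro and
function names are hypothetical, and this is not the sparc64 atomic(9)
implementation. It emits no CPU barrier instruction and only prevents
the compiler from moving memory accesses across it.

```c
#include <stdint.h>

/* Compiler memory barrier: no instruction emitted, "memory" clobber only. */
#define	compiler_barrier()	__asm__ __volatile__("" : : : "memory")

/* Acquire-style load: later accesses may not be hoisted above the load. */
uint32_t
load_acq_32(volatile uint32_t *p)
{
	uint32_t v;

	v = *p;
	compiler_barrier();
	return (v);
}
```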