When sending packets directly to the DHCP server, use a socket and send
them directly rather than bogusly sending them out as a link-layer broadcast
(which is not received on some networks).
Delay the global registration of the struct ifnet in if_alloc() until after
we're certain the allocation will entirely succeed. This fixes a leak in a
fairly unlikely case.
Reported by: vijay singh <vijjus at rocketmail dot com>
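The fix follows a common allocate-then-publish pattern. A minimal userland sketch, assuming a hypothetical fake_ifnet structure and fake_if_alloc() (not the kernel's struct ifnet or if_alloc()), with a TAILQ standing in for the global interface list:

```c
#include <stdlib.h>
#include <sys/queue.h>

/* Hypothetical stand-in for struct ifnet; field names are illustrative. */
struct fake_ifnet {
	void			*if_l2com;	/* per-type private data */
	TAILQ_ENTRY(fake_ifnet)	 if_link;
};

static TAILQ_HEAD(, fake_ifnet) fake_ifnet_list =
    TAILQ_HEAD_INITIALIZER(fake_ifnet_list);

/* Allocate everything first; publish globally only once nothing can fail. */
static struct fake_ifnet *
fake_if_alloc(size_t l2size)
{
	struct fake_ifnet *ifp;

	ifp = calloc(1, sizeof(*ifp));
	if (ifp == NULL)
		return (NULL);
	if (l2size > 0) {
		ifp->if_l2com = calloc(1, l2size);
		if (ifp->if_l2com == NULL) {
			/* Not yet on the global list, so free() cannot leak it. */
			free(ifp);
			return (NULL);
		}
	}
	TAILQ_INSERT_TAIL(&fake_ifnet_list, ifp, if_link);
	return (ifp);
}
```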
Protect against an errant sender that sends stream
sequence numbers out of order with no missing TSNs (a
Cisco box has this problem, which causes an SSN
to be held forever).
Merge in_pcb.c:1.203, in6_pcb.c:1.88 from HEAD to RELENG_7:
In in_pcbnotifyall() and in6_pcbnotify(), use LIST_FOREACH_SAFE() and
eliminate unnecessary local variable caching of the list head pointer,
making the code a bit easier to read.
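For readers unfamiliar with the _SAFE iterators, a minimal sketch: struct pcb, drop_matching() and the port field are hypothetical, and the fallback macro mirrors the BSD <sys/queue.h> definition for systems whose header lacks it.

```c
#include <stdlib.h>
#include <sys/queue.h>

/* Some <sys/queue.h> versions lack the _SAFE variant; mirror the BSD one. */
#ifndef LIST_FOREACH_SAFE
#define	LIST_FOREACH_SAFE(var, head, field, tvar)			\
	for ((var) = LIST_FIRST((head));				\
	    (var) && ((tvar) = LIST_NEXT((var), field), 1);		\
	    (var) = (tvar))
#endif

struct pcb {
	int		inp_port;
	LIST_ENTRY(pcb)	inp_list;
};

LIST_HEAD(pcbhead, pcb);

/*
 * Remove and free every PCB bound to 'port'.  LIST_FOREACH_SAFE saves the
 * next pointer in 'tmp' before the body runs, so freeing 'inp' is safe.
 */
static int
drop_matching(struct pcbhead *head, int port)
{
	struct pcb *inp, *tmp;
	int dropped = 0;

	LIST_FOREACH_SAFE(inp, head, inp_list, tmp) {
		if (inp->inp_port != port)
			continue;
		LIST_REMOVE(inp, inp_list);
		free(inp);
		dropped++;
	}
	return (dropped);
}
```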
Remove Giant acquisition around soreceive() and sosend() in fifofs.
The bug that caused us to reintroduce it is believed to be fixed,
and Kris says he no longer sees problems with fifofs in highly
parallel builds.
Provide more detailed information about each procstat(1) display mode,
including a key to fields in each mode and flag abbreviations.
Note: mention of POSIX shared memory in the man page has been removed
in the MFC, as explicit kernel support for pshm hasn't been merged to
RELENG_7 yet.
Put the arguments of kse_switchin in local variables, rather than
dereferencing uap throughout. On ia64 uap points into the trapframe
and the call to set_mcontext() in this function will change the
trapframe. Consequently, when we dereference uap afterwards we can
best qualify the behaviour as undefined. By putting the arguments
in local variables we also improve code-generation, because the
compiler is not forced to reload after every function call.
Caught by: Christian Kandeler <christian.kandeler@hob.de>
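A simplified userland model of the hazard, assuming hypothetical types where the argument structure lives inside the frame that a set_mcontext()-like call rewrites. The names only echo the kernel ones; this is not the kse_switchin code:

```c
#include <string.h>

/* Hypothetical argument block; on ia64 uap points into the trapframe. */
struct switchin_args {
	unsigned long	mc;
	unsigned long	loc;
	unsigned long	tmbx;
};

struct trapframe {
	struct switchin_args	tf_args;	/* arguments live in the frame */
	unsigned long		tf_pc;
};

/* Stand-in for set_mcontext(): restoring a context rewrites the frame. */
static void
fake_set_mcontext(struct trapframe *tf, unsigned long mc)
{
	memset(tf, 0, sizeof(*tf));
	tf->tf_args.mc = mc;
}

static unsigned long
switchin(struct trapframe *tf)
{
	struct switchin_args *uap = &tf->tf_args;
	/* Copy the arguments out before anything can clobber the frame. */
	unsigned long loc = uap->loc, tmbx = uap->tmbx;

	fake_set_mcontext(tf, uap->mc);
	/* Reading uap->loc here would now see the rewritten value (0). */
	return (loc + tmbx);
}
```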
- style(9) cleanup.
- dummynet_io() declaration has changed.
- Alter packet flow inside dummynet and introduce a 'fast' mode of dummynet
operation: allow certain packets to bypass the dummynet scheduler. Benefits are:
-- lower latency: if the packet flow does not exceed the pipe bandwidth, packets
will not be delayed (by up to one tick) due to dummynet's scheduler granularity.
-- lower overhead: a packet that bypasses the dummynet scheduler does not have
to re-enter the IP stack later, so such packets can be fastforwarded.
-- recursion (which can lead to kernel stack exhaustion) is eliminated. This fixes
a long-standing panic, which can be triggered this way:
kldload dummynet
sysctl net.inet.ip.fw.one_pass=0
ipfw pipe 1 config bw 0
for i in `jot 30`; do ipfw add 1 pipe 1 icmp from any to any; done
ping -c 1 localhost
- New sysctl nodes:
net.inet.ip.dummynet.io_fast - enables 'fast' dummynet io
net.inet.ip.dummynet.io_pkt - packets passed to dummynet
net.inet.ip.dummynet.io_pkt_fast - packets that bypassed the dummynet scheduler
net.inet.ip.dummynet.io_pkt_drop - packets dropped by dummynet
- Work around a p->numbytes overflow, which can result in an infinite loop inside
the dummynet module (prerequisite is using queues with a "fat" pipe).
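A much simplified model of the 'fast' decision, assuming a hypothetical per-pipe byte credit that is refilled every tick (refill not shown). It only illustrates bypassing the scheduler when the pipe is idle and has bandwidth to spare; dummynet's real data structures and accounting differ:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, heavily simplified pipe state; not dummynet's structures. */
struct pipe_model {
	int64_t		credit;		/* bytes the pipe may still send now */
	unsigned	queued;		/* packets waiting in the scheduler */
	uint64_t	io_pkt;		/* cf. net.inet.ip.dummynet.io_pkt */
	uint64_t	io_pkt_fast;	/* cf. net.inet.ip.dummynet.io_pkt_fast */
};

/*
 * Returns true if the packet may leave immediately (fast path), false if
 * it must wait for the scheduler.  Only an idle pipe with enough credit
 * qualifies, so packets never overtake ones already queued.
 */
static bool
pipe_io(struct pipe_model *p, int64_t len, bool io_fast)
{
	p->io_pkt++;
	if (io_fast && p->queued == 0 && p->credit >= len) {
		p->credit -= len;
		p->io_pkt_fast++;
		return (true);
	}
	p->queued++;
	return (false);
}
```

Requiring an empty queue is the key design choice here: it keeps per-pipe ordering intact while still letting under-limit traffic skip the tick-granular scheduler.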
o Rename cpu_thread_setup() to cpu_thread_alloc() to better communicate that
it relates to (is called by) thread_alloc().
o Add cpu_thread_free(), which is called from thread_free() to counteract
cpu_thread_alloc().
i386: Have cpu_thread_free() call cpu_thread_clean() to preserve behaviour.
ia64: Have cpu_thread_free() call mtx_destroy() for the mutex initialized
in cpu_thread_alloc().
marius [Wed, 23 Apr 2008 21:28:29 +0000 (21:28 +0000)]
- Spelling fix for interupt -> interrupt
- Take advantage of bus_dmamap_load_mbuf_sg(9).
- Take advantage of m_collapse(9).
- Sync with other NIC drivers: if the first attempt to load a TX mbuf fails
with an error other than EFBIG, prepend it back to the queue and stop trying,
instead of freeing it and continuing to enqueue more mbufs. Also ensure
the driver queue isn't empty before trying to enqueue mbufs in order to
reduce locking operations.
- In xl_ifmedia_upd() add a missing XL_UNLOCK(). [1]
- Const'ify the xl_devs array.
- Remove an outdated comment.
marius [Wed, 23 Apr 2008 21:25:16 +0000 (21:25 +0000)]
MFC: bus.h 1.42; bus_machdep.c 1.47
- Const'ify the bus_stream_asi and bus_type_asi arrays.
- Replace hard-coded function names missed in bus_machdep.c rev. 1.44
with __func__.
- Break some long lines.
Take the route mtu into account, if available, when sending an
ICMP unreach, frag needed. Up to now we only looked at the
interface MTU. Make sure to only use the minimum of the two.
In case IPSEC is compiled in, loop the mtu through ip_ipsec_mtu()
to avoid any further conditional maths.
Without this, PMTU was broken in those cases when there was a
route with a lower MTU than the MTU of the outgoing interface.
PR: kern/122338
Tested by: Mark Cammidge mark peralex.com
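The MTU selection reduces to a small helper. A sketch assuming a route MTU of 0 means none is available; icmp_frag_mtu() is a hypothetical name and the ip_ipsec_mtu() adjustment is not modeled:

```c
#include <stdint.h>

/*
 * MTU to report in an ICMP "fragmentation needed" message: the interface
 * MTU, lowered to the route MTU when one is known and smaller.
 */
static uint32_t
icmp_frag_mtu(uint32_t if_mtu, uint32_t rt_mtu)
{
	if (rt_mtu != 0 && rt_mtu < if_mtu)
		return (rt_mtu);
	return (if_mtu);
}
```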
Back out rev. 1.161 (1.157.2.2) and
switch back to optimized TCP options ordering.
A lot of testing has shown that the problem people were seeing was due
to invalid padding after the End of Option List option, which was corrected
in tcp_output.c rev. 1.146 (1.141.2.4).
Thanks to: anders@, s3raphi, Matt Reimer
Thanks to: Doug Hardie and Randy Rose, John Mayer, Susan Guzzardi
Special thanks to: dwhite@ and BitGravity
Discussed with: silby
MFC if_re.c 1.114 to RELENG_7.
Don't touch the MSI enable bit in the RL_CFG2 register. For an unknown reason,
clearing the MSI enable bit on MSI-capable hardware resulted in Tx
problems. The MSI enable bit is set only when MSI is requested by the
user.
MFC if_re.c 1.113, if_rlreg.h 1.74 to RELENG_7.
Padding more bytes than necessary broke other variants of
PCIe RealTek chips. Only pad IP packets if the payload is less than
28 bytes.
MFC if_re.c 1.112 to RELENG_7.
In revisions 1.70, 1.71 and 1.84, re(4) tried to work around checksum
offload bugs by manually padding short IP/UDP frames. Unfortunately
it seems that these workarounds do not work reliably on newer PCIe
variants of RealTek chips.
To work around the hardware bug, always pad short frames if Tx IP
checksum offload is requested. It seems that the hardware has a
bug in IP checksum offload handling. NetBSD manually pads short
frames only when the length of the IP frame is less than 28 bytes, but I
chose 60 bytes for safety. Also unconditionally set the IP checksum
offload bit in the Tx descriptor if any TCP or UDP checksum offload is
requested. This is what Linux does, but it's not
mentioned in the data sheet.
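Padding a short frame amounts to zero-filling up to the minimum length. A sketch with a hypothetical pad_short_frame() helper working on a flat buffer rather than an mbuf chain; the later if_re.c 1.113 entry above narrows when padding is applied, which this sketch does not model:

```c
#include <stddef.h>
#include <string.h>

#define	PAD_MIN_LEN	60	/* minimum frame length chosen in this change */

/*
 * Zero-pad a short frame to PAD_MIN_LEN bytes.  'buf' must have room for
 * at least PAD_MIN_LEN bytes.  Returns the length to hand to the hardware.
 */
static size_t
pad_short_frame(unsigned char *buf, size_t len)
{
	if (len >= PAD_MIN_LEN)
		return (len);
	memset(buf + len, 0, PAD_MIN_LEN - len);
	return (PAD_MIN_LEN);
}
```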
MFC if_re.c 1.110, if_rlreg.h 1.73 to RELENG_7.
For MSI-capable hardware, set the MSI enable bit in the RL_CFG2
register. If MSI was disabled by the hw.re.msi_disable tunable,
explicitly clear the MSI enable bit.
MFC if_re.c 1.108 to RELENG_7.
VLAN hardware tag information should be set for all descriptors of a
multi-descriptor transmission attempt. The datasheet says nothing about
this requirement. This should fix long-standing VLAN hardware
tagging issues with re(4).
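A sketch of the change, assuming a hypothetical two-word descriptor layout (the real re(4) descriptors and tag encoding differ): the tag is written to every descriptor used by the frame, including when the ring index wraps.

```c
#include <stdint.h>

/* Hypothetical, simplified Tx descriptor. */
struct txdesc {
	uint32_t	cmdstat;
	uint32_t	vlanctl;
};

/*
 * Program the same VLAN tag bits into each of the 'nsegs' descriptors
 * starting at 'first', wrapping around a ring of 'ringsz' entries.
 */
static void
set_vlan_all(struct txdesc *ring, unsigned ringsz, unsigned first,
    unsigned nsegs, uint32_t tagbits)
{
	unsigned i, idx;

	for (i = 0, idx = first; i < nsegs; i++, idx = (idx + 1) % ringsz)
		ring[idx].vlanctl = tagbits;
}
```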
MFC if_re.c 1.107 to RELENG_7.
Always honor configured VLAN/checksum offload capabilities.
Previously re(4) blindly enabled VLAN hardware tag stripping
and Rx checksum offload regardless of which optional features were
enabled on the interface.
MFC if_re.c 1.106, if_rl.c 1.173 to RELENG_7.
Don't map memory/IO resources in device probe; just use the PCI
vendor/revision/subdevice ID of the hardware to probe it.
This is what NetBSD does, and it enhances readability
a lot.
MFC if_re.c 1.101, if_re.c 1.102, if_rlreg.h 1.70 to RELENG_7.
Overhaul re(4).
o Increased the number of Rx/Tx descriptors to 256 for 8169 GigEs
because it's hard to push the hardware to the limit with the default
64 descriptors.
TSO requires a large number of Tx descriptors to pass a full-sized
TCP segment (a 65535-byte IP packet) to the hardware. Previously it
consumed 32 Tx descriptors, assuming an MCLBYTES DMA segment size,
to send the TCP segment, which means re(4) couldn't queue more
than two full-sized IP packets.
For 8139C+ it still uses 64 Rx/Tx descriptors due to its hardware
limitations. With these changes there is a (very) small waste of
memory for 8139C+ users, but I don't think it would affect 8139C+
users in most cases.
o Various bus_dma(9) fixes.
- The hardware supports DAC so allow 64bit DMA operations.
- Removed BUS_DMA_ALLOC_NOW flag.
- Increased DMA segment size to 4096 from MCLBYTES because TSO
consumes too many descriptors with MCLBYTES DMA segment size.
- Tx/Rx side bus_dmamap_load_mbuf_sg(9) support. With these
changes the code is more readable than before and has (slightly)
better performance, as it doesn't need to pass/decode
arguments to/from a callback function.
- Removed unnecessary callback function re_dmamap_desc() and
nuked rl_dmaload_arg structure which was used in the callback.
- Additional protection for DMA map load failure. In case of
failure reuse current map instead of returning a bogus DMA
map.
- Deferred DMA map unload/sync operations until we really need to
load a new DMA map, for maximum performance. If we happen to reuse
the current map (e.g. on an input error) there is no need to
sync/unload/load again.
- The number of allowable Tx DMA segments for an mbuf chain is
now 32 instead of a magic nseg value. If there are too few
available Tx descriptors to send a highly fragmented mbuf chain,
an optimized re_defrag() is called to collapse the mbuf chain,
which is supposed to be much faster than m_defrag(9).
re_defrag() was borrowed from ath(4).
- Separated Rx/Tx DMA tags from a common DMA tag such that the Rx DMA
tag correctly uses DMA maps that were created with the DMA alignment
restriction (8-byte alignment). The Tx DMA tag does not have such an
alignment limitation.
- Added additional sanity checks for DMA ring map load failure.
- Added additional spare Rx DMA map for graceful handling of Rx
DMA map load failure.
- Fixed misused bus_dmamap_sync(9) and added missing
bus_dmamap_sync(9) in re_encap()/re_txeof()/re_rxeof().
o Enabled TSO again as re(4) now has a reasonable number of Tx
descriptors.
o Don't touch DMA address of a Tx descriptor in re_txeof(). It's
not needed.
o Fix incorrect update of the if_ierrors counter. For an Rx buffer
shortage it should update if_qdrops, as the buffer is reused.
o Added checks for unsupported H/W revisions and return ENXIO for
such hardware. This is required to remove the resource allocation
code from re_probe, matching what other drivers do in their device
probe routines.
o Modified descriptor index manipulation macros, as it's now possible
to have a different number of descriptors for Rx and Tx.
o In re_start, to save a lock operation, use IFQ_DRV_IS_EMPTY before
trying to invoke IFQ_DRV_DEQUEUE. Also don't blindly call re_encap,
since we already know the number of available Tx descriptors in
advance.
o Removed RL_TX_DESC_THLD, which was used to reserve RL_TX_DESC_THLD
descriptors in the Tx path. No such limitation is mentioned in the
8139C+/8169/8110/8168/8101/8111 datasheets, and it seems to work OK
without reserving RL_TX_DESC_THLD descriptors.
o Fix a comment for RL_GTXSTART. The register is an 8-bit register.
o Added comments for 8169/8139C+ hardware restrictions on descriptors.
o Removed forward declaration for "struct rl_softc", it's not needed.
o Added a new structure rl_txdesc for Tx descriptor management and
a structure rl_rxdesc for Rx descriptor management.
o Removed unused member variable rl_intlock in driver softc. There are
still several unused member variables which are supposed to be used
to access hardware statistics counters, but it seems that access to
hardware counters has not been implemented yet.
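The descriptor arithmetic behind the first item, as a small sketch. It assumes MCLBYTES is 2048 and ignores alignment and header placement, so real chains can need a few more segments; the helper names are hypothetical:

```c
#define	TSO_MAXLEN	65535	/* largest IP packet TSO hands to the driver */

/* Descriptors needed when each DMA segment holds at most 'segsz' bytes. */
static unsigned
descs_per_packet(unsigned pktlen, unsigned segsz)
{
	return ((pktlen + segsz - 1) / segsz);	/* ceiling division */
}

/* How many full-sized TSO packets fit in a ring of 'ringsz' descriptors. */
static unsigned
full_packets_in_ring(unsigned ringsz, unsigned segsz)
{
	return (ringsz / descs_per_packet(TSO_MAXLEN, segsz));
}
```

With 2048-byte segments a full packet takes 32 descriptors, so a 64-entry ring holds two; with 4096-byte segments it takes 16, so a 256-entry ring holds sixteen.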
The final stage of the big CDDL file move. These files were repo
copied in head to their new location under the 'cddl' directories
of src and src/sys and then added for this branch. The build has
been using the files in their new locations for a few days now.
antoine [Sun, 20 Apr 2008 19:32:46 +0000 (19:32 +0000)]
MFC to RELENG_7:
Introduce a new F_DUP2FD command to fcntl(2), for compatibility with
Solaris and AIX.
fcntl(fd, F_DUP2FD, arg) and dup2(fd, arg) are functionally equivalent.
Document it.
Add some regression tests (identical to the dup2(2) regression tests).
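A portable helper, as a sketch: dup_onto() is a hypothetical name, and the #ifdef fallback is there because F_DUP2FD is not defined on every system.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Duplicate 'fd' onto 'newfd', returning 'newfd' or -1.  Use F_DUP2FD
 * where <fcntl.h> provides it; otherwise fall back to the functionally
 * equivalent dup2(2).
 */
static int
dup_onto(int fd, int newfd)
{
#ifdef F_DUP2FD
	return (fcntl(fd, F_DUP2FD, newfd));
#else
	return (dup2(fd, newfd));
#endif
}
```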
Merge audit.c:1.41, audit_arg.c:1.19, audit_bsm.c:1.26,
audit_bsm_klib.c:1.15, audit_bsm_token.c:1.15, audit_pipe.c:1.15,
audit_syscalls.c:1.26, audit_trigger.c:1.8, audit_worker.c:1.23
from HEAD to RELENG_7:
Use __FBSDID() for $FreeBSD$ IDs in the audit code.
antoine [Sun, 20 Apr 2008 16:29:01 +0000 (16:29 +0000)]
MFC to RELENG_7:
Merge changes from NetBSD on humanize_number.c, 1.8 -> 1.13
Significant changes:
- rev. 1.11: Use PRId64 instead of a cast to long long and %lld to print
an int64_t.
- rev. 1.12: Fix a bug where humanize_number() produces "1000" where it
should be "1.0G" or "1.0M". The bug was reported by Greg Troxel.
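The rev. 1.11 idiom in isolation; format_i64() is a hypothetical helper:

```c
#include <inttypes.h>
#include <stdio.h>

/* Format an int64_t portably: PRId64 expands to the right conversion. */
static int
format_i64(char *buf, size_t len, int64_t v)
{
	return (snprintf(buf, len, "%" PRId64, v));
}
```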
MFC some small optimizations:
rev. 1.151: Remove impossible (hk_peer == NULL) check.
rev. 1.152: Remove ng_setisr() call from ng_dequeue().
rev. 1.153: There is no need to erase hook->hk_node before freeing hook.
rev. 1.154: Use new atomic_fetchadd() primitive instead of looping atomic_cmpset().
rev. 1.158: ng_address_hook() microoptimization.
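rev. 1.154 in miniature, using C11 <stdatomic.h> as a stand-in for FreeBSD's atomic(9) primitives (whose names and signatures differ). The compare-and-swap loop retries under contention, while fetch-and-add performs the same update in one operation; both helper names are hypothetical:

```c
#include <stdatomic.h>

/* Old style: retry compare-and-swap until no other CPU raced with us. */
static unsigned
counter_add_cas(atomic_uint *p, unsigned v)
{
	unsigned old = atomic_load(p);

	/* On failure, 'old' is reloaded with the current value. */
	while (!atomic_compare_exchange_weak(p, &old, old + v))
		;
	return (old);
}

/* New style: a single fetch-and-add returns the previous value. */
static unsigned
counter_add_fetch(atomic_uint *p, unsigned v)
{
	return (atomic_fetch_add(p, v));
}
```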