emaste [Fri, 2 Mar 2007 01:44:04 +0000 (01:44 +0000)]
Add "setcounter" and "getcounter" messages, providing the ability
to embed up to four counters in outgoing packets. The message specifies
the offset at which the counter should be inserted as well as the
parameters of the counter.
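A minimal userland sketch of the idea, not the ng_source(4) implementation: a counter is written into a packet buffer at a given offset and then advanced. The struct and function names are hypothetical, and the fixed 32-bit width, network byte order, and wrap-to-minimum behaviour are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical counter description; field names are illustrative only. */
struct embed_counter {
	size_t   offset;    /* byte offset in the packet where the counter goes */
	uint32_t next;      /* value to embed in the next packet */
	uint32_t min;       /* value to wrap back to */
	uint32_t max;       /* largest value to embed (assumes min <= next <= max) */
	uint32_t increment; /* step applied after each packet */
};

/*
 * Write the current counter value at c->offset in big-endian order
 * (an assumption) and advance the counter. Returns 0 on success, or
 * -1 if the counter would not fit inside the packet.
 */
int
embed_counter_apply(struct embed_counter *c, uint8_t *pkt, size_t len)
{
	if (c->offset > len || len - c->offset < 4)
		return (-1);

	pkt[c->offset + 0] = (uint8_t)(c->next >> 24);
	pkt[c->offset + 1] = (uint8_t)(c->next >> 16);
	pkt[c->offset + 2] = (uint8_t)(c->next >> 8);
	pkt[c->offset + 3] = (uint8_t)(c->next);

	/* Wrap to the minimum once another step would pass the maximum. */
	if (c->max - c->next < c->increment)
		c->next = c->min;
	else
		c->next += c->increment;
	return (0);
}
```

Supporting several counters per packet, as the commit describes, would amount to keeping a small array of such descriptions and applying each one in turn.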
emaste [Thu, 1 Mar 2007 23:16:17 +0000 (23:16 +0000)]
Add "settimestamp" and "gettimestamp" messages, providing the ability
to embed a timestamp (struct timeval) in outgoing packets. The message
specifies the offset at which the timestamp should be inserted.
NG_SOURCE(4) gives an example usage that queues an ICMP packet. Using that
example, the following command will insert a timestamp in the ICMP's data
payload:
bms [Thu, 1 Mar 2007 14:38:08 +0000 (14:38 +0000)]
Introduce a new mbuf flag, M_PROMISC.
An mbuf packet chain with the M_PROMISC flag set contains a unicast packet
received by the link layer, which does not correspond to any configured
link layer address in the local system.
It is copied when copying m_pkthdr. It is not cleared when crossing layers.
As such, it is defined to have a flag value which is outside of the
M_PROTO* range, like M_VLANTAG has.
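The reason for placing the flag outside the protocol-specific range can be sketched as follows. The X_ prefixed constants and both helper functions are hypothetical and their values are chosen only for illustration; they are not copied from sys/mbuf.h.

```c
/* Illustrative flag values only; assumptions, not the kernel's definitions. */
#define X_M_PROTO_MASK	0x01f0u		/* stand-in for the M_PROTO* range */
#define X_M_VLANTAG	0x10000u	/* flag that lives outside that range */
#define X_M_PROMISC	0x20000u	/* likewise outside the range */

/*
 * Model of a layer boundary that clears the protocol-specific flags.
 * Flags outside the protocol range pass through untouched.
 */
unsigned int
cross_layer(unsigned int flags)
{
	return (flags & ~X_M_PROTO_MASK);
}

/* Nonzero if the packet was marked as not addressed to this host. */
int
is_promisc(unsigned int flags)
{
	return ((flags & X_M_PROMISC) != 0);
}
```

With this layout, a packet marked at the link layer can still be recognised by upper layers after the protocol flags have been reset.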
bms [Thu, 1 Mar 2007 13:29:30 +0000 (13:29 +0000)]
Fix undirected broadcast sends for the case where SO_DONTROUTE has also
been set at the socket layer, in our somewhat convoluted IPv4 source
selection logic in ip_output().
IP_ONESBCAST is actually a special case of SO_DONTROUTE, as 255.255.255.255
must always be delivered on a local link with a TTL of 1.
If IP_ONESBCAST has been set at the socket layer, also perform destination
interface lookup for point-to-point interfaces based on the destination
address of the link; previously it was not possible to use the option with
such interfaces; also, the destination/broadcast address fields map to the
same field within struct ifnet, which doesn't help matters.
One more valid fix going forward for these issues is to treat 255.255.255.255
as a destination in its own right in the forwarding trie. Other
implementations do this. It fits with the use of multiple paths, though
it then becomes necessary to specify interface preference.
This hack will eventually go away when that comes to pass.
kmacy [Thu, 1 Mar 2007 09:35:48 +0000 (09:35 +0000)]
Evidently I've overestimated gcc's ability to peek inside inline functions
and optimize away unused stack values. The 48 bytes that the lock_profile_object
adds to the stack evidently have a measurable performance impact on certain workloads.
rwatson [Thu, 1 Mar 2007 09:00:42 +0000 (09:00 +0000)]
Remove two simultaneous acquisitions of multiple unpcb locks from
uipc_send in cases where only a global read lock is held by breaking
them out and avoiding the unpcb lock acquire in the common case. This
avoids deadlocks which manifested with X11, and should also marginally
further improve performance.
kientzle [Thu, 1 Mar 2007 06:22:34 +0000 (06:22 +0000)]
Because the buffer gets released immediately, I need to
copy the symlink target name, not just copy the reference.
This problem sometimes caused crashes when extracting
symlinks from ISO9660 images.
jmallett [Wed, 28 Feb 2007 22:49:12 +0000 (22:49 +0000)]
Increase helpfulness in diagnostic message - ypbind running without -ypset or
-ypsetme will prevent use of ypset. Remind the user to check that it was
started correctly.
bms [Wed, 28 Feb 2007 22:05:30 +0000 (22:05 +0000)]
Prepare for 802.1p:
Add macro EVL_APPLY_VLID() which may be used to apply an 802.1q VLAN ID
to the M_VLANTAG field in an mbuf packet header non-destructively.
This will be used by net80211 to begin with.
Add macro EVL_APPLY_PRI() which may be used to apply an 802.1p priority
class to the M_VLANTAG field in an mbuf packet header non-destructively.
Add other macros for manipulating tags and the CFI bit.
Submitted by: Boris Kovalenko (EVL_CFIOFTAG(), EVL_MAKETAG())
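What "non-destructively" means here can be sketched with plain bit operations on a 16-bit 802.1Q tag, which holds a 3-bit priority, a 1-bit CFI, and a 12-bit VLAN ID. The names below are hypothetical stand-ins and are not the EVL_* macros from the commit.

```c
#include <stdint.h>

/* 802.1Q tag control layout: PPPC VVVV VVVV VVVV (priority, CFI, VLAN ID). */
#define TAG_VLID_MASK	0x0fffu
#define TAG_CFI_MASK	0x1000u
#define TAG_PRI_MASK	0xe000u
#define TAG_PRI_SHIFT	13

/* Replace only the VLAN ID bits; priority and CFI are preserved. */
static inline uint16_t
tag_apply_vlid(uint16_t tag, uint16_t vlid)
{
	return ((uint16_t)((tag & ~TAG_VLID_MASK) | (vlid & TAG_VLID_MASK)));
}

/* Replace only the priority bits; VLAN ID and CFI are preserved. */
static inline uint16_t
tag_apply_pri(uint16_t tag, uint16_t pri)
{
	return ((uint16_t)((tag & ~TAG_PRI_MASK) |
	    ((pri & 0x7u) << TAG_PRI_SHIFT)));
}
```

Because each helper masks out only its own field, the two can be applied in either order without disturbing what the other has set.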
bms [Wed, 28 Feb 2007 21:18:38 +0000 (21:18 +0000)]
Nuke ascii2addr() and addr2ascii(). They have no consumers anywhere
in FreeBSD, and originated from INRIA IPv6.
Stub out netstat reference to addr2ascii() I mistakenly introduced.
Update misleading man page sections.
Merge NetBSD's getnameinfo() AF_LINK extensions for a portable way to
print link-layer addresses given a sockaddr_dl(), minus the IEEE 1394
bits which don't map directly to our code.
Obtained from: NetBSD (getnameinfo.c)
Discussed on: current (March 2006)
mohans [Wed, 28 Feb 2007 20:48:00 +0000 (20:48 +0000)]
In the SYN_SENT case, initialize the snd_wnd before the call to tcp_mss().
The TCP hostcache logic in tcp_mss() depends on the snd_wnd being initialized.
bms [Wed, 28 Feb 2007 20:32:25 +0000 (20:32 +0000)]
Remove code which would never be used, vis-a-vis Quality-of-Service;
the token bucket filter got killed in netinet, so it gets killed here
too. Correct comments.
ru [Wed, 28 Feb 2007 20:06:21 +0000 (20:06 +0000)]
Resurrect one of the patches from attic and refine the
lib32 build somewhat. Specifically, instead of spamming
${CC} et al with -I${LIB32TMP}/usr/include which can be
harmful (as has been demonstrated by the ncursesw WIP),
use a slightly different approach to achieve the same goal.
This also simplifies things a bit.
glebius [Wed, 28 Feb 2007 12:41:49 +0000 (12:41 +0000)]
Rearrange the code that handles errors from ip_output() to make it more
readable:
- Merge two embedded if() into one.
- Introduce switch() block to handle different kinds of errors.
grog [Tue, 27 Feb 2007 23:09:31 +0000 (23:09 +0000)]
Further clarifications:
- the issues with wakeup_one are due to address space clashes between
unrelated groups of threads.
- sleep() was removed in FreeBSD 2.2.
- date the page today, not 4 days ago.
- replace grammatically correct "woken" with "woken up" for
consistency with the function name.
imp [Tue, 27 Feb 2007 22:33:50 +0000 (22:33 +0000)]
Some USB mass storage devices return the number of sectors in response
to a READ_CAPACITY request rather than the maximum sector (off by one
problem). This causes a huge cascade of errors as the geom tasting
code tries to read the last sector (which isn't really there in the
face of this error). Automated tools that manipulate disk labels and
such also have issues.
Create a new quirk READ_CAPACITY_OFFBY1 and add a quirk for the
SanDISK ImageMate that I have that suffers from this problem (the
SDDR-31). It intercepts the READ_CAPACITY response and adjusts it
from number of sectors to max sector for devices with this quirk.
Reading the Linux source suggests that there are a host of
other devices with this issue, including iPods and some popular
cameras. I've not added quirks for them, since I don't have the
devices in front of me to test.
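The adjustment can be sketched in userland as follows. This assumes the standard 8-byte READ CAPACITY(10) response (big-endian last LBA followed by big-endian block length); the struct, the function names, and the boolean quirk argument are hypothetical and do not reflect the driver's actual code.

```c
#include <stdint.h>

/* Decoded READ CAPACITY(10) data; names are illustrative. */
struct capacity {
	uint32_t last_lba;	/* address of the last valid sector */
	uint32_t block_len;	/* sector size in bytes */
};

/* Decode a 32-bit big-endian value. */
static uint32_t
be32_decode(const uint8_t *p)
{
	return (((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	    ((uint32_t)p[2] << 8) | (uint32_t)p[3]);
}

/*
 * Decode the response. If offby1 is nonzero, the device is assumed to
 * have reported the sector count instead of the last sector, so the
 * value is reduced by one.
 */
struct capacity
decode_read_capacity(const uint8_t data[8], int offby1)
{
	struct capacity c;

	c.last_lba = be32_decode(data);
	c.block_len = be32_decode(data + 4);
	if (offby1 && c.last_lba > 0)
		c.last_lba--;
	return (c);
}
```

For a device with 1000 sectors that wrongly reports 1000, the quirk path yields a last sector of 999, so a read of the final sector lands on a sector that exists.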
jhb [Tue, 27 Feb 2007 19:40:26 +0000 (19:40 +0000)]
Use pause() in vm_object_deallocate() to yield the CPU to the lock holder
rather than a tsleep() on &proc0. The only wakeup on &proc0 is intended
to awaken the swapper, not random threads blocked in
vm_object_deallocate().
jhb [Tue, 27 Feb 2007 17:27:23 +0000 (17:27 +0000)]
Use pause() instead of tsleep()'s on the softc pointer that have no
corresponding wakeups. Also, at least some of the comments nearby indicate
that these are fixed-length I/O sleeps.
piso [Tue, 27 Feb 2007 17:09:20 +0000 (17:09 +0000)]
Do not execute filter only handlers in ithread_execute_handlers():
this fixes the panics when filter only and ithread only handlers were
sharing the same irq.
kmacy [Tue, 27 Feb 2007 06:42:05 +0000 (06:42 +0000)]
Further improvements to LOCK_PROFILING:
- Fix missing initialization in kern_rwlock.c causing bogus times to be collected
- Move updates to the lock hash to after the lock is released for spin mutexes,
sleep mutexes, and sx locks
- Add new kernel build option LOCK_PROFILE_FAST - only update lock profiling
statistics when an acquisition is contended. This reduces the overhead of
LOCK_PROFILING to a 20%-25% increase in system time, which on
"make -j8 kernel-toolchain" on a dual woodcrest is unmeasurable in terms
of wall-clock time. Contrast this with enabling lock profiling without
LOCK_PROFILE_FAST, where I see a 5x-6x slowdown in wall-clock time.
bde [Tue, 27 Feb 2007 04:54:33 +0000 (04:54 +0000)]
Use a periodic itimer instead of repeated calls to alarm() in
sidewaysintpr(). This increases the accuracy of the per-interval
counts when they are interpreted as rates. Repeated calls to alarm(n)
give an average interval that is about 2 ticks larger than n and has
a large variance. Periodic itimers normally get the average almost
right but have similarly large variance (due to scheduling delays).
Statistics utilities should use clock_gettime() to determine the
actual interval, but it is still useful to maximize the accuracy of
the interval, especially for cases like netstat -w where counts are
displayed so the program cannot hide the inaccuracy in a rate
conversion.
mjacob [Tue, 27 Feb 2007 04:01:58 +0000 (04:01 +0000)]
First cut at GEOM based multipath. This is an active/passive{/passive...}
arrangement that has no intrinsic internal knowledge of whether devices
it is given are truly multipath devices. As such, this is a simplistic
approach, but still a useful one.
The basic approach is to (at present- this will change soon) use camcontrol
to find likely identical devices and label the trailing sector of the
first one. This label contains both a full UUID and a name. The name is
what is presented in /dev/multipath, but the UUID is used as a true
distinguisher at g_taste time, thus making sure we don't have chaos
on a shared SAN where everyone names their data multipath as "Fred".
The first of N identical devices (and N *may* be 1!) becomes the active
path until a BIO request is failed with EIO or ENXIO. When this occurs,
the active disk is ripped away and the next in a list is picked to
(retry and) continue with.
During g_taste events new disks that meet the match criteria for existing
multipath geoms get added to the tail end of the list.
Thus, this active/passive setup actually does work for devices which
go away and come back, as do (now) mpt(4) and isp(4) SAN based disks.
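The failover policy described above can be modelled in a few lines of userland C. This is a sketch of the policy only, using hypothetical names and a fixed-size array; it does not use the GEOM API and is not taken from the module's source.

```c
#include <errno.h>
#include <stddef.h>

#define MP_MAX_PATHS	8

/* Hypothetical multipath state: paths[0] is the active path. */
struct mp_geom {
	const char *paths[MP_MAX_PATHS];
	size_t	    npaths;
};

/* A newly tasted matching disk goes to the tail of the list. */
int
mp_add_path(struct mp_geom *g, const char *name)
{
	if (g->npaths == MP_MAX_PATHS)
		return (-1);
	g->paths[g->npaths++] = name;
	return (0);
}

/* Current active path, or NULL if none is left. */
const char *
mp_active(const struct mp_geom *g)
{
	return (g->npaths > 0 ? g->paths[0] : NULL);
}

/*
 * Handle completion of a request on the active path. On EIO or ENXIO
 * the active path is dropped and the next one takes over. Returns 1 if
 * the request should be retried on the new active path, 0 otherwise.
 */
int
mp_io_done(struct mp_geom *g, int error)
{
	size_t i;

	if ((error != EIO && error != ENXIO) || g->npaths == 0)
		return (0);
	for (i = 1; i < g->npaths; i++)
		g->paths[i - 1] = g->paths[i];
	g->npaths--;
	return (g->npaths > 0);
}
```

A path that disappears and later returns is simply tasted again and appended to the tail, which is why the arrangement tolerates devices that go away and come back.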
There is still a lot to do to improve this- like about 5 of the 12
recommendations I've received about it, but it's been functional enough
for a while that it deserves a broader test base.
Reviewed by: pjd
Sponsored by: IronPort Systems
MFC: 2 months