jmallett [Wed, 28 Feb 2007 22:49:12 +0000 (22:49 +0000)]
Increase helpfulness in diagnostic message - ypbind running without -ypset or
-ypsetme will prevent use of ypset. Remind the user to check that it was
started correctly.
bms [Wed, 28 Feb 2007 22:05:30 +0000 (22:05 +0000)]
Prepare for 802.1p:
Add macro EVL_APPLY_VLID() which may be used to apply an 802.1q VLAN ID
to the M_VLANTAG field in an mbuf packet header non-destructively.
This will be used by net80211 to begin with.
Add macro EVL_APPLY_PRI() which may be used to apply an 802.1p priority
class to the M_VLANTAG field in an mbuf packet header non-destructively.
Add other macros for manipulating tags and the CFI bit.
Submitted by: Boris Kovalenko (EVL_CFIOFTAG(), EVL_MAKETAG())
bms [Wed, 28 Feb 2007 21:18:38 +0000 (21:18 +0000)]
Nuke ascii2addr() and addr2ascii(). They have no consumers anywhere
in FreeBSD, and originated from INRIA IPv6.
Stub out netstat reference to addr2ascii() I mistakenly introduced.
Update misleading man page sections.
Merge NetBSD's getnameinfo() AF_LINK extensions for a portable way to
print link-layer addresses given a sockaddr_dl(), minus the IEEE 1394
bits which don't map directly to our code.
Obtained from: NetBSD (getnameinfo.c)
Discussed on: current (March 2006)
mohans [Wed, 28 Feb 2007 20:48:00 +0000 (20:48 +0000)]
In the SYN_SENT case, Initialize the snd_wnd before the call to tcp_mss().
The TCP hostcache logic in tcp_mss() depends on the snd_wnd being initialized.
bms [Wed, 28 Feb 2007 20:32:25 +0000 (20:32 +0000)]
Remove code which would never be used, viz a viz Quality-of-Service;
the token bucket filter got killed in netinet, so it gets killed here
too. Correct comments.
ru [Wed, 28 Feb 2007 20:06:21 +0000 (20:06 +0000)]
Resurrect one of the patches from attic and refine the
lib32 build somewhat. Specifically, instead of spamming
${CC} et al with -I${LIB32TMP}/usr/include which can be
harmful (as has been demonstrated by the ncursesw WIP),
use slightly different approach to achieve the same goal.
This also simplifies things a bit.
glebius [Wed, 28 Feb 2007 12:41:49 +0000 (12:41 +0000)]
Toss the code, that handles errors from ip_output(), to make it more
readable:
- Merge two embedded if() into one.
- Introduce switch() block to handle different kinds of errors.
grog [Tue, 27 Feb 2007 23:09:31 +0000 (23:09 +0000)]
Furhter clarifications:
- the issues with wakeup_one are due to address space clashes between
unrelated groups of threads.
- sleep() was removed in FreeBSD 2.2.
- date the page today, not 4 days ago.
- replace grammatically correct "woken" with "woken up" for
consistency with the function name.
imp [Tue, 27 Feb 2007 22:33:50 +0000 (22:33 +0000)]
Some USB mass storage devices return the number of sectors in response
to a READ_CAPACITY request rather than the maximum sector (off by one
problem). This causes a huge cascade of errors as the geom tasting
code tries to read the last sector (which isn't really there in the
face of this error). automated tools that manipulate disk labels and
such also have issues.
Create a new quirk READ_CAPACITY_OFFBY1 and add a quirk for the
SanDISK ImageMate that I have that suffers from this problem (the
SDDR-31). It intercepts the READ_CAPACITY response and adjusts it
from number of sectors to max sector for devices with this quirk.
Reading the Linux source suggests that there are a host of
other devices with this issue, including iPods and some popular
cameras. I've not added quirks for them, since I don't have the
devices in front of me to test.
jhb [Tue, 27 Feb 2007 19:40:26 +0000 (19:40 +0000)]
Use pause() in vm_object_deallocate() to yield the CPU to the lock holder
rather than a tsleep() on &proc0. The only wakeup on &proc0 is intended
to awaken the swapper, not random threads blocked in
vm_object_deallocate().
jhb [Tue, 27 Feb 2007 17:27:23 +0000 (17:27 +0000)]
Use pause() instead of tsleep()'s on the softc pointer that have no
corresponding wakeups. Also, at least some of the comments nearby indicate
that these are fixed-length I/O sleeps.
piso [Tue, 27 Feb 2007 17:09:20 +0000 (17:09 +0000)]
Do not execute filter only handlers in ithread_execute_handlers():
this fixes the panics when filter only and ithread only handlers where
sharing the same irq .
kmacy [Tue, 27 Feb 2007 06:42:05 +0000 (06:42 +0000)]
Further improvements to LOCK_PROFILING:
- Fix missing initialization in kern_rwlock.c causing bogus times to be collected
- Move updates to the lock hash to after the lock is released for spin mutexes,
sleep mutexes, and sx locks
- Add new kernel build option LOCK_PROFILE_FAST - only update lock profiling
statistics when an acquisition is contended. This reduces the overhead of
LOCK_PROFILING to increasing system time by 20%-25% which on
"make -j8 kernel-toolchain" on a dual woodcrest is unmeasurable in terms
of wall-clock time. Contrast this to enabling lock profiling without
LOCK_PROFILE_FAST and I see a 5x-6x slowdown in wall-clock time.
bde [Tue, 27 Feb 2007 04:54:33 +0000 (04:54 +0000)]
Use a periodic itimer instead of repeated calls to alarm() in
sidewaysintpr(). This increases the accuracy of the per-interval
counts when they are interpreted as rates. Repeated calls to alarm(n)
give an average interval that is about 2 ticks larger than n and has
a large variance. Periodic itimers normally get the average almost
right but have similarly large variance (due to scheduling delays).
Statistics utilities should use clock_gettime() to determine the
actual interval, but it is still useful to maximize the accuracy of
the interval, especially for cases like netstat -w where counts are
displayed so the program cannot hide the inaccuracy in a rate
conversion.
mjacob [Tue, 27 Feb 2007 04:01:58 +0000 (04:01 +0000)]
First cut at GEOM based multipath. This is an active/passive{/passive...}
arrangement that has no intrinsic internal knowledge of whether devices
it is given are truly multipath devices. As such, this is a simplistic
approach, but still a useful one.
The basic approach is to (at present- this will change soon) use camcontrol
to find likely identical devices and and label the trailing sector of the
first one. This label contains both a full UUID and a name. The name is
what is presented in /dev/multipath, but the UUID is used as a true
distinguishor at g_taste time, thus making sure we don't have chaos
on a shared SAN where everyone names their data multipath as "Fred".
The first of N identical devices (and N *may* be 1!) becomes the active
path until a BIO request is failed with EIO or ENXIO. When this occurs,
the active disk is ripped away and the next in a list is picked to
(retry and) continue with.
During g_taste events new disks that meet the match criteria for existing
multipath geoms get added to the tail end of the list.
Thus, this active/passive setup actually does work for devices which
go away and come back, as do (now) mpt(4) and isp(4) SAN based disks.
There is still a lot to do to improve this- like about 5 of the 12
recommendations I've received about it, but it's been functional enough
for a while that it deserves a broader test base.
Reviewed by: pjd
Sponsored by: IronPort Systems
MFC: 2 months
njl [Tue, 27 Feb 2007 00:14:20 +0000 (00:14 +0000)]
Rework EC I/O approach. Implement burst mode, including proper handling of
case where it asynchronously exits burst mode on its own. Handle different
values of hz in sleep loop. Provide more debugging options to tune EC
behavior. These tunables/sysctls may be temporary and are not for user
access if the EC is working properly. Burst mode is now on by default for
testing and the poll interval has been increased from 100 to 500 us and
total timeout from 100 to 500 ms.
Hopefully this should be the first step of addressing reports of timeout
errors during battery or thermal access, especially on HP/Compaq laptops.
It is reasonably stable and should not cause a loss of functionality or
performance on systems that were previously working. Testing shows an
increase of responsiveness by ~75% on one system.
mohans [Mon, 26 Feb 2007 22:25:21 +0000 (22:25 +0000)]
Reap FIN_WAIT_2 connections marked SOCANTRCVMORE faster. This mitigate
potential issues where the peer does not close, potentially leaving
thousands of connections in FIN_WAIT_2. This is controlled by a new sysctl
fast_finwait2_recycle, which is disabled by default.
jkim [Mon, 26 Feb 2007 22:24:14 +0000 (22:24 +0000)]
Add three new ioctl(2) commands for bpf(4).
- BIOCGDIRECTION and BIOCSDIRECTION get or set the setting determining
whether incoming, outgoing, or all packets on the interface should be
returned by BPF. Set to BPF_D_IN to see only incoming packets on the
interface. Set to BPF_D_INOUT to see packets originating locally and
remotely on the interface. Set to BPF_D_OUT to see only outgoing
packets on the interface. This setting is initialized to BPF_D_INOUT
by default. BIOCGSEESENT and BIOCSSEESENT are obsoleted by these but
kept for backward compatibility.
- BIOCFEEDBACK sets packet feedback mode. This allows injected packets
to be fed back as input to the interface when output via the interface is
successful. When BPF_D_INOUT direction is set, injected outgoing packet
is not returned by BPF to avoid duplication. This flag is initialized to
zero by default.
Note that libpcap has been modified to support BPF_D_OUT direction for
pcap_setdirection(3) and PCAP_D_OUT direction is functional now.
rwatson [Mon, 26 Feb 2007 20:47:52 +0000 (20:47 +0000)]
Revise locking strategy used for UNIX domain sockets in order to improve
concurrency:
- Add per-unpcb mutexes protecting unpcb connection state, fields, etc.
- Replace global UNP mutex with a global UNP rwlock, which will protect the
UNIX domain socket connection topology, v_socket, and be acquired
exclusively before acquiring more than per-unpcb at a time in order to
avoid lock order issues.
In performance measurements involving MySQL, this change has little or no
overhead on UP (+/- 1%), but leads to a significant (5%-30%) improvement in
multi-processor measurements using the sysbench and supersmack benchmarks.
rwatson [Mon, 26 Feb 2007 19:05:13 +0000 (19:05 +0000)]
Add rw_wowned() interface to rwlock(9), allowing a kernel thread to
determine if it holds an exclusive rwlock reference or not. This is
non-ideal, but recursion scenarios in the network stack currently
require it.
ru [Mon, 26 Feb 2007 10:45:21 +0000 (10:45 +0000)]
Don't block on the socket zone limit during the socket()
call which can easily lock up a system otherwise; instead,
return ENOBUFS as documented in a manpage, thus reverting
us to the FreeBSD 4.x behavior.
rwatson [Mon, 26 Feb 2007 10:16:53 +0000 (10:16 +0000)]
Fix a likely bug by adding what appears to be a missing break statement
in the IPX over IP configuration ioctl: when changing the flags on a
tunnel interface, return the generated error rather than always EINVAL.
mckusick [Mon, 26 Feb 2007 08:15:56 +0000 (08:15 +0000)]
Update the dump program to save extended attributes. Update
the restore program to restore all dumped extended attributes.
If the restore is running as root, it will always be able
to restore all extended attributes. If it is not running
as root, it makes a best effort to set them. Using the -v
command line flag or the `verbose' command in interactive
mode will display all the extended attributes being set on
files (and at the end on directories) that are being restored.
It will note any extended attributes that could not be set.
The extended attributes are placed on the dump image immediately
following each file's data. Older versions of restore can work
with the newer dump images. Old versions of restore will
correctly restore the file data and then (silently) skip
over the extended attribute data and proceed to the next file.
This resolves PR 93085 which will be closed once the code
has been MFC'ed.
Note that this code will not compile until these header
files have been updated: <protocols/dumprestore.h> and
<sys/extattr.h>.
PR: bin/93085
Comments from: Poul-Henning Kamp and Robert Watson
MFC after: 3 weeks
mckusick [Mon, 26 Feb 2007 06:18:53 +0000 (06:18 +0000)]
Declare a `struct extattr' that defines the format of an extended
attribute. Also define some macros to manipulate one of these
structures. Explain their use in the extattr.9 manual page.
The next step will be to make a sweep through the kernel replacing
the old pointer manipulation code. To get an idea of how they would
be used, the ffs_findextattr() function in ufs/ffs/ffs_vnops.c is
currently written as follows:
/*
* Vnode operating to retrieve a named extended attribute.
*
* Locate a particular EA (nspace:name) in the area (ptr:length), and return
* the length of the EA, and possibly the pointer to the entry and to the data.
*/
static int
ffs_findextattr(u_char *ptr, u_int length, int nspace, const char *name,
u_char **eap, u_char **eac)
{
u_char *p, *pe, *pn, *p0;
int eapad1, eapad2, ealength, ealen, nlen;
uint32_t ul;
pe = ptr + length;
nlen = strlen(name);
for (p = ptr; p < pe; p = pn) {
p0 = p;
bcopy(p, &ul, sizeof(ul));
pn = p + ul;
/* make sure this entry is complete */
if (pn > pe)
break;
p += sizeof(uint32_t);
if (*p != nspace)
continue;
p++;
eapad2 = *p++;
if (*p != nlen)
continue;
p++;
if (bcmp(p, name, nlen))
continue;
ealength = sizeof(uint32_t) + 3 + nlen;
eapad1 = 8 - (ealength % 8);
if (eapad1 == 8)
eapad1 = 0;
ealength += eapad1;
ealen = ul - ealength - eapad2;
p += nlen + eapad1;
if (eap != NULL)
*eap = p0;
if (eac != NULL)
*eac = p;
return (ealen);
}
return(-1);
}
After applying the structure and macros, it would look like this:
/*
* Vnode operating to retrieve a named extended attribute.
*
* Locate a particular EA (nspace:name) in the area (ptr:length), and return
* the length of the EA, and possibly the pointer to the entry and to the data.
*/
static int
ffs_findextattr(u_char *ptr, u_int length, int nspace, const char *name,
u_char **eapp, u_char **eac)
{
struct extattr *eap, *eaend;
eaend = (struct extattr *)(ptr + length);
for (eap = (struct extattr *)ptr; eap < eaend; eap = EXTATTR_NEXT(eap)){
/* make sure this entry is complete */
if (EXTATTR_NEXT(eap) > eaend)
break;
if (eap->ea_namespace != nspace ||
eap->ea_namelength != length ||
bcmp(eap->ea_name, name, length))
continue;
if (eapp != NULL)
*eapp = eap;
if (eac != NULL)
*eac = EXTATTR_CONTENT(eap);
return (EXTATTR_CONTENT_SIZE(eap));
}
return(-1);
}
Not only is it considerably shorter, but it hopefully more readable :-)
delphij [Mon, 26 Feb 2007 03:38:09 +0000 (03:38 +0000)]
Close race conditions between fork() and [sg]etpriority()'s
PRIO_USER case, possibly also other places that deferences
p_ucred.
In the past, we insert a new process into the allproc list right
after PID allocation, and release the allproc_lock sx. Because
most content in new proc's structure is not yet initialized,
this could lead to undefined result if we do not handle PRS_NEW
with care.
The problem with PRS_NEW state is that it does not provide fine
grained information about how much initialization is done for a
new process. By defination, after PRIO_USER setpriority(), all
processes that belongs to given user should have their nice value
set to the specified value. Therefore, if p_{start,end}copy
section was done for a PRS_NEW process, we can not safely ignore
it because p_nice is in this area. On the other hand, we should
be careful on PRS_NEW processes because we do not allow non-root
users to lower their nice values, and without a successful copy
of the copy section, we can get stale values that is inherted
from the uninitialized area of the process structure.
This commit tries to close the race condition by grabbing proc
mutex *before* we release allproc_lock xlock, and do copy as
well as zero immediately after the allproc_lock xunlock. This
guarantees that the new process would have its p_copy and p_zero
sections, as well as user credential informaion initialized. In
getpriority() case, instead of grabbing PROC_LOCK for a PRS_NEW
process, we just skip the process in question, because it does
not affect the final result of the call, as the p_nice value
would be copied from its parent, and we will see it during
allproc traverse.
Other potential solutions are still under evaluation.
kientzle [Mon, 26 Feb 2007 02:07:02 +0000 (02:07 +0000)]
Move _posix1e_acl_name_to_id out of acl_support.c and into
acl_from_text.c. Since acl_from_text.c is the only place it
is used, we can now make this internal utility function "static."
As a bonus, acl_set_fd() no longer pulls in getpwuid() for no reason.
cognet [Mon, 26 Feb 2007 02:03:48 +0000 (02:03 +0000)]
Erm we can't change the value of arm_memcpy if we're running from flash.
Instead, make memcpy() check if we're running from flash, and avoid
using arm_memcpy if we're doing so.
mckusick [Mon, 26 Feb 2007 00:42:17 +0000 (00:42 +0000)]
Implement the -h flag (set an ACL on a symbolic link).
Before this fix the -h flag was ignored (i.e. setfacl
always set the ACL on the file pointed to by the symbolic
link even when the -h flag requested that the ACL be set
on the symbolic link itself).