Kip Macy [Sat, 9 May 2009 01:45:55 +0000 (01:45 +0000)]
- rename atomic.S and crc32.c to avoid collisions when linking zfs in to the kernel
- update Makefile
- ifdef out acl_{alloc, free}, they aren't used by zfs and conflict with existing in-kernel routines
- Fixed incorrect packet length problem caused by an earlier change to
support ZERO_COPY_SOCKETS.
- Created #define for context initialization retry count.
Ed Schouten [Fri, 8 May 2009 20:06:37 +0000 (20:06 +0000)]
Burn TTY ioctl bridges in compat layers.
I really don't want any pieces of code to include ioctl_compat.h, so let
the ibcs2 and svr4 compat leave sgtty alone. If they want to support
sgtty, they should emulate it on top of termios, not sgtty.
The code has been marked with BURN_BRIDGES for a long time. ibcs2 and
svr4 are not really popular pieces of code anyway.
Marko Zec [Fri, 8 May 2009 14:11:06 +0000 (14:11 +0000)]
Introduce a new virtualization container, provisionally named vprocg, to hold
virtualized instances of hostname and domainname, as well as a new top-level
virtualization struct vimage, which holds pointers to struct vnet and struct
vprocg. Struct vprocg is likely to be replaced in the near future with
a new jail management API import.
As a consequence of this change, change struct ucred to point to a struct
vimage, instead of directly pointing to a vnet.
Change the internal buffer used to store input lines from a static buffer
to a dynamically allocated one in order to support input lines of
arbitrary length.
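
The message above does not name the program or the buffer involved, so the following is only a minimal sketch of the general technique: a heap buffer that doubles in size as a line grows. The helper name read_line_dynamic and the initial capacity are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper: read one line of any length from fp into a heap
 * buffer that grows by doubling.  Returns NULL at EOF with no data or on
 * allocation failure; the caller frees the result.
 */
static char *
read_line_dynamic(FILE *fp)
{
	size_t cap = 16, len = 0;	/* initial capacity is arbitrary */
	char *buf, *tmp;
	int c;

	assert(fp != NULL);
	if ((buf = malloc(cap)) == NULL)
		return (NULL);
	while ((c = fgetc(fp)) != EOF && c != '\n') {
		if (len + 1 >= cap) {	/* keep room for the trailing NUL */
			cap *= 2;
			if ((tmp = realloc(buf, cap)) == NULL) {
				free(buf);
				return (NULL);
			}
			buf = tmp;
		}
		buf[len++] = (char)c;
	}
	if (c == EOF && len == 0) {
		free(buf);
		return (NULL);
	}
	buf[len] = '\0';
	return (buf);
}
```

A caller would loop on read_line_dynamic() until it returns NULL and free each returned line.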
Tim Kientzle [Thu, 7 May 2009 23:01:03 +0000 (23:01 +0000)]
Partially revert r191171, which went too far in trying
to eliminate some duplicated code. In particular,
archive_read_open_filename() has different close
handling than archive_read_open_fd(), so delegating
the former to the latter in the degenerate case
(a NULL filename is treated as stdin) broke reading
from pipelines. In particular, this fixes occasional
port failures that were seen when using "gunzip | tar"
pipelines under /bin/csh.
Thanks to Alexey Shuvaev for reporting this failure and
patiently helping me to track down the cause.
Kip Macy [Thu, 7 May 2009 20:28:06 +0000 (20:28 +0000)]
Asynchronously release vnodes to avoid blocking on range locks when calling back in to zfs.
This is based on a fix that went in to opensolaris on March 9th. However, it uses a dedicated
thread instead of a Solaris taskq to avoid doing a blocking memory allocation with the vnode
interlock held.
This fixes a long-time deadlock in ZFS. This is not, strictly speaking, an LOR. The spa_zio
thread releases a vnode, this calls in to vn_reclaim which in turn needs to acquire range locks
to sync dirty data out to disk. The range locks are already held by a user-level process waiting
on a condition variable that it expects a spa_zio thread to signal. The
process could not be signalled because the spa_zio thread could not proceed.
The nature of this problem was not apparent due to ZFS locks opting out of witness which meant
that DDB did not know about the locks that were held by ZFS.
Jamie Gritton [Thu, 7 May 2009 18:36:47 +0000 (18:36 +0000)]
Move the per-prison Linux MIB from a private one-off pointer to the new
OSD-based jail extensions. This allows the Linux MIB to be accessed via
jail_set and jail_get, and serves as a demonstration of adding jail support
to a module.
Ed Schouten [Thu, 7 May 2009 17:39:23 +0000 (17:39 +0000)]
If we have a regular rint handler, never go into rint_bypass mode.
It turns out if we called cfmakeraw() on a TTY with only a rint handler
in place, it could inject data into the TTY, even though it should be
redirected. Always take a look at the hooks before looking at the
termios flags.
Dmitry Chagin [Thu, 7 May 2009 14:24:50 +0000 (14:24 +0000)]
Linux exports HZ value to user space via AT_CLKTCK auxiliary vector entry,
which is available for Glibc as sysconf(_SC_CLK_TCK). If AT_CLKTCK entry is
not exported, Glibc uses 100.
linux_times() shall use the value that is exported to user space.
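
For illustration, this is roughly how a userland program consumes the exported value: tick counts from times(2) only make sense when divided by sysconf(_SC_CLK_TCK). The helper names ticks_to_ms and user_cpu_ms are hypothetical; this is a sketch of the consumer side, not of the kernel change.

```c
#include <assert.h>
#include <sys/times.h>
#include <unistd.h>

/* Hypothetical helper: convert clock ticks to milliseconds. */
static long
ticks_to_ms(clock_t ticks, long clk_tck)
{
	assert(clk_tck > 0);
	return ((long)((long long)ticks * 1000 / clk_tck));
}

/* User CPU time of the calling process in ms, or -1 on error. */
static long
user_cpu_ms(void)
{
	struct tms t;
	long clk_tck = sysconf(_SC_CLK_TCK);	/* ticks per second */

	if (clk_tck <= 0 || times(&t) == (clock_t)-1)
		return (-1);
	return (ticks_to_ms(t.tms_utime, clk_tck));
}
```

If the kernel counted in a different unit than the one reported through sysconf(), the result of such a conversion would be off by the ratio of the two rates.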
Ed Schouten [Thu, 7 May 2009 13:49:48 +0000 (13:49 +0000)]
Add tcsetsid(3).
The entire world seems to use the non-standard TIOCSCTTY ioctl to make a
TTY a controlling terminal of a session. Even though tcsetsid(3) is also
non-standard, I think it's a lot better to use in our own source code,
mainly because it's similar to tcsetpgrp(), tcgetpgrp() and tcgetsid().
I stole the idea from QNX. They do it the other way around; their
TIOCSCTTY is just a wrapper around tcsetsid(). tcsetsid() then calls
into an IPC framework.
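
A rough sketch of the wrapper direction described above, where the library call sits on top of the ioctl. The name sketch_tcsetsid and the check that the session ID matches the caller's own session are assumptions made for illustration, not the committed code.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Hypothetical stand-in for tcsetsid(): make fd the controlling
 * terminal of the caller's session by way of TIOCSCTTY.
 */
static int
sketch_tcsetsid(int fd, pid_t sid)
{
	assert(fd >= 0);
	/* Assumption: only the caller's own session can be targeted. */
	if (sid != getsid(0)) {
		errno = EINVAL;
		return (-1);
	}
	return (ioctl(fd, TIOCSCTTY, 0));
}
```

A session leader would typically call this right after setsid(), passing the slave side of a pseudo-terminal.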
Andrew Thompson [Thu, 7 May 2009 02:15:58 +0000 (02:15 +0000)]
- Fix the u3g port detection where it would not calculate the correct number of
ports when multiple interfaces are present.
- Claim all interfaces regardless of how many are attached
Sam Leffler [Thu, 7 May 2009 00:35:32 +0000 (00:35 +0000)]
optimize ath_tx_findrix: there's no need to walk the rates table as
sc_rixmap is an inverse map
NB: could eliminate the check for an invalid rate by filling in 0 for
invalid entries but the rate control modules use it to identify
bogus rates so leave it for now
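
The idea can be sketched outside the driver as follows. The structure, the table sizes and the 0xff marker are hypothetical stand-ins chosen for the example; only the contrast between walking the table and indexing an inverse map is taken from the message above.

```c
#include <assert.h>
#include <stdint.h>

#define RATE_TABLE_MAX	8	/* hypothetical table size */
#define RIX_INVALID	0xff	/* hypothetical "no such rate" marker */

struct rate_table {
	int	nrates;
	uint8_t	rate_code[RATE_TABLE_MAX];	/* index -> rate code */
	uint8_t	rixmap[256];			/* rate code -> index */
};

static void
rate_table_init(struct rate_table *rt, const uint8_t *codes, int n)
{
	int i;

	assert(n >= 0 && n <= RATE_TABLE_MAX);
	for (i = 0; i < 256; i++)
		rt->rixmap[i] = RIX_INVALID;
	for (i = 0; i < n; i++) {
		rt->rate_code[i] = codes[i];
		rt->rixmap[codes[i]] = (uint8_t)i;
	}
	rt->nrates = n;
}

/* Old approach: O(n) walk of the table, falling back to index 0. */
static int
findrix_walk(const struct rate_table *rt, uint8_t code)
{
	int i;

	for (i = 0; i < rt->nrates; i++)
		if (rt->rate_code[i] == code)
			return (i);
	return (0);
}

/* New approach: O(1) lookup; the invalid check is kept on purpose. */
static int
findrix_map(const struct rate_table *rt, uint8_t code)
{
	uint8_t rix = rt->rixmap[code];

	return (rix == RIX_INVALID ? 0 : rix);
}
```

Keeping RIX_INVALID in the map, rather than storing 0 for unknown codes, lets other callers still tell a bogus rate from a valid one.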
Sam Leffler [Wed, 6 May 2009 23:49:55 +0000 (23:49 +0000)]
o cleanup checks for which vap combinations are permitted and what to
use for ic_opmode
o fixes the case where creating ahdemo+wds vaps caused ic_opmode to be
set to hostap
Ulf Lilleengen [Wed, 6 May 2009 19:34:32 +0000 (19:34 +0000)]
- Split up the BIO queue into a queue for new and one for completed requests.
This is necessary for two reasons:
1) In order to avoid collisions with the use of a BIO's flags set by a consumer
or a provider
2) Because GV_BIO_DONE was used to mark a BIO as done, not enough flags were
available, so the consumer flags of a BIO had to be misused in order to
support enough flags. The new queue makes it possible to recycle the
GV_BIO_DONE flag into GV_BIO_GROW.
As a consequence, gvinum will now work with any other GEOM class under it or
on top of it.
- Use bio_pflags for storing internal flags on downgoing BIOs, as the requests
appear to come from a consumer of a gvinum volume. Use bio_cflags only for
cloned BIOs.
- Move gv_post_bio to be used internally for maintenance requests.
- Remove some cases where flags were set without need.
Ulf Lilleengen [Wed, 6 May 2009 18:21:48 +0000 (18:21 +0000)]
- Split the queue mutex into one for the event queue and one for the BIO queue,
as they do not really relate and to prepare for an additional queue to be
covered by the BIO queue mutex.
- Implement wrappers for fetching the next element from the event queue as well
as for putting a new element into the BIO queue.
Silence unsolicited spam printed out when KTR_MLD happens to be
in KTR_COMPILE mask. Compiling KTR trace points in does not necessarily
mean enabling them, use proper check against ktr_mask instead.
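
The distinction between "compiled in" and "enabled" can be sketched like this. All names here (TR_MLD, TR_COMPILE, tr_mask, debug_report) are hypothetical stand-ins for the example, not the kernel's KTR definitions.

```c
#include <assert.h>
#include <stdio.h>

#define TR_MLD		0x1u		/* hypothetical trace class bit */
#define TR_COMPILE	(TR_MLD)	/* classes compiled into the build */

static unsigned int tr_mask = 0;	/* classes enabled at run time */

/* A class is active only if it is compiled in AND enabled. */
static int
tr_enabled(unsigned int cls)
{
	return ((TR_COMPILE & cls) != 0 && (tr_mask & cls) != 0);
}

/* Print msg only when the class is active; returns 1 if printed. */
static int
debug_report(unsigned int cls, const char *msg)
{
	assert(msg != NULL);
	if (!tr_enabled(cls))
		return (0);
	printf("%s\n", msg);
	return (1);
}
```

Testing only the compile-time mask would print on every call in such a build, which is the kind of unsolicited output the change silences.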
Andrew Thompson [Tue, 5 May 2009 15:36:23 +0000 (15:36 +0000)]
Revert part of r191494 which used the udev state to mark suspending. This needs
to be set via two variables (peer_suspended and self_suspended) and cannot be
merged into one.
Marko Zec [Tue, 5 May 2009 10:56:12 +0000 (10:56 +0000)]
Change the curvnet variable from a global const struct vnet *,
previously always pointing to the default vnet context, to a
dynamically changing thread-local one. The curvnet context
should be set on entry to networking code via CURVNET_SET() macros,
and reverted to previous state via CURVNET_RESTORE(). Recursions
on curvnet are permitted, though strongly discouraged.
This change should have no functional impact on nooptions VIMAGE
kernel builds, where CURVNET_* macros expand to whitespace.
The curthread->td_vnet (aka curvnet) variable's purpose is to be an
indicator of the vnet context in which the current network-related
operation takes place, in case we cannot deduce the current vnet
context from any other source, such as by looking at mbuf's
m->m_pkthdr.rcvif->if_vnet, socket's so->so_vnet etc. Moreover, so
far curvnet has turned out to be an invaluable consistency checking
aid: it helps to catch cases when sockets, ifnets or any other
vnet-aware structures may have leaked from one vnet to another.
The exact placement of the CURVNET_SET() / CURVNET_RESTORE() macros
was a result of an empirical iterative process, with an aim to
reduce recursions on CURVNET_SET() to a minimum, while still reducing
the scope of CURVNET_SET() to networking only operations - the
alternative would be calling CURVNET_SET() on each system call entry.
In general, curvnet has to be set in three typical cases: when
processing socket-related requests from userspace or from within the
kernel; when processing inbound traffic flowing from device drivers
to upper layers of the networking stack, and when executing
timer-driven networking functions.
This change also introduces a DDB subcommand to show the list of all
vnet instances.
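
The set/restore discipline described above can be approximated in userland as follows. This is a sketch under stated assumptions: the SKETCH_* macros, the curvnet_sketch variable and the one-field struct vnet are hypothetical stand-ins, and the kernel keeps the pointer in curthread->td_vnet rather than in a C11 thread-local.

```c
#include <assert.h>
#include <stddef.h>

struct vnet { int id; };	/* stand-in for the kernel structure */

/* Per-thread "current vnet"; NULL outside networking code. */
static _Thread_local struct vnet *curvnet_sketch;

/* Save the previous context in a local, then switch to the new one. */
#define SKETCH_CURVNET_SET(arg)					\
	struct vnet *saved_vnet = curvnet_sketch;		\
	curvnet_sketch = (arg)

/* Revert to whatever context was active before the matching SET. */
#define SKETCH_CURVNET_RESTORE()				\
	curvnet_sketch = saved_vnet

static int
inner_op(struct vnet *vnet)
{
	int id;

	SKETCH_CURVNET_SET(vnet);	/* recursion: caller's context saved */
	id = curvnet_sketch->id;
	SKETCH_CURVNET_RESTORE();
	return (id);
}

static int
outer_op(struct vnet *outer, struct vnet *inner)
{
	int seen;

	SKETCH_CURVNET_SET(outer);
	seen = inner_op(inner);
	assert(curvnet_sketch == outer);	/* inner restore worked */
	SKETCH_CURVNET_RESTORE();
	return (seen);
}
```

Because each SET stores the previous value in a local of the enclosing scope, nested calls unwind correctly as long as every SET is paired with a RESTORE in the same function.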
Alexander Motin [Tue, 5 May 2009 01:13:20 +0000 (01:13 +0000)]
Do not try to initialize LAPIC timer if we are not going to use it.
This fixes an assertion failure when a kernel built with INVARIANTS is
configured to use the i8254 timer.
John Baldwin [Mon, 4 May 2009 20:25:56 +0000 (20:25 +0000)]
Always compute the root of the kernel source tree and explicitly pass it
to module builds. This avoids having to have the module builds walk up
the tree to find the kernel sources. It also allows a kernel + module
build to succeed when a new level of module subdirectories is added without
requiring that the /usr/share/mk/bsd.kmod.mk file on the machine be patched.
Jung-uk Kim [Mon, 4 May 2009 18:05:27 +0000 (18:05 +0000)]
Unlock the largest standard CPUID on Intel CPUs for both amd64 and i386 and
fix SMP topology detection. On i386, we extend it to cover Core, Core 2,
and Core i7 processors, not just Pentium 4 family, and move it to better
place. On amd64, all supported Intel CPUs should have this MSR.
Rick Macklem [Mon, 4 May 2009 15:23:58 +0000 (15:23 +0000)]
Add the experimental nfs subtree to the kernel, which includes
support for NFSv4 as well as NFSv2 and 3.
It lives in 3 subdirs under sys/fs:
nfs - functions that are common to the client and server
nfsclient - a mutation of sys/nfsclient that calls generic functions
to do RPCs and handle state. As such, it retains the
buffer cache handling characteristics and vnode semantics that
are found in sys/nfsclient, for the most part.
nfsserver - the server. It includes a DRC designed specifically for
NFSv4, that is used instead of the generic DRC in sys/rpc.
The build glue will be checked in later, so at this point, it
consists of 3 new subdirs that should not affect kernel building.
Alan Cox [Mon, 4 May 2009 06:30:00 +0000 (06:30 +0000)]
Eliminate vnode_pager_input_smlfs()'s pointless call to pmap_clear_modify().
The page can't possibly have any modified page table entries because it
isn't even mapped.
Andrew Thompson [Sun, 3 May 2009 18:29:04 +0000 (18:29 +0000)]
Relax the condition for printing the lost state transition message. The new
state will be set before the EXT_STATEWAIT flag is cleared and it's OK to
transition again at that point.