truckman [Fri, 16 May 2003 19:46:51 +0000 (19:46 +0000)]
Detect that a vnode has been reclaimed while vflush() was waiting to lock
the vnode and restart the loop. Vflush() is vulnerable since it does not
hold a reference to the vnode and it holds no other locks while waiting
for the vnode lock. The vnode will no longer be on the list when the
loop is restarted.
jhb [Fri, 16 May 2003 15:52:32 +0000 (15:52 +0000)]
- Use better terminology when describing mutex operations in msleep(9)'s
description.
- Remove some bogus commas.
- Use the past tense when referring to the removal of the sleep() function
since it happened quite a while ago and since the previous sentence in the
paragraph already uses the past tense.
des [Fri, 16 May 2003 14:01:02 +0000 (14:01 +0000)]
More configuration tweaks. Rename %CONFIGS to %SETUPS to make the code
clearer (particularly to someone who has read the man page). Don't print
anything on stderr.
marcel [Fri, 16 May 2003 07:57:44 +0000 (07:57 +0000)]
o In pmap_install, don't prevent switching the pmap if we're
switching to kernel_pmap. The pmap is not special enough.
o Clear the active bit on the pmap we're switching out.
o Fix some nearby style(9) bugs.
marcel [Fri, 16 May 2003 07:03:15 +0000 (07:03 +0000)]
Turn pmap_growkernel() into a critical section. While here, initialize
kernel_vm_end in pmap_bootstrap. Don't delay the initialization until
we need to grow the kernel VM space. This BTW happens twice before
we enter either single- or multi-user mode. Don't adjust kernel_vm_end
while growing based on whether the KPT contains a non-NULL entry. We
trust kernel_vm_end to be correct and we make sure it's still correct
after growing.
Define virtual_avail and virtual_end in terms of VM_MIN_KERNEL_ADDRESS
and VM_MAX_KERNEL_ADDRESS (resp). Don't hardcode region knowledge.
marcel [Fri, 16 May 2003 06:40:40 +0000 (06:40 +0000)]
Revamp the RID allocation code:
o Limit the size of the region ID map to 64KB. This gives a bitmap
that is large enough to keep track of 2^19 numbers. The minimal map
size is 32KB. The reason we limit the map size is that processor
models may have implemented a 24-bit region ID, which would give
a 2MB bitmap while the maximum number of allocations is always
less than PID_MAX*5, which is less than 2^19.
o Allocate all region IDs up-front. The slight downside of reserving
more RIDs than a process needs (3 for ia64 native and 1 for ia32)
is preferable over the call to pmap_ensure_rid() where RIDs are
allocated on demand. On SMP systems this may lead to a race
condition.
o When allocating a region ID, don't use arc4random(). We're not
interested in randomness or uniform distribution across the
spectrum. We only need uniqueness. Random numbers may easily
collide when the number of allocated RIDs is high, creating a
possibly unbounded retry rate.
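
The sizing arithmetic and the "uniqueness only" allocation described above can be sketched in user-space C. This is a minimal illustration, not the pmap code: the names rid_alloc(), rid_free(), RID_MAP_BYTES and RID_COUNT are hypothetical, and the linear first-fit scan is an assumption about one way to guarantee uniqueness without random retries.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical names; the 64KB size follows the commit message. */
#define RID_MAP_BYTES	(64 * 1024)
#define RID_COUNT	(RID_MAP_BYTES * CHAR_BIT)	/* 2^19 with 8-bit bytes */

static uint8_t rid_map[RID_MAP_BYTES];	/* one bit per region ID */

/* Return the lowest free ID and mark it used, or -1 if the map is full. */
static long
rid_alloc(void)
{
	for (long byte = 0; byte < RID_MAP_BYTES; byte++) {
		if (rid_map[byte] == 0xff)
			continue;	/* all IDs in this byte are taken */
		for (int bit = 0; bit < CHAR_BIT; bit++) {
			if ((rid_map[byte] & (1u << bit)) == 0) {
				rid_map[byte] |= (uint8_t)(1u << bit);
				return (byte * CHAR_BIT + bit);
			}
		}
	}
	return (-1);
}

/* Clear the bit for a previously allocated ID. */
static void
rid_free(long rid)
{
	assert(rid >= 0 && rid < RID_COUNT);
	rid_map[rid / CHAR_BIT] &= (uint8_t)~(1u << (rid % CHAR_BIT));
}
```

Unlike a random pick, the scan never has to retry on a collision: it either finds a clear bit or reports that the map is full.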
marcel [Fri, 16 May 2003 06:03:45 +0000 (06:03 +0000)]
Sync the linker script with the one used by default for userland. Since
ia64 only uses relocations with addend, remove the sections specific to
non-addend relocations (.rel.*). Also remove C++ specific sections.
tjr [Fri, 16 May 2003 02:15:07 +0000 (02:15 +0000)]
Catch up with the renaming of the "union" filesystem to "unionfs".
Fixes a problem where directory entries could show up twice: once
on the top layer of the union stack, and once on the bottom layer.
obrien [Fri, 16 May 2003 01:34:23 +0000 (01:34 +0000)]
Fix long standing bug that prevents the PT_CONTINUE, PT_KILL and
PT_DETACH ptrace(2) requests from functioning as advertised in the
manual page. As described in kern/35175, the PT_DETACH request will,
under certain circumstances, pass an unwanted signal on to the traced
process upon detaching from it. The PT_CONTINUE request will
sometimes fail if you make it pass a signal that has "properties" that
differ from the properties of the signal that originally caused the
traced process to be stopped. Since PT_KILL is nothing more than
PT_CONTINUE with SIGKILL, it is broken too. In the PT_KILL case, this
leads to an unkillable process.
PR: 44011
Submitted by: Mark Kettenis <kettenis@chello.nl>
Approved by: re(jhb)
rwatson [Fri, 16 May 2003 01:13:16 +0000 (01:13 +0000)]
Add a tunable/sysctl "hw.fxp_noflow" which disables flow control support
on if_fxp cards. When flow control is enabled, if the operating system
doesn't acknowledge the packet buffer filling, the card will begin to
generate ethernet quench packets, but appears to get into a feedback
loop of some sort, hosing local switches. This is a temporary workaround
for 5.1: the ability to configure flow control should probably be
exposed by some or another management interface on ethernet link layer
devices.
tmm [Fri, 16 May 2003 01:10:33 +0000 (01:10 +0000)]
In cpu_fork(), initialize pcb_psl for the new process to PSL_KERNEL,
instead of taking the (userland) eflags from the trap frame and masking
out PSL_I. There is no need to inherit any flags from the forking process;
the old method however can cause flags set in userland for the forking
process to be bogusly set in kernel mode when the newly forked process
runs for the first time (in particular PSL_T, which is set for userland
when the process is single-stepped; this would cause trace traps in
kernel mode).
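
The difference between the old and new initialization can be illustrated with a small bit-mask sketch. The function names are hypothetical, and the flag values are the usual x86 EFLAGS bit positions, assumed here for illustration rather than taken from the committed code.

```c
#include <assert.h>
#include <stdint.h>

/* x86 EFLAGS bits; values assumed for this sketch. */
#define PSL_T		0x00000100u	/* trap (single-step) flag */
#define PSL_I		0x00000200u	/* interrupt enable flag */
/* Assumed kernel default: only the always-set reserved bit. */
#define PSL_KERNEL	0x00000002u

/* Old approach: inherit the userland flags and mask out PSL_I only. */
static uint32_t
pcb_psl_old(uint32_t user_eflags)
{
	return (user_eflags & ~PSL_I);
}

/* New approach: ignore the forking process's flags entirely. */
static uint32_t
pcb_psl_new(uint32_t user_eflags)
{
	(void)user_eflags;
	/* The kernel default must carry neither flag. */
	assert((PSL_KERNEL & (PSL_T | PSL_I)) == 0);
	return (PSL_KERNEL);
}
```

With a single-stepped parent (PSL_T set in its userland flags), pcb_psl_old() lets PSL_T leak into the child's kernel-mode flags, while pcb_psl_new() cannot.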
hmp [Fri, 16 May 2003 00:31:12 +0000 (00:31 +0000)]
Bring the kame(4) manual page closer to reality:
- prefix(8) and gifconfig(8) are deprecated
- dtcpc, dtcps were never imported (also removed from KAME CVS)
- pim6dd, pim6sd and racoon are ports
- inet6d does not exist on FreeBSD
PR: docs/51295
Submitted by: Simon L. Nielsen <simon@nitro.dk>
Content reviewed by: itojun
Approved by: des (mentor), re (bmah)
rwatson [Thu, 15 May 2003 21:13:08 +0000 (21:13 +0000)]
VOP_PATHCONF() requires a vnode lock; this patch adds locking to
fpathconf(). The lock is held for direct calls to VOP_PATHCONF() in
pathconf() already.
Approved by: re (jhb)
Pointed out by: DEBUG_VFS_LOCKS
rwatson [Thu, 15 May 2003 21:12:08 +0000 (21:12 +0000)]
This change grabs the vnode lock for NFS client vnodes when calling
VOP_SETATTR() or VOP_GETATTR(); without these locks (a) DEBUG_VFS_LOCKS
will panic, and (b) it may be possible to corrupt entries in the cached
vnode attributes in the nfsnode, since nfsnode attribute cache data is
also protected by the vnode lock.
Approved by: re (jhb)
Pointed out by: DEBUG_VFS_LOCKS
rwatson [Thu, 15 May 2003 21:07:33 +0000 (21:07 +0000)]
Jeff added locking assertions that the VV_ flags on vnodes were modified
only while holding appropriate vnode locks. This patch slides the lock
release for ufs_extattr_enable() to continue to hold the active vnode lock
on a backing file until after the flag change; it also acquires a vnode
lock when disabling an attribute and hence clearing a flag on the backing
vnode. This permits DEBUG_VFS_LOCKS to run UFS1 extended attributes
without panicking, as well as preventing a potential race and vnode flag
problem.
Approved by: re (jhb)
Pointed out by: DEBUG_VFS_LOCKS
bmilekic [Thu, 15 May 2003 19:05:28 +0000 (19:05 +0000)]
Make the mb_alloc low-watermark sysctl-tunable read-only and make
netstat(1) not display it for now because its effects are not yet
completely implemented and we're about to cut 5.1-RELEASE.
This is temporary.
julian [Thu, 15 May 2003 18:51:28 +0000 (18:51 +0000)]
Fix a cut-n-paste error.
In the case where the bridge node was closed down but a timeout
still applied to it, the final reference to the node was freeing the private
data structure using the wrong malloc type.
mtm [Thu, 15 May 2003 18:17:13 +0000 (18:17 +0000)]
Do some cleanup with respect to condition variables. The implementation
of pthread_cond_timedwait() is moved into cond_wait_common().
Pthread_cond_wait() and pthread_cond_timedwait() are now wrappers around
this function. Previously, the former called the latter with the abstime
pointing to 0 time. This violated POSIX semantics should an application
have reason to call it with that argument because instead of returning
immediately it would have waited indefinitely for the cv to be signaled.
Approved by: markm/mentor, re/blanket libthr
Reviewed by: jeff
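
The semantic problem with the old arrangement can be shown with a tiny sketch. The function names are hypothetical, and the use of a NULL pointer as the "no timeout" marker in the common function is an assumption about the new code, not something the commit message states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/*
 * Old scheme: a zeroed timespec doubled as the "no timeout" marker,
 * so a caller that really passed time 0 waited indefinitely.
 */
static bool
waits_forever_old(const struct timespec *abstime)
{
	assert(abstime != NULL);
	return (abstime->tv_sec == 0 && abstime->tv_nsec == 0);
}

/*
 * Assumed new scheme: only the untimed wrapper passes NULL, so every
 * real timespec, including time 0, is treated as a deadline.
 */
static bool
waits_forever_new(const struct timespec *abstime)
{
	return (abstime == NULL);
}
```

An absolute time of 0 is always in the past, so a timed wait given that value should return at once instead of blocking.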
ru [Thu, 15 May 2003 17:59:32 +0000 (17:59 +0000)]
Use the installed world's idea of OSRELDATE rather than the kernel.
This was the initial intent anyway, and it became clear that it is
really necessary to treat it this way, as many people happen to run
with kernel newer than the installed world.
mtm [Thu, 15 May 2003 17:56:18 +0000 (17:56 +0000)]
o Make the setting/checking of cancel state atomic with
respect to other threads and signal handlers by moving to
the _thread_critical_enter/exit functions.
o Introduce a static function, testcancel(), that is used by
the other functions in this module. This allows it to make
locking assumptions that the top-level functions can't.
o Rework the code flow a bit to reduce indentation levels.
Approved by: markm/mentor, re/blanket libthr
Reviewed by: jeff
njl [Thu, 15 May 2003 17:36:22 +0000 (17:36 +0000)]
Generalize a quirk for Asahi Optical-based cameras (i.e. Pentax). It appears
all of the Optio series have the same problems. It might be a better
approach eventually to add wildcard support to USB quirks.
PR: kern/50271, kern/46369
Approved by: re (rwatson)
tmm [Thu, 15 May 2003 16:57:55 +0000 (16:57 +0000)]
Miscellaneous fixes:
- Fix compilation without GEM_DEBUG.
- Do not #define GEM_DEBUG by default; it adds overhead (due to bzero()ing
RX space) and is not needed any more, since the driver is quite stable
now.
- Fix watchdog timeouts when failing to load TX packets.
- Do not forcibly limit the number of descriptors used for a packet to
GEM_NTXSEGS, by passing this number to bus_dma_tag_create(). There is
no requirement for a limit any lower than the total number of
available descriptors, and the present limit caused network problems
due to mbuf chains requiring more descriptors.
GEM_NTXSEGS is still used to estimate the interrupt window size, for
which we just need an estimate.
mbr [Thu, 15 May 2003 16:53:29 +0000 (16:53 +0000)]
Only use a SIA/SYM media info block if no MII block is detected.
The submitter of PR 32118 told me that this patch also fixes autoselecting
for znyx 4 port cards (10baseT, 100baseTX did work already).
des [Thu, 15 May 2003 13:12:57 +0000 (13:12 +0000)]
Make 'clean' and 'update' commands rather than options. Invoke 'update'
(but not 'clean') in all setups. Bump tinderbox.pl version to 2.1, mostly
for the 'release' command added in the previous commit.
des [Thu, 15 May 2003 12:33:46 +0000 (12:33 +0000)]
Make the ENV configuration variable a hash rather than an array.
Build LINT on -STABLE now that tinderbox.pl knows how. Also try to build
LINT on powerpc and amd64 (this is a formality as they don't have NOTES
so nothing will be built).
Add two setups for release testing, with plenty of NO* to speed things up.
If the config key was not specified on the command line, try to guess it
from the hostname.
des [Thu, 15 May 2003 12:26:55 +0000 (12:26 +0000)]
Add a 'release' command which builds a release. It currently sets
NOCDROM, NODOC and NOPORTS to save time and space, but I may remove
those at a later date so we can use the results to populate a snapshot
server.
Document the --machine option.
Make $arch and $machine default to the correct values for the current
system. This shouldn't make any difference unless you run the
tinderbox on a pc98 machine, since for all other platforms, $arch and
$machine are the same.
Only set kernel-related variables if actually building a kernel or a
release.
Be paranoid and cd to the correct directory in each stage so we're
sure we invoke make(1) in the right place.
To support building LINT on -STABLE, don't try to 'make LINT' unless
NOTES exists, but build LINT if the config file exists even if there
is no NOTES.
marcel [Thu, 15 May 2003 08:36:03 +0000 (08:36 +0000)]
This file creates register sets based on the runtime specification.
The advantage of using register sets is that you don't focus on each
register separately, but instead introduce a level of abstraction.
This reduces the chance of errors, and also simplifies the code.
The register sets form the basis of everything register-related.
The sets in this file are:
struct _special
contains all of the control related registers, such as instruction
pointer and stack pointer. It also contains interrupt specific registers
like the faulting address. The set is roughly split in 3 groups. The
first contains the registers that define a context or thread. This is
the only group that the kernel needs to switch threads. The second group
contains the registers needed, in addition to the first group, to switch
userland threads. This group contains the thread pointer and the FP control
register. The third group contains those registers we need for exception
handling and are used on top of the first two groups.
struct _callee_saved, struct _callee_saved_fp
These sets contain the preserved registers, including the NaT after
spilling. The general registers (including branch registers) are
separated from the FP registers for ptrace(2).
struct _caller_saved, struct _caller_saved_fp
These sets contain the scratch registers based on SDM 2.1. This means that
both ar.csd and ar.ccd are included here, even though they contain ia32
segment register descriptions. We keep separate NaT bits for scratch and
preserved registers, because they are never saved/restored at the same
time.
struct _high_fp
The upper 96 FP registers that can be enabled/disabled separately on
the CPU from the lower 32 FP registers. Due to the size of this set,
we treat them specially, even though they are defined as scratch
registers.
marcel [Thu, 15 May 2003 08:08:32 +0000 (08:08 +0000)]
This file contains elementary context related functions used to
save and restore "sets" of registers in various places.
The restorectx and swapctx functions are used by cpu_switch()
and deal with the special registers, as well as the preserved
registers.
The *callee_saved* functions are used to save and restore the
preserved registers (integer and floating-point). They are
useful for signal delivery and ptrace support.
The save_high_fp and restore_high_fp functions are used to
"load" and "unload" to and from the CPU as part of lazy context
switching.
The ia32 specific context functions have been kept with the ia32
code.
marcel [Thu, 15 May 2003 07:51:22 +0000 (07:51 +0000)]
This file contains the code that implements the syscall path based
on the epc instruction. The epc instruction, given the permissions
of the page in which the epc is located, allows the privilege level
to be increased with little or no overhead. The previous privilege
level is recorded in the current frame marker and is restored by
a regular (function) return.
Since the epc instruction has to live in a page with non-standard
properties, we hardwire a "gateway" page in the address space. The
address of the gateway page is exported to userland in ar.k7. This
allows us to rewire the page without breaking the ABI.
The syscall stubs in libc are regular function calls that slightly
differ from the normal runtime. The difference is mostly to simplify
the stubs themselves by moving some of the logic to the kernel.
The libc stubs call into the gateway page (offset 0), from where the
kernel trampolines to the code that sets up a minimal trapframe and
arranges to execute from the kernel stack.
The way back is basically the same. The kernel returns to the gateway
page, whereby privilege is dropped, and jumps back to the syscall
stub.
Only the special registers are saved in the trapframe. None of the
scratch registers are preserved and since the kernel follows the
same runtime model, none of the preserved registers are saved.
Future enhancements can include the implementation of lightweight
syscalls, where kernel functions are performed without setting up
a trapframe. Good candidates are the *context syscalls for example.
Now that there's a gateway page from which code can be executed in
a non-privileged context, we also have the ideal place to put the
signal trampolines. By moving the signal trampolines from the user
stack to the gateway page, we open up the doors to unexecutable
stacks. The gateway page contains signal trampolines for both the
"legacy" break-based syscall code and the new and improved epc-
based syscall code.
alc [Thu, 15 May 2003 05:12:24 +0000 (05:12 +0000)]
Initialize logical_cpus_mask when the logical CPUs are enumerated in
the mptable. (Previously, logical_cpus_mask was only initialized if
the hyperthreading fixup was executed.)
marcel [Thu, 15 May 2003 05:04:44 +0000 (05:04 +0000)]
This is beta4 of libuwx, an ia64 stack unwinder. This code is made
available by Hewlett-Packard under the MIT license. The unwinder is
small, clean and fast and needed little adaptation for use in the
kernel.
This import has embedded in it the changes needed to make it build
in a kernel environment.
To optimize the common case, the kernel will minimize the number
of registers saved by not saving the preserved registers. In case
access to preserved registers is needed (signal handling, ptrace)
the kernel will unwind to the context of the syscall or exception.
For this we need an unwinder.
rwatson [Thu, 15 May 2003 03:19:30 +0000 (03:19 +0000)]
When getting back an NLM DENIED response for a requested lock from the
server, map it to EAGAIN locally rather than EACCES. The NLM spec
indicates the DENIED corresponds to lock contention, not a permission
failure. This fixes O_EXLOCK/O_SHLOCK with O_NONBLOCK, which would
previously give a permission error, which in turn fixes things
like mailq(8) and lockf(1) over NFS.
Approved by: scottl (re)
Reviewed by: truckman, Andrew P Lentvorski, Jr. <bsder@allcaps.org>
Idea from: truckman
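
The mapping change amounts to a one-line difference in a status-to-errno translation, sketched below. The enum, its member names and the function name are hypothetical stand-ins, not the rpc.lockd definitions, and the errno chosen for other failures is an arbitrary placeholder.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical status codes for illustration only. */
enum lock_status {
	ST_GRANTED,
	ST_DENIED,	/* lock is held by someone else */
	ST_OTHER	/* any other failure */
};

/* Translate a lock reply into the errno handed back to the caller. */
static int
lock_status_to_errno(enum lock_status st)
{
	switch (st) {
	case ST_GRANTED:
		return (0);
	case ST_DENIED:
		/* Contention, not a permission failure: was EACCES. */
		return (EAGAIN);
	default:
		return (ENOLCK);	/* placeholder choice for the sketch */
	}
}
```

A non-blocking locker that sees EAGAIN can retry or fall back to a blocking request, which a caller would not normally do after a permission error.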
jmallett [Thu, 15 May 2003 02:10:30 +0000 (02:10 +0000)]
Clear up that COMPAT_43 may not do the same thing on every architecture
and clear up that COMPAT_SUNOS is similarly MD, and does something
relatively similar.
peter [Thu, 15 May 2003 00:23:40 +0000 (00:23 +0000)]
Collect the nastiness for preserving the kernel MSR_GSBASE around the
load_gs() calls into a single place that is less likely to go wrong.
Eliminate the per-process context switching of MSR_GSBASE, because it
should be constant for a single cpu. Instead, save/restore it during
the loading of the new %gs selector for the new process.
rwatson [Wed, 14 May 2003 21:16:33 +0000 (21:16 +0000)]
Avoid registering for a lock on the server in the event the NFS client
has requested the lock in a non-blocking form, instead returning an
immediate failure. This appears to help reduce one of my "locks get
lost" symptoms involving lockf(1), which attempts a non-blocking lock
attempt before actually blocking on the lock. At this point the client
still gets back EACCES, which is an issue we're still working.
Approved by: re (scottl)
Submitted by: Andrew P. Lentvorski, Jr. <bsder@allcaps.org>
rwatson [Wed, 14 May 2003 20:31:06 +0000 (20:31 +0000)]
When giving examples of how to use extattrctl(8) to configure UFS1
attributes, use the current convention for attribute directory names
so that UFS_EXTATTR_AUTOSTART will work with them.
thomas [Wed, 14 May 2003 14:20:22 +0000 (14:20 +0000)]
In atapi_cam_reinit_bus, only call reinit_bus if the ATAPI channel
has already been registered with ATAPI/CAM (else there is nothing
to do). atapi_cam_reinit_bus may be called before the bus is
registered if an ATAPI command times out during the boot sequence.
rwatson [Wed, 14 May 2003 13:50:40 +0000 (13:50 +0000)]
When receiving NLM_GRANTED_RES or NLM4_GRANTED_RES lock granted messages
from the NFS server, following contention on a lock by this or another
client, immediately notify the waiting process that the lock has been
granted via a wakeup. Without this change, the client rpc.lockd will
not wake up the waiting process until it next re-polls the lock (sometime
in the next ten seconds), which can lead to marked latency across all
potential lockers, as the lock is held by the client for the duration.
Approved by: re (scottl)
Submitted by: truckman
Reviewed by: Andrew P. Lentvorski, Jr <bsder@allcaps.org>
peter [Wed, 14 May 2003 04:10:49 +0000 (04:10 +0000)]
Add BASIC i386 binary support for the amd64 kernel. This is largely
stolen from the ia64/ia32 code (indeed there was a repocopy), but I've
redone the MD parts and added and fixed a few essential syscalls. It
is sufficient to run i386 binaries like /bin/ls, /usr/bin/id (dynamic)
and p4. The ia64 code has not implemented signal delivery, so I had
to do that.
Before you say it, yes, this does need to go in a common place. But
we're in a freeze at the moment and I didn't want to risk breaking ia64.
I will sort this out after the freeze so that the common code is in a
common place.
On the AMD64 side, this required adding segment selector context switch
support and some other support infrastructure. The %fs/%gs etc code
is hairy because loading %gs will clobber the kernel's current MSR_GSBASE
setting. The segment selectors are not used by the kernel, so they're only
changed at context switch time or when changing modes. This still needs
to be optimized.
peter [Wed, 14 May 2003 03:38:13 +0000 (03:38 +0000)]
Fix some misunderstandings about 64 bit extension.
Fix fuword/suword - they're supposed to be 'long' - ie: point them
at fuword64/suword64 instead of the incorrect 32 bit versions.
jhb [Tue, 13 May 2003 20:36:02 +0000 (20:36 +0000)]
- Merge struct procsig with struct sigacts.
- Move struct sigacts out of the u-area and malloc() it using the
M_SUBPROC malloc bucket.
- Add a small sigacts_*() API for managing sigacts structures: sigacts_alloc(),
sigacts_free(), sigacts_copy(), sigacts_share(), and sigacts_shared().
- Remove the p_sigignore, p_sigacts, and p_sigcatch macros.
- Add a mutex to struct sigacts that protects all the members of the struct.
- Add sigacts locking.
- Remove Giant from nosys(), kill(), killpg(), and kern_sigaction() now
that sigacts is locked.
- Several in-kernel functions such as psignal(), tdsignal(), trapsignal(),
and thread_stopped() are now MP safe.
jhb [Tue, 13 May 2003 19:21:46 +0000 (19:21 +0000)]
In setitimer(2), if the it_value of the new itimer value is clear, then
don't add the current time to it, but leave it as clear so that when the
timer is disabled, the it_value is always clear.
alc [Tue, 13 May 2003 04:36:02 +0000 (04:36 +0000)]
Optimize the use of splay in gbincore(). During a "make buildworld" the
desired buffer is found at one of the roots more than 60% of the time.
Thus, checking both roots before performing either splay eliminates
unnecessary splays on the first tree splayed.
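
The fast path can be sketched as follows. The structure and function names are hypothetical, and a plain binary-search descent stands in for the real splay step, so this only illustrates the ordering of the checks.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tree node keyed by logical block number. */
struct buf_node {
	long lblkno;
	struct buf_node *left, *right;
};

/* Plain BST descent, standing in for the real splay-and-compare step. */
static struct buf_node *
tree_search(struct buf_node *root, long lblkno)
{
	while (root != NULL && root->lblkno != lblkno)
		root = (lblkno < root->lblkno) ? root->left : root->right;
	return (root);
}

/*
 * Check both roots first; only fall back to the per-tree search when
 * neither root matches. *slow_path reports which route was taken.
 */
static struct buf_node *
lookup(struct buf_node *root_a, struct buf_node *root_b, long lblkno,
    int *slow_path)
{
	struct buf_node *bp;

	assert(slow_path != NULL);
	*slow_path = 0;
	if (root_a != NULL && root_a->lblkno == lblkno)
		return (root_a);
	if (root_b != NULL && root_b->lblkno == lblkno)
		return (root_b);

	*slow_path = 1;
	bp = tree_search(root_a, lblkno);
	if (bp == NULL)
		bp = tree_search(root_b, lblkno);
	return (bp);
}
```

When the wanted buffer sits at the root of the second tree, this ordering avoids touching the first tree at all, which is the saving the commit describes.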