bms [Sun, 5 Oct 2003 09:35:08 +0000 (09:35 +0000)]
Add a pre-emption counter, td_generation, so that threads can notice
when they have been pre-empted by other threads. This is bumped from
within mi_switch() every time a context switch takes place.
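A minimal sketch of how a consumer might use this (td_generation is from the
commit; the surrounding code and the work being guarded are illustrative
assumptions):

    /* Sketch: detect whether we were pre-empted during some work. */
    int gen;

    gen = curthread->td_generation;
    /* ... lengthy work that is cheaper to redo than to lock ... */
    if (curthread->td_generation != gen) {
            /* mi_switch() ran at least once; redo or revalidate. */
    }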
bms [Sun, 5 Oct 2003 08:38:22 +0000 (08:38 +0000)]
Wrap the vslock() and vsunlock() calls in this file in #if 0's; they will
go away in due course. Involuntary pre-emption means that we can't count
on wiring of pages alone for consistency when performing a SYSCTL_OUT()
bigger than PAGE_SIZE.
jeff [Sun, 5 Oct 2003 07:16:45 +0000 (07:16 +0000)]
- Check the XLOCK before inspecting v_data.
- Slightly rewrite the fsync loop to be more lock friendly (see the sketch
after this list). We must acquire the vnode interlock before dropping the
mnt lock. We must also check XLOCK to prevent vclean() races.
- Use LK_INTERLOCK in the vget() in ffs_sync to further prevent vclean()
races.
- Use a local variable to store the result of the nvp == TAILQ_NEXT
test so that we do not access the vp after we've vrele()d it.
- Add an XXX comment about UFS_UPDATE() not being protected by any lock
here. I suspect that it should need the VOP lock.
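The loop shape being described, as a minimal sketch (assumes mp and td are in
scope; the actual per-vnode sync work is elided):

    struct vnode *vp, *nvp;
    int moved;

    mtx_lock(&mntvnode_mtx);
    for (vp = TAILQ_FIRST(&mp->mnt_nvnodelist); vp != NULL; vp = nvp) {
            nvp = TAILQ_NEXT(vp, v_nmntvnodes);
            VI_LOCK(vp);                    /* interlock first ... */
            mtx_unlock(&mntvnode_mtx);      /* ... then drop the mnt lock */
            if (vp->v_iflag & VI_XLOCK) {   /* vclean() is reclaiming it */
                    VI_UNLOCK(vp);
                    mtx_lock(&mntvnode_mtx);
                    continue;
            }
            if (vget(vp, LK_EXCLUSIVE | LK_INTERLOCK, td) != 0) {
                    mtx_lock(&mntvnode_mtx);
                    continue;
            }
            /* ... VOP_FSYNC() and friends ... */
            mtx_lock(&mntvnode_mtx);
            moved = (TAILQ_NEXT(vp, v_nmntvnodes) != nvp);
            mtx_unlock(&mntvnode_mtx);
            vput(vp);                       /* don't touch vp after this */
            mtx_lock(&mntvnode_mtx);
            if (moved)                      /* list changed; rescan */
                    nvp = TAILQ_FIRST(&mp->mnt_nvnodelist);
    }
    mtx_unlock(&mntvnode_mtx);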
jeff [Sun, 5 Oct 2003 07:12:38 +0000 (07:12 +0000)]
- Fix an XXX. Check the error of vn_lock() in vflush(). Don't specify
LK_RETRY either; we don't want this vnode if it turns into another.
- Remove the code that checks the mount point after acquiring the lock;
we are guaranteed to either fail or get the vnode that we wanted.
jeff [Sun, 5 Oct 2003 06:43:03 +0000 (06:43 +0000)]
- File systems that wish to inspect the vnode contents or their private
v_data field before calling vget/vn_lock must check VI_XLOCK manually to
be sure that v_data is still valid. Implement this check in two places
here.
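A minimal sketch of the manual check, in a UFS-flavored loop body (VTOI()
and ip are illustrative here):

    VI_LOCK(vp);
    if (vp->v_iflag & VI_XLOCK) {
            /* vclean() may already have cleared v_data. */
            VI_UNLOCK(vp);
            continue;
    }
    ip = VTOI(vp);          /* v_data is still valid at this point */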
jeff [Sun, 5 Oct 2003 05:35:41 +0000 (05:35 +0000)]
- Rename vcanrecycle() to vtryrecycle() to reflect its new role.
- In vtryrecycle() try to vgonel the vnode if all of the previous checks
passed. We won't vgonel if someone has either acquired a hold or usecount
or started the vgone process elsewhere. This is because the vnode may have
been removed from the free list while we were inspecting it for recycling.
- The VI_TRYLOCK stops two threads from entering getnewvnode() and recycling
the same vnode (see the sketch after this list). To further reduce the
likelihood of this event, requeue the vnode on the tail of the list prior
to calling vtryrecycle(). We cannot actually remove the vnode from the
list until we know that it's going to be recycled because other interlock
holders may see the VI_FREE flag and try to remove it from the free list.
- Kill a bogus XXX comment. If XLOCK is set we shouldn't wait for it
regardless of MNT_WAIT because the vnode does not actually belong to
this filesystem.
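A minimal sketch of the scan described above, assuming vtryrecycle()
consumes the interlock on both success and failure:

    struct vnode *vp;

    mtx_lock(&vnode_free_list_mtx);
    vp = TAILQ_FIRST(&vnode_free_list);
    /*
     * Requeue to the tail so a racing thread is unlikely to pick the
     * same vnode; we can't remove it from the list yet because other
     * interlock holders may see VI_FREE and try to remove it themselves.
     */
    TAILQ_REMOVE(&vnode_free_list, vp, v_freelist);
    TAILQ_INSERT_TAIL(&vnode_free_list, vp, v_freelist);
    if (VI_TRYLOCK(vp)) {                   /* bars a second recycler */
            mtx_unlock(&vnode_free_list_mtx);
            if (vtryrecycle(vp) == 0) {
                    /* vgonel() succeeded; vp may be reused */
            }
            mtx_lock(&vnode_free_list_mtx);
    }
    mtx_unlock(&vnode_free_list_mtx);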
jeff [Sun, 5 Oct 2003 02:48:04 +0000 (02:48 +0000)]
- Don't cache_purge() in getnewvnode(). It's done in vclean(). With this
purge, the purge in vclean(), and the filesystem's purge, we had 3 purges
per vnode.
- Move the insmntque(vp, 0) to vclean() so that we may remove it from the
two vgone() functions and reduce the number of lock operations required.
jeff [Sun, 5 Oct 2003 00:35:41 +0000 (00:35 +0000)]
- Solve a LOR with the sync_mtx by using the VI_ONWORKLST flag to determine
whether or not the sync failed. This could potentially get set between
the time that we VOP_UNLOCK and VI_LOCK() but the race would harmlessly
lead to the sync being delayed by an extra 30 seconds. If we do not move
the vnode it could cause an endless loop if it continues to fail to sync.
- Use vhold and vdrop to stop the vnode from changing identities while we
have it unlocked, as sketched below. Other internal vfs lists are likely
to follow this scheme.
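A minimal sketch of the hold-across-unlock window (the exact interlock
contract of vhold()/vdrop() is elided):

    vhold(vp);                      /* identity can't change while held */
    VOP_UNLOCK(vp, 0, td);
    /* ... vp is unlocked here, but cannot be recycled ... */
    VI_LOCK(vp);
    if ((vp->v_iflag & VI_ONWORKLST) != 0) {
            /* Sync didn't complete; it stays on the worklist and is
               simply retried ~30 seconds from now. */
    }
    VI_UNLOCK(vp);
    vdrop(vp);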
jeff [Sun, 5 Oct 2003 00:02:41 +0000 (00:02 +0000)]
- Move the xlock 'locking' code into vx_lock() and vx_unlock(), sketched
below.
- Create a new function, vgonechrl(), which performs vgone for an in-use
character device. Move the code from vflush() that did this into
vgonechrl().
- Hold the xlock across the entirety of vgonel() and vgonechrl() so that
at no point will an invalid vnode exist on any list without XLOCK set.
- Move the xlock code out of vclean() now that it is in the vgone*()
functions.
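A plausible shape for the new helpers, reconstructed from the description
rather than taken from the commit (the VI_XWANT wakeup detail is an
assumption):

    /* Sketch only; not the committed implementation. */
    static void
    vx_lock(struct vnode *vp)
    {

            ASSERT_VI_LOCKED(vp, "vx_lock");
            vp->v_iflag |= VI_XLOCK;        /* reclaim in progress */
    }

    static void
    vx_unlock(struct vnode *vp)
    {

            ASSERT_VI_LOCKED(vp, "vx_unlock");
            vp->v_iflag &= ~VI_XLOCK;
            if (vp->v_iflag & VI_XWANT) {   /* wake sleepers in vget() */
                    vp->v_iflag &= ~VI_XWANT;
                    wakeup(vp);
            }
    }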
alc [Sat, 4 Oct 2003 22:47:20 +0000 (22:47 +0000)]
Eliminate some unnecessary uses of the vm page queues lock around the
vm page's valid field. This field is being synchronized using the
containing vm object's lock.
peter [Sat, 4 Oct 2003 22:04:54 +0000 (22:04 +0000)]
Fix the apm problem for real. We leave the first 4K page for the bios to
work in, but we had it mapped read-only. While this has always been the
case, the PG_PS enable hack hid it and the apm bios code ended up taking
advantage of it.
alc [Sat, 4 Oct 2003 19:23:29 +0000 (19:23 +0000)]
- Extend the scope the vm object lock to cover calls to
vm_page_is_valid().
- Assert that the lock on the containing vm object is held in
vm_page_is_valid().
imp [Sat, 4 Oct 2003 18:40:36 +0000 (18:40 +0000)]
I've been burned about half a dozen times by the old PAO syntax for
'any' interrupt. There's no reason not to be liberal here and accept
the PAO syntax.
jeff [Sat, 4 Oct 2003 18:03:53 +0000 (18:03 +0000)]
- In sched_sync() test our preconditions prior to dropping the sync_mtx
(see the sketch after this list). This is so that we may grab the interlock
while still holding the sync_mtx. We have to VI_TRYLOCK() because in all
other cases the lock order runs the other way.
- If we don't meet any of the preconditions, reinsert the vp into the
list for the next second.
- We don't need to panic if we fail to sync here because each FSYNC
function handles this case. Removing this redundant code also
simplifies locking.
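A minimal sketch of the precondition test (slp and next stand for the
current and next second's work buckets):

    mtx_assert(&sync_mtx, MA_OWNED);
    vp = LIST_FIRST(slp);
    /*
     * The usual order is interlock before sync_mtx, so only a
     * try-lock is safe here; on failure just defer the vnode.
     */
    if (VOP_ISLOCKED(vp, NULL) != 0 || VI_TRYLOCK(vp) == 0) {
            LIST_REMOVE(vp, v_synclist);
            LIST_INSERT_HEAD(next, vp, v_synclist);
            continue;
    }
    /* ... sync the vnode ... */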
jeff [Sat, 4 Oct 2003 17:37:51 +0000 (17:37 +0000)]
- Set the sopt_dir member of the sockopt structure; otherwise this parameter
will not actually be set even though we're calling sosetopt. sosetopt
calls down to a single ctloutput function if the name or level is
implemented by a specific protocol.
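A minimal sketch of a kernel-internal caller (the helper and the particular
option are illustrative; the struct sockopt fields are real):

    #include <sys/param.h>
    #include <sys/socket.h>
    #include <sys/socketvar.h>

    /* Hypothetical helper, for illustration only. */
    static int
    set_sndbuf(struct socket *so, int space)
    {
            struct sockopt sopt;

            bzero(&sopt, sizeof(sopt));
            sopt.sopt_dir = SOPT_SET;       /* else ctloutput may treat
                                               this as a get */
            sopt.sopt_level = SOL_SOCKET;
            sopt.sopt_name = SO_SNDBUF;
            sopt.sopt_val = &space;
            sopt.sopt_valsize = sizeof(space);
            sopt.sopt_td = NULL;            /* value is in kernel space */
            return (sosetopt(so, &sopt));
    }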
jeff [Sat, 4 Oct 2003 16:09:40 +0000 (16:09 +0000)]
- Don't use vrecycle(); call vgonel() directly after grabbing the vnode
interlock. We do this so that we still hold the interlock when we lock
the vnode later. This prevents races with the mnt vnode list.
yar [Sat, 4 Oct 2003 15:17:08 +0000 (15:17 +0000)]
Assorted minor fixes, mostly style(9):
- PID should be pid_t, not int;
- sort #include's and local variables;
- don't overuse initializers;
- use warn(3) instead of perror(3) consistently;
- amplify the comment on signals.
jeff [Sat, 4 Oct 2003 15:10:40 +0000 (15:10 +0000)]
- In a Giantless world, the vn_lock() in vcanrecycle() could legitimately
fail. Remove the panic from that case and document why it might fail.
- Document the reason for calling cache_purge() on a newly created vnode.
- In insmntque() order the operations so that we can call mtx_unlock()
one fewer time. This makes the code somewhat clearer as well.
- Add XXX comments in sched_sync() and vflush().
- In vget(), do not sleep while waiting for XLOCK to clear if LK_NOWAIT is
set.
- In vclean() we don't need to acquire a lock around a single TAILQ_FIRST
call. It's ok if we race here; the vinvalbuf will just do nothing.
- Increase the scope of the lock in vgonel() to reduce the number of lock
operations that are performed.
jeff [Sat, 4 Oct 2003 14:35:22 +0000 (14:35 +0000)]
- If we are called with LK_NOWAIT in vn_lock() we may be holding a mutex
and should not sleep while waiting for XLOCK to clear. Care needs to be
taken in functions that use this capability to avoid spinning.
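A minimal sketch of a careful caller:

    /* We hold some mutex, so we must not sleep waiting for XLOCK. */
    error = vn_lock(vp, LK_EXCLUSIVE | LK_NOWAIT, td);
    if (error != 0) {
            /* Back off rather than retry in a tight loop, or we can
               spin while the reclaiming thread needs the cpu. */
    }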
jeff [Sat, 4 Oct 2003 14:27:49 +0000 (14:27 +0000)]
- Increase the scope of the interlock in ffs_reload(). Acquire it before
we release the mntvnode_mtx.
- Call vgonel() directly instead of going through vrecycle() since we own
the interlock now.
- Remove a few cases where we locked the interlock just so that we could
call VOP_UNLOCK with interlock held.
jeff [Sat, 4 Oct 2003 14:25:45 +0000 (14:25 +0000)]
- Fix an unlocked call to GETATTR by slightly shuffling the code in
ffs_snapshot() around.
- Acquire the interlock before releasing the mntvnode_mtx. Use the
interlock to protect v_usecount access.
jeff [Sat, 4 Oct 2003 14:21:53 +0000 (14:21 +0000)]
- Use the UMA_ZONE_VM flag on the fakepg and object zones to prevent
vm recursion and LORs. This may be necessary for other zones created in
the vm but this needs to be verified.
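A minimal sketch of creating such a zone (the zone name and the extra
UMA_ZONE_NOFREE flag are illustrative):

    fakepg_zone = uma_zcreate("vm fakepg", sizeof(struct vm_page),
        NULL, NULL, NULL, NULL, UMA_ALIGN_PTR,
        UMA_ZONE_VM | UMA_ZONE_NOFREE);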
jeff [Sat, 4 Oct 2003 14:03:28 +0000 (14:03 +0000)]
- Use the VI_LOCK macro in two places where we directly called mtx_lock()
before. Direct calls indicated places that needed review and these have
now been reviewed.
jeff [Sat, 4 Oct 2003 14:02:32 +0000 (14:02 +0000)]
- Properly acquire the vnode interlock before releasing the
mntvnode_mtx.
- Use a local variable to store the result of the test to see if the
next vnode on the mount list has changed. This is so that we no longer
access the vnode after we vput() it.
jeff [Sat, 4 Oct 2003 13:44:51 +0000 (13:44 +0000)]
- Acquire the vnode interlock prior to dropping the mntvnode_mtx.
- Make a note of the lack of XLOCK protection in this code. We would access
a vnode while it is changing identities without Giant.
jeff [Sat, 4 Oct 2003 13:16:54 +0000 (13:16 +0000)]
- Make proper use of the mntvnode_mtx. We do not need the loop label
because we do not drop the mntvnode_mtx. If this code had ever executed
and hit the loop condition it would have spun forever.
jeff [Sat, 4 Oct 2003 12:52:37 +0000 (12:52 +0000)]
- Acquire the vnode interlock prior to dropping the mntvnode_mtx. This does
not eliminate races where the vnode could be reclaimed and end up with
a NULL v_data pointer but Giant is protecting us from that at the moment.
jeff [Sat, 4 Oct 2003 08:51:50 +0000 (08:51 +0000)]
- Remove the backtrace() call from the *_vinvalbuf() functions. Thanks to a
stack trace supplied by phk, I now understand what's going on here. The
check for VI_XLOCK stops us from calling vinvalbuf once the vnode has been
partially torn down in vclean(). It is not clear that this would cause
a problem. Document this in nfs_bio.c, which is where the other two
filesystems copied this code from.
peter [Sat, 4 Oct 2003 06:30:56 +0000 (06:30 +0000)]
Emulate bugs in the old PSE code so that apm works again.
I do not yet understand why, but apm *depended* on the fact that the old
PSE code caused the first 1MB of ram to be mapped read/write because it
was in the same 4MB page as the kernel text+data+bss blob.
If anybody ever tried DISABLE_PSE before, apm would not work.
If your cpu did not have PSE, apm would not work there either (eg: 486).
This bug has been around for a Very Long Time.
The Pentium-4-fix commits did not emulate this unintended side effect of
the PSE post-early-boot fixup, and thus apm blew up. I've added a hack to
emulate the bug until either apm is fixed or we set fire to our bridges.
This is bad though because it gives kernel mode code the opportunity
to accidentally write to the first few megs of the general page pool
which is remapped at KERNBASE. It needs to be fixed properly.
sam [Sat, 4 Oct 2003 03:44:50 +0000 (03:44 +0000)]
Locking for updates to routing table entries. Each rtentry gets a mutex
that covers updates to the contents. Note this is separate from holding
a reference and/or locking the routing table itself.
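A minimal sketch of an update under the new per-entry mutex (assuming the
RT_LOCK()/RT_UNLOCK() macros this work introduces):

    RT_LOCK(rt);            /* covers the entry's contents only */
    rt->rt_rmx.rmx_mtu = mtu;
    RT_UNLOCK(rt);          /* table lock and refcounts are separate */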
Other/related changes:
o rtredirect loses the final parameter by which an rtentry reference
may be returned; this was never used and added unwarranted complexity
for locking.
o minor style cleanups to routing code (e.g. ansi-fy function decls)
o remove the logic to bump the refcnt on the parent of cloned routes,
we assume the parent will remain as long as the clone; doing this avoids
a circularity in locking during delete
o convert some timeouts to MPSAFE callouts
Notes:
1. rt_mtx in struct rtentry is guarded by #ifdef _KERNEL as user-level
applications cannot/do-not know about mutexes. Doing this requires
that the mutex be the last element in the structure. A better solution
is to introduce an externalized version of struct rtentry but this is
a major task because of the intertwining of rtentry and other data
structures that are visible to user applications.
2. There are known LORs that are expected to go away with forthcoming
work to eliminate many held references. If not, these will be resolved
prior to release.
3. ATM changes are untested.
Sponsored by: FreeBSD Foundation
Obtained from: BSD/OS (partly)
bms [Sat, 4 Oct 2003 01:30:01 +0000 (01:30 +0000)]
Update the pmap(9) documentation to reflect the movement of pmap_prefault()
to the machine-independent VM layer, as per alc's recent commit.
Add a definition for the new pmap_is_prefaultable() helper function.
alc [Fri, 3 Oct 2003 22:46:53 +0000 (22:46 +0000)]
Migrate pmap_prefault() into the machine-independent virtual memory layer.
A small helper function pmap_is_prefaultable() is added. This function
encapsulates the few lines of pmap_prefault() that actually vary from
machine to machine. Note: pmap_is_prefaultable() and pmap_mincore() have
much in common. Going forward, it's worth considering their merger.
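A minimal sketch of the machine-independent caller's shape (loop bounds and
the page lookup are elided):

    for (addr = starta; addr < enda; addr += PAGE_SIZE) {
            /* Only the "is this mapping worth attempting" test
               is machine-dependent now. */
            if (!pmap_is_prefaultable(pmap, addr))
                    continue;
            /* ... look up the resident page and pmap_enter() it ... */
    }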
imp [Fri, 3 Oct 2003 22:00:06 +0000 (22:00 +0000)]
While make has been fixed to grok this construct, the new make hasn't
been widely deployed and that's causing us a lot of pain. Back out the
last commit for a few weeks so that we can lessen the support load on
current@ from people asking why they can't build kernels anymore. Instructions in
UPDATING have been updated, but this should be more effective.
phk [Fri, 3 Oct 2003 21:33:40 +0000 (21:33 +0000)]
Default ntpd to write a "driftfile" in /var/db/ntpd.drift.
A "driftfile" caches the oscillator offset estimate from boot to boot,
having this means faster and less bumpy time synchronization. It will
be overridden by any value in the config file.
alc [Fri, 3 Oct 2003 19:49:08 +0000 (19:49 +0000)]
Make PAGE_SIZE and related quantities signed on sparc64. (They are signed
quantities on every other architecture.) This change is required in order
to move pmap_prefault() out of the pmap and into the machine-independent
layer.
rwatson [Fri, 3 Oct 2003 18:27:24 +0000 (18:27 +0000)]
When direct dispatching a netisr (net.isr.enable=1), if there are already
any queued packets for the isr, process those packets before the newly
submitted packet, maintaining ordering of all packets being delivered
to the netisr. Remove the bypass counter since we don't bypass anymore.
Leave the comment about possible problems and options since later
performance optimization may change the strategy for addressing ordering
problems here.
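A minimal sketch of the ordering rule (isrq and handler are illustrative
names; queue locking is elided):

    if (isrq->ifq_len > 0) {
            /* Packets are already queued: go behind them, then
               drain the queue so delivery order is preserved. */
            IF_ENQUEUE(isrq, m);
            for (;;) {
                    IF_DEQUEUE(isrq, m);
                    if (m == NULL)
                            break;
                    handler(m);
            }
    } else
            handler(m);     /* queue empty: direct dispatch is safe */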
Specifically, this maintains the strong isr ordering guarantee; additional
parallelism and lower latency may be possible by moving to weaker
guarantees (per-interface, for example). We will probably at some point
also want to remove the one-instance netisr dispatch limit currently
enforced by a mutex, but it's not clear that's 100% safe yet, even in
the netperf branch.
sam [Fri, 3 Oct 2003 18:15:54 +0000 (18:15 +0000)]
cleanups prior to adding locking (and in some cases to eliminate locking):
o move route_cb to be private to rtsock.c
o replace global static route_proto by locals
o eliminate global #define shorthands for info references
o remove some register decls
o ansi-fy function decls
o move items to be close in scope to their usage
o add rt_dispatch function for dispatching the actual message
o cleanup tangled logic for doing all-but-me msg send