rwatson [Sat, 9 Oct 2004 22:04:13 +0000 (22:04 +0000)]
Modify entropy harvesting locking strategy:
- Trade off granularity to reduce overhead, since the current model
doesn't appear to reduce contention substantially: move to a single
harvest mutex protecting harvesting queues, rather than one mutex
per source plus a mutex for the free list.
- Reduce mutex operations in a harvesting event to 2 from 4, and
maintain lockless read to avoid mutex operations if the queue is
full.
- When reaping harvested entries from the queue, move all entries from
the queue at once, and when done with them, insert them all into a
thread-local queue for processing; then insert them all into the
empty fifo at once. This reduces O(4n) mutex operations to O(2)
mutex operations per wakeup.
In the future, we may want to look at re-introducing granularity,
although perhaps at the granularity of the source rather than the
source class; both the new and old strategies would cause contention
between different instances of the same source (i.e., multiple
network interfaces).
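The batching described in the third item can be sketched in userland C as below. This is only an illustration under assumptions: it uses pthread mutexes rather than kernel mutexes, and every name (harvest_mtx, harvest_fifo, empty_fifo, reap_once, and so on) is hypothetical rather than taken from the committed code. Each wakeup takes the lock exactly twice, however many entries are queued.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical types and names; not the actual harvesting code. */
struct event {
    struct event *next;
    int data;
};

struct fifo {
    struct event *head;
    struct event *tail;
    size_t count;
};

static pthread_mutex_t harvest_mtx = PTHREAD_MUTEX_INITIALIZER;
static struct fifo harvest_fifo;   /* entries waiting to be processed */
static struct fifo empty_fifo;     /* recycled entries */
static int processed_sum;          /* stand-in for real processing */

/* Append every entry of src to dst and leave src empty (no locking here). */
static void fifo_concat(struct fifo *dst, struct fifo *src)
{
    if (src->head == NULL)
        return;
    assert(src->tail != NULL);
    if (dst->tail != NULL)
        dst->tail->next = src->head;
    else
        dst->head = src->head;
    dst->tail = src->tail;
    dst->count += src->count;
    src->head = src->tail = NULL;
    src->count = 0;
}

/* Producer side: one lock/unlock pair per harvested event. */
static void harvest_enqueue(struct event *e)
{
    struct fifo one = { e, e, 1 };

    e->next = NULL;
    pthread_mutex_lock(&harvest_mtx);
    fifo_concat(&harvest_fifo, &one);
    pthread_mutex_unlock(&harvest_mtx);
}

static void process_event(struct event *e)
{
    processed_sum += e->data;
}

/* Consumer side: two lock/unlock pairs per wakeup, independent of n. */
static size_t reap_once(void (*process)(struct event *))
{
    struct fifo local = { NULL, NULL, 0 };
    struct event *e;
    size_t n;

    pthread_mutex_lock(&harvest_mtx);
    fifo_concat(&local, &harvest_fifo);     /* take everything at once */
    pthread_mutex_unlock(&harvest_mtx);

    for (e = local.head; e != NULL; e = e->next)
        process(e);                         /* runs without the lock held */
    n = local.count;

    pthread_mutex_lock(&harvest_mtx);
    fifo_concat(&empty_fifo, &local);       /* return everything at once */
    pthread_mutex_unlock(&harvest_mtx);
    return n;
}
```

Calling reap_once(process_event) after several harvest_enqueue() calls drains all queued entries and recycles them into empty_fifo in one pass.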
rwatson [Sat, 9 Oct 2004 20:58:28 +0000 (20:58 +0000)]
Add a simple C-based TCP connection generator, which generates and
closes the specified number of TCP connections sequentially and
synchronously. Useful for trying to trigger races in the accept
code.
njl [Sat, 9 Oct 2004 20:16:06 +0000 (20:16 +0000)]
Fix fsbtodb() for UFS1. This fixes an overflow for file sizes >1 TB,
allowing for sizes up to 4 TB. This doesn't affect UFS2 since b is already
a 64 bit type, coincidental with daddr_t.
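The class of overflow being fixed can be illustrated as follows. This is a sketch, not the real fsbtodb() macro: the helper names are hypothetical, and unsigned 32-bit arithmetic is used so that the truncation is well defined in C, whereas the UFS1 block type is assumed here to be a narrower type than daddr_t.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers; the real fsbtodb() is a macro in the UFS headers. */

/* Shift performed in 32 bits: bits above bit 31 are silently discarded. */
static uint32_t frag_to_db_narrow(uint32_t frag, unsigned shift)
{
    assert(shift < 32);
    return frag << shift;
}

/* Widen to 64 bits first, then shift: no bits are lost. */
static int64_t frag_to_db_wide(uint32_t frag, unsigned shift)
{
    assert(shift < 32);
    return (int64_t)frag << shift;
}
```

With frag = 0x40000000 and shift = 2, the narrow version wraps to 0 while the widened version yields 4294967296.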
csjp [Sat, 9 Oct 2004 20:07:33 +0000 (20:07 +0000)]
Add a note to the man page warning users about possible lock order
reversals and system lockups if they are using ucred based rules
while running with debug.mpsafenet=1.
I am working on merging a shared locking mechanism into ipfw which
should take care of this problem, but it still requires a bit more
testing and review.
maxim [Sat, 9 Oct 2004 17:13:58 +0000 (17:13 +0000)]
o Backout rev. 1.16, see 1.3 commit log for more info.
Requested by: bde
o Remove unneeded sys/types.h and netinet/in.h from the synopsis and
the example.
o We do have struct in_addr in arpa/inet.h, so no need for netinet/in.h.
o Mention where AF_* constants are defined.
rwatson [Sat, 9 Oct 2004 16:48:51 +0000 (16:48 +0000)]
Acquire the send socket buffer lock around tcp_output() activities
reaching into the socket buffer. This prevents a number of potential
races, including dereferencing of sb_mb while unlocked leading to
a NULL pointer deref (how I found it). Potentially this might also
explain other "odd" TCP behavior on SMP boxes (although haven't
seen it reported).
green [Sat, 9 Oct 2004 08:16:37 +0000 (08:16 +0000)]
Don't "implicitly order all sleep locks before spin locks" in witness
when the spin lock in question isn't -- it's the critical_enter() that
KDB set. No more panic in DDB for console -> syscons -> tty -> knote
operations.
yongari [Sat, 9 Oct 2004 07:31:03 +0000 (07:31 +0000)]
Port NetBSD auxio driver. The driver was modified to use led(4) and can
be used to announce various system activity.
The auxio device provides auxiliary I/O functions and is found on various
SBus/EBus UltraSPARC models. At present, only front panel LED is
controlled by this driver.
scottl [Sat, 9 Oct 2004 02:53:47 +0000 (02:53 +0000)]
3 important fixes for growfs:
1) ginode() is passed a cylinder group number and inode number. The inode
number is relative to the cg. Use this relative number rather than the
absolute inode number when searching the cg inode bitmap to see if the inode
is allocated. Using the absolute number quickly runs the check off the end
of the array and causes invalid inodes to be referenced.
2) ginode() checks the absolute inode number to make sure that it is greater
than ROOTINO. However, the caller loops through all of the possible inode
numbers and directly passes in values that are < ROOTINO. Instead of halting
the program with an error, just return NULL.
3) When allocating new cylinder groups, growfs was initializing all of the
inodes in the group regardless of this only being required for UFS1. Not
doing this for UFS2 provides a significant performance increase.
These fixes allow growing a filesystem beyond a trivial amount and have
been tested to grow an 8GB filesystem to 1.9TB. Much more testing would
be appreciated.
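The indexing problem in fix 1 can be sketched like this. The helper and its parameters are hypothetical and are not the growfs code: the bitmap is assumed to cover only the inodes of one cylinder group, so it has to be indexed by the inode's position within that group.

```c
#include <assert.h>
#include <limits.h>

/*
 * Hypothetical helper: test whether an inode is marked allocated in a
 * per-cylinder-group bitmap that holds ipg bits. Indexing with the
 * absolute inode number would read past the end of the bitmap for every
 * group but the first, so reduce it to a group-relative index.
 */
static int cg_inode_allocated(const unsigned char *cg_map, unsigned ipg,
    unsigned ino)
{
    unsigned rel;

    assert(ipg > 0);
    rel = ino % ipg;                    /* position inside the owning cg */
    return (cg_map[rel / CHAR_BIT] >> (rel % CHAR_BIT)) & 1;
}
```

For a group with 8 inodes whose bitmap byte is 0x04, absolute inode 18 maps to relative index 2 and is reported as allocated.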
sos [Fri, 8 Oct 2004 21:27:27 +0000 (21:27 +0000)]
Only do the geometry translations on ad* devices; other devices seem to
have their own way of life.
Those other devices' translations should be moved here as well.
truckman [Fri, 8 Oct 2004 20:44:47 +0000 (20:44 +0000)]
Eliminate linked list used to track inodes with an initial link
count of zero and instead encode this information in the inode state.
Pass 4 performed a linear search of this list for each inode in
the file system, which performs poorly if the list is long.
Reviewed by: sam & keramida (an earlier version of the patch), mckusick
MFC after: 1 month
keramida [Fri, 8 Oct 2004 20:31:33 +0000 (20:31 +0000)]
To avoid pushing the paragraph text too far from the left border, making
line-splitting extremely difficult for groff, indent the .Bl items by
the standard `indent' length instead of an indent large enough to hold
the maximal tag name.
green [Fri, 8 Oct 2004 20:19:29 +0000 (20:19 +0000)]
Fix critical stability problems that can cause UMA mbuf cluster
state management corruption, mbuf leaks, general mbuf corruption,
and at least on i386 a first level splash damage radius that
encompasses up to about half a megabyte of the memory after
an mbuf cluster's allocation slab. In short, this has caused
instability nightmares anywhere the right kind of network traffic
is present.
When the polymorphic refcount slabs were added to UMA, the new types
were not used pervasively. In particular, the slab management
structure was turned into one for refcounts, and one for non-refcounts
(supposed to be mostly like the old slab management structure),
but the latter was almost always used throughout. In general, every
access to zones with UMA_ZONE_REFCNT turned on corrupted the
"next free" slab offset and the refcount with each other and
with other allocations (on i386, 2 mbuf clusters per 4096 byte slab).
Fix things so that the right type is used to access refcounted zones
where it was not before. There are additional errors in gross
overestimation of padding, it seems, that would cause large kegs
(nee zones) to be allocated when small ones would do. Unless I have
analyzed this incorrectly, it is not directly harmful.
rwatson [Fri, 8 Oct 2004 19:23:11 +0000 (19:23 +0000)]
Add a version of netsend that uses the interval timer rather than
explicit clock reads to set an overall duration to the send, and
blasts rather than trying to clock output. The goal of netblast,
unlike netsend, is to send as many UDP packets as possible; the
cost is that there's no ability to control the rate, and there's
less accuracy in the timing as the interval timer granularity is
relatively low.
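The interval-timer approach can be sketched as below. This is not the netblast source: blast_for(), on_alarm(), and fake_send() are hypothetical names, and the send callback stands in for a real UDP send. The loop never reads the clock; it simply runs until SIGALRM sets a flag.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>

static volatile sig_atomic_t timer_expired;

static void on_alarm(int sig)
{
    (void)sig;
    timer_expired = 1;
}

/*
 * Call send_one() as fast as possible for roughly msec milliseconds.
 * Returns the number of successful sends, or -1 if setup fails.
 * msec must be positive: a zero it_value would disable the timer.
 */
static long blast_for(long msec, int (*send_one)(void))
{
    struct sigaction sa;
    struct itimerval it;
    long count = 0;

    assert(msec > 0);

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, NULL) != 0)
        return -1;

    memset(&it, 0, sizeof(it));             /* it_interval = 0: one shot */
    it.it_value.tv_sec = msec / 1000;
    it.it_value.tv_usec = (msec % 1000) * 1000;
    timer_expired = 0;
    if (setitimer(ITIMER_REAL, &it, NULL) != 0)
        return -1;

    while (!timer_expired) {                /* no clock reads in the loop */
        if (send_one() == 0)
            count++;
    }
    return count;
}

/* Stand-in for a real send; always reports success. */
static int fake_send(void)
{
    return 0;
}
```

The trade-off matches the one noted above: the loop is as tight as possible, but the run length is only as accurate as the timer's granularity.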
njl [Fri, 8 Oct 2004 17:56:47 +0000 (17:56 +0000)]
Update a quirk for the ASUS P5A to disable the timer. It appears to work fine
with acpi but the timer runs twice as fast. Note that the main problem
(system doesn't work properly with acpi disabled) should be fixed separately.
Changes:
* Add a quirk to disable the timer
* Merge the P5A and P5A-B quirks since they appear to be based on the
same ASL.
PR: i386/72450
Tested by: Kevin Oberman <oberman es.net>
MFC after: 3 days
mlaier [Fri, 8 Oct 2004 12:07:20 +0000 (12:07 +0000)]
Change pfil starvation prevention from fail-open to fail-close.
We return ENOBUFS to indicate the problem, which is an errno that should be
handled well everywhere.
Requested & Submitted by: green
Silently okay'ed by: The rest of the firewall gang
MFC after: 3 days
glebius [Fri, 8 Oct 2004 09:57:12 +0000 (09:57 +0000)]
- sort struct rtentry fields in man page in the same order as they are in the struct
- remove RTF_PRCLONING
- add rt_mtx field
- rename rt_metrics -> rt_metrics_lite
- mention that only 3 metrics are really used in rt_metrics_lite
alc [Fri, 8 Oct 2004 08:23:43 +0000 (08:23 +0000)]
Make pte_load_store() an atomic operation in all cases, not just i386 PAE.
Restructure pmap_enter() to prevent the loss of a page modified (PG_M) bit
in a race between processors. (This restructuring assumes the newly atomic
pte_load_store() for correct operation.)
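Why an atomic load-and-store matters here can be sketched with C11 atomics. This is a userland illustration, not the pmap code: sketch_pte_t, SKETCH_PG_M, and pte_swap() are hypothetical names, and the bit value 0x040 is only illustrative. Because the old entry is fetched and the new one stored in a single step, a modified bit set by another processor just before the update is returned to the caller instead of being overwritten unseen.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for a page table entry and its modified bit. */
typedef _Atomic uint64_t sketch_pte_t;
#define SKETCH_PG_M 0x040u

/*
 * Replace *pte with newval and return the previous contents in one
 * atomic step. A separate load followed by a store would leave a window
 * in which another processor could set the modified bit and have it
 * overwritten without anyone noticing.
 */
static uint64_t pte_swap(sketch_pte_t *pte, uint64_t newval)
{
    assert(pte != NULL);
    return atomic_exchange(pte, newval);
}
```

A caller would test the returned value for the modified bit and mark the page dirty before reusing the mapping.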
brooks [Fri, 8 Oct 2004 00:24:30 +0000 (00:24 +0000)]
Since net/net_osdep.c contained only one function that could be
trivially implemented as a macro, do that and remove it. NetBSD did
this quite a while ago.
dougb [Fri, 8 Oct 2004 00:14:28 +0000 (00:14 +0000)]
1. Incorporate most of Ruslan's improvements to where and how the
/etc/namedb symlink is created.
2. Incorporate Brian's suggestion to make the link relative. This
is necessary to handle situations (such as mergemaster) where the
user is building a tree in a separate environment. This will also
fix the problem with the way DESTDIR is set in 'make release'.
3. Add a new knob, NO_BIND_MTREE, as suggested by the folks who
already have stuff in /var/named that they don't want me to mess with.
4. Update make.conf(5) with the new stuff, and correct a few paths
that have changed since I last updated it.
kensmith [Thu, 7 Oct 2004 20:36:56 +0000 (20:36 +0000)]
Back out v1.58... We still don't know what is causing the specific
problem I had but it's happening in code that is messing around with
register windows - I'm willing to live with that piece being sensitive
to this and it looks like the other problems we had reported lately
are not fixed by using -O instead of -O2.
Sorry for the churn. Looks like I need a second pointy hat. Someone
tells me they stack well. :-))))
sos [Thu, 7 Oct 2004 17:37:09 +0000 (17:37 +0000)]
Move the PC98 specific geometry "gunk" to geom_pc98.c where it belongs.
This also adds support for bigger disks on the controller I have access to,
and maybe others if I understood the ad hoc methods used on those.
Those with more PC98 bigdrive controllers are hereby invited to add/fix
support for those in geom_pc98.c instead of using #ifdef PC98 all over the place.
rwatson [Thu, 7 Oct 2004 14:13:35 +0000 (14:13 +0000)]
When running with debug.mpsafenet=0, initialize IP multicast routing
callouts as non-CALLOUT_MPSAFE. Otherwise, they may trigger an
assertion regarding Giant if they enter other parts of the stack from
the callout.
MFC after: 3 days
Reported by: Dikshie < dikshie at ppk dot itb dot ac dot id >
mlaier [Thu, 7 Oct 2004 12:10:25 +0000 (12:10 +0000)]
Add a minimal altq.4 manpage to tell about the kernel options and where to
find more information. Also move the "SUPPORTED DEVICES" section from altq.9
to altq.4, where it belongs.
pjd [Thu, 7 Oct 2004 10:02:46 +0000 (10:02 +0000)]
- Be more user-friendly and allow the gbde device name to be specified in these forms:
device
device.bde
/dev/device
/dev/device.bde
- Fix stop routine:
+ There doesn't have to be a file system mounted on the gbde device,
so ignore errors from umount(8).
+ Only detach existing gbde devices.
pjd [Thu, 7 Oct 2004 06:00:06 +0000 (06:00 +0000)]
Only try to attach if parent device actually exists.
I used ugly "/dev/${parent}" instead of "${parentdev}", because "/dev/"
prefix for devices listed in gbde_devices variable is optional.
kensmith [Wed, 6 Oct 2004 19:55:14 +0000 (19:55 +0000)]
Back out v1.49. Recent findings suggest sparc64 may not be ready for
-O2 on kernel compiles after all. While working on adding a KASSERT
to sparc64/sparc64/rwindow.c I found that it was "position sensitive",
putting it above a call to flushw() instead of below caused corruption
of processes on the system. jake and jhb have both confirmed there is
no obvious explanation for that. The exact same kernel code does not
have the process corruption problem if compiled with -O instead of -O2.
There have been signs of similar issues floated on the sparc64@ mailing
list, let's see if this helps make them go away.
Note this isn't an optimal fix as far as the file format goes, if this
disgusts too many people I'll fix it the right way. Since compiling
with something other than -O is a known problem this format would prevent
a change to the default causing grief. And this may also help motivate
finding out what the compiler is doing wrong so we can shift back to
using -O2. :-)
My turn for the pointy hat... One of the fluorescent ones...
jhb [Wed, 6 Oct 2004 17:10:56 +0000 (17:10 +0000)]
- Fix the compile to chase the p_rux changes.
- Add a comment noting that the ru_[us]times values being read aren't
actually valid and need to be computed from the raw values.
mtm [Wed, 6 Oct 2004 14:23:00 +0000 (14:23 +0000)]
Close a race between a thread exiting and the freeing of its stack.
After some discussion the best option seems to be to signal the thread's
death from within the kernel. This requires that thr_exit() take an
argument.
Discussed with: davidxu, deischen, marcel
MFC after: 3 days
davidxu [Wed, 6 Oct 2004 08:11:07 +0000 (08:11 +0000)]
Allocate red zone and stack space together and then split red zone from
allocated space. The original code left the red zone unallocated, but that
space could be allocated by user code, so it provided no protection.
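The idea of reserving the red zone together with the stack can be sketched in userland C as below. This is not the thread library code: alloc_stack_with_guard() is a hypothetical helper, both sizes are assumed to be multiples of the page size, and the stack is assumed to grow downward, so the guard sits at the low end of the mapping.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Map the guard and the stack as one region so nothing else can be
 * placed in the guard's address range, then revoke all access to the
 * guard part. Returns the lowest usable stack address, or NULL on
 * failure.
 */
static char *alloc_stack_with_guard(size_t stack_size, size_t guard_size)
{
    size_t total = stack_size + guard_size;
    char *base;

    assert(stack_size > 0 && guard_size > 0);

    base = mmap(NULL, total, PROT_READ | PROT_WRITE,
        MAP_PRIVATE | MAP_ANON, -1, 0);
    if (base == MAP_FAILED)
        return NULL;

    /* The guard stays mapped, but any access to it now faults. */
    if (mprotect(base, guard_size, PROT_NONE) != 0) {
        munmap(base, total);
        return NULL;
    }
    return base + guard_size;
}
```

Because the guard pages belong to the same mapping, a later allocation by user code cannot land in that range, which is the failure mode described above.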
imp [Wed, 6 Oct 2004 07:26:52 +0000 (07:26 +0000)]
For older systems with ACPI which don't have a pci <-> pci bridge,
allocate unallocated memory resources from the top 32MB of the address
space rather than the top 2GB. While the latter works on some
chipsets, it fails badly on others. 32MB is more conservative and
matches what cheap hardware from this era is hardwired to pass.
imp [Wed, 6 Oct 2004 07:22:58 +0000 (07:22 +0000)]
For legacy PCI bridges, limit memory allocation to the top 32MB of
RAM. Many older, legacy bridges only allow allocation from this
range. This only applies to devices that don't have their memory
assigned by the BIOS (since we allocate the ranges so assigned
exactly), so should have minimal impact.
However, for CardBus bridges (cbb), they rarely get the resources
allocated by the BIOS, and this patch helps them greatly. Typically
the 'bad Vcc' messages are caused by this problem.
green [Wed, 6 Oct 2004 04:25:37 +0000 (04:25 +0000)]
Don't recurse the BPF descriptor lock during the BIOCSDLT operation
(and panic). To try to finish making BPF safe, at the very least,
the BPF descriptor lock really needs to change into a reader/writer
lock that controls access to "settings," and a mutex that controls
access to the selinfo/knote/callout. Also, use of callout_drain()
instead of callout_stop() (which is really a much more widespread
issue).
marcel [Wed, 6 Oct 2004 02:43:28 +0000 (02:43 +0000)]
Add the Madison II, which is the second generation Madison. The Madison II
is model 2 in the Itanium 2 family and has up to 9MB of L3 cache and clocks
higher than 1.5GHz. There's no LV variant AFAICT.