Mike Barcroft [Tue, 26 Feb 2002 19:43:03 +0000 (19:43 +0000)]
Rather than include namespace pollution in <grp.h> in order to declare
`gid_t', use the canonical protection scheme to define a type in two or
more headers. This brings <grp.h> closer to POSIX.1-2001 conformance.
Matthew Dillon [Tue, 26 Feb 2002 18:08:54 +0000 (18:08 +0000)]
Make peter's commit compatible with interrupt-enabled critical_enter()
and exit(), which has already solved the problem in regards to deadlocked
IPI's.
Jake Burkholder [Tue, 26 Feb 2002 17:09:24 +0000 (17:09 +0000)]
Apparently gcc3.1 is now using deprcated v8 instructions in v9 code
due to them being faster in certain cases. Therefore we need to save
and restore the v8 %y register around traps in kernel mode as well as
traps in usermode.
Matthew Dillon [Tue, 26 Feb 2002 17:06:21 +0000 (17:06 +0000)]
STAGE-1 of 3 commit - allow (but do not require) interrupts to remain
enabled in critical sections and streamline critical_enter() and
critical_exit().
This commit allows an architecture to leave interrupts enabled inside
critical sections if it so wishes. Architectures that do not wish to do
this are not effected by this change.
This commit implements the feature for the I386 architecture and provides
a sysctl, debug.critical_mode, which defaults to 1 (use the feature). For
now you can turn the sysctl on and off at any time in order to test the
architectural changes or track down bugs.
This commit is just the first stage. Some areas of the code, specifically
the MACHINE_CRITICAL_ENTER #ifdef'd code, is strictly temporary and will
be cleaned up in the STAGE-2 commit when the critical_*() functions are
moved entirely into MD files.
The following changes have been made:
* critical_enter() and critical_exit() for I386 now simply increment
and decrement curthread->td_critnest. They no longer disable
hard interrupts. When critical_exit() decrements the counter to
0 it effectively calls a routine to deal with whatever interrupts
were deferred during the time the code was operating in a critical
section.
Other architectures are unaffected.
* fork_exit() has been conditionalized to remove MD assumptions for
the new code. Old code will still use the old MD assumptions
in regards to hard interrupt disablement. In STAGE-2 this will
be turned into a subroutine call into MD code rather then hardcoded
in MI code.
The new code places the burden of entering the critical section
in the trampoline code where it belongs.
* I386: interrupts are now enabled while we are in a critical section.
The interrupt vector code has been adjusted to deal with the fact.
If it detects that we are in a critical section it currently defers
the interrupt by adding the appropriate bit to an interrupt mask.
* In order to accomplish the deferral, icu_lock is required. This
is i386-specific. Thus icu_lock can only be obtained by mainline
i386 code while interrupts are hard disabled. This change has been
made.
* Because interrupts may or may not be hard disabled during a
context switch, cpu_switch() can no longer simply assume that
PSL_I will be in a consistent state. Therefore, it now saves and
restores eflags.
* FAST INTERRUPT PROVISION. Fast interrupts are currently deferred.
The intention is to eventually allow them to operate either while
we are in a critical section or, if we are able to restrict the
use of sched_lock, while we are not holding the sched_lock.
* ICU and APIC vector assembly for I386 cleaned up. The ICU code
has been cleaned up to match the APIC code in regards to format
and macro availability. Additionally, the code has been adjusted
to deal with deferred interrupts.
* Deferred interrupts use a per-cpu boolean int_pending, and
masks ipending, spending, and fpending. Being per-cpu variables
it is not currently necessary to lock; bus cycles modifying them.
Note that the same mechanism will enable preemption to be
incorporated as a true software interrupt without having to
further hack up the critical nesting code.
* Note: the old critical_enter() code in kern/kern_switch.c is
currently #ifdef to be compatible with both the old and new
methodology. In STAGE-2 it will be moved entirely to MD code.
Performance issues:
One of the purposes of this commit is to enhance critical section
performance, specifically to greatly reduce bus overhead to allow
the critical section code to be used to protect per-cpu caches.
These caches, such as Jeff's slab allocator work, can potentially
operate very quickly making the effective savings of the new
critical section code's performance very significant.
The second purpose of this commit is to allow architectures to
enable certain interrupts while in a critical section. Specifically,
the intention is to eventually allow certain FAST interrupts to
operate rather then defer.
The third purpose of this commit is to begin to clean up the
critical_enter()/critical_exit()/cpu_critical_enter()/
cpu_critical_exit() API which currently has serious cross pollution
in MI code (in fork_exit() and ast() for example).
The fourth purpose of this commit is to provide a framework that
allows kernel-preempting software interrupts to be implemented
cleanly. This is currently used for two forward interrupts in I386.
Other architectures will have the choice of using this infrastructure
or building the functionality directly into critical_enter()/
critical_exit().
Finally, this commit is designed to greatly improve the flexibility
of various architectures to manage critical section handling,
software interrupts, preemption, and other highly integrated
architecture-specific details.
Murray Stokely [Tue, 26 Feb 2002 15:12:54 +0000 (15:12 +0000)]
Add some ifdef(RELEASE_CRUNCH) goo to explicitly list the requisite
object files for crunchgen. Without this patch, release.4 will fail
to build the crunched binaries for the release floppies.
Doug Ambrisko [Tue, 26 Feb 2002 05:43:05 +0000 (05:43 +0000)]
In ad-hoc mode, the "associate" bit is valid to check to see if it is
part of an ad-hoc network. This means another station needs to be
around so they can both associate.
Jake Burkholder [Tue, 26 Feb 2002 02:37:43 +0000 (02:37 +0000)]
Allow the user tsb to span multiple pages. Make the default 2 pages for now
until we do some testing to see what's best. This gives a massive reduction
in system time for processes with a relatively large working set. The size
of the tsb directly affects the rss size that a user process can keep mapped.
When it starts to get full replacements occur and the process takes a lot of
soft vm faults. Increasing the default from 1 page to 2 gives the following
before and after numbers for compiling vfs_bio.c:
before:
14.27 real 6.56 user 5.69 sys
after:
8.57 real 6.11 user 1.62 sys
This should make self hosted builds more tolerable.
Brooks Davis [Tue, 26 Feb 2002 02:19:33 +0000 (02:19 +0000)]
When using hardware decoding, reconstruct the wire form of the ethernet
header and push it up any attached bpf devices on the parent interface.
This makes hardware vlan decoding more like the normal software path.
Brooks Davis [Tue, 26 Feb 2002 01:56:56 +0000 (01:56 +0000)]
Make gif(4) nesting level and parallel tunnel support tunable at runtime
via sysctl's. The old #defines, MAX_GIF_NEST and XBONEHACK are
currently supported for backwards compatability, but will probably be
removed at some point in the future.
Peter Wemm [Tue, 26 Feb 2002 01:11:08 +0000 (01:11 +0000)]
Fix a warning by pulling prototype for arp_ifinit() into scope.
Then fix cast the correct value into an incorrect value, which was not
detected due to the missing prototype (but was harmless anyway).
Peter Wemm [Mon, 25 Feb 2002 23:49:51 +0000 (23:49 +0000)]
Work-in-progress commit syncing up pmap cleanups that I have been working
on for a while:
- fine grained TLB shootdown for SMP on i386
- ranged TLB shootdowns.. eg: specify a range of pages to shoot down with
a single IPI, since the IPI is very expensive. Adjust some callers
that used to trigger this inside tight loops to do a ranged shootdown
at the end instead.
- PG_G support for SMP on i386 (options ENABLE_PG_G)
- defer PG_G activation till after we decide what we are going to do with
PSE and the 4MB pages at the start of the kernel. This should solve
some rumored strangeness about stale PG_G entries getting stuck
underneath the 4MB pages.
- add some instrumentation for the fine TLB shootdown
- convert some asm instruction wrappers from functions to inlines. gcc
seems to do a fair bit better with this.
- [temporarily!] pessimize the tlb shootdown IPI handlers. I will fix
this again shortly.
This has been working fairly well for me for a while, but I have tweaked
it again prior to commit since my last major testing round. The only
outstanding problem that I know of is PG_G related, which is why there
is an option for it (not on by default for SMP). I have seen a world
speedups by a few percent (as much as 4 or 5% in one case) but I have
*not* accurately measured this - I am a bit sceptical of these numbers.
David E. O'Brien [Mon, 25 Feb 2002 22:13:44 +0000 (22:13 +0000)]
I was able to boot this kernel using the latest WIP kernel sources.
I don't believe anyone is quite using the sparc64 kernel sources in CVS
yet -- things aren't just quite ready (but almost). So this commit should
be OK to make.
Peter Wemm [Mon, 25 Feb 2002 22:04:33 +0000 (22:04 +0000)]
Turn on -Werror by default. This is is easily turned off, by either:
- fix the warnings, they are there for a reason!
- add -DNO_ERROR to your make(1) command.
- add 'makeoptions NO_WERROR=true' to your kernel config.
- add 'nowerror' to conf/files* that have warnings that should be fixed
due to tracking 3rd party vendor code.
- add 'nowerror' to conf/files* where the warning is false due to a
compiler bug and fixing it with brute force would be too expensive.
There are some very sloppy warnings in our kernel build, come on folks!
Jake Burkholder [Mon, 25 Feb 2002 18:37:17 +0000 (18:37 +0000)]
Implement a nested window state. This avoids attempting to spill a user
window to the user stack while in a nested kernel trap. We do this for
entry to the kernel from user mode, but if we get an interrupt in kernel
mode while there are still user windows in the cpu, and we attempt to spill
to the user stack, we may take too many nested traps and overflow the trap
stack, causing a red state exception. This is needed by upcoming changes
to allow the user tsb to not be locked in the tlb.
Add a new test_counter() function which tries to determine the width of
the inter-value histogram for 2000 samples. If the width is 3 or less
for 10 consequtive samples, we trust the counter to be good, otherwise
we use the *_safe() method.
This method may be too strict, but the worst which can happen is that
we take the performance hit of the *_safe() method when we should not.
Make the *_safe() method more discriminating by mandating that the three
samples do not span more than 15 ticks on the counter.
Disable the PCI-ident based probing as a means to recognize good
counters.
Maxim Sobolev [Mon, 25 Feb 2002 09:17:44 +0000 (09:17 +0000)]
Fix a bug introduced in rev.1.23 - for some reason mkdir("/", ...) system
call returns `EISDIR', not `EEXIST', so that be prepared for that. This should
fix number of ports, that often call `mkdir -p //usr/local/foobar'. This
is just a quick workaround, the real fix would be either to avoid calling
mkdir("/", ...) or fix VFS code to return consistent errno for this case.
Crist J. Clark [Mon, 25 Feb 2002 08:29:21 +0000 (08:29 +0000)]
The TCP code did not do sufficient checks on whether incoming packets
were destined for a broadcast IP address. All TCP packets with a
broadcast destination must be ignored. The system only ignored packets
that were _link-layer_ broadcasts or multicast. We need to check the
IP address too since it is quite possible for a broadcast IP address
to come in with a unicast link-layer address.
Note that the check existed prior to CSRG revision 7.35, but was
removed. This commit effectively backs out that nine-year-old change.
Bruce Evans [Mon, 25 Feb 2002 05:16:22 +0000 (05:16 +0000)]
#include <sys/time.h> instead of depending on namespace pollution in
<sys/stat.h> for the declaration of struct timeval. Intentionally
don't follow the local style of polluting the local headers.
Bruce Evans [Mon, 25 Feb 2002 05:09:12 +0000 (05:09 +0000)]
Unremoved used includes. <sys/time.h> is needed if <sys/stat.h> isn't
polluted, and <sys/types.h> is strictly a prerequisite for <sys/stat.h>
untiil we drop support for pre-2001 versions of POSIX.
Jake Burkholder [Mon, 25 Feb 2002 04:56:50 +0000 (04:56 +0000)]
Modify the tte format to not include the tlb context number and to store the
virtual page number in a much more convenient way; all in one piece. This
greatly simplifies the comparison for a matching tte, and allows the fault
handlers to be much simpler due to not having to load wierd masks.
Rewrite the tlb fault handlers to account for the new format. These are also
written to allow faults on the user tsb inside of the fault handlers; the
kernel fault handler must be aware of this and not clobber the other's
registers. The faults do not yet occur due to other support that is needed
(and still under my desk).
David E. O'Brien [Mon, 25 Feb 2002 04:49:17 +0000 (04:49 +0000)]
Use the default 'ld' emulation rather than hard coding it.
For FreeBSD, 'ld' 2.12.0 uses a different emulation than in the past.
So this change makes the upgrade easier.
Bruce Evans [Mon, 25 Feb 2002 04:47:39 +0000 (04:47 +0000)]
#include <sys/time.h> instead of depending on namespace pollution in
<sys/stat.h> for the declaration of struct timeval (sys/stat.h> only
needs timespecs even when its POSIX support is not turned on, so it
shouldn't declare timevals).
Bruce Evans [Mon, 25 Feb 2002 04:31:25 +0000 (04:31 +0000)]
Declare time(not3) instead of depending on namespace pollution 3 layers
deep in <stand.h> to eventually include <time.h> to declare the user
version.
This is not quite the right place to declare it, but <stand.h> would
be worse because time() is very MD so it isn't in libstand.
Many places in the boot sources still get the user version using only
1 layer of pollution (#include <sys/time.h>. Some pollute themselves
directly (#include <time.h>). But the boot Makefiles are too broken
to enable warnings for redeclarations.
Bruce Evans [Mon, 25 Feb 2002 02:18:36 +0000 (02:18 +0000)]
Removed unused include of <sys/resource.h> instead of depending on
namespace pollution only 1 layer deep in <sys/stat.h> for its
prerequisite <sys/time.h>
Ian Dowse [Mon, 25 Feb 2002 00:03:34 +0000 (00:03 +0000)]
Sockets passed into uipc_abort() have been allocated by sonewconn()
but never accept'ed, so they must be destroyed. Originally, unp_drop()
detected this situation by checking if so->so_head is non-NULL.
However, since revision 1.54 of uipc_socket.c (Feb 1999), so->so_head
is set to NULL before calling soabort(), so any unix-domain sockets
waiting to be accept'ed are leaked if the server socket is closed.
Resolve this by moving the socket destruction code into uipc_abort()
itself, and making it unconditional (the other caller of unp_drop()
never needs the socket to be destroyed). Use unp_detach() to avoid
the original code duplication when destroying the socket.
PR: kern/17895
Reviewed by: dwmalone (an earlier version of the patch)
MFC after: 1 week
Matthew Dillon [Sun, 24 Feb 2002 22:58:15 +0000 (22:58 +0000)]
Tests by numerous people have shown that many chipsets do not properly
latch the acpi timer, resulting in weird deltas. The problem is severe
enough to adversely effect the timecounter code.
Default to the 'safe' version of the get-timecount function. The probe
will override it if a known-good chipset is found. This is temporary
until a more complete solution is found.