Hartmut Brandt [Wed, 12 Nov 2003 09:10:11 +0000 (09:10 +0000)]
Double length of node names, hook names, command strings and types. Add
defines for these constants that include the trailing NUL byte. These
new constants have SIZ in their name instead of LEN. As soon as all
consumers in the tree are converted to use the new defines the old
defines will be put under BURN_BRIDGES.
Reviewed by: archie, julian, ru
Approved by: re (in principle)
Tim J. Robbins [Wed, 12 Nov 2003 08:49:12 +0000 (08:49 +0000)]
Use __sfvwrite() instead of __sputc() via __fputwc() to write to fake
string files (__SSTR flag set). This is necessary because __sputc()
does not respect the __SALC flag, and crashes trying to flush the buffer
instead of resizing it.
Kirk McKusick [Wed, 12 Nov 2003 08:09:19 +0000 (08:09 +0000)]
Update the five files derived from /sys/kern/syscalls.master
after the additions made for the new statfs structure (version
1.157). These must be updated in a separate checkin after
syscalls.master has been checked in so that they reflect its
new CVS identity. As these are purely derived files, it is not
clear to me why they are under CVS at all. I presume that it has
something to do with having `make world' operate properly.
Kirk McKusick [Wed, 12 Nov 2003 08:01:40 +0000 (08:01 +0000)]
Update the statfs structure with 64-bit fields to allow
accurate reporting of multi-terabyte filesystem sizes.
You should build and boot a new kernel BEFORE doing a `make world'
as the new kernel will know about binaries using the old statfs
structure, but an old kernel will not know about the new system
calls that support the new statfs structure. Running an old kernel
after a `make world' will cause programs such as `df' that do a
statfs system call to fail with a bad system call.
Reviewed by: Bruce Evans <bde@zeta.org.au>
Reviewed by: Tim Robbins <tjr@freebsd.org>
Reviewed by: Julian Elischer <julian@elischer.org>
Reviewed by: the hoards of <arch@freebsd.org>
Sponsored by: DARPA & NAI Labs.
Robert Watson [Wed, 12 Nov 2003 03:14:31 +0000 (03:14 +0000)]
Modify the MAC Framework so that instead of embedding a (struct label)
in various kernel objects to represent security data, we embed a
(struct label *) pointer, which now references labels allocated using
a UMA zone (mac_label.c). This allows the size and shape of struct
label to be varied without changing the size and shape of these kernel
objects, which become part of the frozen ABI with 5-STABLE. This opens
the door for boot-time selection of the number of label slots, and hence
changes to the bound on the number of simultaneous labeled policies
at boot-time instead of compile-time. This also makes it easier to
embed label references in new objects as required for locking/caching
with fine-grained network stack locking, such as inpcb structures.
This change also moves us further in the direction of hiding the
structure of kernel objects from MAC policy modules, not to mention
dramatically reducing the number of '&' symbols appearing in both the
MAC Framework and MAC policy modules, and improving readability.
While this results in minimal performance change with MAC enabled, it
will observably shrink the size of a number of critical kernel data
structures for the !MAC case, and should have a small (but measurable)
performance benefit (i.e., struct vnode, struct socket) do to memory
conservation and reduced cost of zeroing memory.
NOTE: Users of MAC must recompile their kernel and all MAC modules as a
result of this change. Because this is an API change, third party
MAC modules will also need to be updated to make less use of the '&'
symbol.
Alexander Kabaev [Wed, 12 Nov 2003 02:54:47 +0000 (02:54 +0000)]
1. Consolidate mount struct allocation/destruction into a common code in
vfs_mount_alloc/vfs_mount_destroy functions and take care to completely
destroy the mount point along with its locks. Mount struct has grown in
coplexity recently and depending on each failure path to destroy it
completely isn't working anymore.
2. Eliminate largely identical vfs_mount and vfs_unmount question by
moving the code to handle both cases into a newly introduced vfs_domount
function.
3. Simplify nfs_mount_diskless to always expect an allocated mount
struct and never attempt an allocation/destruction itself. The
vfs_allocroot allocation was there to support 'magic' swap space
configuration for diskless clients that was already removed by PHK some
time ago.
4. Include a vfs_buildopts cleanups by Peter Edwards to validate the
sanity of nmount parameters passed from userland.
Submitted by: (4) Peter Edwards <peter.edwards@openet-telecom.com>
Reviewed by: rwatson
Marcel Moolenaar [Wed, 12 Nov 2003 01:26:02 +0000 (01:26 +0000)]
Further work-out the handling of the high FP registers. The most
important change is in cpu_switch() where we disable the high FP
registers for the thread that we switch-out if the CPU currently
has its high FP registers. This avoids that the high FP registers
remain enabled for the thread even when the CPU has unloaded them
or the thread migrated to another processor.
Likewise, when we switch-in a thread of that has its high FP
registers on the CPU, we enable them. This avoids an otherwise
harmless, but unnecessary trap to have them enabled.
The code that handles the disabled high FP trap (in trap()) has
been turned into a critical section for the most part to avoid
being preempted. If there's a race, we bail out and have the
processor trap again if necessary.
Avoid using the generic ia64_highfp_save() function when the
context is predictable. The function adds unnecessary overhead.
Don't use ia64_highfp_load() for the same reason. The function
is now unused and can be removed.
These changes make the lazy context switching of the high FP
registers in an UP kernel functional.
John Baldwin [Tue, 11 Nov 2003 22:07:29 +0000 (22:07 +0000)]
Add an implementation of turnstiles and change the sleep mutex code to use
turnstiles to implement blocking isntead of implementing a thread queue
directly. These turnstiles are somewhat similar to those used in Solaris 7
as described in Solaris Internals but are also different.
Turnstiles do not come out of a fixed-sized pool. Rather, each thread is
assigned a turnstile when it is created that it frees when it is destroyed.
When a thread blocks on a lock, it donates its turnstile to that lock to
serve as queue of blocked threads. The queue associated with a given lock
is found by a lookup in a simple hash table. The turnstile itself is
protected by a lock associated with its entry in the hash table. This
means that sched_lock is no longer needed to contest on a mutex. Instead,
sched_lock is only used when manipulating run queues or thread priorities.
Turnstiles also implement priority propagation inherently.
Currently turnstiles only support mutexes. Eventually, however, turnstiles
may grow two queue's to support a non-sleepable reader/writer lock
implementation. For more details, see the comments in sys/turnstile.h and
kern/subr_turnstile.c.
The two primary advantages from the turnstile code include: 1) the size
of struct mutex shrinks by four pointers as it no longer stores the
thread queue linkages directly, and 2) less contention on sched_lock in
SMP systems including the ability for multiple CPUs to contend on different
locks simultaneously (not that this last detail is necessarily that much of
a big win). Note that 1) means that this commit is a kernel ABI breaker,
so don't mix old modules with a new kernel and vice versa.
John Baldwin [Tue, 11 Nov 2003 18:20:10 +0000 (18:20 +0000)]
Some motherboards like to remap the SCI (normally IRQ 9) up to a PCI
interrupt such as IRQ 22 or 19. However, the ACPI BIOS still routes
interrupts from some PCI devices to the same intpin calling the pin
IRQ 22. Thus, ACPI expects to address a single interrupt source via two
different names. To work around this, if the SCI is remapped to a non-ISA
interrupt (i.e., greater than 15), then we use
acpi_OverrideInterruptLevel() function to tell ACPI to use IRQ 22 or 19
rather than IRQ 9 for the SCI.
Previously we would change IRQ 22 or 19's name to IRQ 9 when we encountered
such an Interrupt Source Override entry in the MADT which routed the SCI
properly but left PCI devices mapped to IRQ 22 or 19 w/o a routable
interrupt.
Jake Burkholder [Tue, 11 Nov 2003 18:01:44 +0000 (18:01 +0000)]
Set RB_SERIAL in boothowto if the firmware output-device is ttya or ttyb.
This ensures that uart gets a higher console priority than syscons when
a serial console is being used. Testing against the "console" environment
variable doesn't make sense since we only have one loader console driver.
Mike Silbersack [Tue, 11 Nov 2003 17:58:36 +0000 (17:58 +0000)]
Remove the m_defrag call from if_loop; testing with m_fragment
has shown that the IPv6 stack can clearly handle fragmented
mbuf chains without a problem.
John Baldwin [Tue, 11 Nov 2003 17:16:15 +0000 (17:16 +0000)]
Enable HTT CPUs by default instead of halting them by default. Users
should now only have HTT CPUs if they have explicitly asked for them
either by enabling HyperThreading in the BIOS or by using the
MPTABLE_FORCE_HTT kernel option.
John Baldwin [Tue, 11 Nov 2003 17:14:26 +0000 (17:14 +0000)]
Disable probing of HTT CPUs by default for the MP Table case. HTT CPUs
should only be used if they are enabled in the BIOS. Now that we support
enumerating CPUs using the ACPI MADT, any HTT machine using ACPI should
respect the BIOS setting. For HTT machines with ACPI disabled in the
kernel, the MPTABLE_FORCE_HTT kernel option can be used to try to probe HTT
CPUs like have done in the past for the MP Table case. This option should
only be enabled if HTT is enabled in the BIOS.
John Baldwin [Tue, 11 Nov 2003 15:52:31 +0000 (15:52 +0000)]
- Remove empty rogue SMP hardware section.
- Add some additional comments about 'device apic' to note that it can be
used in both UP and SMP kernels but is required for SMP kernels.
John Baldwin [Tue, 11 Nov 2003 15:47:44 +0000 (15:47 +0000)]
Axe rotted comment about MP Tables and PCI cards with built in bridges.
Now that we properly route PCI interrupts for the apic case, these cards
are no longer a problem.
Sort the device lists alphabetically to make it simpler to add new
devices to the lists in the appropriate places. This also makes it
easier to find devices in the lists.
Marcel Moolenaar [Tue, 11 Nov 2003 09:53:37 +0000 (09:53 +0000)]
Save and restore the high FP registers in {g|s}_mcontext(). Note
that we currently do not keep track of whether the thread has
actually used the high FP registers before. If not, we should
not save them in the context which automaticly means that we
also would not restore them from the context. For now, do it
unconditionally so that we can reach functional completeness.
Marcel Moolenaar [Tue, 11 Nov 2003 09:25:19 +0000 (09:25 +0000)]
Fix a nasty bug that got exposed when the sendsig() and sigreturn()
functions switched to using {g|s}et_mcontext(). The problem is that
sigreturn(), being a syscall, can be given an async. context (i.e.
one corresponding to an interrupt or trap). When this happens, we
try to return to user mode via epc_syscall_return with a trapframe
that can only be used to return to user mode via exception_restore.
To fix this, we check the frame's flags immediately prior to
epc_syscall_return and branch to exception_restore for non-syscall
frames. Modify the assertion in set_mcontext() to check that if
there's a mismatch, it's because of sigreturn().
Jake Burkholder [Tue, 11 Nov 2003 07:33:24 +0000 (07:33 +0000)]
Add a uart attachment/syscons keyboard driver for sun keyboards. In theory
this will work with any uart backend, currently supported hardware uses
either ns8250 or z8530.
Jake Burkholder [Tue, 11 Nov 2003 06:52:04 +0000 (06:52 +0000)]
Allow uart to attach to keyboards that are not the firmware's notion of
stdin, such as when using a serial console. We must recognize these
devices here so that we can override the tty attach routine.
Jake Burkholder [Tue, 11 Nov 2003 06:47:00 +0000 (06:47 +0000)]
Assume that unit 0 is the graphics console initialized by syscons, instead
of testing if the device's firmware node is stdout. This allows syscons to
be used when the firmware's input and output is the serial console.
Jake Burkholder [Tue, 11 Nov 2003 06:41:54 +0000 (06:41 +0000)]
Fix a bug in the data access error recorvery. Before re-enabling the data
cache after a data access error we must discard all cache lines. When
disabled existing cache lines are not invalidated by stores to memory, so
we risk reading stale data that was cached before the data access error if
we don't flush them. This is especially fatal when the memory involved
is the active part of the kernel or user stack. For good measure we also
flush the instruction cache.
This fixes random crashes when the X server probes the PCI bus through
/dev/pci.
Bruce Evans [Tue, 11 Nov 2003 06:27:34 +0000 (06:27 +0000)]
Include <sys/reboot.h> the definition of RB_BOOTINFO. The previous
commit broke the world because it depended on namespace pollution that
was only in my version of <machine/bootinfo.h>. The include was removed
in rev.1.63 after the last reference to it went away in rev.1.61.
Scott Long [Tue, 11 Nov 2003 05:38:28 +0000 (05:38 +0000)]
Fix sound LOR problems:
dsp_open: rearrange to only hold one lock at a time
dsp_close: ditto
mixer_hwvol_init: delete locking, the only consumer seems to
be the ess driver and it only call it a creation time, I
think the device will be stable across the sleepable malloc.
cmi interrupt routine: Release locks while caller chn_intr,
either this or do what emu10k1 does which is have no locks
at in the interrupt handler.
Joseph Koshy [Tue, 11 Nov 2003 04:59:25 +0000 (04:59 +0000)]
Add a section documenting the sysctl(8) tunables that influence the
operation of ktrace(2). Add a cross-reference to sysctl(8). Make the
language of rev 1.22 more consistent with the rest of the manual page.
Alan Cox [Tue, 11 Nov 2003 04:45:37 +0000 (04:45 +0000)]
- Revision 1.469 of vfs_subr.c resulted in the buf's b_object field being
consistency initialized. Consequently, a number of conditionals that
checked the validity of b_object before passing it to VM_OBJECT_LOCK()
and VM_OBJECT_UNLOCK() are no longer needed.
Alfred Perlstein [Tue, 11 Nov 2003 00:32:46 +0000 (00:32 +0000)]
Stop using shared locks for nfs vop locks.
The reason this was done was to avoid a race to the root when an
NFS server went down. However a semi-recent change to the way that
the kernel's lookup() routine traverses mount points prevents this.
Rev 1.39 of vfs_lookup.c changed the ordering of locks such that we
aquire a shared lock on the mount point being accessed and then drop
the directory vnode lock before requesting the target lock.
With that in place we no longer need shared locks for NFS to prevent
race to the root lockups.
Ian Dowse [Mon, 10 Nov 2003 22:45:37 +0000 (22:45 +0000)]
In in_pcbconnect_setup(), don't use the cached inp->inp_route unless
it is marked as RTF_UP. This appears to fix a crash that was sometimes
triggered when dhclient(8) tried to send a packet after an interface
had been detatched.