CyberLeo.Net >> Repos - FreeBSD/FreeBSD.git/log

Correct/Simplify ignore wide residue message handling

aic79xx.c:
In ahd_handle_ign_wide_residue():
o Use SCB_XFERLEN_ODD SCB field to determine transfer
  "oddness" rather than the DATA_COUNT_ODD logic.
  SCB_XFERLEN_ODD is toggled on every ignore wide
  residue message so that multiple ignore wide residue
  messages for the same transaction are properly supported.
o If the sg list has been exhausted, the sequencer
  doesn't bother to update the residual data count
  since it is known to be zero.  Perform the zeroing
  manually before calculating the remaining data count.
o Use multibyte in/out macros instead of shifting/masking
  by hand.

aic79xx_inline.h:
In ahd_setup_scb_common(), setup the SCB_XFERLEN_ODD field.

aic79xx.reg:
Use the SCB_TASK_ATTRIBUTE field as a bit field in the
non-packetized case.  We currently only define one bit,
SCB_XFERLEN_ODD.

Remove the ODD_SEG bit field that was used to carry the odd
transfer length information through the SG cache.  This
is obviated by SCB_XFERLEN_ODD field.

Remove the DATA_COUNT_ODD scratch ram byte that was used to
dynamically compute data transfer oddness.  This is obviated
by the SCB_XFERLEN_ODD field.

aic79xx.seq:
Remove all updates to the DATA_COUNT_ODD scratch ram field.
Remove all uses of ODD_SEG.  These two save quite a few
sequencer instructions.

Use SCB_XFERLEN_ODD to validate the end of transfer
ignore wide residue message case.

FIFOEMP can lag LAST_SEG_DONE in the Ultra2 and U160
hardware. Wait a few extra clocks for FIFOEMP to assert
before calling an overrun.

Approved by: RE

Correct/Simplify ignore wide residue message handling

aic7xxx.c:
In ahc_handle_ign_wide_residue():
o Use SCB_XFERLEN_ODD SCB field to determine transfer
  "oddness" rather than the DATA_COUNT_ODD logic.
  SCB_XFERLEN_ODD is toggled on every ignore wide
  residue message so that multiple ignore wide residue
  messages for the same transaction are properly supported.
o If the sg list has been exhausted, the sequencer
  doesn't bother to update the residual data count
  since it is known to be zero.  Perform the zeroing
  manually before calculating the remaining data count.
o Ensure that SG_LIST_NULL is cleared in the
  residual sg pointer for "mid-transfer" ignore
  wide residue cases.
o Use multibyte in/out macros instead of shifting/masking
  by hand.

aic7xxx.h:
Modify the SCB_GET_LUN() macro to mask the lun hardware
SCB field with LID.  This leaves two bits in the LUN
field that can be used for other purposes.

aic7xxx.reg:
Change LID to be 0x3F.  This is the maximum supported
lun size for non-packetized SCSI.  Map the top bit
of the lun to SCB_XFERLEN_ODD.  The host must set
this bit whenever a transfer is an odd length.
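
A minimal sketch of the lun byte layout described above. It assumes
SCB_XFERLEN_ODD is bit 7 (0x80) of a one-byte lun field; the helper
names are hypothetical and are not the driver's own macros.

```c
#include <stdint.h>

#define LID             0x3F    /* lun bits, as stated in the commit */
#define SCB_XFERLEN_ODD 0x80    /* assumed: top bit of the lun byte */

/* Hypothetical helper: build the lun byte the host would store. */
static uint8_t
scb_pack_lun(uint8_t lun, uint32_t xfer_len)
{
	uint8_t field = lun & LID;

	if (xfer_len & 1)
		field |= SCB_XFERLEN_ODD;
	return (field);
}

/* Lun only: mask away the bits reused for other purposes. */
static uint8_t
scb_get_lun(uint8_t field)
{
	return (field & LID);
}

/* Each ignore wide residue message flips the oddness of the transfer. */
static uint8_t
scb_toggle_odd(uint8_t field)
{
	return (field ^ SCB_XFERLEN_ODD);
}
```

Any reader of the field that wants only the lun has to go through the
mask, which is why the sequencer and SCB_GET_LUN() both change.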

Remove the ODD_SEG bit field that was used to carry the odd
transfer length information through the SG cache.  This
is obviated by SCB_XFERLEN_ODD field.

Remove the DATA_COUNT_ODD scratch ram byte that was used to
dynamically compute data transfer oddness.  This is obviated
by the SCB_XFERLEN_ODD field.

aic7xxx.seq:
Be more careful in our handling of the SCB_LUN field.  It
must be masked with LID if only lun information is desired.

Remove all updates to the DATA_COUNT_ODD scratch ram field.
Remove all uses of ODD_SEG.  These two save quite a few
sequencer instructions.

Use SCB_XFERLEN_ODD to validate the end of transfer
ignore wide residue message case.

aic7xxx_inline.h:
In ahc_queue_scb(), setup the SCB_XFERLEN_ODD field.

Approved by: RE

Fix disabling of PCI parity error interrupts. We need to set
FAILDIS in the SEQCTL register, not the HCNTRL register.

aic7xxx.c:
Remember SEQCTL settings in the "seqctl" field of our
softc. seqctl defaults to just having FASTMODE set,
but the bus attachments can override this.

aic7xxx.h:
Add the seqctl softc field.

aic7xxx_pci.c:
Update the seqctl softc field and manually update SEQCTL
when too many PCI errors occur.

Approved by: RE

Change handling of the Rev. A packetized lun output bug
to be more efficient by having the sequencer copy the
single byte of valid lun data into the long lun field.

aic79xx.c:
Memset our hardware SCB to 0 so that untouched
fields don't confuse diagnostic output. With the
old method for handling the Rev A bug, if the long
lun field was not 0, this could result in bogus
lun information being sent to drives.

Use the same SCB transfer size for all chip types
now that the long lun is not DMA'ed to the chip.

aic79xx.seq:
Add code to copy lun information for Rev.A hardware.

aic79xx_inline.h:
Remove host update of the long_lun field on every
packetized command.

Add 7901B support.

Sort IDs based on chip type.

Remove IROC IDs. We'll switch to using the IROC masks
if/when we want to start attaching to IROC controllers.

Approved by: RE

Fixup spelling of "coalesce" and derivatives.

Approved by: RE

Remove stray K&R style function definition.

Approved by: RE

pkg_create incorrectly does not add trailing '\n' when it receives
either COMMENT or DESCR from the command line. When a port is
installed, one gets both +COMMENT and +DESCR files with a trailing
'\n' character. However, +COMMENT does not contain a trailing '\n'
when it is installed from a package due to this behavior of pkg_create.

Therefore, make sure it behaves exactly the same regardless of
where it got its information: either the command line or files. The modified
functions are used by pkg_create.

PR: 52097
Reviewed by: bento, kris,
portmgr, re,
Michael Nottebrock <michaelnottebrock@gmx.net>,
Martin Horcicka <horcicka@FreeBSD.cz>
Approved by: re (scottl)
MFC after: 1 week

Add a trailing '\n' character if none is found in the information
obtained from a package. Patch show_file() [1] and show_index() [2]
functions.
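
The idea can be sketched as a small buffer helper. This is an
illustration only; ensure_newline() is a hypothetical name and is not
the code in show_file() or show_index().

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: make sure the string in buf ends with '\n'.
 * Returns 1 if a newline was appended, 0 if one was already present,
 * and -1 if there is no room left in the buffer.
 */
static int
ensure_newline(char *buf, size_t bufsize)
{
	size_t len = strlen(buf);

	if (len > 0 && buf[len - 1] == '\n')
		return (0);
	if (len + 2 > bufsize)		/* need '\n' plus the terminator */
		return (-1);
	buf[len] = '\n';
	buf[len + 1] = '\0';
	return (1);
}
```

Calling it twice on the same buffer appends only once, so text that
already carries a newline is left alone.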

PR: 52097
Reviewed by: bento, kris,
portmgr, re,
Michael Nottebrock <michaelnottebrock@gmx.net>,
Martin Horcicka <horcicka@FreeBSD.cz>
Approved by: re (scottl)
Obtained from: NetBSD [1],
OpenBSD [2]
MFC after: 1 week

Fix two typos from the last commit

Merge the following from the English version:

1.12  -> 1.15 early-adopter/article.sgml
1.143 -> 1.155 hardware/common/dev.sgml
1.5   -> 1.6 hardware/common/intro.sgml
1.9   -> 1.11 hardware/i386/proc-i386.sgml
1.2   -> 1.3 hardware/ia64/article.sgml
1.3   -> 1.7 hardware/ia64/proc-ia64.sgml
1.6   -> 1.7 share/sgml/release.dsl

Approved by: re (blanket)

De-orbit bus_dmamem_alloc_size from here too.

Pointed out by: des
Pointy hat to: me

Remember to close the read end of the pipe.

Remove uninitialized local variable in favor of global.

PR: bin/52685
Submitted by: Alexander Nedotsukov <bland@mail.ru>
Approved by: re (scottl)

De-orbit bus_dmamem_alloc_size(). It's a hack and was never used anyways.
No need for it to pollute the 5.x API any further.

Approved by: re (bmah)

Decouple the thread stack [de]allocating functions from the 'dead threads list'
lock. It's not really necessary and we don't need the added complexity
or potential for deadlocks.

Approved by: re/blanket libthr

Revise the unlock order in _pthread_join(). Also, if the joined
thread is not dead, the join loop is guaranteed to execute at least
once, so there is no need to pick up the thread list lock after
we return from suspension only to release it after the loop.

Approved by: re/blanket libthr

Return gracefully, rather than aborting, when the maximum concurrent
threads per process has been reached. Return EAGAIN, as per spec.

Approved by: re/blanket libthr

Copy the va_list in sbuf_vprintf() before passing it to vsnprintf(),
because we could fail due to a small buffer and loop and rerun.  If this
happens, then the vsnprintf() will have already taken the arguments off
the va_list.  For i386 and others, this doesn't matter because the
va_list type is passed as a copy.  But on powerpc and amd64, this is
fatal because the va_list is a reference to an external structure that
keeps the vararg state due to the more complicated argument passing system.
On amd64, arguments can be passed as follows:
First 6 int/pointer type arguments go in registers, the rest go on
  the memory stack.
Float and double are similar, except using SSE registers.
long double (80 bit precision) are similar except using the x87 stack.
Where the 'next argument' comes from depends on how many have been
processed so far and what type it is.  For amd64, gcc keeps this state
somewhere that is referenced by the va_list.

I found a description that showed the va_copy was required here:
http://mirrors.ccs.neu.edu/cgi-bin/unixhelp/man-cgi?va_end+9
The single unix spec doesn't mention va_copy() at all.

Anyway, the problem was that the sysctl kern.geom.conf* nodes would panic
due to walking off the end of the va_arg lists in vsnprintf.  A better fix
would be to have sbuf_vprintf() use a single pass and call kvprintf()
with a callback function that stored the results and grew the buffer
as needed.
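
A userland sketch of the retry pattern, assuming a plain heap buffer
rather than an sbuf. grow_vprintf() and grow_printf() are hypothetical
names; only va_copy(), va_end() and vsnprintf() are standard C.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper: format into a heap buffer, growing it until the
 * output fits.  Each attempt gets its own copy of ap, so a retry never
 * reads from a va_list that vsnprintf() has already consumed.
 */
static char *
grow_vprintf(const char *fmt, va_list ap)
{
	size_t size = 8;	/* deliberately small to force a retry */
	char *buf = NULL;

	for (;;) {
		char *nbuf = realloc(buf, size);
		va_list ap_copy;
		int len;

		if (nbuf == NULL) {
			free(buf);
			return (NULL);
		}
		buf = nbuf;
		va_copy(ap_copy, ap);
		len = vsnprintf(buf, size, fmt, ap_copy);
		va_end(ap_copy);
		if (len < 0) {
			free(buf);
			return (NULL);
		}
		if ((size_t)len < size)
			return (buf);		/* it fit; caller frees */
		size = (size_t)len + 1;		/* grow and go around again */
	}
}

/* Variadic front end for the helper above. */
static char *
grow_printf(const char *fmt, ...)
{
	va_list ap;
	char *s;

	va_start(ap, fmt);
	s = grow_vprintf(fmt, ap);
	va_end(ap);
	return (s);
}
```

Without the va_copy(), the second pass through the loop would hand
vsnprintf() a va_list whose state is indeterminate on ABIs where
va_list refers to external state.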

Approved by: re (scottl)

- Create a new lock, umtx_lock, for use instead of the proc lock for
protecting the umtx queues. We can't use the proc lock because we need
to hold the lock across calls to casuptr, which can fault.

Approved by: re

Don't do silly things if the disk_create() event gets canceled.

Approved by: re/scottl

- Reset the free ent to NULL if we have consumed the last free entry.  This
   fixes a problem where we would overwrite old data if we ran out of free
   entries.

Submitted by: sam
Approved by: re (scottl)

_pthread_cancel() breaks the normal lock order of first locking the
joined and then the joiner thread. There isn't an easy (sane?) way
to make it use the correct order without introducing races involving
the target thread and finding which (active or dead) list it is on. So,
after locking the canceled thread it will try to lock the joined thread
and if it fails release the first lock and try again from the top.

Introduce a new function, _spintrylock, which is simply a wrapper around
umtx_trylock(), to help accomplish this.
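
The lock-then-trylock-then-back-off loop can be sketched with C11
atomics. The toy_* names and the retry bound are hypothetical stand-ins
and not libthr's _spintrylock() or umtx_trylock().

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a thread's spinlock; not libthr's type. */
struct toy_lock {
	atomic_flag held;
};

static bool
toy_trylock(struct toy_lock *l)
{
	/* test_and_set returns the previous value: false means we got it. */
	return (!atomic_flag_test_and_set(&l->held));
}

static void
toy_acquire(struct toy_lock *l)
{
	while (!toy_trylock(l))
		;	/* spin */
}

static void
toy_unlock(struct toy_lock *l)
{
	atomic_flag_clear(&l->held);
}

/*
 * Take "first" and then "second" against the normal lock order without
 * risking deadlock: never wait for the second lock while holding the
 * first.  max_tries bounds the loop for the demo; returns true when
 * both locks are held, false with neither held.
 */
static bool
lock_pair_backoff(struct toy_lock *first, struct toy_lock *second,
    int max_tries)
{
	for (int i = 0; i < max_tries; i++) {
		toy_acquire(first);
		if (toy_trylock(second))
			return (true);
		toy_unlock(first);	/* back off and retry from the top */
	}
	return (false);
}
```

Releasing the first lock on failure is what lets a thread using the
normal order make progress and eventually drop the contended lock.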

Approved by: re/blanket libthr

Part of the last patch.
Modify the thread creation and thread searching routine
to lock the thread lists with the new locks instead of GIANT_LOCK.

Approved by: re/blanket libthr

Start locking up the active and dead threads lists. The active threads
list is protected by a spinlock_t, but the dead list uses a pthread_mutex
because it is necessary to synchronize other threads with the garbage
collector thread. Lock/Unlock macros are used so it's easier to make
changes to the locks in the future.

The 'dead thread list' lock is intended to replace the gc mutex.
This doesn't have any practical ramifications. It simply makes it
clearer what the purpose of the lock is. The gc will use this lock,
instead of the gc mutex, to synchronize access to the dead list with
other threads.

Modify _pthread_exit() to use these two new locks instead of GIANT_LOCK,
and also to properly lock and protect thread state changes,
especially with respect to a joining thread.

The gc thread was also re-arranged to be more organized and less nested.

_pthread_join() was also modified to use the thread list locks. However,
locking and unlocking here needs special care because a thread could find
itself in a position where it's joining an exiting thread that is
waiting on the dead list lock, which this thread (joiner) holds. If the
joiner doesn't take care to lock *and* unlock in the same order they
(the joiner and the joinee) could deadlock against each other.

Approved by: re/blanket libthr

The libthr code makes use of higher-level primitives (pthread_mutex_t and
pthread_cond_t) internally in addition to the low-level spinlock_t. The
garbage collector mutex and condition variable are two such examples. This
might lead to critical sections nested within critical sections. Implement
a reference counting mechanism so that signals are masked only on the first
entry and unmasked on the last exit.
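
A minimal sketch of the counting scheme, using process-wide state and
sigprocmask() for brevity. The crit_* names are hypothetical, and a
threaded library would keep the counter and saved mask per thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

/* Hypothetical state; a real library would keep this per thread. */
static int crit_depth;
static sigset_t crit_saved;

/* Mask signals only when entering the outermost critical section. */
static void
crit_enter(void)
{
	if (crit_depth++ == 0) {
		sigset_t all;

		sigfillset(&all);
		sigprocmask(SIG_BLOCK, &all, &crit_saved);
	}
}

/* Restore the saved mask only when leaving the outermost section. */
static void
crit_exit(void)
{
	if (--crit_depth == 0)
		sigprocmask(SIG_SETMASK, &crit_saved, NULL);
}

/* Query helper for the demo: is sig currently blocked? */
static int
sig_is_blocked(int sig)
{
	sigset_t cur;

	sigprocmask(SIG_SETMASK, NULL, &cur);
	return (sigismember(&cur, sig));
}
```

An inner crit_exit() leaves signals masked, so a mutex operation nested
inside a spinlock section cannot unmask them early.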

I'm not sure I like the idea of nested critical sections, but if
the library is going to use the pthread primitives it might be necessary.

Approved by: re/blanket libthr

Add a pretty cheesy hack to avoid a gcc-3.2.2 ICE (internal compiler
error) on amd64 when doing pointer subtraction.  This bug is already
fixed in gcc-3.3 (waiting for after the branch), and the hack will be
backed out at the first opportunity.  This is in the ipv6 code path.

Approved by:  re (scottl)

The struct mcontext has changed. It's using the register sets. Bring
this in line.

Beat vnode locking in the NFS server code into submission. This change
is not pretty, but it fixes the code so that it no longer violates the
vnode locking rules in the VFS API and doesn't trip any of the locking
assertions enabled by the DEBUG_VFS_LOCKS kernel configuration option.
There is one report that this patch fixed a "locking against myself"
panic on an NFS server that was tripped by a diskless client.

Approved by: re (scottl)

Always set the hardware parse bit in the IPCB structure when this
structure, which is new to the 82550 and 82551, is used to transmit
a packet.  This appears to fix the packet truncation problem that was
observed when using 82550-based fxp cards to transmit ICMP or fragmented
UDP packets of certain lengths which only had one to three bytes in the
second and final mbuf of the packet.  This matches a note in the "Intel
8255x 10/100 Mbps Ethernet Controller Family Open Source Software Developer
Manual", which says that the hardware parse bit should be set when sending
these types of packets.

There have also been unconfirmed reports of similar problems when
transmitting TCP packets, which should not be affected by the above
mentioned change because the hardware parse bit was already being set
if the stack requested hardware checksumming of the packet.  If the
problem remains, the use of the IPCB structure can be disabled to
cause the driver to fall back to using the older 82559 interface with
82550-based cards by setting
        hint.fxp.UNIT_NUMBER.ipcbxmit_disable
to a non-zero value at boot time, or using kenv to set this variable
before using kldload to load the fxp driver.

Approved by: re (jhb)

Add textproc/opensp into $MINIMALDOCPORTS when openjade is used.

Reported by: scottl (by alpha building breakage)
Approved by: re (scottl)

Now that we define user mode as any IP address that isn't in the
kernel's VA regions, we cannot limit the use of break-based
syscalls to user mode only. The signal trampolines are in the
gateway page, which is mapped into the process address space in
region 5 and thus is kernel space.

We don't special case the gateway page here. Allow break-based
syscalls from anywhere in the kernel VA space.

Approved by: re@ (blanket)

Ignore the 'must allocate below 1MB' flag for the TPL_BAR_REG.  It is
set on realtek cards, but they work without it (and don't work with
it).  The standard seems to imply that this is just a hint anyway, so
this should be harmless.  It doesn't appear to be set on any other
cardbus cards that I have (or have seen).

This should make the rl based CardBus cards work again.  I've been
running it for about a month now.

Approved by: re@ (jhb)

Fix a source of instability specific to an EPC userland. We return
to userland with interrupts disabled until we restore PSR. However,
it has been observed that interrupts do actually happen before they
are enabled again. This is a bit surprising and I don't know yet
what's going on exactly. Nevertheless, the code was not crafted
carefully enough to allow interrupts to happen and we could
clobber the kernel stack of another thread when interrupts did
happen.

This is what happens: we restore the (memory) stack pointer (sp)
and the register stack base prior to restoring ar.k6 and ar.k7.
This is not a problem if interrupts don't happen between setting
sp/ar.bspstore and ar.k6/ar.k7. Alas, interrupts can happen.
Since sp/ar.bspstore already point to the userland stacks, we
need to switch to the kernel stack in interrupt. However, ar.k6
and ar.k7 have not been set, which means that we were switching
to some unrelated kstack and happily clobbered the trapframe
present there if the thread to which the kstack belonged was
in kernel mode or otherwise we could have our trapframe clobbered
if that other thread enters the kernel. Nasty either way.

We now carefully restore ar.k6 prior to restoring ar.bspstore and
likewise for ar.k7 and sp. All we need is the guarantee that an
interrupt does not clobber ar.k6 or ar.k7 before we're back in
userland. That has been achieved by restoring ar.k6/ar.k7
unconditionally (see exception.s)

While here, remove the disabling of interrupts on EPC entry. It
was added as a way to "resolve" the crashes until it was understood
what was going on. I think I achieved the latter, so we can remove
the patch. Note that setting up a trapframe with interrupts
enabled has its own share of corner cases, but it's better to
properly fix those than to keep a mostly wrong patch around
because we're afraid to remove it...

Approved by: re@ (blanket)

Be more careful how we restore interrupts. Don't rewrite most of the
PSR only to achieve setting PSR.i back to its previous value. It
makes it impossible to change any of the 30+ other unrelated bits
when done between intr_disable() and intr_restore(). That's bad.

Instead have intr_disable() return 1 when interrupts were previously
enabled and 0 otherwise and only enable interrupts in intr_restore()
when given a non-0 value.
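
The new contract can be modeled with the PSR as an ordinary variable.
This is a sketch only: fake_psr, the *_sketch names and the bit chosen
for PSR.i are illustrative and not the ia64 implementation.

```c
#include <stdint.h>

#define FAKE_PSR_I	((uint64_t)1 << 14)	/* illustrative bit for PSR.i */

/* Modeled PSR: interrupts on, plus one unrelated bit set. */
static uint64_t fake_psr = FAKE_PSR_I | 0x1;

/* Clear PSR.i and report whether it was set before. */
static int
intr_disable_sketch(void)
{
	int was_enabled = (fake_psr & FAKE_PSR_I) != 0;

	fake_psr &= ~FAKE_PSR_I;
	return (was_enabled);
}

/* Only ever turns interrupts on; never rewrites the other bits. */
static void
intr_restore_sketch(int was_enabled)
{
	if (was_enabled)
		fake_psr |= FAKE_PSR_I;
}
```

Because the restore touches one bit, any other PSR bit changed between
the two calls survives, and nested disable/restore pairs unwind
correctly.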

This change specifically disallows using intr_restore() to disable
interrupts. The reason is simple: interrupts only need to be restored
after they are being disabled, which means that intr_restore() is
called with interrupts disabled and we only need to enable them if
they were previously enabled.

This change does not fix any bugs, other than that it bugged me...

Approved by: re@ (blanket)

Consistently use the same metric to differentiate between kernel mode
and user mode. We need to take into account that the EPC syscall path
introduces a grey area in which one can argue either way, including a
third: neither.

We now use the region in which the IP address lies. Regions 5, 6 and 7
are kernel VA regions and if the IP lies in any of those regions we
assume we're in kernel mode. Hence, we can be in kernel mode even if
we're not on the kernel stack and/or have user privileges. There're
gremlins living in the twilight zone :-)
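
The metric amounts to a check on the top three bits of the address,
since on ia64 the region number is bits 63:61 of a virtual address.
The function names below are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Region number: the top three bits of a 64-bit virtual address. */
static unsigned
ia64_region(uint64_t va)
{
	return ((unsigned)(va >> 61));
}

/* Regions 5, 6 and 7 are the kernel VA regions named in the commit. */
static bool
ip_in_kernel(uint64_t ip)
{
	return (ia64_region(ip) >= 5);
}
```

Note that the test looks only at the IP, not at the stack in use or
the privilege level.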

For the EPC syscall path this particularly means that the process
leaves user mode the moment it calls into the gateway page. This
makes the most sense because from a process' point of view the call
represents a request to the kernel for some service and that service
has been performed if the call returns. With the metric we picked,
this also means that we're back in user mode IFF the call returns.

Approved by: re@ (blanket)

Add __amd64__ ifdefs to enable the bootblock handling code, slices, etc.

Approved by: re (murray)
Obtained from: obrien

Add a temporary indirect patch for gcc when targeting amd64.  This is to
give the cvs tree a chance of surviving a 'make world'.  One of the two diff chunks
is already in gcc-3.3, the other has been committed to gcc's HEAD and
is in the pipeline for gcc-3.3.1 (but has not been committed yet).

The first chunk simplifies an excessively complex assembler statement
when generating switch jump tables.  The use of '.' causes as(1) to choke
on big files.  Use a simpler form instead.  This is only an issue for
TARGET_64BIT mode.

The second chunk fixes an internal compiler error when compiling
libc/stdio/vfprintf.c.  While this is supposedly only an issue for
64 bit mode, it does touch the 32 bit i386 code paths, so this patch
is only applied for TARGET_ARCH == amd64 to keep the risks down.
Breaking gcc at the 11th hour would suck.

This will be removed when it is time to import gcc-3.3.

Discussed with: kan
Approved by: re (jhb)

Unconditionally restore ar.k7 (memory stack) and ar.k6 (register stack)
when returning from an interrupt. Both registers are used on interrupt
to switch to the right kernel stack, but other than that they are not
used. This means we only have to make sure they contain proper values
while in user mode. As such, we conditionally restored these registers
based on whether we returned to userland or not. A nice property of
conditionally restoring ar.k6 and ar.k7 is that it introduces two
invariants: ar.k6 always points to the bottom of the kernel stack and
ar.k7 always points to the top of the kernel stack (immediately below
the PCB we have there).

However, the EPC syscall path introduces an irregularity: there's no
"thin red line" between user and kernel. There's a grey area that's a
couple of instructions wide. Any interruption in that grey area is
bound to see an inconsistent state. One such state is that we're in
kernel space for all practical purposes, but we still need to have
ar.k6 and ar.k7 restored as if we're in userland.

Thus: restore ar.k6 and ar.k7 unconditionally at the cost of losing
a valuable invariant. Both registers now hold the extent of the
usable portion of the kernel stack at any interrupt nesting, which
when in userland means the bottom and the top of the kstack.

mdoc(7) fixes.

Approved by: re (blanket)

libstdc++.so breaks on amd64 due to bogons in our build, so prevent the
shared library being built for amd64.  The problem is that libstdc++.so
is produced with 'cc -shared'.  This has an internal -lgcc, which is
not PIC.  libstdc++.so uses exceptions and the dwarf2 unwinder, which
are in libgcc.a.  As a result, non-PIC code gets pulled into libstdc++.so.
This is fatal on amd64 when certain relocation types cannot be used in
PIC mode.  The official FSF solution to this is to have libgcc.so with
internal ELF symbol versioning to solve the ABI problem, but I don't want
to fight that battle yet.  I tried making libgcc_pic.a (which worked
fine), but that's not something for the 11th hour before a release.

Approved by:  re (amd64 "safe" stuff)

no libc_r on amd64 yet -> no pppctl.

Approved by: re (safe amd64 changes)

Merge some entries from maho's USB device compatibility list.

Approved by: re (bmah)
Obtained from: http://people.FreeBSD.org/~maho/USB/

Get usb(4) manual page closer to reality:

- update ``struct usb_device_info''
- add information about new fields in the above struct
- document USB_EVENT_IS_ATTACH() and USB_EVENT_IS_DETACH()
- update URL of the USB.ORG developer documentation

PR: docs/41580 (original patch)
Reviewed by: n_hibma
Approved by: des (mentor), re (bmah)

Stop profiled libc from exploding, matching gcc's generated code.

Approved by: re (amd64/* blanket)

Bring vnode(9) man page to its senses:

- remove '-*- nroff -*-'
- bump the date

- nuke outdated ``struct vnode''
(it is just better to lookup the struct in the header)

- nuke ``enum vtype'' and related junk
- add one line about ``struct vnode''
- use .Va instead of .Dv for vnode struct fields

Approved by: des (mentor), re (bmah)
Reviewed by: arch@, mentor

Do not exclude amd64 from rtld-elf builds.

Approved by: re (safe amd64 support commits)

Initial pass at supporting shared libraries on amd64. There are still
a few missing relocation types in amd64/reloc.c, but I have not found
any of them in use yet. :-)

Approved by: re (amd64/* blanket)

Repair PIC mode. It seems I was a bit too excited about the
implications of native PC relative addressing.

Change low-level locking a bit so that we can tell if
a lock is being waited on.

Fix races in join and cancellation.

When trying to wait on a CV and the library is not yet
threaded, make it threaded so that waiting actually works.

When trying to nanosleep() and we're not threaded, just
call the system call nanosleep instead of adding the thread
to the wait queue.

Clean up adding/removing new threads to the "all threads queue",
assigning them unique ids, and tracking how many active threads
there are. Do it all when the thread is added to the scheduling
queue instead of making pthread_create() know how to do it.

Fix a race where a thread could be marked for signal delivery
but it could be exited before we actually add the signal to it.

Other minor cleanups and bug fixes.

Submitted by: davidxu
Approved by: re@ (blanket for libpthread)

Lock the cond queue (condition variables):
Access to the thread's flags and state is protected by
_thread_critical_enter/exit(). When a thread is signaled with a condition
its state must be protected by locking it and disabling
signals before it is taken off the waiters' queue.

Move the implementation of pthread_cond_signal() and pthread_cond_broadcast()
into one function, cond_signal(). Its behaviour is determined by the
last argument, int broadcast. If this is set to 1 it will remove all
waiters, otherwise it will wake up only the first waiter thread.
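
The shared implementation can be sketched as one loop over the waiter
queue. The types and the cond_signal_sketch() name are hypothetical,
and the locking and signal masking described above are left out.

```c
#include <stddef.h>

/* Hypothetical waiter record and condition variable. */
struct waiter {
	struct waiter *next;
	int awake;
};

struct cond_sketch {
	struct waiter *head;	/* first waiter in FIFO order */
};

/*
 * Wake the first waiter, or every waiter when broadcast is non-zero.
 * Returns the number of waiters removed from the queue.
 */
static int
cond_signal_sketch(struct cond_sketch *cv, int broadcast)
{
	int woken = 0;

	while (cv->head != NULL) {
		struct waiter *w = cv->head;

		cv->head = w->next;
		w->next = NULL;
		w->awake = 1;
		woken++;
		if (!broadcast)
			break;
	}
	return (woken);
}
```

pthread_cond_signal() and pthread_cond_broadcast() then reduce to calls
with the last argument set to 0 and 1 respectively.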

Remove an extraneous call to pthread_testcancel().

Approved by: re/blanket libthr

Fix an alpha inheritance bug:

On alpha, PAL is involved in context management and after wiring
the CPU (in alpha_init()) a context switch was performed to tell
PAL about the context. This was bogusly brought over to ia64
where it introduced bugs, because we restored the context from
a mostly uninitialized PCB.

The cleanup constitutes:
o  Remove the unused arguments from ia64_init().
o  Don't return from ia64_init(), but instead call mi_startup()
   directly. This reduces the amount of muckery in assembly and
   also allows for the next bullet:
o  Save our current context prior to calling mi_startup(). The
   reason for this is that many threads are created from thread0
   by cloning the PCB. By saving our context in the PCB, we have
   something sane to clone. It also ensures that a cloned thread
   that does not alter the context in any way will return to
   the saved context, where we're ready for the eventuality with
   a nice, user unfriendly panic().

The cleanup fixes at least the following bugs:
o  Entering mi_startup() with the RSE in enforced lazy mode.
o  Re-execution of ia64_init() in certain "lab" conditions.

While here, add proper unwind directives to __start() so that
the unwind knows it has reached the bottom of the (call) stack.

Approved by: re@ (blanket)

Fix a (new) source of instability:

When interrupting a kernel context, we don't need to switch stacks
(memory nor register). As such, we were also not restoring the
register stack pointer (ar.bspstore). This, however, fails to be
valid in 1 situation: when we interrupt a register stack switch as
is being done in restorectx(). The problem is that restorectx()
needs to have ar.bsp == ar.bspstore before it can assign the new
value to ar.bspstore. This is achieved by doing a loadrs prior to
assigning to ar.bspstore. If we take an interrupt in between the
loadrs and the assignment and we don't make sure we restore the
ar.bspstore prior to returning from the interrupt, we switch
stacks with possibly non-zero dirty registers, which means that
the new frame pointer (ar.bsp) will be invalid.

So, instead of jumping over the restoration of the register frame
pointer and related registers, we conditionalize it based on whether
we return to kernel context or user context. A future performance
tweak is possible by only restoring ar.bspstore when returning to
kernel mode *and* when the RSE is in enforced lazy mode. One cannot
assume ar.bsp == ar.bspstore if the RSE is not in enforced lazy mode
anyway.

While here (well, not quite) don't unconditionally assign to
ar.bspstore in exception_save. Only do that when we actually switch
stacks. It can only harm us to do it unconditionally.

Approved by: re@ (blanket)

Add two functions: _spinlock_pthread() and _spinunlock_pthread()
that take the address of a struct pthread as their first argument.
_spin[un]lock() just become wrappers around these two functions.
These new functions are for use in situations where curthread can't be
used. One example is _thread_retire(), where we invalidate the array index
curthread uses to get its pointer.

Approved by: re/blanket libthr

In swapctx(), put the RSE in enforced lazy mode before we flush the
register stack. There's nothing really wrong with flushing before
putting the RSE in enforced lazy mode, provided you don't depend on
ar.bspstore being equal to ar.bsp when the RSE has been put in
enforced lazy mode. The small window between the flush and setting
the RSE may be sufficient to have the RSE eagerly increase the dirty
region (and hence cause ar.bspstore != ar.bsp) or have an interrupt
that may even get the laziest RSE to do something.

Anyway: we don't depend on ar.bspstore being equal to ar.bsp, so
nothing was and is broken. But the code was non-intuitive and
easily confuses. This is a source of future bugs.

Note: the advantage of not depending on ar.bspstore is that there's
some resilience against an interrupted flushrs. Clobbering is limited
to stacked register contents only, not to RSE address clobbering.

Approved by: re@ (blanket)

Fix a typo in rev 1.10

Flesh out the libkse note a bit. Source material kindly provided by
deischen, any inaccuracies are mine.

Approved by: re (implicitly)

Make the maximum number of vnodes a function of both the physical memory
size and the kernel's heap size, specifically, vm_kmem_size.  This
function allows a maximum of 40% of the vm_kmem_size to be used for
vnodes and vm objects.  This is a conservative bound based upon recent
problem reports.  (In other words, a slight increase in this percentage
may be safe.)

Finally, machines with less than ~3GB of RAM should be unaffected
by this change, i.e., the maximum number of vnodes should remain
the same.  If necessary, machines with 3GB or more of RAM can increase
the maximum number of vnodes by increasing vm_kmem_size.
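
One plausible reading of the bound, as a sketch: take the smaller of
the limit derived from physical memory and the number of vnodes that
fit in 40% of the kernel heap. The function name, the phys_limit input
and the per-vnode cost are assumptions; the commit does not give the
formula.

```c
#include <stdint.h>

/*
 * phys_limit:      vnode limit derived from physical memory (assumed
 *                  to be computed elsewhere).
 * kmem_size:       kernel heap size in bytes (vm_kmem_size).
 * bytes_per_vnode: assumed combined cost of one vnode and its vm
 *                  object; must be non-zero.
 */
static uint64_t
max_vnodes_sketch(uint64_t phys_limit, uint64_t kmem_size,
    uint64_t bytes_per_vnode)
{
	/* 40% of the kernel heap, written as 2/5 to stay in integers. */
	uint64_t kmem_limit = (kmem_size / 5 * 2) / bytes_per_vnode;

	return (phys_limit < kmem_limit ? phys_limit : kmem_limit);
}
```

With a small phys_limit the heap term never wins, which matches the
note that smaller machines are unaffected.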

Desired by: scottl
Tested by: jake
Approved by: re (rwatson,scottl)

I'm lost in a maze of twisty little tunables, all different.
The ACPI-disabling hint goes into device.hints, not loader.conf.

Pointed out by: njl

Add some hopefully helpful notes about ACPI.

Approved by: re (implicitly)
Reviewed by: imp

Move ($create-refentry-xref-link$) to the language-neutral place
and add entities &release.manpath.*; for man.cgi's manpath=XXX.

Approved by: re (bmah)

EDOOFUS
Prevent one thread from messing up another thread's saved signal
mask by saving it in struct pthread instead of leaving it as a
global variable. D'oh!

Approved by: re/blanket libthr

Make WARNS2 clean. The fixes mostly included:
o removed unused variables
o explicit inclusion of header files
o prototypes for externally defined functions

Approved by: re/blanket libthr

note to self: do not confuse void* with int.

Approved by: re/blanket libthr

Typo fix.  oops.

Submitted by:  jmallett
Approved by:   re (blanket amd64/*)

Update comments.  Note that the kernel is at -1GB, not -2GB as erroneously
implied by the previous commit.  KVM is still only 1GB until
pmap_growkernel() learns about the extra page table level.

Approved by:  re (blanket)

As suggested by the gdb folks, pad the 'struct fpreg' to a full 512 bytes
to match the native fxsave/fxrstor object size since that's apparently what
the Linux/NetBSD folks do.

Add amd64 to the MACHINE_ARCH list of systems that link bsdlabel to
disklabel. I just got burnt again by having an old disklabel binary
kicking around.

Discussed with: phk
Approved by: re (safe amd64 stuff)

Low risk amd64 fix. Use a vm_offset_t for the virtual location of the
buffer space instead of a u_int32_t. Otherwise the upper 32 bits of
the address space get truncated and syscons blows up.
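
The truncation is easy to reproduce in isolation. In this sketch
vm_offset_sketch_t stands in for vm_offset_t on amd64, and the struct
and function names are hypothetical, not the syscons code.

```c
#include <stdint.h>

typedef uint64_t vm_offset_sketch_t;	/* stand-in for vm_offset_t */

struct buf_narrow { uint32_t vaddr; };			/* old field type */
struct buf_wide   { vm_offset_sketch_t vaddr; };	/* new field type */

/* Returns non-zero when the stored address still equals the original. */
static int
roundtrips_narrow(uint64_t va)
{
	struct buf_narrow b;

	b.vaddr = (uint32_t)va;		/* upper 32 bits are dropped here */
	return (b.vaddr == va);
}

static int
roundtrips_wide(uint64_t va)
{
	struct buf_wide b;

	b.vaddr = va;
	return (b.vaddr == va);
}
```

Any kernel address in the upper half of the 64-bit space fails the
narrow round trip, while low addresses happen to survive it.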

Approved by: re (safe, low risk amd64 fixes)

Deal with the user VM space expanding.  32 bit applications do not like
having their stack at the 512GB mark.  Give 4GB of user VM space for 32
bit apps.  Note that this is significantly more than on i386 which gives
only about 2.9GB of user VM to a process (1GB for kernel, plus page
table pages which eat user VM space).

Approved by: re (blanket)

Major pmap rework to take advantage of the larger address space on amd64
systems.  Of note:
- Implement a direct mapped region using 2MB pages.  This eliminates the
  need for temporary mappings when getting ptes.  This supports up to
  512GB of physical memory for now.  This should be enough for a while.
- Implement a 4-tier page table system.  Most of the infrastructure is
  there for 128TB of userland virtual address space, but only 512GB is
  presently enabled due to a mystery bug somewhere.  The design of this
  was heavily inspired by the alpha pmap.c.
- The kernel is moved into the negative address space(!).
- The kernel has 2GB of KVM available.
- Provide a uma memory allocator to use the direct map region to take
  advantage of the 2MB TLBs.
- Fixed some assumptions in the bus_space macros about the ability
  to fit virtual addresses in an 'int'.

Notable missing things:
- pmap_growkernel() should be able to grow to 512GB of KVM by expanding
  downwards below kernbase.  The kernel must be at the top 2GB of the
  negative address space because of gcc code generation strategies.
- need to fix the >512GB user vm code.

Approved by: re (blanket)
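
The sizes quoted above follow from the 4-level layout: each level indexes
512 entries (9 bits) above a 12-bit page offset, so one second-level
entry maps 2MB and one top-level entry covers 512GB. A minimal sketch of
the index arithmetic, using hypothetical names rather than anything from
pmap.c:

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT_SKETCH	12	/* 4KB base pages */
#define LEVEL_BITS_SKETCH	9	/* 512 entries per table */

/*
 * Return the table index used at a given level for a virtual address.
 * Level 0 is the last-level page table (4KB per entry), level 1 the
 * page directory (2MB per entry), level 2 the next table up (1GB per
 * entry) and level 3 the top-level table (512GB per entry).
 */
static unsigned
table_index(uint64_t va, int level)
{
	assert(level >= 0 && level <= 3);
	return ((unsigned)((va >>
	    (PAGE_SHIFT_SKETCH + LEVEL_BITS_SKETCH * level)) & 0x1ff));
}
```

With this arithmetic, an address at the start of the top 2GB of the
address space (0xffffffff80000000) lands in the last top-level slot, and
128TB of user space corresponds to the first 256 top-level slots.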

Change the way the plex lock mutexes work.  Previously they were part
of the struct plex, which tore apart the mutex linked lists when the
plex table was expanded.  Now we maintain a pool of mutexes (currently
32) to be shared by all plexes.  This is still a lot better than the
splhigh() method used in other architectures.

expand_table: Add parameters file and line if we're debugging.

Approved by: re (jhb)
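
The pooling idea can be sketched in userland terms. This is an
assumption-laden illustration: it uses pthread mutexes rather than kernel
mutexes, and the names plex_mutex_pool, plex_mutex_pool_init() and
plex_lock_for() are hypothetical. Only the pool size of 32 comes from the
commit message.

```c
#include <assert.h>
#include <pthread.h>

#define PLEXMUTEXES_SKETCH	32	/* pool size quoted in the commit message */

/* The pool lives outside the plex table, so growing the table never moves it. */
static pthread_mutex_t plex_mutex_pool[PLEXMUTEXES_SKETCH];

/* Initialize every mutex in the pool once, before any plex is locked. */
static void
plex_mutex_pool_init(void)
{
	int error, i;

	for (i = 0; i < PLEXMUTEXES_SKETCH; i++) {
		error = pthread_mutex_init(&plex_mutex_pool[i], NULL);
		assert(error == 0);
		(void)error;
	}
}

/* Map a plex number onto one of the shared mutexes. */
static pthread_mutex_t *
plex_lock_for(unsigned plexno)
{
	return (&plex_mutex_pool[plexno % PLEXMUTEXES_SKETCH]);
}
```

Plexes whose numbers differ by a multiple of 32 share a mutex, which
trades some contention for stable mutex addresses.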

Change the way the plex lock mutexes work.  Previously they were part
of the struct plex, which tore apart the mutex linked lists when the
plex table was expanded.  Now we maintain a pool of mutexes (currently
32) to be shared by all plexes.  This is still a lot better than the
splhigh() method used in other architectures.

Add and clarify comments.

Approved by: re (jhb)

expand_table: Add parameters file and line if we're debugging.

MMalloc, vinum_meminfo: Use strlcpy to copy file name.

Approved by: re (jhb)

Change the way the plex lock mutexes work.  Previously they were part
of the struct plex, which tore apart the mutex linked lists when the
plex table was expanded.  Now we maintain a pool of mutexes (currently
32) to be shared by all plexes.  This is still a lot better than the
splhigh() method used in other architectures.

Approved by: re (jhb)

detachobject: Update volume config after detaching a plex.

update_volume_config: Remove redundant diskconfig parameter.

Approved by: re (jhb)

Change the way the plex lock mutexes work.  Previously they were part
of the struct plex, which tore apart the mutex linked lists when the
plex table was expanded.  Now we maintain a pool of mutexes (currently
32) to be shared by all plexes.  This is still a lot better than the
splhigh() method used in other architectures.

update_volume_config: Remove redundant diskconfig parameter.

expand_table: Add parameters file and line if we're debugging.

Approved by: re (jhb)

Change many strcpys to strlcpys, etc.

Submitted by:    Ted Unangst <tedu@stanford.edu>
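
For readers unfamiliar with the difference, the sketch below shows the
strlcpy() contract: copy at most size - 1 bytes, always NUL-terminate
when size is nonzero, and return the length of the source so that
truncation can be detected. The function sketch_strlcpy() is a
hypothetical local reimplementation for platforms without strlcpy(); it
is not the libc or kernel version.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Bounded copy with the same contract as strlcpy(). */
static size_t
sketch_strlcpy(char *dst, const char *src, size_t size)
{
	size_t len, n;

	assert(src != NULL);
	assert(dst != NULL || size == 0);
	len = strlen(src);
	if (size != 0) {
		/* Leave room for the terminating NUL. */
		n = (len < size - 1) ? len : size - 1;
		memcpy(dst, src, n);
		dst[n] = '\0';
	}
	return (len);	/* a result >= size means the copy was truncated */
}
```

Unlike strcpy(), an oversized source cannot write past the end of the
destination buffer; the caller compares the return value against the
buffer size to notice truncation.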

Correct some inaccurate and badly formatted comments.

config_subdisk: If our drive is down, ensure that the subdisk is
crashed.  Previously it was possible for the subdisk to be up when
the drive was down.

Change the way the plex lock mutexes work.  Previously they were part
of the struct plex, which tore apart the mutex linked lists when the
plex table was expanded.  Now we maintain a pool of mutexes (currently
32) to be shared by all plexes.  This is still a lot better than the
splhigh() method used in other architectures.

update_volume_config: Remove redundant diskconfig parameter.

Approved by: re (jhb)

Modified release note: Note code generation problems with the base
system GCC using -march=pentium4, and the local workaround in our
Makefile infrastructure.

Approved by: re (implicitly)

o Document the tunables that acpi allows. (mdoc gurus please comment
on and fix if necessary).
o Note that acpi is available on i386-ia32, ia64 and amd64, not just 'intel'
platforms. Intel has had nothing to do with amd64.

Approved by: re (scottl@)

Correctly tag some on-board Ethernet devices with the right
architecture.

Approved by: re (implicitly)

Note a puc(4) device that works on ia64.

Submitted by: Jim Brown <jpb@sixshooter.v6.thrupoint.net>
Approved by: re (implicitly)

Add more ia64 device information, in a section similar to that for
FreeBSD/alpha. Heavily hacked version of a diff that was...

Submitted by: Jim Brown <jpb@sixshooter.v6.thrupoint.net>
Approved by: re (implicitly)

Enable some devices on ia64. Based on patch that was...

Submitted by: Jim Brown <jpb@sixshooter.v6.thrupoint.net>
Approved by: re (implicitly)

Merge from i386/trap.c rev 1.252. Use td_critnest instead of the
spinlocks count for explicitly enabling interrupts.

Approved by: re (blanket)

The "krb5" distribution was merged with "crypto", record the death.

Reviewed by: jhb
Approved by: re (jhb)

When newfs'ing a partition with UFS2 that had previously been newfs'ed
with UFS1, the UFS1 superblocks were not deleted. This allowed any
RELENG_4 (or other non-UFS2-aware) fsck to think it knew how to "fix"
the file system, resulting in severe data scrambling.

This patch is a more advanced version than the one originally submitted.
Lukas improved it based on feedback from Kirk, and testing by me. It
blanks all UFS1 superblocks (if any) during a UFS2 newfs, thereby causing
fsck's that are not UFS2 aware to generate the "SEARCH FOR ALTERNATE
SUPER-BLOCK FAILED" message, and exit without damaging the fs.

PR: bin/51619
Submitted by: Lukas Ertl <l.ertl@univie.ac.at>
Reviewed by: kirk
Approved by: re (scottl)

Calculate routed interrupts using the slot number from the device and
not that of the bridge.

Approved by: re (jhb)

Mark a couple of instances of onboard NICs as i386-only.

Approved by: re (implicitly)

Fix two misuses of __BSD_VISIBLE.

Submitted by: bde
Approved by: re

Change -march=pentium4 to -march=pentium3 when CPUTYPE==p4, because gcc 3.2 is
known to produce broken code with -march=pentium4. Add a note explaining this.
This should be removed when we update to gcc 3.3 or the bug is otherwise fixed.

Approved by: re

Add a link to the FreeBSD/ia64 project. Maybe should do this for
other platforms that have their own project pages too?

Based on a patch that was:

Submitted by: Jim Brown <jpb@sixshooter.v6.thrupoint.net>
Approved by: re (implicitly)

Update the abstract to be somewhat more helpful. Based on a patch
that was...

Submitted by: Jim Brown <jpb@sixshooter.v6.thrupoint.net>
Approved by: re (implicitly)

Erase whitespace at EOL.

Approved by: re (blanket)

Assorted mdoc(7) fixes.

Approved by: re (blanket)

Erase whitespace at EOL.

Approved by: re (blanket)

Moved $FreeBSD$ tag to where it belongs.

Approved by: re (blanket)

Nitpicking.

Approved by: re (blanket)

Assorted mdoc(7) fixes.

Approved by: re (blanket)