iedowse [Tue, 25 Nov 2003 02:23:29 +0000 (02:23 +0000)]
Write the correct value to `td_be' for the second and further
transfer descriptors when a large request needs to be split into
more than one 8k chunk. The bug was that the calculation did not
take into account the offset of the chunk within the overall request.
This is reported to fix crashes and data corruption on ohci
controllers.
sam [Mon, 24 Nov 2003 03:57:03 +0000 (03:57 +0000)]
Correct a problem where ipfw-generated packets were being returned
for ipfw processing w/o an indication the packets were generated
by ipfw--and so should not be processed (this manifested itself
as a LOR.) The flag bit in the mbuf that was used to mark the
packets was not listed in M_COPYFLAGS so if a packet had a header
prepended (as done by IPsec) the flag was lost. Correct this by
defining a new M_PROTO6 flag and use it to mark packets that need
this processing.
sam [Sun, 23 Nov 2003 18:13:41 +0000 (18:13 +0000)]
Use MPSAFE callouts only when debug.mpsafenet is 1. Both timer routines
potentially transmit packets that may enter KAME IPsec w/o Giant if the
callouts are marked MPSAFE.
tmm [Sun, 23 Nov 2003 03:02:00 +0000 (03:02 +0000)]
bzero() the the sockaddr used for the destination address for
rtalloc_ign() in in_pcbconnect_setup() before it is filled out.
Otherwise, stack junk would be left in sin_zero, which could
cause host routes to be ignored because they failed the comparison
in rn_match().
This should fix the wrong source address selection for connect() to
127.0.0.1, among other things.
alfred [Sat, 22 Nov 2003 02:21:49 +0000 (02:21 +0000)]
Use function pointers to remove the depenancy cross dependancy on nfs4
and the nfs3 client. Also fix some bugs that happen to be causing crashes
in both v3 and v4 introduced by the v4 import.
Submitted by: Jim Rees <rees@umich.edu>
Approved by: re
peter [Sat, 22 Nov 2003 01:11:07 +0000 (01:11 +0000)]
Argh! The Athlon64 and Opteron only implement 40 bits of address space in
the MTRR Base/Mask registers. If you use the documented algorithm in the
systems programming guide, you'll get a GPF. The only thing that has
prevented this so far is that the bios pre-sets some MTRR entries which
we mis-interpreted sufficiently to fool the memcontrol interface into
thinking all the address space was taken and therefore rejected XFree86's
requests. However, not all bioses do this.. You get an insta-panic in
that case. Grrr. A better fix (dynamic mask) will happen by 5.3/5-stable
so that we automatically adapt to more than 40 physical bits.
jhb [Fri, 21 Nov 2003 22:23:26 +0000 (22:23 +0000)]
- Split cpu_mp_probe() into two parts. cpu_mp_setmaxid() is still called
very early (SI_SUB_TUNABLES - 1) and is responsible for setting mp_maxid.
cpu_mp_probe() is now called at SI_SUB_CPU and determines if SMP is
actually present and sets mp_ncpus and all_cpus. Splitting these up
allows an architecture to probe CPUs later than SI_SUB_TUNABLES by just
setting mp_maxid to MAXCPU in cpu_mp_setmaxid(). This could allow the
CPU probing code to live in a module, for example, since modules
sysinit's in modules cannot be invoked prior to SI_SUB_KLD. This is
needed to re-enable the ACPI module on i386.
- For the alpha SMP probing code, use LOCATE_PCS() instead of duplicating
its contents in a few places. Also, add a smp_cpu_enabled() function
to avoid duplicating some code. There is room for further code
reduction later since much of this code is also present in cpu_mp_start().
- All archs besides i386 still set mp_maxid to the same values they set it
to before this change. i386 now sets mp_maxid to MAXCPU.
Tested on: alpha, amd64, i386, ia64, sparc64
Approved by: re (scottl)
njl [Fri, 21 Nov 2003 21:24:31 +0000 (21:24 +0000)]
Update code for checking the reference count and performing the final
delete of objects. Also revert our temporary workaround in dsmthdat.c
that always copied objects. This is the correct fix for errors
evaluating _BST (and GBST) on IBM Thinkpads where an argument (Arg3)
was returned to the caller and the object was freed while still in use.
This will be in a future ACPI-CA dist.
njl [Fri, 21 Nov 2003 21:21:17 +0000 (21:21 +0000)]
Add the byte offset to the base address for IndexField objects. This
fixes an interrupt storm for certain users. This is done on the vendor
branch since the code is already in the 20031029 ACPI-CA dist and will
be imported after 5.2R.
Tested by: sebastian ssmoller <sebastian.ssmoller@gmx.net>
PR: i386/57909
Approved by: re (jhb)
dcs [Fri, 21 Nov 2003 19:01:02 +0000 (19:01 +0000)]
With the beastie menu a problem was introduced in which selecting a
different kernel to boot with kernel="NAME" would load the kernel and
loader.conf-selected modules from /boot/NAME, but it would not change
module_path. So, for instance, the automatically loaded acpi.ko would come
from /boot/kernel/acpi.ko, *always*.
Mind you, this happened for unassisted boot. If you interrupted, typed
"unload" and then "boot NAME", it would Do The Right Thing.
The source of the problem is the double initialization with beastie's
loader.rc. One would happen inside "start", and would load the kernel. The
next one would happen later in the loader.rc script, resetting module_path.
Because module_path is set to the Right Value by the functions in support.4th
that actually load the kernel, when beastie.4th proceeded to boot
module_path would remain wrong, as the kernel was already loaded.
This can be corrected by removing either initialization, and also by changing
the command used by beastie.4th from "boot" to "boot-conf", which makes sure
you use the right kernel and modules.
I chose to remove the second initialization, since this let you interrupt
(or confirm) boot before beastie even comes up. I avoid also doing the
boot-conf change because that would simply cause the kernel and modules to
be loaded twice (in fact, that was my original patch, until, in writing this
very commit message, I saw the error of my ways).
This commit changes the semantics of module loading when using the beastie
menu. Now it does what one would expect it to, but not what it was actually
doing, so something may break for unusual setups depending on broken
behavior. As our japanese friends so nicely put it, shikata ga nakatta. :-)
peter [Fri, 21 Nov 2003 03:19:59 +0000 (03:19 +0000)]
Turn on NO_MIXED_MODE for amd64 generic. It turns out that all the
known samples of broken chipsets that needed mixed mode in the first place
are so broken (ie: locks up) that we can't use IO APIC mode at all and it
needs to be turned off in the bios. So, the MIXED_MODE penalty on the
good chipsets gained nothing.
peter [Thu, 20 Nov 2003 22:54:44 +0000 (22:54 +0000)]
Provide a streamlined '#define curthread __curthread()' for amd64 to avoid
the compiler having to parse and optimize the PCPU_GET(curthread) so often.
__curthread() is an inline optimized version of PCPU_GET(curthread) that
knows that pc_curthread is at offset zero in the pcpu struct. Add a
CTASSERT() to catch any possible changes to this. This accounts for
just over a 1% wall clock speedup for total kernel compile/link time,
and 20% compile time speedup on some specific files depending on which
compile options are used.
peter [Thu, 20 Nov 2003 22:50:26 +0000 (22:50 +0000)]
Allow the MD backend to provide an alternative to
#define curthread PCPU_GET(curthread)
since its so heavily used in the kernel and ripe for compile-speed
optimization on some platforms.
andre [Thu, 20 Nov 2003 21:47:20 +0000 (21:47 +0000)]
Introduce tcp_hostcache and remove the tcp specific metrics from
the routing table. Move all usage and references in the tcp stack
from the routing table metrics to the tcp hostcache.
It caches measured parameters of past tcp sessions to provide better
initial start values for following connections from or to the same
source or destination. Depending on the network parameters to/from
the remote host this can lead to significant speedups for new tcp
connections after the first one because they inherit and shortcut
the learning curve.
tcp_hostcache is designed for multiple concurrent access in SMP
environments with high contention and is hash indexed by remote
ip address.
It removes significant locking requirements from the tcp stack with
regard to the routing table.
Reviewed by: sam (mentor), bms
Reviewed by: -net, -current, core@kame.net (IPv6 parts)
Approved by: re (scottl)
jhb [Thu, 20 Nov 2003 21:23:49 +0000 (21:23 +0000)]
Try all of the possible interrupts for a link device when programming
boot-disabled devices instead of skipping the last interrupt. This is
especially important for devices that only have one interrupt as this
bug was keeping any interrupt from being tried at all.
andre [Thu, 20 Nov 2003 20:07:39 +0000 (20:07 +0000)]
Introduce tcp_hostcache and remove the tcp specific metrics from
the routing table. Move all usage and references in the tcp stack
from the routing table metrics to the tcp hostcache.
It caches measured parameters of past tcp sessions to provide better
initial start values for following connections from or to the same
source or destination. Depending on the network parameters to/from
the remote host this can lead to significant speedups for new tcp
connections after the first one because they inherit and shortcut
the learning curve.
tcp_hostcache is designed for multiple concurrent access in SMP
environments with high contention and is hash indexed by remote
ip address.
It removes significant locking requirements from the tcp stack with
regard to the routing table.
Reviewed by: sam (mentor), bms
Reviewed by: -net, -current, core@kame.net (IPv6 parts)
Approved by: re (scottl)
marcel [Thu, 20 Nov 2003 16:42:39 +0000 (16:42 +0000)]
Set the ACPI processor Id in the PCPU structure so that CPU idling
on SMP systems has a chance of working. This was a loose end of the
implementation of the ACPI Cx idle states. Since our logical CPU Id
is the ACPI processor Id, we do not need to jump through hoops to
obtain it.
rwatson [Thu, 20 Nov 2003 03:47:50 +0000 (03:47 +0000)]
A variety of content cleanups:
(1) Document the notion of using jail(8) to run "virtual servers" or
just to constrain specific applications. If only running specific
applications, some configuration steps are unnecessary (such as
editing rc.conf).
(2) Add some more subsection headers to break up the bigger chunks of
text.
(3) Clarify the problems associated with applications binding all IP
addresses in the host, and attempt to be more specific about
potential application problems. Document how to force sshd to
bind the the right socket.
(4) Suggest that in a jailed application scenario, you might want to
have the host syslogd listen on the socket in the jail, rather
than running syslogd in the jail.
njl [Wed, 19 Nov 2003 20:28:56 +0000 (20:28 +0000)]
Improve the section on Cx states, documenting the removal of -1 as a
valid value for cx_lowest. To disable sleeping, use machdep.cpu_idle_hlt
instead. Update the version of the ACPI spec we implement.
njl [Wed, 19 Nov 2003 20:27:06 +0000 (20:27 +0000)]
* Add a DEVMETHOD for acpi so that child detach methods get called. Add
an acpi_cpu method for shutdown that disables entry to acpi_cpu_idle
and then IPIs/waits for threads to exit. This fixes a panic late in
reboot in the SMP case.
* In the !SMP case, don't use the processor id filled out by the MADT
since there can only be one processor. This was causing a panic in
acpi_cpu_idle if the id was 1 since the data was being dereferenced from
cpu_softc[1] even though the actual data was in cpu_softc[0] (which is
correct).
* Rework the initialization functions so that cpu_idle_hook is written
late in the boot process.
* Make the P_BLK, P_BLK_LEN, and cpu_cx_count all softc-local variables.
This will help SMP boxes that have _CST or multiple P_BLKs. No such
boxes are known at this time.
* Always allocate the C1 state, even if the P_BLK is invalid. This means
we will always take over idling if enabled. Remove the value -1 as
valid for cx_lowest since this is redundant with machdep.cpu_idle_hlt.
* Reduce locking for the throttle initialization case to around the write
to the smi_cmd port. Add disabled code to write the CST_CNT. It will
be enabled once _CST re-evaluation is tested (post 5.2R).
Thank you: dfr, imp, jhb, marcel, peter
Tested by: rwatson, Harald Schmalzbauer <h@schmalzbauer.de>
Approved by: re (rwatson)
alc [Wed, 19 Nov 2003 18:48:45 +0000 (18:48 +0000)]
- Avoid a lock-order reversal between Giant and a system map mutex that
occurs when kmem_malloc() fails to allocate a sufficient number of vm
pages. Specifically, we avoid the lock-order reversal by not grabbing
Giant around pmap_remove() if the map is the kmem_map.
Approved by: re (jhb)
Reported by: Eugene <eugene3@web.de>
peter [Wed, 19 Nov 2003 18:11:27 +0000 (18:11 +0000)]
Sync with i386.
- turn on SMP in generic
- add 'device atpic' - this is unconditional on i386, but certain nvidia
based systems need to disable acpi because the reference bios seems to be
hosed. If acpi is disabled, we won't find the apic. amd64 has the
mptable code in a seperate compile option as well.
- turn sym back on, it doesn't fail to compile anymore.
marcel [Wed, 19 Nov 2003 16:59:00 +0000 (16:59 +0000)]
Force a staticly linked /bin and /sbin for ia64. The necessary changes
to gcc have not been made for ia64, which means that executables still
have /usr/libexec/ld-elf.so.1 as the dynamic linker. This simply does
not work if /usr is a seperate filesystem not mounted when the kernel
tries to execute init(8).
Note that this is a temporary fix until a new gcc has been imported
that does have the required changes.
dds [Wed, 19 Nov 2003 15:51:26 +0000 (15:51 +0000)]
Fix problem where initgroups would silently truncate groups with
more than NGROUP elements without providing the opportunity to
setgroups to fail and correctly return error and set errno.
jhb [Wed, 19 Nov 2003 15:40:23 +0000 (15:40 +0000)]
Add a special check for a stray IRQ 7 or IRQ 15 to see if it is actually
a spurious interrupt from one of the 8259As. If so, don't log it as a
stray IRQ, but just silently ignore it.
jhb [Wed, 19 Nov 2003 15:38:56 +0000 (15:38 +0000)]
- Add counts to the ATPIC interrupt sources and point the ATPIC interrupt
source count pointers at them so that intr_execute_handlers() won't
choke when it tries to handle an unregisterd ATPIC interrupt source.
- Install the low-level ATPIC interrupt handlers when we first program the
ATPIC in atpic_startup() rather than at SI_SUB_INTR. This is only
necessary to work around buggy code that enables interrupts too early
in the boot process (namely, the vm86 code).
imp [Wed, 19 Nov 2003 05:08:27 +0000 (05:08 +0000)]
o Remove @- from the ln and change it to a -sf. This was bogus, and
regocnized as such at the time. Now that the other bogons in the
tree have been fixed, we can remove this ugly kludge.
o Remove stale/bogus opt_foo.h files. These are left over from
by-gone resources. And they point to the need, yet again, to
improve the build system so meta information is only in one place.
kan [Wed, 19 Nov 2003 04:12:32 +0000 (04:12 +0000)]
Do not call VOP_GETATTR in getdents function. It does not serve any
purpose and the resulting vattr structure was ignored. In addition,
the VOP_GETATTR call was made with no vnode lock held, resulting in
vnode locking violation panic with debug kernels.
archie [Tue, 18 Nov 2003 20:43:23 +0000 (20:43 +0000)]
Lower the maximum ACK timeout for GRE packets from 10 to 1 second.
In practice it seems that in situations of high packet loss the ACK
timeout seems to hit this maximum (perhaps inappropriately, but the
estimation algorithm is not perfect, so apparently it happens). In
any case, 10 seconds is way too high a value so lower to 1 second.
phk [Tue, 18 Nov 2003 18:17:39 +0000 (18:17 +0000)]
Call class->init() an class->fini() while the class is hooked up,
rather than right before and right after. This allows these routines
to manipulate the mesh.
KASSERT that nobody creates a geom on an alien class.
sos [Tue, 18 Nov 2003 15:23:37 +0000 (15:23 +0000)]
Work around the problem that some CDROM drives might return different
TOC's for the same media!! that borks up GEOM.
Although this looks like bad HW the following patch removes the
chance for GEOM panic'ing.
jake [Tue, 18 Nov 2003 14:21:41 +0000 (14:21 +0000)]
Install the user trap handlers that libc provides from a constructor, so
that they will be installed before application constructors are invoked.
Its possible to link applications such that this fails, application code
is invoked before they are installed, but, well, Don't Do That.
rwatson [Tue, 18 Nov 2003 00:39:07 +0000 (00:39 +0000)]
Introduce a MAC label reference in 'struct inpcb', which caches
the MAC label referenced from 'struct socket' in the IPv4 and
IPv6-based protocols. This permits MAC labels to be checked during
network delivery operations without dereferencing inp->inp_socket
to get to so->so_label, which will eventually avoid our having to
grab the socket lock during delivery at the network layer.
This change introduces 'struct inpcb' as a labeled object to the
MAC Framework, along with the normal circus of entry points:
initialization, creation from socket, destruction, as well as a
delivery access control check.
For most policies, the inpcb label will simply be a cache of the
socket label, so a new protocol switch method is introduced,
pr_sosetlabel() to notify protocols that the socket layer label
has been updated so that the cache can be updated while holding
appropriate locks. Most protocols implement this using
pru_sosetlabel_null(), but IPv4/IPv6 protocols using inpcbs use
the the worker function in_pcbsosetlabel(), which calls into the
MAC Framework to perform a cache update.
Biba, LOMAC, and MLS implement these entry points, as do the stub
policy, and test policy.
mckusick [Tue, 18 Nov 2003 00:36:40 +0000 (00:36 +0000)]
Document that the live dump command (`dump -L') creates its snapshot
in the .snap directory in the root of the filesystem being dumped.
Document that if the .snap directory is missing that it must be
created manually and that it should be owned by user root and
group operator and set to mode 770 before a live dump can be run.
rwatson [Mon, 17 Nov 2003 23:25:16 +0000 (23:25 +0000)]
Clarify UPDATING language: do buildworld before buildkernel, and
do installkernel before installworld, rather than don't make world
before installkernel.
markm [Mon, 17 Nov 2003 23:02:21 +0000 (23:02 +0000)]
Overhaul the entropy device:
o Each source gets its own queue, which is a FIFO, not a ring buffer.
The FIFOs are implemented with the sys/queue.h macros. The separation
is so that a low entropy/high rate source can't swamp the harvester
with low-grade entropy and destroy the reseeds.
o Each FIFO is limited to 256 (set as a macro, so adjustable) events
queueable. Full FIFOs are ignored by the harvester. This is to
prevent memory wastage, and helps to keep the kernel thread CPU
usage within reasonable limits.
o There is no need to break up the event harvesting into ${burst}
sized chunks, so retire that feature.
o Break the device away from its roots with the memory device, and
allow it to get its major number automagically.
rwatson [Mon, 17 Nov 2003 20:20:53 +0000 (20:20 +0000)]
Add a sysctl, security.bsd.see_other_gids, similar in semantics
to see_other_uids but with the logical conversion. This is based
on (but not identical to) the patch submitted by Samy Al Bahra.