mjacob [Mon, 23 Aug 2004 19:04:19 +0000 (19:04 +0000)]
Until I can get a clearer architecture from PHK about why he wants
the geometry code to grab a mutex that prohibits any driver on the
stack below it from sleeping, it's not safe to allow anything in
the top half of isp to sleep (excepting the thread that Fibre Channel
instances use to re-scan loops/fabrics).
imp [Mon, 23 Aug 2004 18:51:36 +0000 (18:51 +0000)]
Add a blanket note about 5.x being the same as 6.0 and vice versa for
the time being. Also add a note that says we are going to remove the
band-aids for 4.early -> 6.0 after 5.3-RELEASE so people get used to
the idea, even though it has been planned since before 5.0 was
released.
le [Mon, 23 Aug 2004 17:50:18 +0000 (17:50 +0000)]
Compare the addresses of two RAID5 work packets directly instead
of the addresses of their related bios when locking one out, since
they could share a bio and this could lead to parity corruption.
njl [Mon, 23 Aug 2004 16:28:42 +0000 (16:28 +0000)]
Rework sysresource management. Instead of having each sysresource object
hold its own values, pass them up to the parent (acpi0) and merge/uniq them
on the way. After the namespace evaluation, acpi will reserve these
resources and manage them via rman before bus_generic_probe() and
bus_generic_attach(). This is necessary because some systems specify
conflicting resources in separate sysresource objects. It's also cleaner
in that the interface between sysresource and acpi is now merely the parent's
resource list. This code handles the following cases:
1. Unique resource: add it to the parent via bus_set_resource().
2. New wholly contained in old: discard new.
3. New tail overlaps old head: grow old head downward.
AND/OR
4. New head overlaps old tail: grow old tail upward.
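The four cases above can be sketched in userland C roughly as follows. This is only an illustration, not the acpi code: struct res_range, struct res_list, MAX_RANGES and res_merge() are hypothetical names, ranges are treated as inclusive [start, end], and the real code adds unique resources via bus_set_resource() rather than a fixed array. The sketch also does not re-merge entries that come to overlap each other after one of them grows.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_RANGES 16	/* hypothetical fixed capacity for the sketch */

/* Hypothetical inclusive resource range [start, end]. */
struct res_range {
	unsigned long start;
	unsigned long end;
};

/* Hypothetical stand-in for the parent's resource list. */
struct res_list {
	struct res_range r[MAX_RANGES];
	size_t n;
};

/*
 * Merge nr into list following cases 1-4 from the commit message.
 * Returns false only when a unique range does not fit in the list.
 */
bool
res_merge(struct res_list *list, struct res_range nr)
{
	for (size_t i = 0; i < list->n; i++) {
		struct res_range *old = &list->r[i];

		if (nr.end < old->start || nr.start > old->end)
			continue;	/* no overlap with this entry */

		/* Case 2: new wholly contained in old, discard new. */
		if (nr.start >= old->start && nr.end <= old->end)
			return true;

		/* Case 3: new tail overlaps old head, grow old head downward. */
		if (nr.start < old->start)
			old->start = nr.start;

		/* Case 4: new head overlaps old tail, grow old tail upward. */
		if (nr.end > old->end)
			old->end = nr.end;

		return true;
	}

	/* Case 1: unique resource, add it to the list. */
	if (list->n == MAX_RANGES)
		return false;
	list->r[list->n++] = nr;
	return true;
}
```

Cases 3 and 4 are tested independently, which covers the AND/OR: a new range that sticks out on both sides grows the old entry in both directions.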
obrien [Mon, 23 Aug 2004 16:25:07 +0000 (16:25 +0000)]
Forced commit to document:
Doug Rabson <dfr@nlsystems.com>
Message-Id: <200408220940.18504.dfr@nlsystems.com>
Size does matter for the alpha loader. The firmware gives it 256k
of address space which we overflowed many years ago. I extended it
in sys/boot/alpha/common/main.c:extend_heap() by adding 512k to the
loader's mapped address space.
sobomax [Mon, 23 Aug 2004 15:55:03 +0000 (15:55 +0000)]
My recent measurements show that CPU_DISABLE_CMPXCHG is no longer necessary
with VMware 4.x. At least with VMware version 4.5.2, the i386 version of
atomic_cmpset_int() is about 30 times slower than the non-i386 version. This
makes the delta a good 5.3 MFC candidate, since the option would otherwise
mislead users who run FreeBSD under a modern VMware.
des [Mon, 23 Aug 2004 12:41:29 +0000 (12:41 +0000)]
Don't try to translate the control message unless we're certain it's
valid; otherwise a caller could trick us into changing any 32-bit word
in kernel memory to LINUX_SOL_SOCKET (0x00000001) if its previous value
is SOL_SOCKET (0x0000ffff).
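A minimal userland sketch of the "validate before translating" idea follows. It is not the linux compat code: struct sketch_cmsghdr, the SKETCH_* constants and translate_cmsg_level() are hypothetical, and the specific checks (buffer present, header fits, claimed length within the buffer) are an assumption about what "certain it's valid" might involve.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_SOL_SOCKET	0xffff	/* native value quoted in the commit */
#define SKETCH_LINUX_SOL_SOCKET	1	/* Linux value quoted in the commit */

/* Hypothetical control message header, for illustration only. */
struct sketch_cmsghdr {
	uint32_t len;
	int32_t level;
	int32_t type;
};

/*
 * Rewrite the level field only if the buffer really holds a complete,
 * self-consistent header. Returns 1 if translated, 0 if left untouched.
 */
int
translate_cmsg_level(void *control, size_t controllen)
{
	struct sketch_cmsghdr hdr;

	/* Without these checks we would write through an untrusted pointer. */
	if (control == NULL || controllen < sizeof(hdr))
		return 0;

	memcpy(&hdr, control, sizeof(hdr));
	if (hdr.len < sizeof(hdr) || hdr.len > controllen)
		return 0;
	if (hdr.level != SKETCH_SOL_SOCKET)
		return 0;

	hdr.level = SKETCH_LINUX_SOL_SOCKET;
	memcpy(control, &hdr, sizeof(hdr));
	return 1;
}
```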
imp [Mon, 23 Aug 2004 03:38:21 +0000 (03:38 +0000)]
Make this compile again in the standalone and the MODULES_WITH_WORLD
environments. Chances are good that this doesn't produce a good
module, but I leave the proper defaults for the dummy opt_* files to
the author.
rwatson [Mon, 23 Aug 2004 03:00:27 +0000 (03:00 +0000)]
Remove in6_prefix.[ch] and the contained router renumbering capability.
The prefix management code currently resides in nd6, leaving only the
unused router renumbering capability in the in6_prefix files. Removing
it will make it easier for us to provide locking for the remainder of
IPv6 by reducing the number of objects requiring synchronized access.
This functionality has also been removed from NetBSD and OpenBSD.
Submitted by: George Neville-Neil <gnn at neville-neil.com>
Discussed with/approved by: suz, keiichi at kame.net, core at kame.net
kan [Mon, 23 Aug 2004 02:39:45 +0000 (02:39 +0000)]
Temporarily back out r1.74 as it seems to cause a number of regressions
according to numerous reports. It might get reintroduced some time later
when an exact failure mode is understood better.
mux [Sun, 22 Aug 2004 23:01:13 +0000 (23:01 +0000)]
Pass a correct lowaddr to bus_dma_tag_create(), lnc(4) cards can only
deal with 24-bit addresses. While the two other attachments, namely
isa and cbus, do it properly, the PCI attachment was passing
BUS_SPACE_MAXADDR instead of BUS_SPACE_MAXADDR_24BIT. This bug
became apparent with the new contigmalloc() code.
This fixes the problem reported with lnc(4) interfaces inside VMware,
and should theoretically also fix any user of a PCI lnc(4) card. It
is a RELENG_5 MFC candidate.
marcel [Sun, 22 Aug 2004 20:52:23 +0000 (20:52 +0000)]
Move the cow field between wire_count and hold_count. This is the
position that is 64-bit aligned and makes sure that the valid and
dirty fields are also 64-bit aligned. This means that if PAGE_SIZE
is 32K, the size of the vm_page structure is only increased by 8
bytes instead of 16 bytes. More importantly, the vm_page structure
is either 120 or 128 bytes on ia64. These are "interesting" sizes.
cperciva [Sun, 22 Aug 2004 19:44:24 +0000 (19:44 +0000)]
When creating a new md, wait for geom's event queue to become empty
before returning. Device nodes are created via the "taste" mechanism,
so this is necessary in order to make sure that devfs entries are
created before mdconfig(8) returns.
green [Sun, 22 Aug 2004 18:57:40 +0000 (18:57 +0000)]
The new contigmalloc code is exposing a lot of misuses of busdma memory
allocation. Notably, in this case, the driver tries to allocate several
pieces of memory and then fails if the pieces allocated after the first
do not come after it physically, and within a specific range (8MB I
believe). Of course, this could just as easily fail for any number of
reasons, but it almost always fails now that contiguous allocations start
at the end of possible specified memory locations rather than the beginning.
Allocate all the possibly-needed memory up front, even though it's a waste,
to get around this. The least bogus solution would be to take the physical
address from the first allocation and create a new tag that specified that
further allocations must follow it within that 8MB window, then use that
when allocating new channels, but that's left for anyone else that really
feels like doing it.
mlaier [Sun, 22 Aug 2004 16:42:28 +0000 (16:42 +0000)]
Allow early drop for non-ALTQ enabled queues in an ALTQ-enabled kernel.
Previously the early drop was disabled unconditionally for ALTQ-enabled
kernels.
This should give some benefit for the normal gateway + LAN-server case with
a busy LAN leg and an ALTQ managed uplink.
pjd [Sun, 22 Aug 2004 16:21:12 +0000 (16:21 +0000)]
Implementation of the 'verify reading' algorithm, which uses parity data to
verify regular data when the device is in the complete state.
On a verification error, EIO is returned for the bio and the sysctl
kern.geom.raid3.stat.parity_mismatch is incremented.
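As a rough illustration of verifying data against parity, assuming plain XOR parity across the data components: raid3_verify() and its arguments are hypothetical names, and the sketch returns -1 where the commit describes returning EIO for the bio.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Check that the XOR of all data components equals the parity component.
 * data[c] points to the len bytes read from data component c.
 * Returns 0 on success and -1 on a parity mismatch.
 */
int
raid3_verify(const uint8_t *const *data, size_t ndata,
    const uint8_t *parity, size_t len)
{
	for (size_t i = 0; i < len; i++) {
		uint8_t x = 0;

		for (size_t c = 0; c < ndata; c++)
			x ^= data[c][i];
		if (x != parity[i])
			return -1;	/* caller would count the mismatch */
	}
	return 0;
}
```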
marcel [Sun, 22 Aug 2004 06:24:59 +0000 (06:24 +0000)]
Part 2 of fixing the boot code: gcc 3.4 fixes.
The whole problem seems to be size. Which is odd, because it is said
that size doesn't matter. Anyway... Add -Os to strategic places in the
makefile to have the final loader be as small as possible. This seems
to be enough to make it work. For now... I think something is more
fundamentally wrong; or something more fundamental is wrong. Potato,
potaato.
kensmith [Sun, 22 Aug 2004 05:34:07 +0000 (05:34 +0000)]
Found another one. Why does mdconfig hate me? Add a "sleep 5" to
this script; without it sparc64 ISO building was consistently failing
because the /dev/md0 device name was not present when the commands
following mdconfig ran. Apparently there is the possibility of a delay
between when mdconfig finishes and the names become visible in /dev.
Yes, we could code this better than an unconditional call to "sleep 5"
but IMHO we should fix the underlying problem instead.
csjp [Sun, 22 Aug 2004 02:03:41 +0000 (02:03 +0000)]
Currently, if the secure level is low enough, system flags can
be manipulated by prison root. In 4.x prison root can not manipulate
system flags, regardless of the security level. This behavior
should remain consistent to avoid any surprises which could lead
to security problems for system administrators who give out
privileged access to jails.
This commit changes suser_cred's flag argument from SUSER_ALLOWJAIL
to 0. This will prevent prison root from being able to manipulate
system flags on files.
rwatson [Sun, 22 Aug 2004 01:32:48 +0000 (01:32 +0000)]
When sliding the m_data pointer forward, update m_pkthdr.len as well
as m_len, or the pkthdr length will be inconsistent with the actual
length of data in the mbuf chain. The symptom of this occurring was
"out of data" warnings from in_cksum_skip() on large UDP packets sent
via the loopback interface.
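The invariant can be shown with a stripped-down stand-in for an mbuf. struct sketch_mbuf and sketch_m_advance() are hypothetical, and pkthdr_len plays the role of m_pkthdr.len in a chain consisting of a single mbuf.

```c
#include <stddef.h>

/* Hypothetical, heavily simplified mbuf for a one-buffer chain. */
struct sketch_mbuf {
	char *m_data;		/* start of valid data */
	int m_len;		/* bytes of data in this buffer */
	int pkthdr_len;		/* total packet length (m_pkthdr.len) */
};

/* Slide the data pointer forward by off bytes, keeping both lengths in step. */
void
sketch_m_advance(struct sketch_mbuf *m, int off)
{
	m->m_data += off;
	m->m_len -= off;
	m->pkthdr_len -= off;	/* the update the commit adds */
}
```

If the last line were omitted, pkthdr_len would claim more data than the chain holds, which is the kind of inconsistency a checksum routine walking the chain would trip over.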
marcel [Sun, 22 Aug 2004 00:26:01 +0000 (00:26 +0000)]
Part 1 of fixing the boot code: binutils 2.15 fixes.
The binutils 2.15 assembler now automatically and non-optionally adds
the .eh_frame section for unwind information. This section appears
to wreak havoc on the final boot code. Fix this by using a special
linker script that discards the .eh_frame sections, but is otherwise
identical to the linker internal script used for -N.
alc [Sun, 22 Aug 2004 00:08:43 +0000 (00:08 +0000)]
In the previous revision, I failed to condition an early release of Giant
in vm_fault() on debug_mpsafevm. If debug_mpsafevm was not set, the result
was an assertion failure early in the boot process.
rwatson [Sat, 21 Aug 2004 21:45:40 +0000 (21:45 +0000)]
If a tunable for the routing socket netisr queue max is defined, allow it
to override the default value, rather than the default value overriding
the tunable.
rwatson [Sat, 21 Aug 2004 21:20:06 +0000 (21:20 +0000)]
Allow the size of the routing socket netisr queue to be configured using
the tunable or sysctl 'net.route.netisr_maxqlen'. Default the maximum
depth to 256 rather than IFQ_MAXLEN due to the downsides of dropping
routing messages.
MT5 candidate.
Discussed with: mdodd, mlaier, Vincent Jardin <jardin at 6wind.com>
trhodes [Sat, 21 Aug 2004 20:19:19 +0000 (20:19 +0000)]
Allow mac_bsdextended(4) to log failed attempts to syslog's AUTHPRIV
facility. This is disabled by default but may be turned on by using
the mac_bsdextended_logging sysctl.
trhodes [Sat, 21 Aug 2004 20:15:08 +0000 (20:15 +0000)]
Give the mac_bsdextended(4) policy the ability to apply only the first
matching rule instead of all matching rules. This is similar to how ipfw(8) works.
Provide a sysctl, mac_bsdextended_firstmatch_enabled, to enable this
feature.
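The difference between the two evaluation modes can be sketched as below. struct sketch_rule and sketch_check() are hypothetical, rules match on a uid only, and allowing access when no rule matches is an assumption made for the sketch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical rule: matches one uid and either allows or denies. */
struct sketch_rule {
	int uid;
	bool allow;
};

/* Returns true if access is allowed under the chosen evaluation mode. */
bool
sketch_check(const struct sketch_rule *rules, size_t n, int uid,
    bool firstmatch)
{
	for (size_t i = 0; i < n; i++) {
		if (rules[i].uid != uid)
			continue;
		if (firstmatch)
			return rules[i].allow;	/* first matching rule decides */
		if (!rules[i].allow)
			return false;		/* every matching rule must allow */
	}
	return true;	/* assumption: no matching rule means allow */
}
```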
obrien [Sat, 21 Aug 2004 19:44:43 +0000 (19:44 +0000)]
Hit people over the head so they realize run-time errors of the form
/libexec/ld-elf.so.1: Undefined symbol "_ZNSs20_S_empty_rep_storageE"
do mean they are hitting the GCC 3.4 ABI change issue.
alc [Sat, 21 Aug 2004 19:20:21 +0000 (19:20 +0000)]
Further reduce the use of Giant by vm_fault(): Giant is held only when
manipulating a vnode, e.g., calling vput(). This reduces contention for
Giant during many copy-on-write faults, resulting in some additional
speedup on SMPs.
Note: debug_mpsafevm must be enabled for this optimization to take effect.
pjd [Sat, 21 Aug 2004 18:11:46 +0000 (18:11 +0000)]
Implement new reading algorithm, which will use parity component for reading
as well, even if device is in complete state.
I observe a 40% speed-up with this option for random read operations,
but slowdown for sequential reads.
Basically, without this option reading from a RAID3 device built from 5
components (c0-c4) looks like this:
Request no.   Used components
1             c0+c1+c2+c3
2             c0+c1+c2+c3
3             c0+c1+c2+c3
csjp [Sat, 21 Aug 2004 17:38:57 +0000 (17:38 +0000)]
When a prison is given the ability to create raw sockets (when the
security.jail.allow_raw_sockets sysctl MIB is set to 1) where privileged
access to jails is given out, it is possible for prison root to manipulate
various network parameters which affect the host environment. This commit
plugs a number of security holes associated with the use of raw sockets
and prisons.
This commit makes the following changes:
- Add a comment to rtioctl warning developers that if they add
any ioctl commands, they should use super-user checks where necessary,
as it is possible for PRISON root to make it this far in execution.
- Add super-user checks for the execution of the SIOCGETVIFCNT
and SIOCGETSGCNT IP multicast ioctl commands.
- Add a super-user check to rip_ctloutput(). If the calling cred
is PRISON root, make sure the socket option name is IP_HDRINCL,
otherwise deny the request.
Although this patch corrects a number of security problems associated
with raw sockets and prisons, the warning in jail(8) should still
apply, and we should keep the default value of the
security.jail.allow_raw_sockets MIB at 0 (or disabled) until
we are certain that we have tracked down all the problems.
Looking forward, we will probably want to eliminate the
references to curthread.
rwatson [Sat, 21 Aug 2004 17:38:48 +0000 (17:38 +0000)]
When notifying protocol components of an event on an in6pcb, use the
result of the notify() function to decide if we need to unlock the
in6pcb or not, rather than always unlocking. Otherwise, we may unlock
an already unlocked in6pcb.
Reported by: kuriyama, Gordon Bergling <gbergling at 0xfce3.net>
Tested by: kuriyama, Gordon Bergling <gbergling at 0xfce3.net>
Discussed with: mdodd
rwatson [Sat, 21 Aug 2004 16:14:04 +0000 (16:14 +0000)]
When prepending space onto outgoing UDP datagram payloads to hold the
UDP/IP header, make sure that space is also allocated for the link
layer header. If an mbuf must be allocated to hold the UDP/IP header
(very likely), then this will avoid an additional mbuf allocation at
the link layer. This trick is also used by TCP and other protocols to
avoid extra calls to the mbuf allocator in the ethernet (and related)
output routines.
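The saving can be illustrated with a toy model that only counts allocations. struct sketch_hdrbuf and sketch_prepend() are hypothetical; the real code works on mbufs and the kernel's notion of the maximum link header size, neither of which is modelled here.

```c
#include <stddef.h>

/* Hypothetical packet front: lead is the free space before the data. */
struct sketch_hdrbuf {
	size_t lead;		/* free bytes in front of the data */
	size_t len;		/* bytes of data */
	unsigned allocs;	/* header buffers allocated so far */
};

/*
 * Prepend n bytes. If there is no room, "allocate" a new leading buffer
 * that also reserves extra spare bytes for headers added later.
 */
void
sketch_prepend(struct sketch_hdrbuf *b, size_t n, size_t extra)
{
	if (b->lead < n) {
		b->allocs++;
		b->lead = n + extra;
	}
	b->lead -= n;
	b->len += n;
}
```

For example, prepending a 28-byte UDP/IP header with extra set to 16 leaves room for a 14-byte Ethernet header, so the second prepend needs no further allocation; with extra set to 0 it needs one.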
tjr [Sat, 21 Aug 2004 07:37:08 +0000 (07:37 +0000)]
Re-word compatibility section, taking care to use the word "obsolete" to
describe the 4.4BSD extension of accepting characters (runes) outside of
the range of unsigned char.
tjr [Sat, 21 Aug 2004 07:00:40 +0000 (07:00 +0000)]
Let GCC know that ___runetype(), ___tolower() and ___toupper() are pure
functions, allowing it to generate better code for the <ctype.h> and
<wctype.h> functions. For example, it can now keep _CurrentRuneLocale
in a register across calls to these functions, and can delete calls to
___runetype() if the result is already known or not used.
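For readers unfamiliar with the attribute, a small sketch of marking a lookup function pure follows. The names sketch_table, sketch_runetype() and sketch_count_class() are hypothetical and the table is not the real rune locale; only the GCC pure attribute itself is taken as given.

```c
#include <stddef.h>

#if defined(__GNUC__)
#define SKETCH_PURE __attribute__((__pure__))
#else
#define SKETCH_PURE
#endif

/* Hypothetical class table standing in for the current rune locale. */
static unsigned long sketch_table[256];

/*
 * Pure: the result depends only on the argument and on readable memory,
 * and the call has no side effects, so the compiler may merge or drop calls.
 */
SKETCH_PURE unsigned long sketch_runetype(int c);

unsigned long
sketch_runetype(int c)
{
	return (c < 0 || c >= 256) ? 0UL : sketch_table[c];
}

/* Count the characters of s whose class bits intersect mask. */
int
sketch_count_class(const char *s, unsigned long mask)
{
	int n = 0;

	for (; *s != '\0'; s++) {
		if (sketch_runetype((unsigned char)*s) & mask)
			n++;
	}
	return n;
}
```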
anholt [Sat, 21 Aug 2004 06:24:21 +0000 (06:24 +0000)]
Fix aperture size detection on some ALi chipsets by only using the lowest 4 bits
to check aperture size, avoiding hangs. Maintain the rest of the bits when
setting/unsetting ATTBASE. This essentially matches Linux's AGP driver as well.
PR: kern/70037
Submitted by: Mark Tinguely <tinguely at casselton dot net>
Obtained from: NetBSD
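The two bit manipulations described in the commit above can be sketched as follows. The helper names are hypothetical, and treating the low 12 bits of the register as the non-address bits to preserve is an assumption made for illustration; only the "lowest 4 bits" for the aperture size comes from the commit message.

```c
#include <stdint.h>

#define SKETCH_APSIZE_MASK	0x0000000fu	/* lowest 4 bits, per the commit */
#define SKETCH_LOWBITS_MASK	0x00000fffu	/* assumed non-address bits */

/* Extract only the aperture size field, ignoring all other bits. */
uint32_t
sketch_apsize_bits(uint32_t attbase)
{
	return attbase & SKETCH_APSIZE_MASK;
}

/* Install a new table base while keeping the register's low bits intact. */
uint32_t
sketch_set_attbase(uint32_t reg, uint32_t base)
{
	return (base & ~SKETCH_LOWBITS_MASK) | (reg & SKETCH_LOWBITS_MASK);
}
```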