imp [Mon, 23 Aug 2004 03:38:21 +0000 (03:38 +0000)]
Make this compile again in the standalone and the MODULES_WITH_WORLD
environments. Chances are good that this doesn't produce a good
module, but I leave the proper defaults to the dummy opt_* files to
the author.
rwatson [Mon, 23 Aug 2004 03:00:27 +0000 (03:00 +0000)]
Remove in6_prefix.[ch] and the contained router renumbering capability.
The prefix management code currently resides in nd6, leaving only the
unused router renumbering capability in the in6_prefix files. Removing
it will make it easier for us to provide locking for the remainder of
IPv6 by reducing the number of objects requiring synchronized access.
This functionality has also been removed from NetBSD and OpenBSD.
Submitted by: George Neville-Neil <gnn at neville-neil.com>
Discussed with/approved by: suz, keiichi at kame.net, core at kame.net
kan [Mon, 23 Aug 2004 02:39:45 +0000 (02:39 +0000)]
Temporarily back out r1.74 as it seems to cause a number of regressions
accordimg to numerous reports. It might get reintroduced some time later
when an exact failure mode is understood better.
mux [Sun, 22 Aug 2004 23:01:13 +0000 (23:01 +0000)]
Pass a correct lowaddr to bus_dma_tag_create(), lnc(4) cards can only
deal with 24-bit addresses. While the two other attachments, namely
isa and cbus, do it properly, the PCI attachment was passing
BUS_SPACE_MAXADDR instead of BUS_SPACE_MAXADDR_24BIT. This bug
became apparent with the new contigmalloc() code.
This fixes the problem reported with lnc(4) interfaces inside VMWare,
and should theoritically also fix any user of a PCI lnc(4) card. It
is a RELENG_5 MFC candidate.
marcel [Sun, 22 Aug 2004 20:52:23 +0000 (20:52 +0000)]
Move the cow field between wire_count and hold_count. This is the
position that is 64-bit aligned and makes sure that the valid and
dirty fields are also 64-bit aligned. This means that if PAGE_SIZE
is 32K, the size of the vm_page structure is only increased by 8
bytes instead of 16 bytes. More importantly, the vm_page structure
is either 120 or 128 bytes on ia64. These are "interesting" sizes.
cperciva [Sun, 22 Aug 2004 19:44:24 +0000 (19:44 +0000)]
When creating a new md, wait for geom's event queue to become empty
before returning. Device nodes are created via the "taste" mechanism,
so this is necessary in order to make sure that devfs entries are
created before mdconfig(8) returns.
green [Sun, 22 Aug 2004 18:57:40 +0000 (18:57 +0000)]
The new contigmalloc code is exposing a lot of misuses of busdma memory
allocation. Notably, in this case, the driver tries to allocate several
pieces of memory and then fails if the pieces allocated after the first
do not come after it physically, and within a specific range (8MB I
believe). Of course, this could just as easily fail for any number of
reasons, but it almost always fails now that contiguous allocations start
at the end of possible specified memory locations rather than the beginning.
Allocate all the possibly-needed memory up front, even though it's a waste,
to get around this. The least bogus solution would be to take the physical
address from the first allocation and create a new tag that specified that
further allocations must follow it within that 8MB window, then use that
when allocating new channels, but that's left for anyone else that really
feels like doing it.
mlaier [Sun, 22 Aug 2004 16:42:28 +0000 (16:42 +0000)]
Allow early drop for non-ALTQ enabled queues in an ALTQ-enabled kernel.
Previously the early drop was disabled unconditionally for ALTQ-enabled
kernels.
This should give some benefit for the normal gateway + LAN-server case with
a busy LAN leg and an ALTQ managed uplink.
pjd [Sun, 22 Aug 2004 16:21:12 +0000 (16:21 +0000)]
Implementation of 'verify reading' algorithm, which uses parity data for
verification of regular data when device is in complete state.
On verification error, EIO error is returned for the bio and sysctl
kern.geom.raid3.stat.parity_mismatch is increased.
marcel [Sun, 22 Aug 2004 06:24:59 +0000 (06:24 +0000)]
Part 2 of fixing the boot code: gcc 3.4 fixes.
The whole problem seems to be size. Which is odd, because it is said
that size doesn't matter. Anyway... Add -Os to strategic places in the
makefile to have the final loader be as mall as possible. This seems
to be enough to make it work. For now... I think something is more
fundamentally wrong; or something more fundamental is wrong. Potato,
potaato.
kensmith [Sun, 22 Aug 2004 05:34:07 +0000 (05:34 +0000)]
Found another one. Why does mdconfig hate me? Add a "sleep 5" to
this script, without it sparc64 ISO building was consistently failing
because the /dev/md0 device name was not present when the commands
following mdconfig ran. Apparently there is the possibility of a delay
between when mdconfig finishes and the names become visible in /dev.
Yes, we could code this better than an unconditional call to "sleep 5"
but IMHO we should fix the underlying problem instead.
csjp [Sun, 22 Aug 2004 02:03:41 +0000 (02:03 +0000)]
Currently, if the secure level is low enough, system flags can
be manipulated by prison root. In 4.x prison root can not manipulate
system flags, regardless of the security level. This behavior
should remain consistent to avoid any surprises which could lead
to security problems for system administrators which give out
privileged access to jails.
This commit changes suser_cred's flag argument from SUSER_ALLOWJAIL
to 0. This will prevent prison root from being able to manipulate
system flags on files.
rwatson [Sun, 22 Aug 2004 01:32:48 +0000 (01:32 +0000)]
When sliding the m_data pointer forward, update m_pktrhdr.len as well
as m_len, or the pkthdr length will be inconsistent with the actual
length of data in the mbuf chain. The symptom of this occuring was
"out of data" warnings from in_cksum_skip() on large UDP packets sent
via the loopback interface.
marcel [Sun, 22 Aug 2004 00:26:01 +0000 (00:26 +0000)]
Part 1 of fixing the boot code: binutils 2.15 fixes.
The binutils 2.15 assembler now automaticly and non-optionally adds
the .eh_frame section for unwind information. This section appears
to wreck havoc to the final boot code. Fix this by using a special
linker script that discards the .eh_frame sections, but is otherwise
identical to the linker internal script used for -N.
alc [Sun, 22 Aug 2004 00:08:43 +0000 (00:08 +0000)]
In the previous revision, I failed to condition an early release of Giant
in vm_fault() on debug_mpsafevm. If debug_mpsafevm was not set, the result
was an assertion failure early in the boot process.
rwatson [Sat, 21 Aug 2004 21:45:40 +0000 (21:45 +0000)]
If a tunable for the routing socket netisr queue max is defined, allow it
to override the default value, rather than the default value overriding
the tunable.
rwatson [Sat, 21 Aug 2004 21:20:06 +0000 (21:20 +0000)]
Allow the size of the routing socket netisr queue to be configured using
the tunable or sysctl 'net.route.netisr_maxqlen'. Default the maximum
depth to 256 rather than IFQ_MAXLEN due to the downsides of dropping
routing messages.
MT5 candidate.
Discussed with: mdodd, mlaier, Vincent Jardin <jardin at 6wind.com>
trhodes [Sat, 21 Aug 2004 20:19:19 +0000 (20:19 +0000)]
Allow mac_bsdextended(4) to log failed attempts to syslog's AUTHPRIV
facility. This is disabled by default but may be turned on by using
the mac_bsdextended_logging sysctl.
trhodes [Sat, 21 Aug 2004 20:15:08 +0000 (20:15 +0000)]
Give the mac_bsdextended(4) policy the ability to match and apply on a first
rule only in place of all rules match. This is similar to how ipfw(8) works.
Provide a sysctl, mac_bsdextended_firstmatch_enabled, to enable this
feature.
obrien [Sat, 21 Aug 2004 19:44:43 +0000 (19:44 +0000)]
Hit people over the head so they realize run-time errors of the form
/libexec/ld-elf.so.1: Undefined symbol "_ZNSs20_S_empty_rep_storageE"
does mean they are hitting the GCC 3.4 ABI change issue.
alc [Sat, 21 Aug 2004 19:20:21 +0000 (19:20 +0000)]
Further reduce the use of Giant by vm_fault(): Giant is held only when
manipulating a vnode, e.g., calling vput(). This reduces contention for
Giant during many copy-on-write faults, resulting in some additional
speedup on SMPs.
Note: debug_mpsafevm must be enabled for this optimization to take effect.
pjd [Sat, 21 Aug 2004 18:11:46 +0000 (18:11 +0000)]
Implement new reading algorithm, which will use parity component for reading
as well, even if device is in complete state.
I observe 40% of speed-up with this option for random read operations,
but slowdown for sequential reads.
Basically, without this option reading from a RAID3 device built from 5
components (c0-c4) looks like this:
Request no. Used components
1 c0+c1+c2+c3
2 c0+c1+c2+c3
3 c0+c1+c2+c3
csjp [Sat, 21 Aug 2004 17:38:57 +0000 (17:38 +0000)]
When a prison is given the ability to create raw sockets (when the
security.jail.allow_raw_sockets sysctl MIB is set to 1) where privileged
access to jails is given out, it is possible for prison root to manipulate
various network parameters which effect the host environment. This commit
plugs a number of security holes associated with the use of raw sockets
and prisons.
This commit makes the following changes:
- Add a comment to rtioctl warning developers that if they add
any ioctl commands, they should use super-user checks where necessary,
as it is possible for PRISON root to make it this far in execution.
- Add super-user checks for the execution of the SIOCGETVIFCNT
and SIOCGETSGCNT IP multicast ioctl commands.
- Add a super-user check to rip_ctloutput(). If the calling cred
is PRISON root, make sure the socket option name is IP_HDRINCL,
otherwise deny the request.
Although this patch corrects a number of security problems associated
with raw sockets and prisons, the warning in jail(8) should still
apply, and by default we should keep the default value of
security.jail.allow_raw_sockets MIB to 0 (or disabled) until
we are certain that we have tracked down all the problems.
Looking forward, we will probably want to eliminate the
references to curthread.
rwatson [Sat, 21 Aug 2004 17:38:48 +0000 (17:38 +0000)]
When notifying protocol components of an event on an in6pcb, use the
result of the notify() function to decide if we need to unlock the
in6pcb or not, rather than always unlocking. Otherwise, we may unlock
and already unlocked in6pcb.
Reported by: kuriyama, Gordon Bergling <gbergling at 0xfce3.net>
Tested by: kuriyama, Gordon Bergling <gbergling at 0xfce3.net>
Discussed with: mdodd
rwatson [Sat, 21 Aug 2004 16:14:04 +0000 (16:14 +0000)]
When prepending space onto outgoing UDP datagram payloads to hold the
UDP/IP header, make sure that space is also allocated for the link
layer header. If an mbuf must be allocated to hold the UDP/IP header
(very likely), then this will avoid an additional mbuf allocation at
the link layer. This trick is also used by TCP and other protocols to
avoid extra calls to the mbuf allocator in the ethernet (and related)
output routines.
tjr [Sat, 21 Aug 2004 07:37:08 +0000 (07:37 +0000)]
Re-word compatibility section, taking care to use the word "obsolete" to
describe the 4.4BSD extension of accepting characters (runes) outside of
the range of unsigned char.
tjr [Sat, 21 Aug 2004 07:00:40 +0000 (07:00 +0000)]
Let GCC know that ___runetype(), ___tolower() and ___toupper() are pure
functions, allowing it to generate better code for the <ctype.h> and
<wctype.h> functions. For example, it can now keep _CurrentRuneLocale
in a register across calls to these functions, and can delete calls to
___runetype() if the result is already known or not used.
anholt [Sat, 21 Aug 2004 06:24:21 +0000 (06:24 +0000)]
Fix aperture size detection on some ALi chipsets by only using the lowest 4 bits
to check aperture size, avoiding hangs. Maintain the rest of the bits when
setting/unsetting ATTBASE. This essentially matches Linux's AGP driver as well.
PR: kern/70037
Submitted by: Mark Tinguely <tinguely at casselton dot net>
Obtained from: NetBSD
truckman [Fri, 20 Aug 2004 21:47:48 +0000 (21:47 +0000)]
Don't bother calling the module event handlers from module_shutdown()
in the shutdown_final state if the RB_NOSYNC flag is set.
The specific motivation in this case is that a system panic in an
interrupt context results in a call to module_shutdown(), which
calls g_modevent(), which calls g_malloc(..., M_WAITOK), which
results in a second panic. While g_modevent() could be fixed to
not call malloc() for MOD_SHUTDOWN events (which it doesn't handle
in any case), it is probably also a good idea to entirely skip the
execution of the module shutdown handlers after a panic.
truckman [Fri, 20 Aug 2004 19:21:47 +0000 (19:21 +0000)]
Don't attempt to trigger the syncer thread final sync code in the
shutdown_pre_sync state if the RB_NOSYNC flag is set. This is the
likely cause of hangs after a system panic that are keeping crash
dumps from being done.
njl [Fri, 20 Aug 2004 16:52:44 +0000 (16:52 +0000)]
Correctly handle BIOS resources that are duplicated (!). There are many
systems that have overlapping regions specified in their sysresource
objects. This patch fixes ATA DMA and acpi_timer allocation for such
sysctems. It should eventually be moved to resource_list_add() if it is
a valid generalized approach. The minimal approach for 5.3 is:
"Loop through all current resources to see if the new one overlaps
any existing ones. If so, the old one always takes precedence and
the new one is adjusted (or rejected). We check for three cases:
1. Tail of new resource overlaps head of old resource: truncate the
new resource so it is contiguous with the start of the old.
2. New resource wholly contained within the old resource: error.
3. Head of new resource overlaps tail of old resource: truncate the
new resource so it is contiguous, following the old."
njl [Fri, 20 Aug 2004 16:34:30 +0000 (16:34 +0000)]
Remove a check that is too strict. With BIOSen that specify an IO/ctl port
of 0x3f2-0x3f5,0x3f7 the ports are not 7 bytes apart. This should fix
floppy probing on such systems. (We handle the case of adjusting for
a start of 0x3f2 -> 0x3f0 separately, although that code should still be
checked if there are still floppy problems for others.)
Tested by: Sarunas Vancevicius <vsarunas_at_eircom.net>
MFC after: 3 days
rwatson [Fri, 20 Aug 2004 16:24:23 +0000 (16:24 +0000)]
Back out uipc_socket.c:1.208, as it incorrectly assumes that all
sockets are connection-oriented for the purposes of kqueue
registration. Since UDP sockets aren't connection-oriented, this
appeared to break a great many things, such as RPC-based
applications and services (i.e., NFS). Since jmg isn't around I'm
backing this out before too many more feet are shot, but intend to
investigate the right solution with him once he's available.
phk [Fri, 20 Aug 2004 15:14:25 +0000 (15:14 +0000)]
Rewrite of the floppy driver to make it MPsafe & GEOM friendly:
Centralize the fdctl_wr() function by adding the offset in
the resource to the softc structure.
Bugfix: Read the drive-change signal from the correct place:
same place as the ctl register.
Remove the cdevsw{} related code and implement a GEOM class.
Ditch the state-engine and park a thread on each controller
to service the queue.
Make the interrupt FAST & MPSAFE since it is just a simple
wakeup(9) call.
Rely on a per controller mutex to protect the bioqueues.
Grab GEOMs topology lock when we have to and Giant when
ISADMA needs it. Since all access to the hardware is
isolated in the per controller thread, the rest of the
driver is lock & Giant free.
Create a per-drive queue where requests are parked while
the motor spins up. When the motor is running the requests
are purged to the per controller queue. This allows
requests to other drives to be serviced during spin-up.
Only setup the motor-off timeout when we finish the last
request on the queue and cancel it when a new request
arrives. This fixes the bug in the old code where the motor
turned off while we were still retrying a request.
Make the "drive-change" work reliably. Probe the drive on
first opens. Probe with a recal and a seek to cyl=1 to
reset the drive change line and check again to see if we
have a media.
When we see the media disappear we destroy the geom provider,
create a new one, and flag that autodetection should happen
next time we see a media (unless a specific format is configured).
Add sysctl tunables for a lot of drive related parameters.
If you spend a lot of time waiting for floppies you can
grab the i82078 pdf from Intels web-page and try tuning
these.
Add sysctl debug.fdc.debugflags which will enable various
kinds of debugging printfs.
Add central definitions of our well known floppy formats.
Simplify datastructures for autoselection of format and
call the code at the right times.
Bugfix: Remove at least one piece of code which would have
made 2.88M floppies not work.
Use implied seeks on enhanced controllers.
Use multisector transfers on all controllers. Increase
ISADMA bounce buffers accordingly.
Fall back to single sector when retrying. Reset retry count
on every successful transaction.
Sort functions in a more sensible order and generally tidy
up a fair bit here and there.
Assorted related fixes and adjustments in userland utilities.
WORKAROUNDS:
Do allow r/w opens of r/o media but refuse actual write
operations. This is necessary until the p4::phk_bufwork
branch gets integrated (This problem relates to remounting
not reopening devices, see sys/*/*/${fs}_vfsops.c for details).
Keep PC98's private copy of the old floppy driver compiling
and presumably working (see below).
TODO (planned)
Move probing of drives until after interrupts/timeouts work
(like for ATA/SCSI drives).
TODO (unplanned)
This driver should be made to work on PC98 as well.
Test on YE-DATA PCMCIA floppy drive.
Fix 2.88M media.
This is a MT5 candidate (depends on the bioq_takefirst() addition).
pjd [Fri, 20 Aug 2004 12:02:34 +0000 (12:02 +0000)]
Add the raidtest tool, which can be used for performance tests of storage devices.
It uses random offsets, random requests size and random operation type (READ or
WRITE). It also allows to run many processes to send I/O requests in parallel.