csjp [Sat, 21 Aug 2004 17:38:57 +0000 (17:38 +0000)]
When a prison is given the ability to create raw sockets (when the
security.jail.allow_raw_sockets sysctl MIB is set to 1) where privileged
access to jails is given out, it is possible for prison root to manipulate
various network parameters which effect the host environment. This commit
plugs a number of security holes associated with the use of raw sockets
and prisons.
This commit makes the following changes:
- Add a comment to rtioctl warning developers that if they add
any ioctl commands, they should use super-user checks where necessary,
as it is possible for PRISON root to make it this far in execution.
- Add super-user checks for the execution of the SIOCGETVIFCNT
and SIOCGETSGCNT IP multicast ioctl commands.
- Add a super-user check to rip_ctloutput(). If the calling cred
is PRISON root, make sure the socket option name is IP_HDRINCL,
otherwise deny the request.
Although this patch corrects a number of security problems associated
with raw sockets and prisons, the warning in jail(8) should still
apply, and by default we should keep the default value of
security.jail.allow_raw_sockets MIB to 0 (or disabled) until
we are certain that we have tracked down all the problems.
Looking forward, we will probably want to eliminate the
references to curthread.
rwatson [Sat, 21 Aug 2004 17:38:48 +0000 (17:38 +0000)]
When notifying protocol components of an event on an in6pcb, use the
result of the notify() function to decide if we need to unlock the
in6pcb or not, rather than always unlocking. Otherwise, we may unlock
and already unlocked in6pcb.
Reported by: kuriyama, Gordon Bergling <gbergling at 0xfce3.net>
Tested by: kuriyama, Gordon Bergling <gbergling at 0xfce3.net>
Discussed with: mdodd
rwatson [Sat, 21 Aug 2004 16:14:04 +0000 (16:14 +0000)]
When prepending space onto outgoing UDP datagram payloads to hold the
UDP/IP header, make sure that space is also allocated for the link
layer header. If an mbuf must be allocated to hold the UDP/IP header
(very likely), then this will avoid an additional mbuf allocation at
the link layer. This trick is also used by TCP and other protocols to
avoid extra calls to the mbuf allocator in the ethernet (and related)
output routines.
tjr [Sat, 21 Aug 2004 07:37:08 +0000 (07:37 +0000)]
Re-word compatibility section, taking care to use the word "obsolete" to
describe the 4.4BSD extension of accepting characters (runes) outside of
the range of unsigned char.
tjr [Sat, 21 Aug 2004 07:00:40 +0000 (07:00 +0000)]
Let GCC know that ___runetype(), ___tolower() and ___toupper() are pure
functions, allowing it to generate better code for the <ctype.h> and
<wctype.h> functions. For example, it can now keep _CurrentRuneLocale
in a register across calls to these functions, and can delete calls to
___runetype() if the result is already known or not used.
anholt [Sat, 21 Aug 2004 06:24:21 +0000 (06:24 +0000)]
Fix aperture size detection on some ALi chipsets by only using the lowest 4 bits
to check aperture size, avoiding hangs. Maintain the rest of the bits when
setting/unsetting ATTBASE. This essentially matches Linux's AGP driver as well.
PR: kern/70037
Submitted by: Mark Tinguely <tinguely at casselton dot net>
Obtained from: NetBSD
truckman [Fri, 20 Aug 2004 21:47:48 +0000 (21:47 +0000)]
Don't bother calling the module event handlers from module_shutdown()
in the shutdown_final state if the RB_NOSYNC flag is set.
The specific motivation in this case is that a system panic in an
interrupt context results in a call to module_shutdown(), which
calls g_modevent(), which calls g_malloc(..., M_WAITOK), which
results in a second panic. While g_modevent() could be fixed to
not call malloc() for MOD_SHUTDOWN events (which it doesn't handle
in any case), it is probably also a good idea to entirely skip the
execution of the module shutdown handlers after a panic.
truckman [Fri, 20 Aug 2004 19:21:47 +0000 (19:21 +0000)]
Don't attempt to trigger the syncer thread final sync code in the
shutdown_pre_sync state if the RB_NOSYNC flag is set. This is the
likely cause of hangs after a system panic that are keeping crash
dumps from being done.
njl [Fri, 20 Aug 2004 16:52:44 +0000 (16:52 +0000)]
Correctly handle BIOS resources that are duplicated (!). There are many
systems that have overlapping regions specified in their sysresource
objects. This patch fixes ATA DMA and acpi_timer allocation for such
sysctems. It should eventually be moved to resource_list_add() if it is
a valid generalized approach. The minimal approach for 5.3 is:
"Loop through all current resources to see if the new one overlaps
any existing ones. If so, the old one always takes precedence and
the new one is adjusted (or rejected). We check for three cases:
1. Tail of new resource overlaps head of old resource: truncate the
new resource so it is contiguous with the start of the old.
2. New resource wholly contained within the old resource: error.
3. Head of new resource overlaps tail of old resource: truncate the
new resource so it is contiguous, following the old."
njl [Fri, 20 Aug 2004 16:34:30 +0000 (16:34 +0000)]
Remove a check that is too strict. With BIOSen that specify an IO/ctl port
of 0x3f2-0x3f5,0x3f7 the ports are not 7 bytes apart. This should fix
floppy probing on such systems. (We handle the case of adjusting for
a start of 0x3f2 -> 0x3f0 separately, although that code should still be
checked if there are still floppy problems for others.)
Tested by: Sarunas Vancevicius <vsarunas_at_eircom.net>
MFC after: 3 days
rwatson [Fri, 20 Aug 2004 16:24:23 +0000 (16:24 +0000)]
Back out uipc_socket.c:1.208, as it incorrectly assumes that all
sockets are connection-oriented for the purposes of kqueue
registration. Since UDP sockets aren't connection-oriented, this
appeared to break a great many things, such as RPC-based
applications and services (i.e., NFS). Since jmg isn't around I'm
backing this out before too many more feet are shot, but intend to
investigate the right solution with him once he's available.
phk [Fri, 20 Aug 2004 15:14:25 +0000 (15:14 +0000)]
Rewrite of the floppy driver to make it MPsafe & GEOM friendly:
Centralize the fdctl_wr() function by adding the offset in
the resource to the softc structure.
Bugfix: Read the drive-change signal from the correct place:
same place as the ctl register.
Remove the cdevsw{} related code and implement a GEOM class.
Ditch the state-engine and park a thread on each controller
to service the queue.
Make the interrupt FAST & MPSAFE since it is just a simple
wakeup(9) call.
Rely on a per controller mutex to protect the bioqueues.
Grab GEOMs topology lock when we have to and Giant when
ISADMA needs it. Since all access to the hardware is
isolated in the per controller thread, the rest of the
driver is lock & Giant free.
Create a per-drive queue where requests are parked while
the motor spins up. When the motor is running the requests
are purged to the per controller queue. This allows
requests to other drives to be serviced during spin-up.
Only setup the motor-off timeout when we finish the last
request on the queue and cancel it when a new request
arrives. This fixes the bug in the old code where the motor
turned off while we were still retrying a request.
Make the "drive-change" work reliably. Probe the drive on
first opens. Probe with a recal and a seek to cyl=1 to
reset the drive change line and check again to see if we
have a media.
When we see the media disappear we destroy the geom provider,
create a new one, and flag that autodetection should happen
next time we see a media (unless a specific format is configured).
Add sysctl tunables for a lot of drive related parameters.
If you spend a lot of time waiting for floppies you can
grab the i82078 pdf from Intels web-page and try tuning
these.
Add sysctl debug.fdc.debugflags which will enable various
kinds of debugging printfs.
Add central definitions of our well known floppy formats.
Simplify datastructures for autoselection of format and
call the code at the right times.
Bugfix: Remove at least one piece of code which would have
made 2.88M floppies not work.
Use implied seeks on enhanced controllers.
Use multisector transfers on all controllers. Increase
ISADMA bounce buffers accordingly.
Fall back to single sector when retrying. Reset retry count
on every successful transaction.
Sort functions in a more sensible order and generally tidy
up a fair bit here and there.
Assorted related fixes and adjustments in userland utilities.
WORKAROUNDS:
Do allow r/w opens of r/o media but refuse actual write
operations. This is necessary until the p4::phk_bufwork
branch gets integrated (This problem relates to remounting
not reopening devices, see sys/*/*/${fs}_vfsops.c for details).
Keep PC98's private copy of the old floppy driver compiling
and presumably working (see below).
TODO (planned)
Move probing of drives until after interrupts/timeouts work
(like for ATA/SCSI drives).
TODO (unplanned)
This driver should be made to work on PC98 as well.
Test on YE-DATA PCMCIA floppy drive.
Fix 2.88M media.
This is a MT5 candidate (depends on the bioq_takefirst() addition).
pjd [Fri, 20 Aug 2004 12:02:34 +0000 (12:02 +0000)]
Add the raidtest tool, which can be used for performance tests of storage devices.
It uses random offsets, random requests size and random operation type (READ or
WRITE). It also allows to run many processes to send I/O requests in parallel.
scottl [Fri, 20 Aug 2004 05:18:50 +0000 (05:18 +0000)]
In maybe_preempt(), ignore threads that are in an inconsistent state. This
is an effective band-aid for at least some of the scheduler corruption seen
recently. The real fix will involve protecting threads while they are
inconsistent, and will come later.
julian [Fri, 20 Aug 2004 01:24:23 +0000 (01:24 +0000)]
Align netgraph message fields ready for 64-bit (and 128 bit :-) machines.
requires a recompile of netgraph users.
Also change the size of a field in the bluetooth code
that was waiting for the next change that needed recompiles so
it could piggyback its way in.
davidxu [Thu, 19 Aug 2004 23:49:04 +0000 (23:49 +0000)]
Adjust code to support AMD64, on AMD64, thread needs to set fsbase by
itself before it can execute any other code, so new thread should be
created with all signals are masked until after fsbase is set.
andre [Thu, 19 Aug 2004 23:31:40 +0000 (23:31 +0000)]
When unloading ipfw module use callout_drain() to make absolutely sure that
all callouts are stopped and finished. Move it before IPFW_LOCK() to avoid
deadlocking when draining callouts.
simon [Thu, 19 Aug 2004 22:03:20 +0000 (22:03 +0000)]
- Remove note about device listings going away.
- Add a note about possibility for duplicate listing of devices.
- Auto generate device listings for the following drivers: ncr, sym,
umodem, and uscanner
kensmith [Thu, 19 Aug 2004 20:13:31 +0000 (20:13 +0000)]
Temporary bandaid to help sparc64 systems with ATA disks boot. Recent
changes to the ATA driver cause a kernel crash, no fault of the ATA
code. Work is in progress to add the necessary feature to the sparc64
kernel and this commit will be backed out when it is complete. This
bandaid is being put in mostly in the interests of getting the first
release snapshot done and out the door.
Tested on: Ultra-10 exhibiting the insta-panic.
MFC: Real Soon
njl [Thu, 19 Aug 2004 18:48:17 +0000 (18:48 +0000)]
Disable interrupts after using pmap_enter() to add the identity mapping.
Since pmap_enter() calls pmap_invalidate_page(), which needs interrupts
enabled in the SMP case, we defer the disable to right before saving the
register context. This has been incorrect for about a year but caused no
real problems because the identity page never actually replaces a previously
mapped page and suspend/resume on SMP systems has been uncommon.
andre [Thu, 19 Aug 2004 17:59:26 +0000 (17:59 +0000)]
Do not unconditionally ignore IPDIVERT and IPFIREWALL_FORWARD when building
the ipfw KLD.
For IPFIREWALL_FORWARD this does not have any side effects. If the module
has it but not the kernel it just doesn't do anything.
For IPDIVERT the KLD will be unloadable if the kernel doesn't have IPDIVERT
compiled in too. However this is the least disturbing behaviour. The user
can just recompile either module or the kernel to match the other one. The
access to the machine is not denied if ipfw refuses to load.
andre [Thu, 19 Aug 2004 17:38:47 +0000 (17:38 +0000)]
Bring back the sysctl 'net.inet.ip.fw.enable' to unbreak the startup scripts
and to be able to disable ipfw if it was compiled directly into the kernel.
jhb [Thu, 19 Aug 2004 12:46:02 +0000 (12:46 +0000)]
Catch up to recent API changes including the removal of the signal_caught
argument to sleepq_timedwait() and the enhancements to the flags argument
passed to sleepq_add().
jhb [Thu, 19 Aug 2004 11:31:42 +0000 (11:31 +0000)]
Now that the return value semantics of cv's for multithreaded processes
have been unified with that of msleep(9), further refine the sleepq
interface and consolidate some duplicated code:
- Move the pre-sleep checks for theaded processes into a
thread_sleep_check() function in kern_thread.c.
- Move all handling of TDF_SINTR to be internal to subr_sleepqueue.c.
Specifically, if a thread is awakened by something other than a signal
while checking for signals before going to sleep, clear TDF_SINTR in
sleepq_catch_signals(). This removes a sched_lock lock/unlock combo in
that edge case during an interruptible sleep. Also, fix
sleepq_check_signals() to properly handle the condition if TDF_SINTR is
clear rather than requiring the callers of the sleepq API to notice
this edge case and call a non-_sig variant of sleepq_wait().
- Clarify the flags arguments to sleepq_add(), sleepq_signal() and
sleepq_broadcast() by creating an explicit submask for sleepq types.
Also, add an explicit SLEEPQ_MSLEEP type rather than a magic number of
0. Also, add a SLEEPQ_INTERRUPTIBLE flag for use with sleepq_add() and
move the setting of TDF_SINTR to sleepq_add() if this flag is set rather
than sleepq_catch_signals(). Note that it is the caller's responsibility
to ensure that sleepq_catch_signals() is called if and only if this flag
is passed to the preceeding sleepq_add(). Note that this also removes a
sched_lock lock/unlock pair from sleepq_catch_signals(). It also ensures
that for an interruptible sleep, TDF_SINTR is always set when
TD_ON_SLEEPQ() is true.
jhb [Thu, 19 Aug 2004 11:09:13 +0000 (11:09 +0000)]
Generalize the UFS bad magic value used to determine when a filesystem
has only been partly initialized via newfs(8) so that it applies to both
UFS1 and UFS2.
Submitted by: "Xin LI" delphij at frontfree dot net
MFC: maybe?
jmg [Thu, 19 Aug 2004 06:38:26 +0000 (06:38 +0000)]
add options MPROF_BUFFERS and MPROF_HASH_SIZE that adjust the sizes of
the mutex profiling buffers. Document them in the man page and in NOTES.
Ensure _HASH_SIZE is larger than _BUFFERS with a cpp error.