Robert Watson [Tue, 22 Oct 2002 14:31:34 +0000 (14:31 +0000)]
Adapt MAC policies for the new user API changes; teach policies how
to parse their own label elements (some cleanup to occur here in the
future to use the newly added kernel strsep()). Policies now
entirely encapsulate their notion of label in the policy module.
John Baldwin [Tue, 22 Oct 2002 14:31:32 +0000 (14:31 +0000)]
- Check that a process isn't a new process (p_state == PRS_NEW) before
trying to acquire its proc lock since the proc lock may not have been
constructed yet.
- Split up the one big comment at the top of the loop and put the pieces
in the right order above the various checks.
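The ordering in the first item can be illustrated with a user-space sketch. It uses hypothetical names (struct proc_sketch, PRS_NEW_SKETCH, proc_lock_sketch()) in place of the kernel's struct proc, PRS_NEW and the real proc lock. The "lock" here is only a flag and a counter, so the sketch shows the check-before-lock ordering, not real locking.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's process states. */
enum proc_state_sketch { PRS_NEW_SKETCH, PRS_NORMAL_SKETCH, PRS_ZOMBIE_SKETCH };

struct proc_sketch {
    enum proc_state_sketch p_state;
    bool lock_constructed;  /* set once the (simulated) proc lock is initialized */
    int  lock_depth;        /* stand-in for actually holding the lock */
};

/* Simulated lock: refuses (and reports) an attempt on an unconstructed lock,
 * which in the kernel would touch uninitialized memory. */
static bool proc_lock_sketch(struct proc_sketch *p)
{
    if (!p->lock_constructed)
        return false;
    p->lock_depth++;
    return true;
}

static void proc_unlock_sketch(struct proc_sketch *p)
{
    p->lock_depth--;
}

/* Visit every process, skipping PRS_NEW ones *before* touching their lock.
 * Returns the number visited; *bad_locks counts unsafe lock attempts. */
static size_t scan_procs_sketch(struct proc_sketch *procs, size_t n, size_t *bad_locks)
{
    size_t visited = 0;

    *bad_locks = 0;
    for (size_t i = 0; i < n; i++) {
        struct proc_sketch *p = &procs[i];

        /* A new process may not have its lock constructed yet. */
        if (p->p_state == PRS_NEW_SKETCH)
            continue;
        if (!proc_lock_sketch(p)) {
            (*bad_locks)++;
            continue;
        }
        visited++;
        proc_unlock_sketch(p);
    }
    return visited;
}
```

Without the PRS_NEW test, the first entry in a list such as {PRS_NEW, unconstructed} would be counted in *bad_locks instead of being skipped.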
Robert Watson [Tue, 22 Oct 2002 14:29:47 +0000 (14:29 +0000)]
Support the new MAC user API in kernel: modify existing system calls
to use a modified notion of 'struct mac', and flesh out the new variant
system calls (almost identical to existing ones except that they permit
a pid to be specified for process label retrieval, and don't follow
symlinks). This generalizes the label API so that the framework is
now almost entirely policy-agnostic.
Robert Watson [Tue, 22 Oct 2002 14:27:44 +0000 (14:27 +0000)]
Revised APIs for user process label management; the existing APIs relied
on all label parsing occurring in userland, and knowledge of the loaded
policies in the user libraries. This revision of the API pushes that
parsing into the kernel, avoiding the need for shared library support
of policies in userland, permitting statically linked binaries (such
as ls, ps, and ifconfig) to use MAC labels. In these API revisions,
high level parsing of the MAC label is done in the MAC Framework,
and interpretation of label elements is delegated to the MAC policy
modules. This permits modules to export zero or more label elements
to user space if desired, and support them in the manner they want
and with the semantics they want. This is believed to be the final
revision of this interface: from the perspective of user applications,
the API has actually not changed, although the ABI has.
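The split described above works like this: the framework parses the outer label syntax and each policy interprets only its own element. A minimal user-space sketch follows. It assumes a "policy/value,policy/value" text form (for example "biba/high,mls/low(low-high)"), and it uses hypothetical names (struct policy_sketch, framework_parse_label_sketch()) and error choices (EINVAL, ENOENT) that are not the framework's actual code.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical loaded-policy entry. */
struct policy_sketch {
    const char *name;
    char last_value[32];   /* what the policy was last handed, for illustration */
};

/* Policy side: only the module knows what its element text means. This
 * stand-in just records the text it was given. */
static int policy_parse_element_sketch(struct policy_sketch *pol, const char *val, size_t len)
{
    if (len >= sizeof(pol->last_value))
        return EINVAL;
    memcpy(pol->last_value, val, len);
    pol->last_value[len] = '\0';
    return 0;
}

/* Framework side: split on ',' and '/', find the policy by name, and pass it
 * the element value uninterpreted. */
static int framework_parse_label_sketch(const char *text, struct policy_sketch *pols, size_t npols)
{
    const char *p = text;

    while (*p != '\0') {
        const char *end = strchr(p, ',');
        size_t elen = end != NULL ? (size_t)(end - p) : strlen(p);
        const char *slash = memchr(p, '/', elen);
        struct policy_sketch *match = NULL;
        size_t nlen;
        int error;

        if (slash == NULL)
            return EINVAL;                 /* element without a policy name */
        nlen = (size_t)(slash - p);
        for (size_t i = 0; i < npols; i++)
            if (strlen(pols[i].name) == nlen && strncmp(pols[i].name, p, nlen) == 0)
                match = &pols[i];
        if (match == NULL)
            return ENOENT;                 /* no loaded policy by that name */
        error = policy_parse_element_sketch(match, slash + 1, elen - nlen - 1);
        if (error != 0)
            return error;
        p = end != NULL ? end + 1 : p + elen;
    }
    return 0;
}
```

The framework never looks inside "high" or "low(low-high)"; that is what keeps the framework code policy-agnostic.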
Robert Watson [Tue, 22 Oct 2002 14:22:24 +0000 (14:22 +0000)]
Flesh out prototypes for __mac_get_pid, __mac_get_link, and
__mac_set_link, based on __mac_get_proc() except with a pid,
and __mac_get_file(), __mac_set_file() except that they do
not follow symlinks. First in a series of commits to flesh
out the user API.
Constify some things.
Staticize some things.
Remove some unused things.
Prototype some things.
Don't install a gazillion man-pages links.
Drop support for ON-TRACK disk-manager.
Kirk McKusick [Tue, 22 Oct 2002 01:23:00 +0000 (01:23 +0000)]
This update further fine tunes the locking of snapshot vnodes in
the ffs_copyonwrite routine to avoid a deadlock between the syncer
daemon trying to sync out a snapshot vnode and the bufdaemon
trying to write out a buffer containing the snapshot inode.
With any luck this will be the last snapshot race condition.
Kirk McKusick [Tue, 22 Oct 2002 01:14:25 +0000 (01:14 +0000)]
This update is a performance improvement when allocating blocks on
a full filesystem. Previously, if the allocation failed, we had to
fsync the file before rolling back any partial allocation of indirect
blocks. Most block allocation requests only need to allocate a single
data block and if that allocation fails, there is nothing to unroll.
So, before doing the fsync, we check to see if any rollback will
really be necessary. If none is necessary, then we simply return.
This update eliminates the flurry of disk activity that got triggered
whenever a filesystem would run out of space.
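The shortcut described above amounts to testing for partial state before paying for the fsync. A minimal sketch follows. The names struct alloc_state_sketch, fsync_sketch() and rollback_sketch() are hypothetical stand-ins, and the sketch assumes the only state needing rollback is a count of indirect blocks allocated before the failure.

```c
#include <errno.h>   /* ENOSPC, used by callers */
#include <stddef.h>

/* Hypothetical record of what a failed allocation had already done. */
struct alloc_state_sketch {
    size_t n_indirect_allocated;   /* indirect blocks allocated before failure */
};

static int fsync_calls_sketch;     /* counts runs of the expensive fsync stand-in */

static void fsync_sketch(void)
{
    fsync_calls_sketch++;
}

static void rollback_sketch(struct alloc_state_sketch *st)
{
    st->n_indirect_allocated = 0;
}

/* Called after a block allocation fails with `err`. */
static int handle_alloc_failure_sketch(struct alloc_state_sketch *st, int err)
{
    if (st->n_indirect_allocated == 0)
        return err;        /* nothing to unroll: skip the fsync entirely */
    fsync_sketch();        /* flush dependencies before unwinding */
    rollback_sketch(st);
    return err;
}
```

The common single-data-block failure then returns immediately, which is what removes the burst of disk activity.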
Kirk McKusick [Tue, 22 Oct 2002 01:06:44 +0000 (01:06 +0000)]
This update removes a race between unmount and lookup. The lookup
locks the mount point directory while waiting for vfs_busy to clear.
Meanwhile the unmount, which holds the vfs_busy lock, tries to lock
the mount point vnode. The fix is to observe that it is safe for the
unmount to remove the vnode from the mount point without locking it.
The lookup will wait for the unmount to complete, then recheck the
mount point when the vfs_busy lock clears.
Kirk McKusick [Tue, 22 Oct 2002 00:59:49 +0000 (00:59 +0000)]
This checkin reimplements the io-request priority hack in a way
that works in the new threaded kernel. It was commented out of
the disksort routine earlier this year for the reasons given in
kern/subr_disklabel.c (which is where this code used to reside
before it moved to kern/subr_disk.c):
----------------------------
revision 1.65
date: 2002/04/22 06:53:20; author: phk; state: Exp; lines: +5 -0
Comment out Kirk's io-request priority hack until we can do this in a
civilized way which doesn't cause grief.
The problem is that it is not generally safe to cast a "struct bio
*" to a "struct buf *". Things like ccd, vinum, ata-raid and GEOM
construct bios which are not entrails of a struct buf.
Also, curthread may or may not have anything to do with the I/O request
at hand.
One solution would be to tag struct bios with a priority derived
from the requesting thread's nice value and have disksort act on this
field, but this wouldn't address the "silly-seek syndrome" where two
equal processes bang the disk heads from one edge of the disk to the
other repeatedly.
Alternatively, and probably better: a sleep should be introduced
either at the time the I/O is requested or at the time it is completed
where we can be sure to sleep in the right thread.
The sleep also needs to be in constant time units; 1/hz can be
practically any sub-second size, and at high HZ the current code
practically doesn't do anything.
----------------------------
As suggested in this comment, it is no longer located in the disk sort
routine, but rather now resides in spec_strategy where the disk operations
are being queued by the thread that is associated with the process that
is really requesting the I/O. At that point, the disk queues are not
visible, so the I/O for positively niced processes is always slowed
down whether or not there is other activity on the disk.
On the issue of scaling HZ, I believe that the current scheme is
better than using a fixed quantum of time. As machines and I/O
subsystems get faster, the resolution on the clock also rises.
So, ten years from now we will be slowing things down for shorter
periods of time, but the proportional effect on the system will
be about the same as it is today. So, I view this as a feature
rather than a drawback. Hence this patch sticks with using HZ.
Sponsored by: DARPA & NAI Labs.
Reviewed by: Poul-Henning Kamp <phk@critter.freebsd.dk>
Semen Ustimenko [Tue, 22 Oct 2002 00:57:51 +0000 (00:57 +0000)]
Remove the OpenBSD compatibility stuff. Many changes to be more style(9)
compliant. Split two pieces of code into separate functions so as not to
exceed the line length because of indentation.
Robert Watson [Mon, 21 Oct 2002 23:51:18 +0000 (23:51 +0000)]
Add mac(9), a man page providing a basic introduction to the concepts
associated with the TrustedBSD MAC Framework, as well as some credits
to developers and contributors.
Julian Elischer [Mon, 21 Oct 2002 22:27:36 +0000 (22:27 +0000)]
Remove the process state PRS_WAIT.
It is never used. I left it there from pre-KSE days as I didn't know
if I'd need it or not, but now I know I don't. Its functionality
is in TDI_IWAIT in the thread.
Robert Watson [Mon, 21 Oct 2002 20:55:39 +0000 (20:55 +0000)]
Introduce mac_biba_copy() and mac_mls_copy(), which conditionally
copy elements of one Biba or MLS label to another based on the flags
on the source label element. Use this instead of
mac_{biba,mls}_{single,range}() to simplify the existing code, as
well as support partial label updates (we don't update if none is
requested).
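A flag-driven copy lets a partial label update leave untouched parts alone. The sketch below illustrates this, assuming a simplified Biba-like label with one "single" element and a low/high range. All names (BIBA_FLAG_SINGLE_SKETCH, struct biba_label_sketch, biba_copy_sketch()) are hypothetical, and the real label carries more fields.

```c
#include <stdint.h>

#define BIBA_FLAG_SINGLE_SKETCH 0x1u   /* hypothetical: "single" element present */
#define BIBA_FLAG_RANGE_SKETCH  0x2u   /* hypothetical: range elements present */

struct biba_element_sketch {
    uint16_t type;
    uint16_t grade;
};

struct biba_label_sketch {
    uint32_t flags;
    struct biba_element_sketch single;
    struct biba_element_sketch rangelow, rangehigh;
};

/* Copy only the parts of `src` whose flags are set. Parts of `dst` that the
 * source does not carry keep their old values, which permits partial updates. */
static void biba_copy_sketch(const struct biba_label_sketch *src, struct biba_label_sketch *dst)
{
    if (src->flags & BIBA_FLAG_SINGLE_SKETCH) {
        dst->single = src->single;
        dst->flags |= BIBA_FLAG_SINGLE_SKETCH;
    }
    if (src->flags & BIBA_FLAG_RANGE_SKETCH) {
        dst->rangelow = src->rangelow;
        dst->rangehigh = src->rangehigh;
        dst->flags |= BIBA_FLAG_RANGE_SKETCH;
    }
}
```

A request that only sets the single element therefore cannot clobber an existing range.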
Ian Dowse [Mon, 21 Oct 2002 20:40:02 +0000 (20:40 +0000)]
Implement a new IP_SENDSRCADDR ancillary message type that permits
a server process bound to a wildcard UDP socket to select the IP
address from which outgoing packets are sent on a per-datagram
basis. When combined with IP_RECVDSTADDR, such a server process can
guarantee to reply to an incoming request using the same source IP
address as the destination IP address of the request, without having
to open one socket per server IP address.
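From user space, the new message is attached to sendmsg() as ancillary data. The sketch below builds the control buffer. It assumes the IP_SENDSRCADDR payload is a struct in_addr, as on FreeBSD. The union and helper names are hypothetical. The fallback #define exists only so the sketch compiles on systems that lack the option (Linux, for instance, uses IP_PKTINFO for this purpose instead), and it is not a real option value.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <string.h>

#ifndef IP_SENDSRCADDR
/* Placeholder ONLY so the sketch compiles where the option does not exist;
 * not a real option value. On FreeBSD <netinet/in.h> provides it. */
#define IP_SENDSRCADDR (-1)
#endif

/* Control buffer sized and aligned for one struct in_addr payload. */
union srcaddr_cbuf_sketch {
    struct cmsghdr hdr;
    unsigned char buf[CMSG_SPACE(sizeof(struct in_addr))];
};

/* Attach an IP_SENDSRCADDR message asking that this one datagram be sent
 * from `src`. The caller still fills in msg_name/msg_iov. */
static void build_srcaddr_cmsg_sketch(struct msghdr *msg, union srcaddr_cbuf_sketch *cbuf,
    struct in_addr src)
{
    struct cmsghdr *cmsg;

    memset(cbuf, 0, sizeof(*cbuf));
    msg->msg_control = cbuf->buf;
    msg->msg_controllen = sizeof(cbuf->buf);
    cmsg = CMSG_FIRSTHDR(msg);
    cmsg->cmsg_level = IPPROTO_IP;
    cmsg->cmsg_type = IP_SENDSRCADDR;
    cmsg->cmsg_len = CMSG_LEN(sizeof(struct in_addr));
    memcpy(CMSG_DATA(cmsg), &src, sizeof(src));
}
```

A server bound to INADDR_ANY would typically take `src` from the IP_RECVDSTADDR message received with the request, then call sendmsg(fd, &msg, 0) on the same wildcard socket.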
Ian Dowse [Mon, 21 Oct 2002 20:10:05 +0000 (20:10 +0000)]
Remove the "temporary connection" hack in udp_output(). In order
to send datagrams from an unconnected socket, we used to first block
input, then connect the socket to the sendmsg/sendto destination,
send the datagram, and finally disconnect the socket and unblock
input.
We now use in_pcbconnect_setup() to check if a connect() would have
succeeded, but we never record the connection in the PCB (local
anonymous port allocation is still recorded, though). The result
from in_pcbconnect_setup() authorises the sending of the datagram
and selects the local address and port to use, so we just construct
the header and call ip_output().
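The setup/commit split described above can be sketched as follows. A "setup" routine computes what connect() would use without recording the foreign endpoint. connect() then commits that result, while an unconnected send only consumes it. Everything here is hypothetical: struct pcb_sketch, the function names, the placeholder source address 10.0.0.1 and the port 49152 are stand-ins, not the real in_pcbconnect_setup() logic.

```c
#include <errno.h>
#include <stdint.h>

struct pcb_sketch {
    uint32_t laddr, faddr;   /* local/foreign IPv4 address; 0 = unset */
    uint16_t lport, fport;   /* local/foreign port; 0 = unset */
};

/* Hypothetical setup step: decide what connect() would use, but do not
 * record the foreign address/port in the PCB. */
static int pcbconnect_setup_sketch(struct pcb_sketch *pcb, uint32_t faddr, uint16_t fport,
    uint32_t *laddrp, uint16_t *lportp)
{
    if (faddr == 0 || fport == 0)
        return EADDRNOTAVAIL;
    /* Placeholder for route-based source address selection. */
    *laddrp = pcb->laddr != 0 ? pcb->laddr : 0x0a000001u;
    /* Anonymous local port allocation is the one side effect that sticks. */
    if (pcb->lport == 0)
        pcb->lport = 49152;
    *lportp = pcb->lport;
    return 0;
}

/* connect(): setup, then commit the result to the PCB. */
static int pcbconnect_sketch(struct pcb_sketch *pcb, uint32_t faddr, uint16_t fport)
{
    uint32_t laddr;
    uint16_t lport;
    int error = pcbconnect_setup_sketch(pcb, faddr, fport, &laddr, &lport);

    if (error != 0)
        return error;
    pcb->laddr = laddr;
    pcb->lport = lport;
    pcb->faddr = faddr;
    pcb->fport = fport;
    return 0;
}

/* sendto() on an unconnected socket: setup only. The chosen addresses go
 * into the outgoing header (here, the out-parameters), not into the PCB;
 * a real implementation would then build the header and call ip_output(). */
static int udp_send_sketch(struct pcb_sketch *pcb, uint32_t faddr, uint16_t fport,
    uint32_t *hdr_src, uint16_t *hdr_sport)
{
    return pcbconnect_setup_sketch(pcb, faddr, fport, hdr_src, hdr_sport);
}
```

Because the send path never writes faddr/fport, there is no connection to undo and no window during which input must be blocked.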
GEOM does not (and shall not) propagate flags like D_MEMDISK, so we
revert to checking the device name to determine whether our root device
is a ramdisk (md(4) specifically), and thus whether we should attempt
to mount root read-write.
Robert Watson [Mon, 21 Oct 2002 18:42:01 +0000 (18:42 +0000)]
Add compartment support to Biba and MLS policies. The logic of the
policies remains the same: subjects and objects are labeled for
integrity or sensitivity, and a dominance operator determines whether
or not subject/object accesses are permitted to limit inappropriate
information flow. Compartments are a non-hierarchical component to
the label, so add a bitfield to the label element for each, and a
set check as part of the dominance operator. This permits the
implementation of "need to know" elements of MLS.
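The dominance check with compartments can be sketched as follows: a hierarchical level comparison plus a set-containment test on the compartment bitfield. This is a simplification. It assumes a two-word (64-compartment) bitfield and ignores the special label types (such as low, high and equal) that the real policies handle. The names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define COMPARTMENT_WORDS_SKETCH 2   /* 64 compartments; the size is an assumption */

struct mls_element_sketch {
    uint16_t level;                                   /* hierarchical sensitivity */
    uint32_t compartments[COMPARTMENT_WORDS_SKETCH];  /* non-hierarchical set */
};

/* a dominates b iff a's level is at least b's AND a's compartment set is a
 * superset of b's ("need to know"). */
static bool mls_dominate_sketch(const struct mls_element_sketch *a,
    const struct mls_element_sketch *b)
{
    if (a->level < b->level)
        return false;
    for (int i = 0; i < COMPARTMENT_WORDS_SKETCH; i++)
        if ((a->compartments[i] & b->compartments[i]) != b->compartments[i])
            return false;
    return true;
}
```

A higher level alone is not enough. A subject at a high level with compartment C does not dominate an object at a lower level with compartment A, because the subject has no need to know A.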
Robert Watson [Mon, 21 Oct 2002 18:05:12 +0000 (18:05 +0000)]
Demote sockets to single-label objects rather than maintaining a
range on them, leaving process credentials as the only kernel
objects with label ranges in the Biba and MLS policies. We
weren't using the range in any access control decisions, so this
lets us garbage collect effectively unused code.
Robert Watson [Mon, 21 Oct 2002 16:39:12 +0000 (16:39 +0000)]
Since the Biba and MLS access checks are identical to the open checks,
collapse the two cases more cleanly: rather than wrapping an access
check around open, simply provide the open implementation for the
access vector entry. No functional change.
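In an operations vector, "provide the open implementation for the access vector entry" means both slots point at the same function, with no wrapper. The sketch below assumes hypothetical names (struct policy_ops_sketch, check_vnode_open_sketch()) and a placeholder grade comparison that is not the real Biba or MLS rule.

```c
#include <errno.h>

struct cred_sketch  { int grade; };
struct vnode_sketch { int grade; };

typedef int (*vnode_check_sketch_t)(const struct cred_sketch *,
    const struct vnode_sketch *, int acc_mode);

struct policy_ops_sketch {
    vnode_check_sketch_t check_vnode_open;
    vnode_check_sketch_t check_vnode_access;
};

/* Placeholder decision, NOT the real Biba or MLS rule: permit when the
 * subject's grade is at least the object's. */
static int check_vnode_open_sketch(const struct cred_sketch *cred,
    const struct vnode_sketch *vp, int acc_mode)
{
    (void)acc_mode;
    return cred->grade >= vp->grade ? 0 : EACCES;
}

/* access() and open() need the same answer, so the access slot points
 * directly at the open implementation instead of a wrapper around it. */
static const struct policy_ops_sketch example_ops_sketch = {
    .check_vnode_open   = check_vnode_open_sketch,
    .check_vnode_access = check_vnode_open_sketch,
};
```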
Robert Watson [Mon, 21 Oct 2002 16:35:54 +0000 (16:35 +0000)]
Cleanup of relabel authorization checks -- almost identical logic,
we just break out some of the tests better. Minor change in that
we now better support incremental update of labels.
Ian Dowse [Mon, 21 Oct 2002 13:55:50 +0000 (13:55 +0000)]
Replace in_pcbladdr() with a more generic inner subroutine for
in_pcbconnect() called in_pcbconnect_setup(). This version performs
all of the functions of in_pcbconnect() except for the final
committing of changes to the PCB. In the case of an EADDRINUSE error
it can also provide to the caller the PCB of the duplicate connection,
avoiding an extra in_pcblookup_hash() lookup in tcp_connect().
This change will allow the "temporary connect" hack in udp_output()
to be removed and is part of the preparation for adding the
IP_SENDSRCADDR control message.
Andrew Gallatin [Mon, 21 Oct 2002 12:54:13 +0000 (12:54 +0000)]
Add some documentation of FreeBSD's special synchronization quirks
which may surprise developers coming from Solaris, or other platforms
which have a similar interface, but slightly different rules.
Murray Stokely [Mon, 21 Oct 2002 10:53:35 +0000 (10:53 +0000)]
Update comment to note that the third floppy (for modules) has been
implemented. Add a note reminding developers to update drivers.conf.5
if they add new functionality here.
Murray Stokely [Mon, 21 Oct 2002 10:48:19 +0000 (10:48 +0000)]
Note that support for the third 'drivers floppy' has been implemented.
Also point to the AWK scripts instead of the older Perl ones, now that
they've been rewritten.
Marcel Moolenaar [Mon, 21 Oct 2002 04:21:12 +0000 (04:21 +0000)]
Implement support for working on ELF core files. Use kvm_read() when reading
memory while mapping a virtual address to a physical address.
This allows us to work with virtual addresses for page tables,
provided it doesn't cause infinite recursion. Currently all
page tables are direct mapped.
Robert Watson [Mon, 21 Oct 2002 04:15:40 +0000 (04:15 +0000)]
Add a twiddle to create PTYs with a biba/equal or mls/equal label
instead of the default biba/high, mls/low, making it easier to use
ptys with these policies. This isn't the final solution, but does
help.
Robert Watson [Mon, 21 Oct 2002 03:54:24 +0000 (03:54 +0000)]
Unhook the per-policy parsing/printing MAC modules in libc to prepare
to bring in the new MAC label management API. With the new API
revision, we have only policy-agnostic code in libc and the base
kernel.
Brooks Davis [Mon, 21 Oct 2002 02:54:50 +0000 (02:54 +0000)]
Use if_printf(ifp, "blah") and device_printf(dev, "blah") instead of
printf("%s%d: blah", ifp->if_name, ifp->if_unit). This eliminates the
need to store the unit number in the softc.
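The point of the helper is that it derives the "name+unit: " prefix from the interface itself, so drivers need not keep their own unit number. A sketch of that idea follows. It assumes an interface described by a name and a unit (struct ifnet_sketch is hypothetical) and writes into a buffer rather than to the console, purely so the result can be checked. The real if_printf() prints directly.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical interface description: driver name plus unit number. */
struct ifnet_sketch {
    const char *if_name;
    int         if_unit;
};

/* Format "<name><unit>: " followed by the caller's message into buf.
 * Returns the total length written, or a negative/short value on failure. */
static int if_snprintf_sketch(char *buf, size_t len, const struct ifnet_sketch *ifp,
    const char *fmt, ...)
{
    va_list ap;
    int n, m;

    n = snprintf(buf, len, "%s%d: ", ifp->if_name, ifp->if_unit);
    if (n < 0 || (size_t)n >= len)
        return n;
    va_start(ap, fmt);
    m = vsnprintf(buf + n, len - (size_t)n, fmt, ap);
    va_end(ap);
    return m < 0 ? m : n + m;
}
```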
David E. O'Brien [Mon, 21 Oct 2002 00:26:48 +0000 (00:26 +0000)]
Unbreak Alpha world.
We are seeing "/usr/libexec/ld-elf.so.1: groff: too few PT_LOAD segments",
however it appears that there really is only one PT_LOAD segment in the groff
binary. It is unclear if `rtld' or `ld' is at fault here -- but using an
RELENG_4 `ld' binary allows one to build a working dynamic groff binary.
Marcel Moolenaar [Sun, 20 Oct 2002 23:39:43 +0000 (23:39 +0000)]
In cb_dumphdr() we were calling buf_write() with di->priv as the
pointer to a dumperinfo instead of di. A brainfart, surely. This
bug went unnoticed for all this time because the pointer is only
used by buf_write() when it can write a completely filled buffer
to the dump device. This depends on the number of memory chunks
that need to be dumped, which has apparently been low enough that
it has never happened until now.
Thomas Moestl [Sun, 20 Oct 2002 23:13:05 +0000 (23:13 +0000)]
Fix the calculation of the length of the unread message buffer
contents. The code subtracted two unsigned ints, stored the result
in a long, and expected it to equal the result of a signed
subtraction; this only works on platforms where int and long
have the same size (due to overflow).
Instead, cast to long before the subtraction; the numbers are
guaranteed to be small enough that this cannot overflow.
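The pitfall above shows up clearly with two versions of an unread-length calculation for a circular buffer. Both assume hypothetical write/read offsets (wptr, rptr) and buffer size. The first widens before subtracting. The second reproduces the bug: the unsigned subtraction wraps to a large positive value before conversion, so on LP64 systems the negative-check never fires.

```c
/* Unread bytes in a circular buffer of `size` bytes (hypothetical layout). */
static long unread_len_sketch(unsigned int wptr, unsigned int rptr, unsigned int size)
{
    long len = (long)wptr - (long)rptr;  /* signed subtraction: may be negative */

    if (len < 0)
        len += (long)size;               /* write offset has wrapped around */
    return len;
}

/* The buggy form: the subtraction happens in unsigned int and wraps first.
 * Where long is wider than int the result is a large positive value, so the
 * wraparound fix-up below is never applied. */
static long unread_len_buggy_sketch(unsigned int wptr, unsigned int rptr, unsigned int size)
{
    long len = wptr - rptr;

    if (len < 0)
        len += (long)size;
    return len;
}
```

With wptr = 10, rptr = 100 and size = 4096, the first version yields 4006. On a 64-bit long the buggy one yields 4294967206.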
Fix two instances of variant struct definitions in sys/netinet:
- Remove the never-completed _IP_VHL version; it has not caught on
anywhere, and retaining it would make us incompatible with other
BSD network stacks.
- Add a CTASSERT protecting sizeof(struct ip) == 20.
- Don't let the size of struct ipq depend on the IPDIVERT option.
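A compile-time size check like the CTASSERT above can be written in modern C with C11 _Static_assert. The sketch applies it to a hypothetical mirror of the IPv4 header (struct ip_sketch), laid out with bit-fields in the style of the BSD header. Little-endian field order is assumed here; the real header swaps ip_hl and ip_v on big-endian hosts. If padding or a variant definition ever changes the size, the build fails instead of silently mis-parsing packets.

```c
#include <stdint.h>

/* Hypothetical mirror of the 20-byte IPv4 header (little-endian bit order). */
struct ip_sketch {
    unsigned int ip_hl:4,     /* header length in 32-bit words */
                 ip_v:4;      /* version */
    uint8_t  ip_tos;          /* type of service */
    uint16_t ip_len;          /* total length */
    uint16_t ip_id;           /* identification */
    uint16_t ip_off;          /* fragment offset field */
    uint8_t  ip_ttl;          /* time to live */
    uint8_t  ip_p;            /* protocol */
    uint16_t ip_sum;          /* checksum */
    uint32_t ip_src, ip_dst;  /* source and destination addresses */
};

/* Compile-time guard in the spirit of CTASSERT(sizeof(struct ip) == 20). */
_Static_assert(sizeof(struct ip_sketch) == 20, "struct ip_sketch must be 20 bytes");
```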