Alexander Kabaev [Wed, 19 Nov 2003 04:12:32 +0000 (04:12 +0000)]
Do not call VOP_GETATTR in getdents function. It does not serve any
purpose and the resulting vattr structure was ignored. In addition,
the VOP_GETATTR call was made with no vnode lock held, resulting in
vnode locking violation panic with debug kernels.
Archie Cobbs [Tue, 18 Nov 2003 20:43:23 +0000 (20:43 +0000)]
Lower the maximum ACK timeout for GRE packets from 10 to 1 second.
In practice it seems that in situations of high packet loss the ACK
timeout seems to hit this maximum (perhaps inappropriately, but the
estimation algorithm is not perfect, so apparently it happens). In
any case, 10 seconds is way too high a value so lower to 1 second.
Call class->init() an class->fini() while the class is hooked up,
rather than right before and right after. This allows these routines
to manipulate the mesh.
KASSERT that nobody creates a geom on an alien class.
Søren Schmidt [Tue, 18 Nov 2003 15:23:37 +0000 (15:23 +0000)]
Work around the problem that some CDROM drives might return different
TOC's for the same media!! that borks up GEOM.
Although this looks like bad HW the following patch removes the
chance for GEOM panic'ing.
Jake Burkholder [Tue, 18 Nov 2003 14:21:41 +0000 (14:21 +0000)]
Install the user trap handlers that libc provides from a constructor, so
that they will be installed before application constructors are invoked.
Its possible to link applications such that this fails, application code
is invoked before they are installed, but, well, Don't Do That.
Robert Watson [Tue, 18 Nov 2003 00:39:07 +0000 (00:39 +0000)]
Introduce a MAC label reference in 'struct inpcb', which caches
the MAC label referenced from 'struct socket' in the IPv4 and
IPv6-based protocols. This permits MAC labels to be checked during
network delivery operations without dereferencing inp->inp_socket
to get to so->so_label, which will eventually avoid our having to
grab the socket lock during delivery at the network layer.
This change introduces 'struct inpcb' as a labeled object to the
MAC Framework, along with the normal circus of entry points:
initialization, creation from socket, destruction, as well as a
delivery access control check.
For most policies, the inpcb label will simply be a cache of the
socket label, so a new protocol switch method is introduced,
pr_sosetlabel() to notify protocols that the socket layer label
has been updated so that the cache can be updated while holding
appropriate locks. Most protocols implement this using
pru_sosetlabel_null(), but IPv4/IPv6 protocols using inpcbs use
the the worker function in_pcbsosetlabel(), which calls into the
MAC Framework to perform a cache update.
Biba, LOMAC, and MLS implement these entry points, as do the stub
policy, and test policy.
Kirk McKusick [Tue, 18 Nov 2003 00:36:40 +0000 (00:36 +0000)]
Document that the live dump command (`dump -L') creates its snapshot
in the .snap directory in the root of the filesystem being dumped.
Document that if the .snap directory is missing that it must be
created manually and that it should be owned by user root and
group operator and set to mode 770 before a live dump can be run.
Robert Watson [Mon, 17 Nov 2003 23:25:16 +0000 (23:25 +0000)]
Clarify UPDATING language: do buildworld before buildkernel, and
do installkernel before installworld, rather than don't make world
before installkernel.
Mark Murray [Mon, 17 Nov 2003 23:02:21 +0000 (23:02 +0000)]
Overhaul the entropy device:
o Each source gets its own queue, which is a FIFO, not a ring buffer.
The FIFOs are implemented with the sys/queue.h macros. The separation
is so that a low entropy/high rate source can't swamp the harvester
with low-grade entropy and destroy the reseeds.
o Each FIFO is limited to 256 (set as a macro, so adjustable) events
queueable. Full FIFOs are ignored by the harvester. This is to
prevent memory wastage, and helps to keep the kernel thread CPU
usage within reasonable limits.
o There is no need to break up the event harvesting into ${burst}
sized chunks, so retire that feature.
o Break the device away from its roots with the memory device, and
allow it to get its major number automagically.
Robert Watson [Mon, 17 Nov 2003 20:20:53 +0000 (20:20 +0000)]
Add a sysctl, security.bsd.see_other_gids, similar in semantics
to see_other_uids but with the logical conversion. This is based
on (but not identical to) the patch submitted by Samy Al Bahra.
Alan Cox [Mon, 17 Nov 2003 18:22:24 +0000 (18:22 +0000)]
- Change the i386's sf_buf implementation so that it never allocates
more than one sf_buf for one vm_page. To accomplish this, we add
a global hash table mapping vm_pages to sf_bufs and a reference
count to each sf_buf. (This is similar to the patches for RELENG_4
at http://www.cs.princeton.edu/~yruan/debox/.)
For the uninitiated, an sf_buf is nothing more than a kernel virtual
address that is used for temporary virtual-to-physical mappings by
sendfile(2) and zero-copy sockets. As such, there is no reason for
one vm_page to have several sf_bufs mapping it. In fact, using more
than one sf_buf for a single vm_page increases the likelihood that
sendfile(2) blocks, hurting throughput.
(See http://www.cs.princeton.edu/~yruan/debox/.)
Robert Watson [Mon, 17 Nov 2003 16:04:52 +0000 (16:04 +0000)]
Add an entry to the BUGS section indicating that Vinum cannot currently
be used on devices with a block size other than DEV_BSIZE (512),
which specifically includes being unable to run on a swap-backed
md device. Swap-backed md devices use a 4k block size.
Robert Watson [Mon, 17 Nov 2003 15:56:00 +0000 (15:56 +0000)]
Don't attempt to make devices if we're using devfs. This
substantially cleans up the output when running the vinum
management tool, and also makes it work better.
Eivind Eklund [Mon, 17 Nov 2003 14:02:04 +0000 (14:02 +0000)]
* Auto-detect what device to use if none is specified
* Replace references to mcd0 with acd0 (doc only)
* Remove references to the "c" partition (doc only - code was already fixed)
Instead of blindly loading the ums module and bailing out if that fails,
check if it's already loaded or compiled into the kernel, and only try to
load it if it isn't.
Peter Wemm [Mon, 17 Nov 2003 08:58:16 +0000 (08:58 +0000)]
Initial landing of SMP support for FreeBSD/amd64.
- This is heavily derived from John Baldwin's apic/pci cleanup on i386.
- I have completely rewritten or drastically cleaned up some other parts.
(in particular, bootstrap)
- This is still a WIP. It seems that there are some highly bogus bioses
on nVidia nForce3-150 boards. I can't stress how broken these boards
are. I have a workaround in mind, but right now the Asus SK8N is broken.
The Gigabyte K8NPro (nVidia based) is also mind-numbingly hosed.
- Most of my testing has been with SCHED_ULE. SCHED_4BSD works.
- the apic and acpi components are 'standard'.
- If you have an nVidia nForce3-150 board, you are stuck with 'device
atpic' in addition, because they somehow managed to forget to connect the
8254 timer to the apic, even though its in the same silicon! ARGH!
This directly violates the ACPI spec.
Bruce Evans [Mon, 17 Nov 2003 07:21:19 +0000 (07:21 +0000)]
Tweaked the siointr1() so that it works better at 921600 bps, especially
with multiple ports on a shared interrupt demultiplexed by the puc_intr()
handler.
siointr1() first read as much input as possible and then checked all
possibly-relevant status registers, partly for robustness and partly
for historical reasons. This is very bad if it is called for every
port sharing an interrupt like puc_intr() does. It can spend too long
reading all the input for some ports when the interrupt is for a more
urgent event on another, or just too long checking all the status
registers when there are lots of ports. The inter-character time is
too long for reading all the input even when the interrupt is for a
transmitter interrupt on the same port, and at 921600 bps the inter-char
time is 10.85 usec and was often exceeded with just 2 ports, leaving
the transmitters idle for about 6% of the time.
The tweak is to break out of the read loop after reading 1 char if
output can be done. This avoids most of the idle transmitter time for
2 active ports at 921600 bps bidirectional on the test system. It
also reduces overhead by about 20%. More complete fixes use the
programmable tx low watermark on 16950's and reduce overhead by another
65%.
David Schultz [Mon, 17 Nov 2003 06:39:38 +0000 (06:39 +0000)]
Reimplement nologin(8) as a C program. This allows us to statically
link it at low cost and avoid environment poisoning attacks associated
with LD_LIBRARY_PATH.
Warner Losh [Mon, 17 Nov 2003 05:21:18 +0000 (05:21 +0000)]
Ignore errors on ln. This is a quick fix for the make depend twice in
a row being broken. A better filx will come as soon as I have time to
analyse things more deeply.
Jacques Vidrine [Mon, 17 Nov 2003 04:19:15 +0000 (04:19 +0000)]
Detect range errors when using the %s specifier. Previously, LONG_MAX
was rejected as a range error, while any values less than LONG_MIN
were silently substituted with LONG_MIN. Furthermore, on some
platforms `time_t' has less range than `long' (e.g. alpha), which may
give incorrect results when parsing some strings.
Brian Feldman [Mon, 17 Nov 2003 03:17:49 +0000 (03:17 +0000)]
Fix a few cases where MT_TAG-type "fake mbufs" are created on the stack, but
do not have mh_nextpkt initialized. Somtimes what's there is "1", and the
ip_input() code pukes trying to m_free() it, rendering divert sockets and
such broken.
This really underscores the need to get rid of MT_TAG.
Bruce Evans [Mon, 17 Nov 2003 02:55:25 +0000 (02:55 +0000)]
Fixed pedantic syntax errors. Many macros didn't permit a semicolon after
their invocation in the !KLD_MODULE case, but a semicolon is provided after
all invocations and is required in the KLD_MODULE case.
Bruce Evans [Mon, 17 Nov 2003 02:11:13 +0000 (02:11 +0000)]
Avoid a warning for compiling with `gcc -Wbad-function cast'. (This
is the warning that points to the bug in `(char *)malloc(...)' where
malloc() is implicitly declared as returning int. We do similar things
here, but they work because u_int is the same as uintptr_t on i386's.)
Robert Watson [Sun, 16 Nov 2003 23:31:45 +0000 (23:31 +0000)]
Implement sockets support for __mac_get_fd() and __mac_set_fd()
system calls, and prefer these calls over getsockopt()/setsockopt()
for ABI reasons. When addressing UNIX domain sockets, these calls
retrieve and modify the socket label, not the label of the
rendezvous vnode.
- Create mac_copy_socket_label() entry point based on
mac_copy_pipe_label() entry point, intended to copy the socket
label into temporary storage that doesn't require a socket lock
to be held (currently Giant).
- Implement mac_copy_socket_label() for various policies.
- Expose socket label allocation, free, internalize, externalize
entry points as non-static from mac_net.c.
- Use mac_socket_label_set() in __mac_set_fd().
MAC-aware applications may now use mac_get_fd(), mac_set_fd(), and
mac_get_peer() to retrieve and set various socket labels without
directly invoking the getsockopt() interface.
Bruce Evans [Sun, 16 Nov 2003 23:05:52 +0000 (23:05 +0000)]
Don't waste so much space for the latency debugging buffer. Its size
will now need editing except for spot checks.
Changed this buffer from a circular one to a linear one. This is more
useful for some cases and the sysctl that prints it doesn't support
circular buffers.
Fixed (output) formatting bugs in this sysctl. An off by 1 error caused
a garbage byte to be returned after annotation of large deltas, and
a race with the writer sometimes caused premature string termination.
Warner Losh [Sun, 16 Nov 2003 22:33:42 +0000 (22:33 +0000)]
Gross kludge:
o when compiling lint, undefine certain things and redefine them so that the
driver doesn't #error out. Since lint kernels aren't supposed to be
bootable, I'm no troubled by this breakage.
David Malone [Sun, 16 Nov 2003 21:51:06 +0000 (21:51 +0000)]
logerror is used in syslogd to log errors from syslogd itself. It
is possible for an error to occur while trying to log an error, and
this can result in infinite recursion (or at least until we run out
of stack).
Rather than this, we ignore requests to log an error while logging an
error.
Gordon Tetlow [Sun, 16 Nov 2003 21:17:43 +0000 (21:17 +0000)]
Invert the condition that installs the dynamic linker early, since
DYNAMICROOT is now the default. Also document -DNO_DYNAMICROOT since
that is going to be a documented feature.
Robert Watson [Sun, 16 Nov 2003 20:21:21 +0000 (20:21 +0000)]
Update mac_set.3 to account for new behavior of mac_set_fd() in the
context of sockets, and document EINVAL as a possible failure mode
based on the object selected, not just the label provided.
Robert Watson [Sun, 16 Nov 2003 20:18:24 +0000 (20:18 +0000)]
Implement mac_get_peer(3) using getsockopt() with SOL_SOCKET and
SO_PEERLABEL. This provides an interface to query the label of a
socket peer without embedding implementation details of mac_t in
the application. Previously, sizeof(*mac_t) had to be specified
by an application when performing getsockopt().
Document mac_get_peer(3), and expand documentation of the other
mac_get(3) functions. Note that it's possible to get EINVAL back
from mac_get_fd(3) when pointing it at an inappropriate object.
NOTE: mac_get_fd() and mac_set_fd() support for sockets will
follow shortly, so the documentation is slightly ahead of the
code.
Robert Watson [Sun, 16 Nov 2003 20:01:50 +0000 (20:01 +0000)]
Abstract the label checking and setting logic from
mac_setsockopt_label() into mac_socket_label_set(); make it non-static
so that it can be invoked from kern_mac.c for mac_set_fd().
Ian Dowse [Sun, 16 Nov 2003 16:48:18 +0000 (16:48 +0000)]
If the unmount by file system ID fails, don't warn before retrying
a non-fsid unmount if the file system ID is all zeros. This is a
temporary workaround for warnings that occur in the vfs.usermount=1
case because non-root users get a zeroed filesystem ID. I have a
more complete fix in the works, but I won't get it done for 5.2.
Maxim Sobolev [Sun, 16 Nov 2003 15:07:10 +0000 (15:07 +0000)]
Pull latest changes from OpenBSD:
- improve sysinfo(2) syscall;
- add dummy fadvise64(2) syscall;
- add dummy *xattr(2) family of syscalls;
- add protos for the syscalls 222-225, 238-249 and 253-267;
- add exit_group(2) syscall, which is currently just wired to exit(2).
Add the following devices to the list of supported devices, to sync
manual page with the source code:
- HAL Corporation Crossam2+USB IR commander
- RATOC REX-USB60
- SOURCENEXT KeikaiDenwa 8 (with and without charger)
Bruce Evans [Sun, 16 Nov 2003 13:31:45 +0000 (13:31 +0000)]
Restored the call to schedsofttty() (now spelled swi_sched(...)) again.
Its restoration in rev.1.102 was mistranslated to the equivalent of
setsofttty() in rev.1.105. This increased overheads by causing a
context switch to the SWI handler after almost every interrupt. The
increase was approx. 50% on a Celeron 366 (from 23 usec to 34 usec
per interrupt).
Brian Feldman [Sun, 16 Nov 2003 08:10:59 +0000 (08:10 +0000)]
As mentioned by warner, previous revision (opt_ddb.h) was just a fluke --
I'm having bad luck with different parts of the sys tree being checked
out at slightly different times. Back it out, noting it doesn't cause
harm in any case. Tinderbox also makes these things more fun.
Kirk McKusick [Sun, 16 Nov 2003 08:01:58 +0000 (08:01 +0000)]
Convert the live dump command (`dump -L') to use mksnap_ffs instead
of trying to directly create the snapshot itself. This change allows
users logged into the system as operator to run live dumps.
Note that dump no longer tries to create the snapshot in the root of
the filesystem, but rather in a .snap directory in the root of the
filesystem. The reason is that the operator is usually not permitted
to write into the root of the filesystem. The newfs command and
background fsck have both been modified to create a .snap directory
in the root of the filesystem, but if neither of these have been run,
then the .snap directory must be created manually by the superuser
before a live dump can be run. The .snap directory should be owned
by user root and group operator and set to mode 770.