John Baldwin [Wed, 24 Jun 2009 21:10:52 +0000 (21:10 +0000)]
Change the ABI of some of the structures used by the SYSV IPC API:
- The uid/cuid members of struct ipc_perm are now uid_t instead of unsigned
short.
- The gid/cgid members of struct ipc_perm are now gid_t instead of unsigned
short.
- The mode member of struct ipc_perm is now mode_t instead of unsigned short
(this is merely a style bug).
- The rather dubious padding fields for ABI compat with SV/I386 have been
removed from struct msqid_ds and struct semid_ds.
- The shm_segsz member of struct shmid_ds is now a size_t instead of an
int. This removes the need for the shm_bsegsz member in struct
shmid_kernel and should allow for complete support of SYSV SHM regions
>= 2GB.
- The shm_nattch member of struct shmid_ds is now an int instead of a
short.
- The shm_internal member of struct shmid_ds is now gone. The internal
VM object pointer for SHM regions has been moved into struct
shmid_kernel.
- The existing __semctl(), msgctl(), and shmctl() system call entries are
now marked COMPAT7 and new versions of those system calls which support
the new ABI are now present.
- The new system calls are assigned to the FBSD-1.1 version in libc. The
FBSD-1.0 symbols in libc now refer to the old COMPAT7 system calls.
- A simplistic framework for tagging system calls with compatibility
symbol versions has been added to libc. Version tags are added to
system calls by adding an appropriate __sym_compat() entry to
src/lib/libc/incldue/compat.h. [1]
Andrew Gallatin [Wed, 24 Jun 2009 21:09:56 +0000 (21:09 +0000)]
Add a dying flag to prevent races at detach.
I tried re-ordering ether_ifdetach(), but this created a new race
where sometimes, when under heavy receive load (>1Mpps) and running
tcpdump, the machine would panic. At panic, the ithread was still in
the original (not dead) if_input() path, and was accessing stale BPF
data structs. By using a dying flag, I can close the interface prior
to if_detach() to be certain the interface cannot send packets up in
the middle of ether_ifdetach.
Robert Watson [Wed, 24 Jun 2009 21:00:25 +0000 (21:00 +0000)]
Convert netinet6 to using queue(9) rather than hand-crafted linked lists
for the global IPv6 address list (in6_ifaddr -> in6_ifaddrhead). Adopt
the code styles and conventions present in netinet where possible.
Robert Watson [Wed, 24 Jun 2009 20:57:50 +0000 (20:57 +0000)]
Use queue(9) instead of hand-crafted link lists for the global IPX
address list (ipx_ifaddr -> ipx_ifaddrhead), and generally adopt the
naming and usage conventions found in netinet.
Marius Strobl [Wed, 24 Jun 2009 20:56:06 +0000 (20:56 +0000)]
- Change this driver to do taskqueue(9) based TX and interrupt
handling in order to reduce interrupt overhead which results in
better performance.
- Call ether_ifdetach(9) before stopping the controller and the
callouts detach in order to prevent active BPF listeners to clear
promiscuous mode which may lead to the tick callout being restarted
which will trigger a panic once it's actually gone.
- Add explicit IFF_DRV_RUNNING checking in order to prevent extra
link up/down events when using dhclient(8).
- Use the correct macro for deciding whether 2/3 of the available TX
descriptors are used.
- Wrap the RX fault printing in #ifdef CAS_DEBUG in order to not
unnecessarily frighten users and as debugging was the actual
intention. Real errors caused by these faults still will be
accumulated as input errors. It might be a good idea to later on
add driver specific counters for the faults though.
Marius Strobl [Wed, 24 Jun 2009 20:49:02 +0000 (20:49 +0000)]
o merge from amd64:
- r187144: Add a reference to the config(5) manpage and
to the "env" kernel config option.
- Add/enable the default USB drivers. Originally the USB
controller and keyboard drivers were disabled as these
interacted badly with the Open Firmware console driver,
i.e. caused the keyboard to not work with ofw_console(4).
Even when switch to uart(4) and the frame buffer drivers
most of the USB drivers still were kept disabled as
several of them, amongst others all of the drivers for
USB Ethernet controllers, weren't endian clean. With the
new USB stack these problem should be gone now so there's
no longer a reason to not include the same set of USB
drivers amd64 does.
o Remove the commented out device ofw_console; apart from it
being currently broken by some TTY changes one really needs
to know how to actually enable and make it work correctly.
John Baldwin [Wed, 24 Jun 2009 20:01:13 +0000 (20:01 +0000)]
Deprecate the msgsys(), semsys(), and shmsys() system calls by moving
them under COMPAT_FREEBSD[4567]. Starting with FreeBSD 5.0 the SYSV IPC
API was implemented via direct system calls (e.g. msgctl(), msgget(), etc.)
rather than indirecting through the var-args *sys() system calls. The
shmsys() system call was already effectively deprecated for all but
COMPAT_FREEBSD4 already as its implementation for the !COMPAT_FREEBSD4 case
was to simply invoke nosys().
Joerg Wunsch [Wed, 24 Jun 2009 19:47:53 +0000 (19:47 +0000)]
Drop the defunct FDOPT_NOERRLOG option from all the floppy utilities.
The kernel does not log floppy media errors anymore.
In fdcontrol, do always open the file descriptor in read-only mode so
it can operate on read-only media, as there is no longer a separate
control device to operate on.
Joerg Wunsch [Wed, 24 Jun 2009 19:30:31 +0000 (19:30 +0000)]
With the fdc control device disappearing some 5 years ago, it is no
longer useful for the FD_STYPE and FD_SOPTS ioctls to insist on being
issued on a writable file descriptor. Otherwise, there's no longer a
chance to set the drive type or options when a read-only medium is
present in the drive, as there is no way to obtain a writable fd then.
Marius Strobl [Wed, 24 Jun 2009 19:04:08 +0000 (19:04 +0000)]
Revert the part of r194763 which added a dying flag and instead
call ether_ifdetach(9) before stopping the controller and the
callouts. The consensus is that the latter is now safe to do and
should also solve the problem of active BPF listeners clearing
promiscuous mode can result in the tick callout being restarted
which in turn will trigger a panic once it's actually gone.
Rick Macklem [Wed, 24 Jun 2009 18:30:14 +0000 (18:30 +0000)]
If the initial attempt to refresh credentials in the RPCSEC_GSS client
side fails, the entry in the cache is left with no valid context
(gd_ctx == GSS_C_NO_CONTEXT). As such, subsequent hits on the cache
will result in persistent authentication failure, even after the user has
done a kinit or similar and acquired a new valid TGT. This patch adds a test
for that case upon a cache hit and calls rpc_gss_init() to make another
attempt at getting valid credentials. It also moves the setting of gc_proc
to before the import of the principal name to ensure that, if that case
fails, it will be detected as a failure after going to "out:".
Jack F Vogel [Wed, 24 Jun 2009 18:27:07 +0000 (18:27 +0000)]
Update for the Intel 10G driver, this adds support for
newest hardware, adds multiqueue tx interface, infrastructure
cleanup to allow up to 32 MSIX vectors on newer Nehalem systems.
Bug fixes, etc.
Warner Losh [Wed, 24 Jun 2009 17:23:10 +0000 (17:23 +0000)]
Remove usb. The need to have core@ approve major changes to usb has
passed now that the new usb stack is in the tree. The coordination
issues that necessitated this entry are now OBE.
Alexander Motin [Wed, 24 Jun 2009 17:03:06 +0000 (17:03 +0000)]
Some DMA related changes:
- honor parent DMA tag limitations, as man page requires,
- allow data buffer to be allocated within full 64bit address range, when
support is announced by hardware,
- add quirk, disabling 64bit addresses for broken chips, use it for MCP78.
Fix end-of-line issues that can come up when `lpq' reads information
about a queue from a remote host. That remote host may use \r, \r\n,
or \n\r as the line-ending character. In some cases the remote host
will write a single line of information without *any* EOL sequence.
Translate all the non-unix EOL's to the standard newline, and make
sure the final line includes a terminating newline. Logic is also
added to translate all unprintable characters to '?', but that is
#if-ed out for now.
Unbreak sparc64 after the swap accounting changes: mark kernel_map
entries allocated for translations in pmap_init() as MAP_NOFAULT. This
prevents vm_map_insert from trying to account the entries for swap
usage, that is both wrong and too early to work.
While there, change FALSE to VMFS_NO_SPACE.
Reported and tested by: Florian Smeets <flo at kasimir com>
Reviewed by: marius
Andriy Gapon [Wed, 24 Jun 2009 16:03:57 +0000 (16:03 +0000)]
dtrace/amd64: fix virtual address checks
On amd64 KERNBASE/kernbase does not mean start of kernel memory.
This should fix a KASSERT panic in dtrace_copycheck when copyin*()
is used in D program.
Also make checks for user memory a bit stricter.
Robert Watson [Wed, 24 Jun 2009 14:29:40 +0000 (14:29 +0000)]
Clear 'ia' after iterating if_addrhead for unicast address matching: since
'ifa' was used as the TAILQ_FOREACH() iterator argument, and 'ia' was just
derived form it, it could be left non-NULL which confused later
conditional freeing code. This could cause kernel panics if multicast IP
packets were received. [1]
Call 'struct in_ifaddr *' in ip_rtaddr() 'ia', not 'ifa' in keeping with
normal conventions.
When 'ipstealth' is enabled returns from ip_input early, properly release
the 'ia' reference.
Robert Watson [Wed, 24 Jun 2009 12:06:15 +0000 (12:06 +0000)]
Add stack_print_short() and stack_print_short_ddb() interfaces to
stack(9), which generate a more compact rendition of a stack trace
via the kernel's printf.
Robert Watson [Wed, 24 Jun 2009 10:32:44 +0000 (10:32 +0000)]
Break at_ifawithnet() into two variants:
- at_ifawithnet(), which acquires an locks it needs and returns an
at_ifaddr reference.
- at_ifawithnet_locked(), which relies on the caller locking
at_ifaddr_list, and returns a pointer rather than a reference.
Update various consumers to prefer one or the other, including ether
and fddi output, to properly release at_ifaddr references.
Rework at_control() to manage locking and references in a manner
identical to in_control().
Alan Cox [Wed, 24 Jun 2009 04:45:03 +0000 (04:45 +0000)]
The bits set in a page's dirty mask are a subset of the bits set in its
valid mask. Consequently, there is no need to perform a bit-wise and of
the page's dirty and valid masks in order to determine which parts of a
page are dirty and valid.
Xin LI [Tue, 23 Jun 2009 23:27:35 +0000 (23:27 +0000)]
Merge NetBSD revision 1.14: humanize_number.c is now 2-clause BSD licensed.
(humanize_number.3 intentionally hold back until I make sure why we didn't
merged dehumanize_number(3)).
Jeff Roberson [Tue, 23 Jun 2009 22:42:39 +0000 (22:42 +0000)]
Implement a facility for dynamic per-cpu variables.
- Modules and kernel code alike may use DPCPU_DEFINE(),
DPCPU_GET(), DPCPU_SET(), etc. akin to the statically defined
PCPU_*. Requires only one extra instruction more than PCPU_* and is
virtually the same as __thread for builtin and much faster for shared
objects. DPCPU variables can be initialized when defined.
- Modules are supported by relocating the module's per-cpu linker set
over space reserved in the kernel. Modules may fail to load if there
is insufficient space available.
- Track space available for modules with a one-off extent allocator.
Free may block for memory to allocate space for an extent.
Jeff Roberson [Tue, 23 Jun 2009 22:12:37 +0000 (22:12 +0000)]
- Use cpuset_t and the CPU_ macros in place of cpumask_t so that ULE
supports arbitrary numbers of cpus rather than being limited by
cpumask_t to the number of bits in a long.
Bjoern A. Zeeb [Tue, 23 Jun 2009 22:08:55 +0000 (22:08 +0000)]
Make callers to in6_selectsrc() and in6_pcbladdr() pass in memory
to save the selected source address rather than returning an
unreferenced copy to a pointer that might long be gone by the
time we use the pointer for anything meaningful.
Jilles Tjoelker [Tue, 23 Jun 2009 21:50:06 +0000 (21:50 +0000)]
Do not fork for a subshell if it is the last thing this shell is doing
(EV_EXIT). The fork is still done as normal if any traps are active.
In many cases, the fork can be avoided even without this change by using {}
instead of (), but in practice many scripts use (), likely because the
syntax is simpler.
Example:
sh -c '(/bin/sleep 10)& sleep 1;ps -p $! -o comm='
Now prints "sleep" instead of "sh". $! is more useful this way.
Most shells (dash, bash, pdksh, ksh93, zsh) seem to print "sleep" for this.
Example:
sh -c '( ( ( (ps jT))))'
Now shows no waiting shell processes instead of four.
Most shells (dash, bash, pdksh, ksh93, zsh) seem to show zero or one.
Rick Macklem [Tue, 23 Jun 2009 21:48:04 +0000 (21:48 +0000)]
When mountd.c parses the nfsv4 root line(s) in /etc/exports, it
allocates data structures that are never linked into the tree or free'd.
As such, mountd would leak memory every time it parsed an nfsv4 root line.
This patch frees up those structures to plug the leak.
Alexander Motin [Tue, 23 Jun 2009 21:45:33 +0000 (21:45 +0000)]
Rework r193814:
While general idea of patch was good, it was not working properly due the way
it was implemented. When we are using same timer interrupt for several of
hard/prof/stat purposes we should not send several IPIs same time to other
CPUs. Sending several IPIs same time leads to terrible accounting/profiling
results due to strong synchronization effect, when the second interrupt
handler accounts processing of the first one.
Interlink timer events in a such way, that no more then one IPI is sent for
any original timer interrupt.
Ed Schouten [Tue, 23 Jun 2009 21:43:02 +0000 (21:43 +0000)]
Improve my last commit: use a separate condvar to serialize.
The advantage of using a separate condvar is that we can just use
cv_signal(9) instead of cv_broadcast(9). It makes no sense to wake up
multiple threads. It also makes the TTY code easier to understand.
t_dcdwait sounds totally unrelated.
Ed Schouten [Tue, 23 Jun 2009 21:33:26 +0000 (21:33 +0000)]
Use dcdwait to block threads to serialize writes.
I suspect the usage of bgwait causes a lot of spurious wakeups when
threads are blocked in the background, because they will be woken up
each time a write() call is performed.
Usermode portion of the support for swap allocation accounting:
- update for getrlimit(2) manpage;
- support for setting RLIMIT_SWAP in login class;
- addition to the limits(1) and sh and csh limit-setting builtins;
- tuning(7) documentation on the sysctls controlling overcommit.
In collaboration with: pho
Reviewed by: alc
Approved by: re (kensmith)
Implement global and per-uid accounting of the anonymous memory. Add
rlimit RLIMIT_SWAP that limits the amount of swap that may be reserved
for the uid.
The accounting information (charge) is associated with either map entry,
or vm object backing the entry, assuming the object is the first one
in the shadow chain and entry does not require COW. Charge is moved
from entry to object on allocation of the object, e.g. during the mmap,
assuming the object is allocated, or on the first page fault on the
entry. It moves back to the entry on forks due to COW setup.
The per-entry granularity of accounting makes the charge process fair
for processes that change uid during lifetime, and decrements charge
for proper uid when region is unmapped.
The interface of vm_pager_allocate(9) is extended by adding struct ucred *,
that is used to charge appropriate uid when allocation if performed by
kernel, e.g. md(4).
Several syscalls, among them is fork(2), may now return ENOMEM when
global or per-uid limits are enforced.
In collaboration with: pho
Reviewed by: alc
Approved by: re (kensmith)
Jilles Tjoelker [Tue, 23 Jun 2009 20:45:12 +0000 (20:45 +0000)]
sh: Improve handling of setjmp/longjmp volatile:
- remove ineffective and unnecessary (void) &var; [1]
- remove some unnecessary volatile keywords
- add a necessary volatile keyword
- save the old handler before doing something that could use the saved
value
Submitted by: Christoph Mallon [1]
Approved by: ed (mentor)