John Baldwin [Tue, 10 Mar 2009 17:00:28 +0000 (17:00 +0000)]
- Remove a recently added comment from kernel_sysctlbyname() that isn't
needed.
- Move the release of the sysctl sx lock after the vsunlock() in
userland_sysctl() to restore the original memlock behavior of
minimizing the amount of memory wired to handle sysctl requests.
John Baldwin [Tue, 10 Mar 2009 15:26:50 +0000 (15:26 +0000)]
Add an ABI compat shim for the vfs.bufspace sysctl for sysctl requests that
try to fetch it as an int rather than a long. If the current value is
greater than INT_MAX it reports a value of INT_MAX.
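
The clamping behavior described above can be sketched as follows. This is a minimal illustration, not the committed kernel code; the helper name compat_long_to_int() is hypothetical, and only the INT_MAX upper bound mentioned in the message is handled.

```c
#include <limits.h>

/*
 * Hypothetical helper (not the actual sysctl handler): produce the value
 * a legacy caller sees when it asks for a long-sized quantity using an
 * int-sized buffer.  The value is assumed to be non-negative, as a
 * buffer space total would be.
 */
static int
compat_long_to_int(long value)
{
	/* Saturate instead of truncating, so old tools never see garbage. */
	if (value > INT_MAX)
		return (INT_MAX);
	return ((int)value);
}
```

A caller that still passes an int-sized buffer would receive compat_long_to_int(bufspace) instead of a truncated long.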
Guido van Rooij [Tue, 10 Mar 2009 15:23:43 +0000 (15:23 +0000)]
When attaching a geli on boot make sure that it is detached
upon last close. (needed for a gmirror to properly shut down
upon reboot when a geli is on top of the gmirror)
Guido van Rooij [Tue, 10 Mar 2009 15:19:49 +0000 (15:19 +0000)]
When swap resides on a mirror and it is not stopped, the mirror
is degraded upon the next reboot and will have to be rebuilt.
Thus call swapoff when rebooting (read: when stopping swap1)
Robert Watson [Tue, 10 Mar 2009 14:52:17 +0000 (14:52 +0000)]
Add tcpp -- TCP parallelism microbenchmark.
This tool creates large numbers of TCP connections, each of which will
transmit a fixed amount of data, between client and server hosts. tcpp can
use multiple workers (typically up to the number of hardware cores), and can
use multiple source IPs in order to use an expanded port/IP 4-tuple space to
avoid problems from reusing 4-tuples too quickly. Aggregate bandwidth use
will be reported after a client run.
While by no means a perfect tool, it has proven quite useful in generating
and optimizing TCP stack lock contention by easily generating high-intensity
workloads. It also proves surprisingly good at finding device driver bugs.
Disable zerocopy by default for now. It's causing some problems in pcap
consumers which fork after the shared pages have been set up. pflogd(8)
is an example. The problem is understood and there is a fix coming in
shortly.
Folks who want to continue using it can do so by setting
Robert Watson [Tue, 10 Mar 2009 11:46:41 +0000 (11:46 +0000)]
Merge r183430 from vendor/top/dist to head/contrib/top, although with
record-only mergeinfo because an automated merge is confused by the
flattening that took place:
Move install to install-sh to prevent name-clashes.
Ed Schouten [Tue, 10 Mar 2009 11:28:54 +0000 (11:28 +0000)]
Make a 1:1 mapping between syscons stats and terminal emulators.
After I imported libteken into the source tree, I noticed syscons didn't
store the cursor position inside the terminal emulator, but inside the
virtual terminal stat. This is not very useful, because when you
implement more complex forms of line wrapping, you need to keep track of
more state than just the cursor position.
Because the kernel messages didn't share the same terminal emulator as
ttyv0, this caused a lot of strange things, like kernel messages being
misplaced and, because a resize notification was missing, the terminal
emulator for kernel messages never being resized when using vidcontrol.
This patch just removes kernel_console_ts and adds a special parameter
to te_puts to determine whether messages should be printed using regular
colors or the ones for kernel messages.
Marcel Moolenaar [Tue, 10 Mar 2009 06:21:52 +0000 (06:21 +0000)]
Fix a buglet in revision 189401: when restoring a 64-bit BAR,
write the upper 32 bits in the adjacent BAR. The consequences
of the buglet were severe enough though: a machine check.
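
A minimal sketch of the restore step described above, assuming a simplified model in which config space is an array of six 32-bit BAR slots. The struct, constant, and function names are hypothetical and do not come from the PCI code.

```c
#include <stdint.h>

#define SKETCH_NBARS 6	/* assumed number of BAR slots in this model */

/* Hypothetical stand-in for a device's BAR registers. */
struct cfg_sketch {
	uint32_t bar[SKETCH_NBARS];
};

/*
 * Restore a saved 64-bit BAR value: the lower half goes into slot idx,
 * the upper half into the adjacent slot idx + 1.
 * Returns 0 on success, -1 if the pair would not fit in the array.
 */
static int
restore_bar64(struct cfg_sketch *cfg, int idx, uint64_t saved)
{
	if (idx < 0 || idx + 1 >= SKETCH_NBARS)
		return (-1);
	cfg->bar[idx] = (uint32_t)(saved & 0xffffffffu);
	cfg->bar[idx + 1] = (uint32_t)(saved >> 32);
	return (0);
}
```

Writing the upper half anywhere other than slot idx + 1 would leave the device decoding the wrong address range.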
Alan Cox [Tue, 10 Mar 2009 02:12:03 +0000 (02:12 +0000)]
Eliminate the last use of the recursive mapping to access user-space page
table pages. Now, all accesses to user-space page table pages are
performed through the direct map. (The recursive mapping is only used
to access kernel-space page table pages.)
Eliminate the TLB invalidation on the recursive mapping when a user-space
page table page is removed from the page table and when a user-space
superpage is demoted.
Alexander Motin [Mon, 9 Mar 2009 20:48:57 +0000 (20:48 +0000)]
Add type specific suspend/resume ata channel functions. Add checks to avoid
crash on detached channel resume. Add placeholder for possible type-specific
suspend/resume routines.
John Baldwin [Mon, 9 Mar 2009 19:35:20 +0000 (19:35 +0000)]
Adjust some variables (mostly related to the buffer cache) that hold
address space sizes to be longs instead of ints. Specifically, the following
values are now longs: runningbufspace, bufspace, maxbufspace,
bufmallocspace, maxbufmallocspace, lobufspace, hibufspace, lorunningspace,
hirunningspace, maxswzone, maxbcache, and maxpipekva. Previously, a
relatively small number (~ 44000) of buffers set in kern.nbuf would result
in integer overflows resulting either in hangs or bogus values of
hidirtybuffers and lodirtybuffers. Now one has to overflow a long to see
such problems. There was a check for a nbuf setting that would cause
overflows in the auto-tuning of nbuf. I've changed it to always check and
cap nbuf but warn if a user-supplied tunable would cause overflow.
Note that this changes the ABI of several sysctls that are used by things
like top(1), etc., so any MFC would probably require some gross shims
to allow for that.
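
To see how a buffer count near 44000 could overflow an int, consider the arithmetic sketch below. It rests on two assumptions that the message does not state: that each buffer accounts for 16 KiB of address space, and that the tuning code evaluates an intermediate product of three times the total. The function and macro names are hypothetical.

```c
#include <limits.h>
#include <stdint.h>

#define SKETCH_BKVASIZE 16384L	/* assumed per-buffer address space */

/*
 * Return 1 if 3 * (nbuf * SKETCH_BKVASIZE) still fits in an int,
 * 0 if the same expression computed in int arithmetic would overflow.
 * The check itself is done in 64-bit arithmetic so it cannot overflow
 * for realistic buffer counts.
 */
static int
three_times_space_fits_int(long nbuf)
{
	int64_t maxbufspace = (int64_t)nbuf * SKETCH_BKVASIZE;

	return (3 * maxbufspace <= INT_MAX);
}
```

Under these assumptions the limit is 43690 buffers, since 3 * 43691 * 16384 = 2147500032 exceeds a 32-bit INT_MAX of 2147483647.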
John Baldwin [Mon, 9 Mar 2009 19:04:53 +0000 (19:04 +0000)]
Move the debug.hashstat sysctl tree under DIAGNOSTIC. I measured the
debug.hashstat.rawnchash sysctl in particular as taking 7 milliseconds on
a 3GHz Intel Xeon (4x2) running 7.1. It accounted for almost a quarter of
the total runtime of 'sysctl -a'. It also performs lots of copyouts while
holding the namecache lock (this does not attempt to fix that).
Bruce M Simpson [Mon, 9 Mar 2009 17:53:05 +0000 (17:53 +0000)]
Merge IGMPv3 and Source-Specific Multicast (SSM) to the FreeBSD
IPv4 stack.
Diffs are minimized against p4.
PCS has been used for some protocol verification, more widespread
testing of recorded sources in Group-and-Source queries is needed.
sizeof(struct igmpstat) has changed.
John Baldwin [Mon, 9 Mar 2009 17:16:29 +0000 (17:16 +0000)]
- Make it possible to disable GPT support by setting LOADER_NO_GPT_SUPPORT
in make.conf or src.conf.
- When GPT is enabled (which it is by default), use memory above 1 MB and
leave the memory from the end of the bss to the end of the 640k window
purely for the stack. The loader has grown and now it is much more
common for the heap and stack to grow into each other when both are
located in the 640k window.
Andrew Thompson [Mon, 9 Mar 2009 17:05:31 +0000 (17:05 +0000)]
Install libusb20.so.1 as libusb.so.1. There will be a follow-up commit to the
ports tree so that programs use libusb from the base by default. Thanks to
Stanislav Sedov for sorting out the ports build.
Warner Losh [Mon, 9 Mar 2009 13:20:23 +0000 (13:20 +0000)]
Fix a long-standing bug in newbus. It was introduced when subclassing
was introduced. If you have a bus, say cardbus, that is derived from
a base-bus (say PCI), then ordinarily all PCI drivers would attach to
cardbus devices. However, there had been one exception: kldload
wouldn't work.
The problem is in devclass_add_driver. In this routine, all we did
was call to the pci device's BUS_DRIVER_ADDED routine. However, since
cardbus bus instances had a different devclass, none of them were
called.
The solution is to call all subclass devclasses, recursively down the
tree, of the class that was loaded. Since we don't have a 'children
class' pointer, we search the whole list of devclasses for a class
whose parent matches. Since this is only done at kldload time, it isn't as
bad as it sounds. In addition, we short-circuit the whole process by
marking those classes with subclasses with a flag. We'll likely have
to reevaluate this method if the number of devclasses with subclasses
gets large.
This means we can remove the "cardbus" lines from all the PCI drivers
since we have no cardbus-specific device attachments in the
tree.
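
The recursive walk described above can be sketched roughly as follows. This is an illustration under simplifying assumptions, not the newbus code: the class list is a plain array, the struct and flag names are hypothetical, and "notifying" a class is reduced to incrementing a counter.

```c
#include <stddef.h>

#define SKETCH_HAS_CHILDREN 0x1	/* hypothetical "has subclasses" flag */

/* Hypothetical, simplified device class record. */
struct devclass_sketch {
	const char *name;
	struct devclass_sketch *parent;	/* base class, or NULL */
	int flags;
	int notified;			/* times the driver was offered */
};

/*
 * Offer a newly loaded driver to class dc and, recursively, to every
 * class derived from it.  There is no child pointer, so subclasses are
 * found by scanning the whole list for entries whose parent is dc.
 * Classes without the flag set are known to have no subclasses, which
 * short-circuits the scan.
 */
static void
notify_class_tree(struct devclass_sketch *all, size_t n,
    struct devclass_sketch *dc)
{
	size_t i;

	dc->notified++;
	if ((dc->flags & SKETCH_HAS_CHILDREN) == 0)
		return;
	for (i = 0; i < n; i++)
		if (all[i].parent == dc)
			notify_class_tree(all, n, &all[i]);
}
```

With a list holding pci, cardbus (derived from pci), and isa, loading a pci driver reaches pci and cardbus but not isa.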
Robert Watson [Mon, 9 Mar 2009 13:12:48 +0000 (13:12 +0000)]
Use a u_int for p_lock instead of a char: this avoids a (somewhat
unlikely but not impossible given modern thread counts) wrap-around,
and the compiler was padding it out to an int (at least) anyway.
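
The wrap-around risk can be illustrated with the sketch below. It is not the kernel's struct proc; the function name is hypothetical, and an unsigned char with 8 bits is assumed so that the wrap is well defined (the signedness of plain char varies by platform).

```c
/*
 * Count "holds" with both an 8-bit counter and an unsigned int, and
 * return whichever the caller asks for.  The narrow counter wraps
 * modulo 256, so 256 outstanding holds look like none at all.
 */
static unsigned int
holds_recorded(unsigned int holds, int use_narrow)
{
	unsigned char narrow = 0;	/* assumed 8 bits wide */
	unsigned int wide = 0;
	unsigned int i;

	for (i = 0; i < holds; i++) {
		narrow++;
		wide++;
	}
	return (use_narrow ? narrow : wide);
}
```

With 256 holds the narrow counter reads 0 while the wide counter reads 256, which is the kind of miscount the wider type avoids.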
Robert Watson [Mon, 9 Mar 2009 13:11:16 +0000 (13:11 +0000)]
Trim comments about the MP-safety of various bits of the amd64/i386
system call entry path and i386 IP checksum generation: we now assume
all code is MPSAFE unless explicitly marked otherwise. Remove XXX
Giant comments along similar lines: the code by the comments either
doesn't need or doesn't want Giant (especially the NMI handler).
Robert Watson [Mon, 9 Mar 2009 10:45:58 +0000 (10:45 +0000)]
Add a new thread-private flag, TDP_AUDITREC, to indicate whether or
not there is an audit record hung off of td_ar on the current thread.
Test this flag instead of td_ar when auditing syscall arguments or
checking for an audit record to commit on syscall return. Under
these circumstances, td_pflags is much more likely to be in the cache
(especially if there is no auditing of the current system call), so
this should help reduce cache misses in the system call return path.
Pyun YongHyeon [Mon, 9 Mar 2009 08:17:46 +0000 (08:17 +0000)]
For IP1001 PHYs, read auto-negotiation advertisement register to
get default next page configuration. While I'm here explicitly set
IP1000PHY_ANAR_CSMA bit. This bit is read-only and always set
by hardware so setting it has no effect but it makes the
intention clear. With this change controllers that couldn't establish
1000baseT link should work.
Robert Noland [Mon, 9 Mar 2009 07:47:03 +0000 (07:47 +0000)]
Change the flags to bus_dmamem around to allow it to sleep waiting for
resources during allocation, but not during map load. Also, zero the
buffers here.
Robert Noland [Mon, 9 Mar 2009 07:38:22 +0000 (07:38 +0000)]
Fix the flags to bus_dmamem_* to allow the allocation to sleep while
waiting for resources. It is really the load that we can't defer.
BUS_DMA_NOCACHE belongs on bus_dmamap_load() as well.
Pyun YongHyeon [Mon, 9 Mar 2009 06:02:55 +0000 (06:02 +0000)]
Add a new tunable hw.re.prefer_iomap which disables memory register
mapping. The tunable is OFF for all controllers except RTL8169SC
family. RTL8169SC seems to require more magic to use memory
register mapping. r187483 added a fix for RTL8169SCe controller but
it does not look like it fixes other variants of RTL8169SC.
Tested by: Gavin Stone-Tolcher g.stone-tolcher <> its dot uq dot edu dot au
Alan Cox [Mon, 9 Mar 2009 03:35:25 +0000 (03:35 +0000)]
Change pmap_enter_quick_locked() so that it uses the kernel's direct map
instead of the pmap's recursive mapping to access the lowest level of the
page table when it maps a user-space virtual address.
Andrew Thompson [Sun, 8 Mar 2009 22:58:19 +0000 (22:58 +0000)]
MFp4 //depot/projects/usb@158868
Fix bugs and improve HID parsing.
- fix possible memory leak
- fix possible NULL pointer access
- fix possible invalid memory read
- parsing improvements
- reset item data position when a new report ID is detected.
Robert Watson [Sun, 8 Mar 2009 22:19:28 +0000 (22:19 +0000)]
By default, don't compile in counters of calls to various time
query functions in the kernel, as these effectively serialize
parallel calls to the gettimeofday(2) system call, as well as
other kernel services that use timestamps.
Use the NetBSD version of the fix (kern_tc.c:1.32 by ad@) as
they have picked up our timecounter code and also ran into the
same problem.
Reported by: kris
Obtained from: NetBSD
MFC after: 3 days
Robert Watson [Sun, 8 Mar 2009 21:48:29 +0000 (21:48 +0000)]
Decompose the global UNIX domain sockets rwlock into two different
locks: a global list/counter/generation counter protected by a new
mutex unp_list_lock, and a global linkage rwlock, unp_global_rwlock,
which protects the connections between UNIX domain sockets.
This eliminates conditional lock acquisition that was previously a
property of the global lock being held over sonewconn() leading to a
call to uipc_attach(), which also required the global lock, but
couldn't rely on it as other paths existed to uipc_attach() that
didn't hold it: now uipc_attach() uses only the list lock, which
follows the linkage lock in the lock order. It may also reduce
contention on the global lock for some workloads.
Add global UNIX domain socket locks to hard-coded witness lock
order.
Ed Schouten [Sun, 8 Mar 2009 19:09:55 +0000 (19:09 +0000)]
Don't disable CR-to-NL translation when waiting for data to arrive.
A difference between the old and the new TTY layer is that the new
implementation does not perform any post-processing before returning
data back to userspace when calling read().
sh(1)'s read turns the TTY into a raw mode before calling select(). This
means that the first character will not receive any ICRNL processing.
Inherit this flag from the original terminal attributes.
Even though this issue is not present on RELENG_*, I'm MFCing it to make
sh(1) in jails behave better.
Add a default implementation for VOP_VPTOCNP(9) which scans the parent
directory of a vnode to find a dirent with a matching file number. The
name from that dirent is then used to provide the component name.
Note: if the initial vnode argument is not a directory itself, then
the default VOP_VPTOCNP(9) implementation still returns ENOENT.
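
The directory scan described above has a rough userland analogue, sketched below with opendir(3) and readdir(3). This is only an illustration of the matching idea, not the VFS implementation; the function name is hypothetical, and the POSIX field d_ino is used for the file number.

```c
#include <dirent.h>
#include <string.h>
#include <sys/types.h>

/*
 * Scan parent_dir for an entry whose file number equals target and copy
 * its name into buf.  Returns 0 on success, -1 if the directory cannot
 * be opened, no entry matches, or the name does not fit in buf.
 */
static int
name_for_fileno(const char *parent_dir, ino_t target, char *buf,
    size_t buflen)
{
	DIR *dp;
	struct dirent *de;
	int result = -1;

	if ((dp = opendir(parent_dir)) == NULL)
		return (-1);
	while ((de = readdir(dp)) != NULL) {
		/* "." and ".." are links, not the component name we want. */
		if (strcmp(de->d_name, ".") == 0 ||
		    strcmp(de->d_name, "..") == 0)
			continue;
		if (de->d_ino != target)
			continue;
		if (strlen(de->d_name) < buflen) {
			strcpy(buf, de->d_name);
			result = 0;
		}
		break;
	}
	closedir(dp);
	return (result);
}
```

The cost is linear in the size of the parent directory, which is the trade-off a generic fallback like this accepts.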
Robert Watson [Sun, 8 Mar 2009 12:32:06 +0000 (12:32 +0000)]
Remove 'uio' argument from MAC Framework and MAC policy entry points for
extended attribute get/set; in the case of get an uninitialized user
buffer was passed before the EA was retrieved, making it of relatively
little use; in the case of set it was simply unused by any policies.
Obtained from: TrustedBSD Project
Sponsored by: Google, Inc.
Robert Watson [Sun, 8 Mar 2009 10:58:37 +0000 (10:58 +0000)]
Improve the consistency of MAC Framework and MAC policy entry point
naming by renaming certain "proc" entry points to "cred" entry points,
reflecting their manipulation of credentials. For some entry points,
the process was passed into the framework but not into policies; in
these cases, stop passing in the process since we don't need it.
Tim Kientzle [Sun, 8 Mar 2009 06:03:15 +0000 (06:03 +0000)]
Merge r687-689,691,693-701,720 from libarchive.googlecode.com:
Translate getdate.y into C for portability. Make the get_date()
function easier to test as well:
* Have it accept a time_t "now" to use as a reference so that test
code can verify relative time specifications against known starting
points.
* Set up default date after parsing the string so that we
can use the specified timezone (if any) instead of the local
default. Otherwise, local DST makes it almost impossible to
reliably test time specifications such as "sunday UTC"
Tim Kientzle [Sun, 8 Mar 2009 05:47:21 +0000 (05:47 +0000)]
Merge r629-631,633-646,648,654,678,681,682 from libarchive.googlecode.com:
Many changes for Windows compatibility. bsdtar_test now runs successfully
on both POSIX platforms and Windows.
Tim Kientzle [Sun, 8 Mar 2009 05:38:45 +0000 (05:38 +0000)]
Merge r368,496,625,626 from libarchive.googlecode.com: A number of
style and portability tweaks to the test harness. Most significantly,
don't use getopt().