Nathan Whitehorn [Wed, 11 Mar 2009 03:19:19 +0000 (03:19 +0000)]
Change the PVO zone for fictitious pages to the unmanaged PVO zone, to match
the unmanaged flag set in the PVO attributes. Without doing this,
pmap_remove() could try to remove fictitious pages (like those created
by mmap of physical memory) from the wrong UMA zone, causing a panic.
Sam Leffler [Wed, 11 Mar 2009 01:12:52 +0000 (01:12 +0000)]
o disallow write to RedBoot and FIS directory partitions; these are painful
to resurrect (maybe honor foot shooting bit in kern.geom_debugflags)
o fix match macro so we now recognize we want to merge FIS dir with RedBoot
config parameters even if we don't actually do it
Robert Watson [Wed, 11 Mar 2009 00:29:22 +0000 (00:29 +0000)]
Add INP_INHASHLIST flag for inpcb->inp_flags to indicate whether
or not the inpcb is currently on various hash lookup lists, rather
than using (lport != 0) to detect this. This means that the full
4-tuple of a connection can be retained after close, which should
lead to more sensible netstat output in the window between TCP
close and socket close.
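A minimal sketch of the idea, using a simplified stand-in for the inpcb. The struct layout, the flag value and the sk_* helper names are assumptions made for illustration; they are not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag value; the real INP_INHASHLIST bit may differ. */
#define SK_INP_INHASHLIST 0x01

/* Simplified stand-in for an inpcb: only the 4-tuple and the flags. */
struct sk_inpcb {
    uint32_t laddr, faddr;
    uint16_t lport, fport;
    int      inp_flags;
};

/* Mark the pcb as being on the hash lookup lists. */
static void
sk_hash_insert(struct sk_inpcb *inp)
{
    assert(inp != NULL);
    inp->inp_flags |= SK_INP_INHASHLIST;
}

/* Take the pcb off the lists; the 4-tuple is deliberately left intact. */
static void
sk_hash_remove(struct sk_inpcb *inp)
{
    assert(inp != NULL);
    inp->inp_flags &= ~SK_INP_INHASHLIST;
}

/* Membership is read from the flag, not inferred from lport != 0. */
static bool
sk_on_hash_list(const struct sk_inpcb *inp)
{
    assert(inp != NULL);
    return (inp->inp_flags & SK_INP_INHASHLIST) != 0;
}
```

Because list membership no longer depends on lport, a tool in the style of netstat could still print the full 4-tuple after sk_hash_remove() has run.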
Sam Leffler [Tue, 10 Mar 2009 22:29:42 +0000 (22:29 +0000)]
choose the size of the last region for d_stripsize instead of the first;
this fixes geom_redboot on boards that have multiple parts/regions as it
uses the value to locate the FIS directory which is in the last erase
region of flash
Sam Leffler [Tue, 10 Mar 2009 21:49:22 +0000 (21:49 +0000)]
add IXP4XX_FLASH_SIZE config knob that can be used to override the default
flash size; this is necessary at the moment because we map all of flash at
boot, eventually we'll do this on the fly
John Baldwin [Tue, 10 Mar 2009 21:28:43 +0000 (21:28 +0000)]
- Make maxpipekva a signed long rather than an unsigned long as overflow
is more likely to be noticed with signed types.
- Make amountpipekva a long as well to match maxpipekva.
John Baldwin [Tue, 10 Mar 2009 21:27:15 +0000 (21:27 +0000)]
In the ABI shim for vfs.bufspace, rather than truncating values larger than
INT_MAX to INT_MAX, just go ahead and write out the full long to give an
error of ENOMEM to the user process.
Update the Chelsio driver to the latest bits from Chelsio
Firmware upgraded to 7.1.0 (from 5.0.0).
T3C EEPROM and SRAM added; Code to update eeprom/sram fixed.
fl_empty and rx_fifo_ovfl counters can be observed via sysctl.
Two new cxgbtool commands to get uP logic analyzer info and uP IOQs
Synced up with Chelsio's "common code" (as of 03/03/09)
Sam Leffler [Tue, 10 Mar 2009 19:15:35 +0000 (19:15 +0000)]
o add missing bus_release_resource and bus_deactivate_resource that just
operate on the resource (we have no local resources to manage); this
fixes drivers that alloc/release resources in their probe method and
then do it again in attach
o while here add some prints to catch failures and massage style a bit
John Baldwin [Tue, 10 Mar 2009 18:41:06 +0000 (18:41 +0000)]
- Remove code to set SAVENAME for CREATE or RENAME requests that get a -ve
hit in the name cache. cache_lookup() doesn't actually return ENOENT
for such requests to force the filesystem to do an explicit lookup, so
this was effectively dead code.
- Grab the nfsnode mutex while writing to n_dmtime. We don't grab the lock
when comparing the time against the cached directory mod time (just as
we don't when comparing ctime's for +ve name cache hits) since the
attribute caching is already racy for NFS clients as it is.
Sam Leffler [Tue, 10 Mar 2009 17:16:16 +0000 (17:16 +0000)]
Small cleanup of memory resource allocation from Cambria branch:
o encode need for A4 bus space tag hackery according to the memory
address; checking for "uart" breaks down with the GPS chip support
which is also a uart but does not require the same hackery
o encode the correct memory window instead of carving up all of i/o
space, potentially with a larger window than a device should have;
this likely should be handled in the drivers by using a proper bus
alloc call but since some drivers depend on the bus support to figure
this out we cannot simply mod them
o add optional GPS and RS485 support (conditionally as the support
isn't ready yet)
John Baldwin [Tue, 10 Mar 2009 17:00:28 +0000 (17:00 +0000)]
- Remove a recently added comment from kernel_sysctlbyname() that isn't
needed.
- Move the release of the sysctl sx lock after the vsunlock() in
userland_sysctl() to restore the original memlock behavior of
minimizing the amount of memory wired to handle sysctl requests.
John Baldwin [Tue, 10 Mar 2009 15:26:50 +0000 (15:26 +0000)]
Add an ABI compat shim for the vfs.bufspace sysctl for sysctl requests that
try to fetch it as an int rather than a long. If the current value is
greater than INT_MAX it reports a value of INT_MAX.
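The clamping behavior described here can be sketched in user-space C as follows. sk_bufspace_out() is a hypothetical helper, not the kernel's sysctl handler, and it assumes the value is never negative.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy `value` into `buf`, honoring callers that
 * still pass an int-sized buffer. Assumes value >= 0.
 * Returns the number of bytes written, or 0 if the size is unsupported.
 */
static size_t
sk_bufspace_out(long value, void *buf, size_t buflen)
{
    assert(buf != NULL);
    if (buflen == sizeof(int) && sizeof(int) != sizeof(long)) {
        /* Old ABI: report INT_MAX when the real value does not fit. */
        int ivalue = (value > INT_MAX) ? INT_MAX : (int)value;
        memcpy(buf, &ivalue, sizeof(ivalue));
        return sizeof(ivalue);
    }
    if (buflen >= sizeof(long)) {
        /* New ABI: the caller gets the full long. */
        memcpy(buf, &value, sizeof(value));
        return sizeof(value);
    }
    return 0;
}
```

On platforms where int and long have the same size, the first branch is skipped and the value is copied unchanged.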
Guido van Rooij [Tue, 10 Mar 2009 15:23:43 +0000 (15:23 +0000)]
When attaching a geli on boot make sure that it is detached
upon last close. (needed for a gmirror to properly shut down
upon reboot when a geli is on top of the gmirror)
Guido van Rooij [Tue, 10 Mar 2009 15:19:49 +0000 (15:19 +0000)]
When swap resides on a mirror and it is not stopped, the mirror
is degraded upon the next reboot and will have to be rebuilt.
Thus call swapoff when rebooting (read: when stopping swap1)
Robert Watson [Tue, 10 Mar 2009 14:52:17 +0000 (14:52 +0000)]
Add tcpp -- TCP parallelism microbenchmark.
This tool creates large numbers of TCP connections, each of which will
transmit a fixed amount of data, between client and server hosts. tcpp can
use multiple workers (typically up to the number of hardware cores), and can
use multiple source IPs in order to use an expanded port/IP 4-tuple space to
avoid problems from reusing 4-tuples too quickly. Aggregate bandwidth use
will be reported after a client run.
While by no means a perfect tool, it has proven quite useful in generating
and optimizing TCP stack lock contention by easily generating high-intensity
workloads. It also proves surprisingly good at finding device driver bugs.
Disable zerocopy by default for now. It's causing some problems in pcap
consumers which fork after the shared pages have been setup. pflogd(8)
is an example. The problem is understood and there is a fix coming in
shortly.
Folks who want to continue using it can do so by setting
Robert Watson [Tue, 10 Mar 2009 11:46:41 +0000 (11:46 +0000)]
Merge r183430 from vendor/top/dist to head/contrib/top, although with
record-only mergeinfo because an automated merge is confused by the
flattening that took place:
Move install to install-sh to prevent name-clashes.
Ed Schouten [Tue, 10 Mar 2009 11:28:54 +0000 (11:28 +0000)]
Make a 1:1 mapping between syscons stats and terminal emulators.
After I imported libteken into the source tree, I noticed syscons didn't
store the cursor position inside the terminal emulator, but inside the
virtual terminal stat. This is not very useful, because when you
implement more complex forms of line wrapping, you need to keep track of
more state than just the cursor position.
Because the kernel messages didn't share the same terminal emulator as
ttyv0, this caused a lot of strange things, like kernel messages being
misplaced and the terminal emulator for kernel messages never being
resized when using vidcontrol, because the resize notification was missing.
This patch just removes kernel_console_ts and adds a special parameter
to te_puts to determine whether messages should be printed using regular
colors or the ones for kernel messages.
Marcel Moolenaar [Tue, 10 Mar 2009 06:21:52 +0000 (06:21 +0000)]
Fix a buglet in revision 189401: when restoring a 64-bit BAR,
write the upper 32 bits in the adjacent BAR. The consequences
of the buglet were severe enough though: a machine check.
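As an illustration of the rule stated above, here is a small sketch that models the BAR registers as a plain array. sk_restore_bar64() and the array model are assumptions for illustration; the real code writes PCI configuration space.

```c
#include <assert.h>
#include <stdint.h>

#define SK_NUM_BARS 6  /* a type 0 PCI header has six 32-bit BAR slots */

/*
 * Restore a 64-bit BAR that starts at slot `idx`: the low 32 bits go
 * into that slot and the high 32 bits into the adjacent one.
 */
static void
sk_restore_bar64(uint32_t bars[SK_NUM_BARS], int idx, uint64_t value)
{
    assert(idx >= 0 && idx + 1 < SK_NUM_BARS);
    bars[idx]     = (uint32_t)(value & 0xffffffffu); /* lower half */
    bars[idx + 1] = (uint32_t)(value >> 32);         /* upper half */
}
```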
Alan Cox [Tue, 10 Mar 2009 02:12:03 +0000 (02:12 +0000)]
Eliminate the last use of the recursive mapping to access user-space page
table pages. Now, all accesses to user-space page table pages are
performed through the direct map. (The recursive mapping is only used
to access kernel-space page table pages.)
Eliminate the TLB invalidation on the recursive mapping when a user-space
page table page is removed from the page table and when a user-space
superpage is demoted.
Alexander Motin [Mon, 9 Mar 2009 20:48:57 +0000 (20:48 +0000)]
Add type specific suspend/resume ata channel functions. Add checks to avoid
crash on detached channel resume. Add placeholder for possible type-specific
suspend/resume routines.
John Baldwin [Mon, 9 Mar 2009 19:35:20 +0000 (19:35 +0000)]
Adjust some variables (mostly related to the buffer cache) that hold
address space sizes to be longs instead of ints. Specifically, the following
values are now longs: runningbufspace, bufspace, maxbufspace,
bufmallocspace, maxbufmallocspace, lobufspace, hibufspace, lorunningspace,
hirunningspace, maxswzone, maxbcache, and maxpipekva. Previously, a
relatively small number (~ 44000) of buffers set in kern.nbuf would result
in integer overflows resulting either in hangs or bogus values of
hidirtybuffers and lodirtybuffers. Now one has to overflow a long to see
such problems. There was a check for a nbuf setting that would cause
overflows in the auto-tuning of nbuf. I've changed it to always check and
cap nbuf but warn if a user-supplied tunable would cause overflow.
Note that this changes the ABI of several sysctls that are used by things
like top(1), etc., so any MFC would probably require some gross shims
to allow for that.
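A rough sketch of the check-and-cap idea, written as standalone C. The helper names and the per-buffer size used in the usage note are assumptions chosen for illustration; the commit does not state the kernel's actual formula.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* True if nbuf * per_buf_bytes would exceed `max`, without overflowing. */
static bool
sk_would_overflow(long nbuf, long per_buf_bytes, long max)
{
    assert(nbuf >= 0 && per_buf_bytes > 0);
    return nbuf > max / per_buf_bytes;
}

/*
 * Cap nbuf so that nbuf * per_buf_bytes fits in `max`. *capped tells the
 * caller whether a warning about the user-supplied value is warranted.
 */
static long
sk_cap_nbuf(long nbuf, long per_buf_bytes, long max, bool *capped)
{
    assert(capped != NULL);
    *capped = sk_would_overflow(nbuf, per_buf_bytes, max);
    return *capped ? max / per_buf_bytes : nbuf;
}
```

With an assumed 65536 bytes per buffer and max set to INT_MAX, sk_cap_nbuf(44000, ...) reports a cap, while the same request measured against a 64-bit limit would pass unchanged.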
John Baldwin [Mon, 9 Mar 2009 19:04:53 +0000 (19:04 +0000)]
Move the debug.hashstat sysctl tree under DIAGNOSTIC. I measured the
debug.hashstat.rawnchash sysctl in particular as taking 7 milliseconds on
a 3GHz Intel Xeon (4x2) running 7.1. It accounted for almost a quarter of
the total runtime of 'sysctl -a'. It also performs lots of copyout's while
holding the namecache lock (this does not attempt to fix that).
Bruce M Simpson [Mon, 9 Mar 2009 17:53:05 +0000 (17:53 +0000)]
Merge IGMPv3 and Source-Specific Multicast (SSM) to the FreeBSD
IPv4 stack.
Diffs are minimized against p4.
PCS has been used for some protocol verification, more widespread
testing of recorded sources in Group-and-Source queries is needed.
sizeof(struct igmpstat) has changed.
John Baldwin [Mon, 9 Mar 2009 17:16:29 +0000 (17:16 +0000)]
- Make it possible to disable GPT support by setting LOADER_NO_GPT_SUPPORT
in make.conf or src.conf.
- When GPT is enabled (which it is by default), use memory above 1 MB and
leave the memory from the end of the bss to the end of the 640k window
purely for the stack. The loader has grown and now it is much more
common for the heap and stack to grow into each other when both are
located in the 640k window.
Andrew Thompson [Mon, 9 Mar 2009 17:05:31 +0000 (17:05 +0000)]
Install libusb20.so.1 as libusb.so.1, there will be a followup commit to the
ports tree so that programs use libusb from the base by default. Thanks to
Stanislav Sedov for sorting out the ports build.
Warner Losh [Mon, 9 Mar 2009 13:20:23 +0000 (13:20 +0000)]
Fix a long-standing bug in newbus. It was introduced when subclassing
was introduced. If you have a bus, say cardbus, that is derived from
a base-bus (say PCI), then ordinarily all PCI drivers would attach to
cardbus devices. However, there had been one exception: kldload
wouldn't work.
The problem is in devclass_add_driver. In this routine, all we did
was call to the pci device's BUS_DRIVER_ADDED routine. However, since
cardbus bus instances had a different devclass, none of them were
called.
The solution is to call all subclass devclasses, recursively down the
tree, of the class that was loaded. Since we don't have a 'children
class' pointer, we search the whole list of devclasses for a class
whose parent matches. Since this is only done at kldload time, this isn't as
bad as it sounds. In addition, we short-circuit the whole process by
marking those classes with subclasses with a flag. We'll likely have
to reevaluate this method if the number of devclasses with subclasses
gets large.
This means we can remove the "cardbus" lines from all the PCI drivers
since we have no cardbus-specific device attachments in the
tree.
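The search described in this commit message can be sketched roughly as below. The sk_devclass struct, the flag name and the notification counter are simplified assumptions; they only model the "scan the whole list for matching parents, recurse, and short-circuit on a flag" approach, not the newbus data structures.

```c
#include <assert.h>
#include <stddef.h>

#define SK_DC_HAS_CHILDREN 0x1  /* hypothetical flag name */

struct sk_devclass {
    const char         *name;
    struct sk_devclass *parent;   /* NULL for a base class */
    int                 flags;
    int                 notified; /* counts driver-added notifications */
};

/*
 * Notify dc and, recursively, every class in `all` derived from it.
 * There is no child pointer, so each level scans the whole list.
 */
static void
sk_driver_added(struct sk_devclass *dc, struct sk_devclass *all[], size_t n)
{
    assert(dc != NULL);
    dc->notified++;
    if ((dc->flags & SK_DC_HAS_CHILDREN) == 0)
        return; /* short-circuit: no subclasses to look for */
    for (size_t i = 0; i < n; i++) {
        if (all[i]->parent == dc)
            sk_driver_added(all[i], all, n);
    }
}
```

With a list holding pci (flagged as having children), cardbus (parent pci) and an unrelated class, calling sk_driver_added() on pci reaches pci and cardbus and leaves the unrelated class untouched.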
Robert Watson [Mon, 9 Mar 2009 13:12:48 +0000 (13:12 +0000)]
Use a u_int for p_lock instead of a char: this avoids a (somewhat
unlikely but not impossible given modern thread counts) wrap-around,
and the compiler was padding it out to an int (at least) anyway.
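To make the wrap-around concrete, here is a small sketch that bumps a narrow and a wide counter side by side. It uses unsigned char for the narrow one as an assumption, since the signedness of plain char is implementation-defined; sk_hold_n() is a hypothetical name.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Increment a narrow and a wide hold count `n` times each. */
static void
sk_hold_n(unsigned int n, unsigned char *narrow, unsigned int *wide)
{
    assert(narrow != NULL && wide != NULL);
    for (unsigned int i = 0; i < n; i++) {
        /* The narrow counter wraps modulo UCHAR_MAX + 1. */
        *narrow = (unsigned char)(*narrow + 1);
        (*wide)++;
    }
}
```

After UCHAR_MAX + 1 holds the narrow counter is back at zero, as if nothing held the lock, while the u_int-sized counter still records every hold.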
Robert Watson [Mon, 9 Mar 2009 13:11:16 +0000 (13:11 +0000)]
Trim comments about the MP-safety of various bits of the amd64/i386
system call entry path and i386 IP checksum generation: we now assume
all code is MPSAFE unless explicitly marked otherwise. Remove XXX
Giant comments along similar lines: the code by the comments either
doesn't need or doesn't want Giant (especially the NMI handler).
Robert Watson [Mon, 9 Mar 2009 10:45:58 +0000 (10:45 +0000)]
Add a new thread-private flag, TDP_AUDITREC, to indicate whether or
not there is an audit record hung off of td_ar on the current thread.
Test this flag instead of td_ar when auditing syscall arguments or
checking for an audit record to commit on syscall return. Under
these circumstances, td_pflags is much more likely to be in the cache
(especially if there is no auditing of the current system call), so
this should help reduce cache misses in the system call return path.
Pyun YongHyeon [Mon, 9 Mar 2009 08:17:46 +0000 (08:17 +0000)]
For IP1001 PHYs, read auto-negotiation advertisement register to
get default next page configuration. While I'm here explicitly set
IP1000PHY_ANAR_CSMA bit. This bit is read-only and always set
by hardware so setting it has no effect, but it makes the intention
clear. With this change controllers that couldn't establish
1000baseT link should work.
Robert Noland [Mon, 9 Mar 2009 07:47:03 +0000 (07:47 +0000)]
Change the flags to bus_dmamem around to allow it to sleep waiting for
resources during allocation, but not during map load. Also, zero the
buffers here.
Robert Noland [Mon, 9 Mar 2009 07:38:22 +0000 (07:38 +0000)]
Fix the flags to bus_dmamem_* to allow the allocation to sleep while
waiting for resources. It is really the load that we can't defer.
BUS_DMA_NOCACHE belongs on bus_dmamap_load() as well.
Pyun YongHyeon [Mon, 9 Mar 2009 06:02:55 +0000 (06:02 +0000)]
Add a new tunable hw.re.prefer_iomap which disables memory register
mapping. The tunable is OFF for all controllers except RTL8169SC
family. RTL8169SC seems to require more magic to use memory
register mapping. r187483 added a fix for RTL8169SCe controller but
it does not look like it fixes other variants of RTL8169SC.
Tested by: Gavin Stone-Tolcher g.stone-tolcher <> its dot uq dot edu dot au
Alan Cox [Mon, 9 Mar 2009 03:35:25 +0000 (03:35 +0000)]
Change pmap_enter_quick_locked() so that it uses the kernel's direct map
instead of the pmap's recursive mapping to access the lowest level of the
page table when it maps a user-space virtual address.