John Baldwin [Mon, 18 Jul 2011 21:15:47 +0000 (21:15 +0000)]
Rework the dynamic per-CPU stats code a bit. Always set 'statics->ncpus'
to the maximum number of CPUs to ensure that lcpustates[] array is always
allocated to the maximum size. Previously, if top was started without
per-CPU stats it would allocate a smaller lcpustates[] array. When
per-CPU stats were then enabled, it would overflow the array and trash
the cpustates_columns[] array causing the CPU stats to be printed in the
wrong locations.
Fix building of 32-bit compat libraries on amd64 with clang, and using
-g, by reverting r219139. The LLVM PR referenced in that revision was
fixed in the mean time, and we imported a clang snapshot soon
afterwards, so the temporary workaround of disabling clang's integrated
assembler is no longer needed.
In this particular case, using e.g. DEBUG_FLAGS=-g causes clang to
output certain directives into assembly that our version of GNU as
chokes on.
sintrcnt/sintrnames is the address of the size, not the actual size.
Use them appropriately to fetch the actual size.
That fixes vmstat -i with kvm backend.
John Baldwin [Mon, 18 Jul 2011 17:33:08 +0000 (17:33 +0000)]
- Export each thread's individual resource usage in in struct kinfo_proc's
ki_rusage member when KERN_PROC_INC_THREAD is passed to one of the
process sysctls.
- Correctly account for the current thread's cputime in the thread when
doing the runtime fixup in calcru().
- Use TIDs as the key to lookup the previous thread to compute IO stat
deltas in IO mode in top when thread display is enabled.
- Remove the eintrcnt/eintrnames usage and introduce the concept of
sintrcnt/sintrnames which are symbols containing the size of the 2
tables.
- For amd64/i386 remove the storage of intr* stuff from assembly files.
This area can be widely improved by applying the same to other
architectures and likely finding an unified approach among them and
move the whole code to be MI. More work in this area is expected to
happen fairly soon.
No MFC is previewed for this patch.
Tested by: pluknet
Reviewed by: jhb
Approved by: re (kib)
If compiling RESCUE always ignore feature_present(3) calls so that
a /rescue/ifconfig more modern than the kernel could still configure
IPv4 or IPv6 addresses.
Reported by: Andrzej Tobola (ato iem.pw.edu.pl)
Reported by: gcooper
MFC after: 1 day
X-MFC: will not MFC any time soon, just reminder for r222527
Martin Matuska [Mon, 18 Jul 2011 08:29:49 +0000 (08:29 +0000)]
ZFS tries to allocate blocks evenly across all devices. This means when
devices are imbalanced zfs will lots of CPU searching for space on devices
which tend to be pretty full. It should instead fail quickly on the full
devices and move onto devices which have more availability.
New loader tunable: vfs.zfs.mg_alloc_failures (min = 8)
cddl/contrib/opensolaris/cmd/zpool/zpool_main.c:
cddl/contrib/opensolaris/cmd/zpool/zpool.8:
cddl/contrib/opensolaris/lib/libzfs/common/libzfs_import.c:
Add the "zpool labelclear" command. This command can be
used to wipe the label data from a drive that is not
active in a pool. The optional "-f" argument can be
used to treat an exported or foreign vdev as "inactive"
thus allowing its label information to be cleared.
Correct reporting of missing leaf vdevs so that the GUID required to
perform pool actions is always displayed.
cddl/contrib/opensolaris/cmd/zpool/zpool_main.c:
The "zpool status" command reports the "last seen at"
device node path when the vdev name is being reported
by GUID. Augment this code to assume a GUID is reported
when a device goes missing after initial boot in addition
to the previous behavior of doing this for devices that
aren't seen at boot.
cddl/contrib/opensolaris/lib/libzfs/common/libzfs_pool.c:
In zpool_vdev_name(), report recently missing devices
by GUID. There is no guarantee they will return at
their previous location.
cddl/contrib/opensolaris/lib/libzfs/common/libzfs.h:
cddl/contrib/opensolaris/lib/libzfs/common/libzfs_pool.c:
o Add zpool_pool_state_to_name() API to libzfs which converts a
pool_state_t into a user consumable string.
o While here, correct constness of make zpool_state_to_name()
and zpool_label_disk().
Robert Watson [Sun, 17 Jul 2011 23:05:24 +0000 (23:05 +0000)]
Define two new sysctl node flags: CTLFLAG_CAPRD and CTLFLAG_CAPRW, which
may be jointly referenced via the mask CTLFLAG_CAPRW. Sysctls with these
flags are available in Capsicum's capability mode; other sysctl nodes are
not.
Flag several useful sysctls as available in capability mode, such as memory
layout sysctls required by the run-time linker and malloc(3). Also expose
access to randomness and available kernel features.
A few sysctls are enabled to support name->MIB conversion; these may leak
information to capability mode by virtue of providing resolution on names
not flagged for access in capability mode. This is, generally, not a huge
problem, but might be something to resolve in the future. Flag these cases
with XXX comments.
Revert r222135 by allowing controller reinitialization. Due to
unknown reason Apple UniNorth2 gem(4) device required manual
interface down/up operation after r222135. Even though this is not
correct thing and I don't like to revert it but it would be better
than breaking gem(4) on PPC. This should be revisited.
Ryan Stone [Sun, 17 Jul 2011 21:53:42 +0000 (21:53 +0000)]
Fix a LOR between hwpmc and the kernel linker. When a system-wide
sampling mode PMC is allocated, hwpmc calls linker_hwpmc_list_objects()
while already holding an exclusive lock on pmc-sx lock. list_objects()
tries to acquire an exclusive lock on the kld_sx lock. When a KLD module
is loaded or unloaded successfully, kern_kld(un)load calls into the pmc
hook while already holding an exclusive lock on the kld_sx lock. Calling
the pmc hook requires acquiring a shared lock on the pmc-sx lock.
Fix this by only acquiring a shared lock on the kld_sx lock in
linker_hwpmc_list_objects(), and also downgrading to a shared lock on the
kld_sx lock in kern_kld(un)load before calling into the pmc hook. In
kern_kldload this required moving some modifications of the linker_file_t
to happen before calling into the pmc hook.
This fixes the deadlock by ensuring that the hwpmc -> list_objects() case
is always able to proceed. Without this patch, I was able to deadlock a
multicore system within minutes by constantly loading and unloading an KLD
module while I simultaneously started a sampling mode PMC in a loop.
Add spares to the network stack for FreeBSD-9:
- TCP keep* timers
- TCP UTO (adjust from what was there already)
- netmap
- route caching
- user cookie (temporary to allow for the real fix)
Slightly re-shuffle struct ifnet moving fields out of the middle
of spares and to better align.
Ryan Stone [Sun, 17 Jul 2011 21:08:16 +0000 (21:08 +0000)]
The MBR uses a 32-bit unsigned integer to store the size of a slice, but
fdisk(1) internally uses a signed int. Should a user attempt to specify
a slice containing more than 2^31 - 1 sectors, an error will be reported
on systems with sizeof(long) == 4 and the slice size will be silently
truncated on systems with sizeof(long) > 4.
Instead use an unsigned long to store the slice size in fdisk(1). This
allows the user to specify a slice size up to the maximum permitted by
the MBR on-disk format and does not have any problems with silent
truncation should the use specify an slice size larger than 2^32 on systems
with sizeof(long) > 4.
Submitted by: Mark Johnston (markjdb AT gmail DOT com)
MFC after: 2 weeks
Hiroki Sato [Sun, 17 Jul 2011 19:24:54 +0000 (19:24 +0000)]
- Improve interface list handling. The rtadvd(8) now supports dynamically-
added/removed interfaces in a more consistent manner and reloading the
configuration file.
- Implement burst unsolicited RA sending into the internal RA timer framework
when AdvSendAdvertisements and/or configuration entries are changed as
described in RFC 4861 6.2.4. This fixes issues that make termination of the
rtadvd(8) daemon take very long time.
An interface now has three internal states, UNCONFIGURED, TRANSITIVE, or
CONFIGURED, and the burst unsolicited sending happens in TRANSITIVE.
See rtadvd.h for the details.
- rtadvd(8) now accepts non-existent interfaces as well in the command line.
- Add control socket support and rtadvctl(8) utility to show the RA information
in rtadvd(8). Dumping by SIGUSR1 has been removed in favor of it.
rc.d/routing: Fix ugly output with additional routing options.
Print a separate "Additional routing options" line for each address family
which has additional options, so that it does not get mixed up with the
output from adding routes.
This also reverts r224048 which added newlines to two arbitrary routing
options.
When building some of the boot loaders with clang, and DEBUG_FLAGS or
CFLAGS having '-g' in it, clang outputs several assembly directives that
are too new for our version of binutils.
Therefore, assemble the resulting .s files with clang instead. A more
general solution can be implemented when a GNU as-compatible driver for
clang's integrated assembler appears.
Ed Schouten [Sun, 17 Jul 2011 08:19:19 +0000 (08:19 +0000)]
Restore binary compatibility for GIO_KEYMAP and PIO_KEYMAP.
Back in 2009 I changed the ABI of the GIO_KEYMAP and PIO_KEYMAP ioctls
to support wide characters. I created a patch to add ABI compatibility
for the old calls, but I didn't get any feedback to that.
It seems now people are upgrading from 8 to 9 they experience this
issue, so add it anyway.
John Baldwin [Sun, 17 Jul 2011 01:23:50 +0000 (01:23 +0000)]
Don't include mptable_pci.c in Xen kernels. It is only meant for systems
that truly have an MPTable. The MPTable code in Xen is really a Xen
specific CPU enumerator and probably shouldn't be using the mptable name
at all.
Rick Macklem [Sat, 16 Jul 2011 20:53:27 +0000 (20:53 +0000)]
The new NFSv4 client handled NFSERR_GRACE as a fatal error
for the remove and rename operations. Some NFSv4 servers will
report NFSERR_GRACE for these operations. This patch changes
the behaviour of the client so that it handles NFSERR_GRACE
like NFSERR_DELAY for non-state related operations like
remove and rename. It also exempts the delegreturn operation
from handling within newnfs_request() for NFSERR_DELAY/NFSERR_GRACE
so that it can handle NFSERR_GRACE in the same manner as before.
This problem was resolved thanks to discussion with bfields at fieldses.org.
The problem was identified at the recent NFSv4 ineroperability
bakeathon.
Don't assume pmap_mapdev() gets called only for memory mapped I/O
addresses (i.e. uncacheable). ACPI in particular uses pmap_mapdev()
rather excessively (number of calls) just to get a valid KVA. To
that end, have pmap_mapdev():
1. cache the last result so that we don't waste time for multiple
consecutive invocations with the same PA/SZ.
2. find the memory descriptor that covers the PA and return NULL
if none was found or when the PA is for a common DRAM address.
3. Use either a region 6 or region 7 KVA, in accordance with the
memory attribute.
Jayachandran C. [Sat, 16 Jul 2011 20:31:29 +0000 (20:31 +0000)]
MIPS changes for Netlogic XLP support.
This patch adds support for the Netlogic XLP mips64 processors in
the common MIPS code. The changes are :
- Add CPU_NLM processor type
- Add cases for CPU_NLM, mostly were CPU_RMI is used.
- Update cache flush changes for CPU_NLM
- Add kernel build configuration files for xLP.
In collaboration with: Prabhath Raman <prabhathpr at netlogicmicro com>
Don't send EOI to the CPU before we handled the interrupt. This could
potentially trigger multiple pending interrupts for level-sensitive
interrupts. However, the event timer interrupt does need EOI before
being handled to avoid missing clock events.
These conflicting requirements are handled by having the XIV handler
inform the dispatch code whether or not it send EOI to the CPU. If not,
the dispatch code will do it. This allows handlers to send EOI before
doing potentially long-running activities, while still have a sensible
default behaviour.
Add a few more helper functions for working with memory descriptors:
o efi_md_find() - returns the md that covers the given address
o efi_md_last() - returns the last md in the list
o efi_md_prev() - returns the md that preceeds the given md.
Add support for booting PS3s from disk. This is still a little hackish until
we can find a way to get the information from petitboot or to guess it, so
the current algorithm is:
1. See if ps3disk3p1 (first GPT slice on OtherOS partition) exists, and if
so try to boot it.
2. Otherwise, netboot.
Submitted by: glevand <geoffrey.levand at mail dot ru >
Jayachandran C. [Sat, 16 Jul 2011 17:22:01 +0000 (17:22 +0000)]
Support compiling MIPS elf trampoline with a different ABI.
Allow changing the trampoline ABI with makeoptions, this will allow
us to have a trampoline with a different ABI from the kernel.
Useful in cases where we have to boot a 64 bit kernel from a
bootloader which supports only 32 bit or vice versa.
Philip Paeps [Sat, 16 Jul 2011 15:43:14 +0000 (15:43 +0000)]
Garbage-collect the tools for maintaining the previous PCI vendors list. The
sources this tool collates are no longer available and the format of the
current database is directly usable by pciconf(8) without needing any special
processing.
Update to use the latest version of the PCI IDs Repository.
As discussed on -current@ in May, this brings in a new source of the database,
which is also used by other operating systems. Our previous sources no longer
exist and this database is actively maintained and more complete in general.
bmake and other updates necessary for the BIND 9.8.x upgrade.
This includes a structural change regarding atomic ops. Previously they
were enabled on all platforms unless we had knowledge that they did not
work. However both work performed by marius@ on sparc64 and the fact that
the 9.8.x branch is fussier in this area has demonstrated that this is
not a safe approach. So I've modified a patch provided by marius to
enable them for i386, amd64, and ia64 only.
Isilon has the concept of an in-memory exit-code ring that saves the last exit
code of a function and allows for stack tracing. This is very helpful when
debugging tough issues.
This patch is essentially a no-op for BSD at this point, until we upstream
the dexitcode logic itself. The patch adds DEXITCODE calls to every NFS
function that returns an errno error code. A number of code paths were also
reorganized to have single exit paths, to reduce code duplication.
- Add two missing functions to the LibUSB v0.1 API.
- Clamp the string length to 255 bytes when getting
the interface description.
- Clamp data request length to 65535 bytes when doing
control requests.
Simple find/replace of VOP_ISLOCKED -> NFSVOPISLOCKED. This is done so that NFSVOPISLOCKED can be modified later to add enhanced logging and assertions.
Small acl patch to return the aclerror that comes back from nfsrv_dissectacl(). This fixes a problem where ATTRNOTSUPP was being returned instead of BADOWNER.
Adrian Chadd [Sat, 16 Jul 2011 00:30:23 +0000 (00:30 +0000)]
The i8259 controller is initialized incorrectly on MALTA. It writes
mask bits to control register and control bits to mask register.
The former causes ICW1_RESET|ICW1_LTIM combination to be written to
control register, which on QEMU results in "level sensitive irq not
supported" error.
John Baldwin [Fri, 15 Jul 2011 21:08:58 +0000 (21:08 +0000)]
Respect the BIOS/firmware's notion of acceptable address ranges for PCI
resource allocation on x86 platforms:
- Add a new helper API that Host-PCI bridge drivers can use to restrict
resource allocation requests to a set of address ranges for different
resource types.
- For the ACPI Host-PCI bridge driver, use Producer address range resources
in _CRS to enumerate valid address ranges for a given Host-PCI bridge.
This can be disabled by including "hostres" in the debug.acpi.disabled
tunable.
- For the MPTable Host-PCI bridge driver, use entries in the extended
MPTable to determine the valid address ranges for a given Host-PCI
bridge. This required adding code to parse extended table entries.
Similar to the new PCI-PCI bridge driver, these changes are only enabled
if the NEW_PCIB kernel option is enabled (which is enabled by default on
amd64 and i386).
Implement two previously-reserved Capsicum system calls:
- cap_new() creates a capability to wrap an existing file descriptor
- cap_getrights() queries the rights mask of a capability.
Approved by: mentor (rwatson), re (Capsicum blanket)
Sponsored by: Google Inc
Add an FFS specific mount option to allow a filesystem checker
(typically fsck_ffs) to register that it wishes to use FFS specific
sysctl's to update the filesystem. This ensures that two checkers
cannot run on a given filesystem at the same time and that no other
process accidentally or maliciously uses the filesystem updating
sysctls inappropriately. This functionality is needed by the
journaling soft-updates recovery code.
Code to actually implement Capsicum capabilities, including fileops and
kern_capwrap(), which creates a capability to wrap an existing file
descriptor.
We also modify kern_close() and closef() to handle capabilities.
Finally, remove cap_filelist from struct capability, since we don't
actually need it.
Approved by: mentor (rwatson), re (Capsicum blanket)
Sponsored by: Google Inc
Do not call platform_gpio_init() early. It doesn't work because we do
not have enough information to reliably setup GPIO pins. Do it when
we attach the gpio driver. This prevents hangs and the need to fake
up a softc.