jhb [Thu, 5 Mar 2009 16:03:44 +0000 (16:03 +0000)]
At least one BIOS bogusly includes duplicate entries for I/O APICs. The
bogus entries have a starting IRQ that is invalid (> 255, so won't fit
into a PCI intline config register). These entries had the side effect of
breaking MSI by "claiming" several IRQs in the MSI range. Fix this by
ignoring such I/O APICs.
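A minimal sketch of the filtering rule. The `struct ioapic_entry` type and both helpers are hypothetical; only the rule itself (drop an I/O APIC whose starting IRQ does not fit in the 8-bit PCI intline register) comes from the entry above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record for one I/O APIC as reported by the BIOS tables. */
struct ioapic_entry {
	uint8_t  apic_id;
	uint32_t irq_base;	/* first global IRQ routed through this APIC */
};

/* A PCI intline register is 8 bits wide, so an IRQ must be <= 255. */
#define	PCI_INTLINE_MAX	255

static bool
ioapic_entry_valid(const struct ioapic_entry *e)
{
	return (e->irq_base <= PCI_INTLINE_MAX);
}

/*
 * Compact the table in place, dropping bogus entries so they cannot
 * claim IRQs that belong to the MSI range.  Returns the new count.
 */
static size_t
ioapic_filter(struct ioapic_entry *tab, size_t n)
{
	size_t i, kept = 0;

	for (i = 0; i < n; i++)
		if (ioapic_entry_valid(&tab[i]))
			tab[kept++] = tab[i];
	return (kept);
}
```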
jhb [Thu, 5 Mar 2009 15:33:04 +0000 (15:33 +0000)]
Always read/write the full 64-bit value of 64-bit BARs. Specifically,
when determining the size of a BAR by writing all 1's to the BAR and
reading back the result, always operate on the full 64-bit size.
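A sketch of sizing a memory BAR from the value(s) read back after writing all 1's, assuming memory BARs only (I/O BARs use different flag bits). The flag-bit layout follows the PCI specification; the function names are hypothetical and are not the pci(4) driver's.

```c
#include <stdint.h>

/* Memory BAR flag bits defined by the PCI specification. */
#define	BAR_MEM_TYPE_MASK	0x6u	/* bits 2:1 */
#define	BAR_MEM_TYPE_64		0x4u	/* 10b: 64-bit BAR spanning two dwords */
#define	BAR_MEM_FLAGS_MASK	0xfu	/* bits 3:0 are not address bits */

static int
bar_is_64bit(uint32_t lo)
{
	return ((lo & 0x1u) == 0 && (lo & BAR_MEM_TYPE_MASK) == BAR_MEM_TYPE_64);
}

/*
 * lo/hi are the dwords read back after writing all 1's (hi is ignored
 * for a 32-bit BAR).  The size is the two's complement of the address
 * mask, computed over the full 64 bits.
 */
static uint64_t
bar_size_from_probe(uint32_t lo, uint32_t hi)
{
	uint64_t addr_bits = lo & ~(uint32_t)BAR_MEM_FLAGS_MASK;

	if (bar_is_64bit(lo))
		addr_bits |= (uint64_t)hi << 32;
	else if (addr_bits != 0)
		addr_bits |= 0xffffffff00000000ull;	/* extend a 32-bit mask */
	if (addr_bits == 0)
		return (0);				/* unimplemented BAR */
	return (~addr_bits + 1);
}
```

For example, a 64-bit prefetchable 8 GB BAR reads back as lo = 0x0000000c, hi = 0xfffffffe. Its low dword has no address bits set at all, so the size can only be recovered by treating both dwords as one value.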
jhb [Thu, 5 Mar 2009 15:28:46 +0000 (15:28 +0000)]
Honor the prefetchable flag in memory BARs by setting the RF_PREFETCHABLE
flag when calling bus_alloc_resource() to allocate resources from a parent
PCI bridge. For PCI-PCI bridges this asks the bridge to satisfy the
request using the prefetchable memory range rather than the normal
memory range.
kib [Thu, 5 Mar 2009 11:45:42 +0000 (11:45 +0000)]
Systematically use vm_size_t to specify the size of the segment for VM KPI.
Because of the vm_size_t change, stop overloading the local variable size
in kern_shmat().
Fix a style bug by adding an explicit comparison with 0.
rodrigc [Thu, 5 Mar 2009 08:57:35 +0000 (08:57 +0000)]
Add a -o mountprog parameter to mount that explicitly allows
an alternative program to be used for mounting a file system.
Ideally, all file systems should be converted to pass string arguments
to nmount(), so that /sbin/mount can handle them. However, certain file
systems, such as FUSE-based ones, have not done this and want their own
userland mount programs.
For example, to mount an NTFS file system with the FUSE NTFS driver:
mount -t ntfs -o mountprog=/usr/local/bin/ntfs-3g /dev/acd0 /mnt
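How mount(8) actually parses its options is not shown here. As a hypothetical sketch, the mountprog= value could be extracted from the -o string like this; `find_mountprog` is an illustrative name, not a function in /sbin/mount.

```c
#include <string.h>

/*
 * Hypothetical helper: scan a comma-separated -o option string for
 * "mountprog=PATH" and copy PATH into buf.  Returns 1 if found, 0 if
 * not present, -1 if the value does not fit in buf.
 */
static int
find_mountprog(const char *opts, char *buf, size_t buflen)
{
	static const char key[] = "mountprog=";
	const size_t keylen = sizeof(key) - 1;
	const char *p = opts;

	while (p != NULL && *p != '\0') {
		const char *end = strchr(p, ',');
		size_t len = end != NULL ? (size_t)(end - p) : strlen(p);

		if (len > keylen && strncmp(p, key, keylen) == 0) {
			size_t vlen = len - keylen;

			if (vlen >= buflen)
				return (-1);
			memcpy(buf, p + keylen, vlen);
			buf[vlen] = '\0';
			return (1);
		}
		p = end != NULL ? end + 1 : NULL;
	}
	return (0);
}
```

With the example command above, this would yield /usr/local/bin/ntfs-3g, which mount would then run instead of calling nmount() itself.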
kientzle [Thu, 5 Mar 2009 02:37:05 +0000 (02:37 +0000)]
Merge r551,r561 from libarchive.googlecode.com: Update gzip read filter
to fully take advantage of the new peek/consume I/O support.
In particular, this now properly handles concatenated gzip streams.
kientzle [Thu, 5 Mar 2009 02:19:42 +0000 (02:19 +0000)]
Merge r364, r378, r379, r393, and r539 from libarchive.googlecode.com:
This is the last phase of the "big decompression refactor" that
puts a lazy reblocking layer between each pair of read filters.
I've also changed the terminology for this area---the two kinds
of objects are now called "read filters" and "read filter bidders"---and
moved ownership of these objects to the archive_read core.
This greatly simplifies implementing new read filters, which
can now use peek/consume I/O semantics both for bidding (arbitrary
look-ahead!) and for reading streams (look-ahead simplifies handling
concatenated streams, for instance).
The first merge here is the overhaul proper; the remainder are small
fixes to correct errors in the initial implementation.
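A minimal sketch of peek/consume semantics over an in-memory buffer. The structure and function names are hypothetical and are not libarchive's internal API; a real reblocking layer would also copy data into a contiguous buffer when a request spans upstream blocks, which this stand-in does not need.

```c
#include <stddef.h>

/* Hypothetical in-memory stand-in for a read filter's upstream. */
struct peek_reader {
	const unsigned char *data;
	size_t len;
	size_t pos;
};

/*
 * Return a pointer to at least `min` unread bytes without consuming
 * them, or NULL if that much look-ahead is not available.
 */
static const void *
reader_peek(struct peek_reader *r, size_t min, size_t *avail)
{
	size_t left = r->len - r->pos;

	*avail = left;
	if (left < min)
		return (NULL);
	return (r->data + r->pos);
}

/* Advance past bytes the caller is finished with. */
static void
reader_consume(struct peek_reader *r, size_t n)
{
	size_t left = r->len - r->pos;

	r->pos += n < left ? n : left;
}

/* A bidder only peeks; any nonzero value means "I can handle this". */
static int
bid_gzip(struct peek_reader *r)
{
	size_t avail;
	const unsigned char *p = reader_peek(r, 2, &avail);

	return ((p != NULL && p[0] == 0x1f && p[1] == 0x8b) ? 16 : 0);
}
```

Because bidding never consumes, several bidders can inspect the same bytes in turn, and the winning filter starts reading from the same position.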
kientzle [Thu, 5 Mar 2009 00:42:50 +0000 (00:42 +0000)]
Merge r357 from libarchive.googlecode.com: bzip2 compression
support can always be enabled even if bzlib doesn't exist on
this platform; don't give up until we fail to open the file.
kientzle [Thu, 5 Mar 2009 00:35:21 +0000 (00:35 +0000)]
Merge r344 from libarchive.googlecode.com: __LA_INT64_T and __LA_SSIZE_T
are part of the public API and therefore need to be exposed. This is
ugly; I'd like to find a better solution for this.
kientzle [Thu, 5 Mar 2009 00:31:48 +0000 (00:31 +0000)]
Merge r341, r345, r346, r347 from libarchive.googlecode.com: Style
fixes to test harness and a few extra guards to detect tests
that can't succeed on certain platforms.
jhb [Wed, 4 Mar 2009 21:04:52 +0000 (21:04 +0000)]
The recent PCI resource allocation fixes exposed a bug where the same
BAR could be allocated twice by different children of a vgapci0 device.
To fix this, change the vgapci0 device to track references on its associated
resources so that they are only allocated once from the parent PCI bus and
released when no children are using them. Previously this leaked a small
amount of KVA on at least some architectures.
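The reference-counting scheme can be sketched as follows. `struct shared_res` and its helpers are hypothetical; the comments mark where the real driver would call into the parent bus.

```c
#include <stdbool.h>

/*
 * Hypothetical model of one shared BAR resource in vgapci0: the first
 * child allocation obtains it from the parent bus, later ones only
 * bump a reference count, and the last release returns it.
 */
struct shared_res {
	int  refs;
	bool allocated_from_parent;
	int  parent_allocs;	/* bookkeeping for the example only */
};

static void
shared_res_acquire(struct shared_res *sr)
{
	if (sr->refs++ == 0) {
		/* First user: allocate from the parent PCI bus here. */
		sr->allocated_from_parent = true;
		sr->parent_allocs++;
	}
}

static void
shared_res_release(struct shared_res *sr)
{
	/* Last user gone: release back to the parent PCI bus here. */
	if (sr->refs > 0 && --sr->refs == 0)
		sr->allocated_from_parent = false;
}
```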
rrs [Wed, 4 Mar 2009 20:54:42 +0000 (20:54 +0000)]
- PR-SCTP bug where the CUM-ACK was not being propagated
into the advance_peer_ack point, so we would send an
incorrect value in the FWD-TSN.
- PR-SCTP bug where a PR packet used as a window
probe could incorrectly be moved back into the
send_queue, which causes major issues and should
never happen.
- Fix a trace to use the proper macro.
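Because TSNs are 32-bit serial numbers that wrap, keeping the advance_peer_ack point from falling behind the cumulative ACK needs a wrap-aware comparison. A sketch under that assumption; `update_adv_peer_ack` is a hypothetical name, not the SCTP stack's function.

```c
#include <stdint.h>

/* Serial-number "greater than" for 32-bit TSNs (wrap-aware). */
static int
tsn_gt(uint32_t a, uint32_t b)
{
	return (a != b && (uint32_t)(a - b) < (1u << 31));
}

/*
 * Hypothetical helper: before building a FWD-TSN, make sure the
 * advanced peer ack point is never behind the cumulative ack.
 */
static uint32_t
update_adv_peer_ack(uint32_t adv_peer_ack, uint32_t cum_ack)
{
	return (tsn_gt(cum_ack, adv_peer_ack) ? cum_ack : adv_peer_ack);
}
```

A plain `>` would get the wrapped case wrong: with adv_peer_ack = 0xfffffffe and cum_ack = 3, the cumulative ACK is actually five TSNs ahead.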
rnoland [Wed, 4 Mar 2009 18:23:48 +0000 (18:23 +0000)]
Extend the management of PCIM_CMD_INTxDIS.
We now explicitly enable INTx during bus_setup_intr() if it is needed.
Several of the ata drivers were managing this bit internally. This is
better handled in pci, and it should now work for all drivers.
We also mask INTx during bus_teardown_intr() by setting this bit.
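A sketch of the policy as pure functions on the command register value. It assumes INTx is "needed" exactly when the handler is not attached to an MSI/MSI-X vector; the function names are hypothetical. The Interrupt Disable bit is bit 10 (0x0400) of the PCI command register, as defined since PCI 2.3.

```c
#include <stdbool.h>
#include <stdint.h>

/* Interrupt Disable is bit 10 of the PCI command register (PCI 2.3+). */
#define	CMD_INTX_DISABLE	0x0400u

/* bus_setup_intr(): unmask INTx only if the handler uses INTx. */
static uint16_t
cmd_on_setup_intr(uint16_t cmd, bool using_msi)
{
	if (!using_msi)
		cmd &= (uint16_t)~CMD_INTX_DISABLE;
	return (cmd);
}

/* bus_teardown_intr(): mask INTx again by setting the bit. */
static uint16_t
cmd_on_teardown_intr(uint16_t cmd)
{
	return ((uint16_t)(cmd | CMD_INTX_DISABLE));
}
```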
dchagin [Wed, 4 Mar 2009 12:14:33 +0000 (12:14 +0000)]
Add AT_PLATFORM, AT_HWCAP and AT_CLKTCK auxiliary vector entries which
are used by glibc. This silences the message "2.4+ kernel w/o ELF notes?"
that some programs, among them top and pkill, print at startup.
Do the assignment of the vector entries in elf_linux_fixup()
as it is done in glibc.
Fix some minor style issues.
Submitted by: Marcin Cieslak <saper at SYSTEM PL>
Approved by: kib (mentor)
MFC after: 1 week
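The auxiliary vector is an array of (type, value) pairs terminated by AT_NULL, which libc walks at startup. A hypothetical sketch of filling and reading the three new entries; the tag values are the Linux ABI ones (AT_PLATFORM = 15, AT_HWCAP = 16, AT_CLKTCK = 17), prefixed `LAT_` here to avoid clashing with host headers, and `aux_fill` only mimics what elf_linux_fixup() does.

```c
#include <stddef.h>
#include <stdint.h>

/* Linux ELF auxiliary vector tags (values from the Linux ABI). */
#define	LAT_NULL	0
#define	LAT_PLATFORM	15
#define	LAT_HWCAP	16
#define	LAT_CLKTCK	17

struct auxent {
	long      a_type;
	uintptr_t a_val;
};

/*
 * Hypothetical fill-in of the new entries.  platform_str stands for the
 * user address of a string such as "i686".  Returns the number of slots
 * used, including the AT_NULL terminator.
 */
static size_t
aux_fill(struct auxent *v, uintptr_t platform_str, uintptr_t hwcap,
    uintptr_t hz)
{
	size_t n = 0;

	v[n++] = (struct auxent){ LAT_HWCAP, hwcap };
	v[n++] = (struct auxent){ LAT_PLATFORM, platform_str };
	v[n++] = (struct auxent){ LAT_CLKTCK, hz };
	v[n++] = (struct auxent){ LAT_NULL, 0 };
	return (n);
}

/* What a libc does at startup: walk the vector until AT_NULL. */
static uintptr_t
aux_lookup(const struct auxent *v, long type)
{
	for (; v->a_type != LAT_NULL; v++)
		if (v->a_type == type)
			return (v->a_val);
	return (0);
}
```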
bms [Wed, 4 Mar 2009 03:45:34 +0000 (03:45 +0000)]
In ip_output(), do not acquire the IN_MULTI_LOCK(),
and do not attempt to perform a group lookup.
This is a socket layer lock, and the bottom half of IP
really has no business taking it.
Use the value of the in_mcast_loop sysctl to determine
if we should loop back by default, in the absence of
any multicast socket options. Because the check on
group membership is now deferred to the input path,
an m_copym() is now required.
This should increase multicast send performance where the
source has not requested loopback, although this has not been
benchmarked or measured.
It is also a necessary change for IN_MULTI_LOCK to become
non-recursive, which is required in order to implement IGMPv3
in a thread-safe way.
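The loopback decision described above reduces to a small rule. In this sketch, `struct mcast_opts` and `mcast_should_loop` are hypothetical; `in_mcast_loop` stands in for the variable behind the net.inet.ip.mcast.loop sysctl.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the net.inet.ip.mcast.loop sysctl (default on). */
static int in_mcast_loop = 1;

/* Hypothetical subset of a socket's multicast options. */
struct mcast_opts {
	bool loop;	/* IP_MULTICAST_LOOP as set on the socket */
};

/*
 * Decide whether ip_output() should loop a multicast datagram back to
 * the local host.  No group lookup and no IN_MULTI_LOCK here; group
 * membership is checked later on the input path.
 */
static bool
mcast_should_loop(const struct mcast_opts *imo)
{
	if (imo != NULL)
		return (imo->loop);
	return (in_mcast_loop != 0);
}
```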
bms [Wed, 4 Mar 2009 03:40:02 +0000 (03:40 +0000)]
Add sysctl net.inet.ip.mcast.loop. This controls whether or not
IPv4 multicast sends are looped back to senders by default
on a stack-wide basis, rather than relying on the socket option.
Note that the sysctl only applies to newly created multicast sockets.
das [Wed, 4 Mar 2009 03:38:51 +0000 (03:38 +0000)]
Add dprintf() and vdprintf() from POSIX.1-2008. Like getline(),
dprintf() is a simple wrapper around another function, so we may as
well implement it. But also like getline(), we can't prototype it by
default right now because it would break too many ports.
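To show how thin the wrapper is, here is dprintf() written over vdprintf(). It is named `my_dprintf` to avoid clashing with a libc prototype and is a sketch, not FreeBSD's libc source.

```c
#define	_POSIX_C_SOURCE	200809L
#include <stdarg.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* dprintf() is just vdprintf() with a variadic front end. */
static int
my_dprintf(int fd, const char *fmt, ...)
{
	va_list ap;
	int ret;

	va_start(ap, fmt);
	ret = vdprintf(fd, fmt, ap);
	va_end(ap);
	return (ret);
}

/* Usage: format into a pipe and read the bytes back. */
static int
roundtrip_ok(void)
{
	int fds[2];
	char buf[32];
	ssize_t n;

	if (pipe(fds) != 0)
		return (0);
	my_dprintf(fds[1], "irq %d\n", 16);
	n = read(fds[0], buf, sizeof(buf));
	close(fds[0]);
	close(fds[1]);
	return (n == 7 && memcmp(buf, "irq 16\n", 7) == 0);
}
```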
das [Wed, 4 Mar 2009 03:32:28 +0000 (03:32 +0000)]
- Add getsid, fchdir, getpgid, lchown, pread, pwrite, truncate,
*at, and fexecve to the POSIX.1-2008 namespace.
- Remove getwd, ualarm, usleep, and vfork from the XSI namespace.
- Remove mkdtemp from the POSIX.1-2008 namespace (should be in stdlib.h).
das [Wed, 4 Mar 2009 03:31:10 +0000 (03:31 +0000)]
- Add getsubopt and mkdtemp to the POSIX.1-2008 namespace.
- Add mkstemp to the POSIX.1-2008 and BSD namespaces.
- Remove mktemp from the XSI namespace.
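With a POSIX.1-2008 feature-test macro, mkdtemp() and mkstemp() are both declared in <stdlib.h>, while mktemp() is deliberately avoided. A minimal sketch; `scratch_demo` is a hypothetical name and /tmp is assumed to be writable.

```c
#define	_POSIX_C_SOURCE	200809L
#include <stdio.h>	/* snprintf() */
#include <stdlib.h>	/* mkdtemp(), mkstemp() */
#include <unistd.h>	/* close(), rmdir(), unlink() */

/*
 * Create a private scratch directory and a file inside it, then clean
 * up.  Returns 0 on success.
 */
static int
scratch_demo(void)
{
	char dir[] = "/tmp/demo.XXXXXX";
	char file[64];
	int fd;

	if (mkdtemp(dir) == NULL)
		return (-1);
	snprintf(file, sizeof(file), "%s/f.XXXXXX", dir);
	fd = mkstemp(file);
	if (fd == -1) {
		rmdir(dir);
		return (-1);
	}
	close(fd);
	unlink(file);
	return (rmdir(dir));
}
```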
bms [Wed, 4 Mar 2009 03:22:03 +0000 (03:22 +0000)]
Merge header file definitions used by the new IGMPv3 implementation.
This is a partial merge. Compatibility defines are retained for
the existing IGMPv2 implementation.
bms [Wed, 4 Mar 2009 02:55:04 +0000 (02:55 +0000)]
Overlay a uint16_t field suitable for use by the
IGMPv3 code. It is used to maintain the number of
group records contained in a pending IGMPv3 output
mbuf chain.
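For context, the on-wire IGMPv3 Membership Report header (RFC 3376, section 4.2) carries the group-record count as a 16-bit field, which is why a uint16_t counter suffices. The struct and helper below are a hypothetical sketch, not FreeBSD's igmp_var.h definitions.

```c
#include <stddef.h>
#include <stdint.h>
#include <arpa/inet.h>	/* htons(), ntohs() */

/* IGMPv3 Membership Report header, RFC 3376 section 4.2 (8 bytes). */
struct igmpv3_report_hdr {
	uint8_t  type;		/* 0x22 */
	uint8_t  reserved1;
	uint16_t cksum;
	uint16_t reserved2;
	uint16_t ngroups;	/* number of group records, network order */
};

/* Hypothetical helper: account for one more appended group record. */
static void
report_add_group(struct igmpv3_report_hdr *h)
{
	h->ngroups = htons((uint16_t)(ntohs(h->ngroups) + 1));
}
```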
davidch [Wed, 4 Mar 2009 00:05:40 +0000 (00:05 +0000)]
- Updated firmware to latest 4.6.X release.
- Added missing firmware for 5709 A1 controllers.
- Changed some debug statistic variable names to be more consistent.
For the moment, disable the VIMAGE_CTASSERTs, as people have had trouble
developing and compiling with kernel options that change the
size of at least one structure. The current kernel build framework
does not allow us to pass -Dxxx to module builds, so we would possibly
need a kernel option to disable the checks, and that might not work
for people building modules alone.
So far they have helped to identify possible API problems and bring
them back to the attention of developers looking for better solutions.
Problems reported by: kib, warner
Reviewed by: warner
Clang disallows structs with variable length arrays to be nested inside
other structs, because this is in violation of ISO C99. Even though we
can keep bugging the LLVM folks about this issue, we'd better just fix
our code to not do this. This code seems to be the only code in the
entire source tree that does this.
I haven't tested this patch by using the kernel modules in question, but
Diane Bruce and I have compared disassembled versions of these kernel
modules. We would have expected them to be exactly the same, but due to
randomness in the register allocator and reordering of instructions,
there were some minor differences.
rwatson [Tue, 3 Mar 2009 17:15:05 +0000 (17:15 +0000)]
Reduce the verbosity of SDT trace points for DTrace by defining several
wrapper macros that allow trace points and arguments to be declared
using a single macro rather than several. This means a lot less
repetition and vertical space for each trace point.
Use these macros when defining privilege and MAC Framework trace points.
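The idea of one macro per arity can be sketched with the preprocessor alone. Everything below is hypothetical (the real SDT macros also register probes with DTrace); it only shows how a single invocation can carry the probe name and its argument types.

```c
#include <stddef.h>
#include <string.h>	/* strcmp() in the usage checks */

/* Hypothetical probe record. */
struct probe_def {
	const char *name;
	int         nargs;
	const char *argtypes[2];
};

/*
 * One wrapper per arity: the probe and all argument types are declared
 * in a single line instead of one definition plus one line per argument.
 */
#define	PROBE_DEFINE1(var, pname, t0)					\
	static struct probe_def var = { pname, 1, { #t0, NULL } }
#define	PROBE_DEFINE2(var, pname, t0, t1)				\
	static struct probe_def var = { pname, 2, { #t0, #t1 } }

PROBE_DEFINE2(example_probe, "example:kernel:func:entry", int, const char *);
```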
kientzle [Tue, 3 Mar 2009 17:07:27 +0000 (17:07 +0000)]
Merge r340 from libarchive.googlecode.com: If zlib/bzlib aren't available,
we can still detect gzip/bzip2 compressed streams, we just can't
decompress them.
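Detection only needs the leading magic bytes, so it works without zlib or bzlib. A sketch using the gzip header (RFC 1952: 0x1f 0x8b, then method 8 for deflate) and the bzip2 "BZh" signature followed by a block-size digit; the enum and function names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

enum compression { COMP_NONE, COMP_GZIP, COMP_BZIP2 };

/* Recognize a compressed stream from its leading bytes only. */
static enum compression
detect_compression(const unsigned char *p, size_t len)
{
	/* gzip: ID1 0x1f, ID2 0x8b, CM 8 (deflate). */
	if (len >= 3 && p[0] == 0x1f && p[1] == 0x8b && p[2] == 8)
		return (COMP_GZIP);
	/* bzip2: "BZh" followed by a block-size digit '1'..'9'. */
	if (len >= 4 && memcmp(p, "BZh", 3) == 0 && p[3] >= '1' && p[3] <= '9')
		return (COMP_BZIP2);
	return (COMP_NONE);
}
```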
jhb [Tue, 3 Mar 2009 16:38:59 +0000 (16:38 +0000)]
Further refine the handling of resources for BARs in the PCI bus driver.
A while back, Warner changed the PCI bus code to reserve resources when
enumerating devices and simply give devices the previously allocated
resources when they call bus_alloc_resource(). This ensures that address
ranges being decoded by a BAR are always allocated in the nexus0 device
(or whatever device the PCI bus gets its address space from) even if a
device driver is not attached to the device. This patch extends this
behavior further:
- To let the PCI bus distinguish between a resource being allocated by
a device driver vs. merely being allocated by the bus, use
rman_set_device() to assign the device to the bus when it is owned
by the bus and to the child device when it is allocated by the child
device's driver. We can now prevent a device driver from allocating
the same resource twice. Doing so could result in odd things like
allocating duplicate virtual memory to map the resource on some
archs and leaking the original mapping.
- When a PCI device driver releases a resource, don't pass the request
all the way up the tree and release it in the nexus (or similar device)
since the BAR is still active and decoding. Otherwise, another device
could later allocate the same range even though it is still in use.
Instead, deactivate the resource and assign it back to the PCI bus
using rman_set_device().
- pci_delete_resource() will actually completely free a BAR, including
attempting to disable it.
- Disable BAR decoding via the command register when sizing a BAR in
pci_alloc_map() which is used to allocate resources for a BAR when
the BIOS/firmware did not assign a usable resource range during boot.
This mirrors an earlier fix to pci_add_map(), which is used to
size BARs during boot.
- Move the activation of I/O decoding in the PCI command register into
pci_activate_resource() instead of doing it in pci_alloc_resource().
Previously we could actually enable decoding before a BAR was
initialized via pci_alloc_map().
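The ownership rules from the first two items can be modeled as a tiny state machine. This is a hypothetical sketch: `struct bar_res` and the errno-style return value are assumptions, whereas the real bus_alloc_resource() returns NULL on failure and ownership is tracked with rman_set_device().

```c
#include <errno.h>

/* Who currently owns a BAR's resource (cf. rman_set_device()). */
enum owner { OWNER_BUS, OWNER_CHILD };

struct bar_res {
	enum owner owner;
	int        active;	/* activated/mapped for the driver */
};

/* Driver allocation: hand over the reserved resource, but only once. */
static int
bar_alloc(struct bar_res *r)
{
	if (r->owner == OWNER_CHILD)
		return (EBUSY);		/* second allocation by the driver */
	r->owner = OWNER_CHILD;
	return (0);
}

/*
 * Driver release: deactivate and give the resource back to the bus.
 * It is not released to the parent, because the BAR is still decoding.
 */
static void
bar_release(struct bar_res *r)
{
	r->active = 0;
	r->owner = OWNER_BUS;
}
```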
Start removing IPv6 Type 0 Routing header code.
RH0 was deprecated by RFC 5095.
While most of the code had been disabled by #if 0 already, leave a
bit of infrastructure for possible RH2 code and a log message under
BURN_BRIDGES in case a user still tries to send RH0 packets.
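RFC 5095 requires a node to treat RH0 like any unrecognized routing type (RFC 2460, section 4.4). If Segments Left is zero, the header is ignored. Otherwise, the packet is discarded and an ICMPv6 Parameter Problem is sent. A sketch of that rule; the enum, the function, and the `type2_supported` switch are hypothetical, not FreeBSD's ip6 code.

```c
#include <stdint.h>

enum rh_action {
	RH_PROCESS,	/* a type this node implements */
	RH_SKIP,	/* ignore the header, continue with the next one */
	RH_PARAM_PROB	/* drop, send ICMPv6 Parameter Problem */
};

/*
 * rh points at the Routing header: next header, hdr ext len,
 * routing type, segments left.
 */
static enum rh_action
classify_routing_header(const uint8_t *rh, int type2_supported)
{
	uint8_t type = rh[2];
	uint8_t segleft = rh[3];

	if (type == 2 && type2_supported)
		return (RH_PROCESS);
	/* Unrecognized type, which now includes type 0. */
	return (segleft == 0 ? RH_SKIP : RH_PARAM_PROB);
}
```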
avg [Tue, 3 Mar 2009 13:10:25 +0000 (13:10 +0000)]
udf_readdir: do not advance offset if entry cannot be copied out via uio
Previously, readdir missed some directory entries because there was
no space for them in the current uio, but the directory stream offset
was advanced nevertheless.
jhb discovered the issue and provided a test case.
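The fix boils down to advancing the offset only after a successful copy-out. A hypothetical sketch with fixed-size entries and an entry index as the offset; udf_readdir itself uses byte offsets and uiomove(), which are not modeled here.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed-size directory entry and output buffer. */
struct dent { char name[16]; };

struct outbuf {
	char   *buf;
	size_t  resid;		/* space left, like uio_resid */
	size_t  used;
};

/* Copy one entry out; fails without side effects if it does not fit. */
static int
emit(struct outbuf *ob, const struct dent *d)
{
	if (ob->resid < sizeof(*d))
		return (-1);
	memcpy(ob->buf + ob->used, d, sizeof(*d));
	ob->used += sizeof(*d);
	ob->resid -= sizeof(*d);
	return (0);
}

/*
 * Copy entries starting at *offset.  The offset advances only after an
 * entry was actually copied, so the next call resumes at the first
 * entry that did not fit instead of skipping it.
 */
static void
readdir_sketch(const struct dent *ents, size_t n, size_t *offset,
    struct outbuf *ob)
{
	while (*offset < n) {
		if (emit(ob, &ents[*offset]) != 0)
			break;		/* no room: leave *offset alone */
		(*offset)++;
	}
}
```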
kientzle [Tue, 3 Mar 2009 07:01:57 +0000 (07:01 +0000)]
Merge r294 from libarchive.googlecode.com: Skip testing for
locale-based failures on systems where the "C" locale is so permissive
that it cannot possibly fail. In particular, this fixes a test
problem on Cygwin.
mav [Tue, 3 Mar 2009 06:39:38 +0000 (06:39 +0000)]
Set PortMultiplier port only for SATA2 channels, where it is applicable.
Doing it on old SATA controllers like Promise PDC20375 SATA150 breaks
their operation.
delphij [Mon, 2 Mar 2009 23:47:18 +0000 (23:47 +0000)]
Diff reduction against OpenBSD: ANSI'fy prototypes.
(This is part of a larger changeset which is intended to reduce diff only,
thus some prototypes were left intact since they will be changed in the
future).
jamie [Mon, 2 Mar 2009 23:26:30 +0000 (23:26 +0000)]
Extend the "vfsopt" mount options for more general use. Make struct
vfsopt and the vfs_buildopts function public, and add some new fields
to struct vfsopt (pos and seen), and new functions vfs_getopt_pos and
vfs_opterror.
Further extend the interface to allow reading options from the kernel
in addition to sending them to the kernel, with vfs_setopt and related
functions.
While this allows the "name=value" option interface to be used for more
than just FS mounts (planned use is for jails), it retains the current
"vfsopt" name and <sys/mount.h> requirement.
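On the userland side, nmount(2) receives these "name=value" options as an array of struct iovec, with each option occupying two consecutive entries (name, then value). A hypothetical helper in that style; sending a value-less option as an empty value is an assumption of this sketch.

```c
#include <stdint.h>
#include <string.h>
#include <sys/uio.h>	/* struct iovec */

/*
 * Append one option as two iovecs: the NUL-terminated name followed
 * by its value.  Returns the new iovec count.
 */
static int
add_opt(struct iovec *iov, int n, const char *name, const char *val)
{
	iov[n].iov_base = (void *)(uintptr_t)name;
	iov[n].iov_len = strlen(name) + 1;
	n++;
	iov[n].iov_base = (void *)(uintptr_t)val;
	iov[n].iov_len = val != NULL ? strlen(val) + 1 : 0;
	return (n + 1);
}
```

A caller would build, for example, "fstype" and "fspath" pairs this way and pass the array and its count to nmount().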
luigi [Mon, 2 Mar 2009 22:16:50 +0000 (22:16 +0000)]
curr_time is a 64-bit variable, so SYSCTL_LONG is not an appropriate
handler (long is only 32 bits on 32-bit platforms).
The variable was exported only for debugging, and there is little reason
to keep doing so now that timekeeping is covered by various other variables.
For the time being just comment out the sysctl, but I think it
should go away.
luigi [Mon, 2 Mar 2009 22:11:48 +0000 (22:11 +0000)]
fw_debug has been unused for ages, so remove it from the list
of sysctl_variables.
I would also remove it from the VNET record but I am unsure if
there is any ABI issue -- so for the time being just mark it as
unused in ip_fw.h, and then we will collect the garbage at some
appropriate time in the future.
kan [Mon, 2 Mar 2009 20:51:39 +0000 (20:51 +0000)]
Change vfs_busy to wait until the outcome of a pending unmount
operation is known and to retry or fail according to that
outcome. This fixes the problem with namespace-traversing
programs failing with random ENOENT errors if someone just
happened to try to unmount that same filesystem at the same
time.
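The wait-for-outcome behavior maps naturally onto a condition variable. A userland sketch with pthreads; `struct mnt` and both functions are hypothetical, and the kernel's actual mount locking is not shown.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical mount state guarded by a mutex. */
struct mnt {
	pthread_mutex_t lock;
	pthread_cond_t  cv;
	bool unmount_pending;
	bool unmounted;
	int  busy;
};

/*
 * Wait out a pending unmount instead of failing at once: if the
 * unmount succeeded, fail with ENOENT; if it was abandoned, take the
 * busy reference and proceed.
 */
static int
mnt_busy(struct mnt *mp)
{
	int error = 0;

	pthread_mutex_lock(&mp->lock);
	while (mp->unmount_pending)
		pthread_cond_wait(&mp->cv, &mp->lock);
	if (mp->unmounted)
		error = ENOENT;
	else
		mp->busy++;
	pthread_mutex_unlock(&mp->lock);
	return (error);
}

/* Called by the unmounting thread once the outcome is known. */
static void
mnt_unmount_done(struct mnt *mp, bool succeeded)
{
	pthread_mutex_lock(&mp->lock);
	mp->unmount_pending = false;
	mp->unmounted = succeeded;
	pthread_cond_broadcast(&mp->cv);
	pthread_mutex_unlock(&mp->lock);
}
```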
rnoland [Mon, 2 Mar 2009 19:00:41 +0000 (19:00 +0000)]
Disable INTx when enabling MSI/MSIX
This addresses interrupt storms that were noticed after enabling MSI
in drm. I think this is due to a loose interpretation of the PCI 2.3
spec, which states that a function using MSI is prohibited from using
INTx. It appears that some vendors interpreted that to mean that they
should handle it in hardware, while others felt it was the driver's
responsibility.
This fix will also likely resolve interrupt storm related issues with
devices other than drm.
kib [Mon, 2 Mar 2009 18:53:30 +0000 (18:53 +0000)]
Correct the types of variables used to track the amount of allocated SysV
shared memory from int to size_t. Implement a workaround for the current
ABI, which cannot properly store and report the size of a shared memory
segment larger than 2 GB.
This makes it possible to use shared memory segments larger than 2 GB on
64-bit architectures. Please note the new BUGS section in shmctl(2) and
the UPDATING note for the limitations of this temporary solution.
kib [Mon, 2 Mar 2009 18:43:50 +0000 (18:43 +0000)]
Use the p_sysent->sv_flags flag SV_ILP32 to detect a 32-bit process
executing on a 64-bit kernel. This eliminates the direct comparisons
of p_sysent with &ia32_freebsd_sysvec that were left intact after
r185169.
nwhitehorn [Mon, 2 Mar 2009 15:22:01 +0000 (15:22 +0000)]
Some Apple I2C buses give the device's I2C address in a property with the
name i2c-address instead of reg. Change the OFW I2C probe to check both
locations for the address.
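A sketch of checking both properties. The property table is a hypothetical simplification (real Open Firmware properties are byte arrays read with the OF_* routines), and trying reg first, then i2c-address, is an assumption about the order.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for an Open Firmware node's properties. */
struct ofw_prop {
	const char *name;
	uint32_t    value;
};

static int
prop_get(const struct ofw_prop *props, size_t n, const char *name,
    uint32_t *out)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (strcmp(props[i].name, name) == 0) {
			*out = props[i].value;
			return (0);
		}
	return (-1);
}

/* Look for the device address in "reg", then in "i2c-address". */
static int
i2c_probe_addr(const struct ofw_prop *props, size_t n, uint32_t *addr)
{
	if (prop_get(props, n, "reg", addr) == 0)
		return (0);
	return (prop_get(props, n, "i2c-address", addr));
}
```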
Submitted by: Marco Trillo
Reported by: Justin Hibbits
- The filesz parameter in audit_control(5) now accepts suffixes: 'B' for
Bytes, 'K' for Kilobytes, 'M' for Megabytes, and 'G' for Gigabytes.
For backward compatibility, a value with no suffix is interpreted as bytes.
- Audit trail log expiration support added. It is configured in
audit_control(5) with the expire-after parameter. If audit_control(5)
contains no expire-after parameter (the default), audit trail files are
not expired or removed. See audit_control(5) for
more information.
- Change defaults in audit_control: warn at 5% rather than 20% free for audit
partitions, rotate automatically at 2 MB, and set the default policy to
cnt,argv rather than cnt so that execve(2) arguments are captured if
AUE_EXECVE events are audited. These may provide more usable defaults for
many users.
- Use au_domain_to_bsm(3) and au_socket_type_to_bsm(3) to convert
au_to_socket_ex(3) arguments to BSM format.
- Fix error encoding AUT_IPC_PERM tokens.
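A sketch of parsing a filesz value with the new suffixes. It assumes binary multiples (K = 1024) and accepts either letter case; both are assumptions, so see audit_control(5) for the authoritative rules. Overflow checking is omitted, and `parse_filesz` is a hypothetical name, not the OpenBSM function.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Parse a filesz value such as "2M" or "1048576" into bytes.
 * Returns 0 on success, -1 on a malformed value.
 */
static int
parse_filesz(const char *s, unsigned long long *bytes)
{
	char *end;
	unsigned long long v, mult;

	errno = 0;
	v = strtoull(s, &end, 10);
	if (end == s || errno != 0)
		return (-1);
	switch (toupper((unsigned char)*end)) {
	case '\0':		/* no suffix: bytes */
	case 'B':
		mult = 1;
		break;
	case 'K':
		mult = 1024ULL;
		break;
	case 'M':
		mult = 1024ULL * 1024;
		break;
	case 'G':
		mult = 1024ULL * 1024 * 1024;
		break;
	default:
		return (-1);
	}
	if (*end != '\0' && end[1] != '\0')
		return (-1);	/* trailing junk after the suffix */
	*bytes = v * mult;
	return (0);
}
```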