Don't assume that UFS on-disk format of a directory is the same as
defined by <sys/dirent.h>
Always start parsing at DIRBLKSIZ aligned offset, skip first entries if
uio_offset is not DIRBLKSIZ aligned. Return EINVAL if buffer is too
small for single entry.
Preallocate buffer for cookies. Cookies will be replaced with d_off
field in struct dirent at later point.
Skip entries with zero inode number.
Stop mangling dirent in ufs_extattr_iterate_directory().
Reviewed by: kib
Sponsored by: Google Summer Of Code 2011
In UFS, i_gen is a random generated value and there is not way for
it to be negative. Actually, the value of i_gen is just used to
match bit patterns and it is of not consequence if the values are
signed or not.
Following other filesystems, set it to unsigned and use it as such,
Fix issues with zeroing and fetching the counters, on x86 and ppc64.
Issues were noted by Bruce Evans and are present on all architectures.
On i386, a counter fetch should use atomic read of 64bit value,
otherwise carry from the increment on other CPU could be lost for the
given fetch, making error of 2^32. If 64bit read (cmpxchg8b) is not
available on the machine, it cannot be SMP and it is enough to disable
preemption around read to avoid the split read.
On x86 the counter increment is not atomic on purpose, which makes it
possible for the store of the incremented result to override just
zeroed per-cpu slot. The effect would be a counter going off by
arbitrary value after zeroing. Perform the counter zeroing on the
same processor which does the increments, making the operations
mutually exclusive. On i386, same as for the fetching, if the
cmpxchg8b is not available, machine is not SMP and we disable
preemption for zeroing.
PowerPC64 is treated the same as amd64.
For other architectures, the changes made to allow the compilation to
succeed, without fixing the issues with zeroing or fetching. It
should be possible to handle them by using the 64bit loads and stores
atomic WRT preemption (assuming the architectures also converted from
using critical sections to proper asm). If architecture does not
provide the facility, using global (spin) mutex would be non-optimal
but working solution.
Noted by: bde
Sponsored by: The FreeBSD Foundation
Ed Schouten [Sun, 30 Jun 2013 10:38:20 +0000 (10:38 +0000)]
Make atomic_fetch_add() and atomic_fetch_sub() work for pointers with GCC 4.2.
According to the standard, atomic_fetch_*() has to behave identical to
regular arithmetic. This means that for pointer types, we have to apply
the stride when doing addition/subtraction.
The GCC documentation seems to imply this is done for __sync_*() as
well. Unfortunately, both tests and Googling seems to reveal this is not
really the case. Fix this by performing the multiplication with the
stride manually.
Ed Schouten [Sun, 30 Jun 2013 08:59:33 +0000 (08:59 +0000)]
Convert this piece of code to use C11 atomics.
As mentioned before, we should at least aim to have one piece of code in
both user space and kernel space that uses C11 atomics, to get some
coverage. This piece of code can be migrated trivially, so it's a good
candidate.
Ed Schouten [Sun, 30 Jun 2013 08:54:41 +0000 (08:54 +0000)]
Make various fixes to <stdatomic.h>.
- According to the standard, memory_order is a type. Use a typedef.
- atomic_*_fence() and atomic_flag_*() are described by the standard as
functions. Use inline functions to implement them.
- Only expose the atomic_*_explicit() functions in kernel space. We
should not use the short-hand functions, as they will always use
memory_order_seq_cst.
Pyun YongHyeon [Sun, 30 Jun 2013 05:12:18 +0000 (05:12 +0000)]
Fix triggering false watchdog timeout when controller is in PAUSE
state. Previously it used to check if controller has sent a
PAUSE frame to the remote peer.
Reported by: David Imhoff via Brad Smith <brad@OpenBSD.org>
Submitted by: davidch (initial version)
Reviewed by: davidch, David Imhoff <dimhoff_devel@xs4all.nl>
- Fix IMAPx registers values calculation
- Initialize SMAPx registers too although they're unused in QEMU
- Do not pass IO/MEM resources to upper bus for activation, handle them locally.
Previously ACTIVATE method of upper bus was no-op so nothing bad
happened. But now FDT maps physaddr to vaddr and it causes
troubles: fdtbus_activate_resource resource assumes that
bustag/bushandle are already set which in this case is wrong.
Xin LI [Sat, 29 Jun 2013 22:04:04 +0000 (22:04 +0000)]
- Modify swapon(8) so that it uses most of geli(8) defaults for swap,
which is presently: AES-XTS, no authentication. Create provider
with pagesize as sectorsize by default.
- Rewrite parsing code for geli(8)-backed swap options, now options
are required to be exact match, and unrecognized options will trigger
a warning.
- Don't initialize GELI device if it's already initialized. This
restores previous behavior.
- Don't duplicate file descriptor when working with geli(8) and
gbde(8) as there is no need to communicate with the utility other
than exit status.
- When calling swap_on_off_* routines, which_prog can only be SWAP_ON
or SWAP_OFF. Eliminate unneeded case branches by replacing switch
with if's.
- Plug a few memory leaks.
Adrian Chadd [Sat, 29 Jun 2013 19:57:57 +0000 (19:57 +0000)]
Don't log anything if npkts == 0.
This occurs at RX DMA start, even though the RX FIFO has plenty of
space. I'll go figure out why, but this shouldn't cause people to
be spammed by these messages.
Scott Long [Sat, 29 Jun 2013 17:48:59 +0000 (17:48 +0000)]
Introduce accessors for the ccb status word. Convert one (of many more)
modules to use it, will convert the others once the appropriate shed
color is selected by consensus.
Pedro F. Giffuni [Sat, 29 Jun 2013 01:35:28 +0000 (01:35 +0000)]
Bring some updates from ufs_lookup to ext2fs.
r156418:
Don't set IN_CHANGE and IN_UPDATE on inodes for potentially suspended
file systems. This could cause deadlocks when creating snapshots.
(We can't do snapshots on ext2fs but it is useful to keep things in sync).
r183079:
- Only set i_offset in the parent directory's i-node during a lookup for
non-LOOKUP operations.
- Relax a VOP assertion for a DELETE lookup.
r187528:
Move the code from ufs_lookup.c used to do dotdot lookup, into
the helper function. It is supposed to be useful for any filesystem
that has to unlock dvp to walk to the ".." entry in lookup routine.
Davide Italiano [Fri, 28 Jun 2013 21:00:08 +0000 (21:00 +0000)]
- Trim an unused and bogus Makefile for mount_smbfs.
- Reconnect with some minor modifications, in particular now selsocket()
internals are adapted to use sbintime units after recent'ish calloutng
switch.
Davide Italiano [Fri, 28 Jun 2013 20:32:48 +0000 (20:32 +0000)]
Properly use v_data field. This magically worked (even if wrong) until
now because v_data is the first field of the structure, but it's not
something we should rely on.
Mikolaj Golub [Fri, 28 Jun 2013 18:07:41 +0000 (18:07 +0000)]
Rework r252313:
The filedesc lock may not be dropped unconditionally before exporting
fd to sbuf: fd might go away during execution. While it is ok for
DTYPE_VNODE and DTYPE_FIFO because the export is from a vrefed vnode
here, for other types it is unsafe.
Instead, drop the lock in export_fd_to_sb(), after preparing data in
memory and before writing to sbuf.
John Baldwin [Fri, 28 Jun 2013 16:33:45 +0000 (16:33 +0000)]
Make a pass over this page to correct and clarify a few things as well as
some general word-smithing.
- Don't claim that adaptive mutexes have a timeout (they don't).
- Don't treat pool mutexes as a separate primitive in a few places.
- Describe sleepable read-mostly locks as a separate lock type and add
them to the various tables.
- Don't claim that sx locks are less efficient. That hasn't been true in
a few years now.
- Describe lockmanager locks next to sx locks since they are very similar
in terms of rules, etc., and so that all the lock primitives are
grouped together before the non-lock primitives.
- Similarly, move the section on Giant after the description of all the
non-lock primitives to preserve grouping.
- Condition variables work on several types of locks, not just mutexes.
- Add a bit of language to compare/contrast condition variables with
sleep/wakeup.
- Add a note about why pause(9) is unique.
- Add some language to define bounded vs unbounded sleeps and explain
why they are treated separately (bounded sleeps only need CPU time
to make forward progress).
- Don't state that using mtx_sleep() is a bad idea. It is in fact rather
necessary.
- Rework the interaction table a bit. First, it did not include really
include sleepable rmlocks and it left out lockmgr entirely. To get
things to fit, combine similar lock types into the same column / row,
and explicitly state what "sleep" means. The notes about recursion
and lock order were also a bit banal (lock order is always important,
not just in the few places annotated here), so remove them. In
particular, the lock order note would need to be on just about every
cell. If we want to document recursion I think a better approach
would be a separate table summarizing the recursion rules for each
lock as having too many notes clutters the table.
- Tweak the tables to use less indentation so everything still fits with
the added columns.
- Correct a few cells in the context mode table.
- Use mdoc markup instead of explicit markup in a few places.
Ryan Stone [Fri, 28 Jun 2013 15:55:30 +0000 (15:55 +0000)]
Correct a bug that prevented deadlkres from (almost) ever firing.
deadlkres was using a reversed test to check whether ticks had rolled over.
This meant that deadlkres could only fire after ticks had rolled over.
This test was actually unnecessary as deadlkres only ever took the
difference of ticks values which is safe even in the presence of ticks
rollover. Remove the tests entirely. Now deadlkres will properly fire
after a lock has been held after the timeout period.
Peter Grehan [Fri, 28 Jun 2013 06:05:33 +0000 (06:05 +0000)]
Make sure all CPUID values are handled, instead of exiting the
bhyve process when an unhandled one is encountered.
Hide some additional capabilities from the guest (e.g. debug store).
This fixes the issue with FreeBSD 9.1 MP guests exiting the VM on
AP spinup (where CPUID is used when sync'ing the TSCs) and the
issue with the Java build where CPUIDs are issued from a guest
userspace.
Submitted by: tycho nightingale at pluribusnetworks com
Reviewed by: neel
Reported by: many
Jeff Roberson [Fri, 28 Jun 2013 03:51:20 +0000 (03:51 +0000)]
- Add a general purpose resource allocator, vmem, from NetBSD. It was
originally inspired by the Solaris vmem detailed in the proceedings
of usenix 2001. The NetBSD version was heavily refactored for bugs
and simplicity.
- Use this resource allocator to allocate the buffer and transient maps.
Buffer cache defrags are reduced by 25% when used by filesystems with
mixed block sizes. Ultimately this may permit dynamic buffer cache
sizing on low KVA machines.
Mark Johnston [Fri, 28 Jun 2013 03:14:40 +0000 (03:14 +0000)]
The dtmalloc provider uses the short description of a malloc type as the
function name of its corresponding DTrace probes. These descriptions may
contain whitespace, but probe names cannot, so just replace any whitespace
with underscores when creating probes.
Andrew Turner [Thu, 27 Jun 2013 22:26:56 +0000 (22:26 +0000)]
Support reading registers r0-r3 when unwinding. There is a seperate
instruction to load these. We only hit it when unwinding past an trap frame
as in C r0-r3 would never have been saved onto the stack.
John Baldwin [Thu, 27 Jun 2013 20:21:54 +0000 (20:21 +0000)]
Make detaching drivers from PCI devices more robust. While here, fix a
bug where a PCI device would be powered down if it failed to probe, but
not when its driver was detached (e.g. via kldunload).
- Add a new helper method resource_list_release_active() which forcefully
releases any active resources of a specified type from a resource list.
- Add a bus_child_detached method for the PCI bus driver which forces any
active resources to be released (and whines to the console if it finds
any) and then powers the device down.
- Call pci_child_detached() if we fail to probe a device when a driver
is kldloaded. This isn't perfect but can avoid leaking resources
from a probe() routine in the kldload case.
Hiroki Sato [Thu, 27 Jun 2013 18:28:45 +0000 (18:28 +0000)]
- Add vnode-backed swap space specification support. This is enabled when
device names "md" or "md[0-9]*" and a "file" option are specified in
/etc/fstab like this:
md none swap sw,file=/swap.bin 0 0
- Add GBDE/GELI encrypted swap space specification support, which
rc.d/encswap supported. The /etc/fstab lines are like the following:
.eli devices accepts aalgo, ealgo, keylen, and sectorsize as options.
swapctl(8) can understand an encrypted device in the command line
like this:
# swapctl -a /dev/ada2p1.bde
- "-L" flag is added to support "late" option to defer swapon until
rc.d/mountlate runs.
- rc.d script change:
rc.d/encswap -> removed
rc.d/addswap -> just display a warning message if $swapfile is defined
rc.d/swap1 -> renamed to rc.d/swap
rc.d/swaplate -> newly added to support "late" option
These changes alleviate a race condition between device creation/removal
and swapon/swapoff.