jhb [Tue, 2 Oct 2012 12:25:30 +0000 (12:25 +0000)]
Rename the module for 'device enc' to "if_enc" to avoid conflicting with
the CAM "enc" peripheral (part of ses(4)). Previously the two modules
used the same name, so only one was included in a linked kernel causing
enc0 to not be created if you added IPSEC to GENERIC. The new module
name follows the pattern of other network interfaces (e.g. "if_loop").
glebius [Tue, 2 Oct 2012 12:03:02 +0000 (12:03 +0000)]
There is a complex race in in_pcblookup_hash() and in_pcblookup_group().
Both functions need to obtain lock on the found PCB, and they can't do
classic inter-lock with the PCB hash lock, due to lock order reversal.
To keep the PCB stable, these functions put a reference on it and after PCB
lock is acquired drop it. If the reference was the last one, this means
we've raced with in_pcbfree() and the PCB is no longer valid.
This approach works okay only if we are acquiring writer-lock on the PCB.
In case of reader-lock, the following scenario can happen:
- 2 threads locate pcb, and do in_pcbref() on it.
- These 2 threads drop the inp hash lock.
- Another thread comes to delete pcb via in_pcbfree(), it obtains hash lock,
does in_pcbremlists(), drops hash lock, and runs in_pcbrele_wlocked(), which
doesn't free the pcb due to two references on it. Then it unlocks the pcb.
- 2 aforementioned threads acquire reader lock on the pcb and run
in_pcbrele_rlocked(). One gets 1 from in_pcbrele_rlocked() and continues,
second gets 0 and considers pcb freed, returns.
- The thread that got 1 continutes working with detached pcb, which later
leads to panic in the underlying protocol level.
To plumb that problem an additional INPCB flag introduced - INP_FREED. We
check for that flag in the in_pcbrele_rlocked() and if it is set, we pretend
that that was the last reference.
Discussed with: rwatson, jhb
Reported by: Vladimir Medvedkin <medved rambler-co.ru>
eadler [Tue, 2 Oct 2012 00:30:15 +0000 (00:30 +0000)]
Correct the tip about finding all the directories on the system
Add a tip about clearing the screen.
Make things more consistent by removing quotes around 'make search'
rmacklem [Mon, 1 Oct 2012 12:28:58 +0000 (12:28 +0000)]
Attila Bogar and Herbert Poeckl both reported similar problems
w.r.t. a Linux NFS client doing a krb5 NFS mount against the
FreeBSD server. We determined this was a Linux bug:
http://www.spinics.net/lists/linux-nfs/msg32466.html, however
the mount failed to work, because the Destroy operation with a
bogus encrypted checksum destroyed the authenticator handle.
This patch changes the rpcsec_gss code so that it doesn't
Destroy the authenticator handle for this case and, as such,
the Linux mount will work.
Tested by: Attila Bogar and Herbert Poeckl
MFC after: 2 weeks
pjd [Mon, 1 Oct 2012 05:43:24 +0000 (05:43 +0000)]
- Enforce CAP_MKFIFO on mkfifoat(2), not on mknodat(2). Without this change
mkfifoat(2) was not restricted.
- Introduce CAP_MKNOD and enforce it on mknodat(2).
Sponsored by: FreeBSD Foundation
MFC after: 2 weeks
hselasky [Mon, 1 Oct 2012 05:42:43 +0000 (05:42 +0000)]
Inherit USB mode from RootHUB port where the USB device is connected.
Only RootHUB ports can be dual mode. Disallow OTG ports on external HUBs.
This simplifies some checks in the USB controller drivers.
andrew [Mon, 1 Oct 2012 05:12:17 +0000 (05:12 +0000)]
Fix the clobber list on the atomic operators that do comparisons. Without
this some compilers will place a cmp instruction before the atomic operation
and expect to be able to use the result afterwards. By adding "cc" to the
list of used registers we tell the compiler to not do this.
The USB Bluetooth driver should only grab its own interfaces. This allows the
USB bluetooth driver to co-exist with other USB device classes and drivers.
- Simplify the implementation of atomic_compare_exchange_strong_explicit.
- Evaluate the memory order argument in atomic_fetch_*_explicit macros.
- Implement atomic_store_explicit using atomic_exchange_explicit instead
of a plain assignment.
Simplify send out queue code:
- Write method of a queue now is void,length of item is taken
as queue property.
- Write methods don't need to know about mbud, supply just buf
to them.
- No need for safe queue iterator in pfsync_sendout().
Add support for mincore(). Specifically, this is an adaptation of the
pmap_mincore() implementation that was added to the original arm pmap
in r235717.
Almost each time when loader opens a file, this leads to calling
disk_open(). Very often this is called several times for one file.
This leads to reading partition table metadata for each call. To
reduce the number of disk I/O we have a simple block cache, but it
is very dumb and more than half of I/O operations related to reading
metadata, misses this cache.
Introduce new cache layer to resolve this problem. It is independent
and doesn't need initialization like bcache, and will work by default
for all loaders which use the new DISK API. A successful disk_open()
call to each new disk or partition produces new entry in the cache.
Even more, when disk was already open, now opening of any nested
partitions does not require reading top level partition table.
So, if without this cache, partition table metadata was read around
20-50 times during boot, now it reads only once. This affects the booting
from GPT and MBR from the UFS.
* src/math_private.h:
. Change the API for the LD80C by removing the explicit passing
of the sign bit. The sign can be determined from the last
parameter of the macro.
. On i386, load long double by bit manipulations to work around
at least a gcc compiler issue. On non-i386 ld80 architectures,
use a simple assignment.
* ld80/s_expl.c:
. Update the only consumer of LD80C.
Disable splitfs support, since we aren't support floppies for a long
time. This slightly reduces an overhead, when loader tries to open
file that doesn't exist.
libc: Use O_CLOEXEC for various internal file descriptors.
This fixes a race condition where another thread may fork() before CLOEXEC
is set, unintentionally passing the descriptor to the child process.
This commit only adds O_CLOEXEC flags to open() or openat() calls where no
fcntl(fd, F_SETFD, FD_CLOEXEC) follows. The separate fcntl() call still
leaves a race window so it should be fixed later.
Allow deferred word-splitting via f_sysrc_get() by allowing $IFS in the
"clean-room" environment used to query rc.conf(5) parameters.
This brings bsdconfig(8)'s sysrc.subr in-line with both the sysrc(8) manual
[provided by sysutils/sysrc] and sysrc(8)'s own sysrc.subr (now identical to
bsdconfig(8)'s sysrc.subr as of this patch).
Finally, this will allow a clean import of sysutils/sysrc (sans sysrc.subr,
already provided here).
Reviewed by: jilles
Approved by: adrian (co-mentor)
Simplify and somewhat redesign interaction between pf_purge_thread() and
pf_purge_expired_states().
Now pf purging daemon stores the current hash table index on stack
in pf_purge_thread(), and supplies it to next iteration of
pf_purge_expired_states(). The latter returns new index back.
The important change is that whenever pf_purge_expired_states() wraps
around the array it returns immediately. This makes our knowledge about
status of states expiry run more consistent. Prior to this change it
could happen that n-th run stopped on i-th entry, and returned (1) as
full run complete, then next (n+1) full run stopped on j-th entry, where
j < i, and that broke the mark-and-sweep algorythm that saves references
rules. A referenced rule was freed, and this later lead to a crash.
The drbr(9) API appeared to be so unclear, that most drivers in
tree used it incorrectly, which lead to inaccurate overrated
if_obytes accounting. The drbr(9) used to update ifnet stats on
drbr_enqueue(), which is not accurate since enqueuing doesn't
imply successful processing by driver. Dequeuing neither mean
that. Most drivers also called drbr_stats_update() which did
accounting again, leading to doubled if_obytes statistics. And
in case of severe transmitting, when a packet could be several
times enqueued and dequeued it could have been accounted several
times.
o Thus, make drbr(9) API thinner. Now drbr(9) merely chooses between
ALTQ queueing or buf_ring(9) queueing.
- It doesn't touch the buf_ring stats any more.
- It doesn't touch ifnet stats anymore.
- drbr_stats_update() no longer exists.
o buf_ring(9) handles its stats itself:
- It handles br_drops itself.
- br_prod_bytes stats are dropped. Rationale: no one ever
reads them but update of a common counter on every packet
negatively affects performance due to excessive cache
invalidation.
- buf_ring_enqueue_bytes() reduced to buf_ring_enqueue(), since
we no longer account bytes.
o Drivers handle their stats theirselves: if_obytes, if_omcasts.
o mlx4(4), igb(4), em(4), vxge(4), oce(4) and ixv(4) no longer
use drbr_stats_update(), and update ifnet stats theirselves.
o bxe(4) was the most correct driver, it didn't call
drbr_stats_update(), thus it was the only driver accurate under
moderate load. Now it also maintains stats itself.
o ixgbe(4) had already taken stats from hardware, so just
- drop software stats updating.
- take multicast packet count from hardware as well.
o mxge(4) just no longer needs NO_SLOW_STATS define.
o cxgb(4), cxgbe(4) need no change, since they obtain stats
from hardware.
Change queue overflow checks from DIAGNOSTIC+panic() to KASSERT() to make
them enabled on HEAD by default. It is probably better to do single compare
then hunt for unexpected memory corruption.
- Re-shuffle the <machine/pc/bios.h> headers to move all kernel-specific
bits under #ifdef _KERNEL but leave definitions for various structures
defined by standards ($PIR table, SMAP entries, etc.) available to
userland.
- Consolidate duplicate SMBIOS table structure definitions in ipmi(4)
and smbios(4) in <machine/pc/bios.h> and make them available to
userland.
Fix the mis-handling of the VV_TEXT on the nullfs vnodes.
If you have a binary on a filesystem which is also mounted over by
nullfs, you could execute the binary from the lower filesystem, or
from the nullfs mount. When executed from lower filesystem, the lower
vnode gets VV_TEXT flag set, and the file cannot be modified while the
binary is active. But, if executed as the nullfs alias, only the
nullfs vnode gets VV_TEXT set, and you still can open the lower vnode
for write.
Add a set of VOPs for the VV_TEXT query, set and clear operations,
which are correctly bypassed to lower vnode.
Make the loader a bit smarter, when it tries to open disk and the slice
number is not exactly specified. When the disk has MBR, also try to read
BSD label after ptable_getpart() call. When the disk has GPT, also set
d_partition to 255. Mostly, this is how it worked before.
Remove the topology lock from disk_gone(), it might be called with regular
mutexes held and the topology lock is an sx lock.
The topology lock was there to protect traversing through the list of providers
of disk's geom, but it seems that disk's geom has always exactly one provider.
Change the code to call g_wither_provider() for this one provider, which is
safe to do without holding the topology lock and assert that there is indeed
only one provider.
libc/fts: Use O_CLOEXEC for internal file descriptors.
Because fts keeps internal file descriptors open across calls, making such
descriptors close-on-exec helps not only multi-threaded applications but
also single-threaded applications.
In particular, this prevents passing a temporary file descriptor for saving
the current directory to processes created via find -exec.
Ensure that all cases that enqueue a netgraph item for delivery by a
ngthread properly set the item's depth to 1. In particular, prior to this
change if ng_snd_item failed to acquire a lock on a node, the item's depth
would not be set at all. This fix ensures that the error code from rcvmsg/
rcvdata is properly passed back to the apply callback. For example, this
fixes a bug where an error from rcvmsg/rcvdata would not previously
propagate back to a libnetgraph consumer when the message was queued.
The attempt to merge changes from the linux libtirpc caused
rpc.lockd to exit after startup under unclear conditions.
After many hours of selective experiments and inconsistent results
the conclusion is that it's better to just revert everything and
restart in a future time with a much smaller subset of the
changes.
____
MFC after: 3 days
Reported by: David Wolfskill
Tested by: David Wolfskill
The attempt to merge changes from the linux libtirpc caused
rpc.lockd to exit after startup under unclear conditions.
After many hours of selective experiments and inconsistent results
the conclusion is that it's better to just revert everything and
restart in a future time with a much smaller subset of the
changes.
____
MFC after: 3 days
Reported by: David Wolfskill
Tested by: David Wolfskill
Passing an invalid pointer results in undefined behaviour.
The wrappers in libthr access some of the data pointed to by the arguments
in userland, so that an invalid pointer will cause a signal and not an
[EFAULT] error return.
Furthermore, if the [EFAULT] error occurs when the kernel is writing, it is
not a proper error in the sense that the call still commits (changing the
signal disposition or accepting the signal).
Make sure we record NAK tokens in the TD structure for IN direction.
Improve host channel disabling. Wait two times 125us for channel to be
disabled. The DWC OTG doesn't like when channels are re-used too early.
Kernel and modules have "set_vnet" linker set, where virtualized
global variables are placed. When a module is loaded by link_elf
linker its variables from "set_vnet" linker set are copied to the
kernel "set_vnet" ("modspace") and all references to these variables
inside the module are relocated accordingly.
The issue is when a module is loaded that has references to global
variables from another, previously loaded module: these references are
not relocated so an invalid address is used when the module tries to
access the variable. The example is V_layer3_chain, defined in ipfw
module and accessed from ipfw_nat.
The same issue is with DPCPU variables, which use "set_pcpu" linker
set.
Fix this making the link_elf linker on a module load recognize
"external" DPCPU/VNET variables defined in the previously loaded
modules and relocate them accordingly. For this set_pcpu_list and
set_vnet_list are used, where the addresses of modules' "set_pcpu" and
"set_vnet" linker sets are stored.
Note, archs that use link_elf_obj (amd64) were not affected by this
issue.
Fix bug in TCP_KEEPCNT setting, which slipped in in the last round
of reviewing of r231025.
Unlike other options from this family TCP_KEEPCNT doesn't specify
time interval, but a count, thus parameter supplied doesn't need
to be multiplied by hz.
adrian [Thu, 27 Sep 2012 06:05:54 +0000 (06:05 +0000)]
Track the last ANI TX/RX sample correctly.
This doesn't specifically fix the issue(s) i'm seeing in this 2GHz
environment (where setting/increasing spur immunity causes OFDM restart
errors to skyrocket through the roof; but leaving it at 0 would leave
the environment cleaner..)
Pointy-hat-to: me, for committing this broken code in the first place.
Implementing pmap_kextract(va) as pmap_extract(kernel_pmap, va) is
problematic because some callers to pmap_kextract() expect its
implementation to be lock-less. In particular, uma_dbg_alloc() implicitly
requires this. Otherwise, lock-order reversals occur between pmap locks and
UMA zone locks. So, this change introduces a lock-less implementation of
pmap_kextract().
Disable recursion on the pvh global lock in the new armv6 pmap. While
recursion on this locks occurs in the old arm pmap, it thankfully doesn't
occur in the armv6 pmap.
Add 32-bit ABI compat shims. Those are necessary for i386 binary-only
tools like sysutils/hpacucli (HP P4xx RAID controller management
suite) working on amd64 systems.
- Make C11 atomic macros usable in expressions:
- Replace do-while statements with void expressions.
- Wrap __asm statements in statement expressions.
- Make the macros function-like:
- Evaluate all arguments exactly once.
- Make sure there's a sequence point between evaluation of the arguments
and the function body. Arguments should be evaluated before any memory
barriers.
- Fix use of __atomic_is_lock_free built-in. It requires the address of
an atomic variable as second argument. Use this built-in on clang as
well because clang's __c11_atomic_is_lock_free only takes the size of the
variable into account.
- In atomic_exchange_explicit put the barrier before instead of after the
__sync_lock_test_and_set call.