gibbs [Wed, 29 Feb 2012 17:47:01 +0000 (17:47 +0000)]
blkif interface comment cleanups. No functional changes
sys/xen/interface/io/blkif.h:
o Insert space in "Red Hat".
o Fix typo "discard-aligment" -> "discard-alignment"
o Fix typo "unamp" -> "unmap"
o Fix typo "formated" -> "formatted"
o Clarify the text for "params".
o Clarify the text for "sector-size".
o Clarify the text for "max-requests" in the backend section.
kib [Wed, 29 Feb 2012 15:15:36 +0000 (15:15 +0000)]
In null_reclaim(), assert that the reclaimed vnode is fully constructed,
instead of accepting a half-constructed vnode. The previous code could not
decide what to do with such a vnode anyway and, although it processed the
vnode for hash removal, panicked later when getting rid of the nullfs
reference on lowervp.
While there, remove initializations from the declaration block.
kib [Wed, 29 Feb 2012 15:09:20 +0000 (15:09 +0000)]
Always request an exclusive lock for the lower vnode in nullfs_vget().
null_nodeget() requires an exclusive lock on lowervp to be able to
insmntque() the new vnode.
kib [Wed, 29 Feb 2012 15:06:00 +0000 (15:06 +0000)]
Move the code to destroy half-constructed nullfs vnode into helper
function null_destroy_proto() from null_insmntque_dtr(). Also
apply null_destroy_proto() in null_nodeget() when we raced and a vnode
is found in the hash, so the currently allocated protonode shall be
destroyed.
Lock the vnode interlock around reassigning the v_vnlock.
In fact, this path will not be exercised after several later commits,
since null_nodeget() cannot take shared-locked lowervp at all due to
insmntque() requirements.
gonzo [Wed, 29 Feb 2012 05:48:29 +0000 (05:48 +0000)]
Revert part of the old logic of assigning MAC addresses:
- Reserve the respective number of addresses for the management port
- octm uses base address directly
- other drivers get MACs on "first come first served" basis
alc [Wed, 29 Feb 2012 05:41:29 +0000 (05:41 +0000)]
Simplify kmem_alloc() by eliminating code that existed on account of
external pagers in Mach. FreeBSD doesn't implement external pagers.
Moreover, it doesn't page out the kernel object. So, the reasons for
having this code don't hold.
dim [Tue, 28 Feb 2012 21:45:21 +0000 (21:45 +0000)]
Change definition of pipe_chmod() from K&R to C99, to avoid the
following clang warning:
sys/kern/sys_pipe.c:1556:10: error: promoted type 'int' of K&R function parameter is not compatible with the parameter type 'mode_t'
(aka 'unsigned short') declared in a previous prototype [-Werror]
mode_t mode;
^
sys/kern/sys_pipe.c:155:19: note: previous declaration is here
static fo_chmod_t pipe_chmod;
^
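As a minimal sketch of the kind of change described (not the actual sys_pipe.c code), the example below forward-declares a function through a function typedef and then defines it in prototype style, so the parameter type matches the earlier declaration. The names demo_chmod_t and demo_chmod are hypothetical stand-ins for fo_chmod_t and pipe_chmod(), and the function body is invented purely for illustration.

```c
#include <sys/types.h>

/* Hypothetical function type, standing in for fo_chmod_t. */
typedef int demo_chmod_t(int fd, mode_t mode);

/* Forward declaration through the typedef, as in the quoted note. */
static demo_chmod_t demo_chmod;

/*
 * A K&R-style definition would look like:
 *
 *	static int
 *	demo_chmod(fd, mode)
 *		int fd;
 *		mode_t mode;
 *	{ ... }
 *
 * With K&R definitions, narrow parameter types undergo default argument
 * promotion, which can conflict with the prototype when mode_t is
 * narrower than int. The prototype-style definition below avoids that.
 */
static int
demo_chmod(int fd, mode_t mode)
{
	(void)fd;			/* unused in this sketch */
	return ((int)(mode & 07777));	/* keep only the permission bits */
}
```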
emaste [Tue, 28 Feb 2012 19:42:40 +0000 (19:42 +0000)]
Workaround for PCIe 4GB boundary issue
Enforce a boundary of no more than 4GB - transfers crossing a 4GB
boundary can lead to data corruption due to PCIe limitations. This
change is a less-intrusive workaround that can be quickly merged back
to older branches; a cleaner implementation will arrive in HEAD later
but may require KPI changes.
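The boundary condition can be illustrated with a small, generic sketch. It is not the driver change itself, which is not shown in this log; the helper names crosses_4g() and clamp_to_4g() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BOUNDARY_4G	(UINT64_C(1) << 32)

/* True if the transfer [addr, addr + len) spans a 4GB-aligned boundary. */
static bool
crosses_4g(uint64_t addr, uint64_t len)
{
	assert(len > 0);
	return ((addr / BOUNDARY_4G) != ((addr + len - 1) / BOUNDARY_4G));
}

/* Largest length starting at addr that stays within one 4GB window. */
static uint64_t
clamp_to_4g(uint64_t addr, uint64_t len)
{
	uint64_t room = BOUNDARY_4G - (addr % BOUNDARY_4G);

	return (len < room ? len : room);
}
```

A caller that must not cross the boundary could split a transfer at clamp_to_4g(addr, len) and issue the remainder separately.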
tijl [Tue, 28 Feb 2012 19:39:54 +0000 (19:39 +0000)]
Copy amd64 endian.h to x86 and merge with i386 endian.h. Replace
amd64/i386/pc98 endian.h with stubs.
In __bswap64_const(x) the conflict between 0xffUL and 0xffULL has been
resolved by reimplementing the macro in terms of __bswap32(x). As a side
effect __bswap64_var(x) is now implemented using two bswap instructions on
i386 and should be much faster. __bswap32_const(x) has been reimplemented
in terms of __bswap16(x) for consistency.
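The layering described above can be sketched as follows. This is an illustration only, assuming <stdint.h> types; the BSWAP16/BSWAP32/BSWAP64 names are hypothetical and the real macros in endian.h differ in detail. Because the 64-bit swap is built from 32-bit swaps, no 64-bit mask constants (and so no UL versus ULL suffix choice) are needed.

```c
#include <assert.h>
#include <stdint.h>

/* Swap the two bytes of a 16-bit value. */
#define BSWAP16(x)							\
	((uint16_t)(((uint16_t)(x) << 8) | ((uint16_t)(x) >> 8)))

/* 32-bit swap: swap each 16-bit half, then exchange the halves. */
#define BSWAP32(x)							\
	((uint32_t)(((uint32_t)BSWAP16((uint32_t)(x) & 0xffff) << 16) |	\
	    BSWAP16((uint32_t)(x) >> 16)))

/* 64-bit swap: swap each 32-bit half, then exchange the halves. */
#define BSWAP64(x)							\
	(((uint64_t)BSWAP32((uint64_t)(x) & 0xffffffff) << 32) |	\
	    BSWAP32((uint64_t)(x) >> 32))

/* The macros are constant expressions, so they work at compile time. */
static_assert(BSWAP64(UINT64_C(0x0102030405060708)) ==
    UINT64_C(0x0807060504030201), "64-bit constant swap");
```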
dim [Tue, 28 Feb 2012 18:30:18 +0000 (18:30 +0000)]
Define several extra macros in bsd.sys.mk and sys/conf/kern.pre.mk, to
get rid of testing explicitly for clang (using ${CC:T:Mclang}) in
individual Makefiles.
Instead, use the following extra macros, for use with clang:
- NO_WERROR.clang (disables -Werror)
- NO_WCAST_ALIGN.clang (disables -Wcast-align)
- NO_WFORMAT.clang (disables -Wformat and friends)
- CLANG_NO_IAS (disables integrated assembler)
- CLANG_OPT_SMALL (adds flags for extra small size optimizations)
As a side effect, this enables setting CC/CXX/CPP in src.conf instead of
make.conf! For clang, use the following:
yongari [Tue, 28 Feb 2012 05:23:29 +0000 (05:23 +0000)]
Prefer RL_GMEDIASTAT register to RGEPHY_MII_SSR register to
extract a link status of PHY when parent driver is re(4).
RGEPHY_MII_SSR register does not seem to report correct PHY status
on some integrated PHYs used with re(4).
Unfortunately, RealTek PHYs have no additional information to
differentiate integrated PHYs from external ones so relying on PHY
model number is not enough to know that. However, it seems
RGEPHY_MII_SSR register exists for external RealTek PHYs so
checking the parent driver would be a good indication of which PHY
was used. In other words, for non-re(4) controllers, the PHY is an
external one and its revision number is greater than or equal to 2.
This change fixes intermittent link UP/DOWN messages reported on
RTL8169 controller.
Also, mii_attach(9) is tried after setting interface name since
rgephy(4) has to know parent driver name.
kib [Mon, 27 Feb 2012 21:10:10 +0000 (21:10 +0000)]
Currently, the debugger attached to the process executing vfork() does
not get syscall exit notification until the child has performed exec or
exit. Swap the order of doing ptracestop() and waiting for P_PPWAIT
clearing, by postponing the wait into syscallret after ptracestop()
notification is done.
kib [Mon, 27 Feb 2012 20:52:20 +0000 (20:52 +0000)]
Fix a race in top non-interactive mode. Use plain sleep(3) call instead
of arming timer and then pausing. If SIGALRM is delivered before pause(3)
is entered, top hangs.
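The race can be sketched in a few lines. This is not top's source; wait_interval() is a hypothetical helper, and the racy pattern is shown only in the comment.

```c
#include <unistd.h>

/*
 * Racy pattern (for illustration only):
 *
 *	alarm(seconds);		// arm the timer
 *	pause();		// wait for SIGALRM
 *
 * If SIGALRM is delivered between alarm() and pause(), the signal is
 * already consumed when pause() starts, and pause() blocks until some
 * other signal arrives.
 *
 * Plain sleep(3) has no such window: the wait and the timeout are a
 * single call.
 */
static unsigned int
wait_interval(unsigned int seconds)
{
	/* Returns the unslept time if interrupted by a signal, else 0. */
	return (sleep(seconds));
}
```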
luigi [Mon, 27 Feb 2012 19:05:01 +0000 (19:05 +0000)]
A bunch of netmap fixes:
USERSPACE:
1. add support for devices with different number of rx and tx queues;
2. add better support for zero-copy operation, adding an extra field
to the netmap ring to indicate how many buffers we have already processed
but not yet released (with help from Eddie Kohler);
3. The two changes above unfortunately require an API change, so while
at it add a version field and some spares to the ioctl() argument
to help detect mismatches.
4. update the manual page for the two changes above;
5. update sample applications in tools/tools/netmap
KERNEL:
1. simplify the internal structures moving the global wait queues
to the 'struct netmap_adapter';
2. simplify the functions that map kring<->nic ring indexes
4. start exploring the impact of micro-optimizations (prefetch etc.)
in the ixgbe driver.
Using 'legacy' descriptors on the tx ring and prefetching slots gives
about 20% speedup at 900 MHz. Another 7-10% would come from removing
the explicit calls to bus_dmamap* in the core (they are effectively
NOPs in this case, but it takes an expensive load of the per-buffer
dma maps to figure out that they are all NULL).
Rx performance not investigated.
I am postponing the MFC so i can import a few more improvements
before merging.
jhb [Mon, 27 Feb 2012 17:33:16 +0000 (17:33 +0000)]
- Panic up front if a kernel does not include 'device atpic' and an
APIC is not found.
- Don't panic if lapic_enable_cmc() is called and the APIC is not enabled.
This can happen due to booting a kernel with APIC disabled on a CPU that
supports CMCI.
- Wrap a long line.
jhb [Mon, 27 Feb 2012 16:08:18 +0000 (16:08 +0000)]
Clear a device's description string anytime its driver changes.
Descriptions are specific to drivers and we don't change drivers on attached
devices. This fixes a few places where we were not clearing the description
when detaching a driver (e.g. when device_attach() failed). While here, fix
a few other nits:
- Remove spurious call to remove a device's driver from
devclass_driver_deleted(). device_detach() removes it already.
- Fix a typo.
mav [Mon, 27 Feb 2012 10:31:54 +0000 (10:31 +0000)]
Rework CPU load balancing in SCHED_ULE:
- In sched_pickcpu() be more careful taking previous CPU on SMT systems.
Do it only if all other logical CPUs of that physical one are idle to avoid
extra resource sharing.
- In sched_pickcpu() change general logic of CPU selection. First
look for idle CPU, sharing last level cache with previously used one,
skipping SMT CPU groups. If none found, search all CPUs for the least loaded
one, where the thread with its priority can run now. If none found, search
just for the least loaded CPU.
- Make cpu_search() compare lowest/highest CPU load when comparing CPU
groups with equal load. That allows differentiating 1+1 and 2+0 loads.
- Make cpu_search() prefer the specified (previous) CPU or group if load
is equal. This improves cache affinity for more complicated topologies.
- Randomize CPU selection if above factors are equal. Previous code tended
to prefer CPUs with lower IDs, causing unneeded collisions.
- Rework periodic balancer in sched_balance_group(). With cpu_search()
more intelligent now, make balancing process flat, removing recursion
over the topology tree. That fixes double swap problem and makes load
distribution more even and predictable.
Altogether this gives a 10-15% performance improvement in many tests on
CPUs with SMT, such as Core i7, when the number of threads is less than
the number of logical CPUs. In some tests it also has a positive effect on
systems without SMT.
Reviewed by: jeff
Tested by: flo, hackers@
MFC after: 1 month
Sponsored by: iXsystems, Inc.
delphij [Mon, 27 Feb 2012 05:49:00 +0000 (05:49 +0000)]
Drop setuid status while doing file operations to prevent potential
information leak. This changeset is intended to be a minimal one
to make backports easier.
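A generic sketch of the technique follows. The log does not show the affected program or its code, so the helper names with_dropped_privileges() and count_op() are hypothetical, and the example assumes a POSIX system where seteuid(2) can switch between the real and saved user IDs.

```c
#include <sys/types.h>
#include <unistd.h>

/*
 * Run op(arg) with the effective UID temporarily set to the real UID,
 * then restore the original effective UID. Returns -1 if privileges
 * could not be dropped or restored, otherwise the result of op().
 */
static int
with_dropped_privileges(int (*op)(void *), void *arg)
{
	uid_t saved = geteuid();
	int rc;

	if (seteuid(getuid()) == -1)
		return (-1);
	rc = op(arg);		/* file operations happen unprivileged */
	if (seteuid(saved) == -1)
		return (-1);
	return (rc);
}

/* Trivial stand-in for a file operation: counts how often it ran. */
static int
count_op(void *arg)
{
	++*(int *)arg;
	return (0);
}
```

In a real setuid program the callback would open and read the user-supplied file, so access checks are made against the invoking user rather than the file owner of the binary.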
phk [Sun, 26 Feb 2012 20:56:49 +0000 (20:56 +0000)]
Also call the low-level driver if ->c_iflag & (IXON|IXOFF|IXANY) changes.
Uftdi(4) examines (c_iflag & (IXON|IXOFF)) to control hw XON-XOFF support.
This is obviously no good, if changes to those bits are not communicated
down the stack.
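The check can be sketched in userland terms as below. This is not the tty layer code; flow_control_changed() is a hypothetical helper operating on two struct termios snapshots.

```c
#include <stdbool.h>
#include <termios.h>

/* Software flow-control bits that a low-level driver may care about. */
#define FLOW_BITS	(IXON | IXOFF | IXANY)

/* True if any software flow-control bit differs between the settings. */
static bool
flow_control_changed(const struct termios *cur, const struct termios *next)
{
	return (((cur->c_iflag ^ next->c_iflag) & FLOW_BITS) != 0);
}
```

XOR isolates the bits that differ, and the mask limits the comparison to the flow-control flags, so unrelated c_iflag changes do not trigger a call into the driver.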
kib [Sun, 26 Feb 2012 13:55:43 +0000 (13:55 +0000)]
Add SO_PROTOCOL/SO_PROTOTYPE socket SOL_SOCKET-level option to get the
socket protocol number. This is useful since the socket type can
be implemented by different protocols in the same protocol family,
e.g. SOCK_STREAM may be provided by both TCP and SCTP.
Submitted by: Jukka A. Ukkonen <jau iki fi>
PR: kern/162352
Discussed with: bz
Reviewed by: glebius
MFC after: 2 weeks
adrian [Sun, 26 Feb 2012 06:04:44 +0000 (06:04 +0000)]
Add in some debugging code to check whether the current rate table has
been bait-and-switched from the rate control code.
This will avoid the panic that I saw and will avoid sending invalid rates
(eg 11a/11g OFDM rates when in 11b, on 11b-only NICs (AR5211)) where the
rate table is not "big".
It also will point out situations where this occurs for the 11n NICs
which will have sufficiently large rate tables that "invalid rix" doesn't
occur.
I'll try to follow this up with a commit that adds a current operating mode
check. The "rix" is only relevant to the current operating mode and rate
table.
adrian [Sat, 25 Feb 2012 19:12:54 +0000 (19:12 +0000)]
Attempt to further fix some of the concurrency/reset issues that occur.
* ath_reset() is being called in softclock context, which may have the
thing sleep on a lock. To avoid this, since we really _shouldn't_
be sleeping on any locks, break out the no-loss reset path into a tasklet
and call that from:
+ ath_calibrate()
+ ath_watchdog()
This has the added advantage that it'll end up also doing the frame
RX cleanup from within the taskqueue context, rather than the softclock
context.
* Shuffle around the taskqueue_block() call to be before we grab the lock
and disable interrupts.
The trouble here is that taskqueue_block() doesn't block currently
queued (but not yet running) tasks so calling it doesn't guarantee
no further tasks (that weren't running on _A_ CPU at the time of this
call) will complete. Calling taskqueue_drain() on these tasks won't
work because if any _other_ thread calls taskqueue_enqueue() for whatever
reason, everything gets very angry and stops working.
This slightly changes the race condition enough to let ath_rx_tasklet()
run before we try disabling it, and thus quietens the warnings a bit.
The (more) true solution will be doing something like the following:
* having a taskqueue_blocked mask in ath_softc;
* having an interrupt_blocked mask in ath_softc;
* only calling taskqueue_drain() on each individual task _after_ the
lock has been acquired - that way no further tasklet scheduling
is going to occur.
* Then once the tasks have been blocked _and_ the interrupt has been
disabled, call taskqueue_drain() on each, ensuring that anything
that _was_ scheduled or running is removed.
The trouble is if something calls taskqueue_enqueue() on a task
after taskqueue_block() has been called but BEFORE taskqueue_drain()
has been called, ta_pending will be set to 1 and taskqueue_drain()
will sit there stuck in msleep() until you hard-kill the machine.
alc [Sat, 25 Feb 2012 17:49:59 +0000 (17:49 +0000)]
Simplify vmspace_fork()'s control flow by copying immutable data before
the vm map locks are acquired. Also, eliminate redundant initialization
of the new vm map's timestamp.
mm [Sat, 25 Feb 2012 10:58:02 +0000 (10:58 +0000)]
Update libarchive to 3.0.3
Some of the new features:
- New readers: RAR, LHA/LZH, CAB, 7-Zip
- New writers: ISO9660, XAR
- Improvements to many formats, especially including ISO9660 and Zip
- Stackable write filters to write, e.g., tar.gz.uu in a single pass
- Exploit seekable input; new "seekable" Zip reader can exploit the Zip
Central Directory when it's available; the old "streamable" Zip reader
is still fully supported for cases where seeking is not possible.
Full release notes available at:
https://github.com/libarchive/libarchive/wiki/ReleaseNotes
trociny [Sat, 25 Feb 2012 10:15:41 +0000 (10:15 +0000)]
When detaching a unix domain socket, uipc_detach() checks the
unp->unp_vnode pointer to detect if there is a vnode associated with
(bound to) this socket and does the necessary cleanup if there is.
The issue is that after forced unmount this check may be too late as
the unp_vnode is reclaimed and the reference is stale.
To fix this provide a helper function that is called on a socket vnode
reclamation to do necessary cleanup.