mav [Thu, 23 Jan 2014 00:42:08 +0000 (00:42 +0000)]
MFC r259765:
Fix RPC server threads file handle affinity to work better with ZFS.
Instead of taking 8 specific bytes of file handle to identify file during
RPC thread affitinity handling, use trivial hash of the full file handle.
ZFS's struct zfid_short does not have padding field after the length field,
as result, originally picked 8 bytes are loosing lower 16 bits of object ID,
causing many false matches and unneeded requests affinity to same thread.
This fix substantially improves NFS server latency and scalability in SPEC
NFS benchmark by more flexible use of multiple NFS threads.
mav [Thu, 23 Jan 2014 00:41:23 +0000 (00:41 +0000)]
MFC r259659, r259662:
Remove several linear list traversals per request from RPC server code.
Do not insert active ports into pool->sp_active list if they are success-
fully assigned to some thread. This makes that list include only ports that
really require attention, and so traversal can be reduced to simple taking
the first one.
Remove idle thread from pool->sp_idlethreads list when assigning some
work (port of requests) to it. That again makes possible to replace list
traversals with simple taking the first element.
mav [Thu, 23 Jan 2014 00:40:28 +0000 (00:40 +0000)]
MFC r259632:
Rework flow control for connection-oriented (TCP) RPC server.
When processing receive buffer, write the amount of data, expected
in present request record, into socket's so_rcv.sb_lowat to make stack
aware about our needs. When processing following upcalls, ignore them
until socket collect enough data to be read and processed in one turn.
This change reduces number of context switches and other operations
in RPC stack during large NFS writes (especially via non-Jumbo networks)
by order of magnitude.
After precessing current packet, take another look into the pending
buffer to find out whether the next packet had been already received.
If not, deactivate this port right there without making RPC code to
push this port to another thread just to find that there is nothing.
If the next packet is received partially, also deactivate the port, but
also update socket's so_rcv.sb_lowat to not be woken up prematurely.
This change additionally reduces number of context switches per NFS
request about in half.
mav [Thu, 23 Jan 2014 00:35:41 +0000 (00:35 +0000)]
MFC r244008 (by rmacklem):
Add support for backchannels to the kernel RPC. Backchannels
are used by NFSv4.1 for callbacks. A backchannel is a connection
established by the client, but used for RPCs done by the server
on the client (callbacks). As a result, this patch mixes some
client side calls in the server side and vice versa. Some
definitions in the .c files were extracted out into a file called
krpc.h, so that they could be included in multiple .c files.
This code has been in projects/nfsv4.1-client for some time.
Although no one has given it a formal review, I believe kib@
has taken a look at it.
mav [Thu, 23 Jan 2014 00:26:24 +0000 (00:26 +0000)]
MFC r258132:
Some minor tuning to rpc/svc.c:
- close cosmetic race in svc_exit();
- do not set wait timeout for idle threads if we have no use for wakeups;
- create new requested thread sooner, not only after some another thread
wakeup, that may happen later under constant load.
mav [Tue, 21 Jan 2014 00:31:31 +0000 (00:31 +0000)]
MFC r257054:
Some microoptimizations for da and ada drivers:
- Replace ordered_tag_count counter with single flag;
- From da remove outstanding_cmds counter, duplicating pending_ccbs list;
- From da_softc remove unused links field.
mav [Tue, 21 Jan 2014 00:14:06 +0000 (00:14 +0000)]
MFC r254970 (by ken):
If a drive returns ASC/ASCQ 0x04,0x11 "Logical unit not ready,
notify (enable spinup) required", instead of doing the normal
retries, poll for a change in status.
We will poll every half second for a minute for the status to
change.
Hitachi drives (and likely other SAS drives) return that ASC/ASCQ
when they are waiting to spin up. What it means is that they are
waiting for the SAS expander to send them the SAS
NOTIFY (ENABLE SPINUP) primitive.
That primitive is the mechanism expanders/enclosures use to
sequence drive spinup to avoid overloading power supplies.
The first two were always GPL2. The last two were
added after the GPL3 transition, but were written
by aaw@google.com and Rafael EspĂndola got permission
to relicense them under the GPL2 for inclusion in
llvm-gcc.
pfg [Mon, 20 Jan 2014 19:38:16 +0000 (19:38 +0000)]
MFC r260014, r260099:
gcc: Add support for label attributes and "unavailable" attribute.
Apple GCC has extensions to support for both label attributes and
an "unavailable" attribute. These are critical for objc but are
also useful in regular C/C++.
Improve error message shown to the user when trying to load a module that is
already loaded or compiled withing the kernel
Point the user to dmesg(1) to get informations about why loading a module did fail
instead of printing the cryptic "Exec format error"
Update the BUGS section of kld(4) according the recent changes in kldload(8)
gjb [Sun, 19 Jan 2014 19:49:24 +0000 (19:49 +0000)]
MFC r259729:
Bootstrap etcupdate(8) as part of the release build, similar
to what is done for mergemaster(8). This allows etcupdate(8)
to work out-of-box after the first upgrade of a system.
mav [Sat, 18 Jan 2014 22:52:43 +0000 (22:52 +0000)]
MFC r245848 (by jhb):
Always update the hw.uart.console hint anytime a change is made to the
comconsole setup. Previously the hint would be set when if you set a
custom port, but it would not be updated if you later set a custom speed.
Also, leave the hw.uart.console hint mutable so it can be overridden or
unset by the user if needed.
jhb [Tue, 14 Jan 2014 21:20:51 +0000 (21:20 +0000)]
MFC 238423,238426,238428,258063,258063,258066,258097,258185,259134:
The etcupdate utility is a tool for managing updates to files that are
not updated as part of `make installworld' such as files in /etc. It
manages updates by doing a three-way merge of changes made to these files
against the local versions. It is also designed to minimize the amount
of user intervention with the goal of simplifying upgrades for clusters
of machines.
delphij [Tue, 14 Jan 2014 19:27:42 +0000 (19:27 +0000)]
On stable/8 and stable/9, disable hardware random number generators
by default. This is a direct commit to stable/ branches because
HEAD and stable/10 have superior implementation of random device.
pfg [Tue, 14 Jan 2014 15:21:19 +0000 (15:21 +0000)]
MFC r260545:
ext2fs: fix inode flag conversion.
After r252890 we are naively attempting to pass through the
inode flags. This is technically incorrect as the ext2
inode flags don't match the UFS/system values used in
FreeBSD and a clean conversion is needed.
Some filtering was left in place so the change didn't cause
significant changes in FreeBSD but some of the garbage passed
is likely to be the cause for warning messages in linux.
Fix the issue by resetting the flags before conversion as was
done previously. This also means we will not pass the EXT4_*
inode flags into FreeBSD's inode.
pfg [Sat, 11 Jan 2014 01:45:20 +0000 (01:45 +0000)]
MFC r260361:
gcc: Fix optimization bug.
GCC-PR rtl-optimization/34628
* combine.c (try_combine): Stop and undo after the first combination
if an autoincrement side-effect on the first insn has effectively
been lost.
asomers [Fri, 10 Jan 2014 18:14:15 +0000 (18:14 +0000)]
MFC 259339
sbin/devd/devd.cc
Increase the size of devd's client socket's send buffer from the
default (8k) to 128k. This prevents clients from getting
POLLHUPped during event storms. For example, during zpool creation,
the kernel emits a resource.fs.zfs.statechange event for every vdev
in the pool. A 128k buffer is large enough to hold the statechange
events for a pool with nearly 800 drives.
asomers [Fri, 10 Jan 2014 17:40:29 +0000 (17:40 +0000)]
MFC 259240
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
When a da or ada device dissappears, outstanding IOs fail with
ENXIO, not EIO. The check for EIO was probably copied from Illumos,
where that is indeed the correct errno.
Without this change, pulling a busy drive from a zpool would usually
turn it into UNAVAIL, even though pulling an idle drive would turn
it into REMOVED. With this change, it is REMOVED every time.
Also, vdev_geom_io_intr shouldn't do zfs_post_remove, because that
results in devd getting two resource.fs.zfs.removed events. The
comment said that the event had to be sent directly instead of
through the async removal thread because "the DE engine is using
this information to discard prevoius I/O errors". However, the fact
that vdev_geom_io_intr was never actually sending the events until
now, and that vdev_geom_orphan never sent them at all, and that
vdev_geom_orphan usually gets called about 2 seconds after the
actual removal, means that FreeBSD's userland can cope with a late
event just fine.
ae [Fri, 10 Jan 2014 12:22:49 +0000 (12:22 +0000)]
MFC r260151 (by adrian):
Use an RLOCK here instead of an RWLOCK - matching all the other calls
to lla_lookup().
This drastically reduces the very high lock contention when doing parallel
TCP throughput tests (> 1024 sockets) with IPv6.
MFC r260187:
lla_lookup() does modification only when LLE_CREATE is specified.
Thus we can use IF_AFDATA_RLOCK() instead of IF_AFDATA_LOCK() when doing
lla_lookup() without LLE_CREATE flag.
MFC r260217:
Add IF_AFDATA_WLOCK_ASSERT() in case lla_lookup() is called with
LLE_CREATE flag.
ae [Fri, 10 Jan 2014 12:09:38 +0000 (12:09 +0000)]
MFC r259634:
Prevent users from deactivating the last component of a mirror.
MFC r259929:
Add an ability to stop gmirror and clear its metadata in one command.
This fixes the problem, when gmirror starts again just after stop.
The problem occurs when gmirror's component has geom label with equal size.
E.g. gpt and gptid have the same size as partition, diskid has the same
size as entire disk. When gmirror's geom has been destroyed, glabel
creates its providers and this initiate retaste.
Now "gmirror destroy" command is available. It destroys geom and also
erases gmirror's metadata.
dim [Thu, 9 Jan 2014 23:08:56 +0000 (23:08 +0000)]
MFC r260334:
Split the last gcc-specific flags off into CFLAGS.gcc. This also
removes the need to use -Qunused-arguments for clang throughout the
tree.
MFC r260369:
Apply band-aid for 32-bit compat libs failures after r260334: put back
-Qunused-arguments for clang for now, until I can figure out a way to
make it unneeded in all scenarios. Sorry about the breakage.
dim [Thu, 9 Jan 2014 22:40:51 +0000 (22:40 +0000)]
MFC r260102:
Similar to r260020, only use -fms-extensions with gcc, for all other
modules which require this flag to compile. Use a GCC_MS_EXTENSIONS
variable, defined in kern.pre.mk, which can be used to easily supply the
flag (or not), depending on the compiler type.
MFC r260322:
In addition to r260102, also define GCC_MS_EXTENSIONS in bsd.sys.mk,
since kernel module builds do not use kern.pre.mk.
mav [Thu, 9 Jan 2014 10:45:37 +0000 (10:45 +0000)]
MFC r259197:
Do not DELAY() for P-state transition unless we want to see the result.
Intel manual says: "If a transition is already in progress, transition to
a new value will subsequently take effect. Reads of IA32_PERF_CTL determine
the last targeted operating point." So seems it should be fine to just
trigger wanted transition and go. Linux does the same.
kib [Thu, 9 Jan 2014 03:33:12 +0000 (03:33 +0000)]
MFC r260205:
Update the description for pmap_remove_pages() to match the modern
times. Assert that the pmap passed to pmap_remove_pages() is only
active on current CPU.
cperciva [Wed, 8 Jan 2014 02:31:35 +0000 (02:31 +0000)]
MFC r258893, r258956:
Add a new sysctl / loader tunable kern.panic_reboot_wait_time which
defaults to PANIC_REBOOT_WAIT_TIME (a long-existing kernel config
setting). Use this now-variable value in place of the defined constant
to control how long the system waits after a panic before rebooting.
cperciva [Wed, 8 Jan 2014 02:30:24 +0000 (02:30 +0000)]
MFC r258894: Make rc(8) re-source rc.conf upon receipt of SIGALRM.
The rc system aggressively caches the contents of /etc/rc.conf in order to
improve boot performance; this produces arguably astonishing (non-)results
if /etc/rc.conf is modified during the boot process. This commit provides
a mechanism for explicitly requesting that rc.conf be reloaded.
dim [Sun, 5 Jan 2014 15:39:37 +0000 (15:39 +0000)]
Revert MFC of r260102 for now, until I can merge the required fix from
head. This should fix building modules which require -fms-extensions to
compile them with gcc.
luigi [Sun, 5 Jan 2014 11:58:07 +0000 (11:58 +0000)]
MFC revision 259907
use the correct netmap <-> nic slot mapping on the transmit ring for 'lem'.
This bug would manifest only in netmap mode and on packets transmitted after
a NIC reset while netmap mode is active.
dim [Sat, 4 Jan 2014 22:00:07 +0000 (22:00 +0000)]
MFC r260095:
For sys/boot/i386 and sys/boot/pc98, separate flags to be passed
directly to the linker (LD_FLAGS) from flags passed indirectly, via the
compiler driver (LDFLAGS).
This is because several Makefiles under sys/boot/i386 and sys/boot/pc98
use ${LD} directly to link, and the normal LDFLAGS value should not be
used in these cases.