delphij [Tue, 7 Jan 2014 20:04:41 +0000 (20:04 +0000)]
MFC r260403 (MFV r260399):
Apply vendor commits:
197e0ea Fix for TLS record tampering bug. (CVE-2013-4353).
3462896 For DTLS we might need to retransmit messages from the
previous session so keep a copy of write context in DTLS
retransmission buffers instead of replacing it after
sending CCS. (CVE-2013-6450).
ca98926 When deciding whether to use TLS 1.2 PRF and record hash
algorithms use the version number in the corresponding
SSL_METHOD structure instead of the SSL structure. The
SSL structure version is sometimes inaccurate.
Note: OpenSSL 1.0.2 and later effectively do this already.
(CVE-2013-6449).
pjd [Tue, 7 Jan 2014 19:46:17 +0000 (19:46 +0000)]
MFC r260290:
Bring back the old size of the kinfo_file structure to preserve ABI.
Keep only one uint64_t spare for further cap_rights_t expansion.
Add a comment clarifying that if the size of this structure changes,
a new sysctl MIB has to be allocated for it and the old structure has
to be returned by the old sysctl MIB.
mjg [Tue, 7 Jan 2014 19:28:10 +0000 (19:28 +0000)]
MFC r260232:
Don't check for fd limits in fdgrowtable_exp.
Callers do that already and additional check races with process
decreasing limits and can result in not growing the table at all, which
is currently not handled.
scottl [Tue, 7 Jan 2014 01:51:48 +0000 (01:51 +0000)]
MFC Alexander Motin's direct dispatch, multi-queue, and finer-grained
locking support for CAM
r256826:
Fix several target mode SIMs to not blindly clear ccb_h.flags field of
ATIO CCBs. Not all CCB flags there belong to them.
r256836:
Remove hard limit on number of BIOs handled with one ATA TRIM request.
r256843:
Merge CAM locking changes from the projects/camlock branch to radically
reduce lock congestion and improve SMP scalability of the SCSI/ATA stack,
preparing the ground for the coming next GEOM direct dispatch support.
r256888:
Unconditionally acquire periph reference on CCB allocation failure.
r256895:
Fix memory and references leak due to unfreed path.
r256960:
Move CAM_UNQUEUED_INDEX setting to the last moment and under the periph lock.
This fixes race condition with cam_periph_ccbwait(), causing use-after-free.
r256975:
Minor (mostly cosmetic) addition to r256960.
r257054:
Some microoptimizations for da and ada drivers:
- Replace ordered_tag_count counter with single flag;
- From da remove outstanding_cmds counter, duplicating pending_ccbs list;
- From da_softc remove unused links field.
r257482:
Fix lock recursion, triggered by `smartctl -a /dev/adaX`.
r257501:
Make getenv_*() functions and respectively TUNABLE_*_FETCH() macros not
allocate memory and so not require sleepable environment. getenv() has
already used on-stack temporary storage, so just use it more rationally.
getenv_string() receives buffer as argument, so don't need another one.
r257914:
Some CAM locks polishing:
- Fix LOR and possible lock recursion when handling high-power commands.
Introduce new lock to protect left power quota and list of frozen devices.
- Correct locking around xpt periph creation.
- Remove the apparently never used XPT_FLAG_OPEN xpt periph flag.
Again, Netflix assisted with testing the merge, but all of the credit goes
to Alexander and iX Systems.
scottl [Tue, 7 Jan 2014 01:32:23 +0000 (01:32 +0000)]
MFC Alexander Motin's GEOM direct dispatch work:
r256603:
Introduce new function devstat_end_transaction_bio_bt(), adding new argument
to specify present time. Use this function to move binuptime() out of lock,
substantially reducing lock congestion when slow timecounter is used.
r256606:
Move g_io_deliver() out of the lock, as required for direct dispatch.
Move g_destroy_bio() out too to reduce lock scope even more.
r256607:
Fix passing uninitialized bio_resid argument to g_trace().
r256610:
Add unmapped I/O support to GEOM RAID.
r256830:
Restore BIO_UNMAPPED and BIO_TRANSIENT_MAPPING in biodone() when unmapping
a temporarily mapped buffer. That fixes a double unmap if biodone() is called twice
for the same BIO (but with different done methods).
r256880:
Merge GEOM direct dispatch changes from the projects/camlock branch.
When safety requirements are met, I/O requests are executed directly in the
caller context instead of being passed to the GEOM g_up/g_down threads.
This avoids CPU bottlenecks in the g_up/g_down threads and saves
several context switches per I/O.
r259247:
Fix bug introduced at r256607. We have to recalculate bp_resid here since
sizes of original and completed requests may differ due to end of media.
Testing of the stable/10 merge was done by Netflix, but all of the credit
goes to Alexander and iX Systems.
mav [Sun, 5 Jan 2014 23:00:38 +0000 (23:00 +0000)]
MFC r256614:
- Take BIO lock in biodone() only when there is no completion callback set
and so we should wake up thread waiting in biowait().
- Remove msleep() timeout from biowait(). It was added 11 years ago, when
no locks were used, and it should not be needed any more.
mav [Sun, 5 Jan 2014 22:51:09 +0000 (22:51 +0000)]
MFC r258164:
Handle case when ACPI reports HPET device, but does not provide memory
resource for it. In such case take the address range from the HPET table.
This fixes hpet(4) driver attach on Asrock C2750D4I board.
mav [Sun, 5 Jan 2014 22:48:12 +0000 (22:48 +0000)]
MFC r257932:
Use relaxed (write-only) memory barriers when writing some of queue index
registers (for now on ISP2400+). We never read those registers back and
AFAIK their semantics does not require any immediate reaction on write.
mav [Sun, 5 Jan 2014 22:47:12 +0000 (22:47 +0000)]
MFC r257930:
Some more registers access optimizations:
- Process ATIO queue only if interrupt status tells so;
- Do not update queue out pointers after each processed command, do it
only once at the end of the loop.
mav [Sun, 5 Jan 2014 22:45:46 +0000 (22:45 +0000)]
MFC r257916:
Save one more register read per command by not reading the rqstoutrp register
every time. That register serves only to detect the unlikely case of output
queue overflow, so read it only when its last known (and probably stale by
now) value signals overflow.
mav [Sun, 5 Jan 2014 22:38:44 +0000 (22:38 +0000)]
MFC r256705:
Optimize isp(4) to reduce CPU usage, especially in target mode:
- Remove two excessive and slow register reads from isp_intr(). Instead
of rereading value every time, assume that registers contain what we have
written there.
- Avoid sequential search through 4096 array elements when looking for
command tag. Use hash of lists to store active tags separately from free
ones and so greatly speedup the searches.
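The tag lookup change can be pictured with a small user-space model. This is only a sketch of the "hash of lists" idea and not the isp(4) code: the class name, the bucket count and the modulo hash are assumptions made for the illustration.

```python
NBUCKETS = 64  # hypothetical bucket count, chosen only for this sketch


class ActiveTags:
    """Active command tags kept in hash buckets, apart from free ones."""

    def __init__(self):
        self.buckets = [[] for _ in range(NBUCKETS)]

    def _bucket(self, tag):
        # Assumed hash function: plain modulo on the tag value.
        return self.buckets[tag % NBUCKETS]

    def insert(self, tag, command):
        self._bucket(tag).append((tag, command))

    def find(self, tag):
        # Only one short bucket is scanned, not a 4096-entry array.
        for t, command in self._bucket(tag):
            if t == tag:
                return command
        return None

    def remove(self, tag):
        bucket = self._bucket(tag)
        for i, (t, _) in enumerate(bucket):
            if t == tag:
                return bucket.pop(i)[1]
        return None
```

With few active commands each bucket holds at most a handful of entries, so a lookup touches a few items instead of walking every slot.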
mav [Sun, 5 Jan 2014 22:14:12 +0000 (22:14 +0000)]
MFC r259168:
Don't even try to read vdev labels from devices smaller than SPA_MINDEVSIZE
(64MB). Even if we somehow found one, the ZFS kernel code rejects such
devices. It is funny to watch attempts to read four 256K vdev labels from
a 1.44MB floppy, though it is not very practical and quite slow.
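The size check described above amounts to one comparison. A minimal sketch, assuming the threshold is inclusive and using a hypothetical function name (the real check lives in the kernel's C code):

```python
SPA_MINDEVSIZE = 64 * 1024 * 1024   # 64MB, the value quoted in the message
LABEL_SIZE = 256 * 1024             # one vdev label
LABEL_COUNT = 4


def should_probe_labels(device_size):
    """Return True only if the device is big enough to be a usable vdev."""
    return device_size >= SPA_MINDEVSIZE


floppy = 1440 * 1024  # a "1.44MB" floppy holds 1440 KiB
# Probing it would mean reading LABEL_COUNT * LABEL_SIZE = 1 MiB of labels,
# most of the medium, from a device ZFS would reject anyway.
probe_floppy = should_probe_labels(floppy)
```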
mav [Sun, 5 Jan 2014 22:12:45 +0000 (22:12 +0000)]
MFC r258342:
Reenable vfs.zfs.zio.use_uma for amd64, disabled at r209261.
On machines with several CPUs and enough RAM this can easily double
ZFS performance or halve CPU usage. It was disabled three years
ago due to memory and KVA exhaustion reports, but our VM subsystem has
improved a lot since then, hopefully enough to justify another try.
dim [Sun, 5 Jan 2014 15:39:37 +0000 (15:39 +0000)]
Revert MFC of r260102 for now, until I can merge the required fix from
head. This should fix building modules which require -fms-extensions to
compile them with gcc.
glebius [Sun, 5 Jan 2014 13:55:33 +0000 (13:55 +0000)]
Merge r260188 from head:
Fix regression from r249894. Now we pass "gw" as argument to if_output
method, thus for multicast case we need it to point at "dst".
mav [Sat, 4 Jan 2014 23:42:24 +0000 (23:42 +0000)]
MFC r258693:
Make UMA not blindly force offpage slab header allocation for large
(> PAGE_SIZE) zones. If the zone size is not a multiple of PAGE_SIZE, there
may be enough space for the header in the last page, so we can avoid the
extra header memory allocation and hash table update/lookup.
ZFS creates a bunch of odd-sized UMA zones (5120, 6144, 7168, 10240, 14336).
This change puts at least some of the otherwise lost memory there to good use.
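The arithmetic behind the r258693 description can be checked with a short sketch. It assumes 4K pages and one item per slab, and HEADER_SIZE is a made-up placeholder, not the real size of the UMA slab header.

```python
PAGE_SIZE = 4096
HEADER_SIZE = 112  # hypothetical slab header size in bytes


def tail_space(zone_size, page_size=PAGE_SIZE):
    """Bytes left unused in the last page of a one-item slab."""
    pages = -(-zone_size // page_size)  # ceiling division
    return pages * page_size - zone_size


def header_fits_inline(zone_size, header_size=HEADER_SIZE):
    """True if the header can live in the tail instead of offpage."""
    return tail_space(zone_size) >= header_size


ZFS_ZONE_SIZES = [5120, 6144, 7168, 10240, 14336]
leftover = {size: tail_space(size) for size in ZFS_ZONE_SIZES}
```

Every size in the list leaves at least 1K free in its last page, while an exact multiple such as 8192 leaves nothing and would still need an offpage header.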
mav [Sat, 4 Jan 2014 23:40:47 +0000 (23:40 +0000)]
MFC r258691:
Don't count bucket allocation failures for UMA zones as their own failures.
There are good reasons for this to happen, such as recursion prevention, etc.
and they are not fatal since buckets are just an optimization mechanism.
Real bucket allocation failures are counted anyway by the bucket zones
themselves, and we don't need double accounting there.
mav [Sat, 4 Jan 2014 23:39:39 +0000 (23:39 +0000)]
MFC r258340, r258497:
Implement mechanism to safely but slowly purge UMA per-CPU caches.
This is a last resort for very low memory condition in case other measures
to free memory were ineffective. Sequentially cycle through all CPUs and
extract per-CPU cache buckets into zone cache from where they can be freed.
mav [Sat, 4 Jan 2014 23:38:06 +0000 (23:38 +0000)]
MFC r258338:
Grow UMA zone bucket size also on lock congestion during item free.
Lock congestion is the same, whether it happens on alloc or free, so
handle it equally. Now that we have back pressure, growing buckets a bit
faster is not a problem. In any case, growth is much slower than in 9.x.
mav [Sat, 4 Jan 2014 23:37:01 +0000 (23:37 +0000)]
MFC r258337:
Add two new UMA bucket zones to store 3 and 9 items per bucket.
These new buckets make bucket size self-tuning smoother and more precise.
Without them there are buckets for 1, 5, 13, 29, ... items. While a
difference of about 2x is fine at bigger sizes, at the smallest ones it is
5x and 2.6x respectively. The new buckets make the sequence 1, 3, 5, 9, 13,
29, reducing the jumps between steps, making the algorithm work more smoothly,
and allocating and freeing memory in better-fitting chunks. Otherwise there
is quite a big gap between allocating 128K and 5x128K of RAM at once.
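The step ratios quoted in the r258337 message can be reproduced directly. The sketch uses only the bucket sizes listed there; the function name is illustrative.

```python
def step_ratios(sizes):
    """Growth factor between each pair of consecutive bucket sizes."""
    return [round(b / a, 2) for a, b in zip(sizes, sizes[1:])]


OLD_SIZES = [1, 5, 13, 29]
NEW_SIZES = [1, 3, 5, 9, 13, 29]

old_ratios = step_ratios(OLD_SIZES)
new_ratios = step_ratios(NEW_SIZES)
```

The old sequence starts with jumps of 5x and 2.6x. With the 3- and 9-item buckets added, the largest jump below 13 items is 3x and the rest stay under 2x.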
mav [Sat, 4 Jan 2014 23:35:34 +0000 (23:35 +0000)]
MFC r258336:
Implement soft pressure on UMA cache bucket sizes.
Every time the system detects a low memory condition, decrease the bucket
size of each zone by one item. As a result, higher memory pressure pushes
toward smaller bucket sizes, hence smaller per-CPU caches and more efficient
memory use.
Before this change there was no force opposing bucket growth caused by
practically inevitable zone lock conflicts, and after some run time
per-CPU caches could consume enough RAM to kill the system.
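The interplay of growth and soft pressure can be modelled as a toy. The message only says that low memory shrinks the size by one item. Growing by one item per contention event, the limits and all names are assumptions of this sketch.

```python
class ZoneModel:
    """Toy model of one zone's bucket size under two opposing forces."""

    def __init__(self, bucket_size=1, min_size=1, max_size=29):
        self.bucket_size = bucket_size
        self.min_size = min_size      # assumed lower bound
        self.max_size = max_size      # assumed upper bound

    def on_lock_contention(self):
        # Contention on the zone lock pushes the bucket size up.
        self.bucket_size = min(self.bucket_size + 1, self.max_size)

    def on_low_memory(self):
        # Each low memory event takes one item off the bucket size.
        self.bucket_size = max(self.bucket_size - 1, self.min_size)


zone = ZoneModel()
for _ in range(5):
    zone.on_lock_contention()
grown = zone.bucket_size
for _ in range(2):
    zone.on_low_memory()
shrunk = zone.bucket_size
```

Without on_low_memory() the size could only ever rise, which is the unbounded growth the message describes.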
mav [Sat, 4 Jan 2014 23:31:34 +0000 (23:31 +0000)]
MFC r259232:
Create a separate free list for each of the first 32 possible allocation sizes.
With a 4K allocation quantum, that covers allocations up to 128K.
As memory fragmentation grows, these lists may become quite large
(tens or hundreds of thousands of items). Keeping items of different
sizes in one list may, in the worst case, require a full linear list
traversal, which can be very expensive. With single-size lists, the first
item found on the list should satisfy the request unless the user specifies
alignment or boundary requirements (which is very rare).
While running the SPEC NFS benchmark on top of ZFS on a 24-core machine with
84GB RAM, this change reduces CPU time spent in vmem_xalloc() from 8%,
and lock congestion spinning around it from 20%, to invisible levels.
All of that comes at the cost of just 26 more pointers per vmem instance.
If at some point our kernel starts to actively use KVA allocations
with odd sizes above 128K, something may need to be done about the bigger
lists as well.
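The free list selection described in the r259232 message can be sketched as an index function. The 32 direct lists and the 4K quantum come from the message. Sharing one list per power of two above that, and all names here, are assumptions of the sketch and not the vmem source.

```python
QUANTUM = 4096
DIRECT_ORDER = 5
DIRECT_LISTS = 1 << DIRECT_ORDER  # 32 single-size lists


def freelist_index(size, quantum=QUANTUM):
    """Map an allocation size in bytes to a free list index."""
    qsize = -(-size // quantum)  # size in whole quanta, rounded up
    if qsize <= DIRECT_LISTS:
        # One list per exact size: any item on it fits the request.
        return qsize - 1
    # Assumed layout for larger sizes: one shared list per power of two.
    order = qsize.bit_length() - 1
    return DIRECT_LISTS + order - DIRECT_ORDER


largest_direct = DIRECT_LISTS * QUANTUM  # 128K with a 4K quantum
```

A 128K request maps to the last single-size list. Anything larger lands on a shared list, where a search may still have to skip items that are too small.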
dim [Sat, 4 Jan 2014 22:00:07 +0000 (22:00 +0000)]
MFC r260095:
For sys/boot/i386 and sys/boot/pc98, separate flags to be passed
directly to the linker (LD_FLAGS) from flags passed indirectly, via the
compiler driver (LDFLAGS).
This is because several Makefiles under sys/boot/i386 and sys/boot/pc98
use ${LD} directly to link, and the normal LDFLAGS value should not be
used in these cases.
glebius [Sat, 4 Jan 2014 19:51:57 +0000 (19:51 +0000)]
Merge r258690 by mav from head:
Fix bug introduced at r252226, when udata argument passed to bucket_alloc()
was used without making sure first that it was really passed for us.
On some of my systems this bug made user argument passed by ZFS code to
uma_zalloc_arg() unexpectedly block UMA per-CPU caches for those zones.
dim [Sat, 4 Jan 2014 18:53:31 +0000 (18:53 +0000)]
MFC r260040:
In sys/dev/mcd/mcd.c, mark the static const COPYRIGHT string as __used,
so it ends up in the object file, and no warnings are emitted about it
being actually unused.
dim [Sat, 4 Jan 2014 17:54:06 +0000 (17:54 +0000)]
MFC r260020:
For sys/dev/drm2/radeon, only use -fms-extensions with gcc. This flag
is only to stop gcc complaining about anonymous unions, which clang does
not do. For clang 3.4 however, -fms-extensions enables the Microsoft
__wchar_t type, which clashes with our own types.h.
MFC r260102:
Similar to r260020, only use -fms-extensions with gcc, for all other
modules which require this flag to compile. Use a GCC_MS_EXTENSIONS
variable, defined in kern.pre.mk, which can be used to easily supply the
flag (or not), depending on the compiler type.
dim [Sat, 4 Jan 2014 17:22:53 +0000 (17:22 +0000)]
MFC r260015:
In libc++'s type_traits header, avoid warnings (activated by our use of
-Wsystem-headers) about potential keyword compatibility problems, by
adding a __libcpp prefix to the applicable identifiers.
Upstream is still debating about this, but we need it now, to be able to
import clang 3.4.
glebius [Fri, 3 Jan 2014 12:28:33 +0000 (12:28 +0000)]
Merge r259681 from head:
Changes:
- Reinit uio_resid and flags before every call to soreceive().
- Set maximum acceptable size of packet to IP_MAXPACKET. As for now the
module doesn't support INET6.
- Properly handle MSG_TRUNC return from soreceive().
pluknet [Thu, 2 Jan 2014 16:37:23 +0000 (16:37 +0000)]
MFC r259872:
The compile time constant limit on number of swap devices was removed in 5.2.
As such, remove the EINVAL error saying so. Currently the vm.nswapdev sysctl
just represents the number of added swap devices.
scottl [Thu, 2 Jan 2014 01:51:54 +0000 (01:51 +0000)]
MFC r260070
Multi-queue NIC drivers and multi-port lagg tend to use the same lower
bits of the flowid as each other, resulting in a poor distribution of
packets among queues in certain cases. Work around this by adding a
set of sysctls for controlling a bit-shift on the flowid when doing
multi-port aggregation in lagg and lacp. By default, lagg/lacp will
now use bits 16 and higher instead of 0 and higher.
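The effect of the shift can be shown with made-up flow IDs. The sketch assumes the NIC picks its queue from the low bits of the flowid and that the port is chosen as the shifted flowid modulo the port count. The function name and sample values are illustrative only.

```python
def select_port(flowid, nports, shift=16):
    """Pick an output port from the flowid bits at and above `shift`."""
    return (flowid >> shift) % nports


# Eight flows that a 4-queue NIC placed on the same queue, so their low
# bits are identical (here 0x0004, and 4 % 4 == 0).
flowids = [(i << 16) | 0x0004 for i in range(8)]

ports_without_shift = {select_port(f, 2, shift=0) for f in flowids}
ports_with_shift = {select_port(f, 2, shift=16) for f in flowids}
```

With no shift, every flow from that queue goes to the same lagg port. Taking bits 16 and higher spreads the same flows across both ports.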
scottl [Thu, 2 Jan 2014 01:44:14 +0000 (01:44 +0000)]
MFC r260068, r260069, r260076
Add the -R option to allow fsck_ffs to restart itself when too many critical
errors have been detected in a particular run.
Clean up the global state variables so that a restart can happen correctly.
Separate the global variables in fsck_ffs and fsdb to their own file. This
fixes header sharing with fscd.
Correctly initialize, static-ize, and remove global variables as needed in
dir.c. This fixes a problem with lost+found directories that was causing
a segfault.
Correctly initialize, static-ize, and remove global variables as needed in
suj.c.
Initialize the suj globals before allocating the disk object, not after.
Also ensure that 'preen' mode doesn't conflict with 'restart' mode.
jilles [Wed, 1 Jan 2014 20:22:29 +0000 (20:22 +0000)]
MFC r258281: Fix siginfo_t.si_status for wait6/waitid/SIGCHLD.
Per POSIX, si_status should contain the value passed to exit() for
si_code==CLD_EXITED and the signal number for other si_code. This was
incorrect for CLD_EXITED and CLD_DUMPED.
This is still not fully POSIX-compliant (Austin group issue #594 says that
the full value passed to exit() shall be returned via si_status, not just
the low 8 bits) but is sufficient for a si_status-related test in libnih
(upstart, Debian/kFreeBSD).
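The si_status rule can be observed from user space. This sketch assumes a Unix system where Python exposes os.waitid() and os.fork(). The helper name is hypothetical, and the exit value is kept below 256 so the low-8-bits caveat does not matter.

```python
import os
import signal


def reap(child_action):
    """Fork, run child_action in the child, return (si_code, si_status)."""
    pid = os.fork()
    if pid == 0:
        child_action()
        os._exit(1)  # fallback, reached only if child_action returns
    info = os.waitid(os.P_PID, pid, os.WEXITED)
    return info.si_code, info.si_status


# Normal exit: si_status should be the value passed to exit().
exit_code, exit_status = reap(lambda: os._exit(7))

# Death by signal: si_status should be the signal number.
kill_code, kill_status = reap(
    lambda: os.kill(os.getpid(), signal.SIGKILL))
```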
rmacklem [Wed, 1 Jan 2014 02:49:45 +0000 (02:49 +0000)]
MFC: r259854
The NFSv4 server would call VOP_SETATTR() with a shared locked vnode
when a Getattr for a file is done by a client other than the one that
holds the file's delegation. This would only happen when delegations
are enabled and the problem is fixed by this patch.
rmacklem [Tue, 31 Dec 2013 22:00:25 +0000 (22:00 +0000)]
MFC: r259845
An intermittent problem with NFSv4 exporting of ZFS snapshots was
reported to the freebsd-fs mailing list. I believe the problem was
caused by the Readdir operation using VFS_VGET() for a snapshot file entry
instead of VOP_LOOKUP(). This would not occur for NFSv3, since it
will do a VFS_VGET() of "." which fails with ENOTSUPP at the beginning
of the directory, whereas NFSv4 does not check "." or "..". This
patch adds a call to VFS_VGET() for the directory being read to check
for ENOTSUPP.
I also observed that the mount_on_fileid and fsid attributes were
not correct at the snapshot's auto mountpoints when looking at packet
traces for the Readdir. This patch fixes the attributes by doing a check
for different v_mount structure, even if the vnode v_mountedhere is not
set.
rmacklem [Tue, 31 Dec 2013 21:56:02 +0000 (21:56 +0000)]
MFC: r259801
The NFSv4 client was passing both the p and cred arguments to
nfsv4_fillattr() as NULLs for the Getattr callback. This caused
nfsv4_fillattr() to not fill in the Change attribute for the reply.
I believe this was a violation of the RFC, but had little effect on
server behaviour. This patch passes a non-NULL p argument to fix this.
rmacklem [Mon, 30 Dec 2013 21:24:41 +0000 (21:24 +0000)]
MFC: r259771
The NFSv4.1 client didn't return NFSv4.1 specific error codes
for the Getattr and Recall callbacks. This patch fixes it.
Since the NFSv4.1 specific error codes would only happen for
abnormal circumstances, this patch has little effect, in practice.
rmacklem [Mon, 30 Dec 2013 21:17:20 +0000 (21:17 +0000)]
MFC: r259084
For software builds, the NFS client does many small
synchronous (with FILE_SYNC) writes because non-contiguous
byte ranges in the same buffer cache block are being
written. This patch adds a new mount option "noncontigwr"
which allows the non-contiguous byte ranges to be combined,
with the dirty byte range becoming the superset of the bytes
that are dirty, if the file has not been file locked.
This reduces the number of writes significantly for software
builds. The only case where this change might break existing
applications is where an application is writing
non-overlapping byte ranges within the same buffer cache block
of a file from multiple clients concurrently.
Since such an application would normally do file locking on
the file, avoiding the byte range merge for files that have
been file locked should be sufficient for most (maybe all?) cases.
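The merge rule can be sketched on half-open byte ranges within one buffer cache block. This models the behaviour described above and is not the NFS client code: the function name, the return convention and treating touching ranges as contiguous are assumptions.

```python
def write_to_block(dirty, new, noncontigwr=False, file_locked=False):
    """Apply a write to a block that may already have a dirty range.

    Ranges are (start, end) byte offsets, end exclusive.
    Returns (dirty_range_after, range_that_had_to_be_written_first).
    """
    if dirty is None:
        return new, None
    start, end = dirty
    nstart, nend = new
    contiguous = nstart <= end and nend >= start
    if contiguous or (noncontigwr and not file_locked):
        # The superset also covers bytes between the two ranges that
        # this client never wrote.
        return (min(start, nstart), max(end, nend)), None
    # Otherwise the old range is written out before the new one is kept.
    return new, dirty


default = write_to_block((0, 100), (200, 300))
merged = write_to_block((0, 100), (200, 300), noncontigwr=True)
locked = write_to_block((0, 100), (200, 300), noncontigwr=True,
                        file_locked=True)
```

In the merged case the dirty range (0, 300) includes bytes 100 to 199, which this client did not modify. That is why concurrent writers on other clients could be affected unless the file is locked.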
dim [Mon, 30 Dec 2013 20:27:58 +0000 (20:27 +0000)]
MFC r259902:
In sys/dev/drm/mach64_dma.c, remove static function mach64_set_dma_eol(),
which has never been used, even by upstream, since its initial upstream
commit (see http://cgit.freedesktop.org/mesa/drm/commit/?id=873e1c4d )
dim [Mon, 30 Dec 2013 20:15:46 +0000 (20:15 +0000)]
MFC r257532 (by adrian):
Fix this build for clang.
MFC r259730:
To avoid having to explicitly test COMPILER_TYPE for setting
clang-specific or gcc-specific flags, introduce the following new
variables for use in Makefiles:
jmmv [Mon, 30 Dec 2013 14:09:04 +0000 (14:09 +0000)]
Fix 'make check-old' warnings when WITHOUT_TESTS is set.
This is a MFC of r258025 and r257940, both of which resolve issues with
dynamically setting the list of obsolete files based on the contents
of /usr/tests.
mckusick [Mon, 30 Dec 2013 05:22:22 +0000 (05:22 +0000)]
MFC of 256801, 256803, 256808, 256812, 256817, 256845, and 256860.
This set of changes puts in place the infrastructure to allow soft
updates to be multi-threaded. It introduces no functional changes
from its current operation.
MFC of 256860:
Allow kernels without options SOFTUPDATES to build. This should fix the
embedded tinderboxes.
Reviewed by: emaste
MFC of 256845:
Fix build problem on ARM (which defaults to building without soft updates).
Reported by: Tinderbox
Sponsored by: Netflix
MFC of 256817:
Restructuring of the soft updates code to set it up so that the
single kernel-wide soft update lock can be replaced with a
per-filesystem soft-updates lock. This per-filesystem lock will
allow each filesystem to have its own soft-updates flushing thread
rather than being limited to a single soft-updates flushing thread
for the entire kernel.
Move soft update variables out of the ufsmount structure and into
their own mount_softdeps structure referenced by ufsmount field
um_softdep. Eventually the per-filesystem lock will be in this
structure. For now there is simply a pointer to the kernel-wide
soft updates lock.
Change all instances of ACQUIRE_LOCK and FREE_LOCK to pass the lock
pointer in the mount_softdeps structure instead of a pointer to the
kernel-wide soft-updates lock.
Replace the five hash tables used by soft updates with per-filesystem
copies of these tables allocated in the mount_softdeps structure.
Several functions that flush dependencies when too many are allocated
in the kernel used to operate across all filesystems. They are now
parameterized to flush dependencies from a specified filesystem.
For now, we stick with the round-robin flushing strategy when the
kernel as a whole has too many dependencies allocated.
While there are many lines of changes, there should be no functional
change in the operation of soft updates.
Tested by: Peter Holm and Scott Long
Sponsored by: Netflix
MFC of 256812:
Fourth of several cleanups to soft dependency implementation.
Add KASSERTS that soft dependency functions only get called
for filesystems running with soft dependencies. Calling these
functions when soft updates are not compiled into the system
now results in a panic.
No functional change.
Tested by: Peter Holm and Scott Long
Sponsored by: Netflix
MFC of 256808:
Third of several cleanups to soft dependency implementation.
Ensure that softdep_unmount() and softdep_setup_sbupdate()
only get called for filesystems running with soft dependencies.
No functional change.
Tested by: Peter Holm and Scott Long
Sponsored by: Netflix
MFC of 256803:
Second of several cleanups to soft dependency implementation.
Delete two unused functions in ffs_softdep.c.
No functional change.
Tested by: Peter Holm and Scott Long
Sponsored by: Netflix
MFC of 256801:
First of several cleanups to soft dependency implementation.
Convert three functions exported from ffs_softdep.c to static
functions as they are not used outside of ffs_softdep.c.
No functional change.
Tested by: Peter Holm and Scott Long
Sponsored by: Netflix
mckusick [Sun, 29 Dec 2013 07:26:48 +0000 (07:26 +0000)]
MFC of 258789:
We needlessly panic when trying to flush MKDIR_PARENT dependencies.
We had previously tried to flush all MKDIR_PARENT dependencies (and
all the NEWBLOCK pagedeps) by calling ffs_update(). However this will
only resolve these dependencies in direct blocks. So very large
directories with MKDIR_PARENT dependencies in indirect blocks had
not yet gotten flushed. As the directory is in the midst of doing a
complete sync, we simply defer the checking of the MKDIR_PARENT
dependencies until the indirect blocks have been sync'ed.