mav [Sat, 4 Jan 2014 23:40:47 +0000 (23:40 +0000)]
MFC r258691:
Don't count bucket allocation failures for UMA zones as their own failures.
There are good reasons for this to happen, such as recursion prevention, etc.
and they are not fatal since buckets are just an optimization mechanism.
Real bucket allocation failures are any way counted by the bucket zones
themselves, and we don't need double accounting there.
mav [Sat, 4 Jan 2014 23:39:39 +0000 (23:39 +0000)]
MFC r258340, r258497:
Implement mechanism to safely but slowly purge UMA per-CPU caches.
This is a last resort for very low memory condition in case other measures
to free memory were ineffective. Sequentially cycle through all CPUs and
extract per-CPU cache buckets into zone cache from where they can be freed.
mav [Sat, 4 Jan 2014 23:38:06 +0000 (23:38 +0000)]
MFC r258338:
Grow UMA zone bucket size also on lock congestion during item free.
Lock congestion is the same, whether it happens on alloc or free, so
handle it equally. Now that we have back pressure, there is no problem
to grow buckets a bit faster. Any way growth is much slower then in 9.x.
mav [Sat, 4 Jan 2014 23:37:01 +0000 (23:37 +0000)]
MFC r258337:
Add two new UMA bucket zones to store 3 and 9 items per bucket.
These new buckets make bucket size self-tuning more soft and precise.
Without them there are buckets for 1, 5, 13, 29, ... items. While at
bigger sizes difference about 2x is fine, at smallest ones it is 5x and
2.6x respectively. New buckets make that line look like 1, 3, 5, 9, 13,
29, reducing jumps between steps, making algorithm work softer, allocating
and freeing memory in better fitting chunks. Otherwise there is quite a
big gap between allocating 128K and 5x128K of RAM at once.
mav [Sat, 4 Jan 2014 23:35:34 +0000 (23:35 +0000)]
MFC r258336:
Implement soft pressure on UMA cache bucket sizes.
Every time system detects low memory condition decrease bucket sizes for
each zone by one item. As result, higher memory pressure will push to
smaller bucket sizes and so smaller per-CPU caches and so more efficient
memory use.
Before this change there was no force to oppose buckets growth as result
of practically inevitable zone lock conflicts, and after some run time
per-CPU caches could consume enough RAM to kill the system.
mav [Sat, 4 Jan 2014 23:31:34 +0000 (23:31 +0000)]
MFC r259232:
Create own free list for each of the first 32 possible allocation sizes.
In case of 4K allocation quantum that means for allocations up to 128K.
With growth of memory fragmentation these lists may grow to quite a large
sizes (tenths and hundreds of thousands items). Having in one list items
of different sizes in worst case may require full linear list traversal,
that may be very expensive. Having lists for items of single size means
that unless user specify some alignment or border requirements (that are
very rare cases) first item found on the list should satisfy the request.
While running SPEC NFS benchmark on top of ZFS on 24-core machine with
84GB RAM this change reduces CPU time spent in vmem_xalloc() from 8%
and lock congestion spinning around it from 20% to invisible levels.
And that all is by the cost of just 26 more pointers per vmem instance.
If at some point our kernel will start to actively use KVA allocations
with odd sizes above 128K, something may need to be done to bigger lists
also.
dim [Sat, 4 Jan 2014 22:00:07 +0000 (22:00 +0000)]
MFC r260095:
For sys/boot/i386 and sys/boot/pc98, separate flags to be passed
directly to the linker (LD_FLAGS) from flags passed indirectly, via the
compiler driver (LDFLAGS).
This is because several Makefiles under sys/boot/i386 and sys/boot/pc98
use ${LD} directly to link, and the normal LDFLAGS value should not be
used in these cases.
glebius [Sat, 4 Jan 2014 19:51:57 +0000 (19:51 +0000)]
Merge r258690 by mav from head:
Fix bug introduced at r252226, when udata argument passed to bucket_alloc()
was used without making sure first that it was really passed for us.
On some of my systems this bug made user argument passed by ZFS code to
uma_zalloc_arg() unexpectedly block UMA per-CPU caches for those zones.
dim [Sat, 4 Jan 2014 18:53:31 +0000 (18:53 +0000)]
MFC r260040:
In sys/dev/mcd/mcd.c, mark the static const COPYRIGHT string as __used,
so it ends up in the object file, and no warnings are emitted about it
being actually unused.
dim [Sat, 4 Jan 2014 17:54:06 +0000 (17:54 +0000)]
MFC r260020:
For sys/dev/drm2/radeon, only use -fms-extensions with gcc. This flag
is only to stop gcc complaining about anonymous unions, which clang does
not do. For clang 3.4 however, -fms-extensions enables the Microsoft
__wchar_t type, which clashes with our own types.h.
MFC r260102:
Similar to r260020, only use -fms-extensions with gcc, for all other
modules which require this flag to compile. Use a GCC_MS_EXTENSIONS
variable, defined in kern.pre.mk, which can be used to easily supply the
flag (or not), depending on the compiler type.
dim [Sat, 4 Jan 2014 17:22:53 +0000 (17:22 +0000)]
MFC r260015:
In libc++'s type_traits header, avoid warnings (activated by our use of
-Wsystem-headers) about potential keyword compatibility problems, by
adding a __libcpp prefix to the applicable identifiers.
Upstream is still debating about this, but we need it now, to be able to
import clang 3.4.
glebius [Fri, 3 Jan 2014 12:28:33 +0000 (12:28 +0000)]
Merge r259681 from head:
Changes:
- Reinit uio_resid and flags before every call to soreceive().
- Set maximum acceptable size of packet to IP_MAXPACKET. As for now the
module doesn't support INET6.
- Properly handle MSG_TRUNC return from soreceive().
pluknet [Thu, 2 Jan 2014 16:37:23 +0000 (16:37 +0000)]
MFC r259872:
The compile time constant limit on number of swap devices was removed in 5.2.
As such, remove the EINVAL error saying so. Currently the vm.nswapdev sysctl
just represents the number of added swap devices.
scottl [Thu, 2 Jan 2014 01:51:54 +0000 (01:51 +0000)]
MFC r260070
Multi-queue NIC drivers and multi-port lagg tend to use the same lower
bits of the flowid as each other, resulting in a poor distribution of
packets among queues in certain cases. Work around this by adding a
set of sysctls for controlling a bit-shift on the flowid when doing
multi-port aggrigation in lagg and lacp. By default, lagg/lacp will
now use bits 16 and higher instead of 0 and higher.
scottl [Thu, 2 Jan 2014 01:44:14 +0000 (01:44 +0000)]
MFC r260068, r260069, r260076
Add the -R option to allow fsck_ffs to restart itself when too many critical
errors have been detected in a particular run.
Clean up the global state variables so that a restart can happen correctly.
Separate the global variables in fsck_ffs and fsdb to their own file. This
fixes header sharing with fscd.
Correctly initialize, static-ize, and remove global variables as needed in
dir.c. This fixes a problem with lost+found directories that was causing
a segfault.
Correctly initialize, static-ize, and remove global variables as needed in
suj.c.
Initialize the suj globals before allocating the disk object, not after.
Also ensure that 'preen' mode doesn't conflict with 'restart' mode
jilles [Wed, 1 Jan 2014 20:22:29 +0000 (20:22 +0000)]
MFC r258281: Fix siginfo_t.si_status for wait6/waitid/SIGCHLD.
Per POSIX, si_status should contain the value passed to exit() for
si_code==CLD_EXITED and the signal number for other si_code. This was
incorrect for CLD_EXITED and CLD_DUMPED.
This is still not fully POSIX-compliant (Austin group issue #594 says that
the full value passed to exit() shall be returned via si_status, not just
the low 8 bits) but is sufficient for a si_status-related test in libnih
(upstart, Debian/kFreeBSD).
rmacklem [Wed, 1 Jan 2014 02:49:45 +0000 (02:49 +0000)]
MFC: r259854
The NFSv4 server would call VOP_SETATTR() with a shared locked vnode
when a Getattr for a file is done by a client other than the one that
holds the file's delegation. This would only happen when delegations
are enabled and the problem is fixed by this patch.
rmacklem [Tue, 31 Dec 2013 22:00:25 +0000 (22:00 +0000)]
MFC: r259845
An intermittent problem with NFSv4 exporting of ZFS snapshots was
reported to the freebsd-fs mailing list. I believe the problem was
caused by the Readdir operation using VFS_VGET() for a snapshot file entry
instead of VOP_LOOKUP(). This would not occur for NFSv3, since it
will do a VFS_VGET() of "." which fails with ENOTSUPP at the beginning
of the directory, whereas NFSv4 does not check "." or "..". This
patch adds a call to VFS_VGET() for the directory being read to check
for ENOTSUPP.
I also observed that the mount_on_fileid and fsid attributes were
not correct at the snapshot's auto mountpoints when looking at packet
traces for the Readdir. This patch fixes the attributes by doing a check
for different v_mount structure, even if the vnode v_mountedhere is not
set.
rmacklem [Tue, 31 Dec 2013 21:56:02 +0000 (21:56 +0000)]
MFC: r259801
The NFSv4 client was passing both the p and cred arguments to
nfsv4_fillattr() as NULLs for the Getattr callback. This caused
nfsv4_fillattr() to not fill in the Change attribute for the reply.
I believe this was a violation of the RFC, but had little effect on
server behaviour. This patch passes a non-NULL p argument to fix this.
rmacklem [Mon, 30 Dec 2013 21:24:41 +0000 (21:24 +0000)]
MFC: r259771
The NFSv4.1 client didn't return NFSv4.1 specific error codes
for the Getattr and Recall callbacks. This patch fixes it.
Since the NFSv4.1 specific error codes would only happen for
abnormal circumstances, this patch has little effect, in practice.
rmacklem [Mon, 30 Dec 2013 21:17:20 +0000 (21:17 +0000)]
MFC: r259084
For software builds, the NFS client does many small
synchronous (with FILE_SYNC) writes because non-contiguous
byte ranges in the same buffer cache block are being
written. This patch adds a new mount option "noncontigwr"
which allows the non-contiguous byte ranges to be combined,
with the dirty byte range becoming the superset of the bytes
that are dirty, if the file has not been file locked.
This reduces the number of writes significantly for software
builds. The only case where this change might break existing
applications is where an application is writing
non-overlapping byte ranges within the same buffer cache block
of a file from multiple clients concurrently.
Since such an application would normally do file locking on
the file, avoiding the byte range merge for files that have
been file locked should be sufficient for most (maybe all?) cases.
dim [Mon, 30 Dec 2013 20:27:58 +0000 (20:27 +0000)]
MFC r259902:
In sys/dev/drm/mach64_dma.c, remove static function mach64_set_dma_eol(),
which has never been used, even by upstream, since its initial upstream
commit (see http://cgit.freedesktop.org/mesa/drm/commit/?id=873e1c4d )
dim [Mon, 30 Dec 2013 20:15:46 +0000 (20:15 +0000)]
MFC r257532 (by adrian):
Fix this build for clang.
MFC r259730:
To avoid having to explicitly test COMPILER_TYPE for setting
clang-specific or gcc-specific flags, introduce the following new
variables for use in Makefiles:
jmmv [Mon, 30 Dec 2013 14:09:04 +0000 (14:09 +0000)]
Fix 'make check-old' warnings when WITHOUT_TESTS is set.
This is a MFC of r258025 and r257940, both of which resolve issues with
dynamically setting the list of obsolete files based on the contents
of /usr/tests.
mckusick [Mon, 30 Dec 2013 05:22:22 +0000 (05:22 +0000)]
MFC of 256801, 256803, 256808, 256812, 256817, 256845, and 256860.
This set of changes puts in place the infrastructure to allow soft
updates to be multi-threaded. It introduces no functional changes
from its current operation.
MFC of 256860:
Allow kernels without options SOFTUPDATES to build. This should fix the
embedded tinderboxes.
Reviewed by: emaste
MFC of 256845:
Fix build problem on ARM (which defaults to building without soft updates).
Reported by: Tinderbox
Sponsored by: Netflix
MFC of 256817:
Restructuring of the soft updates code to set it up so that the
single kernel-wide soft update lock can be replaced with a
per-filesystem soft-updates lock. This per-filesystem lock will
allow each filesystem to have its own soft-updates flushing thread
rather than being limited to a single soft-updates flushing thread
for the entire kernel.
Move soft update variables out of the ufsmount structure and into
their own mount_softdeps structure referenced by ufsmount field
um_softdep. Eventually the per-filesystem lock will be in this
structure. For now there is simply a pointer to the kernel-wide
soft updates lock.
Change all instances of ACQUIRE_LOCK and FREE_LOCK to pass the lock
pointer in the mount_softdeps structure instead of a pointer to the
kernel-wide soft-updates lock.
Replace the five hash tables used by soft updates with per-filesystem
copies of these tables allocated in the mount_softdeps structure.
Several functions that flush dependencies when too many are allocated
in the kernel used to operate across all filesystems. They are now
parameterized to flush dependencies from a specified filesystem.
For now, we stick with the round-robin flushing strategy when the
kernel as a whole has too many dependencies allocated.
While there are many lines of changes, there should be no functional
change in the operation of soft updates.
Tested by: Peter Holm and Scott Long
Sponsored by: Netflix
MFC of 256812:
Fourth of several cleanups to soft dependency implementation.
Add KASSERTS that soft dependency functions only get called
for filesystems running with soft dependencies. Calling these
functions when soft updates are not compiled into the system
become panic's.
No functional change.
Tested by: Peter Holm and Scott Long
Sponsored by: Netflix
MFC of 256808:
Third of several cleanups to soft dependency implementation.
Ensure that softdep_unmount() and softdep_setup_sbupdate()
only get called for filesystems running with soft dependencies.
No functional change.
Tested by: Peter Holm and Scott Long
Sponsored by: Netflix
MFC of 256803:
Second of several cleanups to soft dependency implementation.
Delete two unused functions in ffs_sofdep.c.
No functional change.
Tested by: Peter Holm and Scott Long
Sponsored by: Netflix
MFC of 256801:
First of several cleanups to soft dependency implementation.
Convert three functions exported from ffs_softdep.c to static
functions as they are not used outside of ffs_softdep.c.
No functional change.
Tested by: Peter Holm and Scott Long
Sponsored by: Netflix
mckusick [Sun, 29 Dec 2013 07:26:48 +0000 (07:26 +0000)]
MFC of 258789:
We needlessly panic when trying to flush MKDIR_PARENT dependencies.
We had previously tried to flush all MKDIR_PARENT dependencies (and
all the NEWBLOCK pagedeps) by calling ffs_update(). However this will
only resolve these dependencies in direct blocks. So very large
directories with MKDIR_PARENT dependencies in indirect blocks had
not yet gotten flushed. As the directory is in the midst of doing a
complete sync, we simply defer the checking of the MKDIR_PARENT
dependencies until the indirect blocks have been sync'ed.
jmmv [Sat, 28 Dec 2013 23:08:58 +0000 (23:08 +0000)]
Plug the ATF tests into the build.
This is a MFC into stable/10 of:
- r257849 Add libatf-c++ to the prebuild libs.
- r257853 Build and install the atf tests.
- r258233 Move all atf directories to the tests mtree.
- r258285 Fix the build of some ATF tests.
This change is "make tinderbox" clean on ref10-amd64 with the default
settings of WITHOUT_TESTS. It is likely for the WITH_TESTS build to
still be broken because not all relevant changes have been merged yet.
jmmv [Sat, 28 Dec 2013 20:05:31 +0000 (20:05 +0000)]
Set up the /usr/tests hierarchy.
This is a MFC of the following into stable/10:
- r257097 Set up the /usr/tests hierarchy.
- r257098 Add missing WITHOUTTESTS file.
- r257100 Add a tests(7) manual page.
- r257105 Disable WITHTESTS= for now.
- r257848 Fix buildworld when WITHTESTS is enabled.
- r257850 Subsume the functionality of MKATF into MKTESTS.
- r257851 Handle the removal of the test suite when WITHOUTTESTS=yes.
- r257852 Install category Kyuafiles from their category directories.
- r258232 Install BSD.tests.mtree when MKTESTS is yes.
Note that building with WITH_TESTS is still broken at this point (and
hence why WITHOUT_TESTS is the set as the default). Subsequent pullups
will fix the remaining issues.
Make hastctl list command output current queue sizes.
Reviewed by: pjd
r257582 (pjd):
Correct alignment.
r259191:
For memsync replication, hio_countdown is used not only as an
indication when a request can be moved to done queue, but also for
detecting the current state of memsync request.
This approach has problems, e.g. leaking a request if memsynk ack from
the secondary failed, or racy usage of write_complete, which should be
called only once per write request, but for memsync can be entered by
local_send_thread and ggate_send_thread simultaneously.
So the following approach is implemented instead:
1) Use hio_countdown only for counting components we waiting to
complete, i.e. initially it is always 2 for any replication mode.
2) To distinguish between "memsync ack" and "memsync fin" responses
from the secondary, add and use hio_memsyncacked field.
3) write_complete() in component threads is called only before
releasing hio_countdown (i.e. before the hio may be returned to the
done queue).
4) Add and use hio_writecount refcounter to detect when
write_complete() can be called in memsync case.
Reported by: Pete French petefrench ingresso.co.uk
Tested by: Pete French petefrench ingresso.co.uk
r259192:
Add some macros to make the code more readable (no functional chages).
r259193:
Fix compiler warnings.
r259194:
In remote_send_thread, if sending a request fails don't take the
request back from the receive queue -- it might already be processed
by remote_recv_thread, which lead to crashes like below:
(primary) Unable to receive reply header: Connection reset by peer.
(primary) Unable to send request (Connection reset by peer):
WRITE(954662912, 131072).
(primary) Disconnected from kopusha:7772.
(primary) Increasing localcnt to 1.
(primary) Assertion failed: (old > 0), function refcnt_release,
file refcnt.h, line 62.
Taking the request back was not necessary (it would properly be
processed by the remote_recv_thread) and only complicated things.
r259195:
Send wakeup to threads waiting on empty queue before releasing the
lock to decrease spurious wakeups.
Submitted by: davidxu
r259196:
Check remote protocol version only for the first connection (when it
is actually sent by the remote node).
Otherwise it generated confusing "Negotiated protocol version 1" debug
messages when processing the second connection.
jmmv [Sat, 28 Dec 2013 16:08:10 +0000 (16:08 +0000)]
Pull up fixes to allow building tests along scripts and data files.
MFC of the following into stable/10:
- r257095 Allow mixing bsd.files.mk with bsd.subdir.mk.
- r258095 Allow this (bsd.progs.mk) to work with fmake.
- r258330 Need to also test for defined(${v}_${PROG}) in bsd.progs.mk.
- r259209 Make bsd.progs.mk work in directories with SCRIPTS but no PROGS.
This is all 'make tinderbox' clean as run on ref10-amd64.
dim [Sat, 28 Dec 2013 02:15:30 +0000 (02:15 +0000)]
MFC r259897:
In sys/dev/cxgb/ulp/tom/cxgb_cpl_io.c, remove static functions
mk_cpl_barrier_ulp(), mk_get_tcb_ulp() and mk_set_tcb_field_ulp(), which
are all unused since r237263.
dim [Sat, 28 Dec 2013 02:07:29 +0000 (02:07 +0000)]
MFC r259893:
In sys/vm/vm_pageout.c, since vm_pageout_worker() takes a void * as
argument, cast the incoming 0 argument to void *, to silence a warning
from clang 3.4 ("expression which evaluates to zero treated as a null
pointer constant of type 'void *' [-Wnon-literal-null-conversion]").
dim [Sat, 28 Dec 2013 01:26:26 +0000 (01:26 +0000)]
MFC r259842:
Remove some unused static const strings under sys/rpc, which have never
been used since the initial commit (r177633).
MFC r259843:
Move a static const variable to the #if 0 part where it is only used.
(Note the #if 0 part has been inactive since the initial commit,
r177633, so maybe it should be removed altogether).
dim [Sat, 28 Dec 2013 01:19:48 +0000 (01:19 +0000)]
MFC r259840:
In sys/netinet6/in6_mcast.c, in6m_is_ifp_detached() is only used
whenever KTR is defined, so put it between #ifdef KTR guards. This
avoids a warning about a unused function if KTR is not enabled.
dim [Sat, 28 Dec 2013 01:15:34 +0000 (01:15 +0000)]
MFC r259839:
In sys/netinet/in_mcast.c, inm_is_ifp_detached() is only used whenever
KTR is defined, so put it between #ifdef KTR guards. This avoids a
warning about a unused function if KTR is not enabled.
markj [Fri, 27 Dec 2013 23:00:56 +0000 (23:00 +0000)]
MFC r258000:
Consistently add the relocation offset only when the ELF type is not
ET_EXEC. This fixes several problems with the DTrace pid provider not
being able to match probes.
markj [Fri, 27 Dec 2013 22:30:36 +0000 (22:30 +0000)]
MFC r257670:
Modify the libproc breakpoint add/remove functions to stop the target
process if it has not already been stopped, since this is required for
ptrace(2) to work.
libdtrace does not seem to stop target processes before trying to remove
their breakpoints, so we were previously failing to remove the breakpoint
on r_debug_state() in rtld. This was causing processes to die with SIGTRAP
if they called dlopen(3) after dtrace(1) had detached.