hselasky [Wed, 13 Aug 2014 06:59:40 +0000 (06:59 +0000)]
MFC r269604:
- Implement fast interrupt handler to save CPU usage.
- Cleanup some register reads and writes to use existing register
access macros.
- Ensure code which only applies to the control endpoint is not run
for other endpoints in the data transfer path.
kib [Wed, 13 Aug 2014 06:58:42 +0000 (06:58 +0000)]
MFC r269643:
Weaken the requirement for the vm object lock by only asserting locked
object in vm_pager_page_unswapped(), instead of locked exclusively.
dim [Tue, 12 Aug 2014 17:56:48 +0000 (17:56 +0000)]
MFC r269750:
In r268463, I misplaced a return in demangle(), causing the function to
erroneously skip symbols that were not mangled at all. Fix this by
moving the return into the preceding if block.
While here, simplify the code by letting __cxa_demangle() allocate the
needed space for the demangled symbol. This also fixes a memory leak,
which would occur whenever __cxa_demangle() failed.
emaste [Tue, 12 Aug 2014 14:53:02 +0000 (14:53 +0000)]
MFC cleanup of libusb20 example
r257779 by hselasky:
- Use libusb20_strerror() function instead of custom usb_error() one.
- Rename "aux.[ch]" to "util.[ch]" which is a more common name for
utility functions and allows checkout on some non-FreeBSD systems
where the "aux.*" namespace is reserved.
- Fix some compile warnings while at it.
hselasky [Tue, 12 Aug 2014 12:10:29 +0000 (12:10 +0000)]
MFC r268316:
Fix OFED startup order: All SYSINIT()'s and modules should be loaded
prior to starting "/sbin/init" which will run all the "/etc/rc.d/xxx"
scripts. Else there can be a race configuring the interfaces via
"/etc/rc.conf".
delphij [Tue, 12 Aug 2014 00:59:19 +0000 (00:59 +0000)]
MFC r269230: MFV r269224:
Increase default ARC buf_hash_table size. When typical block size is small,
the hash table could be too small, which would lead to long hash chains and
limit performance for cached reads.
A new loader tunable, vfs.zfs.arc_average_blocksize, has been added which
allows users to override the default assumption of average (typical) block
size. Old default was 65536 (64 KiB) and new default is 8192 (8 KiB).
Illumos issue:
5034 ARC's buf_hash_table is too small
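The effect of the new default can be sketched as follows. This is a minimal illustration, not the in-tree code: the function name, its parameters and the 4096-bucket lower bound are assumptions. It picks a power-of-two bucket count so that there is roughly one bucket per cached block if every block had the assumed average size.

```c
#include <stdint.h>

/*
 * Sketch only: hash_buckets() and its arguments are hypothetical names.
 * Double the bucket count until buckets * average block size covers
 * the amount of memory that could be cached.
 */
static uint64_t
hash_buckets(uint64_t physmem_bytes, uint64_t avg_blocksize)
{
	uint64_t hsize = 1ULL << 12;	/* assumed lower bound */

	while (hsize * avg_blocksize < physmem_bytes)
		hsize <<= 1;
	return (hsize);
}
```

Under these assumptions, going from a 64 KiB to an 8 KiB average block size gives eight times as many buckets for the same amount of memory, which shortens the hash chains for small-block workloads.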
imp [Mon, 11 Aug 2014 18:42:20 +0000 (18:42 +0000)]
MFC: Merge in the changes in -current:
Support ! operator in "files" files.
Improve error detection and reporting
Cleanup code to make it easier to maintain.
Remove mandatory keyword: it has been used for 17 years.
Bump version number (we should have bumped for -I too, but didn't)
r261501 | imp | 2014-02-04 17:26:11 -0700 (Tue, 04 Feb 2014) | 5 lines
Fix ! by not clearing the "not" flag at the bottom of the loop.
Add a blank line
Submitted by: bde (blank line)
r261493 | imp | 2014-02-04 11:28:58 -0700 (Tue, 04 Feb 2014) | 5 lines
Implement the '!' operator for files* files. It means 'include this
only if the specified option is NOT specified.' Bump version because
old config won't be able to cope with files* files that have this
construct in them.
r261446 | imp | 2014-02-03 12:14:36 -0700 (Mon, 03 Feb 2014) | 5 lines
Convert the loop by gotos into a for loop to improve readability. I
did this only with the inner loop for the token parsing, and not the
outer loop which was understandable enough when the extra layers of
looping went away...
r261445 | imp | 2014-02-03 12:10:33 -0700 (Mon, 03 Feb 2014) | 4 lines
Fix a bug introduced in r261437 that caused "optional
profiling-routine" to stop working, since profiling-routine is not really
an option or a device, but a special case elsewhere in the code.
r261444 | imp | 2014-02-03 11:56:41 -0700 (Mon, 03 Feb 2014) | 2 lines
Slight cleanup to the error messaging to compress code vertically...
r261442 | imp | 2014-02-03 11:31:51 -0700 (Mon, 03 Feb 2014) | 2 lines
Better error messages when EOF is hit in the middle of a phrase.
r261438 | imp | 2014-02-03 09:54:53 -0700 (Mon, 03 Feb 2014) | 5 lines
Move the check for standard keyword + optional inclusion specifier to
its proper location. Otherwise you could have 'file.c standard pci'
without an error. This construct isn't in our tree, and has no well
defined meaning.
r261437 | imp | 2014-02-03 09:47:10 -0700 (Mon, 03 Feb 2014) | 4 lines
Don't believe we have a requirement until after we've checked all the
known key words. This will make error messages slightly better in
weird corner cases, but should otherwise be a nop.
r261436 | imp | 2014-02-03 09:46:01 -0700 (Mon, 03 Feb 2014) | 3 lines
In the 17 years since r30796, the mandatory keyword has never been used
in any files as far as I can tell, and is currently unused. Retire it.
r261435 | imp | 2014-02-03 08:10:44 -0700 (Mon, 03 Feb 2014) | 6 lines
Slightly deobfuscate read_file() and likely pessimize the runtime
performance by epsilon.
(Translation: eliminate bogus macros that hid 'returns' making it hard
to read and moved a block of code inline rather than at the end of the
function where it was effectively a 'gosub' kind of goto).
ian [Mon, 11 Aug 2014 02:20:24 +0000 (02:20 +0000)]
MFC r269403, r269405, r269410, r269414:
Add 64-bit atomic ops for armv6, and also for armv4 only in kernel code.
Use the new ops in the cddl code (and avoid defining functions with the
same names locally).
busdma-v6 improvements, primarily:
- Allocate the temporary segments array per-map rather than per-tag.
- Avoid needlessly bouncing IO for mbufs and buffers allocated by
bus_dmamem_alloc() (in both situations we know they're allocated
on cacheline boundaries and don't need bouncing).
- Various minor reformatting and cleanups.
delphij [Sun, 10 Aug 2014 05:58:41 +0000 (05:58 +0000)]
MFC r269118: MFV r269010:
Import Illumos changes to address the following Illumos issues:
4976 zfs should only avoid writing to a failing non-redundant
top-level vdev
4978 ztest fails in get_metaslab_refcount()
4979 extend free space histogram to device and pool
4980 metaslabs should have a fragmentation metric
4981 remove fragmented ops vector from block allocator
4982 space_map object should proactively upgrade when feature
is enabled
4984 device selection should use fragmentation metric
markj [Sat, 9 Aug 2014 15:00:03 +0000 (15:00 +0000)]
MFC r265308:
If the traced process stops because it received a signal, libproc needs
to ensure that the signal is forwarded when proc_continue() is called.
delphij [Fri, 8 Aug 2014 19:11:23 +0000 (19:11 +0000)]
MFC r268621 (smh) + r268625:
Don't report non-native block-size pools under zpool status -x
zpool status -x is used to identify pools that are exhibiting
errors or are otherwise unavailable, therefore non-native
block-size pools shouldn't be reported.
Also update man page to clarify other additional conditions
which won't cause a pool to be displayed under zpool status -x.
delphij [Fri, 8 Aug 2014 18:54:52 +0000 (18:54 +0000)]
MFC r269086:
As of r268075, the responsibility of rounding up buffer to optimal size has
been transferred from zio_compress_data to its caller. Therefore, passing
the 'minblocksize' down will be a no-op.
Eliminate the parameter to reduce diff against upstream.
markj [Fri, 8 Aug 2014 15:21:43 +0000 (15:21 +0000)]
MFC r265631:
Re-apply r248644. This fixes an annoying problem which caused dtrace -c to
fail to attach to stripped binaries. With the _r_debug_postinit symbol,
dtrace(1) can now set a breakpoint in the victim process after it has
registered its DOF table(s) with the kernel. r_debug_state cannot be used
for this purpose since it is called before DOF is made available, in which
case dtrace(1) cannot create USDT probes before the program begins
execution.
markj [Fri, 8 Aug 2014 14:53:01 +0000 (14:53 +0000)]
MFC r265629, r265630
MFC r265629:
Handle the different event types properly in rd_event_addr(). In particular,
with r265456 _r_debug_postinit can be used for RD_POSTINIT events. rtld(1)
uses r_debug_state for dl state transitions, so we use its address for
RD_DLACTIVITY events.
MFC r265630:
Fix the rd_event_addr prototype and slightly clarify the use of the "event"
parameter.
joerg [Fri, 8 Aug 2014 14:42:03 +0000 (14:42 +0000)]
Merge r269353:
Fix breakage introduced by r256843: removing the SA_CCB_WAITING bit
left some of the decisions based on its counterpart, SA_CCB_BUFFER_IO,
being random. As a result, propagation of the residual information
for the SPACE command was broken, so the number of filemarks
encountered during a SPACE operation was miscalculated. Consequently,
systems relying on properly tracked filemark counters (like Bacula)
fell apart.
The change also removes a switch/case in sadone() which r256843
degraded to a single remaining case label.
markj [Thu, 7 Aug 2014 18:36:47 +0000 (18:36 +0000)]
MFC r265456, r265578:
Add a postinit debugger hook to rtld. This will be used by dtrace(1) to halt
the victim process before its entry point is called, at which point probes
and DOF data are registered with the kernel. The r_debug_state hook cannot
be used for this purpose, as it is called before the program's init routines
are invoked and in particular before DOF data is registered (via drti.o).
mckusick [Wed, 6 Aug 2014 23:33:16 +0000 (23:33 +0000)]
MFC of r269303:
When restoring a UFS dump onto a ZFS filesystem, an assertion in
restore was failing because ZFS was reporting a blocksize that was
not a multiple of 1024. Replace restore's failed assertion with
code that writes restored files in a blocksize that works for
restore (a multiple of 1024) despite being non-optimal for ZFS.
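One way to derive a write size that restore can work with is to round the reported block size up to the next multiple of 1024. The sketch below only illustrates that arithmetic; the helper name and the UNIT constant are hypothetical, and the committed code may choose the size differently.

```c
#include <stddef.h>

#define UNIT	1024	/* restore works in multiples of 1024 bytes */

/*
 * Hypothetical helper: round the block size reported by the target
 * filesystem up to a multiple of UNIT. A reported size of 0 falls
 * back to a single unit.
 */
static size_t
usable_blocksize(size_t reported)
{
	if (reported == 0)
		return (UNIT);
	return (((reported + UNIT - 1) / UNIT) * UNIT);
}
```

A size that is already a multiple of 1024 is returned unchanged, so well-behaved filesystems are not affected.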
ache [Wed, 6 Aug 2014 10:33:43 +0000 (10:33 +0000)]
MFC: r268997
For "a"-mode files and rewind/fseek + fwrite combination return meaningful
value now, like Apple does, but avoid their __sflush physical write
performance degradation as much as possible.
marius [Tue, 5 Aug 2014 16:04:22 +0000 (16:04 +0000)]
MFC: r260457
The changes in r233781 attempted to make logging during a machine check
exception more readable. In practice they prevented all logging during
a machine check exception on at least some systems. Specifically, when
an uncorrected ECC error is detected in a DIMM on a Nehalem/Westmere
class machine, all CPUs receive a machine check exception, but only
CPUs on the same package as the memory controller for the erroring DIMM
log an error. The CPUs on the other package would complete the scan of
their machine check banks and panic before the first set of CPUs could
log an error. The end result was a clearer display during the panic
(no interleaved messages), but a crashdump without any useful info about
the error that occurred.
To handle this case, make all CPUs spin in the machine check handler
once they have completed their scan of their machine check banks until
at least one machine check error is logged. I tried using a DELAY()
instead so that the CPUs would not potentially hang forever, but that
was not reliable in testing.
While here, don't clear MCIP from MSR_MCG_STATUS before invoking panic.
Only clear it if the machine check handler does not panic and returns
to the interrupted thread.
Intel desktop Haswell CPUs may report benign corrected parity errors (see
HSD131 erratum in [1]) at a considerable rate. So filter these (default),
unless logging is enabled. Unfortunately, there really is no better way to
reasonably implement suppressing these errors than to just skip them
in mca_log(). Given that they are reported for bank 0, they'd need to be
masked in MSR_MC0_CTL. However, P6 family processors require that register
to be set to either all 0s or all 1s, disabling way more than the one error
in question when using all 0s there. Alternatively, it could be masked for
the corresponding CMCI, but that still wouldn't keep the periodic scanner
from detecting these spurious errors. Apart from that, register contents of
MSR_MC0_CTL{,2} don't seem to be publicly documented, neither in the Intel
Architectures Developer's Manual nor in the Haswell datasheets.
Note that while HSD131 actually is only about C0-stepping as of revision
014 of the Intel desktop 4th generation processor family specification
update, these corrected errors also have been observed with D0-stepping
aka "Haswell Refresh".
mav [Tue, 5 Aug 2014 08:28:29 +0000 (08:28 +0000)]
MFC r269441:
Add missing comparisons to make list IDs in EXTENDED COPY per-initiator,
as they should be. Wrap it into a function to not duplicate the code.
markj [Tue, 5 Aug 2014 01:53:15 +0000 (01:53 +0000)]
MFC r267759, r267761
r267759:
Fix a couple of bugs on amd64 when fetching probe arguments beyond the
first five for probes entered through a UD fault (i.e. FBT probes).
Specifically, handle the fact that dtrace_invop_callsite must be
16 byte-aligned and thus may not immediately follow the call to
dtrace_invop() in dtrace_invop_start(). Also fetch register arguments and
the stack pointer through a struct trapframe instead of a struct reg.
r267761:
Fix some bugs when fetching probe arguments in i386. Firstly ensure that
the 4 byte-aligned dtrace_invop_callsite can be found and that it
immediately follows the call to dtrace_invop(). Secondly, fix some pointer
arithmetic to account for differences between struct i386_frame and illumos'
struct frame. Finally, ensure that dtrace_getarg() isn't inlined. It works
by following a fixed number of frame pointers to the probe site, so inlining
breaks it.
markj [Tue, 5 Aug 2014 00:25:46 +0000 (00:25 +0000)]
MFC r267706:
Allow creation of SDT probes from a module in which no providers are
defined. This ensures that the sdt:zfs:: probes appear despite the fact
the sdt provider is defined in the kernel rather than in zfs.ko.
markj [Mon, 4 Aug 2014 21:41:00 +0000 (21:41 +0000)]
MFC r256822:
When fetching function arguments out of a frame on amd64, explicitly select
the register based on the argument index rather than relying on the fields
in struct reg to be in the right order. This assumption is incorrect on
FreeBSD and generally led to bogus argument values for the sixth argument
of PID and USDT probes; the first five are passed directly to dtrace_probe()
via the fasttrap trap handler and so were correctly handled.
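Selecting the register explicitly by argument index can be sketched as below. The struct and function names are hypothetical stand-ins, not the kernel's types; the register order is the System V amd64 calling convention (rdi, rsi, rdx, rcx, r8, r9), with later arguments living on the stack.

```c
#include <stdint.h>

/* Hypothetical stand-in for the saved register state. */
struct regs_sketch {
	uint64_t rdi, rsi, rdx, rcx, r8, r9;
};

/*
 * Fetch integer argument 'argno' (0-based) by naming the register,
 * instead of indexing into the struct and trusting its field order.
 * Returns 0 on success, -1 if the argument is passed on the stack.
 */
static int
fetch_reg_arg(const struct regs_sketch *r, int argno, uint64_t *out)
{
	switch (argno) {
	case 0: *out = r->rdi; return (0);
	case 1: *out = r->rsi; return (0);
	case 2: *out = r->rdx; return (0);
	case 3: *out = r->rcx; return (0);
	case 4: *out = r->r8;  return (0);
	case 5: *out = r->r9;  return (0);
	default: return (-1);
	}
}
```

With a switch like this, reordering the fields of the register structure cannot silently change which value is reported for a given argument.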
markj [Mon, 4 Aug 2014 15:36:22 +0000 (15:36 +0000)]
MFC r256571:
Add a function, memstr, which can be used to convert a buffer of
null-separated strings to a single string. This can be used to print the
full arguments of a process using execsnoop (from the DTrace toolkit) or
with the following one-liner:
Note that this relies on the process arguments being cached via the struct
proc, which means that it will not work for argvs longer than
kern.ps_arg_cache_limit. However, the following rather non-portable
script can be used to extract any argv at exec time:
dim [Mon, 4 Aug 2014 14:56:49 +0000 (14:56 +0000)]
MFC r269125:
In r232153, libarchive 3.0.3 was imported, replacing the archive_hash.h
header with archive_crypto_private.h, and its ARCHIVE_HASH_xxx macros
were renamed to ARCHIVE_CRYPTO_xxx.
Rename these macros in lib/libarchive/config_freebsd.h, to re-enable the
hashes for libarchive again. This affects the mtree format writer, and
the xar format reader and writer modules.
This also requires changes in the library order for statically linking
rescue, otherwise ld would complain about redefined symbols. Thanks to
jkim for pointing out the solution.
pfg [Mon, 4 Aug 2014 00:51:57 +0000 (00:51 +0000)]
MFC r268945:
Fix hdestroy() compliance issue.
The hcreate(3) implementation and related functions we inherited
from NetBSD used to free() the key value, something that is not
supported by the standard implementation.
This would cause a segmentation fault when attempting to run
the examples from the opengroup and linux manpages.
There is no need to bump the __FreeBSD_version as we have
always claimed XPG4.2 compliance but if some reference is
required, the bump for r269484 can be used.
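The kind of program affected looks roughly like the sketch below, where the key is not heap-allocated. The function name and values are made up for illustration; only hcreate(), hsearch() and hdestroy() are the standard interfaces. An hdestroy() that calls free() on the key would misbehave on such input.

```c
#include <search.h>
#include <stddef.h>

/*
 * Illustrative only: insert one entry whose key lives in static
 * storage, look it up again, then tear the table down.
 * Returns the stored value, or -1 on failure.
 */
static int
lookup_demo(void)
{
	static char key[] = "alpha";	/* not obtained from malloc() */
	static int value = 42;
	ENTRY item, *found;
	int result = -1;

	if (hcreate(16) == 0)
		return (-1);
	item.key = key;
	item.data = &value;
	if (hsearch(item, ENTER) != NULL) {
		found = hsearch(item, FIND);
		if (found != NULL)
			result = *(int *)found->data;
	}
	hdestroy();	/* must not free() the caller's key */
	return (result);
}
```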
peter [Sun, 3 Aug 2014 22:59:47 +0000 (22:59 +0000)]
Insta-MFC r269489: partial revert of r262867 which was MFC'ed as r263820.
Don't ignore sndbuf/rcvbuf limits for SOCK_DGRAM sockets. This appears
to be an edit error or patch fuzz mismatch.
pfg [Sun, 3 Aug 2014 18:39:11 +0000 (18:39 +0000)]
MFC r268066:
regex(3): Add support for \< and \> word delimiters
Solaris and other OSs have support for \< and \> as word
delimiters in utilities like sed(1). These are useful to
have for general compatibility with Solaris but should be
avoided for portability with other systems, including the
traditional BSDs.
Bump __FreeBSD_version as this is likely to affect some
userland utilities.
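A small sketch of how the delimiters might be used through regcomp(3) follows. The helper name and the fixed pattern are illustrative assumptions, and the result depends on the C library in use supporting \< and \>.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Illustrative helper: report whether 'text' contains "cat" as a
 * whole word. Returns 1 on match, 0 on no match, and -1 if the
 * pattern does not compile on this regex implementation.
 */
static int
has_word(const char *text)
{
	regex_t re;
	int rc;

	if (regcomp(&re, "\\<cat\\>", REG_NOSUB) != 0)
		return (-1);
	rc = regexec(&re, text, 0, NULL, 0);
	regfree(&re);
	return (rc == 0);
}
```

On an implementation with these delimiters, "the cat sat" would be expected to match while "concatenate" would not, since "cat" is not a separate word there.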
pfg [Sun, 3 Aug 2014 18:31:52 +0000 (18:31 +0000)]
MFC r269124:
strftime() xlocale cleanups.
Replace fprintf_l with fputs when output is unformatted.
Use locale_t in _conv() since it was using sprintf (now sprintf_l)
Use locale_t on _yconv() since it calls _conv()
Obtained from: Apple Inc. (Libc 997.90.3)
CR: D482
Reviewed by: theraven
rmacklem [Sun, 3 Aug 2014 00:35:10 +0000 (00:35 +0000)]
MFC: r268273
The new NFSv3 server did not generate directory postop attributes for
the reply to ReaddirPlus when the server failed within the loop
that calls VFS_VGET(). This failure is most likely an error
return from VFS_VGET() caused by a bogus d_fileno that was
truncated to 32 bits.
This patch fixes the server so that it will return directory postop
attributes for the failure. It does not fix the underlying issue caused
by d_fileno being uint32_t when a file system like ZFS generates
a fileno that does not fit in 32 bits.
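The underlying truncation can be illustrated with a tiny check. The helper name is hypothetical; it only shows when a 64-bit file number would lose information once stored in a uint32_t field.

```c
#include <stdint.h>

/*
 * Hypothetical helper: returns 1 if 'fileno' survives a round trip
 * through a 32-bit field, 0 if the upper bits would be lost.
 */
static int
fileno_fits32(uint64_t fileno)
{
	return ((uint64_t)(uint32_t)fileno == fileno);
}
```

A file number such as 0x100000001 would be truncated to 1, which is the sort of bogus value a later lookup can fail on.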
marcel [Sat, 2 Aug 2014 22:25:24 +0000 (22:25 +0000)]
MFC 259910, 260023, 260028, 260600 & 260701:
o Fix "kptdir is itself virtual" error, caused by having the kptdir in PBVM.
o Allow building a cross libkvm for ia64.
o Add support for virtual cores (aka minidumps).
o We don't have to worry about page sizes when working on virtual cores.
o Handle truncation of the size returned by _kvm_kvatop().
hselasky [Sat, 2 Aug 2014 20:58:46 +0000 (20:58 +0000)]
Partial MFC of r267961, r267973, r267985, r267992, r267993 and r268005:
Backport some macro definitions to make backporting code from FreeBSD
current easier.
mav [Sat, 2 Aug 2014 06:56:00 +0000 (06:56 +0000)]
MFC r269123:
Implement separate I/O dispatch method for ZVOLs in "dev" mode.
Unlike disk devices, ZVOLs process all requests synchronously. That makes
it impossible to send multiple requests to them from a single thread. On the
other hand, ZVOLs have real d_read/d_write methods, which unlike d_strategy
can handle uio scatter/gather and have no strict I/O size limitations.
So, if a ZVOL in "dev" mode is detected, using the d_read/d_write methods
instead of d_strategy avoids pointless splitting of large requests into
MAXPHYS (128K) sized chunks.
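To give a feel for the cost of the splitting, the sketch below counts how many MAXPHYS-sized pieces a request would be cut into on the d_strategy path. The constant and function names are illustrative, with the 128K value taken from the text above.

```c
#include <stddef.h>

#define MAXPHYS_SKETCH	(128 * 1024)	/* 128K, as quoted above */

/*
 * Illustrative helper: number of chunks needed when a request of
 * 'len' bytes may not exceed MAXPHYS_SKETCH bytes per chunk.
 */
static size_t
strategy_chunks(size_t len)
{
	return ((len + MAXPHYS_SKETCH - 1) / MAXPHYS_SKETCH);
}
```

A 1 MiB request would be issued as eight synchronous chunks this way, whereas a single d_read/d_write call can carry it whole.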
delphij [Sat, 2 Aug 2014 04:06:35 +0000 (04:06 +0000)]
MFC r268865: MFV r268852:
Reduce lock contention on the z_teardown_lock under heavily cached
read workload by splitting the single teardown rrw lock into
RRM_NUM_LOCKS (17) of them.
Read acquisitions are randomly distributed among these locks based
on curthread pointer. Write acquisitions are going to all the
locks, which for the usage of this type of lock should be rare.
Illumos issue:
5008 lock contention (rrw_exit) while running a read only load
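The reader-side distribution can be sketched as a simple mapping from a thread pointer to a lock index. The function name is hypothetical and the exact hashing in the imported code may differ; the point is that a given thread always lands on the same one of the RRM_NUM_LOCKS locks.

```c
#include <stdint.h>

#define RRM_NUM_LOCKS	17

/*
 * Sketch: map a thread pointer to one of RRM_NUM_LOCKS lock slots.
 * Readers take only the lock in their slot; a writer would have to
 * take all RRM_NUM_LOCKS of them.
 */
static unsigned
rrm_lock_index(const void *thread)
{
	return ((unsigned)((uint32_t)(uintptr_t)thread % RRM_NUM_LOCKS));
}
```

Using a prime number of locks helps spread pointers that share alignment across all slots.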
delphij [Sat, 2 Aug 2014 04:01:44 +0000 (04:01 +0000)]
MFC r268859: MFV r268851:
When a sync task is waiting for a txg to complete, we should hurry it along
by increasing the number of outstanding async writes (i.e. make
vdev_queue_max_async_writes() return a larger number).
Illumos issue:
4753 increase number of outstanding async writes when sync task is waiting
delphij [Sat, 2 Aug 2014 03:59:35 +0000 (03:59 +0000)]
MFC r268858: MFV r268850:
Change the interaction between the DMU and ARC so that when the DMU is
shutting down an objset, we do not evict the data from the ARC. Instead
we simply coordinate the destruction of the DMU's data with the ARC.
The only case where we actually need to explicitly evict from the ARC is
when dbuf_rele_and_unlock() determines that the administrator has requested
that it not be kept in memory, via the primarycache/secondarycache properties.
In this case, we evict the data from the ARC by its blkptr_t, the same way
as when a block is freed we explicitly evict it from the ARC.
Illumos issue:
4631 zvol_get_stats triggering too many reads
delphij [Sat, 2 Aug 2014 03:56:06 +0000 (03:56 +0000)]
MFC r268855: MFV r268848:
Instead of asserting all zio's be properly aligned, only assert
on the logical ones.
Cap uberblocks at 8k, otherwise with ashift=17, there would be
only one uberblock.
This fixes a problem where zdb would trip an assert on pools with
ashift >= 0xe (8k).
While there, also change the code so it does not attempt to condense
the space map unless the uncondensed size consumes more than
zfs_metaslab_condense_block_threshold blocks.
Illumos issue:
4958 zdb trips assert on pools with ashift >= 0xe
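The uberblock arithmetic can be sketched as follows. The constant names and values are assumptions chosen to match the description (a 128K ring, a 1k minimum and the new 8k cap), and the function name is hypothetical.

```c
#define UBERBLOCK_SHIFT		10		/* assumed 1k minimum */
#define MAX_UBERBLOCK_SHIFT	13		/* 8k cap described above */
#define UBERBLOCK_RING		(128 * 1024)	/* assumed ring size */

/*
 * Sketch: number of uberblock slots in the ring for a given ashift,
 * with the per-uberblock size clamped between 1k and 8k.
 */
static unsigned
uberblock_count(unsigned ashift)
{
	unsigned shift = ashift;

	if (shift < UBERBLOCK_SHIFT)
		shift = UBERBLOCK_SHIFT;
	if (shift > MAX_UBERBLOCK_SHIFT)
		shift = MAX_UBERBLOCK_SHIFT;
	return (UBERBLOCK_RING >> shift);
}
```

Without the upper clamp, ashift=17 would give 128K >> 17 = 1 slot; with it, the ring still holds 16 uberblocks.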