mav [Fri, 30 Jan 2015 09:05:43 +0000 (09:05 +0000)]
MFC r277247: Don't count status as sent until CTIO completes successfully.
If status sending was aggregated with a data move and an error occurred,
allow status to be updated and resent separately. Without this, a command
may get stuck without status being sent at all.
rstone [Wed, 28 Jan 2015 21:51:34 +0000 (21:51 +0000)]
MFC r277352:
When mountd is creating sockets, it iterates over all addresses specified
in the "hosts" array and eventually looks up the network address with
getaddrinfo(). At one point it checks for a numeric address and if it
sees one, it sets a hint parameter to force getaddrinfo to interpret the
host as a numeric address. However that hint is not cleared for subsequent
iterations of the loop and if any hosts seen after this point are host names,
getaddrinfo will fail on the name. The result of this bug is that you cannot
pass a host name to the -h flag.
Unfortunately, the first iteration will either process ::1 or 127.0.0.1,
so the flag is set on the first iteration and all host names will fail
to be processed.
The same bug applies to rpc.lockd and rpc.statd, so fix them too.
Differential Revision: https://reviews.freebsd.org/D1507
Reported by: Dylan Martin
MFC after: 1 week
Sponsored by: Sandvine Inc.
mav [Wed, 28 Jan 2015 02:55:20 +0000 (02:55 +0000)]
MFC r277169: Reimplement TRIM throttling added in r248577.
The previous throttling implementation approached the problem from the wrong
side. It significantly limited useful delaying of TRIM requests and
aggregation potential, while doing little to control TRIM burstiness under
heavy load.
With this change, random 4K write benchmarks (probably the worst case for
TRIM) showed me an IOPS increase of 20%, an average latency reduction of 30%,
a 3x reduction in peak TRIM bursts, and the same peak TRIM map size (memory
usage).
Also, the new logic does not force the map size down so heavily, really
allowing deleted data to be kept for 32 TXG or 30 seconds under moderate
load. That was practically impossible with the old throttling logic, which
pushed the map down to only 64 segments.
bryanv [Tue, 27 Jan 2015 06:19:30 +0000 (06:19 +0000)]
MFC r272886:
Add context pointer and source address to the UDP tunnel callback
These are needed for the forthcoming vxlan implementation. The context
pointer means we do not have to use a spare pointer field in the inpcb,
and the source address is required to populate vxlan's forwarding table.
trasz [Mon, 26 Jan 2015 13:37:18 +0000 (13:37 +0000)]
MFC r272294 by gavin@:
Make clear in the ipheth(4) hardware notes that this driver is for the
tethering functionality only. Add a "bugs" section to give a pointer
to usbconfig set_config if the device isn't automatically detected.
trasz [Mon, 26 Jan 2015 13:21:30 +0000 (13:21 +0000)]
MFC r274791:
Add missing error checking for kernel_port_{add,remove}(). Both can fail
for reasons yet unknown; don't make it increment cumulated_error as a kind
of temporary workaround.
luigi [Mon, 26 Jan 2015 03:26:37 +0000 (03:26 +0000)]
Merge 272659:
Add netmap support to libpcap. Tcpdump and other native pcap clients
can now run directly on netmap ports using netmap:foo or valeXX:YY
as device names.
Modifications to existing code are small and trivial,
the netmap-specific code is all in a new file.
Please be aware that in netmap mode the physical interface is
disconnected from the host stack, so libpcap will steal the traffic,
not just make a copy.
For the full version of the code (including Linux and autotools support) see
https://code.google.com/p/netmap-libpcap/
mav [Sun, 25 Jan 2015 14:31:44 +0000 (14:31 +0000)]
MFC r276983: When aggregating TRIM segments, move the new one to the end.
A new segment at the list head may block all TRIM requests until the txg of
that segment can be processed. In my random I/O tests this change reduces
peak TRIM list length from 650 to 450 segments. Hopefully it should reduce
TRIM burstiness when list processing is unblocked.
mav [Sun, 25 Jan 2015 14:29:40 +0000 (14:29 +0000)]
MFC r276952: Add LBA as secondary sort key for synchronous I/O requests.
On FreeBSD, gethrtime() is implemented via getnanouptime(), which has 1ms
(1/hz) precision. That makes primary sort key (timestamp) collisions quite
likely. In such situations, sorting by a secondary key of LBA is much more
reasonable than by the totally meaningless zio pointer value.
With this change, on multi-threaded synchronous ZVOL reads I've measured a
10% throughput increase and a reduction in average latency.
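A comparator with this key order might look like the sketch below. It is
illustrative only and is not the ZFS vdev queue code: struct io_req, its
fields, and io_req_compare() are hypothetical names, and the final address
comparison is kept merely so that two distinct elements never compare equal.

```c
#include <stdint.h>

/* Hypothetical, simplified stand-in for a queued I/O request. */
struct io_req {
	int64_t  timestamp;	/* primary key; coarse clocks make ties likely */
	uint64_t offset;	/* secondary key: device offset (LBA) */
};

/* Three-way compare that cannot overflow the way subtraction can. */
#define	CMP3(a, b)	(((a) > (b)) - ((a) < (b)))

static int
io_req_compare(const void *x1, const void *x2)
{
	const struct io_req *r1 = x1;
	const struct io_req *r2 = x2;
	int c;

	/* 1. Oldest request first. */
	if ((c = CMP3(r1->timestamp, r2->timestamp)) != 0)
		return (c);
	/* 2. On a timestamp tie, lowest offset first. */
	if ((c = CMP3(r1->offset, r2->offset)) != 0)
		return (c);
	/* 3. Last resort: the element address, only to break exact ties. */
	return (CMP3((uintptr_t)r1, (uintptr_t)r2));
}
```

With a 1ms clock, many requests issued in the same tick share a timestamp,
so step 2 decides their order and tends to keep them sequential on disk.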
cperciva [Sun, 25 Jan 2015 08:16:51 +0000 (08:16 +0000)]
MFC r277318:
When disabling C3+ CPU states due to the CPU_QUIRK_NO_C3 quirk, don't
accidentally enable non-existent states.
This bug is triggered if ACPI advertises the presence of a C2 state
which we fail to parse via acpi_PkgGas due to our lack of support for
FFixedHW resources; it causes an immediate panic when an attempt is
made to enter the (NULL) state.
One affected platform is the EC2 c4.8xlarge VM instance type; there
may be others.
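One way such a bug can arise is by setting the state count to a fixed value
instead of clamping it. The sketch below shows the clamping form. It is not
the acpi_cpu driver code: clamp_cx_count() and its parameters are
hypothetical, and it assumes states are numbered so that C1 and C2 are the
first two entries.

```c
/*
 * Hypothetical helper: limit the number of usable C-states when C3 and
 * deeper states must be avoided.  cx_count is the number of states that
 * were actually parsed.
 */
static int
clamp_cx_count(int cx_count, int no_c3_quirk)
{
	/*
	 * Only ever lower the count.  Assigning 2 unconditionally would
	 * "enable" a C2 entry that was never parsed when cx_count is 1.
	 */
	if (no_c3_quirk && cx_count > 2)
		return (2);
	return (cx_count);
}
```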
ngie [Sun, 25 Jan 2015 00:28:15 +0000 (00:28 +0000)]
MFC r277278:
r277278 (by ngie):
Fix lib/libthr/tests/detach_test
- Eliminate race with liberal use of sleep(3) [1]
- Fix the NetBSD-specific way of testing the result of pthread_cancel
by testing with `td` instead of `NULL` [2]
delphij [Fri, 23 Jan 2015 18:39:26 +0000 (18:39 +0000)]
MFC r275922: MFV r275914:
As of r270383, the dbuf_compare comparator compares the dbuf
attributes in the following order:
db_level (indirect level)
db_blkid (block number)
db_state (current state)
the address of the element
Because db_state is considered before the element's address,
changing db_state would affect balancedness of the AVL tree,
even when the addresses of the elements compare differently. For
instance, in dbuf_create, db_state may be altered after the
node is inserted into the AVL tree and may break AVL tree
balancedness.
Instead of using db_state as a comparison criterion (introduced
in r270383), consider it only when we are doing a lookup, that
is, when one of the two dbuf pointers contains DB_SEARCH.
Illumos issue:
5422 preserve AVL invariants in dn_dbufs
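The comparator shape described in this entry can be sketched as follows.
This is a simplified illustration, not the ZFS source: struct dbuf_sketch,
its field names, the enum values other than DB_SEARCH, and
dbuf_compare_sketch() are all hypothetical.

```c
#include <stdint.h>

enum dbuf_state_sketch { DB_UNCACHED, DB_CACHED, DB_SEARCH };

/* Hypothetical, simplified stand-in for a dbuf. */
struct dbuf_sketch {
	uint8_t  level;		/* indirect level */
	uint64_t blkid;		/* block number */
	enum dbuf_state_sketch state;
};

static int
dbuf_compare_sketch(const struct dbuf_sketch *d1,
    const struct dbuf_sketch *d2)
{
	if (d1->level != d2->level)
		return (d1->level < d2->level ? -1 : 1);
	if (d1->blkid != d2->blkid)
		return (d1->blkid < d2->blkid ? -1 : 1);

	/*
	 * The state matters only for a lookup key: a DB_SEARCH key sorts
	 * before every real node with the same level and blkid.  The state
	 * of a node that is already in the tree never affects the result.
	 */
	if (d1->state == DB_SEARCH)
		return (-1);
	if (d2->state == DB_SEARCH)
		return (1);

	/* Real nodes with equal keys are ordered by address. */
	if ((uintptr_t)d1 != (uintptr_t)d2)
		return ((uintptr_t)d1 < (uintptr_t)d2 ? -1 : 1);
	return (0);
}
```

Because the result for two real nodes depends only on fields that do not
change after insertion, altering the state of an inserted node cannot
invalidate its position in the tree.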
delphij [Fri, 23 Jan 2015 18:36:21 +0000 (18:36 +0000)]
MFC r275812: MFV r275784:
Plug a memory leak in libzfs. In zfs_iter_bookmarks, an nvlist is allocated
before calling lzc_get_bookmarks, which allocates the nvlist again (and
overwrites the pointer to previously allocated list).
Illumos issue:
5427 memory leak in libzfs when doing rollback
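The leak pattern is easy to reproduce in miniature. The sketch below does
not use libzfs or nvlist calls: get_items(), leaky_caller(), fixed_caller()
and the allocation counter are hypothetical and exist only to make the
overwritten pointer visible.

```c
#include <stdlib.h>

static int live_allocs;		/* outstanding allocations, for the demo */

static void *
counted_alloc(size_t n)
{
	void *p = malloc(n);

	if (p != NULL)
		live_allocs++;
	return (p);
}

static void
counted_free(void *p)
{
	if (p != NULL) {
		live_allocs--;
		free(p);
	}
}

/* Hypothetical callee that allocates its own result. */
static int
get_items(int **out)
{
	*out = counted_alloc(4 * sizeof(int));
	return (*out != NULL ? 0 : -1);
}

/* Leaks: the first allocation is overwritten by the callee. */
static void
leaky_caller(void)
{
	int *items = counted_alloc(4 * sizeof(int));

	if (get_items(&items) == 0)
		counted_free(items);
}

/* Fixed: let the callee be the only allocator. */
static void
fixed_caller(void)
{
	int *items = NULL;

	if (get_items(&items) == 0)
		counted_free(items);
}
```

After leaky_caller() returns, one allocation is still outstanding; after
fixed_caller() returns, none are.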
delphij [Fri, 23 Jan 2015 18:33:50 +0000 (18:33 +0000)]
MFC r275811: MFV r275783:
Convert ARC flags to use enum. Previously, public flags were defined in
arc.h and private flags were defined in arc.c, which can lead to confusion
and programming errors.
Consistently use 'hdr' (when referencing arc_buf_hdr_t) instead of 'buf'
or 'ab', because arc_buf_t variables are often named 'buf' as well.
Illumos issue:
5369 arc flags should be an enum
5370 consistent arc_buf_hdr_t naming scheme
delphij [Fri, 23 Jan 2015 18:30:32 +0000 (18:30 +0000)]
MFC r275782: MFV r275551:
Remove "dbuf phys" db->db_data pointer aliases.
Use function accessors that cast db->db_data to the appropriate
"phys" type, removing the need for clients of the dmu buf user
API to keep properly typed pointer aliases to db->db_data in order
to conveniently access their data.
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zap_leaf.c:
In zap_leaf() and zap_leaf_byteswap, now that the pointer alias
field l_phys has been removed, use the db_data field in an on
stack dmu_buf_t to point to the leaf's phys data.
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dbuf.c:
Remove the db_user_data_ptr_ptr field from dbuf and all logic
to maintain it.
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dbuf.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dnode.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dbuf.h:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dmu.h:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_dataset.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_dir.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sa.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zap.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zap_micro.c:
Modify the DMU buf user API to remove the ability to specify
a db_data aliasing pointer (db_user_data_ptr_ptr).
cddl/contrib/opensolaris/cmd/zdb/zdb.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_diff.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_objset.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_send.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_traverse.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_tx.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_bookmark.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_dataset.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_deadlist.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_deleg.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_destroy.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_dir.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_pool.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_prop.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_scan.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_synctask.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_userhold.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sa.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa_history.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zap.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zap_leaf.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zap_micro.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dsl_dataset.h:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dsl_dir.h:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zap_impl.h:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zap_leaf.h:
Create and use the new "phys data" accessor functions
dsl_dir_phys(), dsl_dataset_phys(), zap_m_phys(),
zap_f_phys(), and zap_leaf_phys().
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dsl_dataset.h:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dsl_dir.h:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zap_impl.h:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zap_leaf.h:
Remove now unused "phys pointer" aliases to db->db_data
from clients of the DMU buf user API.
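The accessor approach can be illustrated with simplified types. The sketch
below is not the ZFS code: struct buf_sketch, struct dir_phys_sketch,
struct dir_sketch and dir_phys_sketch() are hypothetical stand-ins for the
dmu_buf_t, "phys" structure, client structure and accessor respectively.

```c
#include <stdint.h>

/* Hypothetical stand-in for a DMU buffer: owns the untyped data pointer. */
struct buf_sketch {
	void *db_data;
};

/* Hypothetical on-disk ("phys") layout of a client object. */
struct dir_phys_sketch {
	uint64_t used_bytes;
};

/* Hypothetical client: keeps the buffer, but no typed alias to its data. */
struct dir_sketch {
	struct buf_sketch *dbuf;
};

/* Accessor: derive the typed pointer from db_data on every use. */
static inline struct dir_phys_sketch *
dir_phys_sketch(struct dir_sketch *dd)
{
	return (dd->dbuf->db_data);
}
```

Since the typed pointer is derived on demand, there is no second copy of
the pointer to keep in sync when db_data changes.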
delphij [Fri, 23 Jan 2015 18:16:36 +0000 (18:16 +0000)]
MFC r275740: MFV r275548:
Verify that the block pointer is structurally valid, before attempting to
read it in. It can only be invalid in the case of a ZFS bug, but this
change will help identify such bugs in a more transparent way, by
panic'ing with a relevant message, rather than indexing off the end of an
array or something.
Illumos issue:
5349 verify that block pointer is plausible before reading
delphij [Fri, 23 Jan 2015 17:41:34 +0000 (17:41 +0000)]
MFC r275738: MFV r275546:
Reduce scrub activity when there is enough dirty data in the system, namely
when dirty data is more than zfs_vdev_async_write_active_min_dirty_percent
(once we start to increase the number of concurrent async writes).
While there, also correct a rounding error which would make scrub end up
pausing for (zfs_txg_timeout + 1) seconds instead of the desired
zfs_txg_timeout seconds.
Illumos issue:
5351 scrub goes for an extra second each txg
5352 scrub should pause when there is some dirty data
delphij [Fri, 23 Jan 2015 17:31:41 +0000 (17:31 +0000)]
MFC r275737: MFV r275545:
If zio_checksum_error() returns an error other than ECKSUM (e.g. EINVAL), it
does not fill in the "zio_bad_cksum_t *info" parameter. The caller should not
attempt to use it in this case.
Illumos issue:
5348 zio_checksum_error() only fills in info if ECKSUM
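The calling convention can be sketched like this. None of the names below
come from ZFS: SKETCH_ECKSUM, struct bad_cksum_sketch, checksum_verify() and
checksum_report() are hypothetical, and a byte sum stands in for a real
checksum.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define	SKETCH_ECKSUM	122	/* hypothetical stand-in for ECKSUM */

struct bad_cksum_sketch {
	uint64_t expected;
	uint64_t actual;
};

/* Fills in *info only when returning SKETCH_ECKSUM. */
static int
checksum_verify(const uint8_t *buf, size_t len, uint64_t expected,
    struct bad_cksum_sketch *info)
{
	uint64_t sum = 0;
	size_t i;

	if (buf == NULL || len == 0)
		return (EINVAL);	/* info is left untouched */
	for (i = 0; i < len; i++)
		sum += buf[i];
	if (sum != expected) {
		info->expected = expected;
		info->actual = sum;
		return (SKETCH_ECKSUM);
	}
	return (0);
}

/* Caller side: read info only for a genuine checksum mismatch. */
static int
checksum_report(int error, const struct bad_cksum_sketch *info,
    uint64_t *actual_out)
{
	if (error != SKETCH_ECKSUM)
		return (0);	/* info may be uninitialized; do not read it */
	*actual_out = info->actual;
	return (1);
}
```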
delphij [Fri, 23 Jan 2015 17:16:26 +0000 (17:16 +0000)]
MFC r275734: MFV r275542:
If a dnode has a spill block and there is an error while accessing
a data block, then traverse_dnode() loses information about that error
and returns the status of visiting the spill block instead.
This issue was discovered by Spectra Logic.
Illumos issue:
5311 traverse_dnode may report success when it should not
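A generic way to avoid losing the earlier error is to visit the spill block
only when nothing has failed so far. The sketch below is not
traverse_dnode(): traverse_sketch() and its parameters are hypothetical, and
precomputed status codes stand in for the real block visits.

```c
#include <stddef.h>

/*
 * Hypothetical traversal: blk_status[i] is the result of visiting data
 * block i, spill_status the result of visiting the spill block.
 */
static int
traverse_sketch(const int *blk_status, size_t nblk, int has_spill,
    int spill_status)
{
	int err = 0;
	size_t i;

	for (i = 0; i < nblk; i++) {
		err = blk_status[i];
		if (err != 0)
			break;
	}
	/*
	 * Assigning spill_status unconditionally here would overwrite a
	 * data block error with the spill block's (possibly zero) status.
	 */
	if (err == 0 && has_spill)
		err = spill_status;
	return (err);
}
```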
emaste [Fri, 23 Jan 2015 02:39:00 +0000 (02:39 +0000)]
crunchide: Correct 64-bit section header offset
For 64-bit binaries the Elf_Ehdr e_shoff is at offset 40, not 44.
Instead of using an incorrect hardcoded offset, let the compiler
figure it out for us with offsetof().
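The offsets can be checked with offsetof() on structures that mirror the
leading fields of the ELF headers. The two structures below are hypothetical
stand-ins written for this example (field names follow the ELF
specification, trailing fields are omitted); real code would use the
system's Elf32_Ehdr and Elf64_Ehdr instead.

```c
#include <stddef.h>
#include <stdint.h>

/* Leading fields of the 32-bit ELF header (sketch, remainder omitted). */
struct sketch_elf32_ehdr {
	unsigned char e_ident[16];
	uint16_t e_type;
	uint16_t e_machine;
	uint32_t e_version;
	uint32_t e_entry;
	uint32_t e_phoff;
	uint32_t e_shoff;	/* section header table file offset */
};

/* Leading fields of the 64-bit ELF header (sketch, remainder omitted). */
struct sketch_elf64_ehdr {
	unsigned char e_ident[16];
	uint16_t e_type;
	uint16_t e_machine;
	uint32_t e_version;
	uint64_t e_entry;
	uint64_t e_phoff;
	uint64_t e_shoff;	/* section header table file offset */
};

/* Let the compiler compute the offsets instead of hardcoding them. */
static const size_t shoff32 = offsetof(struct sketch_elf32_ehdr, e_shoff);
static const size_t shoff64 = offsetof(struct sketch_elf64_ehdr, e_shoff);
```

Here shoff32 evaluates to 32 and shoff64 to 40, matching the offset stated
for 64-bit binaries.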
delphij [Fri, 23 Jan 2015 00:54:56 +0000 (00:54 +0000)]
MFC r275595:
Use calloc() instead of malloc() + bzero(). This also gets rid of a warning
because bzero is defined by strings.h which is not included in thread_pool.c.
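The replacement pattern is shown below on a made-up structure. struct
worker_sketch and worker_array_create() are hypothetical and are not taken
from thread_pool.c.

```c
#include <stdlib.h>

/* Hypothetical per-worker record. */
struct worker_sketch {
	int   id;
	void *arg;
};

/*
 * One call instead of malloc() followed by bzero(): calloc() returns
 * zero-filled memory and needs only <stdlib.h>, so no <strings.h>
 * include is required.
 */
static struct worker_sketch *
worker_array_create(size_t n)
{
	return (calloc(n, sizeof(struct worker_sketch)));
}
```

A caller checks the result for NULL as with malloc() and releases it with
free().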
delphij [Fri, 23 Jan 2015 00:44:14 +0000 (00:44 +0000)]
MFC r275594: MFV r275540:
When importing a pool, don't assume that the pool configuration passed
to vdev_load is always valid. A stale configuration may come with extra
vdevs, in which case metaslab_init() would fail because a lower layer
returns an error.
Change the code so that metaslab_init() handles errors from the lower
layer and passes them back to the upper layer, where they are handled.
Illumos issue:
5213 panic in metaslab_init due to space_map_open returning ENXIO
brooks [Thu, 22 Jan 2015 21:17:58 +0000 (21:17 +0000)]
MFC r274816:
Add FPU support for MIPS setjmp(3)/longjmp(3).
This change saves/restores the callee-saved MIPS floating point
registers as documented by the o32/n32/n64 spec ("MIPSpro N32
ABI Handbook", Table 2-1) for the _setjmp(3), _longjmp(3),
setjmp(3) and longjmp(3) C library functions. This is only
included when the C library is built with hardware floating point
support (or when "SOFTFLOAT" is not defined).
ngie [Tue, 20 Jan 2015 23:39:08 +0000 (23:39 +0000)]
MFC r275907:
r275907 (by ngie):
Fix building/installing tests when TESTSBASE != /usr/tests
The work in r258233 hardcoded the assumption that tests was the last component
of the tests tree by pushing tests as an explicit prefix for the paths in
BSD.tests.dist and /usr was the prefix for all tests, per BSD.usr.dist and all
of the mtree calls used in Makefile.inc1. This assumption breaks if/when one
provides a custom TESTSBASE "prefix", e.g. TESTSBASE=/mytests .
One thing that r258233 did properly, though, was remove "/usr/tests" creation
from BSD.usr.dist -- that should not have been there in the first place. That
was an "oops" on my part for the work that was originally committed in r241823.
ngie [Tue, 20 Jan 2015 21:42:40 +0000 (21:42 +0000)]
MFC r274075,r274581,r274582,r274595:
r274075 (by ngie):
Add reachover Makefiles for contrib/netbsd-tests/lib/libc; this adds approximately
500 new testcases
Various TODOs have been sprinkled around the Makefiles for items that still
need to be ported (missing features), and for testcases that have issues with
building/linking or issues at runtime.
A variant of this code has been tested extensively on amd64 and i386
10-STABLE/11-CURRENT for several months without issue. It builds on other
architectures, but the code will remain off until I have proven it works on
virtual hardware or real hardware on other architectures.