From d4d88a71662901a6b6be4764c2155c1e3e6b6c37 Mon Sep 17 00:00:00 2001
From: mm
Date: Fri, 30 Nov 2012 22:38:42 +0000
Subject: [PATCH] Merge ZFS feature flags support and related bugfixes: 236884,
 237001, 237119, 237458, 237972, 238113, 238391, 238422, 238926, 238950,
 238951, 239389, 239394, 239620, 239774, 239953, 239958, 239967, 239968,
 240063, 240133, 240153, 240303, 240345, 240415, 240955, 241655, 243014,
 243505, 243506

MFC r236884:
Introduce "feature flags" for ZFS pools (bump SPA version to 5000).
Add first feature "com.delphix:async_destroy" (asynchronous destroy of ZFS
datasets).
Implement features support in ZFS boot code.

Illumos revisions merged:
13700:2889e2596bd6
13701:1949b688d5fb
2619 asynchronous destruction of ZFS file systems
2747 SPA versioning with zfs feature flags

References:
https://www.illumos.org/issues/2619
https://www.illumos.org/issues/2747

MFC r237001:
Fix ZFS boot with pre-features pools (version <= 28) broken in r236884

MFC r237119 [1]:
Do not remount ZFS dataset if changing canmount property to "on"
and dataset is already mounted.

MFC r237458:
Import Illumos revision 13736:9f1d48e1681f
2901 ZFS receive fails for exabyte sparse files

References:
https://www.illumos.org/issues/2901

MFC r237972:
Expose scrub and resilver tunables.
This allows the user to tune the priority trade-off between scrub/resilver
and other ZFS I/O.

MFC r238113 (pjd):
vdev_io_done stage is not used for ioctls.

MFC r238391:
Change behavior introduced in r237119 to vendor solution

References:
https://www.illumos.org/issues/2883

MFC r238422:
Merge illumos commit 13749:df4cd82e2b60
1796 "ZFS HOLD" should not be used when doing "ZFS SEND" from a read-only pool
2871 support for __ZFS_POOL_RESTRICT used by ZFS test suite
2903 zfs destroy -d does not work
2957 zfs destroy -R/r sometimes fails when removing defer-destroyed snapshot

References:
https://www.illumos.org/issues/1796
https://www.illumos.org/issues/2871
https://www.illumos.org/issues/2903
https://www.illumos.org/issues/2957

MFC r238926:
Partial MFV (illumos-gate 13753:2aba784c276b)
2762 zpool command should have better support for feature flags

References:
https://www.illumos.org/issues/2762

MFC r238950:
Fix reporting of root pool upgrade notice.

MFC r238951:
Fix wrong indent according to style(9)

MFC r239389:
Backport fix for vendor issue #3085
3085 zfs diff panics, then panics in a loop on booting

References:
https://www.illumos.org/issues/3085

MFC r239394:
Update zfs(8) manpage with illumos version of "zfs diff"

Illumos issue:
2399 zfs manual page does not document use of "zfs diff"

References:
https://www.illumos.org/issues/2399

MFC r239620 [2]:
Merge recent vendor changes:
3086 unnecessarily setting DS_FLAG_INCONSISTENT on async destroyed datasets
3090 vdev_reopen() during reguid causes vdev to be treated as corrupt
3102 vdev_uberblock_load() and vdev_validate() may read the wrong label

References:
https://www.illumos.org/issues/3086
https://www.illumos.org/issues/3090
https://www.illumos.org/issues/3102

MFC r239774:
Merge recent vendor changes:
3100 zvol rename fails with EBUSY when dirty
3104 eliminate empty bpobjs
3120 zinject hangs in zfsdev_ioctl() due to uninitialized zc

References:
https://www.illumos.org/issues/3100
https://www.illumos.org/issues/3104
https://www.illumos.org/issues/3120

MFC r239953 (joel):
Mdoc fixes.

MFC r239958 (joel):
Minor mdoc fixes.

MFC r239967 (joel):
Mdoc fixes.

MFC r239968 (joel):
Remove trailing whitespace.

MFC r240063 (gjb):
Add myself to copyright sections, per CDDL license.

MFC r240133:
Merge recent vendor changes and sync code:
1862 incremental zfs receive fails for sparse file > 8PB
3112 ztest does not honor ZFS_DEBUG
3122 zfs destroy filesystem should prefetch blocks
3129 'zpool reopen' restarts resilvers
3130 ztest failure: Assertion failed: 0 == dmu_objset_destroy(name, B_FALSE) (0x0 == 0x10)

References:
https://www.illumos.org/issues/1862
https://www.illumos.org/issues/3112
https://www.illumos.org/issues/3122
https://www.illumos.org/issues/3129
https://www.illumos.org/issues/3130

MFC r240153 (gjb) [3]:
Typo fix and minor word swap.

MFC r240303:
Add assfail() and assfail3() to the opensolaris module.
Remove obsoleted intermediate cddl/compat/opensolaris/sys/debug.h.

MFC r240345 (avg):
zfs: fix sa_modify_attrs handling of variable-sized attributes
- skip length_idx index for a replaced variable-sized attribute
- skip length_idx index for a removed variable-sized attribute
- also re-arranged code to make sure that length_idx is always incremented
  for variable-sized attributes
- additionally add an assertion that the number of actually produced
  attributes is the same as the expected number of resulting attributes

MFC r240415:
Merge recent zfs vendor changes, sync code and adjust userland DEBUG.

Illumos issues covered:
1884 Empty "used" field for zfs *space commands
3006 VERIFY[S,U,P] and ASSERT[S,U,P] frequently check if first argument is zero
3028 zfs {group,user}space -n prints (null) instead of numeric GID/UID
3048 zfs {user,group}space [-s|-S] is broken
3049 zfs {user,group}space -t doesn't really filter the results
3060 zfs {user,group}space -H output isn't tab-delimited
3061 zfs {user,group}space -o doesn't use specified fields order
3064 usr/src/cmd/zpool/zpool_main.c misspells "successful"
3093 zfs {user,group}space's -i is noop
3098 zfs userspace/groupspace fail without saying why when run as non-root

References:
https://www.illumos.org/issues/ + [issue_id]

MFC r240955 (partial):
Merge recent vendor changes in ZFS.

Illumos issues covered:
3139 zdb dies when it tries to determine path of unlinked file
3189 kernel panic in ZFS test suite during hotspare_onoffline_004_neg
3208 moving zpool cross-endian results in incorrect user/group accounting

References:
https://www.illumos.org/issues/ + [issue_id]

MFC r241655:
Add missing initialization for do_prefix.
Corrects porting error in r238391

Vendor issue and changeset reference:
2883 changing "canmount" property to "on" should not always remount dataset
https://www.illumos.org/issues/2883
Changeset 13743:95aba6e49b9f

MFC r243014:
Move zpool-features manual page from section 5 to section 7 and fix references

Reported by: pluknet

MFC r243505:
Illumos 13886:e3261d03efbf
3349 zpool upgrade -V bumps the on disk version number, but leaves the in core version

References:
https://www.illumos.org/issues/3349

MFC r243506:
zfs sha256 checksum is missing in zfs.8 manpage

PR: kern/167905 [1], kern/170912 [2], kern/170914 [2], doc/171356 [3]

git-svn-id: svn://svn.freebsd.org/base/stable/8@243717 ccf9f872-aa2e-dd11-9fc8-001c23d0bc1f
---
 Makefile.inc1 | 3 +-
 UPDATING | 11 +
 cddl/contrib/opensolaris/cmd/zdb/zdb.8 | 10 +-
 cddl/contrib/opensolaris/cmd/zdb/zdb.c | 72 +-
 cddl/contrib/opensolaris/cmd/zfs/zfs.8 | 157 ++--
 cddl/contrib/opensolaris/cmd/zfs/zfs_main.c | 759 +++++++++---------
 cddl/contrib/opensolaris/cmd/zhack/zhack.c | 533 ++++++++++++
 .../contrib/opensolaris/cmd/zinject/zinject.c | 12 +-
 .../opensolaris/cmd/zpool/zpool-features.7 | 206 +++++
 cddl/contrib/opensolaris/cmd/zpool/zpool.8 | 130 ++-
 .../opensolaris/cmd/zpool/zpool_main.c | 621 +++++++++++---
 cddl/contrib/opensolaris/cmd/ztest/ztest.c | 231 +++++-
 .../opensolaris/lib/libnvpair/libnvpair.c | 5 +
 .../opensolaris/lib/libuutil/common/uu_misc.c | 2 +
 .../opensolaris/lib/libzfs/common/libzfs.h | 25 +-
 .../lib/libzfs/common/libzfs_config.c | 83 ++
 .../lib/libzfs/common/libzfs_dataset.c | 56 +-
 .../lib/libzfs/common/libzfs_import.c | 85 +-
 .../lib/libzfs/common/libzfs_pool.c | 223 ++++-
 .../lib/libzfs/common/libzfs_sendrecv.c | 21 +-
 .../lib/libzfs/common/libzfs_status.c | 43 +-
 .../lib/libzfs/common/libzfs_util.c | 19 +-
 .../opensolaris/lib/libzpool/common/kernel.c | 4 +-
 .../lib/libzpool/common/sys/zfs_context.h | 57 +-
 cddl/lib/libnvpair/Makefile | 7 +-
 cddl/lib/libzfs/Makefile | 5 +-
 cddl/lib/libzpool/Makefile | 3 +
 cddl/sbin/zpool/Makefile | 2 +-
 cddl/usr.bin/ztest/Makefile | 10 +-
 cddl/usr.sbin/Makefile | 4 +-
 cddl/usr.sbin/zdb/Makefile | 3 +
 cddl/usr.sbin/zhack/Makefile | 32 +
 rescue/rescue/Makefile | 2 +-
 sys/boot/zfs/zfsimpl.c | 68 +-
 sys/cddl/boot/zfs/zfsimpl.h | 14 +-
 .../opensolaris/kern/opensolaris_cmn_err.c | 22 +-
 sys/cddl/compat/opensolaris/sys/assfail.h | 82 ++
 sys/cddl/compat/opensolaris/sys/debug.h | 12 +-
 .../opensolaris/common/nvpair/fnvpair.c | 498 ++++++++++++
 .../opensolaris/common/zfs/zfeature_common.c | 158 ++++
 .../opensolaris/common/zfs/zfeature_common.h | 71 ++
 .../opensolaris/common/zfs/zpool_prop.c | 22 +
 .../opensolaris/uts/common/Makefile.files | 10 +-
 .../opensolaris/uts/common/fs/zfs/arc.c | 104 ++-
 .../opensolaris/uts/common/fs/zfs/bpobj.c | 58 +-
 .../opensolaris/uts/common/fs/zfs/bptree.c | 225 ++++++
 .../opensolaris/uts/common/fs/zfs/dbuf.c | 9 +-
 .../opensolaris/uts/common/fs/zfs/ddt.c | 9 +-
 .../opensolaris/uts/common/fs/zfs/dmu.c | 130 +--
 .../uts/common/fs/zfs/dmu_objset.c | 9 -
 .../opensolaris/uts/common/fs/zfs/dmu_send.c | 35 +-
 .../uts/common/fs/zfs/dmu_traverse.c | 204 ++++-
 .../opensolaris/uts/common/fs/zfs/dmu_tx.c | 49 +-
 .../opensolaris/uts/common/fs/zfs/dnode.c | 95 +--
 .../uts/common/fs/zfs/dnode_sync.c | 12 +-
 .../uts/common/fs/zfs/dsl_dataset.c | 311 ++++---
 .../uts/common/fs/zfs/dsl_deadlist.c | 54 +-
 .../opensolaris/uts/common/fs/zfs/dsl_deleg.c | 8 +-
 .../opensolaris/uts/common/fs/zfs/dsl_dir.c | 42 +-
 .../opensolaris/uts/common/fs/zfs/dsl_pool.c | 182 +++--
 .../opensolaris/uts/common/fs/zfs/dsl_scan.c | 186 +++--
 .../uts/common/fs/zfs/dsl_synctask.c | 9 +-
 .../opensolaris/uts/common/fs/zfs/metaslab.c | 2 +-
 .../opensolaris/uts/common/fs/zfs/sa.c | 16 +-
 .../opensolaris/uts/common/fs/zfs/spa.c | 564 +++++++++++--
 .../uts/common/fs/zfs/spa_config.c | 9 +-
 .../uts/common/fs/zfs/spa_history.c | 6 +-
 .../opensolaris/uts/common/fs/zfs/spa_misc.c | 87 +-
 .../opensolaris/uts/common/fs/zfs/space_map.c | 9 +-
 .../opensolaris/uts/common/fs/zfs/sys/arc.h | 8 +
 .../opensolaris/uts/common/fs/zfs/sys/bpobj.h | 3 +
 .../uts/common/fs/zfs/sys/bptree.h | 64 ++
 .../opensolaris/uts/common/fs/zfs/sys/dmu.h | 97 ++-
 .../uts/common/fs/zfs/sys/dmu_objset.h | 1 -
 .../uts/common/fs/zfs/sys/dmu_traverse.h | 4 +
 .../opensolaris/uts/common/fs/zfs/sys/dnode.h | 2 +-
 .../uts/common/fs/zfs/sys/dsl_dataset.h | 10 +-
 .../uts/common/fs/zfs/sys/dsl_pool.h | 17 +-
 .../uts/common/fs/zfs/sys/dsl_scan.h | 4 +
 .../uts/common/fs/zfs/sys/sa_impl.h | 3 +-
 .../opensolaris/uts/common/fs/zfs/sys/spa.h | 17 +-
 .../uts/common/fs/zfs/sys/spa_impl.h | 10 +-
 .../opensolaris/uts/common/fs/zfs/sys/txg.h | 5 +-
 .../opensolaris/uts/common/fs/zfs/sys/vdev.h | 5 +-
 .../uts/common/fs/zfs/sys/vdev_impl.h | 3 +-
 .../opensolaris/uts/common/fs/zfs/sys/zap.h | 11 +-
 .../uts/common/fs/zfs/sys/zfeature.h | 52 ++
 .../uts/common/fs/zfs/sys/zfs_debug.h | 7 +
 .../opensolaris/uts/common/fs/zfs/sys/zil.h | 2 +
 .../uts/common/fs/zfs/sys/zil_impl.h | 2 +
 .../opensolaris/uts/common/fs/zfs/sys/zio.h | 14 +
 .../uts/common/fs/zfs/sys/zio_impl.h | 2 +-
 .../opensolaris/uts/common/fs/zfs/txg.c | 3 +-
 .../opensolaris/uts/common/fs/zfs/vdev.c | 23 +-
 .../uts/common/fs/zfs/vdev_label.c | 136 +++-
 .../uts/common/fs/zfs/vdev_raidz.c | 6 +-
 .../opensolaris/uts/common/fs/zfs/zap.c | 34 +-
 .../opensolaris/uts/common/fs/zfs/zap_micro.c | 8 +-
 .../opensolaris/uts/common/fs/zfs/zfeature.c | 424 ++++++++++
 .../opensolaris/uts/common/fs/zfs/zfs_debug.c | 5 +-
 .../opensolaris/uts/common/fs/zfs/zfs_ioctl.c | 31 +-
 .../opensolaris/uts/common/fs/zfs/zfs_rlock.c | 9 +-
 .../uts/common/fs/zfs/zfs_vfsops.c | 44 +-
 .../opensolaris/uts/common/fs/zfs/zfs_vnops.c | 11 +-
 .../opensolaris/uts/common/fs/zfs/zfs_znode.c | 43 +-
 .../opensolaris/uts/common/fs/zfs/zil.c | 57 +-
 .../opensolaris/uts/common/fs/zfs/zio.c | 64 +-
 .../opensolaris/uts/common/fs/zfs/zvol.c | 10 +
 .../opensolaris/uts/common/sys/debug.h | 8 +
 .../opensolaris/uts/common/sys/fs/zfs.h | 21 +-
 .../opensolaris/uts/common/sys/nvpair.h | 68 ++
 sys/modules/zfs/Makefile | 1 +
 112 files changed, 6560 insertions(+), 1606 deletions(-)
 create mode 100644 cddl/contrib/opensolaris/cmd/zhack/zhack.c
 create mode 100644 cddl/contrib/opensolaris/cmd/zpool/zpool-features.7
 create mode 100644 cddl/usr.sbin/zhack/Makefile
 create mode 100644 sys/cddl/compat/opensolaris/sys/assfail.h
 create mode 100644 sys/cddl/contrib/opensolaris/common/nvpair/fnvpair.c
 create mode 100644 sys/cddl/contrib/opensolaris/common/zfs/zfeature_common.c
 create mode 100644 sys/cddl/contrib/opensolaris/common/zfs/zfeature_common.h
 create mode 100644 sys/cddl/contrib/opensolaris/uts/common/fs/zfs/bptree.c
 create mode 100644 sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/bptree.h
 create mode 100644 sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfeature.h
 create mode 100644 sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfeature.c

diff --git a/Makefile.inc1 b/Makefile.inc1
index 81f568598..5484af929 100644
--- a/Makefile.inc1
+++ b/Makefile.inc1
@@ -1120,7 +1120,7 @@ _prebuild_libs= ${_kerberos5_lib_libasn1} ${_kerberos5_lib_libheimntlm} \
 lib/ncurses/ncurses lib/ncurses/ncursesw \
 lib/libopie lib/libpam ${_lib_libthr} \
 lib/libradius lib/libsbuf lib/libtacplus \
- ${_cddl_lib_libumem} \
+ ${_cddl_lib_libumem} ${_cddl_lib_libnvpair} \
 lib/libutil ${_lib_libypclnt} lib/libz lib/msun \
 ${_secure_lib_libcrypto} ${_secure_lib_libssh} \
 ${_secure_lib_libssl}
@@ -1135,6 +1135,7 @@ lib/libopie__L lib/libtacplus__L: lib/libmd__L
 .if ${MK_CDDL} != "no"
 _cddl_lib_libumem= cddl/lib/libumem
+_cddl_lib_libnvpair= cddl/lib/libnvpair
 _cddl_lib= cddl/lib
 .endif
diff --git a/UPDATING b/UPDATING
index 9097e272f..3b74427ae 100644
--- a/UPDATING
+++ b/UPDATING
@@ -15,6 +15,17 @@ NOTE TO PEOPLE WHO THINK THAT FreeBSD 8.x IS SLOW ON IA64 OR SUN4V:
 debugging tools present in HEAD were left in place because sun4v support still needs work to become production ready.
+20121130: + A new version of ZFS (pool version 5000) has been merged to 8-STABLE. + Starting with this version the old system of ZFS pool versioning + is superseded by "feature flags". This concept enables forward + compatibility against certain future changes in functionality of ZFS + pools. The first two read-only compatible "feature flags" for ZFS + pools are "com.delphix:async_destroy" and "com.delphix:empty_bpobj". + For more information read the new zpool-features(7) manual page. + Please refer to the "ZFS notes" section of this file for information + on upgrading boot ZFS pools. + 20121018: WITH_CTF can now be specified in src.conf (not recommended, there are some problems with static executables), make.conf (would also diff --git a/cddl/contrib/opensolaris/cmd/zdb/zdb.8 b/cddl/contrib/opensolaris/cmd/zdb/zdb.8 index 953801248..e036b964f 100644 --- a/cddl/contrib/opensolaris/cmd/zdb/zdb.8 +++ b/cddl/contrib/opensolaris/cmd/zdb/zdb.8 @@ -93,14 +93,14 @@ If specified multiple times, verify the checksums of all blocks. .It Fl C Display information about the configuration. If specified with no other options, instead display information about the cache file -.Ns ( Pa /etc/zfs/zpool.cache Ns ). +.Po Pa /etc/zfs/zpool.cache Pc . To specify the cache file to display, see .Fl U .Pp If specified multiple times, and a pool name is also specified display both the cached configuration and the on-disk configuration. If specified multiple times with -.FL e +.Fl e also display the configuration that would be used were the pool to be imported. .It Fl d @@ -135,7 +135,7 @@ option is also specified, also display the uberblocks on this device. .It Fl L Disable leak tracing and the loading of space maps. By default, -.Nm +.Nm verifies that all non-free blocks are referenced, which can be very expensive. .It Fl m Display the offset, spacemap, and free space of each metaslab. 
@@ -253,7 +253,7 @@ MOS Configuration: .Li # Ic zdb -d rpool Dataset mos [META], ID 0, cr_txg 4, 26.9M, 1051 objects Dataset rpool/swap [ZVOL], ID 59, cr_txg 356, 486M, 2 objects -... + ... .Ed .It Xo Sy Example 3 Display basic information about object 0 in .Sy 'rpool/export/home' @@ -272,7 +272,7 @@ Dataset rpool/export/home [ZPL], ID 137, cr_txg 1546, 32K, 8 objects .Li # Ic zdb -S rpool Simulated DDT histogram: -bucket allocated referenced +bucket allocated referenced ______ ______________________________ ______________________________ refcnt blocks LSIZE PSIZE DSIZE blocks LSIZE PSIZE DSIZE ------ ------ ----- ----- ----- ------ ----- ----- ----- diff --git a/cddl/contrib/opensolaris/cmd/zdb/zdb.c b/cddl/contrib/opensolaris/cmd/zdb/zdb.c index ee169cede..2f8aa0e04 100644 --- a/cddl/contrib/opensolaris/cmd/zdb/zdb.c +++ b/cddl/contrib/opensolaris/cmd/zdb/zdb.c @@ -18,8 +18,10 @@ * * CDDL HEADER END */ + /* * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved. + * Copyright (c) 2012 by Delphix. All rights reserved. */ #include @@ -54,6 +56,7 @@ #include #include #include +#include #undef ZFS_MAXNAMELEN #undef verify #include @@ -63,7 +66,8 @@ #define ZDB_CHECKSUM_NAME(idx) ((idx) < ZIO_CHECKSUM_FUNCTIONS ? \ zio_checksum_table[(idx)].ci_name : "UNKNOWN") #define ZDB_OT_NAME(idx) ((idx) < DMU_OT_NUMTYPES ? \ - dmu_ot[(idx)].ot_name : "UNKNOWN") + dmu_ot[(idx)].ot_name : DMU_OT_IS_VALID(idx) ? \ + dmu_ot_byteswap[DMU_OT_BYTESWAP(idx)].ob_name : "UNKNOWN") #define ZDB_OT_TYPE(idx) ((idx) < DMU_OT_NUMTYPES ? 
(idx) : DMU_OT_NUMTYPES) #ifndef lint @@ -1088,7 +1092,7 @@ dump_dsl_dataset(objset_t *os, uint64_t object, void *data, size_t size) ASSERT(size == sizeof (*ds)); crtime = ds->ds_creation_time; - zdb_nicenum(ds->ds_used_bytes, used); + zdb_nicenum(ds->ds_referenced_bytes, used); zdb_nicenum(ds->ds_compressed_bytes, compressed); zdb_nicenum(ds->ds_uncompressed_bytes, uncompressed); zdb_nicenum(ds->ds_unique_bytes, unique); @@ -1130,6 +1134,44 @@ dump_dsl_dataset(objset_t *os, uint64_t object, void *data, size_t size) (void) printf("\t\tbp = %s\n", blkbuf); } +/* ARGSUSED */ +static int +dump_bptree_cb(void *arg, const blkptr_t *bp, dmu_tx_t *tx) +{ + char blkbuf[BP_SPRINTF_LEN]; + + if (bp->blk_birth != 0) { + sprintf_blkptr(blkbuf, bp); + (void) printf("\t%s\n", blkbuf); + } + return (0); +} + +static void +dump_bptree(objset_t *os, uint64_t obj, char *name) +{ + char bytes[32]; + bptree_phys_t *bt; + dmu_buf_t *db; + + if (dump_opt['d'] < 3) + return; + + VERIFY3U(0, ==, dmu_bonus_hold(os, obj, FTAG, &db)); + bt = db->db_data; + zdb_nicenum(bt->bt_bytes, bytes); + (void) printf("\n %s: %llu datasets, %s\n", + name, (unsigned long long)(bt->bt_end - bt->bt_begin), bytes); + dmu_buf_rele(db, FTAG); + + if (dump_opt['d'] < 5) + return; + + (void) printf("\n"); + + (void) bptree_iterate(os, obj, B_FALSE, dump_bptree_cb, NULL, NULL); +} + /* ARGSUSED */ static int dump_bpobj_cb(void *arg, const blkptr_t *bp, dmu_tx_t *tx) @@ -1883,11 +1925,13 @@ typedef struct zdb_blkstats { */ #define ZDB_OT_DEFERRED (DMU_OT_NUMTYPES + 0) #define ZDB_OT_DITTO (DMU_OT_NUMTYPES + 1) -#define ZDB_OT_TOTAL (DMU_OT_NUMTYPES + 2) +#define ZDB_OT_OTHER (DMU_OT_NUMTYPES + 2) +#define ZDB_OT_TOTAL (DMU_OT_NUMTYPES + 3) static char *zdb_ot_extname[] = { "deferred free", "dedup ditto", + "other", "Total", }; @@ -1968,9 +2012,10 @@ zdb_blkptr_cb(spa_t *spa, zilog_t *zilog, const blkptr_t *bp, arc_buf_t *pbuf, type = BP_GET_TYPE(bp); - zdb_count_block(zcb, zilog, bp, type); + zdb_count_block(zcb, 
zilog, bp, + (type & DMU_OT_NEWTYPE) ? ZDB_OT_OTHER : type); - is_metadata = (BP_GET_LEVEL(bp) != 0 || dmu_ot[type].ot_metadata); + is_metadata = (BP_GET_LEVEL(bp) != 0 || DMU_OT_IS_METADATA(type)); if (dump_opt['c'] > 1 || (dump_opt['c'] && is_metadata)) { int ioerr; @@ -2197,6 +2242,12 @@ dump_block_stats(spa_t *spa) (void) bpobj_iterate_nofree(&spa->spa_dsl_pool->dp_free_bpobj, count_block_cb, &zcb, NULL); } + if (spa_feature_is_active(spa, + &spa_feature_table[SPA_FEATURE_ASYNC_DESTROY])) { + VERIFY3U(0, ==, bptree_iterate(spa->spa_meta_objset, + spa->spa_dsl_pool->dp_bptree_obj, B_FALSE, count_block_cb, + &zcb, NULL)); + } if (dump_opt['c'] > 1) flags |= TRAVERSE_PREFETCH_DATA; @@ -2373,7 +2424,7 @@ zdb_ddt_add_cb(spa_t *spa, zilog_t *zilog, const blkptr_t *bp, } if (BP_IS_HOLE(bp) || BP_GET_CHECKSUM(bp) == ZIO_CHECKSUM_OFF || - BP_GET_LEVEL(bp) > 0 || dmu_ot[BP_GET_TYPE(bp)].ot_metadata) + BP_GET_LEVEL(bp) > 0 || DMU_OT_IS_METADATA(BP_GET_TYPE(bp))) return (0); ddt_key_fill(&zdde_search.zdde_key, bp); @@ -2478,7 +2529,14 @@ dump_zpool(spa_t *spa) dump_bpobj(&spa->spa_deferred_bpobj, "Deferred frees"); if (spa_version(spa) >= SPA_VERSION_DEADLISTS) { dump_bpobj(&spa->spa_dsl_pool->dp_free_bpobj, - "Pool frees"); + "Pool snapshot frees"); + } + + if (spa_feature_is_active(spa, + &spa_feature_table[SPA_FEATURE_ASYNC_DESTROY])) { + dump_bptree(spa->spa_meta_objset, + spa->spa_dsl_pool->dp_bptree_obj, + "Pool dataset frees"); } dump_dtl(spa->spa_root_vdev, 0); } diff --git a/cddl/contrib/opensolaris/cmd/zfs/zfs.8 b/cddl/contrib/opensolaris/cmd/zfs/zfs.8 index 9147c2b07..80ce9784d 100644 --- a/cddl/contrib/opensolaris/cmd/zfs/zfs.8 +++ b/cddl/contrib/opensolaris/cmd/zfs/zfs.8 @@ -22,7 +22,7 @@ .\" Copyright (c) 2012 Nexenta Systems, Inc. All Rights Reserved. .\" Copyright (c) 2012, Joyent, Inc. All rights reserved. 
.\" Copyright (c) 2011, Pawel Jakub Dawidek -.\" Copyright (c) 2012, Bryan Drewery +.\" Copyright (c) 2012, Glen Barber .\" .\" $FreeBSD$ .\" @@ -57,8 +57,8 @@ .Op Fl dnpRrv .Sm off .Ar snapshot -.Ns Op % Ns Ar snapname -.Ns Op , Ns Ar ... +.Op % Ns Ar snapname +.Op , Ns Ar ... .Sm on .Nm .Cm snapshot @@ -136,17 +136,21 @@ .Fl a | Ar filesystem .Nm .Cm userspace -.Op Fl niHp +.Op Fl Hinp .Op Fl o Ar field Ns Op , Ns Ar ... -.Op Fl sS Ar field +.Op Fl s Ar field +.Ar ... +.Op Fl S Ar field .Ar ... .Op Fl t Ar type Ns Op , Ns Ar ... .Ar filesystem Ns | Ns Ar snapshot .Nm .Cm groupspace -.Op Fl niHp +.Op Fl Hinp .Op Fl o Ar field Ns Op , Ns Ar ... -.Op Fl sS Ar field +.Op Fl s Ar field +.Ar ... +.Op Fl S Ar field .Ar ... .Op Fl t Ar type Ns Op , Ns Ar ... .Ar filesystem Ns | Ns Ar snapshot @@ -619,7 +623,7 @@ privilege with .Qq Nm Cm allow , can access everyone's usage. .Pp -The +The .Sy userused@ Ns ... properties are not displayed by .Qq Nm Cm get all . @@ -819,7 +823,7 @@ command or unmounted by the command. .Pp This property is not inherited. -.It Sy checksum Ns = Ns Cm on | off | fletcher2 | fletcher4 +.It Sy checksum Ns = Ns Cm on | off | fletcher2 | fletcher4 | sha256 Controls the checksum used to verify data integrity. The default value is .Cm on , which automatically selects an appropriate algorithm (currently, @@ -1129,7 +1133,7 @@ will not use configured pool log devices. will instead optimize synchronous operations for global pool throughput and efficient use of resources. .It Sy snapdir Ns = Ns Cm hidden | visible -Controls whether the +Controls whether the .Pa \&.zfs directory is hidden or visible in the root of the file system as discussed in the @@ -1196,7 +1200,7 @@ are not reflected in the reservation. The .Sy vscan property is currently not supported on -.Fx . +.Fx . .It Sy xattr Ns = Ns Cm off | on The .Sy xattr @@ -1283,7 +1287,7 @@ properties. 
The correlation between properties and mount options is as follows: In addition, these options can be set on a per-mount basis using the .Fl o option, without affecting the property that is stored on disk. The values -specified on the command line override the values stored in the dataset. These +specified on the command line override the values stored in the dataset. These properties are reported as "temporary" by the .Qq Nm Cm get command. If the properties are changed while the dataset is mounted, the new @@ -1322,7 +1326,7 @@ domain name for the .Ar module component of property names to reduce the chance that two independently-developed packages use the same property name for different -purposes. Property names beginning with +purposes. Property names beginning with .Em com.sun are reserved for use by Sun Microsystems. .Pp @@ -1489,8 +1493,8 @@ behavior for mounted file systems in use. .Op Fl dnpRrv .Sm off .Ar snapshot -.Ns Op % Ns Ar snapname -.Ns Op , Ns Ar ... +.Op % Ns Ar snapname +.Op , Ns Ar ... .Sm on .Xc .Pp @@ -1978,9 +1982,11 @@ Upgrade the specified file system. .It Xo .Nm .Cm userspace -.Op Fl niHp +.Op Fl Hinp .Op Fl o Ar field Ns Op , Ns Ar ... -.Op Fl sS Ar field +.Op Fl s Ar field +.Ar ... +.Op Fl S Ar field .Ar ... .Op Fl t Ar type Ns Op , Ns Ar ... .Ar filesystem Ns | Ns Ar snapshot @@ -1998,9 +2004,9 @@ Print numeric ID instead of user/group name. .It Fl H Do not print headers, use tab-delimited output. .It Fl p -Use exact (parseable) numeric output. +Use exact (parsable) numeric output. .It Fl o Ar field Ns Op , Ns Ar ... -Display only the specified fields from the following set, +Display only the specified fields from the following set: .Sy type,name,used,quota . The default is to display all fields. .It Fl s Ar field @@ -2015,7 +2021,7 @@ another. The default is Sort by this field in reverse order. See .Fl s . .It Fl t Ar type Ns Op , Ns Ar ... 
-Print only the specified types from the following set, +Print only the specified types from the following set: .Sy all,posixuser,smbuser,posixgroup,smbgroup . .Pp The default is @@ -2029,9 +2035,11 @@ Translate SID to POSIX ID. This flag currently has no effect on .It Xo .Nm .Cm groupspace -.Op Fl niHp +.Op Fl Hinp .Op Fl o Ar field Ns Op , Ns Ar ... -.Op Fl sS Ar field +.Op Fl s Ar field +.Ar ... +.Op Fl S Ar field .Ar ... .Op Fl t Ar type Ns Op , Ns Ar ... .Ar filesystem Ns | Ns Ar snapshot @@ -2209,7 +2217,7 @@ and it is assumed to be from the same file system as the last .Ar snapshot . .Pp If the destination is a clone, the source may be the origin snapshot, which -must be fully specified (for example, +must be fully specified (for example, .Cm pool/fs@origin , not just .Cm @origin ) . @@ -2458,24 +2466,26 @@ subcommand or change a property. The following permissions are available: .Bl -column -offset 4n "secondarycache" "subcommand" .It NAME Ta TYPE Ta NOTES -.It Xo allow Ta subcommand Ta Must +.It allow Ta subcommand Ta Must Xo also have the permission that is being allowed .Xc -.It Xo clone Ta subcommand Ta Must +.It clone Ta subcommand Ta Must Xo also have the 'create' ability and 'mount' ability in the origin file system .Xc .It create Ta subcommand Ta Must also have the 'mount' ability .It destroy Ta subcommand Ta Must also have the 'mount' ability +.It diff Ta subcommand Ta Allows lookup of paths within a dataset given an +object number, and the ability to create snapshots necessary to 'zfs diff' .It hold Ta subcommand Ta Allows adding a user hold to a snapshot .It mount Ta subcommand Ta Allows mount/umount of Tn ZFS No datasets -.It Xo promote Ta subcommand Ta Must +.It promote Ta subcommand Ta Must Xo also have the 'mount' and 'promote' ability in the origin file system .Xc .It receive Ta subcommand Ta Must also have the 'mount' and 'create' ability -.It Xo release Ta subcommand Ta Allows +.It release Ta subcommand Ta Allows Xo releasing a user hold 
which might destroy the snapshot .Xc -.It Xo rename Ta subcommand Ta Must +.It rename Ta subcommand Ta Must Xo also have the 'mount' and 'create' ability in the new parent .Xc .It rollback Ta subcommand Ta Must also have the 'mount' ability @@ -2491,7 +2501,6 @@ protocol .It userprop Ta other Ta Allows changing any user property .It userquota Ta other Ta Allows accessing any userquota@... property .It userused Ta other Ta Allows reading any userused@... property -.It Ta .It aclinherit Ta property .It aclmode Ta property .It atime Ta property @@ -2669,43 +2678,42 @@ descendent file systems. .Op Ar snapshot Ns | Ns Ar filesystem .Xc .Pp -Describes differences between a snapshot and a successor dataset. The -successor dataset can be a later snapshot or the current filesystem. -.Pp -The changed files are displayed including the change type. The change type -is displayed useing a single character. If a file or directory was renamed, -the old and the new names are displayed. -.Pp -The following change types can be displayed: -.Pp -.Bl -column -offset indent "CHARACTER" "CHANGE TYPE" -.It CHARACTER Ta CHANGE TYPE -.It \&+ Ta file was added -.It \&- Ta file was removed -.It \&M Ta file was modified -.It \&R Ta file was renamed +Display the difference between a snapshot of a given filesystem and another +snapshot of that filesystem from a later time or the current contents of the +filesystem. The first column is a character indicating the type of change, +the other columns indicate pathname, new pathname +.Pq in case of rename , +change in link count, and optionally file type and/or change time. +.Pp +The types of change are: +.Bl -column -offset 2n indent +.It \&- Ta path was removed +.It \&+ Ta path was added +.It \&M Ta path was modified +.It \&R Ta path was renamed .El .Bl -tag -width indent .It Fl F -Display a single letter for the file type in second to last column. 
-.Pp -The following file types can be displayed: -.Pp -.Bl -column -offset indent "CHARACTER" "FILE TYPE" -.It CHARACTER Ta FILE TYPE -.It \&F Ta file -.It \&/ Ta directory +Display an indication of the type of file, in a manner similar to the +.Fl F +option of +.Xr ls 1 . +.Bl -column -offset 2n indent .It \&B Ta block device +.It \&C Ta character device +.It \&F Ta regular file +.It \&/ Ta directory .It \&@ Ta symbolic link .It \&= Ta socket .It \&> Ta door (not supported on Fx ) -.It \&| Ta FIFO (not supported on Fx ) -.It \&P Ta event portal (not supported on Fx ) +.It \&| Ta named pipe (not supported on Fx ) +.It \&P Ta event port (not supported on Fx ) .El .It Fl H -Machine-parseable output, fields separated a tab character. +Give more parseable tab-separated output, without header lines and without +arrows. .It Fl t -Display a change timestamp in the first column. +Display the path's inode change time as the first column of output. .El .It Xo .Nm @@ -2742,6 +2750,16 @@ Detaches the specified from the jail identified by JID .Ar jailid . .El +.Sh EXIT STATUS +The following exit values are returned: +.Bl -tag -offset 2n -width 2n +.It 0 +Successful completion. +.It 1 +An error occurred. +.It 2 +Invalid command line options were specified. +.El .Sh EXAMPLES .Bl -tag -width 0n .It Sy Example 1 No Creating a Tn ZFS No File System Hierarchy @@ -2807,7 +2825,7 @@ Snapshots are displayed if the .Sy listsnaps property is .Cm on . -The default is +The default is .Cm off . See .Xr zpool 8 @@ -3158,16 +3176,21 @@ Local+Descendent permissions on (tank/users) group staff @pset,create,mount ------------------------------------------------------------- .Ed -.El -.Sh EXIT STATUS -The following exit values are returned: -.Bl -tag -offset 2n -width 2n -.It 0 -Successful completion. -.It 1 -An error occurred. -.It 2 -Invalid command line options were specified. 
+.It Sy Example 22 Showing the differences between a snapshot and a ZFS Dataset +.Pp +The following example shows how to see what has changed between a prior +snapshot of a ZFS Dataset and its current state. The +.Fl F +option is used to indicate type information for the files affected. +.Bd -literal -offset 2n +.Li # Ic zfs diff tank/test@before tank/test +M / /tank/test/ +M F /tank/test/linked (+1) +R F /tank/test/oldname -> /tank/test/newname +- F /tank/test/deleted ++ F /tank/test/created +M F /tank/test/modified +.Ed .El .Sh SEE ALSO .Xr chmod 2 , diff --git a/cddl/contrib/opensolaris/cmd/zfs/zfs_main.c b/cddl/contrib/opensolaris/cmd/zfs/zfs_main.c index e122d36be..688db9b11 100644 --- a/cddl/contrib/opensolaris/cmd/zfs/zfs_main.c +++ b/cddl/contrib/opensolaris/cmd/zfs/zfs_main.c @@ -304,13 +304,13 @@ get_usage(zfs_help_t idx) "\tunallow [-r] -s @setname [[,...]] " "\n")); case HELP_USERSPACE: - return (gettext("\tuserspace [-niHp] [-o field[,...]] " - "[-sS field] ... [-t type[,...]]\n" - "\t \n")); + return (gettext("\tuserspace [-Hinp] [-o field[,...]] " + "[-s field] ...\n\t[-S field] ... " + "[-t type[,...]] \n")); case HELP_GROUPSPACE: - return (gettext("\tgroupspace [-niHp] [-o field[,...]] " - "[-sS field] ... [-t type[,...]]\n" - "\t \n")); + return (gettext("\tgroupspace [-Hinp] [-o field[,...]] " + "[-s field] ...\n\t[-S field] ... " + "[-t type[,...]] \n")); case HELP_HOLD: return (gettext("\thold [-r] ...\n")); case HELP_HOLDS: @@ -1081,7 +1081,7 @@ snapshot_to_nvl_cb(zfs_handle_t *zhp, void *arg) int err = 0; /* Check for clones. 
*/ - if (!cb->cb_doclones) { + if (!cb->cb_doclones && !cb->cb_defer_destroy) { cb->cb_target = zhp; cb->cb_first = B_TRUE; err = zfs_iter_dependents(zhp, B_TRUE, @@ -2057,30 +2057,52 @@ zfs_do_upgrade(int argc, char **argv) return (ret); } -#define USTYPE_USR_BIT (0) -#define USTYPE_GRP_BIT (1) -#define USTYPE_PSX_BIT (2) -#define USTYPE_SMB_BIT (3) - -#define USTYPE_USR (1 << USTYPE_USR_BIT) -#define USTYPE_GRP (1 << USTYPE_GRP_BIT) - -#define USTYPE_PSX (1 << USTYPE_PSX_BIT) -#define USTYPE_SMB (1 << USTYPE_SMB_BIT) - -#define USTYPE_PSX_USR (USTYPE_PSX | USTYPE_USR) -#define USTYPE_SMB_USR (USTYPE_SMB | USTYPE_USR) -#define USTYPE_PSX_GRP (USTYPE_PSX | USTYPE_GRP) -#define USTYPE_SMB_GRP (USTYPE_SMB | USTYPE_GRP) -#define USTYPE_ALL (USTYPE_PSX_USR | USTYPE_SMB_USR \ - | USTYPE_PSX_GRP | USTYPE_SMB_GRP) +/* + * zfs userspace [-Hinp] [-o field[,...]] [-s field [-s field]...] + * [-S field [-S field]...] [-t type[,...]] filesystem | snapshot + * zfs groupspace [-Hinp] [-o field[,...]] [-s field [-s field]...] + * [-S field [-S field]...] [-t type[,...]] filesystem | snapshot + * + * -H Scripted mode; elide headers and separate columns by tabs. + * -i Translate SID to POSIX ID. + * -n Print numeric ID instead of user/group name. + * -o Control which fields to display. + * -p Use exact (parseable) numeric output. + * -s Specify sort columns, ascending order. + * -S Specify sort columns, descending order. + * -t Control which object types to display. + * + * Displays space consumed by, and quotas on, each user in the specified + * filesystem or snapshot. 
+ */ +/* us_field_types, us_field_hdr and us_field_names should be kept in sync */ +enum us_field_types { + USFIELD_TYPE, + USFIELD_NAME, + USFIELD_USED, + USFIELD_QUOTA +}; +static char *us_field_hdr[] = { "TYPE", "NAME", "USED", "QUOTA" }; +static char *us_field_names[] = { "type", "name", "used", "quota" }; +#define USFIELD_LAST (sizeof (us_field_names) / sizeof (char *)) -#define USPROP_USED_BIT (0) -#define USPROP_QUOTA_BIT (1) +#define USTYPE_PSX_GRP (1 << 0) +#define USTYPE_PSX_USR (1 << 1) +#define USTYPE_SMB_GRP (1 << 2) +#define USTYPE_SMB_USR (1 << 3) +#define USTYPE_ALL \ + (USTYPE_PSX_GRP | USTYPE_PSX_USR | USTYPE_SMB_GRP | USTYPE_SMB_USR) -#define USPROP_USED (1 << USPROP_USED_BIT) -#define USPROP_QUOTA (1 << USPROP_QUOTA_BIT) +static int us_type_bits[] = { + USTYPE_PSX_GRP, + USTYPE_PSX_USR, + USTYPE_SMB_GRP, + USTYPE_SMB_USR, + USTYPE_ALL +}; +static char *us_type_names[] = { "posixgroup", "posixuser", "smbgroup", + "smbuser", "all" }; typedef struct us_node { nvlist_t *usn_nvl; @@ -2089,37 +2111,49 @@ typedef struct us_node { } us_node_t; typedef struct us_cbdata { - nvlist_t **cb_nvlp; - uu_avl_pool_t *cb_avl_pool; - uu_avl_t *cb_avl; - boolean_t cb_numname; - boolean_t cb_nicenum; - boolean_t cb_sid2posix; - zfs_userquota_prop_t cb_prop; - zfs_sort_column_t *cb_sortcol; - size_t cb_max_typelen; - size_t cb_max_namelen; - size_t cb_max_usedlen; - size_t cb_max_quotalen; + nvlist_t **cb_nvlp; + uu_avl_pool_t *cb_avl_pool; + uu_avl_t *cb_avl; + boolean_t cb_numname; + boolean_t cb_nicenum; + boolean_t cb_sid2posix; + zfs_userquota_prop_t cb_prop; + zfs_sort_column_t *cb_sortcol; + size_t cb_width[USFIELD_LAST]; } us_cbdata_t; +static boolean_t us_populated = B_FALSE; + typedef struct { zfs_sort_column_t *si_sortcol; - boolean_t si_num_name; - boolean_t si_parsable; + boolean_t si_numname; } us_sort_info_t; +static int +us_field_index(char *field) +{ + int i; + + for (i = 0; i < USFIELD_LAST; i++) { + if (strcmp(field, us_field_names[i]) == 0) + 
return (i); + } + + return (-1); +} + static int us_compare(const void *larg, const void *rarg, void *unused) { const us_node_t *l = larg; const us_node_t *r = rarg; - int rc = 0; us_sort_info_t *si = (us_sort_info_t *)unused; zfs_sort_column_t *sortcol = si->si_sortcol; - boolean_t num_name = si->si_num_name; + boolean_t numname = si->si_numname; nvlist_t *lnvl = l->usn_nvl; nvlist_t *rnvl = r->usn_nvl; + int rc = 0; + boolean_t lvb, rvb; for (; sortcol != NULL; sortcol = sortcol->sc_next) { char *lvstr = ""; @@ -2138,17 +2172,17 @@ us_compare(const void *larg, const void *rarg, void *unused) (void) nvlist_lookup_uint32(lnvl, propname, &lv32); (void) nvlist_lookup_uint32(rnvl, propname, &rv32); if (rv32 != lv32) - rc = (rv32 > lv32) ? 1 : -1; + rc = (rv32 < lv32) ? 1 : -1; break; case ZFS_PROP_NAME: propname = "name"; - if (num_name) { - (void) nvlist_lookup_uint32(lnvl, propname, - &lv32); - (void) nvlist_lookup_uint32(rnvl, propname, - &rv32); - if (rv32 != lv32) - rc = (rv32 > lv32) ? 1 : -1; + if (numname) { + (void) nvlist_lookup_uint64(lnvl, propname, + &lv64); + (void) nvlist_lookup_uint64(rnvl, propname, + &rv64); + if (rv64 != lv64) + rc = (rv64 < lv64) ? 1 : -1; } else { (void) nvlist_lookup_string(lnvl, propname, &lvstr); @@ -2157,27 +2191,40 @@ us_compare(const void *larg, const void *rarg, void *unused) rc = strcmp(lvstr, rvstr); } break; - case ZFS_PROP_USED: case ZFS_PROP_QUOTA: - if (ZFS_PROP_USED == prop) + if (!us_populated) + break; + if (prop == ZFS_PROP_USED) propname = "used"; else propname = "quota"; (void) nvlist_lookup_uint64(lnvl, propname, &lv64); (void) nvlist_lookup_uint64(rnvl, propname, &rv64); if (rv64 != lv64) - rc = (rv64 > lv64) ? 1 : -1; + rc = (rv64 < lv64) ? 1 : -1; + break; } - if (rc) + if (rc != 0) { if (rc < 0) return (reverse ? 1 : -1); else return (reverse ? 
-1 : 1); + } } - return (rc); + /* + * If entries still seem to be the same, check if they are of the same + * type (smbentity is added only if we are doing SID to POSIX ID + * translation where we can have duplicate type/name combinations). + */ + if (nvlist_lookup_boolean_value(lnvl, "smbentity", &lvb) == 0 && + nvlist_lookup_boolean_value(rnvl, "smbentity", &rvb) == 0 && + lvb != rvb) + return (lvb < rvb ? -1 : 1); + + return (0); } static inline const char * @@ -2197,9 +2244,6 @@ us_type2str(unsigned field_type) } } -/* - * zfs userspace - */ static int userspace_cb(void *arg, const char *domain, uid_t rid, uint64_t space) { @@ -2207,7 +2251,6 @@ userspace_cb(void *arg, const char *domain, uid_t rid, uint64_t space) zfs_userquota_prop_t prop = cb->cb_prop; char *name = NULL; char *propname; - char namebuf[32]; char sizebuf[32]; us_node_t *node; uu_avl_pool_t *avl_pool = cb->cb_avl_pool; @@ -2221,32 +2264,30 @@ userspace_cb(void *arg, const char *domain, uid_t rid, uint64_t space) size_t namelen; size_t typelen; size_t sizelen; + int typeidx, nameidx, sizeidx; us_sort_info_t sortinfo = { sortcol, cb->cb_numname }; + boolean_t smbentity = B_FALSE; - if (domain == NULL || domain[0] == '\0') { - /* POSIX */ - if (prop == ZFS_PROP_GROUPUSED || prop == ZFS_PROP_GROUPQUOTA) { - type = USTYPE_PSX_GRP; - struct group *g = getgrgid(rid); - if (g) - name = g->gr_name; - } else { - type = USTYPE_PSX_USR; - struct passwd *p = getpwuid(rid); - if (p) - name = p->pw_name; - } - } else { - char sid[ZFS_MAXNAMELEN+32]; + if (nvlist_alloc(&props, NV_UNIQUE_NAME, 0) != 0) + nomem(); + node = safe_malloc(sizeof (us_node_t)); + uu_avl_node_init(node, &node->usn_avlnode, avl_pool); + node->usn_nvl = props; + + if (domain != NULL && domain[0] != '\0') { + /* SMB */ + char sid[ZFS_MAXNAMELEN + 32]; uid_t id; uint64_t classes; #ifdef sun - int err = 0; + int err; directory_error_t e; #endif + smbentity = B_TRUE; + (void) snprintf(sid, sizeof (sid), "%s-%u", domain, rid); - /* SMB */ + 
if (prop == ZFS_PROP_GROUPUSED || prop == ZFS_PROP_GROUPQUOTA) { type = USTYPE_SMB_GRP; #ifdef sun @@ -2262,217 +2303,139 @@ userspace_cb(void *arg, const char *domain, uid_t rid, uint64_t space) #ifdef sun if (err == 0) { rid = id; - - e = directory_name_from_sid(NULL, sid, &name, &classes); - if (e != NULL) { - directory_error_free(e); - return (NULL); + if (!cb->cb_sid2posix) { + e = directory_name_from_sid(NULL, sid, &name, + &classes); + if (e != NULL) + directory_error_free(e); + if (name == NULL) + name = sid; } - - if (name == NULL) - name = sid; } #endif } -/* - * if (prop == ZFS_PROP_GROUPUSED || prop == ZFS_PROP_GROUPQUOTA) - * ug = "group"; - * else - * ug = "user"; - */ - - if (prop == ZFS_PROP_USERUSED || prop == ZFS_PROP_GROUPUSED) - propname = "used"; - else - propname = "quota"; - - (void) snprintf(namebuf, sizeof (namebuf), "%u", rid); - if (name == NULL) - name = namebuf; - - if (cb->cb_nicenum) - zfs_nicenum(space, sizebuf, sizeof (sizebuf)); - else - (void) sprintf(sizebuf, "%llu", space); + if (cb->cb_sid2posix || domain == NULL || domain[0] == '\0') { + /* POSIX or -i */ + if (prop == ZFS_PROP_GROUPUSED || prop == ZFS_PROP_GROUPQUOTA) { + type = USTYPE_PSX_GRP; + if (!cb->cb_numname) { + struct group *g; - node = safe_malloc(sizeof (us_node_t)); - uu_avl_node_init(node, &node->usn_avlnode, avl_pool); + if ((g = getgrgid(rid)) != NULL) + name = g->gr_name; + } + } else { + type = USTYPE_PSX_USR; + if (!cb->cb_numname) { + struct passwd *p; - if (nvlist_alloc(&props, NV_UNIQUE_NAME, 0) != 0) { - free(node); - return (-1); + if ((p = getpwuid(rid)) != NULL) + name = p->pw_name; + } + } } + /* + * Make sure that the type/name combination is unique when doing + * SID to POSIX ID translation (hence changing the type from SMB to + * POSIX). 
+ */ + if (cb->cb_sid2posix && + nvlist_add_boolean_value(props, "smbentity", smbentity) != 0) + nomem(); + + /* Calculate/update width of TYPE field */ + typestr = us_type2str(type); + typelen = strlen(gettext(typestr)); + typeidx = us_field_index("type"); + if (typelen > cb->cb_width[typeidx]) + cb->cb_width[typeidx] = typelen; if (nvlist_add_uint32(props, "type", type) != 0) nomem(); - if (cb->cb_numname) { - if (nvlist_add_uint32(props, "name", rid) != 0) + /* Calculate/update width of NAME field */ + if ((cb->cb_numname && cb->cb_sid2posix) || name == NULL) { + if (nvlist_add_uint64(props, "name", rid) != 0) nomem(); - namelen = strlen(namebuf); + namelen = snprintf(NULL, 0, "%u", rid); } else { if (nvlist_add_string(props, "name", name) != 0) nomem(); namelen = strlen(name); } + nameidx = us_field_index("name"); + if (namelen > cb->cb_width[nameidx]) + cb->cb_width[nameidx] = namelen; - typestr = us_type2str(type); - typelen = strlen(gettext(typestr)); - if (typelen > cb->cb_max_typelen) - cb->cb_max_typelen = typelen; - - if (namelen > cb->cb_max_namelen) - cb->cb_max_namelen = namelen; - - sizelen = strlen(sizebuf); - if (0 == strcmp(propname, "used")) { - if (sizelen > cb->cb_max_usedlen) - cb->cb_max_usedlen = sizelen; - } else { - if (sizelen > cb->cb_max_quotalen) - cb->cb_max_quotalen = sizelen; - } - - node->usn_nvl = props; - - n = uu_avl_find(avl, node, &sortinfo, &idx); - if (n == NULL) + /* + * Check if this type/name combination is in the list and update it; + * otherwise add new node to the list. 
+ */ + if ((n = uu_avl_find(avl, node, &sortinfo, &idx)) == NULL) { uu_avl_insert(avl, node, idx); - else { + } else { nvlist_free(props); free(node); node = n; props = node->usn_nvl; } + /* Calculate/update width of USED/QUOTA fields */ + if (cb->cb_nicenum) + zfs_nicenum(space, sizebuf, sizeof (sizebuf)); + else + (void) snprintf(sizebuf, sizeof (sizebuf), "%llu", space); + sizelen = strlen(sizebuf); + if (prop == ZFS_PROP_USERUSED || prop == ZFS_PROP_GROUPUSED) { + propname = "used"; + if (!nvlist_exists(props, "quota")) + (void) nvlist_add_uint64(props, "quota", 0); + } else { + propname = "quota"; + if (!nvlist_exists(props, "used")) + (void) nvlist_add_uint64(props, "used", 0); + } + sizeidx = us_field_index(propname); + if (sizelen > cb->cb_width[sizeidx]) + cb->cb_width[sizeidx] = sizelen; + if (nvlist_add_uint64(props, propname, space) != 0) nomem(); return (0); } -static inline boolean_t -usprop_check(zfs_userquota_prop_t p, unsigned types, unsigned props) -{ - unsigned type; - unsigned prop; - - switch (p) { - case ZFS_PROP_USERUSED: - type = USTYPE_USR; - prop = USPROP_USED; - break; - case ZFS_PROP_USERQUOTA: - type = USTYPE_USR; - prop = USPROP_QUOTA; - break; - case ZFS_PROP_GROUPUSED: - type = USTYPE_GRP; - prop = USPROP_USED; - break; - case ZFS_PROP_GROUPQUOTA: - type = USTYPE_GRP; - prop = USPROP_QUOTA; - break; - default: /* ALL */ - return (B_TRUE); - }; - - return (type & types && prop & props); -} - -#define USFIELD_TYPE (1 << 0) -#define USFIELD_NAME (1 << 1) -#define USFIELD_USED (1 << 2) -#define USFIELD_QUOTA (1 << 3) -#define USFIELD_ALL (USFIELD_TYPE | USFIELD_NAME | USFIELD_USED | USFIELD_QUOTA) - -static int -parsefields(unsigned *fieldsp, char **names, unsigned *bits, size_t len) -{ - char *field = optarg; - char *delim; - - do { - int i; - boolean_t found = B_FALSE; - delim = strchr(field, ','); - if (delim != NULL) - *delim = '\0'; - - for (i = 0; i < len; i++) - if (0 == strcmp(field, names[i])) { - found = B_TRUE; - *fieldsp |= 
bits[i]; - break; - } - - if (!found) { - (void) fprintf(stderr, gettext("invalid type '%s'" - "for -t option\n"), field); - return (-1); - } - - field = delim + 1; - } while (delim); - - return (0); -} - - -static char *type_names[] = { "posixuser", "smbuser", "posixgroup", "smbgroup", - "all" }; -static unsigned type_bits[] = { - USTYPE_PSX_USR, - USTYPE_SMB_USR, - USTYPE_PSX_GRP, - USTYPE_SMB_GRP, - USTYPE_ALL -}; - -static char *us_field_names[] = { "type", "name", "used", "quota" }; -static unsigned us_field_bits[] = { - USFIELD_TYPE, - USFIELD_NAME, - USFIELD_USED, - USFIELD_QUOTA -}; - static void -print_us_node(boolean_t scripted, boolean_t parseable, unsigned fields, - size_t type_width, size_t name_width, size_t used_width, - size_t quota_width, us_node_t *node) +print_us_node(boolean_t scripted, boolean_t parsable, int *fields, int types, + size_t *width, us_node_t *node) { nvlist_t *nvl = node->usn_nvl; - nvpair_t *nvp = NULL; char valstr[ZFS_MAXNAMELEN]; boolean_t first = B_TRUE; - boolean_t quota_found = B_FALSE; + int cfield = 0; + int field; + uint32_t ustype; - if (fields & USFIELD_QUOTA && !nvlist_exists(nvl, "quota")) - if (nvlist_add_string(nvl, "quota", "none") != 0) - nomem(); + /* Check type */ + (void) nvlist_lookup_uint32(nvl, "type", &ustype); + if (!(ustype & types)) + return; - while ((nvp = nvlist_next_nvpair(nvl, nvp)) != NULL) { - char *pname = nvpair_name(nvp); - data_type_t type = nvpair_type(nvp); - uint32_t val32 = 0; - uint64_t val64 = 0; + while ((field = fields[cfield]) != USFIELD_LAST) { + nvpair_t *nvp = NULL; + data_type_t type; + uint32_t val32; + uint64_t val64; char *strval = NULL; - unsigned field = 0; - unsigned width = 0; - int i; - for (i = 0; i < 4; i++) { - if (0 == strcmp(pname, us_field_names[i])) { - field = us_field_bits[i]; + + while ((nvp = nvlist_next_nvpair(nvl, nvp)) != NULL) { + if (strcmp(nvpair_name(nvp), + us_field_names[field]) == 0) break; - } } - if (!(field & fields)) - continue; - + type = 
nvpair_type(nvp); switch (type) { case DATA_TYPE_UINT32: (void) nvpair_value_uint32(nvp, &val32); @@ -2484,99 +2447,86 @@ print_us_node(boolean_t scripted, boolean_t parseable, unsigned fields, (void) nvpair_value_string(nvp, &strval); break; default: - (void) fprintf(stderr, "Invalid data type\n"); + (void) fprintf(stderr, "invalid data type\n"); } - if (!first) - if (scripted) - (void) printf("\t"); - else - (void) printf(" "); - switch (field) { case USFIELD_TYPE: strval = (char *)us_type2str(val32); - width = type_width; break; case USFIELD_NAME: if (type == DATA_TYPE_UINT64) { (void) sprintf(valstr, "%llu", val64); strval = valstr; } - width = name_width; break; case USFIELD_USED: case USFIELD_QUOTA: if (type == DATA_TYPE_UINT64) { - (void) nvpair_value_uint64(nvp, &val64); - if (parseable) + if (parsable) { (void) sprintf(valstr, "%llu", val64); - else + } else { zfs_nicenum(val64, valstr, sizeof (valstr)); - strval = valstr; - } - - if (field == USFIELD_USED) - width = used_width; - else { - quota_found = B_FALSE; - width = quota_width; + } + if (field == USFIELD_QUOTA && + strcmp(valstr, "0") == 0) + strval = "none"; + else + strval = valstr; } - break; } - if (field == USFIELD_QUOTA && !quota_found) - (void) printf("%*s", width, strval); - else { - if (type == DATA_TYPE_STRING) - (void) printf("%-*s", width, strval); + if (!first) { + if (scripted) + (void) printf("\t"); else - (void) printf("%*s", width, strval); + (void) printf(" "); } + if (scripted) + (void) printf("%s", strval); + else if (field == USFIELD_TYPE || field == USFIELD_NAME) + (void) printf("%-*s", width[field], strval); + else + (void) printf("%*s", width[field], strval); first = B_FALSE; - + cfield++; } (void) printf("\n"); } static void -print_us(boolean_t scripted, boolean_t parsable, unsigned fields, - unsigned type_width, unsigned name_width, unsigned used_width, - unsigned quota_width, boolean_t rmnode, uu_avl_t *avl) +print_us(boolean_t scripted, boolean_t parsable, int *fields, 
int types, + size_t *width, boolean_t rmnode, uu_avl_t *avl) { - static char *us_field_hdr[] = { "TYPE", "NAME", "USED", "QUOTA" }; us_node_t *node; const char *col; - int i; - size_t width[4] = { type_width, name_width, used_width, quota_width }; + int cfield = 0; + int field; if (!scripted) { boolean_t first = B_TRUE; - for (i = 0; i < 4; i++) { - unsigned field = us_field_bits[i]; - if (!(field & fields)) - continue; - col = gettext(us_field_hdr[i]); - if (field == USFIELD_TYPE || field == USFIELD_NAME) - (void) printf(first?"%-*s":" %-*s", width[i], - col); - else - (void) printf(first?"%*s":" %*s", width[i], - col); + while ((field = fields[cfield]) != USFIELD_LAST) { + col = gettext(us_field_hdr[field]); + if (field == USFIELD_TYPE || field == USFIELD_NAME) { + (void) printf(first ? "%-*s" : " %-*s", + width[field], col); + } else { + (void) printf(first ? "%*s" : " %*s", + width[field], col); + } first = B_FALSE; + cfield++; } (void) printf("\n"); } - for (node = uu_avl_first(avl); node != NULL; - node = uu_avl_next(avl, node)) { - print_us_node(scripted, parsable, fields, type_width, - name_width, used_width, used_width, node); + for (node = uu_avl_first(avl); node; node = uu_avl_next(avl, node)) { + print_us_node(scripted, parsable, fields, types, width, node); if (rmnode) nvlist_free(node->usn_nvl); } @@ -2591,32 +2541,36 @@ zfs_do_userspace(int argc, char **argv) uu_avl_pool_t *avl_pool; uu_avl_t *avl_tree; uu_avl_walk_t *walk; - - char *cmd; + char *delim; + char deffields[] = "type,name,used,quota"; + char *ofield = NULL; + char *tfield = NULL; + int cfield = 0; + int fields[256]; + int i; boolean_t scripted = B_FALSE; boolean_t prtnum = B_FALSE; - boolean_t parseable = B_FALSE; + boolean_t parsable = B_FALSE; boolean_t sid2posix = B_FALSE; - int error = 0; + int ret = 0; int c; - zfs_sort_column_t *default_sortcol = NULL; zfs_sort_column_t *sortcol = NULL; - unsigned types = USTYPE_PSX_USR | USTYPE_SMB_USR; - unsigned fields = 0; - unsigned props = 
USPROP_USED | USPROP_QUOTA; + int types = USTYPE_PSX_USR | USTYPE_SMB_USR; us_cbdata_t cb; us_node_t *node; - boolean_t resort_avl = B_FALSE; + us_node_t *rmnode; + uu_list_pool_t *listpool; + uu_list_t *list; + uu_avl_index_t idx = 0; + uu_list_index_t idx2 = 0; if (argc < 2) usage(B_FALSE); - cmd = argv[0]; - if (0 == strcmp(cmd, "groupspace")) - /* toggle default group types */ + if (strcmp(argv[0], "groupspace") == 0) + /* Toggle default group types */ types = USTYPE_PSX_GRP | USTYPE_SMB_GRP; - /* check options */ while ((c = getopt(argc, argv, "nHpo:s:S:t:i")) != -1) { switch (c) { case 'n': @@ -2626,32 +2580,22 @@ zfs_do_userspace(int argc, char **argv) scripted = B_TRUE; break; case 'p': - parseable = B_TRUE; + parsable = B_TRUE; break; case 'o': - if (parsefields(&fields, us_field_names, us_field_bits, - 4) != 0) - return (1); + ofield = optarg; break; case 's': - if (zfs_add_sort_column(&sortcol, optarg, - B_FALSE) != 0) { - (void) fprintf(stderr, - gettext("invalid property '%s'\n"), optarg); - usage(B_FALSE); - } - break; case 'S': if (zfs_add_sort_column(&sortcol, optarg, - B_TRUE) != 0) { + c == 's' ? 
B_FALSE : B_TRUE) != 0) { (void) fprintf(stderr, - gettext("invalid property '%s'\n"), optarg); + gettext("invalid field '%s'\n"), optarg); usage(B_FALSE); } break; case 't': - if (parsefields(&types, type_names, type_bits, 5)) - return (1); + tfield = optarg; break; case 'i': sid2posix = B_TRUE; @@ -2671,104 +2615,129 @@ zfs_do_userspace(int argc, char **argv) argc -= optind; argv += optind; - /* ok, now we have sorted by default colums (type,name) avl tree */ - if (sortcol) { - zfs_sort_column_t *sc; - for (sc = sortcol; sc; sc = sc->sc_next) { - if (sc->sc_prop == ZFS_PROP_QUOTA) { - resort_avl = B_TRUE; - break; - } - } + if (argc < 1) { + (void) fprintf(stderr, gettext("missing dataset name\n")); + usage(B_FALSE); + } + if (argc > 1) { + (void) fprintf(stderr, gettext("too many arguments\n")); + usage(B_FALSE); } - if (!fields) - fields = USFIELD_ALL; + /* Use default output fields if not specified using -o */ + if (ofield == NULL) + ofield = deffields; + do { + if ((delim = strchr(ofield, ',')) != NULL) + *delim = '\0'; + if ((fields[cfield++] = us_field_index(ofield)) == -1) { + (void) fprintf(stderr, gettext("invalid type '%s' " + "for -o option\n"), ofield); + return (-1); + } + if (delim != NULL) + ofield = delim + 1; + } while (delim != NULL); + fields[cfield] = USFIELD_LAST; + + /* Override output types (-t option) */ + if (tfield != NULL) { + types = 0; + + do { + boolean_t found = B_FALSE; + + if ((delim = strchr(tfield, ',')) != NULL) + *delim = '\0'; + for (i = 0; i < sizeof (us_type_bits) / sizeof (int); + i++) { + if (strcmp(tfield, us_type_names[i]) == 0) { + found = B_TRUE; + types |= us_type_bits[i]; + break; + } + } + if (!found) { + (void) fprintf(stderr, gettext("invalid type " + "'%s' for -t option\n"), tfield); + return (-1); + } + if (delim != NULL) + tfield = delim + 1; + } while (delim != NULL); + } - if ((zhp = zfs_open(g_zfs, argv[argc-1], ZFS_TYPE_DATASET)) == NULL) + if ((zhp = zfs_open(g_zfs, argv[0], ZFS_TYPE_DATASET)) == NULL) 
return (1); if ((avl_pool = uu_avl_pool_create("us_avl_pool", sizeof (us_node_t), - offsetof(us_node_t, usn_avlnode), - us_compare, UU_DEFAULT)) == NULL) + offsetof(us_node_t, usn_avlnode), us_compare, UU_DEFAULT)) == NULL) nomem(); if ((avl_tree = uu_avl_create(avl_pool, NULL, UU_DEFAULT)) == NULL) nomem(); - if (sortcol && !resort_avl) - cb.cb_sortcol = sortcol; - else { - (void) zfs_add_sort_column(&default_sortcol, "type", B_FALSE); - (void) zfs_add_sort_column(&default_sortcol, "name", B_FALSE); - cb.cb_sortcol = default_sortcol; - } + /* Always add default sorting columns */ + (void) zfs_add_sort_column(&sortcol, "type", B_FALSE); + (void) zfs_add_sort_column(&sortcol, "name", B_FALSE); + + cb.cb_sortcol = sortcol; cb.cb_numname = prtnum; - cb.cb_nicenum = !parseable; + cb.cb_nicenum = !parsable; cb.cb_avl_pool = avl_pool; cb.cb_avl = avl_tree; cb.cb_sid2posix = sid2posix; - cb.cb_max_typelen = strlen(gettext("TYPE")); - cb.cb_max_namelen = strlen(gettext("NAME")); - cb.cb_max_usedlen = strlen(gettext("USED")); - cb.cb_max_quotalen = strlen(gettext("QUOTA")); + + for (i = 0; i < USFIELD_LAST; i++) + cb.cb_width[i] = strlen(gettext(us_field_hdr[i])); for (p = 0; p < ZFS_NUM_USERQUOTA_PROPS; p++) { - if (!usprop_check(p, types, props)) + if (((p == ZFS_PROP_USERUSED || p == ZFS_PROP_USERQUOTA) && + !(types & (USTYPE_PSX_USR | USTYPE_SMB_USR))) || + ((p == ZFS_PROP_GROUPUSED || p == ZFS_PROP_GROUPQUOTA) && + !(types & (USTYPE_PSX_GRP | USTYPE_SMB_GRP)))) continue; - cb.cb_prop = p; - error = zfs_userspace(zhp, p, userspace_cb, &cb); + if ((ret = zfs_userspace(zhp, p, userspace_cb, &cb)) != 0) + return (ret); + } - if (error) - break; + /* Sort the list */ + if ((node = uu_avl_first(avl_tree)) == NULL) + return (0); + + us_populated = B_TRUE; + + listpool = uu_list_pool_create("tmplist", sizeof (us_node_t), + offsetof(us_node_t, usn_listnode), NULL, UU_DEFAULT); + list = uu_list_create(listpool, NULL, UU_DEFAULT); + uu_list_node_init(node, &node->usn_listnode, 
listpool); + + while (node != NULL) { + rmnode = node; + node = uu_avl_next(avl_tree, node); + uu_avl_remove(avl_tree, rmnode); + if (uu_list_find(list, rmnode, NULL, &idx2) == NULL) + uu_list_insert(list, rmnode, idx2); } - if (resort_avl) { - us_node_t *node; - us_node_t *rmnode; - uu_list_pool_t *listpool; - uu_list_t *list; - uu_avl_index_t idx = 0; - uu_list_index_t idx2 = 0; - listpool = uu_list_pool_create("tmplist", sizeof (us_node_t), - offsetof(us_node_t, usn_listnode), NULL, - UU_DEFAULT); - list = uu_list_create(listpool, NULL, UU_DEFAULT); - - node = uu_avl_first(avl_tree); - uu_list_node_init(node, &node->usn_listnode, listpool); - while (node != NULL) { - rmnode = node; - node = uu_avl_next(avl_tree, node); - uu_avl_remove(avl_tree, rmnode); - if (uu_list_find(list, rmnode, NULL, &idx2) == NULL) { - uu_list_insert(list, rmnode, idx2); - } - } + for (node = uu_list_first(list); node != NULL; + node = uu_list_next(list, node)) { + us_sort_info_t sortinfo = { sortcol, cb.cb_numname }; - for (node = uu_list_first(list); node != NULL; - node = uu_list_next(list, node)) { - us_sort_info_t sortinfo = { sortcol, cb.cb_numname }; - if (uu_avl_find(avl_tree, node, &sortinfo, &idx) == - NULL) + if (uu_avl_find(avl_tree, node, &sortinfo, &idx) == NULL) uu_avl_insert(avl_tree, node, idx); - } - - uu_list_destroy(list); } - /* print & free node`s nvlist memory */ - print_us(scripted, parseable, fields, cb.cb_max_typelen, - cb.cb_max_namelen, cb.cb_max_usedlen, - cb.cb_max_quotalen, B_TRUE, cb.cb_avl); + uu_list_destroy(list); + uu_list_pool_destroy(listpool); - if (sortcol) - zfs_free_sort_columns(sortcol); - zfs_free_sort_columns(default_sortcol); + /* Print and free node nvlist memory */ + print_us(scripted, parsable, fields, types, cb.cb_width, B_TRUE, + cb.cb_avl); - /* - * Finally, clean up the AVL tree. 
- */ + zfs_free_sort_columns(sortcol); + + /* Clean up the AVL tree */ if ((walk = uu_avl_walk_start(cb.cb_avl, UU_WALK_ROBUST)) == NULL) nomem(); @@ -2781,7 +2750,7 @@ zfs_do_userspace(int argc, char **argv) uu_avl_destroy(avl_tree); uu_avl_pool_destroy(avl_pool); - return (error); + return (ret); } /* diff --git a/cddl/contrib/opensolaris/cmd/zhack/zhack.c b/cddl/contrib/opensolaris/cmd/zhack/zhack.c new file mode 100644 index 000000000..2618cea32 --- /dev/null +++ b/cddl/contrib/opensolaris/cmd/zhack/zhack.c @@ -0,0 +1,533 @@ +/* + * CDDL HEADER START + * + * The contents of this file are subject to the terms of the + * Common Development and Distribution License (the "License"). + * You may not use this file except in compliance with the License. + * + * You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE + * or http://www.opensolaris.org/os/licensing. + * See the License for the specific language governing permissions + * and limitations under the License. + * + * When distributing Covered Code, include this CDDL HEADER in each + * file and include the License file at usr/src/OPENSOLARIS.LICENSE. + * If applicable, add the following below this CDDL HEADER, with the + * fields enclosed by brackets "[]" replaced with your own identifying + * information: Portions Copyright [yyyy] [name of copyright owner] + * + * CDDL HEADER END + */ + +/* + * Copyright (c) 2012 by Delphix. All rights reserved. + */ + +/* + * zhack is a debugging tool that can write changes to ZFS pool using libzpool + * for testing purposes. Altering pools with zhack is unsupported and may + * result in corrupted pools. 
+ */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#undef ZFS_MAXNAMELEN +#undef verify +#include + +extern boolean_t zfeature_checks_disable; + +const char cmdname[] = "zhack"; +libzfs_handle_t *g_zfs; +static importargs_t g_importargs; +static char *g_pool; +static boolean_t g_readonly; + +static void +usage(void) +{ + (void) fprintf(stderr, + "Usage: %s [-c cachefile] [-d dir] ...\n" + "where is one of the following:\n" + "\n", cmdname); + + (void) fprintf(stderr, + " feature stat \n" + " print information about enabled features\n" + " feature enable [-d desc] \n" + " add a new enabled feature to the pool\n" + " -d sets the feature's description\n" + " feature ref [-md] \n" + " change the refcount on the given feature\n" + " -d decrease instead of increase the refcount\n" + " -m add the feature to the label if increasing refcount\n" + "\n" + " : should be a feature guid\n"); + exit(1); +} + + +static void +fatal(const char *fmt, ...) +{ + va_list ap; + + va_start(ap, fmt); + (void) fprintf(stderr, "%s: ", cmdname); + (void) vfprintf(stderr, fmt, ap); + va_end(ap); + (void) fprintf(stderr, "\n"); + + exit(1); +} + +/* ARGSUSED */ +static int +space_delta_cb(dmu_object_type_t bonustype, void *data, + uint64_t *userp, uint64_t *groupp) +{ + /* + * Is it a valid type of object to track? + */ + if (bonustype != DMU_OT_ZNODE && bonustype != DMU_OT_SA) + return (ENOENT); + (void) fprintf(stderr, "modifying object that needs user accounting"); + abort(); + /* NOTREACHED */ +} + +/* + * Target is the dataset whose pool we want to open. + */ +static void +import_pool(const char *target, boolean_t readonly) +{ + nvlist_t *config; + nvlist_t *pools; + int error; + char *sepp; + spa_t *spa; + nvpair_t *elem; + nvlist_t *props; + const char *name; + + kernel_init(readonly ? 
FREAD : (FREAD | FWRITE)); + g_zfs = libzfs_init(); + ASSERT(g_zfs != NULL); + + dmu_objset_register_type(DMU_OST_ZFS, space_delta_cb); + + g_readonly = readonly; + + /* + * If we only want readonly access, it's OK if we find + * a potentially-active (ie, imported into the kernel) pool from the + * default cachefile. + */ + if (readonly && spa_open(target, &spa, FTAG) == 0) { + spa_close(spa, FTAG); + return; + } + + g_importargs.unique = B_TRUE; + g_importargs.can_be_active = readonly; + g_pool = strdup(target); + if ((sepp = strpbrk(g_pool, "/@")) != NULL) + *sepp = '\0'; + g_importargs.poolname = g_pool; + pools = zpool_search_import(g_zfs, &g_importargs); + + if (pools == NULL || nvlist_next_nvpair(pools, NULL) == NULL) { + if (!g_importargs.can_be_active) { + g_importargs.can_be_active = B_TRUE; + if (zpool_search_import(g_zfs, &g_importargs) != NULL || + spa_open(target, &spa, FTAG) == 0) { + fatal("cannot import '%s': pool is active; run " + "\"zpool export %s\" first\n", + g_pool, g_pool); + } + } + + fatal("cannot import '%s': no such pool available\n", g_pool); + } + + elem = nvlist_next_nvpair(pools, NULL); + name = nvpair_name(elem); + verify(nvpair_value_nvlist(elem, &config) == 0); + + props = NULL; + if (readonly) { + verify(nvlist_alloc(&props, NV_UNIQUE_NAME, 0) == 0); + verify(nvlist_add_uint64(props, + zpool_prop_to_name(ZPOOL_PROP_READONLY), 1) == 0); + } + + zfeature_checks_disable = B_TRUE; + error = spa_import(name, config, props, ZFS_IMPORT_NORMAL); + zfeature_checks_disable = B_FALSE; + if (error == EEXIST) + error = 0; + + if (error) + fatal("can't import '%s': %s", name, strerror(error)); +} + +static void +zhack_spa_open(const char *target, boolean_t readonly, void *tag, spa_t **spa) +{ + int err; + + import_pool(target, readonly); + + zfeature_checks_disable = B_TRUE; + err = spa_open(target, spa, tag); + zfeature_checks_disable = B_FALSE; + + if (err != 0) + fatal("cannot open '%s': %s", target, strerror(err)); + if (spa_version(*spa) 
< SPA_VERSION_FEATURES) { + fatal("'%s' has version %d, features not enabled", target, + (int)spa_version(*spa)); + } +} + +static void +dump_obj(objset_t *os, uint64_t obj, const char *name) +{ + zap_cursor_t zc; + zap_attribute_t za; + + (void) printf("%s_obj:\n", name); + + for (zap_cursor_init(&zc, os, obj); + zap_cursor_retrieve(&zc, &za) == 0; + zap_cursor_advance(&zc)) { + if (za.za_integer_length == 8) { + ASSERT(za.za_num_integers == 1); + (void) printf("\t%s = %llu\n", + za.za_name, (u_longlong_t)za.za_first_integer); + } else { + ASSERT(za.za_integer_length == 1); + char val[1024]; + VERIFY(zap_lookup(os, obj, za.za_name, + 1, sizeof (val), val) == 0); + (void) printf("\t%s = %s\n", za.za_name, val); + } + } + zap_cursor_fini(&zc); +} + +static void +dump_mos(spa_t *spa) +{ + nvlist_t *nv = spa->spa_label_features; + + (void) printf("label config:\n"); + for (nvpair_t *pair = nvlist_next_nvpair(nv, NULL); + pair != NULL; + pair = nvlist_next_nvpair(nv, pair)) { + (void) printf("\t%s\n", nvpair_name(pair)); + } +} + +static void +zhack_do_feature_stat(int argc, char **argv) +{ + spa_t *spa; + objset_t *os; + char *target; + + argc--; + argv++; + + if (argc < 1) { + (void) fprintf(stderr, "error: missing pool name\n"); + usage(); + } + target = argv[0]; + + zhack_spa_open(target, B_TRUE, FTAG, &spa); + os = spa->spa_meta_objset; + + dump_obj(os, spa->spa_feat_for_read_obj, "for_read"); + dump_obj(os, spa->spa_feat_for_write_obj, "for_write"); + dump_obj(os, spa->spa_feat_desc_obj, "descriptions"); + dump_mos(spa); + + spa_close(spa, FTAG); +} + +static void +feature_enable_sync(void *arg1, void *arg2, dmu_tx_t *tx) +{ + spa_t *spa = arg1; + zfeature_info_t *feature = arg2; + + spa_feature_enable(spa, feature, tx); +} + +static void +zhack_do_feature_enable(int argc, char **argv) +{ + char c; + char *desc, *target; + spa_t *spa; + objset_t *mos; + zfeature_info_t feature; + zfeature_info_t *nodeps[] = { NULL }; + + /* + * Features are not added to the 
pool's label until their refcounts + * are incremented, so fi_mos can just be left as false for now. + */ + desc = NULL; + feature.fi_uname = "zhack"; + feature.fi_mos = B_FALSE; + feature.fi_can_readonly = B_FALSE; + feature.fi_depends = nodeps; + + optind = 1; + while ((c = getopt(argc, argv, "rmd:")) != -1) { + switch (c) { + case 'r': + feature.fi_can_readonly = B_TRUE; + break; + case 'd': + desc = strdup(optarg); + break; + default: + usage(); + break; + } + } + + if (desc == NULL) + desc = strdup("zhack injected"); + feature.fi_desc = desc; + + argc -= optind; + argv += optind; + + if (argc < 2) { + (void) fprintf(stderr, "error: missing feature or pool name\n"); + usage(); + } + target = argv[0]; + feature.fi_guid = argv[1]; + + if (!zfeature_is_valid_guid(feature.fi_guid)) + fatal("invalid feature guid: %s", feature.fi_guid); + + zhack_spa_open(target, B_FALSE, FTAG, &spa); + mos = spa->spa_meta_objset; + + if (0 == zfeature_lookup_guid(feature.fi_guid, NULL)) + fatal("'%s' is a real feature, will not enable", feature.fi_guid); + if (0 == zap_contains(mos, spa->spa_feat_desc_obj, feature.fi_guid)) + fatal("feature already enabled: %s", feature.fi_guid); + + VERIFY3U(0, ==, dsl_sync_task_do(spa->spa_dsl_pool, NULL, + feature_enable_sync, spa, &feature, 5)); + + spa_close(spa, FTAG); + + free(desc); +} + +static void +feature_incr_sync(void *arg1, void *arg2, dmu_tx_t *tx) +{ + spa_t *spa = arg1; + zfeature_info_t *feature = arg2; + + spa_feature_incr(spa, feature, tx); +} + +static void +feature_decr_sync(void *arg1, void *arg2, dmu_tx_t *tx) +{ + spa_t *spa = arg1; + zfeature_info_t *feature = arg2; + + spa_feature_decr(spa, feature, tx); +} + +static void +zhack_do_feature_ref(int argc, char **argv) +{ + char c; + char *target; + boolean_t decr = B_FALSE; + spa_t *spa; + objset_t *mos; + zfeature_info_t feature; + zfeature_info_t *nodeps[] = { NULL }; + + /* + * fi_desc does not matter here because it was written to disk + * when the feature was enabled, but we need to 
properly set the + * feature for read or write based on the information we read off + * disk later. + */ + feature.fi_uname = "zhack"; + feature.fi_mos = B_FALSE; + feature.fi_desc = NULL; + feature.fi_depends = nodeps; + + optind = 1; + while ((c = getopt(argc, argv, "md")) != -1) { + switch (c) { + case 'm': + feature.fi_mos = B_TRUE; + break; + case 'd': + decr = B_TRUE; + break; + default: + usage(); + break; + } + } + argc -= optind; + argv += optind; + + if (argc < 2) { + (void) fprintf(stderr, "error: missing feature or pool name\n"); + usage(); + } + target = argv[0]; + feature.fi_guid = argv[1]; + + if (!zfeature_is_valid_guid(feature.fi_guid)) + fatal("invalid feature guid: %s", feature.fi_guid); + + zhack_spa_open(target, B_FALSE, FTAG, &spa); + mos = spa->spa_meta_objset; + + if (0 == zfeature_lookup_guid(feature.fi_guid, NULL)) + fatal("'%s' is a real feature, will not change refcount", feature.fi_guid); + + if (0 == zap_contains(mos, spa->spa_feat_for_read_obj, + feature.fi_guid)) { + feature.fi_can_readonly = B_FALSE; + } else if (0 == zap_contains(mos, spa->spa_feat_for_write_obj, + feature.fi_guid)) { + feature.fi_can_readonly = B_TRUE; + } else { + fatal("feature is not enabled: %s", feature.fi_guid); + } + + if (decr && !spa_feature_is_active(spa, &feature)) + fatal("feature refcount already 0: %s", feature.fi_guid); + + VERIFY3U(0, ==, dsl_sync_task_do(spa->spa_dsl_pool, NULL, + decr ? 
feature_decr_sync : feature_incr_sync, spa, &feature, 5)); + + spa_close(spa, FTAG); +} + +static int +zhack_do_feature(int argc, char **argv) +{ + char *subcommand; + + argc--; + argv++; + if (argc == 0) { + (void) fprintf(stderr, + "error: no feature operation specified\n"); + usage(); + } + + subcommand = argv[0]; + if (strcmp(subcommand, "stat") == 0) { + zhack_do_feature_stat(argc, argv); + } else if (strcmp(subcommand, "enable") == 0) { + zhack_do_feature_enable(argc, argv); + } else if (strcmp(subcommand, "ref") == 0) { + zhack_do_feature_ref(argc, argv); + } else { + (void) fprintf(stderr, "error: unknown subcommand: %s\n", + subcommand); + usage(); + } + + return (0); +} + +#define MAX_NUM_PATHS 1024 + +int +main(int argc, char **argv) +{ + extern void zfs_prop_init(void); + + char *path[MAX_NUM_PATHS]; + const char *subcommand; + int rv = 0; + char c; + + g_importargs.path = path; + + dprintf_setup(&argc, argv); + zfs_prop_init(); + + while ((c = getopt(argc, argv, "c:d:")) != -1) { + switch (c) { + case 'c': + g_importargs.cachefile = optarg; + break; + case 'd': + assert(g_importargs.paths < MAX_NUM_PATHS); + g_importargs.path[g_importargs.paths++] = optarg; + break; + default: + usage(); + break; + } + } + + argc -= optind; + argv += optind; + optind = 1; + + if (argc == 0) { + (void) fprintf(stderr, "error: no command specified\n"); + usage(); + } + + subcommand = argv[0]; + + if (strcmp(subcommand, "feature") == 0) { + rv = zhack_do_feature(argc, argv); + } else { + (void) fprintf(stderr, "error: unknown subcommand: %s\n", + subcommand); + usage(); + } + + if (!g_readonly && spa_export(g_pool, NULL, B_TRUE, B_TRUE) != 0) { + fatal("pool export failed; " + "changes may not be committed to disk\n"); + } + + libzfs_fini(g_zfs); + kernel_fini(); + + return (rv); +} diff --git a/cddl/contrib/opensolaris/cmd/zinject/zinject.c b/cddl/contrib/opensolaris/cmd/zinject/zinject.c index 51d2fc97c..d17ed534e 100644 --- 
a/cddl/contrib/opensolaris/cmd/zinject/zinject.c +++ b/cddl/contrib/opensolaris/cmd/zinject/zinject.c @@ -297,11 +297,9 @@ static int iter_handlers(int (*func)(int, const char *, zinject_record_t *, void *), void *data) { - zfs_cmd_t zc; + zfs_cmd_t zc = { 0 }; int ret; - zc.zc_guid = 0; - while (ioctl(zfs_fd, ZFS_IOC_INJECT_LIST_NEXT, &zc) == 0) if ((ret = func((int)zc.zc_guid, zc.zc_name, &zc.zc_inject_record, data)) != 0) @@ -424,7 +422,7 @@ static int cancel_one_handler(int id, const char *pool, zinject_record_t *record, void *data) { - zfs_cmd_t zc; + zfs_cmd_t zc = { 0 }; zc.zc_guid = (uint64_t)id; @@ -457,7 +455,7 @@ cancel_all_handlers(void) static int cancel_handler(int id) { - zfs_cmd_t zc; + zfs_cmd_t zc = { 0 }; zc.zc_guid = (uint64_t)id; @@ -479,7 +477,7 @@ static int register_handler(const char *pool, int flags, zinject_record_t *record, int quiet) { - zfs_cmd_t zc; + zfs_cmd_t zc = { 0 }; (void) strcpy(zc.zc_name, pool); zc.zc_inject_record = *record; @@ -536,7 +534,7 @@ register_handler(const char *pool, int flags, zinject_record_t *record, int perform_action(const char *pool, zinject_record_t *record, int cmd) { - zfs_cmd_t zc; + zfs_cmd_t zc = { 0 }; ASSERT(cmd == VDEV_STATE_DEGRADED || cmd == VDEV_STATE_FAULTED); (void) strlcpy(zc.zc_name, pool, sizeof (zc.zc_name)); diff --git a/cddl/contrib/opensolaris/cmd/zpool/zpool-features.7 b/cddl/contrib/opensolaris/cmd/zpool/zpool-features.7 new file mode 100644 index 000000000..999212c16 --- /dev/null +++ b/cddl/contrib/opensolaris/cmd/zpool/zpool-features.7 @@ -0,0 +1,206 @@ +'\" te +.\" Copyright (c) 2012, Martin Matuska . +.\" All Rights Reserved. +.\" +.\" The contents of this file are subject to the terms of the +.\" Common Development and Distribution License (the "License"). +.\" You may not use this file except in compliance with the License. +.\" +.\" You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE +.\" or http://www.opensolaris.org/os/licensing. 
+.\" See the License for the specific language governing permissions +.\" and limitations under the License. +.\" +.\" When distributing Covered Code, include this CDDL HEADER in each +.\" file and include the License file at usr/src/OPENSOLARIS.LICENSE. +.\" If applicable, add the following below this CDDL HEADER, with the +.\" fields enclosed by brackets "[]" replaced with your own identifying +.\" information: Portions Copyright [yyyy] [name of copyright owner] +.\" +.\" Copyright (c) 2012 by Delphix. All rights reserved. +.\" +.\" $FreeBSD$ +.\" +.Dd Aug 28, 2012 +.Dt ZPOOL-FEATURES 7 +.Os +.Sh NAME +.Nm zpool-features +.Nd ZFS pool feature descriptions +.Sh DESCRIPTION +ZFS pool on\-disk format versions are specified via "features" which replace +the old on\-disk format numbers (the last supported on\-disk format number is +28). +To enable a feature on a pool use the +.Cm upgrade +subcommand of the +.Xr zpool 8 +command, or set the +.Sy feature@feature_name +property to +.Ar enabled . +.Pp +The pool format does not affect file system version compatibility or the ability +to send file systems between pools. +.Pp +Since most features can be enabled independently of each other the on\-disk +format of the pool is specified by the set of all features marked as +.Sy active +on the pool. If the pool was created by another software version this set may +include unsupported features. +.Ss Identifying features +Every feature has a guid of the form +.Sy com.example:feature_name . +The reverse DNS name ensures that the feature's guid is unique across all ZFS +implementations. When unsupported features are encountered on a pool they will +be identified by their guids. +Refer to the documentation for the ZFS implementation that created the pool +for information about those features. +.Pp +Each supported feature also has a short name. +By convention a feature's short name is the portion of its guid which follows +the ':' (e.g. 
+.Sy com.example:feature_name +would have the short name +.Sy feature_name ), +however a feature's short name may differ across ZFS implementations if +following the convention would result in name conflicts. +.Ss Feature states +Features can be in one of three states: +.Bl -tag -width "XXXXXXXX" +.It Sy active +This feature's on\-disk format changes are in effect on the pool. +Support for this feature is required to import the pool in read\-write mode. +If this feature is not read-only compatible, support is also required to +import the pool in read\-only mode (see "Read\-only compatibility"). +.It Sy enabled +An administrator has marked this feature as enabled on the pool, but the +feature's on\-disk format changes have not been made yet. +The pool can still be imported by software that does not support this feature, +but changes may be made to the on\-disk format at any time which will move +the feature to the +.Sy active +state. +Some features may support returning to the +.Sy enabled +state after becoming +.Sy active . +See feature\-specific documentation for details. +.It Sy disabled +This feature's on\-disk format changes have not been made and will not be made +unless an administrator moves the feature to the +.Sy enabled +state. +Features cannot be disabled once they have been enabled. +.El +.Pp +The state of supported features is exposed through pool properties of the form +.Sy feature@short_name . +.Ss Read\-only compatibility +Some features may make on\-disk format changes that do not interfere with other +software's ability to read from the pool. +These features are referred to as "read\-only compatible". +If all unsupported features on a pool are read\-only compatible, the pool can +be imported in read\-only mode by setting the +.Sy readonly +property during import (see +.Xr zpool 8 +for details on importing pools). 
+.Ss Unsupported features +For each unsupported feature enabled on an imported pool a pool property +named +.Sy unsupported@feature_guid +will indicate why the import was allowed despite the unsupported feature. +Possible values for this property are: +.Bl -tag -width "XXXXXXXX" +.It Sy inactive +The feature is in the +.Sy enabled +state and therefore the pool's on\-disk format is still compatible with +software that does not support this feature. +.It Sy readonly +The feature is read\-only compatible and the pool has been imported in +read\-only mode. +.El +.Ss Feature dependencies +Some features depend on other features being enabled in order to function +properly. +Enabling a feature will automatically enable any features it depends on. +.Sh FEATURES +The following features are supported on this system: +.Bl -tag -width "XXXXXXXX" +.It Sy async_destroy +.Bl -column "READ\-ONLY COMPATIBLE" "com.delphix:async_destroy" +.It GUID Ta com.delphix:async_destroy +.It READ\-ONLY COMPATIBLE Ta yes +.It DEPENDENCIES Ta none +.El +.Pp +Destroying a file system requires traversing all of its data in order to +return its used space to the pool. +Without +.Sy async_destroy +the file system is not fully removed until all space has been reclaimed. +If the destroy operation is interrupted by a reboot or power outage the next +attempt to open the pool will need to complete the destroy operation +synchronously. +.Pp +When +.Sy async_destroy +is enabled the file system's data will be reclaimed by a background process, +allowing the destroy operation to complete without traversing the entire file +system. +The background process is able to resume interrupted destroys after the pool +has been opened, eliminating the need to finish interrupted destroys as part +of the open operation. +The amount of space remaining to be reclaimed by the background process is +available through the +.Sy freeing +property. +.Pp +This feature is only +.Sy active +while +.Sy freeing +is non\-zero. 
+.It Sy empty_bpobj +.Bl -column "READ\-ONLY COMPATIBLE" "com.delphix:empty_bpobj" +.It GUID Ta com.delphix:empty_bpobj +.It READ\-ONLY COMPATIBLE Ta yes +.It DEPENDENCIES Ta none +.El +.Pp +This feature increases the performance of creating and using a large number +of snapshots of a single filesystem or volume, and also reduces the disk +space required. +.Pp +When there are many snapshots, each snapshot uses many Block Pointer Objects +.Pq bpobj's +to track blocks associated with that snapshot. +However, in common use cases, most of these bpobj's are empty. +This feature allows us to create each bpobj on-demand, thus eliminating the +empty bpobjs. +.Pp +This feature is +.Sy active +while there are any filesystems, volumes, or snapshots which were created +after enabling this feature. +.El +.Sh SEE ALSO +.Xr zpool 8 +.Sh AUTHORS +This manual page is a +.Xr mdoc 7 +reimplementation of the +.Tn illumos +manual page +.Em zpool-features(5) , +modified and customized for +.Fx +and licensed under the Common Development and Distribution License +.Pq Tn CDDL . +.Pp +The +.Xr mdoc 7 +implementation of this manual page was initially written by +.An Martin Matuska Aq mm@FreeBSD.org . diff --git a/cddl/contrib/opensolaris/cmd/zpool/zpool.8 b/cddl/contrib/opensolaris/cmd/zpool/zpool.8 index 5536c14b8..0a4de1f32 100644 --- a/cddl/contrib/opensolaris/cmd/zpool/zpool.8 +++ b/cddl/contrib/opensolaris/cmd/zpool/zpool.8 @@ -1,5 +1,5 @@ '\" te -.\" Copyright (c) 2011, Martin Matuska . +.\" Copyright (c) 2012, Martin Matuska . .\" All Rights Reserved. .\" .\" The contents of this file are subject to the terms of the @@ -20,6 +20,8 @@ .\" Copyright (c) 2010, Sun Microsystems, Inc. All Rights Reserved. .\" Copyright 2011, Nexenta Systems, Inc. All Rights Reserved. .\" Copyright (c) 2011, Justin T. Gibbs +.\" Copyright (c) 2012 by Delphix. All Rights Reserved. 
+.\" Copyright (c) 2012, Glen Barber .\" .\" $FreeBSD$ .\" @@ -47,7 +49,7 @@ .Op Ar device .Nm .Cm create -.Op Fl fn +.Op Fl fnd .Op Fl o Ar property Ns = Ns Ar value .Ar ... .Op Fl O Ar file-system-property Ns = Ns Ar value @@ -192,7 +194,7 @@ A describes a single device or a collection of devices organized according to certain performance and fault characteristics. The following virtual devices are supported: -.Bl -tag +.Bl -tag -width "XXXXXX" .It Sy disk A block device, typically located under .Pa /dev . @@ -531,12 +533,22 @@ can provide additional information about a pool using this property. .It Sy dedupratio The deduplication ratio specified for a pool, expressed as a multiplier. For example, a -.S dedupratio +.Sy dedupratio value of 1.76 indicates that 1.76 units of data were stored but only 1 unit of disk space was actually consumed. See .Xr zfs 8 for a description of the deduplication feature. .It Sy free Number of blocks within the pool that are not allocated. +.It Sy freeing +After a file system or snapshot is destroyed, the space it was using is +returned to the pool asynchronously. +.Sy freeing +is the amount of space remaining to be reclaimed. +Over time +.Sy freeing +will decrease while +.Sy free +increases. .It Sy expandsize This property has currently no value on FreeBSD. .It Sy guid @@ -552,11 +564,16 @@ or .Qq Sy UNAVAIL . .It Sy size Total size of the storage pool. +.It Sy unsupported@ Ns Ar feature_guid +Information about unsupported features that are enabled on the pool. +See +.Xr zpool-features 7 +for details. .It Sy used Amount of storage space used within the pool. .El .Pp -These space usage properties report actual physical space available to the +The space usage properties report actual physical space available to the storage pool. The physical space can be different from the total amount of space that any contained datasets can actually use. 
The amount of space used in a @@ -653,6 +670,11 @@ Setting it to the special value creates a temporary pool that is never cached, and the special value .Cm '' (empty string) uses the default location. +.It Sy comment Ns = Ns Ar text +A text string consisting of printable ASCII characters that will be stored +such that it is available even if the pool becomes faulted. +An administrator can provide additional information about a pool using this +property. .It Sy dedupditto Ns = Ns Ar number Threshold for the number of block ditto copies. If the reference count for a deduplicated block increases above this number, a new ditto copy of this block @@ -686,6 +708,17 @@ requests that have yet to be committed to disk would be blocked. .It Sy panic Prints out a message to the console and generates a system crash dump. .El +.It Sy feature@ Ns Ar feature_name Ns = Ns Sy enabled +The value of this property is the current state of +.Ar feature_name . +The only valid value when setting this property is +.Sy enabled +which moves +.Ar feature_name +to the enabled state. +See +.Xr zpool-features 7 +for details on feature states. .It Sy listsnaps Ns = Ns Cm on No | Cm off Controls whether information about snapshots associated with this pool is output when @@ -699,9 +732,9 @@ The current on-disk version of the pool. This can be increased, but never decreased. The preferred method of updating pools is with the .Qq Nm Cm upgrade command, though this property can be used when a specific version is needed -for backwards compatibility. This property can be any number between 1 and the -current version reported by -.Qo Ic zpool upgrade -v Qc . +for backwards compatibility. +Once feature flags are enabled on a pool this property will no longer have a +value. .El .Sh SUBCOMMANDS All subcommands that modify state are logged persistently to the pool in their @@ -810,7 +843,7 @@ do not actually discard any transactions. 
.It Xo .Nm .Cm create -.Op Fl fn +.Op Fl fnd .Op Fl o Ar property Ns = Ns Ar value .Ar ... .Op Fl O Ar file-system-property Ns = Ns Ar value @@ -859,6 +892,10 @@ The mount point must not exist or must be empty, or else the root dataset cannot be mounted. This can be overridden with the .Fl m option. +.Pp +By default all supported features are enabled on the new pool unless the +.Fl d +option is specified. .Bl -tag -width indent .It Fl f Forces use of @@ -869,6 +906,17 @@ Not all devices can be overridden in this manner. Displays the configuration that would be used without actually creating the pool. The actual pool creation can still fail due to insufficient privileges or device sharing. +.It Fl d +Do not enable any features on the new pool. +Individual features can be enabled by setting their corresponding properties +to +.Sy enabled +with the +.Fl o +option. +See +.Xr zpool-features 7 +for details about feature properties. .It Xo .Fl o Ar property Ns = Ns Ar value .Op Fl o Ar property Ns = Ns Ar value @@ -1589,21 +1637,22 @@ for unixtime .Op Fl v .Xc .Pp -Displays all pools formatted using a different +Displays pools which do not have all supported features enabled and pools +formatted using a legacy .Tn ZFS -pool on-disk version. Older versions can continue to be used, but some -features may not be available. These pools can be upgraded using -.Qq Nm Cm upgrade Fl a . -Pools that are formatted with a more recent version are also displayed, -although these pools will be inaccessible on the system. +version number. +These pools can continue to be used, but some features may not be available. +Use +.Nm Cm upgrade Fl a +to enable all features on all pools. .Bl -tag -width indent .It Fl v -Displays -.Tn ZFS -pool versions supported by the current software. The current +Displays legacy .Tn ZFS -pool version and all previous supported versions are displayed, along -with an explanation of the features provided with each version. 
+versions supported by the current software. +See +.Xr zpool-features 7 +for a description of feature flags features supported by the current software. .El .It Xo .Nm @@ -1612,20 +1661,34 @@ with an explanation of the features provided with each version. .Fl a | Ar pool ... .Xc .Pp -Upgrades the given pool to the latest on-disk pool version. Once this is done, -the pool will no longer be accessible on systems running older versions of the -software. +Enables all supported features on the given pool. +Once this is done, the pool will no longer be accessible on systems that do +not support feature flags. +See +.Xr zpool-features 7 +for details on compatibility with systems that support feature flags, but do +not support all features enabled on the pool. .Bl -tag -width indent .It Fl a -Upgrades all pools. +Enables all supported features on all pools. .It Fl V Ar version -Upgrade to the specified version. If the +Upgrade to the specified legacy version. If the .Fl V -flag is not specified, the pool is upgraded to the most recent version. This -option can only be used to increase the version number, and only up to the most -recent version supported by this software. +flag is specified, no features will be enabled on the pool. +This option can only be used to increase the version number up to the last +supported legacy version number. .El .El +.Sh EXIT STATUS +The following exit values are returned: +.Bl -tag -offset 2n -width 2n +.It 0 +Successful completion. +.It 1 +An error occurred. +.It 2 +Invalid command line options were specified. 
-.El .Sh SEE ALSO .Xr zfs 8 +.Xr zpool-features 7 .Sh AUTHORS This manual page is a .Xr mdoc 7 diff --git a/cddl/contrib/opensolaris/cmd/zpool/zpool_main.c b/cddl/contrib/opensolaris/cmd/zpool/zpool_main.c index 61030f008..3c2a625d2 100644 --- a/cddl/contrib/opensolaris/cmd/zpool/zpool_main.c +++ b/cddl/contrib/opensolaris/cmd/zpool/zpool_main.c @@ -54,6 +54,7 @@ #include "zpool_util.h" #include "zfs_comutil.h" +#include "zfeature_common.h" #include "statcommon.h" @@ -207,7 +208,7 @@ get_usage(zpool_help_t idx) { case HELP_CLEAR: return (gettext("\tclear [-nF] [device]\n")); case HELP_CREATE: - return (gettext("\tcreate [-fn] [-o property=value] ... \n" + return (gettext("\tcreate [-fnd] [-o property=value] ... \n" "\t [-O file-system-property=value] ... \n" "\t [-m mountpoint] [-R root] ...\n")); case HELP_DESTROY: @@ -339,6 +340,12 @@ usage(boolean_t requested) /* Iterate over all properties */ (void) zprop_iter(print_prop_cb, fp, B_FALSE, B_TRUE, ZFS_TYPE_POOL); + + (void) fprintf(fp, "\t%-15s ", "feature@..."); + (void) fprintf(fp, "YES disabled | enabled | active\n"); + + (void) fprintf(fp, gettext("\nThe feature@ properties must be " + "appended with a feature name.\nSee zpool-features(7).\n")); } /* @@ -382,6 +389,18 @@ print_vdev_tree(zpool_handle_t *zhp, const char *name, nvlist_t *nv, int indent, } } +static boolean_t +prop_list_contains_feature(nvlist_t *proplist) +{ + nvpair_t *nvp; + for (nvp = nvlist_next_nvpair(proplist, NULL); NULL != nvp; + nvp = nvlist_next_nvpair(proplist, nvp)) { + if (zpool_prop_feature(nvpair_name(nvp))) + return (B_TRUE); + } + return (B_FALSE); +} + /* * Add a property pair (name, string-value) into a property nvlist. 
*/ @@ -405,12 +424,34 @@ add_prop_list(const char *propname, char *propval, nvlist_t **props, proplist = *props; if (poolprop) { - if ((prop = zpool_name_to_prop(propname)) == ZPROP_INVAL) { + const char *vname = zpool_prop_to_name(ZPOOL_PROP_VERSION); + + if ((prop = zpool_name_to_prop(propname)) == ZPROP_INVAL && + !zpool_prop_feature(propname)) { (void) fprintf(stderr, gettext("property '%s' is " "not a valid pool property\n"), propname); return (2); } - normnm = zpool_prop_to_name(prop); + + /* + * feature@ properties and version should not be specified + * at the same time. + */ + if ((prop == ZPROP_INVAL && zpool_prop_feature(propname) && + nvlist_exists(proplist, vname)) || + (prop == ZPOOL_PROP_VERSION && + prop_list_contains_feature(proplist))) { + (void) fprintf(stderr, gettext("'feature@' and " + "'version' properties cannot be specified " + "together\n")); + return (2); + } + + + if (zpool_prop_feature(propname)) + normnm = propname; + else + normnm = zpool_prop_to_name(prop); } else { if ((fprop = zfs_name_to_prop(propname)) != ZPROP_INVAL) { normnm = zfs_prop_to_name(fprop); @@ -701,7 +742,7 @@ errout: } /* - * zpool create [-fn] [-o property=value] ... + * zpool create [-fnd] [-o property=value] ... * [-O file-system-property=value] ... * [-R root] [-m mountpoint] ... * @@ -710,8 +751,10 @@ errout: * were to be created. * -R Create a pool under an alternate root * -m Set default mountpoint for the root dataset. By default it's - * '/' + * '/' * -o Set property=value. + * -d Don't automatically enable all supported pool features + * (individual features can be enabled with -o). * -O Set fsproperty=value in the pool's root file system * * Creates the named pool according to the given vdev specification. 
The @@ -724,6 +767,7 @@ zpool_do_create(int argc, char **argv) { boolean_t force = B_FALSE; boolean_t dryrun = B_FALSE; + boolean_t enable_all_pool_feat = B_TRUE; int c; nvlist_t *nvroot = NULL; char *poolname; @@ -735,7 +779,7 @@ zpool_do_create(int argc, char **argv) char *propval; /* check options */ - while ((c = getopt(argc, argv, ":fnR:m:o:O:")) != -1) { + while ((c = getopt(argc, argv, ":fndR:m:o:O:")) != -1) { switch (c) { case 'f': force = B_TRUE; @@ -743,6 +787,9 @@ zpool_do_create(int argc, char **argv) case 'n': dryrun = B_TRUE; break; + case 'd': + enable_all_pool_feat = B_FALSE; + break; case 'R': altroot = optarg; if (add_prop_list(zpool_prop_to_name( @@ -770,6 +817,21 @@ zpool_do_create(int argc, char **argv) if (add_prop_list(optarg, propval, &props, B_TRUE)) goto errout; + + /* + * If the user is creating a pool that doesn't support + * feature flags, don't enable any features. + */ + if (zpool_name_to_prop(optarg) == ZPOOL_PROP_VERSION) { + char *end; + u_longlong_t ver; + + ver = strtoull(propval, &end, 10); + if (*end == '\0' && + ver < SPA_VERSION_FEATURES) { + enable_all_pool_feat = B_FALSE; + } + } break; case 'O': if ((propval = strchr(optarg, '=')) == NULL) { @@ -835,7 +897,6 @@ zpool_do_create(int argc, char **argv) goto errout; } - if (altroot != NULL && altroot[0] != '/') { (void) fprintf(stderr, gettext("invalid alternate root '%s': " "must be an absolute path\n"), altroot); @@ -917,6 +978,27 @@ zpool_do_create(int argc, char **argv) /* * Hand off to libzfs. */ + if (enable_all_pool_feat) { + int i; + for (i = 0; i < SPA_FEATURES; i++) { + char propname[MAXPATHLEN]; + zfeature_info_t *feat = &spa_feature_table[i]; + + (void) snprintf(propname, sizeof (propname), + "feature@%s", feat->fi_uname); + + /* + * Skip feature if user specified it manually + * on the command line. 
+ */ + if (nvlist_exists(props, propname)) + continue; + + if (add_prop_list(propname, ZFS_FEATURE_ENABLED, + &props, B_TRUE) != 0) + goto errout; + } + } if (zpool_create(g_zfs, poolname, nvroot, props, fsprops) == 0) { zfs_handle_t *pool = zfs_open(g_zfs, poolname, @@ -1249,6 +1331,10 @@ print_status_config(zpool_handle_t *zhp, const char *name, nvlist_t *nv, (void) printf(gettext("newer version")); break; + case VDEV_AUX_UNSUP_FEAT: + (void) printf(gettext("unsupported feature(s)")); + break; + case VDEV_AUX_SPARED: verify(nvlist_lookup_uint64(nv, ZPOOL_CONFIG_GUID, &cb.cb_guid) == 0); @@ -1366,6 +1452,10 @@ print_import_config(const char *name, nvlist_t *nv, int namewidth, int depth) (void) printf(gettext("newer version")); break; + case VDEV_AUX_UNSUP_FEAT: + (void) printf(gettext("unsupported feature(s)")); + break; + case VDEV_AUX_ERR_EXCEEDED: (void) printf(gettext("too many errors")); break; @@ -1523,8 +1613,8 @@ show_import(nvlist_t *config) break; case ZPOOL_STATUS_VERSION_OLDER: - (void) printf(gettext(" status: The pool is formatted using an " - "older on-disk version.\n")); + (void) printf(gettext(" status: The pool is formatted using a " + "legacy on-disk version.\n")); break; case ZPOOL_STATUS_VERSION_NEWER: @@ -1532,6 +1622,25 @@ show_import(nvlist_t *config) "incompatible version.\n")); break; + case ZPOOL_STATUS_FEAT_DISABLED: + (void) printf(gettext(" status: Some supported features are " + "not enabled on the pool.\n")); + break; + + case ZPOOL_STATUS_UNSUP_FEAT_READ: + (void) printf(gettext("status: The pool uses the following " + "feature(s) not supported on this system:\n")); + zpool_print_unsup_feat(config); + break; + + case ZPOOL_STATUS_UNSUP_FEAT_WRITE: + (void) printf(gettext("status: The pool can only be accessed " + "in read-only mode on this system. 
It\n\tcannot be " + "accessed in read-write mode because it uses the " + "following\n\tfeature(s) not supported on this system:\n")); + zpool_print_unsup_feat(config); + break; + case ZPOOL_STATUS_HOSTID_MISMATCH: (void) printf(gettext(" status: The pool was last accessed by " "another system.\n")); @@ -1564,19 +1673,21 @@ show_import(nvlist_t *config) * Print out an action according to the overall state of the pool. */ if (vs->vs_state == VDEV_STATE_HEALTHY) { - if (reason == ZPOOL_STATUS_VERSION_OLDER) + if (reason == ZPOOL_STATUS_VERSION_OLDER || + reason == ZPOOL_STATUS_FEAT_DISABLED) { (void) printf(gettext(" action: The pool can be " "imported using its name or numeric identifier, " "though\n\tsome features will not be available " "without an explicit 'zpool upgrade'.\n")); - else if (reason == ZPOOL_STATUS_HOSTID_MISMATCH) + } else if (reason == ZPOOL_STATUS_HOSTID_MISMATCH) { (void) printf(gettext(" action: The pool can be " "imported using its name or numeric " "identifier and\n\tthe '-f' flag.\n")); - else + } else { (void) printf(gettext(" action: The pool can be " "imported using its name or numeric " "identifier.\n")); + } } else if (vs->vs_state == VDEV_STATE_DEGRADED) { (void) printf(gettext(" action: The pool can be imported " "despite missing or damaged devices. The\n\tfault " @@ -1589,6 +1700,20 @@ show_import(nvlist_t *config) "newer\n\tsoftware, or recreate the pool from " "backup.\n")); break; + case ZPOOL_STATUS_UNSUP_FEAT_READ: + (void) printf(gettext("action: The pool cannot be " + "imported. Access the pool on a system that " + "supports\n\tthe required feature(s), or recreate " + "the pool from backup.\n")); + break; + case ZPOOL_STATUS_UNSUP_FEAT_WRITE: + (void) printf(gettext("action: The pool cannot be " + "imported in read-write mode. 
Import the pool " + "with\n" + "\t\"-o readonly=on\", access the pool on a system " + "that supports the\n\trequired feature(s), or " + "recreate the pool from backup.\n")); + break; case ZPOOL_STATUS_MISSING_DEV_R: case ZPOOL_STATUS_MISSING_DEV_NR: case ZPOOL_STATUS_BAD_GUID_SUM: @@ -1664,9 +1789,9 @@ do_import(nvlist_t *config, const char *newname, const char *mntopts, ZPOOL_CONFIG_POOL_STATE, &state) == 0); verify(nvlist_lookup_uint64(config, ZPOOL_CONFIG_VERSION, &version) == 0); - if (version > SPA_VERSION) { + if (!SPA_VERSION_IS_SUPPORTED(version)) { (void) fprintf(stderr, gettext("cannot import '%s': pool " - "is formatted using a newer ZFS version\n"), name); + "is formatted using an unsupported ZFS version\n"), name); return (1); } else if (state != POOL_STATE_EXPORTED && !(flags & ZFS_IMPORT_ANY_HOST)) { @@ -2601,15 +2726,13 @@ static void print_header(list_cbdata_t *cb) { zprop_list_t *pl = cb->cb_proplist; + char headerbuf[ZPOOL_MAXPROPLEN]; const char *header; boolean_t first = B_TRUE; boolean_t right_justify; size_t width = 0; for (; pl != NULL; pl = pl->pl_next) { - if (pl->pl_prop == ZPROP_INVAL) - continue; - width = pl->pl_width; if (first && cb->cb_verbose) { /* @@ -2624,8 +2747,18 @@ print_header(list_cbdata_t *cb) else first = B_FALSE; - header = zpool_prop_column_name(pl->pl_prop); - right_justify = zpool_prop_align_right(pl->pl_prop); + right_justify = B_FALSE; + if (pl->pl_prop != ZPROP_INVAL) { + header = zpool_prop_column_name(pl->pl_prop); + right_justify = zpool_prop_align_right(pl->pl_prop); + } else { + int i; + + for (i = 0; pl->pl_user_prop[i] != '\0'; i++) + headerbuf[i] = toupper(pl->pl_user_prop[i]); + headerbuf[i] = '\0'; + header = headerbuf; + } if (pl->pl_next == NULL && !right_justify) (void) printf("%s", header); @@ -2685,6 +2818,11 @@ print_pool(zpool_handle_t *zhp, list_cbdata_t *cb) propstr = property; right_justify = zpool_prop_align_right(pl->pl_prop); + } else if ((zpool_prop_feature(pl->pl_user_prop) || + 
zpool_prop_unsupported(pl->pl_user_prop)) && + zpool_prop_get_feature(zhp, pl->pl_user_prop, property, + sizeof (property)) == 0) { + propstr = property; } else { propstr = "-"; } @@ -3255,7 +3393,7 @@ zpool_do_split(int argc, char **argv) if (zpool_get_state(zhp) != POOL_STATE_UNAVAIL && zpool_enable_datasets(zhp, mntopts, 0) != 0) { ret = 1; - (void) fprintf(stderr, gettext("Split was succssful, but " + (void) fprintf(stderr, gettext("Split was successful, but " "the datasets could not all be mounted\n")); (void) fprintf(stderr, gettext("Try doing '%s' with a " "different altroot\n"), "zpool import"); @@ -4007,12 +4145,13 @@ status_callback(zpool_handle_t *zhp, void *data) break; case ZPOOL_STATUS_VERSION_OLDER: - (void) printf(gettext("status: The pool is formatted using an " - "older on-disk format. The pool can\n\tstill be used, but " - "some features are unavailable.\n")); + (void) printf(gettext("status: The pool is formatted using a " + "legacy on-disk format. The pool can\n\tstill be used, " + "but some features are unavailable.\n")); (void) printf(gettext("action: Upgrade the pool using 'zpool " "upgrade'. Once this is done, the\n\tpool will no longer " - "be accessible on older software versions.\n")); + "be accessible on software that does not support feature\n" + "\tflags.\n")); break; case ZPOOL_STATUS_VERSION_NEWER: @@ -4024,6 +4163,41 @@ status_callback(zpool_handle_t *zhp, void *data) "backup.\n")); break; + case ZPOOL_STATUS_FEAT_DISABLED: + (void) printf(gettext("status: Some supported features are not " + "enabled on the pool. The pool can\n\tstill be used, but " + "some features are unavailable.\n")); + (void) printf(gettext("action: Enable all features using " + "'zpool upgrade'. Once this is done,\n\tthe pool may no " + "longer be accessible by software that does not support\n\t" + "the features. 
See zpool-features(7) for details.\n")); + break; + + case ZPOOL_STATUS_UNSUP_FEAT_READ: + (void) printf(gettext("status: The pool cannot be accessed on " + "this system because it uses the\n\tfollowing feature(s) " + "not supported on this system:\n")); + zpool_print_unsup_feat(config); + (void) printf("\n"); + (void) printf(gettext("action: Access the pool from a system " + "that supports the required feature(s),\n\tor restore the " + "pool from backup.\n")); + break; + + case ZPOOL_STATUS_UNSUP_FEAT_WRITE: + (void) printf(gettext("status: The pool can only be accessed " + "in read-only mode on this system. It\n\tcannot be " + "accessed in read-write mode because it uses the " + "following\n\tfeature(s) not supported on this system:\n")); + zpool_print_unsup_feat(config); + (void) printf("\n"); + (void) printf(gettext("action: The pool cannot be accessed in " + "read-write mode. Import the pool with\n" + "\t\"-o readonly=on\", access the pool from a system that " + "supports the\n\trequired feature(s), or restore the " + "pool from backup.\n")); + break; + case ZPOOL_STATUS_FAULTED_DEV_R: (void) printf(gettext("status: One or more devices are " "faulted in response to persistent errors.\n\tSufficient " @@ -4228,15 +4402,14 @@ zpool_do_status(int argc, char **argv) } typedef struct upgrade_cbdata { - int cb_all; int cb_first; - int cb_newer; char cb_poolname[ZPOOL_MAXNAMELEN]; int cb_argc; uint64_t cb_version; char **cb_argv; } upgrade_cbdata_t; +#ifdef __FreeBSD__ static int is_root_pool(zpool_handle_t *zhp) { @@ -4262,54 +4435,161 @@ is_root_pool(zpool_handle_t *zhp) return (poolname != NULL && strcmp(poolname, zpool_get_name(zhp)) == 0); } +static void +root_pool_upgrade_check(zpool_handle_t *zhp, char *poolname, int size) { + + if (poolname[0] == '\0' && is_root_pool(zhp)) + (void) strlcpy(poolname, zpool_get_name(zhp), size); +} +#endif /* FreeBSD */ + +static int +upgrade_version(zpool_handle_t *zhp, uint64_t version) +{ + int ret; + nvlist_t *config; + 
uint64_t oldversion; + + config = zpool_get_config(zhp, NULL); + verify(nvlist_lookup_uint64(config, ZPOOL_CONFIG_VERSION, + &oldversion) == 0); + + assert(SPA_VERSION_IS_SUPPORTED(oldversion)); + assert(oldversion < version); + + ret = zpool_upgrade(zhp, version); + if (ret != 0) + return (ret); + + if (version >= SPA_VERSION_FEATURES) { + (void) printf(gettext("Successfully upgraded " + "'%s' from version %llu to feature flags.\n"), + zpool_get_name(zhp), oldversion); + } else { + (void) printf(gettext("Successfully upgraded " + "'%s' from version %llu to version %llu.\n"), + zpool_get_name(zhp), oldversion, version); + } + + return (0); +} + +static int +upgrade_enable_all(zpool_handle_t *zhp, int *countp) +{ + int i, ret, count; + boolean_t firstff = B_TRUE; + nvlist_t *enabled = zpool_get_features(zhp); + + count = 0; + for (i = 0; i < SPA_FEATURES; i++) { + const char *fname = spa_feature_table[i].fi_uname; + const char *fguid = spa_feature_table[i].fi_guid; + if (!nvlist_exists(enabled, fguid)) { + char *propname; + verify(-1 != asprintf(&propname, "feature@%s", fname)); + ret = zpool_set_prop(zhp, propname, + ZFS_FEATURE_ENABLED); + if (ret != 0) { + free(propname); + return (ret); + } + count++; + + if (firstff) { + (void) printf(gettext("Enabled the " + "following features on '%s':\n"), + zpool_get_name(zhp)); + firstff = B_FALSE; + } + (void) printf(gettext(" %s\n"), fname); + free(propname); + } + } + + if (countp != NULL) + *countp = count; + return (0); +} + static int upgrade_cb(zpool_handle_t *zhp, void *arg) { upgrade_cbdata_t *cbp = arg; nvlist_t *config; uint64_t version; - int ret = 0; + boolean_t printnl = B_FALSE; + int ret; config = zpool_get_config(zhp, NULL); verify(nvlist_lookup_uint64(config, ZPOOL_CONFIG_VERSION, &version) == 0); - if (!cbp->cb_newer && version < SPA_VERSION) { - if (!cbp->cb_all) { - if (cbp->cb_first) { - (void) printf(gettext("The following pools are " - "out of date, and can be upgraded. 
After " - "being\nupgraded, these pools will no " - "longer be accessible by older software " - "versions.\n\n")); - (void) printf(gettext("VER POOL\n")); - (void) printf(gettext("--- ------------\n")); - cbp->cb_first = B_FALSE; - } + assert(SPA_VERSION_IS_SUPPORTED(version)); - (void) printf("%2llu %s\n", (u_longlong_t)version, - zpool_get_name(zhp)); - } else { + if (version < cbp->cb_version) { + cbp->cb_first = B_FALSE; + ret = upgrade_version(zhp, cbp->cb_version); + if (ret != 0) + return (ret); +#ifdef __FreeBSD__ + root_pool_upgrade_check(zhp, cbp->cb_poolname, + sizeof(cbp->cb_poolname)); +#endif /* ___FreeBSD__ */ + printnl = B_TRUE; + +#ifdef illumos + /* + * If they did "zpool upgrade -a", then we could + * be doing ioctls to different pools. We need + * to log this history once to each pool, and bypass + * the normal history logging that happens in main(). + */ + (void) zpool_log_history(g_zfs, history_str); + log_history = B_FALSE; +#endif + } + + if (cbp->cb_version >= SPA_VERSION_FEATURES) { + int count; + ret = upgrade_enable_all(zhp, &count); + if (ret != 0) + return (ret); + + if (count > 0) { cbp->cb_first = B_FALSE; - ret = zpool_upgrade(zhp, cbp->cb_version); - if (!ret) { - (void) printf(gettext("Successfully upgraded " - "'%s'\n\n"), zpool_get_name(zhp)); - if (cbp->cb_poolname[0] == '\0' && - is_root_pool(zhp)) { - (void) strlcpy(cbp->cb_poolname, - zpool_get_name(zhp), - sizeof(cbp->cb_poolname)); - } - } + printnl = B_TRUE; } - } else if (cbp->cb_newer && version > SPA_VERSION) { - assert(!cbp->cb_all); + } + + if (printnl) { + (void) printf(gettext("\n")); + } + + return (0); +} + +static int +upgrade_list_older_cb(zpool_handle_t *zhp, void *arg) +{ + upgrade_cbdata_t *cbp = arg; + nvlist_t *config; + uint64_t version; + + config = zpool_get_config(zhp, NULL); + verify(nvlist_lookup_uint64(config, ZPOOL_CONFIG_VERSION, + &version) == 0); + assert(SPA_VERSION_IS_SUPPORTED(version)); + + if (version < SPA_VERSION_FEATURES) { if 
(cbp->cb_first) { (void) printf(gettext("The following pools are " - "formatted using a newer software version and\n" - "cannot be accessed on the current system.\n\n")); + "formatted with legacy version numbers and can\n" + "be upgraded to use feature flags. After " + "being upgraded, these pools\nwill no " + "longer be accessible by software that does not " + "support feature\nflags.\n\n")); (void) printf(gettext("VER POOL\n")); (void) printf(gettext("--- ------------\n")); cbp->cb_first = B_FALSE; @@ -4319,14 +4599,65 @@ upgrade_cb(zpool_handle_t *zhp, void *arg) zpool_get_name(zhp)); } - zpool_close(zhp); - return (ret); + return (0); +} + +static int +upgrade_list_disabled_cb(zpool_handle_t *zhp, void *arg) +{ + upgrade_cbdata_t *cbp = arg; + nvlist_t *config; + uint64_t version; + + config = zpool_get_config(zhp, NULL); + verify(nvlist_lookup_uint64(config, ZPOOL_CONFIG_VERSION, + &version) == 0); + + if (version >= SPA_VERSION_FEATURES) { + int i; + boolean_t poolfirst = B_TRUE; + nvlist_t *enabled = zpool_get_features(zhp); + + for (i = 0; i < SPA_FEATURES; i++) { + const char *fguid = spa_feature_table[i].fi_guid; + const char *fname = spa_feature_table[i].fi_uname; + if (!nvlist_exists(enabled, fguid)) { + if (cbp->cb_first) { + (void) printf(gettext("\nSome " + "supported features are not " + "enabled on the following pools. " + "Once a\nfeature is enabled the " + "pool may become incompatible with " + "software\nthat does not support " + "the feature. 
See " + "zpool-features(7) for " + "details.\n\n")); + (void) printf(gettext("POOL " + "FEATURE\n")); + (void) printf(gettext("------" + "---------\n")); + cbp->cb_first = B_FALSE; + } + + if (poolfirst) { + (void) printf(gettext("%s\n"), + zpool_get_name(zhp)); + poolfirst = B_FALSE; + } + + (void) printf(gettext(" %s\n"), fname); + } + } + } + + return (0); } /* ARGSUSED */ static int upgrade_one(zpool_handle_t *zhp, void *data) { + boolean_t printnl = B_FALSE; upgrade_cbdata_t *cbp = data; uint64_t cur_version; int ret; @@ -4341,30 +4672,53 @@ upgrade_one(zpool_handle_t *zhp, void *data) cur_version = zpool_get_prop_int(zhp, ZPOOL_PROP_VERSION, NULL); if (cur_version > cbp->cb_version) { (void) printf(gettext("Pool '%s' is already formatted " - "using more current version '%llu'.\n"), + "using more current version '%llu'.\n\n"), zpool_get_name(zhp), cur_version); return (0); } - if (cur_version == cbp->cb_version) { + + if (cbp->cb_version != SPA_VERSION && cur_version == cbp->cb_version) { (void) printf(gettext("Pool '%s' is already formatted " - "using the current version.\n"), zpool_get_name(zhp)); + "using version %llu.\n\n"), zpool_get_name(zhp), + cbp->cb_version); return (0); } - ret = zpool_upgrade(zhp, cbp->cb_version); + if (cur_version != cbp->cb_version) { + printnl = B_TRUE; + ret = upgrade_version(zhp, cbp->cb_version); + if (ret != 0) + return (ret); +#ifdef __FreeBSD__ + root_pool_upgrade_check(zhp, cbp->cb_poolname, + sizeof(cbp->cb_poolname)); +#endif /* ___FreeBSD__ */ + } - if (!ret) { - (void) printf(gettext("Successfully upgraded '%s' " - "from version %llu to version %llu\n\n"), - zpool_get_name(zhp), (u_longlong_t)cur_version, - (u_longlong_t)cbp->cb_version); - if (cbp->cb_poolname[0] == '\0' && is_root_pool(zhp)) { - (void) strlcpy(cbp->cb_poolname, zpool_get_name(zhp), + if (cbp->cb_version >= SPA_VERSION_FEATURES) { + int count = 0; + ret = upgrade_enable_all(zhp, &count); + if (ret != 0) + return (ret); + + if (count != 0) { + 
printnl = B_TRUE; +#ifdef __FreeBSD__ + root_pool_upgrade_check(zhp, cbp->cb_poolname, sizeof(cbp->cb_poolname)); +#endif /* __FreeBSD __*/ + } else if (cur_version == SPA_VERSION) { + (void) printf(gettext("Pool '%s' already has all " + "supported features enabled.\n"), + zpool_get_name(zhp)); } } - return (ret != 0); + if (printnl) { + (void) printf(gettext("\n")); + } + + return (0); } /* @@ -4383,6 +4737,7 @@ zpool_do_upgrade(int argc, char **argv) upgrade_cbdata_t cb = { 0 }; int ret = 0; boolean_t showversions = B_FALSE; + boolean_t upgradeall = B_FALSE; char *end; @@ -4390,15 +4745,15 @@ zpool_do_upgrade(int argc, char **argv) while ((c = getopt(argc, argv, ":avV:")) != -1) { switch (c) { case 'a': - cb.cb_all = B_TRUE; + upgradeall = B_TRUE; break; case 'v': showversions = B_TRUE; break; case 'V': cb.cb_version = strtoll(optarg, &end, 10); - if (*end != '\0' || cb.cb_version > SPA_VERSION || - cb.cb_version < SPA_VERSION_1) { + if (*end != '\0' || + !SPA_VERSION_IS_SUPPORTED(cb.cb_version)) { (void) fprintf(stderr, gettext("invalid version '%s'\n"), optarg); usage(B_FALSE); @@ -4423,19 +4778,19 @@ zpool_do_upgrade(int argc, char **argv) if (cb.cb_version == 0) { cb.cb_version = SPA_VERSION; - } else if (!cb.cb_all && argc == 0) { + } else if (!upgradeall && argc == 0) { (void) fprintf(stderr, gettext("-V option is " "incompatible with other arguments\n")); usage(B_FALSE); } if (showversions) { - if (cb.cb_all || argc != 0) { + if (upgradeall || argc != 0) { (void) fprintf(stderr, gettext("-v option is " "incompatible with other arguments\n")); usage(B_FALSE); } - } else if (cb.cb_all) { + } else if (upgradeall) { if (argc != 0) { (void) fprintf(stderr, gettext("-a option should not " "be used along with a pool name\n")); @@ -4443,11 +4798,27 @@ zpool_do_upgrade(int argc, char **argv) } } - (void) printf(gettext("This system is currently running " - "ZFS pool version %llu.\n\n"), SPA_VERSION); - cb.cb_first = B_TRUE; + (void) printf(gettext("This system 
supports ZFS pool feature " + "flags.\n\n")); if (showversions) { - (void) printf(gettext("The following versions are " + int i; + + (void) printf(gettext("The following features are " + "supported:\n\n")); + (void) printf(gettext("FEAT DESCRIPTION\n")); + (void) printf("----------------------------------------------" + "---------------\n"); + for (i = 0; i < SPA_FEATURES; i++) { + zfeature_info_t *fi = &spa_feature_table[i]; + const char *ro = fi->fi_can_readonly ? + " (read-only compatible)" : ""; + + (void) printf("%-37s%s\n", fi->fi_uname, ro); + (void) printf(" %s\n", fi->fi_desc); + } + (void) printf("\n"); + + (void) printf(gettext("The following legacy versions are also " "supported:\n\n")); (void) printf(gettext("VER DESCRIPTION\n")); (void) printf("--- -----------------------------------------" @@ -4490,32 +4861,44 @@ zpool_do_upgrade(int argc, char **argv) (void) printf(gettext("\nFor more information on a particular " "version, including supported releases,\n")); (void) printf(gettext("see the ZFS Administration Guide.\n\n")); - } else if (argc == 0) { - int notfound; - + } else if (argc == 0 && upgradeall) { + cb.cb_first = B_TRUE; ret = zpool_iter(g_zfs, upgrade_cb, &cb); - notfound = cb.cb_first; - - if (!cb.cb_all && ret == 0) { - if (!cb.cb_first) - (void) printf("\n"); - cb.cb_first = B_TRUE; - cb.cb_newer = B_TRUE; - ret = zpool_iter(g_zfs, upgrade_cb, &cb); - if (!cb.cb_first) { - notfound = B_FALSE; - (void) printf("\n"); + if (ret == 0 && cb.cb_first) { + if (cb.cb_version == SPA_VERSION) { + (void) printf(gettext("All pools are already " + "formatted using feature flags.\n\n")); + (void) printf(gettext("Every feature flags " + "pool already has all supported features " + "enabled.\n")); + } else { + (void) printf(gettext("All pools are already " + "formatted with version %llu or higher.\n"), + cb.cb_version); } } + } else if (argc == 0) { + cb.cb_first = B_TRUE; + ret = zpool_iter(g_zfs, upgrade_list_older_cb, &cb); + assert(ret == 0); + + if 
(cb.cb_first) { + (void) printf(gettext("All pools are formatted " + "using feature flags.\n\n")); + } else { + (void) printf(gettext("\nUse 'zpool upgrade -v' " + "for a list of available legacy versions.\n")); + } + + cb.cb_first = B_TRUE; + ret = zpool_iter(g_zfs, upgrade_list_disabled_cb, &cb); + assert(ret == 0); - if (ret == 0) { - if (notfound) - (void) printf(gettext("All pools are formatted " - "using this version.\n")); - else if (!cb.cb_all) - (void) printf(gettext("Use 'zpool upgrade -v' " - "for a list of available versions and " - "their associated\nfeatures.\n")); + if (cb.cb_first) { + (void) printf(gettext("Every feature flags pool has " + "all supported features enabled.\n")); + } else { + (void) printf(gettext("\n")); } } else { ret = for_each_pool(argc, argv, B_FALSE, NULL, @@ -4705,13 +5088,26 @@ get_callback(zpool_handle_t *zhp, void *data) pl == cbp->cb_proplist) continue; - if (zpool_get_prop(zhp, pl->pl_prop, - value, sizeof (value), &srctype) != 0) - continue; + if (pl->pl_prop == ZPROP_INVAL && + (zpool_prop_feature(pl->pl_user_prop) || + zpool_prop_unsupported(pl->pl_user_prop))) { + srctype = ZPROP_SRC_LOCAL; + + if (zpool_prop_get_feature(zhp, pl->pl_user_prop, + value, sizeof (value)) == 0) { + zprop_print_one_property(zpool_get_name(zhp), + cbp, pl->pl_user_prop, value, srctype, + NULL, NULL); + } + } else { + if (zpool_get_prop(zhp, pl->pl_prop, value, + sizeof (value), &srctype) != 0) + continue; - zprop_print_one_property(zpool_get_name(zhp), cbp, - zpool_prop_to_name(pl->pl_prop), value, srctype, NULL, - NULL); + zprop_print_one_property(zpool_get_name(zhp), cbp, + zpool_prop_to_name(pl->pl_prop), value, srctype, + NULL, NULL); + } } return (0); } @@ -4723,8 +5119,11 @@ zpool_do_get(int argc, char **argv) zprop_list_t fake_name = { 0 }; int ret; - if (argc < 3) + if (argc < 2) { + (void) fprintf(stderr, gettext("missing property " + "argument\n")); usage(B_FALSE); + } cb.cb_first = B_TRUE; cb.cb_sources = ZPROP_SRC_ALL; @@ 
-4734,7 +5133,7 @@ zpool_do_get(int argc, char **argv) cb.cb_columns[3] = GET_COL_SOURCE; cb.cb_type = ZFS_TYPE_POOL; - if (zprop_get_list(g_zfs, argv[1], &cb.cb_proplist, + if (zprop_get_list(g_zfs, argv[1], &cb.cb_proplist, ZFS_TYPE_POOL) != 0) usage(B_FALSE); diff --git a/cddl/contrib/opensolaris/cmd/ztest/ztest.c b/cddl/contrib/opensolaris/cmd/ztest/ztest.c index 487f95a05..af0b0fabb 100644 --- a/cddl/contrib/opensolaris/cmd/ztest/ztest.c +++ b/cddl/contrib/opensolaris/cmd/ztest/ztest.c @@ -107,6 +107,7 @@ #include #include #include +#include #include #include #include @@ -329,6 +330,7 @@ ztest_func_t ztest_vdev_add_remove; ztest_func_t ztest_vdev_aux_add_remove; ztest_func_t ztest_split_pool; ztest_func_t ztest_reguid; +ztest_func_t ztest_spa_upgrade; uint64_t zopt_always = 0ULL * NANOSEC; /* all the time */ uint64_t zopt_incessant = 1ULL * NANOSEC / 10; /* every 1/10 second */ @@ -362,8 +364,9 @@ ztest_info_t ztest_info[] = { { ztest_reguid, 1, &zopt_sometimes }, { ztest_spa_rename, 1, &zopt_rarely }, { ztest_scrub, 1, &zopt_rarely }, + { ztest_spa_upgrade, 1, &zopt_rarely }, { ztest_dsl_dataset_promote_busy, 1, &zopt_rarely }, - { ztest_vdev_attach_detach, 1, &zopt_rarely }, + { ztest_vdev_attach_detach, 1, &zopt_rarely }, { ztest_vdev_LUN_growth, 1, &zopt_rarely }, { ztest_vdev_add_remove, 1, &ztest_opts.zo_vdevtime }, @@ -414,6 +417,13 @@ static spa_t *ztest_spa = NULL; static ztest_ds_t *ztest_ds; static mutex_t ztest_vdev_lock; + +/* + * The ztest_name_lock protects the pool and dataset namespace used by + * the individual tests. To modify the namespace, consumers must grab + * this lock as writer. Grabbing the lock as reader will ensure that the + * namespace does not change while the lock is held. 
+ */ static rwlock_t ztest_name_lock; static boolean_t ztest_dump_core = B_TRUE; @@ -781,7 +791,7 @@ ztest_get_ashift(void) } static nvlist_t * -make_vdev_file(char *path, char *aux, size_t size, uint64_t ashift) +make_vdev_file(char *path, char *aux, char *pool, size_t size, uint64_t ashift) { char pathbuf[MAXPATHLEN]; uint64_t vdev; @@ -797,12 +807,13 @@ make_vdev_file(char *path, char *aux, size_t size, uint64_t ashift) vdev = ztest_shared->zs_vdev_aux; (void) snprintf(path, sizeof (pathbuf), ztest_aux_template, ztest_opts.zo_dir, - ztest_opts.zo_pool, aux, vdev); + pool == NULL ? ztest_opts.zo_pool : pool, + aux, vdev); } else { vdev = ztest_shared->zs_vdev_next_leaf++; (void) snprintf(path, sizeof (pathbuf), ztest_dev_template, ztest_opts.zo_dir, - ztest_opts.zo_pool, vdev); + pool == NULL ? ztest_opts.zo_pool : pool, vdev); } } @@ -824,17 +835,18 @@ make_vdev_file(char *path, char *aux, size_t size, uint64_t ashift) } static nvlist_t * -make_vdev_raidz(char *path, char *aux, size_t size, uint64_t ashift, int r) +make_vdev_raidz(char *path, char *aux, char *pool, size_t size, + uint64_t ashift, int r) { nvlist_t *raidz, **child; int c; if (r < 2) - return (make_vdev_file(path, aux, size, ashift)); + return (make_vdev_file(path, aux, pool, size, ashift)); child = umem_alloc(r * sizeof (nvlist_t *), UMEM_NOFAIL); for (c = 0; c < r; c++) - child[c] = make_vdev_file(path, aux, size, ashift); + child[c] = make_vdev_file(path, aux, pool, size, ashift); VERIFY(nvlist_alloc(&raidz, NV_UNIQUE_NAME, 0) == 0); VERIFY(nvlist_add_string(raidz, ZPOOL_CONFIG_TYPE, @@ -853,19 +865,19 @@ make_vdev_raidz(char *path, char *aux, size_t size, uint64_t ashift, int r) } static nvlist_t * -make_vdev_mirror(char *path, char *aux, size_t size, uint64_t ashift, - int r, int m) +make_vdev_mirror(char *path, char *aux, char *pool, size_t size, + uint64_t ashift, int r, int m) { nvlist_t *mirror, **child; int c; if (m < 1) - return (make_vdev_raidz(path, aux, size, ashift, r)); + return 
(make_vdev_raidz(path, aux, pool, size, ashift, r)); child = umem_alloc(m * sizeof (nvlist_t *), UMEM_NOFAIL); for (c = 0; c < m; c++) - child[c] = make_vdev_raidz(path, aux, size, ashift, r); + child[c] = make_vdev_raidz(path, aux, pool, size, ashift, r); VERIFY(nvlist_alloc(&mirror, NV_UNIQUE_NAME, 0) == 0); VERIFY(nvlist_add_string(mirror, ZPOOL_CONFIG_TYPE, @@ -882,8 +894,8 @@ make_vdev_mirror(char *path, char *aux, size_t size, uint64_t ashift, } static nvlist_t * -make_vdev_root(char *path, char *aux, size_t size, uint64_t ashift, - int log, int r, int m, int t) +make_vdev_root(char *path, char *aux, char *pool, size_t size, uint64_t ashift, + int log, int r, int m, int t) { nvlist_t *root, **child; int c; @@ -893,7 +905,8 @@ make_vdev_root(char *path, char *aux, size_t size, uint64_t ashift, child = umem_alloc(t * sizeof (nvlist_t *), UMEM_NOFAIL); for (c = 0; c < t; c++) { - child[c] = make_vdev_mirror(path, aux, size, ashift, r, m); + child[c] = make_vdev_mirror(path, aux, pool, size, ashift, + r, m); VERIFY(nvlist_add_uint64(child[c], ZPOOL_CONFIG_IS_LOG, log) == 0); } @@ -911,6 +924,27 @@ make_vdev_root(char *path, char *aux, size_t size, uint64_t ashift, return (root); } +/* + * Find a random spa version. Returns back a random spa version in the + * range [initial_version, SPA_VERSION_FEATURES]. 
+ */ +static uint64_t +ztest_random_spa_version(uint64_t initial_version) +{ + uint64_t version = initial_version; + + if (version <= SPA_VERSION_BEFORE_FEATURES) { + version = version + + ztest_random(SPA_VERSION_BEFORE_FEATURES - version + 1); + } + + if (version > SPA_VERSION_BEFORE_FEATURES) + version = SPA_VERSION_FEATURES; + + ASSERT(SPA_VERSION_IS_SUPPORTED(version)); + return (version); +} + static int ztest_random_blocksize(void) { @@ -973,7 +1007,7 @@ ztest_dsl_prop_set_uint64(char *osname, zfs_prop_t prop, uint64_t value, ztest_record_enospc(FTAG); return (error); } - ASSERT3U(error, ==, 0); + ASSERT0(error); VERIFY3U(dsl_prop_get(osname, propname, sizeof (curval), 1, &curval, setpoint), ==, 0); @@ -1005,7 +1039,7 @@ ztest_spa_prop_set_uint64(zpool_prop_t prop, uint64_t value) ztest_record_enospc(FTAG); return (error); } - ASSERT3U(error, ==, 0); + ASSERT0(error); return (error); } @@ -1702,7 +1736,7 @@ ztest_replay_setattr(ztest_ds_t *zd, lr_setattr_t *lr, boolean_t byteswap) ASSERT3U(lr->lr_size, >=, sizeof (*bbt)); ASSERT3U(lr->lr_size, <=, db->db_size); - VERIFY3U(dmu_set_bonus(db, lr->lr_size, tx), ==, 0); + VERIFY0(dmu_set_bonus(db, lr->lr_size, tx)); bbt = ztest_bt_bonus(db); ztest_bt_generate(bbt, os, lr->lr_foid, -1ULL, lr->lr_mode, txg, crtxg); @@ -2224,6 +2258,7 @@ ztest_zil_remount(ztest_ds_t *zd, uint64_t id) { objset_t *os = zd->zd_os; + VERIFY(mutex_lock(&zd->zd_dirobj_lock) == 0); (void) rw_wrlock(&zd->zd_zilog_lock); /* zfsvfs_teardown() */ @@ -2234,6 +2269,7 @@ ztest_zil_remount(ztest_ds_t *zd, uint64_t id) zil_replay(os, zd, ztest_replay_vector); (void) rw_unlock(&zd->zd_zilog_lock); + VERIFY(mutex_unlock(&zd->zd_dirobj_lock) == 0); } /* @@ -2251,7 +2287,7 @@ ztest_spa_create_destroy(ztest_ds_t *zd, uint64_t id) /* * Attempt to create using a bad file. 
*/ - nvroot = make_vdev_root("/dev/bogus", NULL, 0, 0, 0, 0, 0, 1); + nvroot = make_vdev_root("/dev/bogus", NULL, NULL, 0, 0, 0, 0, 0, 1); VERIFY3U(ENOENT, ==, spa_create("ztest_bad_file", nvroot, NULL, NULL, NULL)); nvlist_free(nvroot); @@ -2259,7 +2295,7 @@ ztest_spa_create_destroy(ztest_ds_t *zd, uint64_t id) /* * Attempt to create using a bad mirror. */ - nvroot = make_vdev_root("/dev/bogus", NULL, 0, 0, 0, 0, 2, 1); + nvroot = make_vdev_root("/dev/bogus", NULL, NULL, 0, 0, 0, 0, 2, 1); VERIFY3U(ENOENT, ==, spa_create("ztest_bad_mirror", nvroot, NULL, NULL, NULL)); nvlist_free(nvroot); @@ -2269,7 +2305,7 @@ ztest_spa_create_destroy(ztest_ds_t *zd, uint64_t id) * what's in the nvroot; we should fail with EEXIST. */ (void) rw_rdlock(&ztest_name_lock); - nvroot = make_vdev_root("/dev/bogus", NULL, 0, 0, 0, 0, 0, 1); + nvroot = make_vdev_root("/dev/bogus", NULL, NULL, 0, 0, 0, 0, 0, 1); VERIFY3U(EEXIST, ==, spa_create(zo->zo_pool, nvroot, NULL, NULL, NULL)); nvlist_free(nvroot); VERIFY3U(0, ==, spa_open(zo->zo_pool, &spa, FTAG)); @@ -2279,6 +2315,78 @@ ztest_spa_create_destroy(ztest_ds_t *zd, uint64_t id) (void) rw_unlock(&ztest_name_lock); } +/* ARGSUSED */ +void +ztest_spa_upgrade(ztest_ds_t *zd, uint64_t id) +{ + spa_t *spa; + uint64_t initial_version = SPA_VERSION_INITIAL; + uint64_t version, newversion; + nvlist_t *nvroot, *props; + char *name; + + VERIFY0(mutex_lock(&ztest_vdev_lock)); + name = kmem_asprintf("%s_upgrade", ztest_opts.zo_pool); + + /* + * Clean up from previous runs. + */ + (void) spa_destroy(name); + + nvroot = make_vdev_root(NULL, NULL, name, ztest_opts.zo_vdev_size, 0, + 0, ztest_opts.zo_raidz, ztest_opts.zo_mirrors, 1); + + /* + * If we're configuring a RAIDZ device then make sure that the + * the initial version is capable of supporting that feature. 
+ */ + switch (ztest_opts.zo_raidz_parity) { + case 0: + case 1: + initial_version = SPA_VERSION_INITIAL; + break; + case 2: + initial_version = SPA_VERSION_RAIDZ2; + break; + case 3: + initial_version = SPA_VERSION_RAIDZ3; + break; + } + + /* + * Create a pool with a spa version that can be upgraded. Pick + * a value between initial_version and SPA_VERSION_BEFORE_FEATURES. + */ + do { + version = ztest_random_spa_version(initial_version); + } while (version > SPA_VERSION_BEFORE_FEATURES); + + props = fnvlist_alloc(); + fnvlist_add_uint64(props, + zpool_prop_to_name(ZPOOL_PROP_VERSION), version); + VERIFY0(spa_create(name, nvroot, props, NULL, NULL)); + fnvlist_free(nvroot); + fnvlist_free(props); + + VERIFY0(spa_open(name, &spa, FTAG)); + VERIFY3U(spa_version(spa), ==, version); + newversion = ztest_random_spa_version(version + 1); + + if (ztest_opts.zo_verbose >= 4) { + (void) printf("upgrading spa version from %llu to %llu\n", + (u_longlong_t)version, (u_longlong_t)newversion); + } + + spa_upgrade(spa, newversion); + VERIFY3U(spa_version(spa), >, version); + VERIFY3U(spa_version(spa), ==, fnvlist_lookup_uint64(spa->spa_config, + zpool_prop_to_name(ZPOOL_PROP_VERSION))); + spa_close(spa, FTAG); + + strfree(name); + VERIFY0(mutex_unlock(&ztest_vdev_lock)); +} + static vdev_t * vdev_lookup_by_path(vdev_t *vd, const char *path) { @@ -2368,7 +2476,7 @@ ztest_vdev_add_remove(ztest_ds_t *zd, uint64_t id) /* * Make 1/4 of the devices be log devices. */ - nvroot = make_vdev_root(NULL, NULL, + nvroot = make_vdev_root(NULL, NULL, NULL, ztest_opts.zo_vdev_size, 0, ztest_random(4) == 0, ztest_opts.zo_raidz, zs->zs_mirrors, 1); @@ -2445,7 +2553,7 @@ ztest_vdev_aux_add_remove(ztest_ds_t *zd, uint64_t id) /* * Add a new device. 
*/ - nvlist_t *nvroot = make_vdev_root(NULL, aux, + nvlist_t *nvroot = make_vdev_root(NULL, aux, NULL, (ztest_opts.zo_vdev_size * 5) / 4, 0, 0, 0, 0, 1); error = spa_vdev_add(spa, nvroot); if (error != 0) @@ -2714,7 +2822,7 @@ ztest_vdev_attach_detach(ztest_ds_t *zd, uint64_t id) /* * Build the nvlist describing newpath. */ - root = make_vdev_root(newpath, NULL, newvd == NULL ? newsize : 0, + root = make_vdev_root(newpath, NULL, NULL, newvd == NULL ? newsize : 0, ashift, 0, 0, 0, 1); error = spa_vdev_attach(spa, oldguid, root, replacing); @@ -3035,7 +3143,7 @@ ztest_objset_destroy_cb(const char *name, void *arg) error = dmu_object_info(os, ZTEST_DIROBJ, &doi); if (error != ENOENT) { /* We could have crashed in the middle of destroying it */ - ASSERT3U(error, ==, 0); + ASSERT0(error); ASSERT3U(doi.doi_type, ==, DMU_OT_ZAP_OTHER); ASSERT3S(doi.doi_physical_blocks_512, >=, 0); } @@ -3448,10 +3556,10 @@ ztest_dmu_read_write(ztest_ds_t *zd, uint64_t id) */ error = dmu_read(os, packobj, packoff, packsize, packbuf, DMU_READ_PREFETCH); - ASSERT3U(error, ==, 0); + ASSERT0(error); error = dmu_read(os, bigobj, bigoff, bigsize, bigbuf, DMU_READ_PREFETCH); - ASSERT3U(error, ==, 0); + ASSERT0(error); /* * Get a tx for the mods to both packobj and bigobj. 
@@ -3761,10 +3869,10 @@ ztest_dmu_read_write_zcopy(ztest_ds_t *zd, uint64_t id) if (i != 0 || ztest_random(2) != 0) { error = dmu_read(os, packobj, packoff, packsize, packbuf, DMU_READ_PREFETCH); - ASSERT3U(error, ==, 0); + ASSERT0(error); error = dmu_read(os, bigobj, bigoff, bigsize, bigbuf, DMU_READ_PREFETCH); - ASSERT3U(error, ==, 0); + ASSERT0(error); } compare_and_update_pbbufs(s, packbuf, bigbuf, bigsize, n, chunksize, txg); @@ -4035,7 +4143,7 @@ ztest_zap(ztest_ds_t *zd, uint64_t id) if (error == ENOENT) return; - ASSERT3U(error, ==, 0); + ASSERT0(error); tx = dmu_tx_create(os); dmu_tx_hold_zap(tx, object, B_TRUE, NULL); @@ -4231,7 +4339,7 @@ ztest_commit_callback(void *arg, int error) data->zcd_called = B_TRUE; if (error == ECANCELED) { - ASSERT3U(data->zcd_txg, ==, 0); + ASSERT0(data->zcd_txg); ASSERT(!data->zcd_added); /* @@ -4436,7 +4544,7 @@ ztest_spa_prop_get_set(ztest_ds_t *zd, uint64_t id) (void) ztest_spa_prop_set_uint64(ZPOOL_PROP_DEDUPDITTO, ZIO_DEDUPDITTO_MIN + ztest_random(ZIO_DEDUPDITTO_MIN)); - VERIFY3U(spa_prop_get(ztest_spa, &props), ==, 0); + VERIFY0(spa_prop_get(ztest_spa, &props)); if (ztest_opts.zo_verbose >= 6) dump_nvlist(props, 4); @@ -4859,13 +4967,19 @@ ztest_reguid(ztest_ds_t *zd, uint64_t id) { spa_t *spa = ztest_spa; uint64_t orig, load; + int error; orig = spa_guid(spa); load = spa_load_guid(spa); - if (spa_change_guid(spa) != 0) + + (void) rw_wrlock(&ztest_name_lock); + error = spa_change_guid(spa); + (void) rw_unlock(&ztest_name_lock); + + if (error != 0) return; - if (ztest_opts.zo_verbose >= 3) { + if (ztest_opts.zo_verbose >= 4) { (void) printf("Changed guid old %llu -> %llu\n", (u_longlong_t)orig, (u_longlong_t)spa_guid(spa)); } @@ -5255,7 +5369,7 @@ ztest_dataset_open(int d) } ASSERT(error == 0 || error == EEXIST); - VERIFY3U(dmu_objset_hold(name, zd, &os), ==, 0); + VERIFY0(dmu_objset_hold(name, zd, &os)); (void) rw_unlock(&ztest_name_lock); ztest_zd_init(zd, ZTEST_GET_SHARED_DS(d), os); @@ -5539,8 +5653,15 @@ 
ztest_freeze(void) */ kernel_init(FREAD | FWRITE); VERIFY3U(0, ==, spa_open(ztest_opts.zo_pool, &spa, FTAG)); + ASSERT(spa_freeze_txg(spa) == UINT64_MAX); VERIFY3U(0, ==, ztest_dataset_open(0)); ztest_dataset_close(0); + + spa->spa_debug = B_TRUE; + ztest_spa = spa; + txg_wait_synced(spa_get_dsl(spa), 0); + ztest_reguid(NULL, 0); + spa_close(spa, FTAG); kernel_fini(); } @@ -5575,10 +5696,9 @@ make_random_props() { nvlist_t *props; - if (ztest_random(2) == 0) - return (NULL); - VERIFY(nvlist_alloc(&props, NV_UNIQUE_NAME, 0) == 0); + if (ztest_random(2) == 0) + return (props); VERIFY(nvlist_add_uint64(props, "autoreplace", 1) == 0); return (props); @@ -5606,9 +5726,15 @@ ztest_init(ztest_shared_t *zs) ztest_shared->zs_vdev_next_leaf = 0; zs->zs_splits = 0; zs->zs_mirrors = ztest_opts.zo_mirrors; - nvroot = make_vdev_root(NULL, NULL, ztest_opts.zo_vdev_size, 0, + nvroot = make_vdev_root(NULL, NULL, NULL, ztest_opts.zo_vdev_size, 0, 0, ztest_opts.zo_raidz, zs->zs_mirrors, 1); props = make_random_props(); + for (int i = 0; i < SPA_FEATURES; i++) { + char buf[1024]; + (void) snprintf(buf, sizeof (buf), "feature@%s", + spa_feature_table[i].fi_uname); + VERIFY3U(0, ==, nvlist_add_uint64(props, buf, 0)); + } VERIFY3U(0, ==, spa_create(ztest_opts.zo_pool, nvroot, props, NULL, NULL)); nvlist_free(nvroot); @@ -5616,6 +5742,7 @@ ztest_init(ztest_shared_t *zs) VERIFY3U(0, ==, spa_open(ztest_opts.zo_pool, &spa, FTAG)); zs->zs_metaslab_sz = 1ULL << spa->spa_root_vdev->vdev_child[0]->vdev_ms_shift; + spa_close(spa, FTAG); kernel_fini(); @@ -5654,9 +5781,24 @@ setup_fds(void) ASSERT3U(fd, ==, ZTEST_FD_RAND); } +static int +shared_data_size(ztest_shared_hdr_t *hdr) +{ + int size; + + size = hdr->zh_hdr_size; + size += hdr->zh_opts_size; + size += hdr->zh_size; + size += hdr->zh_stats_size * hdr->zh_stats_count; + size += hdr->zh_ds_size * hdr->zh_ds_count; + + return (size); +} + static void setup_hdr(void) { + int size; ztest_shared_hdr_t *hdr; #ifndef illumos @@ -5667,6 +5809,8 @@ 
setup_hdr(void)
 	    PROT_READ | PROT_WRITE, MAP_SHARED, ZTEST_FD_DATA, 0);
 	ASSERT(hdr != MAP_FAILED);

+	VERIFY3U(0, ==, ftruncate(ZTEST_FD_DATA, sizeof (ztest_shared_hdr_t)));
+
 	hdr->zh_hdr_size = sizeof (ztest_shared_hdr_t);
 	hdr->zh_opts_size = sizeof (ztest_shared_opts_t);
 	hdr->zh_size = sizeof (ztest_shared_t);
@@ -5675,6 +5819,9 @@ setup_hdr(void)
 	hdr->zh_ds_size = sizeof (ztest_shared_ds_t);
 	hdr->zh_ds_count = ztest_opts.zo_datasets;

+	size = shared_data_size(hdr);
+	VERIFY3U(0, ==, ftruncate(ZTEST_FD_DATA, size));
+
 	(void) munmap((caddr_t)hdr, P2ROUNDUP(sizeof (*hdr), getpagesize()));
 }

@@ -5689,11 +5836,7 @@ setup_data(void)
 	    PROT_READ, MAP_SHARED, ZTEST_FD_DATA, 0);
 	ASSERT(hdr != MAP_FAILED);

-	size = hdr->zh_hdr_size;
-	size += hdr->zh_opts_size;
-	size += hdr->zh_size;
-	size += hdr->zh_stats_size * hdr->zh_stats_count;
-	size += hdr->zh_ds_size * hdr->zh_ds_count;
+	size = shared_data_size(hdr);

 	(void) munmap((caddr_t)hdr, P2ROUNDUP(sizeof (*hdr), getpagesize()));
 	hdr = ztest_shared_hdr = (void *)mmap(0, P2ROUNDUP(size, getpagesize()),
@@ -5817,6 +5960,8 @@ main(int argc, char **argv)

 	(void) setvbuf(stdout, NULL, _IOLBF, 0);

+	dprintf_setup(&argc, argv);
+
 	if (!ischild) {
 		process_options(argc, argv);

diff --git a/cddl/contrib/opensolaris/lib/libnvpair/libnvpair.c b/cddl/contrib/opensolaris/lib/libnvpair/libnvpair.c
index 142574873..3302ac798 100644
--- a/cddl/contrib/opensolaris/lib/libnvpair/libnvpair.c
+++ b/cddl/contrib/opensolaris/lib/libnvpair/libnvpair.c
@@ -20,6 +20,7 @@
  */
 /*
  * Copyright (c) 2000, 2010, Oracle and/or its affiliates. All rights reserved.
+ * Copyright (c) 2012 by Delphix. All rights reserved.
  */

 #include
@@ -802,6 +803,10 @@ dump_nvlist(nvlist_t *list, int indent)

 	while ((elem = nvlist_next_nvpair(list, elem)) != NULL) {
 		switch (nvpair_type(elem)) {
+		case DATA_TYPE_BOOLEAN:
+			(void) printf("%*s%s\n", indent, "", nvpair_name(elem));
+			break;
+
 		case DATA_TYPE_BOOLEAN_VALUE:
 			(void) nvpair_value_boolean_value(elem, &bool_value);
 			(void) printf("%*s%s: %s\n", indent, "",
diff --git a/cddl/contrib/opensolaris/lib/libuutil/common/uu_misc.c b/cddl/contrib/opensolaris/lib/libuutil/common/uu_misc.c
index 507d4eb13..b673834e4 100644
--- a/cddl/contrib/opensolaris/lib/libuutil/common/uu_misc.c
+++ b/cddl/contrib/opensolaris/lib/libuutil/common/uu_misc.c
@@ -25,6 +25,8 @@

 #include "libuutil_common.h"

+#define	HAVE_ASSFAIL 1
+
 #include
 #include
 #include
diff --git a/cddl/contrib/opensolaris/lib/libzfs/common/libzfs.h b/cddl/contrib/opensolaris/lib/libzfs/common/libzfs.h
index 989dd8072..3a3541056 100644
--- a/cddl/contrib/opensolaris/lib/libzfs/common/libzfs.h
+++ b/cddl/contrib/opensolaris/lib/libzfs/common/libzfs.h
@@ -294,6 +294,15 @@ typedef enum {
 	ZPOOL_STATUS_IO_FAILURE_CONTINUE, /* failed I/O, failmode 'continue' */
 	ZPOOL_STATUS_BAD_LOG,		/* cannot read log chain(s) */

+	/*
+	 * If the pool has unsupported features but can still be opened in
+	 * read-only mode, its status is ZPOOL_STATUS_UNSUP_FEAT_WRITE. If the
+	 * pool has unsupported features but cannot be opened at all, its
+	 * status is ZPOOL_STATUS_UNSUP_FEAT_READ.
+	 */
+	ZPOOL_STATUS_UNSUP_FEAT_READ,	/* unsupported features for read */
+	ZPOOL_STATUS_UNSUP_FEAT_WRITE,	/* unsupported features for write */
+
 	/*
 	 * These faults have no corresponding message ID. At the time we are
 	 * checking the status, the original reason for the FMA fault (I/O or
@@ -307,7 +316,8 @@ typedef enum {
 	 * requiring administrative attention. There is no corresponding
 	 * message ID.
 	 */
-	ZPOOL_STATUS_VERSION_OLDER,	/* older on-disk version */
+	ZPOOL_STATUS_VERSION_OLDER,	/* older legacy on-disk version */
+	ZPOOL_STATUS_FEAT_DISABLED,	/* supported features are disabled */
 	ZPOOL_STATUS_RESILVERING,	/* device being resilvered */
 	ZPOOL_STATUS_OFFLINE_DEV,	/* device online */
 	ZPOOL_STATUS_REMOVED_DEV,	/* removed device */
@@ -326,6 +336,7 @@ extern void zpool_dump_ddt(const ddt_stat_t *dds, const ddt_histogram_t *ddh);
  * Statistics and configuration functions.
  */
 extern nvlist_t *zpool_get_config(zpool_handle_t *, nvlist_t **);
+extern nvlist_t *zpool_get_features(zpool_handle_t *);
 extern int zpool_refresh_stats(zpool_handle_t *, boolean_t *);
 extern int zpool_get_errlog(zpool_handle_t *, nvlist_t **);

@@ -338,6 +349,7 @@ extern int zpool_import(libzfs_handle_t *, nvlist_t *, const char *,
     char *altroot);
 extern int zpool_import_props(libzfs_handle_t *, nvlist_t *, const char *,
     nvlist_t *, int);
+extern void zpool_print_unsup_feat(nvlist_t *config);

 /*
  * Search for pools to import
@@ -427,6 +439,8 @@ extern int zfs_prop_get_written_int(zfs_handle_t *zhp,
     const char *propname, uint64_t *propvalue);
 extern int zfs_prop_get_written(zfs_handle_t *zhp, const char *propname,
     char *propbuf, int proplen, boolean_t literal);
+extern int zfs_prop_get_feature(zfs_handle_t *zhp, const char *propname,
+    char *buf, size_t len);
 extern int zfs_get_snapused_int(zfs_handle_t *firstsnap, zfs_handle_t *lastsnap,
     uint64_t *usedp);
 extern uint64_t zfs_prop_get_int(zfs_handle_t *, zfs_prop_t);
@@ -454,10 +468,19 @@ extern void zfs_prune_proplist(zfs_handle_t *, uint8_t *);
 #define	ZFS_MOUNTPOINT_NONE	"none"
 #define	ZFS_MOUNTPOINT_LEGACY	"legacy"

+#define	ZFS_FEATURE_DISABLED	"disabled"
+#define	ZFS_FEATURE_ENABLED	"enabled"
+#define	ZFS_FEATURE_ACTIVE	"active"
+
+#define	ZFS_UNSUPPORTED_INACTIVE	"inactive"
+#define	ZFS_UNSUPPORTED_READONLY	"readonly"
+
 /*
  * zpool property management
  */
 extern int zpool_expand_proplist(zpool_handle_t *, zprop_list_t **);
+extern int
zpool_prop_get_feature(zpool_handle_t *, const char *, char *,
+    size_t);
 extern const char *zpool_prop_default_string(zpool_prop_t);
 extern uint64_t zpool_prop_default_numeric(zpool_prop_t);
 extern const char *zpool_prop_column_name(zpool_prop_t);
diff --git a/cddl/contrib/opensolaris/lib/libzfs/common/libzfs_config.c b/cddl/contrib/opensolaris/lib/libzfs/common/libzfs_config.c
index dc27238c9..d5ba20fde 100644
--- a/cddl/contrib/opensolaris/lib/libzfs/common/libzfs_config.c
+++ b/cddl/contrib/opensolaris/lib/libzfs/common/libzfs_config.c
@@ -18,11 +18,16 @@
  *
  * CDDL HEADER END
  */
+
 /*
  * Copyright 2009 Sun Microsystems, Inc. All rights reserved.
  * Use is subject to license terms.
  */

+/*
+ * Copyright (c) 2012 by Delphix. All rights reserved.
+ */
+
 /*
  * The pool configuration repository is stored in /etc/zfs/zpool.cache as a
  * single packed nvlist. While it would be nice to just read in this
@@ -217,6 +222,36 @@ zpool_get_config(zpool_handle_t *zhp, nvlist_t **oldconfig)
 	return (zhp->zpool_config);
 }

+/*
+ * Retrieves a list of enabled features and their refcounts and caches it in
+ * the pool handle.
+ */
+nvlist_t *
+zpool_get_features(zpool_handle_t *zhp)
+{
+	nvlist_t *config, *features;
+
+	config = zpool_get_config(zhp, NULL);
+
+	if (config == NULL || !nvlist_exists(config,
+	    ZPOOL_CONFIG_FEATURE_STATS)) {
+		int error;
+		boolean_t missing = B_FALSE;
+
+		error = zpool_refresh_stats(zhp, &missing);
+
+		if (error != 0 || missing)
+			return (NULL);
+
+		config = zpool_get_config(zhp, NULL);
+	}
+
+	verify(nvlist_lookup_nvlist(config, ZPOOL_CONFIG_FEATURE_STATS,
+	    &features) == 0);
+
+	return (features);
+}
+
 /*
  * Refresh the vdev statistics associated with the given pool. This is used in
  * iostat to show configuration changes and determine the delta from the last
@@ -301,6 +336,48 @@ zpool_refresh_stats(zpool_handle_t *zhp, boolean_t *missing)
 	return (0);
 }

+/*
+ * If the __ZFS_POOL_RESTRICT environment variable is set we only iterate over
+ * pools it lists.
+ *
+ * This is an undocumented feature for use during testing only.
+ *
+ * This function returns B_TRUE if the pool should be skipped
+ * during iteration.
+ */
+static boolean_t
+check_restricted(const char *poolname)
+{
+	static boolean_t initialized = B_FALSE;
+	static char *restricted = NULL;
+
+	const char *cur, *end;
+	int len, namelen;
+
+	if (!initialized) {
+		initialized = B_TRUE;
+		restricted = getenv("__ZFS_POOL_RESTRICT");
+	}
+
+	if (NULL == restricted)
+		return (B_FALSE);
+
+	cur = restricted;
+	namelen = strlen(poolname);
+	do {
+		end = strchr(cur, ' ');
+		len = (NULL == end) ? strlen(cur) : (end - cur);
+
+		if (len == namelen && 0 == strncmp(cur, poolname, len)) {
+			return (B_FALSE);
+		}
+
+		cur += (len + 1);
+	} while (NULL != end);
+
+	return (B_TRUE);
+}
+
 /*
  * Iterate over all pools in the system.
  */
@@ -324,6 +401,9 @@ zpool_iter(libzfs_handle_t *hdl, zpool_iter_f func, void *data)
 	for (cn = uu_avl_first(hdl->libzfs_ns_avl); cn != NULL;
 	    cn = uu_avl_next(hdl->libzfs_ns_avl, cn)) {

+		if (check_restricted(cn->cn_name))
+			continue;
+
 		if (zpool_open_silent(hdl, cn->cn_name, &zhp) != 0) {
 			hdl->libzfs_pool_iter--;
 			return (-1);
@@ -359,6 +439,9 @@ zfs_iter_root(libzfs_handle_t *hdl, zfs_iter_f func, void *data)
 	for (cn = uu_avl_first(hdl->libzfs_ns_avl); cn != NULL;
 	    cn = uu_avl_next(hdl->libzfs_ns_avl, cn)) {

+		if (check_restricted(cn->cn_name))
+			continue;
+
 		if ((zhp = make_dataset_handle(hdl, cn->cn_name)) == NULL)
 			continue;

diff --git a/cddl/contrib/opensolaris/lib/libzfs/common/libzfs_dataset.c b/cddl/contrib/opensolaris/lib/libzfs/common/libzfs_dataset.c
index 1d81f91f5..c54dfdb00 100644
--- a/cddl/contrib/opensolaris/lib/libzfs/common/libzfs_dataset.c
+++ b/cddl/contrib/opensolaris/lib/libzfs/common/libzfs_dataset.c
@@ -21,7 +21,7 @@

 /*
  * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
- * Copyright 2010 Nexenta Systems, Inc. All rights reserved.
+ * Copyright 2012 Nexenta Systems, Inc. All rights reserved.
  * Copyright (c) 2011 by Delphix. All rights reserved.
  * Copyright (c) 2012 DEY Storage Systems, Inc. All rights reserved.
  * Copyright (c) 2011-2012 Pawel Jakub Dawidek .
@@ -1430,7 +1430,7 @@ zfs_prop_set(zfs_handle_t *zhp, const char *propname, const char *propval)
 	libzfs_handle_t *hdl = zhp->zfs_hdl;
 	nvlist_t *nvl = NULL, *realprops;
 	zfs_prop_t prop;
-	boolean_t do_prefix;
+	boolean_t do_prefix = B_TRUE;
 	uint64_t idx;
 	int added_resv;

@@ -1484,14 +1484,17 @@ zfs_prop_set(zfs_handle_t *zhp, const char *propname, const char *propval)
 	}

 	/*
-	 * If the dataset's canmount property is being set to noauto,
-	 * or being set to on and the dataset is already mounted,
-	 * then we want to prevent unmounting & remounting it.
+	 * We don't want to unmount & remount the dataset when changing
+	 * its canmount property to 'on' or 'noauto'. We only use
+	 * the changelist logic to unmount when setting canmount=off.
 	 */
-	do_prefix = !((prop == ZFS_PROP_CANMOUNT) &&
-	    (zprop_string_to_index(prop, propval, &idx,
-	    ZFS_TYPE_DATASET) == 0) && (idx == ZFS_CANMOUNT_NOAUTO ||
-	    (idx == ZFS_CANMOUNT_ON && zfs_is_mounted(zhp, NULL))));
+	if (prop == ZFS_PROP_CANMOUNT) {
+		uint64_t idx;
+		int err = zprop_string_to_index(prop, propval, &idx,
+		    ZFS_TYPE_DATASET);
+		if (err == 0 && idx != ZFS_CANMOUNT_OFF)
+			do_prefix = B_FALSE;
+	}

 	if (do_prefix && (ret = changelist_prefix(cl)) != 0)
 		goto error;
@@ -3135,7 +3138,8 @@ zfs_create(libzfs_handle_t *hdl, const char *path, zfs_type_t type,

 /*
  * Destroys the given dataset. The caller must make sure that the filesystem
- * isn't mounted, and that there are no active dependents.
+ * isn't mounted, and that there are no active dependents. If the file system
+ * does not exist this function does nothing.
  */
 int
 zfs_destroy(zfs_handle_t *zhp, boolean_t defer)
@@ -3151,7 +3155,8 @@ zfs_destroy(zfs_handle_t *zhp, boolean_t defer)
 	}

 	zc.zc_defer_destroy = defer;
-	if (zfs_ioctl(zhp->zfs_hdl, ZFS_IOC_DESTROY, &zc) != 0) {
+	if (zfs_ioctl(zhp->zfs_hdl, ZFS_IOC_DESTROY, &zc) != 0 &&
+	    errno != ENOENT) {
 		return (zfs_standard_error_fmt(zhp->zfs_hdl, errno,
 		    dgettext(TEXT_DOMAIN, "cannot destroy '%s'"),
 		    zhp->zfs_name));
@@ -3541,7 +3546,7 @@ zfs_rollback(zfs_handle_t *zhp, zfs_handle_t *snap, boolean_t force)
 	    zhp->zfs_type == ZFS_TYPE_VOLUME);

 	/*
-	 * Destroy all recent snapshots and its dependends.
+	 * Destroy all recent snapshots and their dependents.
 	 */
 	cb.cb_force = force;
 	cb.cb_target = snap->zfs_name;
@@ -4072,35 +4077,40 @@ zfs_userspace(zfs_handle_t *zhp, zfs_userquota_prop_t type,
     zfs_userspace_cb_t func, void *arg)
 {
 	zfs_cmd_t zc = { 0 };
-	int error;
 	zfs_useracct_t buf[100];
+	libzfs_handle_t *hdl = zhp->zfs_hdl;
+	int ret;

 	(void) strlcpy(zc.zc_name, zhp->zfs_name, sizeof (zc.zc_name));

 	zc.zc_objset_type = type;
 	zc.zc_nvlist_dst = (uintptr_t)buf;

-	/* CONSTCOND */
-	while (1) {
+	for (;;) {
 		zfs_useracct_t *zua = buf;

 		zc.zc_nvlist_dst_size = sizeof (buf);
-		error = ioctl(zhp->zfs_hdl->libzfs_fd,
-		    ZFS_IOC_USERSPACE_MANY, &zc);
-		if (error || zc.zc_nvlist_dst_size == 0)
+		if (zfs_ioctl(hdl, ZFS_IOC_USERSPACE_MANY, &zc) != 0) {
+			char errbuf[ZFS_MAXNAMELEN + 32];
+
+			(void) snprintf(errbuf, sizeof (errbuf),
+			    dgettext(TEXT_DOMAIN,
+			    "cannot get used/quota for %s"), zc.zc_name);
+			return (zfs_standard_error_fmt(hdl, errno, errbuf));
+		}
+		if (zc.zc_nvlist_dst_size == 0)
 			break;

 		while (zc.zc_nvlist_dst_size > 0) {
-			error = func(arg, zua->zu_domain, zua->zu_rid,
-			    zua->zu_space);
-			if (error != 0)
-				return (error);
+			if ((ret = func(arg, zua->zu_domain, zua->zu_rid,
+			    zua->zu_space)) != 0)
+				return (ret);
 			zua++;
 			zc.zc_nvlist_dst_size -= sizeof (zfs_useracct_t);
 		}
 	}

-	return (error);
+	return (0);
 }

 int
diff --git a/cddl/contrib/opensolaris/lib/libzfs/common/libzfs_import.c
b/cddl/contrib/opensolaris/lib/libzfs/common/libzfs_import.c
index 7e73d0f9b..7e39b0b78 100644
--- a/cddl/contrib/opensolaris/lib/libzfs/common/libzfs_import.c
+++ b/cddl/contrib/opensolaris/lib/libzfs/common/libzfs_import.c
@@ -21,7 +21,7 @@
 /*
  * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
  * Copyright 2011 Nexenta Systems, Inc. All rights reserved.
- * Copyright (c) 2011 by Delphix. All rights reserved.
+ * Copyright (c) 2012 by Delphix. All rights reserved.
  */

 /*
@@ -437,8 +437,8 @@ get_configs(libzfs_handle_t *hdl, pool_list_t *pl, boolean_t active_ok)
 	uint_t i, nspares, nl2cache;
 	boolean_t config_seen;
 	uint64_t best_txg;
-	char *name, *hostname, *comment;
-	uint64_t version, guid;
+	char *name, *hostname;
+	uint64_t guid;
 	uint_t children = 0;
 	nvlist_t **child = NULL;
 	uint_t holes;
@@ -524,61 +524,54 @@ get_configs(libzfs_handle_t *hdl, pool_list_t *pl, boolean_t active_ok)
 				 * configuration:
 				 *
 				 * version
-				 * pool guid
-				 * name
+				 * pool guid
+				 * name
+				 * pool txg (if available)
 				 * comment (if available)
-				 * pool state
+				 * pool state
 				 * hostid (if available)
 				 * hostname (if available)
 				 */
-				uint64_t state;
+				uint64_t state, version, pool_txg;
+				char *comment = NULL;
+
+				version = fnvlist_lookup_uint64(tmp,
+				    ZPOOL_CONFIG_VERSION);
+				fnvlist_add_uint64(config,
+				    ZPOOL_CONFIG_VERSION, version);
+				guid = fnvlist_lookup_uint64(tmp,
+				    ZPOOL_CONFIG_POOL_GUID);
+				fnvlist_add_uint64(config,
+				    ZPOOL_CONFIG_POOL_GUID, guid);
+				name = fnvlist_lookup_string(tmp,
+				    ZPOOL_CONFIG_POOL_NAME);
+				fnvlist_add_string(config,
+				    ZPOOL_CONFIG_POOL_NAME, name);

-				verify(nvlist_lookup_uint64(tmp,
-				    ZPOOL_CONFIG_VERSION, &version) == 0);
-				if (nvlist_add_uint64(config,
-				    ZPOOL_CONFIG_VERSION, version) != 0)
-					goto nomem;
-				verify(nvlist_lookup_uint64(tmp,
-				    ZPOOL_CONFIG_POOL_GUID, &guid) == 0);
-				if (nvlist_add_uint64(config,
-				    ZPOOL_CONFIG_POOL_GUID, guid) != 0)
-					goto nomem;
-				verify(nvlist_lookup_string(tmp,
-				    ZPOOL_CONFIG_POOL_NAME, &name) == 0);
-				if
(nvlist_add_string(config,
-				    ZPOOL_CONFIG_POOL_NAME, name) != 0)
-					goto nomem;
+				if (nvlist_lookup_uint64(tmp,
+				    ZPOOL_CONFIG_POOL_TXG, &pool_txg) == 0)
+					fnvlist_add_uint64(config,
+					    ZPOOL_CONFIG_POOL_TXG, pool_txg);

-				/*
-				 * COMMENT is optional, don't bail if it's not
-				 * there, instead, set it to NULL.
-				 */
 				if (nvlist_lookup_string(tmp,
-				    ZPOOL_CONFIG_COMMENT, &comment) != 0)
-					comment = NULL;
-				else if (nvlist_add_string(config,
-				    ZPOOL_CONFIG_COMMENT, comment) != 0)
-					goto nomem;
+				    ZPOOL_CONFIG_COMMENT, &comment) == 0)
+					fnvlist_add_string(config,
+					    ZPOOL_CONFIG_COMMENT, comment);

-				verify(nvlist_lookup_uint64(tmp,
-				    ZPOOL_CONFIG_POOL_STATE, &state) == 0);
-				if (nvlist_add_uint64(config,
-				    ZPOOL_CONFIG_POOL_STATE, state) != 0)
-					goto nomem;
+				state = fnvlist_lookup_uint64(tmp,
+				    ZPOOL_CONFIG_POOL_STATE);
+				fnvlist_add_uint64(config,
+				    ZPOOL_CONFIG_POOL_STATE, state);

 				hostid = 0;
 				if (nvlist_lookup_uint64(tmp,
 				    ZPOOL_CONFIG_HOSTID, &hostid) == 0) {
-					if (nvlist_add_uint64(config,
-					    ZPOOL_CONFIG_HOSTID, hostid) != 0)
-						goto nomem;
-					verify(nvlist_lookup_string(tmp,
-					    ZPOOL_CONFIG_HOSTNAME,
-					    &hostname) == 0);
-					if (nvlist_add_string(config,
-					    ZPOOL_CONFIG_HOSTNAME,
-					    hostname) != 0)
-						goto nomem;
+					fnvlist_add_uint64(config,
+					    ZPOOL_CONFIG_HOSTID, hostid);
+					hostname = fnvlist_lookup_string(tmp,
+					    ZPOOL_CONFIG_HOSTNAME);
+					fnvlist_add_string(config,
+					    ZPOOL_CONFIG_HOSTNAME, hostname);
 				}

 				config_seen = B_TRUE;
diff --git a/cddl/contrib/opensolaris/lib/libzfs/common/libzfs_pool.c b/cddl/contrib/opensolaris/lib/libzfs/common/libzfs_pool.c
index 723a52336..03bc3e658 100644
--- a/cddl/contrib/opensolaris/lib/libzfs/common/libzfs_pool.c
+++ b/cddl/contrib/opensolaris/lib/libzfs/common/libzfs_pool.c
@@ -43,6 +43,7 @@
 #include "zfs_prop.h"
 #include "libzfs_impl.h"
 #include "zfs_comutil.h"
+#include "zfeature_common.h"

 static int read_efi_label(nvlist_t *config, diskaddr_t *sb);

@@ -301,6 +302,7 @@ zpool_get_prop(zpool_handle_t *zhp, zpool_prop_t prop, char *buf, size_t
len,
 		case ZPOOL_PROP_SIZE:
 		case ZPOOL_PROP_ALLOCATED:
 		case ZPOOL_PROP_FREE:
+		case ZPOOL_PROP_FREEING:
 		case ZPOOL_PROP_EXPANDSZ:
 			(void) zfs_nicenum(intval, buf, len);
 			break;
@@ -326,6 +328,12 @@ zpool_get_prop(zpool_handle_t *zhp, zpool_prop_t prop, char *buf, size_t len,
 			(void) strlcpy(buf, zpool_state_to_name(intval,
 			    vs->vs_aux), len);
 			break;
+		case ZPOOL_PROP_VERSION:
+			if (intval >= SPA_VERSION_FEATURES) {
+				(void) snprintf(buf, len, "-");
+				break;
+			}
+			/* FALLTHROUGH */
 		default:
 			(void) snprintf(buf, len, "%llu", intval);
 		}
@@ -430,10 +438,48 @@ zpool_valid_proplist(libzfs_handle_t *hdl, const char *poolname,
 	while ((elem = nvlist_next_nvpair(props, elem)) != NULL) {
 		const char *propname = nvpair_name(elem);

+		prop = zpool_name_to_prop(propname);
+		if (prop == ZPROP_INVAL && zpool_prop_feature(propname)) {
+			int err;
+			zfeature_info_t *feature;
+			char *fname = strchr(propname, '@') + 1;
+
+			err = zfeature_lookup_name(fname, &feature);
+			if (err != 0) {
+				ASSERT3U(err, ==, ENOENT);
+				zfs_error_aux(hdl, dgettext(TEXT_DOMAIN,
+				    "invalid feature '%s'"), fname);
+				(void) zfs_error(hdl, EZFS_BADPROP, errbuf);
+				goto error;
+			}
+
+			if (nvpair_type(elem) != DATA_TYPE_STRING) {
+				zfs_error_aux(hdl, dgettext(TEXT_DOMAIN,
+				    "'%s' must be a string"), propname);
+				(void) zfs_error(hdl, EZFS_BADPROP, errbuf);
+				goto error;
+			}
+
+			(void) nvpair_value_string(elem, &strval);
+			if (strcmp(strval, ZFS_FEATURE_ENABLED) != 0) {
+				zfs_error_aux(hdl, dgettext(TEXT_DOMAIN,
+				    "property '%s' can only be set to "
+				    "'enabled'"), propname);
+				(void) zfs_error(hdl, EZFS_BADPROP, errbuf);
+				goto error;
+			}
+
+			if (nvlist_add_uint64(retprops, propname, 0) != 0) {
+				(void) no_memory(hdl);
+				goto error;
+			}
+			continue;
+		}
+
 		/*
 		 * Make sure this property is valid and applies to this type.
 		 */
-		if ((prop = zpool_name_to_prop(propname)) == ZPROP_INVAL) {
+		if (prop == ZPROP_INVAL) {
 			zfs_error_aux(hdl, dgettext(TEXT_DOMAIN,
 			    "invalid property '%s'"), propname);
 			(void) zfs_error(hdl, EZFS_BADPROP, errbuf);
@@ -456,7 +502,8 @@ zpool_valid_proplist(libzfs_handle_t *hdl, const char *poolname,
 		 */
 		switch (prop) {
 		case ZPOOL_PROP_VERSION:
-			if (intval < version || intval > SPA_VERSION) {
+			if (intval < version ||
+			    !SPA_VERSION_IS_SUPPORTED(intval)) {
 				zfs_error_aux(hdl, dgettext(TEXT_DOMAIN,
 				    "property '%s' number %d is invalid."),
 				    propname, intval);
@@ -680,10 +727,77 @@ zpool_expand_proplist(zpool_handle_t *zhp, zprop_list_t **plp)
 	libzfs_handle_t *hdl = zhp->zpool_hdl;
 	zprop_list_t *entry;
 	char buf[ZFS_MAXPROPLEN];
+	nvlist_t *features = NULL;
+	zprop_list_t **last;
+	boolean_t firstexpand = (NULL == *plp);

 	if (zprop_expand_list(hdl, plp, ZFS_TYPE_POOL) != 0)
 		return (-1);

+	last = plp;
+	while (*last != NULL)
+		last = &(*last)->pl_next;
+
+	if ((*plp)->pl_all)
+		features = zpool_get_features(zhp);
+
+	if ((*plp)->pl_all && firstexpand) {
+		for (int i = 0; i < SPA_FEATURES; i++) {
+			zprop_list_t *entry = zfs_alloc(hdl,
+			    sizeof (zprop_list_t));
+			entry->pl_prop = ZPROP_INVAL;
+			entry->pl_user_prop = zfs_asprintf(hdl, "feature@%s",
+			    spa_feature_table[i].fi_uname);
+			entry->pl_width = strlen(entry->pl_user_prop);
+			entry->pl_all = B_TRUE;
+
+			*last = entry;
+			last = &entry->pl_next;
+		}
+	}
+
+	/* add any unsupported features */
+	for (nvpair_t *nvp = nvlist_next_nvpair(features, NULL);
+	    nvp != NULL; nvp = nvlist_next_nvpair(features, nvp)) {
+		char *propname;
+		boolean_t found;
+		zprop_list_t *entry;
+
+		if (zfeature_is_supported(nvpair_name(nvp)))
+			continue;
+
+		propname = zfs_asprintf(hdl, "unsupported@%s",
+		    nvpair_name(nvp));
+
+		/*
+		 * Before adding the property to the list make sure that no
+		 * other pool already added the same property.
+		 */
+		found = B_FALSE;
+		entry = *plp;
+		while (entry != NULL) {
+			if (entry->pl_user_prop != NULL &&
+			    strcmp(propname, entry->pl_user_prop) == 0) {
+				found = B_TRUE;
+				break;
+			}
+			entry = entry->pl_next;
+		}
+		if (found) {
+			free(propname);
+			continue;
+		}
+
+		entry = zfs_alloc(hdl, sizeof (zprop_list_t));
+		entry->pl_prop = ZPROP_INVAL;
+		entry->pl_user_prop = propname;
+		entry->pl_width = strlen(entry->pl_user_prop);
+		entry->pl_all = B_TRUE;
+
+		*last = entry;
+		last = &entry->pl_next;
+	}
+
 	for (entry = *plp; entry != NULL; entry = entry->pl_next) {

 		if (entry->pl_fixed)
@@ -700,6 +814,66 @@ zpool_expand_proplist(zpool_handle_t *zhp, zprop_list_t **plp)
 	return (0);
 }

+/*
+ * Get the state for the given feature on the given ZFS pool.
+ */
+int
+zpool_prop_get_feature(zpool_handle_t *zhp, const char *propname, char *buf,
+    size_t len)
+{
+	uint64_t refcount;
+	boolean_t found = B_FALSE;
+	nvlist_t *features = zpool_get_features(zhp);
+	boolean_t supported;
+	const char *feature = strchr(propname, '@') + 1;
+
+	supported = zpool_prop_feature(propname);
+	ASSERT(supported || zpool_prop_unsupported(propname));
+
+	/*
+	 * Convert from feature name to feature guid. This conversion is
+	 * unecessary for unsupported@... properties because they already
+	 * use guids.
+	 */
+	if (supported) {
+		int ret;
+		zfeature_info_t *fi;
+
+		ret = zfeature_lookup_name(feature, &fi);
+		if (ret != 0) {
+			(void) strlcpy(buf, "-", len);
+			return (ENOTSUP);
+		}
+		feature = fi->fi_guid;
+	}
+
+	if (nvlist_lookup_uint64(features, feature, &refcount) == 0)
+		found = B_TRUE;
+
+	if (supported) {
+		if (!found) {
+			(void) strlcpy(buf, ZFS_FEATURE_DISABLED, len);
+		} else {
+			if (refcount == 0)
+				(void) strlcpy(buf, ZFS_FEATURE_ENABLED, len);
+			else
+				(void) strlcpy(buf, ZFS_FEATURE_ACTIVE, len);
+		}
+	} else {
+		if (found) {
+			if (refcount == 0) {
+				(void) strcpy(buf, ZFS_UNSUPPORTED_INACTIVE);
+			} else {
+				(void) strcpy(buf, ZFS_UNSUPPORTED_READONLY);
+			}
+		} else {
+			(void) strlcpy(buf, "-", len);
+			return (ENOTSUP);
+		}
+	}
+
+	return (0);
+}

 /*
  * Don't start the slice at the default block of 34; many storage
@@ -1286,8 +1460,10 @@ zpool_rewind_exclaim(libzfs_handle_t *hdl, const char *name, boolean_t dryrun,
 	if (!hdl->libzfs_printerr || config == NULL)
 		return;

-	if (nvlist_lookup_nvlist(config, ZPOOL_CONFIG_LOAD_INFO, &nv) != 0)
+	if (nvlist_lookup_nvlist(config, ZPOOL_CONFIG_LOAD_INFO, &nv) != 0 ||
+	    nvlist_lookup_nvlist(nv, ZPOOL_CONFIG_REWIND_INFO, &nv) != 0) {
 		return;
+	}

 	if (nvlist_lookup_uint64(nv, ZPOOL_CONFIG_LOAD_TIME, &rewindto) != 0)
 		return;
@@ -1343,6 +1519,7 @@ zpool_explain_recover(libzfs_handle_t *hdl, const char *name, int reason,

 	/* All attempted rewinds failed if ZPOOL_CONFIG_LOAD_TIME missing */
 	if (nvlist_lookup_nvlist(config, ZPOOL_CONFIG_LOAD_INFO, &nv) != 0 ||
+	    nvlist_lookup_nvlist(nv, ZPOOL_CONFIG_REWIND_INFO, &nv) != 0 ||
 	    nvlist_lookup_uint64(nv, ZPOOL_CONFIG_LOAD_TIME, &rewindto) != 0)
 		goto no_info;

@@ -1465,6 +1642,30 @@ print_vdev_tree(libzfs_handle_t *hdl, const char *name, nvlist_t *nv,
 	}
 }

+void
+zpool_print_unsup_feat(nvlist_t *config)
+{
+	nvlist_t *nvinfo, *unsup_feat;
+
+	verify(nvlist_lookup_nvlist(config, ZPOOL_CONFIG_LOAD_INFO, &nvinfo) ==
+	    0);
+	verify(nvlist_lookup_nvlist(nvinfo, ZPOOL_CONFIG_UNSUP_FEAT,
+
	    &unsup_feat) == 0);
+
+	for (nvpair_t *nvp = nvlist_next_nvpair(unsup_feat, NULL); nvp != NULL;
+	    nvp = nvlist_next_nvpair(unsup_feat, nvp)) {
+		char *desc;
+
+		verify(nvpair_type(nvp) == DATA_TYPE_STRING);
+		verify(nvpair_value_string(nvp, &desc) == 0);
+
+		if (strlen(desc) > 0)
+			(void) printf("\t%s (%s)\n", nvpair_name(nvp), desc);
+		else
+			(void) printf("\t%s\n", nvpair_name(nvp));
+	}
+}
+
 /*
  * Import the given pool using the known configuration and a list of
  * properties to be set. The configuration should have come from
@@ -1571,6 +1772,22 @@ zpool_import_props(libzfs_handle_t *hdl, nvlist_t *config, const char *newname,

 		switch (error) {
 		case ENOTSUP:
+			if (nv != NULL && nvlist_lookup_nvlist(nv,
+			    ZPOOL_CONFIG_LOAD_INFO, &nvinfo) == 0 &&
+			    nvlist_exists(nvinfo, ZPOOL_CONFIG_UNSUP_FEAT)) {
+				(void) printf(dgettext(TEXT_DOMAIN, "This "
+				    "pool uses the following feature(s) not "
+				    "supported by this system:\n"));
+				zpool_print_unsup_feat(nv);
+				if (nvlist_exists(nvinfo,
+				    ZPOOL_CONFIG_CAN_RDONLY)) {
+					(void) printf(dgettext(TEXT_DOMAIN,
+					    "All unsupported features are only "
+					    "required for writing to the pool."
+					    "\nThe pool can be imported using "
+					    "'-o readonly=on'.\n"));
+				}
+			}
 			/*
 			 * Unsupported version.
 			 */
diff --git a/cddl/contrib/opensolaris/lib/libzfs/common/libzfs_sendrecv.c b/cddl/contrib/opensolaris/lib/libzfs/common/libzfs_sendrecv.c
index 7a6418b37..6bb16ac73 100644
--- a/cddl/contrib/opensolaris/lib/libzfs/common/libzfs_sendrecv.c
+++ b/cddl/contrib/opensolaris/lib/libzfs/common/libzfs_sendrecv.c
@@ -21,7 +21,7 @@

 /*
  * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
- * Copyright (c) 2011 by Delphix. All rights reserved.
+ * Copyright (c) 2012 by Delphix. All rights reserved.
  * Copyright (c) 2012, Joyent, Inc. All rights reserved.
  * Copyright (c) 2012 Pawel Jakub Dawidek .
  * All rights reserved.
@@ -1387,7 +1387,6 @@ zfs_send(zfs_handle_t *zhp, const char *fromsnap, const char *tosnap,
 	avl_tree_t *fsavl = NULL;
 	static uint64_t holdseq;
 	int spa_version;
-	boolean_t holdsnaps = B_FALSE;
 	pthread_t tid;
 	int pipefd[2];
 	dedup_arg_t dda = { 0 };
@@ -1410,11 +1409,6 @@ zfs_send(zfs_handle_t *zhp, const char *fromsnap, const char *tosnap,
 		}
 	}

-	if (!flags->dryrun && zfs_spa_version(zhp, &spa_version) == 0 &&
-	    spa_version >= SPA_VERSION_USERREFS &&
-	    (flags->doall || flags->replicate))
-		holdsnaps = B_TRUE;
-
 	if (flags->dedup && !flags->dryrun) {
 		featureflags |= (DMU_BACKUP_FEATURE_DEDUP |
 		    DMU_BACKUP_FEATURE_DEDUPPROPS);
@@ -1536,7 +1530,18 @@ zfs_send(zfs_handle_t *zhp, const char *fromsnap, const char *tosnap,
 	sdd.filter_cb_arg = cb_arg;
 	if (debugnvp)
 		sdd.debugnv = *debugnvp;
-	if (holdsnaps || flags->progress) {
+
+	/*
+	 * Some flags require that we place user holds on the datasets that are
+	 * being sent so they don't get destroyed during the send. We can skip
+	 * this step if the pool is imported read-only since the datasets cannot
+	 * be destroyed.
+	 */
+	if (!flags->dryrun && !zpool_get_prop_int(zfs_get_pool_handle(zhp),
+	    ZPOOL_PROP_READONLY, NULL) &&
+	    zfs_spa_version(zhp, &spa_version) == 0 &&
+	    spa_version >= SPA_VERSION_USERREFS &&
+	    (flags->doall || flags->replicate)) {
 		++holdseq;
 		(void) snprintf(sdd.holdtag, sizeof (sdd.holdtag),
 		    ".send-%d-%llu", getpid(), (u_longlong_t)holdseq);
diff --git a/cddl/contrib/opensolaris/lib/libzfs/common/libzfs_status.c b/cddl/contrib/opensolaris/lib/libzfs/common/libzfs_status.c
index 24725ec04..560bacdc3 100644
--- a/cddl/contrib/opensolaris/lib/libzfs/common/libzfs_status.c
+++ b/cddl/contrib/opensolaris/lib/libzfs/common/libzfs_status.c
@@ -18,8 +18,10 @@
  *
  * CDDL HEADER END
  */
+
 /*
  * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
+ * Copyright (c) 2012 by Delphix. All rights reserved.
  */

 /*
@@ -42,6 +44,7 @@
 #include
 #include
 #include "libzfs_impl.h"
+#include "zfeature_common.h"

 /*
  * Message ID table. This must be kept in sync with the ZPOOL_STATUS_* defines
@@ -213,6 +216,20 @@ check_status(nvlist_t *config, boolean_t isimport)
 	    vs->vs_aux == VDEV_AUX_VERSION_NEWER)
 		return (ZPOOL_STATUS_VERSION_NEWER);

+	/*
+	 * Unsupported feature(s).
+	 */
+	if (vs->vs_state == VDEV_STATE_CANT_OPEN &&
+	    vs->vs_aux == VDEV_AUX_UNSUP_FEAT) {
+		nvlist_t *nvinfo;
+
+		verify(nvlist_lookup_nvlist(config, ZPOOL_CONFIG_LOAD_INFO,
+		    &nvinfo) == 0);
+		if (nvlist_exists(nvinfo, ZPOOL_CONFIG_CAN_RDONLY))
+			return (ZPOOL_STATUS_UNSUP_FEAT_WRITE);
+		return (ZPOOL_STATUS_UNSUP_FEAT_READ);
+	}
+
 	/*
 	 * Check that the config is complete.
 	 */
@@ -300,9 +317,33 @@ check_status(nvlist_t *config, boolean_t isimport)
 	/*
 	 * Outdated, but usable, version
 	 */
-	if (version < SPA_VERSION)
+	if (SPA_VERSION_IS_SUPPORTED(version) && version != SPA_VERSION)
 		return (ZPOOL_STATUS_VERSION_OLDER);

+	/*
+	 * Usable pool with disabled features
+	 */
+	if (version >= SPA_VERSION_FEATURES) {
+		int i;
+		nvlist_t *feat;
+
+		if (isimport) {
+			feat = fnvlist_lookup_nvlist(config,
+			    ZPOOL_CONFIG_LOAD_INFO);
+			feat = fnvlist_lookup_nvlist(feat,
+			    ZPOOL_CONFIG_ENABLED_FEAT);
+		} else {
+			feat = fnvlist_lookup_nvlist(config,
+			    ZPOOL_CONFIG_FEATURE_STATS);
+		}
+
+		for (i = 0; i < SPA_FEATURES; i++) {
+			zfeature_info_t *fi = &spa_feature_table[i];
+			if (!nvlist_exists(feat, fi->fi_guid))
+				return (ZPOOL_STATUS_FEAT_DISABLED);
+		}
+	}
+
 	return (ZPOOL_STATUS_OK);
 }

diff --git a/cddl/contrib/opensolaris/lib/libzfs/common/libzfs_util.c b/cddl/contrib/opensolaris/lib/libzfs/common/libzfs_util.c
index c903696fe..0db2dfd1b 100644
--- a/cddl/contrib/opensolaris/lib/libzfs/common/libzfs_util.c
+++ b/cddl/contrib/opensolaris/lib/libzfs/common/libzfs_util.c
@@ -18,9 +18,10 @@
  *
  * CDDL HEADER END
  */
+
 /*
  * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
- * Copyright (c) 2011 by Delphix.
All rights reserved.
+ * Copyright (c) 2012 by Delphix. All rights reserved.
  */

 /*
@@ -50,6 +51,7 @@

 #include "libzfs_impl.h"
 #include "zfs_prop.h"
+#include "zfeature_common.h"

 int aok;

@@ -119,7 +121,8 @@ libzfs_error_description(libzfs_handle_t *hdl)
 	case EZFS_RESILVERING:
 		return (dgettext(TEXT_DOMAIN, "currently resilvering"));
 	case EZFS_BADVERSION:
-		return (dgettext(TEXT_DOMAIN, "unsupported version"));
+		return (dgettext(TEXT_DOMAIN, "unsupported version or "
+		    "feature"));
 	case EZFS_POOLUNAVAIL:
 		return (dgettext(TEXT_DOMAIN, "pool is unavailable"));
 	case EZFS_DEVOVERFLOW:
@@ -656,6 +659,7 @@ libzfs_init(void)

 	zfs_prop_init();
 	zpool_prop_init();
+	zpool_feature_init();
 	libzfs_mnttab_init(hdl);

 	return (hdl);
@@ -1325,9 +1329,11 @@ addlist(libzfs_handle_t *hdl, char *propname, zprop_list_t **listp,
 	 * this is a pool property or if this isn't a user-defined
 	 * dataset property,
 	 */
-	if (prop == ZPROP_INVAL && (type == ZFS_TYPE_POOL ||
-	    (!zfs_prop_user(propname) && !zfs_prop_userquota(propname) &&
-	    !zfs_prop_written(propname)))) {
+	if (prop == ZPROP_INVAL && ((type == ZFS_TYPE_POOL &&
+	    !zpool_prop_feature(propname) &&
+	    !zpool_prop_unsupported(propname)) ||
+	    (type == ZFS_TYPE_DATASET && !zfs_prop_user(propname) &&
+	    !zfs_prop_userquota(propname) && !zfs_prop_written(propname)))) {
 		zfs_error_aux(hdl, dgettext(TEXT_DOMAIN,
 		    "invalid property '%s'"), propname);
 		return (zfs_error(hdl, EZFS_BADPROP,
@@ -1339,7 +1345,8 @@ addlist(libzfs_handle_t *hdl, char *propname, zprop_list_t **listp,

 	entry->pl_prop = prop;
 	if (prop == ZPROP_INVAL) {
-		if ((entry->pl_user_prop = zfs_strdup(hdl, propname)) == NULL) {
+		if ((entry->pl_user_prop = zfs_strdup(hdl, propname)) ==
+		    NULL) {
 			free(entry);
 			return (-1);
 		}
diff --git a/cddl/contrib/opensolaris/lib/libzpool/common/kernel.c b/cddl/contrib/opensolaris/lib/libzpool/common/kernel.c
index 2c0778777..56bf7181d 100644
--- a/cddl/contrib/opensolaris/lib/libzpool/common/kernel.c
+++
b/cddl/contrib/opensolaris/lib/libzpool/common/kernel.c
@@ -474,7 +474,9 @@ vn_rdwr(int uio, vnode_t *vp, void *addr, ssize_t len, offset_t offset,
 		 * To simulate partial disk writes, we split writes into two
 		 * system calls so that the process can be killed in between.
 		 */
-		split = (len > 0 ? rand() % len : 0);
+		int sectors = len >> SPA_MINBLOCKSHIFT;
+		split = (sectors > 0 ? rand() % sectors : 0) <<
+		    SPA_MINBLOCKSHIFT;
 		iolen = pwrite64(vp->v_fd, addr, split, offset);
 		iolen += pwrite64(vp->v_fd, (char *)addr + split,
 		    len - split, offset + split);
diff --git a/cddl/contrib/opensolaris/lib/libzpool/common/sys/zfs_context.h b/cddl/contrib/opensolaris/lib/libzpool/common/sys/zfs_context.h
index 52224b4bc..b3da1395b 100644
--- a/cddl/contrib/opensolaris/lib/libzpool/common/sys/zfs_context.h
+++ b/cddl/contrib/opensolaris/lib/libzpool/common/sys/zfs_context.h
@@ -34,7 +34,6 @@ extern "C" {
 #define	_SYS_RWLOCK_H
 #define	_SYS_CONDVAR_H
 #define	_SYS_SYSTM_H
-#define	_SYS_DEBUG_H
 #define	_SYS_T_LOCK_H
 #define	_SYS_VNODE_H
 #define	_SYS_VFS_H
@@ -75,7 +74,6 @@ extern "C" {
 #include
 #include
 #include
-#include
 #include
 #include
 #include
@@ -85,6 +83,7 @@ extern "C" {
 #include
 #include
 #include
+#include

 #define	ZFS_EXPORTS_PATH	"/etc/zfs/exports"

@@ -124,60 +123,6 @@ extern void vpanic(const char *, __va_list);

 extern int aok;

-/* This definition is copied from assert.h.
*/ -#if defined(__STDC__) -#if __STDC_VERSION__ - 0 >= 199901L -#define zverify(EX) (void)((EX) || (aok) || \ - (__assert(#EX, __FILE__, __LINE__), 0)) -#else -#define zverify(EX) (void)((EX) || (aok) || \ - (__assert(#EX, __FILE__, __LINE__), 0)) -#endif /* __STDC_VERSION__ - 0 >= 199901L */ -#else -#define zverify(EX) (void)((EX) || (aok) || \ - (_assert("EX", __FILE__, __LINE__), 0)) -#endif /* __STDC__ */ - - -#define VERIFY zverify -#define ASSERT zverify -#undef assert -#define assert zverify - -extern void __assert(const char *, const char *, int); - -#ifdef lint -#define VERIFY3_IMPL(x, y, z, t) if (x == z) ((void)0) -#else -/* BEGIN CSTYLED */ -#define VERIFY3_IMPL(LEFT, OP, RIGHT, TYPE) do { \ - const TYPE __left = (TYPE)(LEFT); \ - const TYPE __right = (TYPE)(RIGHT); \ - if (!(__left OP __right) && (!aok)) { \ - char *__buf = alloca(256); \ - (void) snprintf(__buf, 256, "%s %s %s (0x%llx %s 0x%llx)", \ - #LEFT, #OP, #RIGHT, \ - (u_longlong_t)__left, #OP, (u_longlong_t)__right); \ - __assert(__buf, __FILE__, __LINE__); \ - } \ -_NOTE(CONSTCOND) } while (0) -/* END CSTYLED */ -#endif /* lint */ - -#define VERIFY3S(x, y, z) VERIFY3_IMPL(x, y, z, int64_t) -#define VERIFY3U(x, y, z) VERIFY3_IMPL(x, y, z, uint64_t) -#define VERIFY3P(x, y, z) VERIFY3_IMPL(x, y, z, uintptr_t) - -#ifdef NDEBUG -#define ASSERT3S(x, y, z) ((void)0) -#define ASSERT3U(x, y, z) ((void)0) -#define ASSERT3P(x, y, z) ((void)0) -#else -#define ASSERT3S(x, y, z) VERIFY3S(x, y, z) -#define ASSERT3U(x, y, z) VERIFY3U(x, y, z) -#define ASSERT3P(x, y, z) VERIFY3P(x, y, z) -#endif - /* * DTrace SDT probes have different signatures in userland than they do in * kernel. 
If they're being used in kernel code, re-define them out of diff --git a/cddl/lib/libnvpair/Makefile b/cddl/lib/libnvpair/Makefile index 7bf500193..1a97bf909 100644 --- a/cddl/lib/libnvpair/Makefile +++ b/cddl/lib/libnvpair/Makefile @@ -8,11 +8,16 @@ LIB= nvpair SRCS= libnvpair.c \ nvpair_alloc_system.c \ nvpair_alloc_fixed.c \ - nvpair.c + nvpair.c \ + fnvpair.c CFLAGS+= -I${.CURDIR}/../../../cddl/compat/opensolaris/include +CFLAGS+= -I${.CURDIR}/../../../cddl/contrib/opensolaris/lib/libzpool/common CFLAGS+= -I${.CURDIR}/../../../sys/cddl/compat/opensolaris CFLAGS+= -I${.CURDIR}/../../../sys/cddl/contrib/opensolaris/uts/common +CFLAGS+= -I${.CURDIR}/../../../sys/cddl/contrib/opensolaris/uts/common/fs/zfs CFLAGS+= -I${.CURDIR}/../../../sys +CFLAGS+= -I${.CURDIR}/../../../cddl/contrib/opensolaris/head +CFLAGS+= -I${.CURDIR}/../../../cddl/compat/opensolaris/lib/libumem .include diff --git a/cddl/lib/libzfs/Makefile b/cddl/lib/libzfs/Makefile index 74dfe74b4..500024dbf 100644 --- a/cddl/lib/libzfs/Makefile +++ b/cddl/lib/libzfs/Makefile @@ -6,8 +6,8 @@ .PATH: ${.CURDIR}/../../../cddl/contrib/opensolaris/lib/libzfs/common LIB= zfs -DPADD= ${LIBMD} ${LIBPTHREAD} ${LIBUMEM} ${LIBUTIL} ${LIBM} -LDADD= -lmd -lpthread -lumem -lutil -lm +DPADD= ${LIBMD} ${LIBPTHREAD} ${LIBUMEM} ${LIBUTIL} ${LIBM} ${LIBNVPAIR} +LDADD= -lmd -lpthread -lumem -lutil -lm -lnvpair SRCS= deviceid.c \ fsshare.c \ @@ -27,6 +27,7 @@ SRCS+= libzfs_changelist.c \ libzfs_sendrecv.c \ libzfs_status.c \ libzfs_util.c \ + zfeature_common.c \ zfs_comutil.c \ zfs_deleg.c \ zfs_fletcher.c \ diff --git a/cddl/lib/libzpool/Makefile b/cddl/lib/libzpool/Makefile index befe46703..b348bcd89 100644 --- a/cddl/lib/libzpool/Makefile +++ b/cddl/lib/libzpool/Makefile @@ -63,4 +63,7 @@ NO_PROFILE= CSTD= c99 +CFLAGS+= -DDEBUG=1 +#DEBUG_FLAGS+= -g + .include diff --git a/cddl/sbin/zpool/Makefile b/cddl/sbin/zpool/Makefile index b87586389..0b0b922a0 100644 --- a/cddl/sbin/zpool/Makefile +++ b/cddl/sbin/zpool/Makefile @@ -5,7 
+5,7 @@ .PATH: ${.CURDIR}/../../../sys/cddl/contrib/opensolaris/common/zfs PROG= zpool -MAN= zpool.8 +MAN= zpool.8 zpool-features.7 SRCS= zpool_main.c zpool_vdev.c zpool_iter.c zpool_util.c zfs_comutil.c SRCS+= timestamp.c diff --git a/cddl/usr.bin/ztest/Makefile b/cddl/usr.bin/ztest/Makefile index deea97440..8faccd9a6 100644 --- a/cddl/usr.bin/ztest/Makefile +++ b/cddl/usr.bin/ztest/Makefile @@ -10,16 +10,20 @@ CFLAGS+= -I${.CURDIR}/../../compat/opensolaris/include CFLAGS+= -I${.CURDIR}/../../compat/opensolaris/lib/libumem CFLAGS+= -I${.CURDIR}/../../contrib/opensolaris/lib/libzpool/common CFLAGS+= -I${.CURDIR}/../../contrib/opensolaris/lib/libnvpair +CFLAGS+= -I${.CURDIR}/../../../sys/cddl/contrib/opensolaris/common/zfs CFLAGS+= -I${.CURDIR}/../../../sys/cddl/contrib/opensolaris/uts/common/fs/zfs CFLAGS+= -I${.CURDIR}/../../../sys/cddl/contrib/opensolaris/uts/common/sys CFLAGS+= -I${.CURDIR}/../../../sys/cddl/contrib/opensolaris/uts/common CFLAGS+= -I${.CURDIR}/../../contrib/opensolaris/head CFLAGS+= -I${.CURDIR}/../../lib/libumem -DPADD= ${LIBM} ${LIBNVPAIR} ${LIBUMEM} ${LIBZPOOL} \ - ${LIBPTHREAD} ${LIBAVL} -LDADD= -lm -lnvpair -lumem -lzpool -lpthread -lavl +DPADD= ${LIBGEOM} ${LIBM} ${LIBNVPAIR} ${LIBUMEM} ${LIBZPOOL} \ + ${LIBPTHREAD} ${LIBAVL} ${LIBZFS} ${LIBUUTIL} +LDADD= -lgeom -lm -lnvpair -lumem -lzpool -lpthread -lavl -lzfs -luutil CSTD= c99 +CFLAGS+= -DDEBUG=1 +#DEBUG_FLAGS+= -g + .include diff --git a/cddl/usr.sbin/Makefile b/cddl/usr.sbin/Makefile index 49db06b61..399720027 100644 --- a/cddl/usr.sbin/Makefile +++ b/cddl/usr.sbin/Makefile @@ -5,11 +5,13 @@ SUBDIR= ${_dtrace} \ ${_dtruss} \ ${_lockstat} \ - ${_zdb} + ${_zdb} \ + ${_zhack} .if ${MK_ZFS} != "no" .if ${MK_LIBTHR} != "no" _zdb= zdb +_zhack= zhack .endif .endif diff --git a/cddl/usr.sbin/zdb/Makefile b/cddl/usr.sbin/zdb/Makefile index f515d0040..5e33d12e6 100644 --- a/cddl/usr.sbin/zdb/Makefile +++ b/cddl/usr.sbin/zdb/Makefile @@ -26,4 +26,7 @@ LDADD= -lgeom -lm -lnvpair -lpthread -lumem 
-luutil -lzfs -lzpool CSTD= c99 +CFLAGS+= -DDEBUG=1 +#DEBUG_FLAGS+= -g + .include diff --git a/cddl/usr.sbin/zhack/Makefile b/cddl/usr.sbin/zhack/Makefile new file mode 100644 index 000000000..97ef5751b --- /dev/null +++ b/cddl/usr.sbin/zhack/Makefile @@ -0,0 +1,32 @@ +# $FreeBSD$ + +.PATH: ${.CURDIR}/../../../cddl/contrib/opensolaris/cmd/zhack + +PROG= zhack +NO_MAN= + +WARNS?= 0 +CSTD= c99 + +CFLAGS+= -I${.CURDIR}/../../../sys/cddl/compat/opensolaris +CFLAGS+= -I${.CURDIR}/../../../cddl/compat/opensolaris/include +CFLAGS+= -I${.CURDIR}/../../../cddl/compat/opensolaris/lib/libumem +CFLAGS+= -I${.CURDIR}/../../../cddl/contrib/opensolaris/lib/libnvpair +CFLAGS+= -I${.CURDIR}/../../../cddl/contrib/opensolaris/lib/libuutil/common +CFLAGS+= -I${.CURDIR}/../../../cddl/contrib/opensolaris/lib/libzfs/common +CFLAGS+= -I${.CURDIR}/../../../cddl/contrib/opensolaris/lib/libzpool/common +CFLAGS+= -I${.CURDIR}/../../../sys/cddl/contrib/opensolaris/uts/common/fs/zfs +CFLAGS+= -I${.CURDIR}/../../../sys/cddl/contrib/opensolaris/uts/common +CFLAGS+= -I${.CURDIR}/../../../sys/cddl/contrib/opensolaris/uts/common/sys +CFLAGS+= -I${.CURDIR}/../../../sys/cddl/contrib/opensolaris/common/zfs +CFLAGS+= -I${.CURDIR}/../../../cddl/contrib/opensolaris/head +CFLAGS+= -I${.CURDIR}/../../lib/libumem + +DPADD= ${LIBGEOM} ${LIBM} ${LIBNVPAIR} ${LIBPTHREAD} ${LIBUMEM} \ + ${LIBUUTIL} ${LIBZFS} ${LIBZPOOL} +LDADD= -lgeom -lm -lnvpair -lpthread -lumem -luutil -lzfs -lzpool + +CFLAGS+= -DDEBUG=1 +#DEBUG_FLAGS+= -g + +.include diff --git a/rescue/rescue/Makefile b/rescue/rescue/Makefile index bde81c3c5..7d5572bd1 100644 --- a/rescue/rescue/Makefile +++ b/rescue/rescue/Makefile @@ -141,7 +141,7 @@ CRUNCH_LIBS+= -lalias -lcam -lcurses -ldevstat -lipsec CRUNCH_LIBS+= -lipx .endif .if ${MK_ZFS} != "no" -CRUNCH_LIBS+= -lavl -lnvpair -lpthread -lzfs -luutil -lumem +CRUNCH_LIBS+= -lavl -lzfs -lnvpair -lpthread -luutil -lumem .endif CRUNCH_LIBS+= -lgeom -lbsdxml -ljail -lkiconv -lmd -lreadline -lsbuf -lufs 
-lz diff --git a/sys/boot/zfs/zfsimpl.c b/sys/boot/zfs/zfsimpl.c index 682b1212a..1a82fe6c4 100644 --- a/sys/boot/zfs/zfsimpl.c +++ b/sys/boot/zfs/zfsimpl.c @@ -49,6 +49,13 @@ struct zfsmount { */ static vdev_list_t zfs_vdevs; + /* + * List of ZFS features supported for read + */ +static const char *features_for_read[] = { + NULL +}; + /* * List of all pools, chained through spa_link. */ @@ -200,6 +207,57 @@ nvlist_find(const unsigned char *nvlist, const char *name, int type, return (EIO); } +static int +nvlist_check_features_for_read(const unsigned char *nvlist) +{ + const unsigned char *p, *pair; + int junk; + int encoded_size, decoded_size; + int rc; + + rc = 0; + + p = nvlist; + xdr_int(&p, &junk); + xdr_int(&p, &junk); + + pair = p; + xdr_int(&p, &encoded_size); + xdr_int(&p, &decoded_size); + while (encoded_size && decoded_size) { + int namelen, pairtype; + const char *pairname; + int i, found; + + found = 0; + + xdr_int(&p, &namelen); + pairname = (const char*) p; + p += roundup(namelen, 4); + xdr_int(&p, &pairtype); + + for (i = 0; features_for_read[i] != NULL; i++) { + if (!memcmp(pairname, features_for_read[i], namelen)) { + found = 1; + break; + } + } + + if (!found) { + printf("ZFS: unsupported feature: %s\n", pairname); + rc = EIO; + } + + p = pair + encoded_size; + + pair = p; + xdr_int(&p, &encoded_size); + xdr_int(&p, &decoded_size); + } + + return (rc); +} + /* * Return the next nvlist in an nvlist array. 
*/ @@ -827,6 +885,7 @@ vdev_probe(vdev_phys_read_t *read, void *read_priv, spa_t **spap) uint64_t is_log; const char *pool_name; const unsigned char *vdevs; + const unsigned char *features; int i, rc, is_newer; char *upbuf; const struct uberblock *up; @@ -861,12 +920,19 @@ vdev_probe(vdev_phys_read_t *read, void *read_priv, spa_t **spap) return (EIO); } - if (val > SPA_VERSION) { + if (!SPA_VERSION_IS_SUPPORTED(val)) { printf("ZFS: unsupported ZFS version %u (should be %u)\n", (unsigned) val, (unsigned) SPA_VERSION); return (EIO); } + /* Check ZFS features for read */ + if (nvlist_find(nvlist, + ZPOOL_CONFIG_FEATURES_FOR_READ, + DATA_TYPE_NVLIST, 0, &features) == 0 + && nvlist_check_features_for_read(features) != 0) + return (EIO); + if (nvlist_find(nvlist, ZPOOL_CONFIG_POOL_STATE, DATA_TYPE_UINT64, 0, &val)) { diff --git a/sys/cddl/boot/zfs/zfsimpl.h b/sys/cddl/boot/zfs/zfsimpl.h index df486f8ee..a684c3951 100644 --- a/sys/cddl/boot/zfs/zfsimpl.h +++ b/sys/cddl/boot/zfs/zfsimpl.h @@ -53,6 +53,8 @@ * Use is subject to license terms. */ +#define MAXNAMELEN 256 + /* CRC64 table */ #define ZFS_CRC64_POLY 0xC96C5795D7870F42ULL /* ECMA-182, reflected form */ @@ -508,6 +510,7 @@ typedef enum { #define SPA_VERSION_26 26ULL #define SPA_VERSION_27 27ULL #define SPA_VERSION_28 28ULL +#define SPA_VERSION_5000 5000ULL /* * When bumping up SPA_VERSION, make sure GRUB ZFS understands the on-disk @@ -515,8 +518,8 @@ typedef enum { * and do the appropriate changes. Also bump the version number in * usr/src/grub/capability. */ -#define SPA_VERSION SPA_VERSION_28 -#define SPA_VERSION_STRING "28" +#define SPA_VERSION SPA_VERSION_5000 +#define SPA_VERSION_STRING "5000" /* * Symbolic names for the changes that caused a SPA_VERSION switch. 
@@ -567,6 +570,12 @@ typedef enum { #define SPA_VERSION_DEADLISTS SPA_VERSION_26 #define SPA_VERSION_FAST_SNAP SPA_VERSION_27 #define SPA_VERSION_MULTI_REPLACE SPA_VERSION_28 +#define SPA_VERSION_BEFORE_FEATURES SPA_VERSION_28 +#define SPA_VERSION_FEATURES SPA_VERSION_5000 + +#define SPA_VERSION_IS_SUPPORTED(v) \ + (((v) >= SPA_VERSION_INITIAL && (v) <= SPA_VERSION_BEFORE_FEATURES) || \ + ((v) >= SPA_VERSION_FEATURES && (v) <= SPA_VERSION)) /* * The following are configuration names used in the nvlist describing a pool's @@ -602,6 +611,7 @@ typedef enum { #define ZPOOL_CONFIG_HOSTNAME "hostname" #define ZPOOL_CONFIG_IS_LOG "is_log" #define ZPOOL_CONFIG_TIMESTAMP "timestamp" /* not stored on disk */ +#define ZPOOL_CONFIG_FEATURES_FOR_READ "features_for_read" /* * The persistent vdev state is stored as separate values rather than a single diff --git a/sys/cddl/compat/opensolaris/kern/opensolaris_cmn_err.c b/sys/cddl/compat/opensolaris/kern/opensolaris_cmn_err.c index abde30d6a..1e065eae0 100644 --- a/sys/cddl/compat/opensolaris/kern/opensolaris_cmn_err.c +++ b/sys/cddl/compat/opensolaris/kern/opensolaris_cmn_err.c @@ -19,9 +19,13 @@ * CDDL HEADER END * * $FreeBSD$ - * + */ +/* + * Copyright 2007 John Birrell . All rights reserved. + * Copyright 2012 Martin Matuska . All rights reserved. */ +#include #include void @@ -68,3 +72,19 @@ cmn_err(int type, const char *fmt, ...) 
vcmn_err(type, fmt, ap); va_end(ap); } + +int +assfail(const char *a, const char *f, int l) { + + panic("solaris assert: %s, file: %s, line: %d", a, f, l); + + return (0); +} + +void +assfail3(const char *a, uintmax_t lv, const char *op, uintmax_t rv, + const char *f, int l) { + + panic("solaris assert: %s (0x%jx %s 0x%jx), file: %s, line: %d", + a, lv, op, rv, f, l); +} diff --git a/sys/cddl/compat/opensolaris/sys/assfail.h b/sys/cddl/compat/opensolaris/sys/assfail.h new file mode 100644 index 000000000..e6ff2583b --- /dev/null +++ b/sys/cddl/compat/opensolaris/sys/assfail.h @@ -0,0 +1,82 @@ +/*- + * Copyright (c) 2012 Martin Matuska + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHORS AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHORS OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. 
+ * + * $FreeBSD$ + */ + +#ifndef _OPENSOLARIS_SYS_ASSFAIL_H_ +#define _OPENSOLARIS_SYS_ASSFAIL_H_ + +#include +#ifndef _KERNEL +#include +#include +#endif + +#ifdef __cplusplus +extern "C" { +#endif + +#ifdef _KERNEL +int assfail(const char *, const char *, int); +void assfail3(const char *, uintmax_t, const char *, uintmax_t, const char *, + int); +#else /* !defined(_KERNEL) */ + +#ifndef HAVE_ASSFAIL +static __inline int +__assfail(const char *expr, const char *file, int line) +{ + + (void)fprintf(stderr, "Assertion failed: (%s), file %s, line %d.\n", + expr, file, line); + abort(); + /* NOTREACHED */ + return (0); +} +#define assfail __assfail +#endif + +#ifndef HAVE_ASSFAIL3 +static __inline void +__assfail3(const char *expr, uintmax_t lv, const char *op, uintmax_t rv, + const char *file, int line) { + + (void)fprintf(stderr, + "Assertion failed: %s (0x%jx %s 0x%jx), file %s, line %d.\n", + expr, lv, op, rv, file, line); + abort(); + /* NOTREACHED */ +} +#define assfail3 __assfail3 +#endif + +#endif /* !defined(_KERNEL) */ + +#ifdef __cplusplus +} +#endif + +#endif /* _OPENSOLARIS_SYS_ASSFAIL_H_ */ diff --git a/sys/cddl/compat/opensolaris/sys/debug.h b/sys/cddl/compat/opensolaris/sys/debug.h index 34804624e..eb344f837 100644 --- a/sys/cddl/compat/opensolaris/sys/debug.h +++ b/sys/cddl/compat/opensolaris/sys/debug.h @@ -30,19 +30,13 @@ #define _OPENSOLARIS_SYS_DEBUG_H_ #ifdef _KERNEL -#include #include #include_next - -#define assfail(a, f, l) \ - (panic("solaris assert: %s, file: %s, line: %d", (a), (f), (l)), 0) - -#define assfail3(a, lv, op, rv, f, l) \ - panic("solaris assert: %s (0x%jx %s 0x%jx), file: %s, line: %d", \ - (a), (uintmax_t)(lv), (op), (uintmax_t)(rv), (f), (l)) #else /* !_KERNEL */ + #include_next -#endif +#include +#endif /* _KERNEL */ #endif /* _OPENSOLARIS_SYS_DEBUG_H_ */ diff --git a/sys/cddl/contrib/opensolaris/common/nvpair/fnvpair.c b/sys/cddl/contrib/opensolaris/common/nvpair/fnvpair.c new file mode 100644 index 000000000..1b67e6243 
--- /dev/null +++ b/sys/cddl/contrib/opensolaris/common/nvpair/fnvpair.c @@ -0,0 +1,498 @@ + +/* + * CDDL HEADER START + * + * The contents of this file are subject to the terms of the + * Common Development and Distribution License (the "License"). + * You may not use this file except in compliance with the License. + * + * You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE + * or http://www.opensolaris.org/os/licensing. + * See the License for the specific language governing permissions + * and limitations under the License. + * + * When distributing Covered Code, include this CDDL HEADER in each + * file and include the License file at usr/src/OPENSOLARIS.LICENSE. + * If applicable, add the following below this CDDL HEADER, with the + * fields enclosed by brackets "[]" replaced with your own identifying + * information: Portions Copyright [yyyy] [name of copyright owner] + * + * CDDL HEADER END + */ + +/* + * Copyright (c) 2012 by Delphix. All rights reserved. + */ + +#include +#ifndef _KERNEL +#include +#else +#include +#include +#endif + +/* + * "Force" nvlist wrapper. + * + * These functions wrap the nvlist_* functions with assertions that assume + * the operation is successful. This allows the caller's code to be much + * more readable, especially for the fnvlist_lookup_* and fnvpair_value_* + * functions, which can return the requested value (rather than filling in + * a pointer). + * + * These functions use NV_UNIQUE_NAME, encoding NV_ENCODE_NATIVE, and allocate + * with KM_SLEEP. + * + * More wrappers should be added as needed -- for example + * nvlist_lookup_*_array and nvpair_value_*_array. 
+ */ + +nvlist_t * +fnvlist_alloc(void) +{ + nvlist_t *nvl; + VERIFY0(nvlist_alloc(&nvl, NV_UNIQUE_NAME, KM_SLEEP)); + return (nvl); +} + +void +fnvlist_free(nvlist_t *nvl) +{ + nvlist_free(nvl); +} + +size_t +fnvlist_size(nvlist_t *nvl) +{ + size_t size; + VERIFY0(nvlist_size(nvl, &size, NV_ENCODE_NATIVE)); + return (size); +} + +/* + * Returns allocated buffer of size *sizep. Caller must free the buffer with + * fnvlist_pack_free(). + */ +char * +fnvlist_pack(nvlist_t *nvl, size_t *sizep) +{ + char *packed = 0; + VERIFY3U(nvlist_pack(nvl, &packed, sizep, NV_ENCODE_NATIVE, + KM_SLEEP), ==, 0); + return (packed); +} + +/*ARGSUSED*/ +void +fnvlist_pack_free(char *pack, size_t size) +{ +#ifdef _KERNEL + kmem_free(pack, size); +#else + free(pack); +#endif +} + +nvlist_t * +fnvlist_unpack(char *buf, size_t buflen) +{ + nvlist_t *rv; + VERIFY0(nvlist_unpack(buf, buflen, &rv, KM_SLEEP)); + return (rv); +} + +nvlist_t * +fnvlist_dup(nvlist_t *nvl) +{ + nvlist_t *rv; + VERIFY0(nvlist_dup(nvl, &rv, KM_SLEEP)); + return (rv); +} + +void +fnvlist_merge(nvlist_t *dst, nvlist_t *src) +{ + VERIFY0(nvlist_merge(dst, src, KM_SLEEP)); +} + +void +fnvlist_add_boolean(nvlist_t *nvl, const char *name) +{ + VERIFY0(nvlist_add_boolean(nvl, name)); +} + +void +fnvlist_add_boolean_value(nvlist_t *nvl, const char *name, boolean_t val) +{ + VERIFY0(nvlist_add_boolean_value(nvl, name, val)); +} + +void +fnvlist_add_byte(nvlist_t *nvl, const char *name, uchar_t val) +{ + VERIFY0(nvlist_add_byte(nvl, name, val)); +} + +void +fnvlist_add_int8(nvlist_t *nvl, const char *name, int8_t val) +{ + VERIFY0(nvlist_add_int8(nvl, name, val)); +} + +void +fnvlist_add_uint8(nvlist_t *nvl, const char *name, uint8_t val) +{ + VERIFY0(nvlist_add_uint8(nvl, name, val)); +} + +void +fnvlist_add_int16(nvlist_t *nvl, const char *name, int16_t val) +{ + VERIFY0(nvlist_add_int16(nvl, name, val)); +} + +void +fnvlist_add_uint16(nvlist_t *nvl, const char *name, uint16_t val) +{ + VERIFY0(nvlist_add_uint16(nvl, name, 
val)); +} + +void +fnvlist_add_int32(nvlist_t *nvl, const char *name, int32_t val) +{ + VERIFY0(nvlist_add_int32(nvl, name, val)); +} + +void +fnvlist_add_uint32(nvlist_t *nvl, const char *name, uint32_t val) +{ + VERIFY0(nvlist_add_uint32(nvl, name, val)); +} + +void +fnvlist_add_int64(nvlist_t *nvl, const char *name, int64_t val) +{ + VERIFY0(nvlist_add_int64(nvl, name, val)); +} + +void +fnvlist_add_uint64(nvlist_t *nvl, const char *name, uint64_t val) +{ + VERIFY0(nvlist_add_uint64(nvl, name, val)); +} + +void +fnvlist_add_string(nvlist_t *nvl, const char *name, const char *val) +{ + VERIFY0(nvlist_add_string(nvl, name, val)); +} + +void +fnvlist_add_nvlist(nvlist_t *nvl, const char *name, nvlist_t *val) +{ + VERIFY0(nvlist_add_nvlist(nvl, name, val)); +} + +void +fnvlist_add_nvpair(nvlist_t *nvl, nvpair_t *pair) +{ + VERIFY0(nvlist_add_nvpair(nvl, pair)); +} + +void +fnvlist_add_boolean_array(nvlist_t *nvl, const char *name, + boolean_t *val, uint_t n) +{ + VERIFY0(nvlist_add_boolean_array(nvl, name, val, n)); +} + +void +fnvlist_add_byte_array(nvlist_t *nvl, const char *name, uchar_t *val, uint_t n) +{ + VERIFY0(nvlist_add_byte_array(nvl, name, val, n)); +} + +void +fnvlist_add_int8_array(nvlist_t *nvl, const char *name, int8_t *val, uint_t n) +{ + VERIFY0(nvlist_add_int8_array(nvl, name, val, n)); +} + +void +fnvlist_add_uint8_array(nvlist_t *nvl, const char *name, uint8_t *val, uint_t n) +{ + VERIFY0(nvlist_add_uint8_array(nvl, name, val, n)); +} + +void +fnvlist_add_int16_array(nvlist_t *nvl, const char *name, int16_t *val, uint_t n) +{ + VERIFY0(nvlist_add_int16_array(nvl, name, val, n)); +} + +void +fnvlist_add_uint16_array(nvlist_t *nvl, const char *name, + uint16_t *val, uint_t n) +{ + VERIFY0(nvlist_add_uint16_array(nvl, name, val, n)); +} + +void +fnvlist_add_int32_array(nvlist_t *nvl, const char *name, int32_t *val, uint_t n) +{ + VERIFY0(nvlist_add_int32_array(nvl, name, val, n)); +} + +void +fnvlist_add_uint32_array(nvlist_t *nvl, const char 
*name, + uint32_t *val, uint_t n) +{ + VERIFY0(nvlist_add_uint32_array(nvl, name, val, n)); +} + +void +fnvlist_add_int64_array(nvlist_t *nvl, const char *name, int64_t *val, uint_t n) +{ + VERIFY0(nvlist_add_int64_array(nvl, name, val, n)); +} + +void +fnvlist_add_uint64_array(nvlist_t *nvl, const char *name, + uint64_t *val, uint_t n) +{ + VERIFY0(nvlist_add_uint64_array(nvl, name, val, n)); +} + +void +fnvlist_add_string_array(nvlist_t *nvl, const char *name, + char * const *val, uint_t n) +{ + VERIFY0(nvlist_add_string_array(nvl, name, val, n)); +} + +void +fnvlist_add_nvlist_array(nvlist_t *nvl, const char *name, + nvlist_t **val, uint_t n) +{ + VERIFY0(nvlist_add_nvlist_array(nvl, name, val, n)); +} + +void +fnvlist_remove(nvlist_t *nvl, const char *name) +{ + VERIFY0(nvlist_remove_all(nvl, name)); +} + +void +fnvlist_remove_nvpair(nvlist_t *nvl, nvpair_t *pair) +{ + VERIFY0(nvlist_remove_nvpair(nvl, pair)); +} + +nvpair_t * +fnvlist_lookup_nvpair(nvlist_t *nvl, const char *name) +{ + nvpair_t *rv; + VERIFY0(nvlist_lookup_nvpair(nvl, name, &rv)); + return (rv); +} + +/* returns B_TRUE if the entry exists */ +boolean_t +fnvlist_lookup_boolean(nvlist_t *nvl, const char *name) +{ + return (nvlist_lookup_boolean(nvl, name) == 0); +} + +boolean_t +fnvlist_lookup_boolean_value(nvlist_t *nvl, const char *name) +{ + boolean_t rv; + VERIFY0(nvlist_lookup_boolean_value(nvl, name, &rv)); + return (rv); +} + +uchar_t +fnvlist_lookup_byte(nvlist_t *nvl, const char *name) +{ + uchar_t rv; + VERIFY0(nvlist_lookup_byte(nvl, name, &rv)); + return (rv); +} + +int8_t +fnvlist_lookup_int8(nvlist_t *nvl, const char *name) +{ + int8_t rv; + VERIFY0(nvlist_lookup_int8(nvl, name, &rv)); + return (rv); +} + +int16_t +fnvlist_lookup_int16(nvlist_t *nvl, const char *name) +{ + int16_t rv; + VERIFY0(nvlist_lookup_int16(nvl, name, &rv)); + return (rv); +} + +int32_t +fnvlist_lookup_int32(nvlist_t *nvl, const char *name) +{ + int32_t rv; + VERIFY0(nvlist_lookup_int32(nvl, name, &rv)); + 
return (rv); +} + +int64_t +fnvlist_lookup_int64(nvlist_t *nvl, const char *name) +{ + int64_t rv; + VERIFY0(nvlist_lookup_int64(nvl, name, &rv)); + return (rv); +} + +uint8_t +fnvlist_lookup_uint8_t(nvlist_t *nvl, const char *name) +{ + uint8_t rv; + VERIFY0(nvlist_lookup_uint8(nvl, name, &rv)); + return (rv); +} + +uint16_t +fnvlist_lookup_uint16(nvlist_t *nvl, const char *name) +{ + uint16_t rv; + VERIFY0(nvlist_lookup_uint16(nvl, name, &rv)); + return (rv); +} + +uint32_t +fnvlist_lookup_uint32(nvlist_t *nvl, const char *name) +{ + uint32_t rv; + VERIFY0(nvlist_lookup_uint32(nvl, name, &rv)); + return (rv); +} + +uint64_t +fnvlist_lookup_uint64(nvlist_t *nvl, const char *name) +{ + uint64_t rv; + VERIFY0(nvlist_lookup_uint64(nvl, name, &rv)); + return (rv); +} + +char * +fnvlist_lookup_string(nvlist_t *nvl, const char *name) +{ + char *rv; + VERIFY0(nvlist_lookup_string(nvl, name, &rv)); + return (rv); +} + +nvlist_t * +fnvlist_lookup_nvlist(nvlist_t *nvl, const char *name) +{ + nvlist_t *rv; + VERIFY0(nvlist_lookup_nvlist(nvl, name, &rv)); + return (rv); +} + +boolean_t +fnvpair_value_boolean_value(nvpair_t *nvp) +{ + boolean_t rv; + VERIFY0(nvpair_value_boolean_value(nvp, &rv)); + return (rv); +} + +uchar_t +fnvpair_value_byte(nvpair_t *nvp) +{ + uchar_t rv; + VERIFY0(nvpair_value_byte(nvp, &rv)); + return (rv); +} + +int8_t +fnvpair_value_int8(nvpair_t *nvp) +{ + int8_t rv; + VERIFY0(nvpair_value_int8(nvp, &rv)); + return (rv); +} + +int16_t +fnvpair_value_int16(nvpair_t *nvp) +{ + int16_t rv; + VERIFY0(nvpair_value_int16(nvp, &rv)); + return (rv); +} + +int32_t +fnvpair_value_int32(nvpair_t *nvp) +{ + int32_t rv; + VERIFY0(nvpair_value_int32(nvp, &rv)); + return (rv); +} + +int64_t +fnvpair_value_int64(nvpair_t *nvp) +{ + int64_t rv; + VERIFY0(nvpair_value_int64(nvp, &rv)); + return (rv); +} + +uint8_t +fnvpair_value_uint8_t(nvpair_t *nvp) +{ + uint8_t rv; + VERIFY0(nvpair_value_uint8(nvp, &rv)); + return (rv); +} + +uint16_t +fnvpair_value_uint16(nvpair_t 
*nvp) +{ + uint16_t rv; + VERIFY0(nvpair_value_uint16(nvp, &rv)); + return (rv); +} + +uint32_t +fnvpair_value_uint32(nvpair_t *nvp) +{ + uint32_t rv; + VERIFY0(nvpair_value_uint32(nvp, &rv)); + return (rv); +} + +uint64_t +fnvpair_value_uint64(nvpair_t *nvp) +{ + uint64_t rv; + VERIFY0(nvpair_value_uint64(nvp, &rv)); + return (rv); +} + +char * +fnvpair_value_string(nvpair_t *nvp) +{ + char *rv; + VERIFY0(nvpair_value_string(nvp, &rv)); + return (rv); +} + +nvlist_t * +fnvpair_value_nvlist(nvpair_t *nvp) +{ + nvlist_t *rv; + VERIFY0(nvpair_value_nvlist(nvp, &rv)); + return (rv); +} diff --git a/sys/cddl/contrib/opensolaris/common/zfs/zfeature_common.c b/sys/cddl/contrib/opensolaris/common/zfs/zfeature_common.c new file mode 100644 index 000000000..1a5948e8d --- /dev/null +++ b/sys/cddl/contrib/opensolaris/common/zfs/zfeature_common.c @@ -0,0 +1,158 @@ +/* + * CDDL HEADER START + * + * The contents of this file are subject to the terms of the + * Common Development and Distribution License (the "License"). + * You may not use this file except in compliance with the License. + * + * You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE + * or http://www.opensolaris.org/os/licensing. + * See the License for the specific language governing permissions + * and limitations under the License. + * + * When distributing Covered Code, include this CDDL HEADER in each + * file and include the License file at usr/src/OPENSOLARIS.LICENSE. + * If applicable, add the following below this CDDL HEADER, with the + * fields enclosed by brackets "[]" replaced with your own identifying + * information: Portions Copyright [yyyy] [name of copyright owner] + * + * CDDL HEADER END + */ + +/* + * Copyright (c) 2012 by Delphix. All rights reserved. 
+ */ + +#ifdef _KERNEL +#include +#else +#include +#include +#endif +#include +#include +#include +#include "zfeature_common.h" + +/* + * Set to disable all feature checks while opening pools, allowing pools with + * unsupported features to be opened. Set for testing only. + */ +boolean_t zfeature_checks_disable = B_FALSE; + +zfeature_info_t spa_feature_table[SPA_FEATURES]; + +/* + * Valid characters for feature guids. This list is mainly for aesthetic + * purposes and could be expanded in the future. There are different allowed + * characters in the guids reverse dns portion (before the colon) and its + * short name (after the colon). + */ +static int +valid_char(char c, boolean_t after_colon) +{ + return ((c >= 'a' && c <= 'z') || + (c >= '0' && c <= '9') || + c == (after_colon ? '_' : '.')); +} + +/* + * Every feature guid must contain exactly one colon which separates a reverse + * dns organization name from the feature's "short" name (e.g. + * "com.company:feature_name"). + */ +boolean_t +zfeature_is_valid_guid(const char *name) +{ + int i; + boolean_t has_colon = B_FALSE; + + i = 0; + while (name[i] != '\0') { + char c = name[i++]; + if (c == ':') { + if (has_colon) + return (B_FALSE); + has_colon = B_TRUE; + continue; + } + if (!valid_char(c, has_colon)) + return (B_FALSE); + } + + return (has_colon); +} + +boolean_t +zfeature_is_supported(const char *guid) +{ + if (zfeature_checks_disable) + return (B_TRUE); + + return (0 == zfeature_lookup_guid(guid, NULL)); +} + +int +zfeature_lookup_guid(const char *guid, zfeature_info_t **res) +{ + for (int i = 0; i < SPA_FEATURES; i++) { + zfeature_info_t *feature = &spa_feature_table[i]; + if (strcmp(guid, feature->fi_guid) == 0) { + if (res != NULL) + *res = feature; + return (0); + } + } + + return (ENOENT); +} + +int +zfeature_lookup_name(const char *name, zfeature_info_t **res) +{ + for (int i = 0; i < SPA_FEATURES; i++) { + zfeature_info_t *feature = &spa_feature_table[i]; + if (strcmp(name, feature->fi_uname) == 
0) { + if (res != NULL) + *res = feature; + return (0); + } + } + + return (ENOENT); +} + +static void +zfeature_register(int fid, const char *guid, const char *name, const char *desc, + boolean_t readonly, boolean_t mos, zfeature_info_t **deps) +{ + zfeature_info_t *feature = &spa_feature_table[fid]; + static zfeature_info_t *nodeps[] = { NULL }; + + ASSERT(name != NULL); + ASSERT(desc != NULL); + ASSERT(!readonly || !mos); + ASSERT3U(fid, <, SPA_FEATURES); + ASSERT(zfeature_is_valid_guid(guid)); + + if (deps == NULL) + deps = nodeps; + + feature->fi_guid = guid; + feature->fi_uname = name; + feature->fi_desc = desc; + feature->fi_can_readonly = readonly; + feature->fi_mos = mos; + feature->fi_depends = deps; +} + +void +zpool_feature_init(void) +{ + zfeature_register(SPA_FEATURE_ASYNC_DESTROY, + "com.delphix:async_destroy", "async_destroy", + "Destroy filesystems asynchronously.", B_TRUE, B_FALSE, NULL); + zfeature_register(SPA_FEATURE_EMPTY_BPOBJ, + "com.delphix:empty_bpobj", "empty_bpobj", + "Snapshots use less space.", B_TRUE, B_FALSE, NULL); +} diff --git a/sys/cddl/contrib/opensolaris/common/zfs/zfeature_common.h b/sys/cddl/contrib/opensolaris/common/zfs/zfeature_common.h new file mode 100644 index 000000000..cb786b225 --- /dev/null +++ b/sys/cddl/contrib/opensolaris/common/zfs/zfeature_common.h @@ -0,0 +1,71 @@ +/* + * CDDL HEADER START + * + * The contents of this file are subject to the terms of the + * Common Development and Distribution License (the "License"). + * You may not use this file except in compliance with the License. + * + * You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE + * or http://www.opensolaris.org/os/licensing. + * See the License for the specific language governing permissions + * and limitations under the License. + * + * When distributing Covered Code, include this CDDL HEADER in each + * file and include the License file at usr/src/OPENSOLARIS.LICENSE. 
+ * If applicable, add the following below this CDDL HEADER, with the + * fields enclosed by brackets "[]" replaced with your own identifying + * information: Portions Copyright [yyyy] [name of copyright owner] + * + * CDDL HEADER END + */ + +/* + * Copyright (c) 2012 by Delphix. All rights reserved. + */ + +#ifndef _ZFEATURE_COMMON_H +#define _ZFEATURE_COMMON_H + +#include +#include + +#ifdef __cplusplus +extern "C" { +#endif + +struct zfeature_info; + +typedef struct zfeature_info { + const char *fi_uname; /* User-facing feature name */ + const char *fi_guid; /* On-disk feature identifier */ + const char *fi_desc; /* Feature description */ + boolean_t fi_can_readonly; /* Can open pool readonly w/o support? */ + boolean_t fi_mos; /* Is the feature necessary to read the MOS? */ + struct zfeature_info **fi_depends; /* array; null terminated */ +} zfeature_info_t; + +typedef int (zfeature_func_t)(zfeature_info_t *fi, void *arg); + +#define ZFS_FEATURE_DEBUG + +static enum spa_feature { + SPA_FEATURE_ASYNC_DESTROY, + SPA_FEATURE_EMPTY_BPOBJ, + SPA_FEATURES +} spa_feature_t; + +extern zfeature_info_t spa_feature_table[SPA_FEATURES]; + +extern boolean_t zfeature_is_valid_guid(const char *); + +extern boolean_t zfeature_is_supported(const char *); +extern int zfeature_lookup_guid(const char *, zfeature_info_t **res); +extern int zfeature_lookup_name(const char *, zfeature_info_t **res); + +extern void zpool_feature_init(void); + +#ifdef __cplusplus +} +#endif + +#endif /* _ZFEATURE_COMMON_H */ diff --git a/sys/cddl/contrib/opensolaris/common/zfs/zpool_prop.c b/sys/cddl/contrib/opensolaris/common/zfs/zpool_prop.c index 512e06750..72db87937 100644 --- a/sys/cddl/contrib/opensolaris/common/zfs/zpool_prop.c +++ b/sys/cddl/contrib/opensolaris/common/zfs/zpool_prop.c @@ -79,6 +79,8 @@ zpool_prop_init(void) ZFS_TYPE_POOL, "", "SIZE"); zprop_register_number(ZPOOL_PROP_FREE, "free", 0, PROP_READONLY, ZFS_TYPE_POOL, "", "FREE"); + zprop_register_number(ZPOOL_PROP_FREEING, 
"freeing", 0, PROP_READONLY, + ZFS_TYPE_POOL, "", "FREEING"); zprop_register_number(ZPOOL_PROP_ALLOCATED, "allocated", 0, PROP_READONLY, ZFS_TYPE_POOL, "", "ALLOC"); zprop_register_number(ZPOOL_PROP_EXPANDSZ, "expandsize", 0, @@ -166,6 +168,26 @@ zpool_prop_default_numeric(zpool_prop_t prop) return (zpool_prop_table[prop].pd_numdefault); } +/* + * Returns true if this is a valid feature@ property. + */ +boolean_t +zpool_prop_feature(const char *name) +{ + static const char *prefix = "feature@"; + return (strncmp(name, prefix, strlen(prefix)) == 0); +} + +/* + * Returns true if this is a valid unsupported@ property. + */ +boolean_t +zpool_prop_unsupported(const char *name) +{ + static const char *prefix = "unsupported@"; + return (strncmp(name, prefix, strlen(prefix)) == 0); +} + int zpool_prop_string_to_index(zpool_prop_t prop, const char *string, uint64_t *index) diff --git a/sys/cddl/contrib/opensolaris/uts/common/Makefile.files b/sys/cddl/contrib/opensolaris/uts/common/Makefile.files index 2ab1d7b8e..30d4f79f4 100644 --- a/sys/cddl/contrib/opensolaris/uts/common/Makefile.files +++ b/sys/cddl/contrib/opensolaris/uts/common/Makefile.files @@ -21,6 +21,7 @@ # # Copyright (c) 1991, 2010, Oracle and/or its affiliates. All rights reserved. +# Copyright (c) 2012 by Delphix. All rights reserved. 
# # # This Makefile defines all file modules for the directory uts/common @@ -31,6 +32,7 @@ ZFS_COMMON_OBJS += \ arc.o \ bplist.o \ bpobj.o \ + bptree.o \ dbuf.o \ ddt.o \ ddt_zap.o \ @@ -52,6 +54,7 @@ ZFS_COMMON_OBJS += \ dsl_deleg.o \ dsl_prop.o \ dsl_scan.o \ + zfeature.o \ gzip.o \ lzjb.o \ metaslab.o \ @@ -94,11 +97,12 @@ ZFS_COMMON_OBJS += \ zrlock.o ZFS_SHARED_OBJS += \ - zfs_namecheck.o \ - zfs_deleg.o \ - zfs_prop.o \ + zfeature_common.o \ zfs_comutil.o \ + zfs_deleg.o \ zfs_fletcher.o \ + zfs_namecheck.o \ + zfs_prop.o \ zpool_prop.o \ zprop_common.o diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c index 01415f318..ad6e06b07 100644 --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c @@ -135,6 +135,14 @@ #include +#ifdef illumos +#ifndef _KERNEL +/* set with ZFS_DEBUG=watch, to enable watchpoints on frozen buffers */ +boolean_t arc_watch = B_FALSE; +int arc_procfd; +#endif +#endif /* illumos */ + static kmutex_t arc_reclaim_thr_lock; static kcondvar_t arc_reclaim_thr_cv; /* used to signal reclaim thr */ static uint8_t arc_thread_exit; @@ -534,6 +542,9 @@ static void arc_get_data_buf(arc_buf_t *buf); static void arc_access(arc_buf_hdr_t *buf, kmutex_t *hash_lock); static int arc_evict_needed(arc_buf_contents_t type); static void arc_evict_ghost(arc_state_t *state, uint64_t spa, int64_t bytes); +#ifdef illumos +static void arc_buf_watch(arc_buf_t *buf); +#endif /* illumos */ static boolean_t l2arc_write_eligible(uint64_t spa_guid, arc_buf_hdr_t *ab); @@ -1069,8 +1080,56 @@ arc_cksum_compute(arc_buf_t *buf, boolean_t force) fletcher_2_native(buf->b_data, buf->b_hdr->b_size, buf->b_hdr->b_freeze_cksum); mutex_exit(&buf->b_hdr->b_freeze_lock); +#ifdef illumos + arc_buf_watch(buf); +#endif /* illumos */ +} + +#ifdef illumos +#ifndef _KERNEL +typedef struct procctl { + long cmd; + prwatch_t prwatch; +} procctl_t; +#endif + +/* 
ARGSUSED */ +static void +arc_buf_unwatch(arc_buf_t *buf) +{ +#ifndef _KERNEL + if (arc_watch) { + int result; + procctl_t ctl; + ctl.cmd = PCWATCH; + ctl.prwatch.pr_vaddr = (uintptr_t)buf->b_data; + ctl.prwatch.pr_size = 0; + ctl.prwatch.pr_wflags = 0; + result = write(arc_procfd, &ctl, sizeof (ctl)); + ASSERT3U(result, ==, sizeof (ctl)); + } +#endif } +/* ARGSUSED */ +static void +arc_buf_watch(arc_buf_t *buf) +{ +#ifndef _KERNEL + if (arc_watch) { + int result; + procctl_t ctl; + ctl.cmd = PCWATCH; + ctl.prwatch.pr_vaddr = (uintptr_t)buf->b_data; + ctl.prwatch.pr_size = buf->b_hdr->b_size; + ctl.prwatch.pr_wflags = WA_WRITE; + result = write(arc_procfd, &ctl, sizeof (ctl)); + ASSERT3U(result, ==, sizeof (ctl)); + } +#endif +} +#endif /* illumos */ + void arc_buf_thaw(arc_buf_t *buf) { @@ -1095,6 +1154,10 @@ arc_buf_thaw(arc_buf_t *buf) } mutex_exit(&buf->b_hdr->b_freeze_lock); + +#ifdef illumos + arc_buf_unwatch(buf); +#endif /* illumos */ } void @@ -1112,6 +1175,7 @@ arc_buf_freeze(arc_buf_t *buf) buf->b_hdr->b_state == arc_anon); arc_cksum_compute(buf, B_FALSE); mutex_exit(hash_lock); + } static void @@ -1149,7 +1213,7 @@ add_reference(arc_buf_hdr_t *ab, kmutex_t *hash_lock, void *tag) ASSERT(list_link_active(&ab->b_arc_node)); list_remove(list, ab); if (GHOST_STATE(ab->b_state)) { - ASSERT3U(ab->b_datacnt, ==, 0); + ASSERT0(ab->b_datacnt); ASSERT3P(ab->b_buf, ==, NULL); delta = ab->b_size; } @@ -1496,21 +1560,22 @@ arc_buf_add_ref(arc_buf_t *buf, void* tag) * the buffer is placed on l2arc_free_on_write to be freed later. 
*/ static void -arc_buf_data_free(arc_buf_hdr_t *hdr, void (*free_func)(void *, size_t), - void *data, size_t size) +arc_buf_data_free(arc_buf_t *buf, void (*free_func)(void *, size_t)) { + arc_buf_hdr_t *hdr = buf->b_hdr; + if (HDR_L2_WRITING(hdr)) { l2arc_data_free_t *df; df = kmem_alloc(sizeof (l2arc_data_free_t), KM_SLEEP); - df->l2df_data = data; - df->l2df_size = size; + df->l2df_data = buf->b_data; + df->l2df_size = hdr->b_size; df->l2df_func = free_func; mutex_enter(&l2arc_free_on_write_mtx); list_insert_head(l2arc_free_on_write, df); mutex_exit(&l2arc_free_on_write_mtx); ARCSTAT_BUMP(arcstat_l2_free_on_write); } else { - free_func(data, size); + free_func(buf->b_data, hdr->b_size); } } @@ -1526,16 +1591,17 @@ arc_buf_destroy(arc_buf_t *buf, boolean_t recycle, boolean_t all) arc_buf_contents_t type = buf->b_hdr->b_type; arc_cksum_verify(buf); +#ifdef illumos + arc_buf_unwatch(buf); +#endif /* illumos */ if (!recycle) { if (type == ARC_BUFC_METADATA) { - arc_buf_data_free(buf->b_hdr, zio_buf_free, - buf->b_data, size); + arc_buf_data_free(buf, zio_buf_free); arc_space_return(size, ARC_SPACE_DATA); } else { ASSERT(type == ARC_BUFC_DATA); - arc_buf_data_free(buf->b_hdr, - zio_data_buf_free, buf->b_data, size); + arc_buf_data_free(buf, zio_data_buf_free); ARCSTAT_INCR(arcstat_data_size, -size); atomic_add_64(&arc_size, -size); } @@ -1812,7 +1878,7 @@ evict_start: hash_lock = HDR_LOCK(ab); have_lock = MUTEX_HELD(hash_lock); if (have_lock || mutex_tryenter(hash_lock)) { - ASSERT3U(refcount_count(&ab->b_refcnt), ==, 0); + ASSERT0(refcount_count(&ab->b_refcnt)); ASSERT(ab->b_datacnt > 0); while (ab->b_buf) { arc_buf_t *buf = ab->b_buf; @@ -2712,7 +2778,7 @@ arc_access(arc_buf_hdr_t *buf, kmutex_t *hash_lock) * This is a prefetch access... * move this block back to the MRU state. 
*/ - ASSERT3U(refcount_count(&buf->b_refcnt), ==, 0); + ASSERT0(refcount_count(&buf->b_refcnt)); new_state = arc_mru; } @@ -2794,13 +2860,18 @@ arc_read_done(zio_t *zio) callback_list = hdr->b_acb; ASSERT(callback_list != NULL); if (BP_SHOULD_BYTESWAP(zio->io_bp) && zio->io_error == 0) { + dmu_object_byteswap_t bswap = + DMU_OT_BYTESWAP(BP_GET_TYPE(zio->io_bp)); arc_byteswap_func_t *func = BP_GET_LEVEL(zio->io_bp) > 0 ? byteswap_uint64_array : - dmu_ot[BP_GET_TYPE(zio->io_bp)].ot_byteswap; + dmu_ot_byteswap[bswap].ob_func; func(buf->b_data, hdr->b_size); } arc_cksum_compute(buf, B_FALSE); +#ifdef illumos + arc_buf_watch(buf); +#endif /* illumos */ if (hash_lock && zio->io_error == 0 && hdr->b_state == arc_anon) { /* @@ -3051,7 +3122,7 @@ top: /* this block is in the ghost cache */ ASSERT(GHOST_STATE(hdr->b_state)); ASSERT(!HDR_IO_IN_PROGRESS(hdr)); - ASSERT3U(refcount_count(&hdr->b_refcnt), ==, 0); + ASSERT0(refcount_count(&hdr->b_refcnt)); ASSERT(hdr->b_buf == NULL); /* if this is a prefetch, we don't have a reference */ @@ -3365,6 +3436,9 @@ arc_release(arc_buf_t *buf, void *tag) } hdr->b_datacnt -= 1; arc_cksum_verify(buf); +#ifdef illumos + arc_buf_unwatch(buf); +#endif /* illumos */ mutex_exit(hash_lock); @@ -4745,7 +4819,7 @@ l2arc_write_buffers(spa_t *spa, l2arc_dev_t *dev, uint64_t target_sz) mutex_exit(&l2arc_buflist_mtx); if (pio == NULL) { - ASSERT3U(write_sz, ==, 0); + ASSERT0(write_sz); kmem_cache_free(hdr_cache, head); return (0); } diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/bpobj.c b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/bpobj.c index 022921c66..1920da440 100644 --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/bpobj.c +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/bpobj.c @@ -20,13 +20,61 @@ */ /* * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved. - * Copyright (c) 2011 by Delphix. All rights reserved. + * Copyright (c) 2012 by Delphix. All rights reserved. 
*/ #include #include #include #include +#include +#include + +/* + * Return an empty bpobj, preferably the empty dummy one (dp_empty_bpobj). + */ +uint64_t +bpobj_alloc_empty(objset_t *os, int blocksize, dmu_tx_t *tx) +{ + zfeature_info_t *empty_bpobj_feat = + &spa_feature_table[SPA_FEATURE_EMPTY_BPOBJ]; + spa_t *spa = dmu_objset_spa(os); + dsl_pool_t *dp = dmu_objset_pool(os); + + if (spa_feature_is_enabled(spa, empty_bpobj_feat)) { + if (!spa_feature_is_active(spa, empty_bpobj_feat)) { + ASSERT0(dp->dp_empty_bpobj); + dp->dp_empty_bpobj = + bpobj_alloc(os, SPA_MAXBLOCKSIZE, tx); + VERIFY(zap_add(os, + DMU_POOL_DIRECTORY_OBJECT, + DMU_POOL_EMPTY_BPOBJ, sizeof (uint64_t), 1, + &dp->dp_empty_bpobj, tx) == 0); + } + spa_feature_incr(spa, empty_bpobj_feat, tx); + ASSERT(dp->dp_empty_bpobj != 0); + return (dp->dp_empty_bpobj); + } else { + return (bpobj_alloc(os, blocksize, tx)); + } +} + +void +bpobj_decr_empty(objset_t *os, dmu_tx_t *tx) +{ + zfeature_info_t *empty_bpobj_feat = + &spa_feature_table[SPA_FEATURE_EMPTY_BPOBJ]; + dsl_pool_t *dp = dmu_objset_pool(os); + + spa_feature_decr(dmu_objset_spa(os), empty_bpobj_feat, tx); + if (!spa_feature_is_active(dmu_objset_spa(os), empty_bpobj_feat)) { + VERIFY3U(0, ==, zap_remove(dp->dp_meta_objset, + DMU_POOL_DIRECTORY_OBJECT, + DMU_POOL_EMPTY_BPOBJ, tx)); + VERIFY3U(0, ==, dmu_object_free(os, dp->dp_empty_bpobj, tx)); + dp->dp_empty_bpobj = 0; + } +} uint64_t bpobj_alloc(objset_t *os, int blocksize, dmu_tx_t *tx) @@ -53,6 +101,7 @@ bpobj_free(objset_t *os, uint64_t obj, dmu_tx_t *tx) int epb; dmu_buf_t *dbuf = NULL; + ASSERT(obj != dmu_objset_pool(os)->dp_empty_bpobj); VERIFY3U(0, ==, bpobj_open(&bpo, os, obj)); mutex_enter(&bpo.bpo_lock); @@ -320,6 +369,12 @@ bpobj_enqueue_subobj(bpobj_t *bpo, uint64_t subobj, dmu_tx_t *tx) ASSERT(bpo->bpo_havesubobj); ASSERT(bpo->bpo_havecomp); + ASSERT(bpo->bpo_object != dmu_objset_pool(bpo->bpo_os)->dp_empty_bpobj); + + if (subobj == dmu_objset_pool(bpo->bpo_os)->dp_empty_bpobj) { + 
bpobj_decr_empty(bpo->bpo_os, tx); + return; + } VERIFY3U(0, ==, bpobj_open(&subbpo, bpo->bpo_os, subobj)); VERIFY3U(0, ==, bpobj_space(&subbpo, &used, &comp, &uncomp)); @@ -388,6 +443,7 @@ bpobj_enqueue(bpobj_t *bpo, const blkptr_t *bp, dmu_tx_t *tx) blkptr_t *bparray; ASSERT(!BP_IS_HOLE(bp)); + ASSERT(bpo->bpo_object != dmu_objset_pool(bpo->bpo_os)->dp_empty_bpobj); /* We never need the fill count. */ stored_bp.blk_fill = 0; diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/bptree.c b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/bptree.c new file mode 100644 index 000000000..1a009cfe5 --- /dev/null +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/bptree.c @@ -0,0 +1,225 @@ +/* + * CDDL HEADER START + * + * The contents of this file are subject to the terms of the + * Common Development and Distribution License (the "License"). + * You may not use this file except in compliance with the License. + * + * You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE + * or http://www.opensolaris.org/os/licensing. + * See the License for the specific language governing permissions + * and limitations under the License. + * + * When distributing Covered Code, include this CDDL HEADER in each + * file and include the License file at usr/src/OPENSOLARIS.LICENSE. + * If applicable, add the following below this CDDL HEADER, with the + * fields enclosed by brackets "[]" replaced with your own identifying + * information: Portions Copyright [yyyy] [name of copyright owner] + * + * CDDL HEADER END + */ + +/* + * Copyright (c) 2012 by Delphix. All rights reserved. + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +/* + * A bptree is a queue of root block pointers from destroyed datasets. When a + * dataset is destroyed its root block pointer is put on the end of the pool's + * bptree queue so the dataset's blocks can be freed asynchronously by + * dsl_scan_sync. 
This allows the delete operation to finish without traversing + * all the dataset's blocks. + * + * Note that while bt_begin and bt_end are only ever incremented in this code + * they are effectively reset to 0 every time the entire bptree is freed because + * the bptree's object is destroyed and re-created. + */ + +struct bptree_args { + bptree_phys_t *ba_phys; /* data in bonus buffer, dirtied if freeing */ + boolean_t ba_free; /* true if freeing during traversal */ + + bptree_itor_t *ba_func; /* function to call for each blockpointer */ + void *ba_arg; /* caller supplied argument to ba_func */ + dmu_tx_t *ba_tx; /* caller supplied tx, NULL if not freeing */ +} bptree_args_t; + +uint64_t +bptree_alloc(objset_t *os, dmu_tx_t *tx) +{ + uint64_t obj; + dmu_buf_t *db; + bptree_phys_t *bt; + + obj = dmu_object_alloc(os, DMU_OTN_UINT64_METADATA, + SPA_MAXBLOCKSIZE, DMU_OTN_UINT64_METADATA, + sizeof (bptree_phys_t), tx); + + /* + * Bonus buffer contents are already initialized to 0, but for + * readability we make it explicit. + */ + VERIFY3U(0, ==, dmu_bonus_hold(os, obj, FTAG, &db)); + dmu_buf_will_dirty(db, tx); + bt = db->db_data; + bt->bt_begin = 0; + bt->bt_end = 0; + bt->bt_bytes = 0; + bt->bt_comp = 0; + bt->bt_uncomp = 0; + dmu_buf_rele(db, FTAG); + + return (obj); +} + +int +bptree_free(objset_t *os, uint64_t obj, dmu_tx_t *tx) +{ + dmu_buf_t *db; + bptree_phys_t *bt; + + VERIFY3U(0, ==, dmu_bonus_hold(os, obj, FTAG, &db)); + bt = db->db_data; + ASSERT3U(bt->bt_begin, ==, bt->bt_end); + ASSERT0(bt->bt_bytes); + ASSERT0(bt->bt_comp); + ASSERT0(bt->bt_uncomp); + dmu_buf_rele(db, FTAG); + + return (dmu_object_free(os, obj, tx)); +} + +void +bptree_add(objset_t *os, uint64_t obj, blkptr_t *bp, uint64_t birth_txg, + uint64_t bytes, uint64_t comp, uint64_t uncomp, dmu_tx_t *tx) +{ + dmu_buf_t *db; + bptree_phys_t *bt; + bptree_entry_phys_t bte; + + /* + * bptree objects are in the pool mos, therefore they can only be + * modified in syncing context. 
Furthermore, this is only modified + * by the sync thread, so no locking is necessary. + */ + ASSERT(dmu_tx_is_syncing(tx)); + + VERIFY3U(0, ==, dmu_bonus_hold(os, obj, FTAG, &db)); + bt = db->db_data; + + bte.be_birth_txg = birth_txg; + bte.be_bp = *bp; + bzero(&bte.be_zb, sizeof (bte.be_zb)); + dmu_write(os, obj, bt->bt_end * sizeof (bte), sizeof (bte), &bte, tx); + + dmu_buf_will_dirty(db, tx); + bt->bt_end++; + bt->bt_bytes += bytes; + bt->bt_comp += comp; + bt->bt_uncomp += uncomp; + dmu_buf_rele(db, FTAG); +} + +/* ARGSUSED */ +static int +bptree_visit_cb(spa_t *spa, zilog_t *zilog, const blkptr_t *bp, arc_buf_t *pbuf, + const zbookmark_t *zb, const dnode_phys_t *dnp, void *arg) +{ + int err; + struct bptree_args *ba = arg; + + if (bp == NULL) + return (0); + + err = ba->ba_func(ba->ba_arg, bp, ba->ba_tx); + if (err == 0 && ba->ba_free) { + ba->ba_phys->bt_bytes -= bp_get_dsize_sync(spa, bp); + ba->ba_phys->bt_comp -= BP_GET_PSIZE(bp); + ba->ba_phys->bt_uncomp -= BP_GET_UCSIZE(bp); + } + return (err); +} + +int +bptree_iterate(objset_t *os, uint64_t obj, boolean_t free, bptree_itor_t func, + void *arg, dmu_tx_t *tx) +{ + int err; + uint64_t i; + dmu_buf_t *db; + struct bptree_args ba; + + ASSERT(!free || dmu_tx_is_syncing(tx)); + + err = dmu_bonus_hold(os, obj, FTAG, &db); + if (err != 0) + return (err); + + if (free) + dmu_buf_will_dirty(db, tx); + + ba.ba_phys = db->db_data; + ba.ba_free = free; + ba.ba_func = func; + ba.ba_arg = arg; + ba.ba_tx = tx; + + err = 0; + for (i = ba.ba_phys->bt_begin; i < ba.ba_phys->bt_end; i++) { + bptree_entry_phys_t bte; + + ASSERT(!free || i == ba.ba_phys->bt_begin); + + err = dmu_read(os, obj, i * sizeof (bte), sizeof (bte), + &bte, DMU_READ_NO_PREFETCH); + if (err != 0) + break; + + err = traverse_dataset_destroyed(os->os_spa, &bte.be_bp, + bte.be_birth_txg, &bte.be_zb, + TRAVERSE_PREFETCH_METADATA | TRAVERSE_POST, + bptree_visit_cb, &ba); + if (free) { + ASSERT(err == 0 || err == ERESTART); + if (err != 0) { + /* save 
bookmark for future resume */ + ASSERT3U(bte.be_zb.zb_objset, ==, + ZB_DESTROYED_OBJSET); + ASSERT0(bte.be_zb.zb_level); + dmu_write(os, obj, i * sizeof (bte), + sizeof (bte), &bte, tx); + break; + } else { + ba.ba_phys->bt_begin++; + (void) dmu_free_range(os, obj, + i * sizeof (bte), sizeof (bte), tx); + } + } + } + + ASSERT(!free || err != 0 || ba.ba_phys->bt_begin == ba.ba_phys->bt_end); + + /* if all blocks are free there should be no used space */ + if (ba.ba_phys->bt_begin == ba.ba_phys->bt_end) { + ASSERT0(ba.ba_phys->bt_bytes); + ASSERT0(ba.ba_phys->bt_comp); + ASSERT0(ba.ba_phys->bt_uncomp); + } + + dmu_buf_rele(db, FTAG); + + return (err); +} diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dbuf.c b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dbuf.c index dc9f4823b..571a5a3b6 100644 --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dbuf.c +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dbuf.c @@ -21,6 +21,7 @@ /* * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved. * Copyright 2011 Nexenta Systems, Inc. All rights reserved. + * Copyright (c) 2012 by Delphix. All rights reserved. 
*/ #include @@ -227,7 +228,7 @@ dbuf_is_metadata(dmu_buf_impl_t *db) boolean_t is_metadata; DB_DNODE_ENTER(db); - is_metadata = dmu_ot[DB_DNODE(db)->dn_type].ot_metadata; + is_metadata = DMU_OT_IS_METADATA(DB_DNODE(db)->dn_type); DB_DNODE_EXIT(db); return (is_metadata); @@ -327,7 +328,7 @@ dbuf_verify(dmu_buf_impl_t *db) } else if (db->db_blkid == DMU_SPILL_BLKID) { ASSERT(dn != NULL); ASSERT3U(db->db.db_size, >=, dn->dn_bonuslen); - ASSERT3U(db->db.db_offset, ==, 0); + ASSERT0(db->db.db_offset); } else { ASSERT3U(db->db.db_offset, ==, db->db_blkid * db->db.db_size); } @@ -2307,7 +2308,7 @@ dbuf_sync_leaf(dbuf_dirty_record_t *dr, dmu_tx_t *tx) dbuf_dirty_record_t **drp; ASSERT(*datap != NULL); - ASSERT3U(db->db_level, ==, 0); + ASSERT0(db->db_level); ASSERT3U(dn->dn_phys->dn_bonuslen, <=, DN_MAX_BONUSLEN); bcopy(*datap, DN_BONUS(dn->dn_phys), dn->dn_phys->dn_bonuslen); DB_DNODE_EXIT(db); @@ -2510,7 +2511,7 @@ dbuf_write_done(zio_t *zio, arc_buf_t *buf, void *vdb) uint64_t txg = zio->io_txg; dbuf_dirty_record_t **drp, *dr; - ASSERT3U(zio->io_error, ==, 0); + ASSERT0(zio->io_error); ASSERT(db->db_blkptr == bp); if (zio->io_flags & ZIO_FLAG_IO_REWRITE) { diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/ddt.c b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/ddt.c index 0edf62e89..ef3d0f446 100644 --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/ddt.c +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/ddt.c @@ -21,6 +21,7 @@ /* * Copyright (c) 2009, 2010, Oracle and/or its affiliates. All rights reserved. + * Copyright (c) 2012 by Delphix. All rights reserved. 
*/ #include @@ -1067,11 +1068,9 @@ ddt_sync_table(ddt_t *ddt, dmu_tx_t *tx, uint64_t txg) ASSERT(spa->spa_uberblock.ub_version >= SPA_VERSION_DEDUP); if (spa->spa_ddt_stat_object == 0) { - spa->spa_ddt_stat_object = zap_create(ddt->ddt_os, - DMU_OT_DDT_STATS, DMU_OT_NONE, 0, tx); - VERIFY(zap_add(ddt->ddt_os, DMU_POOL_DIRECTORY_OBJECT, - DMU_POOL_DDT_STATS, sizeof (uint64_t), 1, - &spa->spa_ddt_stat_object, tx) == 0); + spa->spa_ddt_stat_object = zap_create_link(ddt->ddt_os, + DMU_OT_DDT_STATS, DMU_POOL_DIRECTORY_OBJECT, + DMU_POOL_DDT_STATS, tx); } while ((dde = avl_destroy_nodes(&ddt->ddt_tree, &cookie)) != NULL) { diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu.c b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu.c index f7d471fa3..a4c63f36e 100644 --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu.c +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu.c @@ -20,6 +20,7 @@ */ /* * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved. + * Copyright (c) 2012 by Delphix. All rights reserved. 
*/ #include @@ -45,60 +46,73 @@ #endif const dmu_object_type_info_t dmu_ot[DMU_OT_NUMTYPES] = { - { byteswap_uint8_array, TRUE, "unallocated" }, - { zap_byteswap, TRUE, "object directory" }, - { byteswap_uint64_array, TRUE, "object array" }, - { byteswap_uint8_array, TRUE, "packed nvlist" }, - { byteswap_uint64_array, TRUE, "packed nvlist size" }, - { byteswap_uint64_array, TRUE, "bpobj" }, - { byteswap_uint64_array, TRUE, "bpobj header" }, - { byteswap_uint64_array, TRUE, "SPA space map header" }, - { byteswap_uint64_array, TRUE, "SPA space map" }, - { byteswap_uint64_array, TRUE, "ZIL intent log" }, - { dnode_buf_byteswap, TRUE, "DMU dnode" }, - { dmu_objset_byteswap, TRUE, "DMU objset" }, - { byteswap_uint64_array, TRUE, "DSL directory" }, - { zap_byteswap, TRUE, "DSL directory child map"}, - { zap_byteswap, TRUE, "DSL dataset snap map" }, - { zap_byteswap, TRUE, "DSL props" }, - { byteswap_uint64_array, TRUE, "DSL dataset" }, - { zfs_znode_byteswap, TRUE, "ZFS znode" }, - { zfs_oldacl_byteswap, TRUE, "ZFS V0 ACL" }, - { byteswap_uint8_array, FALSE, "ZFS plain file" }, - { zap_byteswap, TRUE, "ZFS directory" }, - { zap_byteswap, TRUE, "ZFS master node" }, - { zap_byteswap, TRUE, "ZFS delete queue" }, - { byteswap_uint8_array, FALSE, "zvol object" }, - { zap_byteswap, TRUE, "zvol prop" }, - { byteswap_uint8_array, FALSE, "other uint8[]" }, - { byteswap_uint64_array, FALSE, "other uint64[]" }, - { zap_byteswap, TRUE, "other ZAP" }, - { zap_byteswap, TRUE, "persistent error log" }, - { byteswap_uint8_array, TRUE, "SPA history" }, - { byteswap_uint64_array, TRUE, "SPA history offsets" }, - { zap_byteswap, TRUE, "Pool properties" }, - { zap_byteswap, TRUE, "DSL permissions" }, - { zfs_acl_byteswap, TRUE, "ZFS ACL" }, - { byteswap_uint8_array, TRUE, "ZFS SYSACL" }, - { byteswap_uint8_array, TRUE, "FUID table" }, - { byteswap_uint64_array, TRUE, "FUID table size" }, - { zap_byteswap, TRUE, "DSL dataset next clones"}, - { zap_byteswap, TRUE, "scan work queue" }, - { 
zap_byteswap, TRUE, "ZFS user/group used" }, - { zap_byteswap, TRUE, "ZFS user/group quota" }, - { zap_byteswap, TRUE, "snapshot refcount tags"}, - { zap_byteswap, TRUE, "DDT ZAP algorithm" }, - { zap_byteswap, TRUE, "DDT statistics" }, - { byteswap_uint8_array, TRUE, "System attributes" }, - { zap_byteswap, TRUE, "SA master node" }, - { zap_byteswap, TRUE, "SA attr registration" }, - { zap_byteswap, TRUE, "SA attr layouts" }, - { zap_byteswap, TRUE, "scan translations" }, - { byteswap_uint8_array, FALSE, "deduplicated block" }, - { zap_byteswap, TRUE, "DSL deadlist map" }, - { byteswap_uint64_array, TRUE, "DSL deadlist map hdr" }, - { zap_byteswap, TRUE, "DSL dir clones" }, - { byteswap_uint64_array, TRUE, "bpobj subobj" }, + { DMU_BSWAP_UINT8, TRUE, "unallocated" }, + { DMU_BSWAP_ZAP, TRUE, "object directory" }, + { DMU_BSWAP_UINT64, TRUE, "object array" }, + { DMU_BSWAP_UINT8, TRUE, "packed nvlist" }, + { DMU_BSWAP_UINT64, TRUE, "packed nvlist size" }, + { DMU_BSWAP_UINT64, TRUE, "bpobj" }, + { DMU_BSWAP_UINT64, TRUE, "bpobj header" }, + { DMU_BSWAP_UINT64, TRUE, "SPA space map header" }, + { DMU_BSWAP_UINT64, TRUE, "SPA space map" }, + { DMU_BSWAP_UINT64, TRUE, "ZIL intent log" }, + { DMU_BSWAP_DNODE, TRUE, "DMU dnode" }, + { DMU_BSWAP_OBJSET, TRUE, "DMU objset" }, + { DMU_BSWAP_UINT64, TRUE, "DSL directory" }, + { DMU_BSWAP_ZAP, TRUE, "DSL directory child map"}, + { DMU_BSWAP_ZAP, TRUE, "DSL dataset snap map" }, + { DMU_BSWAP_ZAP, TRUE, "DSL props" }, + { DMU_BSWAP_UINT64, TRUE, "DSL dataset" }, + { DMU_BSWAP_ZNODE, TRUE, "ZFS znode" }, + { DMU_BSWAP_OLDACL, TRUE, "ZFS V0 ACL" }, + { DMU_BSWAP_UINT8, FALSE, "ZFS plain file" }, + { DMU_BSWAP_ZAP, TRUE, "ZFS directory" }, + { DMU_BSWAP_ZAP, TRUE, "ZFS master node" }, + { DMU_BSWAP_ZAP, TRUE, "ZFS delete queue" }, + { DMU_BSWAP_UINT8, FALSE, "zvol object" }, + { DMU_BSWAP_ZAP, TRUE, "zvol prop" }, + { DMU_BSWAP_UINT8, FALSE, "other uint8[]" }, + { DMU_BSWAP_UINT64, FALSE, "other uint64[]" }, + { DMU_BSWAP_ZAP, 
TRUE, "other ZAP" }, + { DMU_BSWAP_ZAP, TRUE, "persistent error log" }, + { DMU_BSWAP_UINT8, TRUE, "SPA history" }, + { DMU_BSWAP_UINT64, TRUE, "SPA history offsets" }, + { DMU_BSWAP_ZAP, TRUE, "Pool properties" }, + { DMU_BSWAP_ZAP, TRUE, "DSL permissions" }, + { DMU_BSWAP_ACL, TRUE, "ZFS ACL" }, + { DMU_BSWAP_UINT8, TRUE, "ZFS SYSACL" }, + { DMU_BSWAP_UINT8, TRUE, "FUID table" }, + { DMU_BSWAP_UINT64, TRUE, "FUID table size" }, + { DMU_BSWAP_ZAP, TRUE, "DSL dataset next clones"}, + { DMU_BSWAP_ZAP, TRUE, "scan work queue" }, + { DMU_BSWAP_ZAP, TRUE, "ZFS user/group used" }, + { DMU_BSWAP_ZAP, TRUE, "ZFS user/group quota" }, + { DMU_BSWAP_ZAP, TRUE, "snapshot refcount tags"}, + { DMU_BSWAP_ZAP, TRUE, "DDT ZAP algorithm" }, + { DMU_BSWAP_ZAP, TRUE, "DDT statistics" }, + { DMU_BSWAP_UINT8, TRUE, "System attributes" }, + { DMU_BSWAP_ZAP, TRUE, "SA master node" }, + { DMU_BSWAP_ZAP, TRUE, "SA attr registration" }, + { DMU_BSWAP_ZAP, TRUE, "SA attr layouts" }, + { DMU_BSWAP_ZAP, TRUE, "scan translations" }, + { DMU_BSWAP_UINT8, FALSE, "deduplicated block" }, + { DMU_BSWAP_ZAP, TRUE, "DSL deadlist map" }, + { DMU_BSWAP_UINT64, TRUE, "DSL deadlist map hdr" }, + { DMU_BSWAP_ZAP, TRUE, "DSL dir clones" }, + { DMU_BSWAP_UINT64, TRUE, "bpobj subobj" } +}; + +const dmu_object_byteswap_info_t dmu_ot_byteswap[DMU_BSWAP_NUMFUNCS] = { + { byteswap_uint8_array, "uint8" }, + { byteswap_uint16_array, "uint16" }, + { byteswap_uint32_array, "uint32" }, + { byteswap_uint64_array, "uint64" }, + { zap_byteswap, "zap" }, + { dnode_buf_byteswap, "dnode" }, + { dmu_objset_byteswap, "objset" }, + { zfs_znode_byteswap, "znode" }, + { zfs_oldacl_byteswap, "oldacl" }, + { zfs_acl_byteswap, "acl" } }; int @@ -175,7 +189,7 @@ dmu_set_bonustype(dmu_buf_t *db_fake, dmu_object_type_t type, dmu_tx_t *tx) DB_DNODE_ENTER(db); dn = DB_DNODE(db); - if (type > DMU_OT_NUMTYPES) { + if (!DMU_OT_IS_VALID(type)) { error = EINVAL; } else if (dn->dn_bonus != db) { error = EINVAL; @@ -1513,7 +1527,7 @@ void 
dmu_write_policy(objset_t *os, dnode_t *dn, int level, int wp, zio_prop_t *zp) { dmu_object_type_t type = dn ? dn->dn_type : DMU_OT_OBJSET; - boolean_t ismd = (level > 0 || dmu_ot[type].ot_metadata || + boolean_t ismd = (level > 0 || DMU_OT_IS_METADATA(type) || (wp & WP_SPILL)); enum zio_checksum checksum = os->os_checksum; enum zio_compress compress = os->os_compress; @@ -1755,15 +1769,15 @@ dmu_init(void) dnode_init(); dbuf_init(); zfetch_init(); - arc_init(); l2arc_init(); + arc_init(); } void dmu_fini(void) { - l2arc_fini(); arc_fini(); + l2arc_fini(); zfetch_fini(); dbuf_fini(); dnode_fini(); diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_objset.c b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_objset.c index 09d13db71..f4f4e1a1e 100644 --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_objset.c +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_objset.c @@ -1187,15 +1187,6 @@ dmu_objset_is_dirty(objset_t *os, uint64_t txg) !list_is_empty(&os->os_free_dnodes[txg & TXG_MASK])); } -boolean_t -dmu_objset_is_dirty_anywhere(objset_t *os) -{ - for (int t = 0; t < TXG_SIZE; t++) - if (dmu_objset_is_dirty(os, t)) - return (B_TRUE); - return (B_FALSE); -} - static objset_used_cb_t *used_cbs[DMU_OST_NUMTYPES]; void diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_send.c b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_send.c index efb37a105..1e57476d3 100644 --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_send.c +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_send.c @@ -20,11 +20,8 @@ */ /* * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved. - * Copyright (c) 2011 by Delphix. All rights reserved. - */ -/* * Copyright 2011 Nexenta Systems, Inc. All rights reserved. - * Copyright (c) 2011 by Delphix. All rights reserved. + * Copyright (c) 2012 by Delphix. All rights reserved. * Copyright (c) 2012, Joyent, Inc. All rights reserved. * Copyright (c) 2012, Martin Matuska . 
All rights reserved. */ @@ -62,7 +59,7 @@ dump_bytes(dmu_sendarg_t *dsp, void *buf, int len) dsl_dataset_t *ds = dsp->dsa_os->os_dsl_dataset; struct uio auio; struct iovec aiov; - ASSERT3U(len % 8, ==, 0); + ASSERT0(len % 8); fletcher_4_incremental_native(buf, len, &dsp->dsa_zc); aiov.iov_base = buf; @@ -1007,7 +1004,7 @@ restore_read(struct restorearg *ra, int len) int done = 0; /* some things will require 8-byte alignment, so everything must */ - ASSERT3U(len % 8, ==, 0); + ASSERT0(len % 8); while (done < len) { ssize_t resid; @@ -1120,8 +1117,8 @@ restore_object(struct restorearg *ra, objset_t *os, struct drr_object *drro) void *data = NULL; if (drro->drr_type == DMU_OT_NONE || - drro->drr_type >= DMU_OT_NUMTYPES || - drro->drr_bonustype >= DMU_OT_NUMTYPES || + !DMU_OT_IS_VALID(drro->drr_type) || + !DMU_OT_IS_VALID(drro->drr_bonustype) || drro->drr_checksumtype >= ZIO_CHECKSUM_FUNCTIONS || drro->drr_compress >= ZIO_COMPRESS_FUNCTIONS || P2PHASE(drro->drr_blksz, SPA_MINBLOCKSIZE) || @@ -1186,7 +1183,9 @@ restore_object(struct restorearg *ra, objset_t *os, struct drr_object *drro) ASSERT3U(db->db_size, >=, drro->drr_bonuslen); bcopy(data, db->db_data, drro->drr_bonuslen); if (ra->byteswap) { - dmu_ot[drro->drr_bonustype].ot_byteswap(db->db_data, + dmu_object_byteswap_t byteswap = + DMU_OT_BYTESWAP(drro->drr_bonustype); + dmu_ot_byteswap[byteswap].ob_func(db->db_data, drro->drr_bonuslen); } dmu_buf_rele(db, FTAG); @@ -1229,7 +1228,7 @@ restore_write(struct restorearg *ra, objset_t *os, int err; if (drrw->drr_offset + drrw->drr_length < drrw->drr_offset || - drrw->drr_type >= DMU_OT_NUMTYPES) + !DMU_OT_IS_VALID(drrw->drr_type)) return (EINVAL); data = restore_read(ra, drrw->drr_length); @@ -1248,8 +1247,11 @@ restore_write(struct restorearg *ra, objset_t *os, dmu_tx_abort(tx); return (err); } - if (ra->byteswap) - dmu_ot[drrw->drr_type].ot_byteswap(data, drrw->drr_length); + if (ra->byteswap) { + dmu_object_byteswap_t byteswap = + DMU_OT_BYTESWAP(drrw->drr_type); + 
dmu_ot_byteswap[byteswap].ob_func(data, drrw->drr_length); + } dmu_write(os, drrw->drr_object, drrw->drr_offset, drrw->drr_length, data, tx); dmu_tx_commit(tx); @@ -1647,13 +1649,6 @@ dmu_recv_existing_end(dmu_recv_cookie_t *drc) dsl_dataset_t *ds = drc->drc_logical_ds; int err, myerr; - /* - * XXX hack; seems the ds is still dirty and dsl_pool_zil_clean() - * expects it to have a ds_user_ptr (and zil), but clone_swap() - * can close it. - */ - txg_wait_synced(ds->ds_dir->dd_pool, 0); - if (dsl_dataset_tryown(ds, FALSE, dmu_recv_tag)) { err = dsl_dataset_clone_swap(drc->drc_real_ds, ds, drc->drc_force); @@ -1684,7 +1679,7 @@ out: (void) add_ds_to_guidmap(drc->drc_guid_to_ds_map, ds); dsl_dataset_disown(ds, dmu_recv_tag); myerr = dsl_dataset_destroy(drc->drc_real_ds, dmu_recv_tag, B_FALSE); - ASSERT3U(myerr, ==, 0); + ASSERT0(myerr); return (err); } diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_traverse.c b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_traverse.c index 023f90e12..34f19cdcb 100644 --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_traverse.c +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_traverse.c @@ -20,6 +20,7 @@ */ /* * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved. + * Copyright (c) 2012 by Delphix. All rights reserved. 
*/ #include @@ -53,6 +54,7 @@ typedef struct traverse_data { uint64_t td_objset; blkptr_t *td_rootbp; uint64_t td_min_txg; + zbookmark_t *td_resume; int td_flags; prefetch_data_t *td_pfd; blkptr_cb_t *td_func; @@ -61,6 +63,8 @@ typedef struct traverse_data { static int traverse_dnode(traverse_data_t *td, const dnode_phys_t *dnp, arc_buf_t *buf, uint64_t objset, uint64_t object); +static void prefetch_dnode_metadata(traverse_data_t *td, const dnode_phys_t *, + arc_buf_t *buf, uint64_t objset, uint64_t object); static int traverse_zil_block(zilog_t *zilog, blkptr_t *bp, void *arg, uint64_t claim_txg) @@ -128,17 +132,102 @@ traverse_zil(traverse_data_t *td, zil_header_t *zh) zil_free(zilog); } +typedef enum resume_skip { + RESUME_SKIP_ALL, + RESUME_SKIP_NONE, + RESUME_SKIP_CHILDREN +} resume_skip_t; + +/* + * Returns RESUME_SKIP_ALL if td indicates that we are resuming a traversal and + * the block indicated by zb does not need to be visited at all. Returns + * RESUME_SKIP_CHILDREN if we are resuming a post traversal and we reach the + * resume point. This indicates that this block should be visited but not its + * children (since they must have been visited in a previous traversal). + * Otherwise returns RESUME_SKIP_NONE. + */ +static resume_skip_t +resume_skip_check(traverse_data_t *td, const dnode_phys_t *dnp, + const zbookmark_t *zb) +{ + if (td->td_resume != NULL && !ZB_IS_ZERO(td->td_resume)) { + /* + * If we already visited this bp & everything below, + * don't bother doing it again. + */ + if (zbookmark_is_before(dnp, zb, td->td_resume)) + return (RESUME_SKIP_ALL); + + /* + * If we found the block we're trying to resume from, zero + * the bookmark out to indicate that we have resumed. 
+ */ + ASSERT3U(zb->zb_object, <=, td->td_resume->zb_object); + if (bcmp(zb, td->td_resume, sizeof (*zb)) == 0) { + bzero(td->td_resume, sizeof (*zb)); + if (td->td_flags & TRAVERSE_POST) + return (RESUME_SKIP_CHILDREN); + } + } + return (RESUME_SKIP_NONE); +} + +static void +traverse_pause(traverse_data_t *td, const zbookmark_t *zb) +{ + ASSERT(td->td_resume != NULL); + ASSERT0(zb->zb_level); + bcopy(zb, td->td_resume, sizeof (*td->td_resume)); +} + +static void +traverse_prefetch_metadata(traverse_data_t *td, + arc_buf_t *pbuf, const blkptr_t *bp, const zbookmark_t *zb) +{ + uint32_t flags = ARC_NOWAIT | ARC_PREFETCH; + + if (!(td->td_flags & TRAVERSE_PREFETCH_METADATA)) + return; + /* + * If we are in the process of resuming, don't prefetch, because + * some children will not be needed (and in fact may have already + * been freed). + */ + if (td->td_resume != NULL && !ZB_IS_ZERO(td->td_resume)) + return; + if (BP_IS_HOLE(bp) || bp->blk_birth <= td->td_min_txg) + return; + if (BP_GET_LEVEL(bp) == 0 && BP_GET_TYPE(bp) != DMU_OT_DNODE) + return; + + (void) arc_read(NULL, td->td_spa, bp, + pbuf, NULL, NULL, ZIO_PRIORITY_ASYNC_READ, + ZIO_FLAG_CANFAIL, &flags, zb); +} + static int traverse_visitbp(traverse_data_t *td, const dnode_phys_t *dnp, - arc_buf_t *pbuf, blkptr_t *bp, const zbookmark_t *zb) + arc_buf_t *pbuf, const blkptr_t *bp, const zbookmark_t *zb) { zbookmark_t czb; int err = 0, lasterr = 0; arc_buf_t *buf = NULL; prefetch_data_t *pd = td->td_pfd; boolean_t hard = td->td_flags & TRAVERSE_HARD; + boolean_t pause = B_FALSE; + + switch (resume_skip_check(td, dnp, zb)) { + case RESUME_SKIP_ALL: + return (0); + case RESUME_SKIP_CHILDREN: + goto post; + case RESUME_SKIP_NONE: + break; + default: + ASSERT(0); + } - if (bp->blk_birth == 0) { + if (BP_IS_HOLE(bp)) { err = td->td_func(td->td_spa, NULL, NULL, pbuf, zb, dnp, td->td_arg); return (err); @@ -164,8 +253,10 @@ traverse_visitbp(traverse_data_t *td, const dnode_phys_t *dnp, td->td_arg); if (err == 
TRAVERSE_VISIT_NO_CHILDREN) return (0); - if (err) - return (err); + if (err == ERESTART) + pause = B_TRUE; /* handle pausing at a common point */ + if (err != 0) + goto post; } if (BP_GET_LEVEL(bp) > 0) { @@ -179,14 +270,21 @@ traverse_visitbp(traverse_data_t *td, const dnode_phys_t *dnp, ZIO_PRIORITY_ASYNC_READ, ZIO_FLAG_CANFAIL, &flags, zb); if (err) return (err); + cbp = buf->b_data; + + for (i = 0; i < epb; i++) { + SET_BOOKMARK(&czb, zb->zb_objset, zb->zb_object, + zb->zb_level - 1, + zb->zb_blkid * epb + i); + traverse_prefetch_metadata(td, buf, &cbp[i], &czb); + } /* recursively visitbp() blocks below this */ - cbp = buf->b_data; - for (i = 0; i < epb; i++, cbp++) { + for (i = 0; i < epb; i++) { SET_BOOKMARK(&czb, zb->zb_objset, zb->zb_object, zb->zb_level - 1, zb->zb_blkid * epb + i); - err = traverse_visitbp(td, dnp, buf, cbp, &czb); + err = traverse_visitbp(td, dnp, buf, &cbp[i], &czb); if (err) { if (!hard) break; @@ -203,11 +301,16 @@ traverse_visitbp(traverse_data_t *td, const dnode_phys_t *dnp, ZIO_PRIORITY_ASYNC_READ, ZIO_FLAG_CANFAIL, &flags, zb); if (err) return (err); + dnp = buf->b_data; + + for (i = 0; i < epb; i++) { + prefetch_dnode_metadata(td, &dnp[i], buf, zb->zb_objset, + zb->zb_blkid * epb + i); + } /* recursively visitbp() blocks below this */ - dnp = buf->b_data; - for (i = 0; i < epb; i++, dnp++) { - err = traverse_dnode(td, dnp, buf, zb->zb_objset, + for (i = 0; i < epb; i++) { + err = traverse_dnode(td, &dnp[i], buf, zb->zb_objset, zb->zb_blkid * epb + i); if (err) { if (!hard) @@ -228,6 +331,15 @@ traverse_visitbp(traverse_data_t *td, const dnode_phys_t *dnp, osp = buf->b_data; dnp = &osp->os_meta_dnode; + prefetch_dnode_metadata(td, dnp, buf, zb->zb_objset, + DMU_META_DNODE_OBJECT); + if (arc_buf_size(buf) >= sizeof (objset_phys_t)) { + prefetch_dnode_metadata(td, &osp->os_userused_dnode, + buf, zb->zb_objset, DMU_USERUSED_OBJECT); + prefetch_dnode_metadata(td, &osp->os_groupused_dnode, + buf, zb->zb_objset, DMU_USERUSED_OBJECT); 
+ } + err = traverse_dnode(td, dnp, buf, zb->zb_objset, DMU_META_DNODE_OBJECT); if (err && hard) { @@ -253,14 +365,41 @@ traverse_visitbp(traverse_data_t *td, const dnode_phys_t *dnp, if (buf) (void) arc_buf_remove_ref(buf, &buf); +post: if (err == 0 && lasterr == 0 && (td->td_flags & TRAVERSE_POST)) { err = td->td_func(td->td_spa, NULL, bp, pbuf, zb, dnp, td->td_arg); + if (err == ERESTART) + pause = B_TRUE; + } + + if (pause && td->td_resume != NULL) { + ASSERT3U(err, ==, ERESTART); + ASSERT(!hard); + traverse_pause(td, zb); } return (err != 0 ? err : lasterr); } +static void +prefetch_dnode_metadata(traverse_data_t *td, const dnode_phys_t *dnp, + arc_buf_t *buf, uint64_t objset, uint64_t object) +{ + int j; + zbookmark_t czb; + + for (j = 0; j < dnp->dn_nblkptr; j++) { + SET_BOOKMARK(&czb, objset, object, dnp->dn_nlevels - 1, j); + traverse_prefetch_metadata(td, buf, &dnp->dn_blkptr[j], &czb); + } + + if (dnp->dn_flags & DNODE_FLAG_SPILL_BLKPTR) { + SET_BOOKMARK(&czb, objset, object, 0, DMU_SPILL_BLKID); + traverse_prefetch_metadata(td, buf, &dnp->dn_spill, &czb); + } +} + static int traverse_dnode(traverse_data_t *td, const dnode_phys_t *dnp, arc_buf_t *buf, uint64_t objset, uint64_t object) @@ -271,8 +410,7 @@ traverse_dnode(traverse_data_t *td, const dnode_phys_t *dnp, for (j = 0; j < dnp->dn_nblkptr; j++) { SET_BOOKMARK(&czb, objset, object, dnp->dn_nlevels - 1, j); - err = traverse_visitbp(td, dnp, buf, - (blkptr_t *)&dnp->dn_blkptr[j], &czb); + err = traverse_visitbp(td, dnp, buf, &dnp->dn_blkptr[j], &czb); if (err) { if (!hard) break; @@ -281,10 +419,8 @@ traverse_dnode(traverse_data_t *td, const dnode_phys_t *dnp, } if (dnp->dn_flags & DNODE_FLAG_SPILL_BLKPTR) { - SET_BOOKMARK(&czb, objset, - object, 0, DMU_SPILL_BLKID); - err = traverse_visitbp(td, dnp, buf, - (blkptr_t *)&dnp->dn_spill, &czb); + SET_BOOKMARK(&czb, objset, object, 0, DMU_SPILL_BLKID); + err = traverse_visitbp(td, dnp, buf, &dnp->dn_spill, &czb); if (err) { if (!hard) return (err); @@ 
-353,18 +489,29 @@ traverse_prefetch_thread(void *arg) * in syncing context). */ static int -traverse_impl(spa_t *spa, dsl_dataset_t *ds, blkptr_t *rootbp, - uint64_t txg_start, int flags, blkptr_cb_t func, void *arg) +traverse_impl(spa_t *spa, dsl_dataset_t *ds, uint64_t objset, blkptr_t *rootbp, + uint64_t txg_start, zbookmark_t *resume, int flags, + blkptr_cb_t func, void *arg) { traverse_data_t td; prefetch_data_t pd = { 0 }; zbookmark_t czb; int err; + ASSERT(ds == NULL || objset == ds->ds_object); + ASSERT(!(flags & TRAVERSE_PRE) || !(flags & TRAVERSE_POST)); + + /* + * The data prefetching mechanism (the prefetch thread) is incompatible + * with resuming from a bookmark. + */ + ASSERT(resume == NULL || !(flags & TRAVERSE_PREFETCH_DATA)); + td.td_spa = spa; - td.td_objset = ds ? ds->ds_object : 0; + td.td_objset = objset; td.td_rootbp = rootbp; td.td_min_txg = txg_start; + td.td_resume = resume; td.td_func = func; td.td_arg = arg; td.td_pfd = &pd; @@ -386,7 +533,7 @@ traverse_impl(spa_t *spa, dsl_dataset_t *ds, blkptr_t *rootbp, traverse_zil(&td, &os->os_zil_header); } - if (!(flags & TRAVERSE_PREFETCH) || + if (!(flags & TRAVERSE_PREFETCH_DATA) || 0 == taskq_dispatch(system_taskq, traverse_prefetch_thread, &td, TQ_NOQUEUE)) pd.pd_exited = B_TRUE; @@ -416,8 +563,17 @@ int traverse_dataset(dsl_dataset_t *ds, uint64_t txg_start, int flags, blkptr_cb_t func, void *arg) { - return (traverse_impl(ds->ds_dir->dd_pool->dp_spa, ds, - &ds->ds_phys->ds_bp, txg_start, flags, func, arg)); + return (traverse_impl(ds->ds_dir->dd_pool->dp_spa, ds, ds->ds_object, + &ds->ds_phys->ds_bp, txg_start, NULL, flags, func, arg)); +} + +int +traverse_dataset_destroyed(spa_t *spa, blkptr_t *blkptr, + uint64_t txg_start, zbookmark_t *resume, int flags, + blkptr_cb_t func, void *arg) +{ + return (traverse_impl(spa, NULL, ZB_DESTROYED_OBJSET, + blkptr, txg_start, resume, flags, func, arg)); } /* @@ -434,8 +590,8 @@ traverse_pool(spa_t *spa, uint64_t txg_start, int flags, boolean_t hard = 
(flags & TRAVERSE_HARD); /* visit the MOS */ - err = traverse_impl(spa, NULL, spa_get_rootblkptr(spa), - txg_start, flags, func, arg); + err = traverse_impl(spa, NULL, 0, spa_get_rootblkptr(spa), + txg_start, NULL, flags, func, arg); if (err) return (err); diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_tx.c b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_tx.c index b4579e278..91336a030 100644 --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_tx.c +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_tx.c @@ -20,9 +20,8 @@ */ /* * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved. - */ -/* * Copyright 2011 Nexenta Systems, Inc. All rights reserved. + * Copyright (c) 2012 by Delphix. All rights reserved. */ #include @@ -430,6 +429,7 @@ dmu_tx_count_free(dmu_tx_hold_t *txh, uint64_t off, uint64_t len) dsl_dataset_t *ds = dn->dn_objset->os_dsl_dataset; spa_t *spa = txh->txh_tx->tx_pool->dp_spa; int epbs; + uint64_t l0span = 0, nl1blks = 0; if (dn->dn_nlevels == 0) return; @@ -462,6 +462,7 @@ dmu_tx_count_free(dmu_tx_hold_t *txh, uint64_t off, uint64_t len) nblks = dn->dn_maxblkid - blkid; } + l0span = nblks; /* save for later use to calc level > 1 overhead */ if (dn->dn_nlevels == 1) { int i; for (i = 0; i < nblks; i++) { @@ -474,24 +475,10 @@ dmu_tx_count_free(dmu_tx_hold_t *txh, uint64_t off, uint64_t len) } unref += BP_GET_ASIZE(bp); } + nl1blks = 1; nblks = 0; } - /* - * Add in memory requirements of higher-level indirects. - * This assumes a worst-possible scenario for dn_nlevels. - */ - { - uint64_t blkcnt = 1 + ((nblks >> epbs) >> epbs); - int level = (dn->dn_nlevels > 1) ? 
2 : 1; - - while (level++ < DN_MAX_LEVELS) { - txh->txh_memory_tohold += blkcnt << dn->dn_indblkshift; - blkcnt = 1 + (blkcnt >> epbs); - } - ASSERT(blkcnt <= dn->dn_nblkptr); - } - lastblk = blkid + nblks - 1; while (nblks) { dmu_buf_impl_t *dbuf; @@ -562,11 +549,35 @@ dmu_tx_count_free(dmu_tx_hold_t *txh, uint64_t off, uint64_t len) } dbuf_rele(dbuf, FTAG); + ++nl1blks; blkid += tochk; nblks -= tochk; } rw_exit(&dn->dn_struct_rwlock); + /* + * Add in memory requirements of higher-level indirects. + * This assumes a worst-possible scenario for dn_nlevels and a + * worst-possible distribution of l1-blocks over the region to free. + */ + { + uint64_t blkcnt = 1 + ((l0span >> epbs) >> epbs); + int level = 2; + /* + * Here we don't use DN_MAX_LEVEL, but calculate it with the + * given datablkshift and indblkshift. This makes the + * difference between 19 and 8 on large files. + */ + int maxlevel = 2 + (DN_MAX_OFFSET_SHIFT - dn->dn_datablkshift) / + (dn->dn_indblkshift - SPA_BLKPTRSHIFT); + + while (level++ < maxlevel) { + txh->txh_memory_tohold += MAX(MIN(blkcnt, nl1blks), 1) + << dn->dn_indblkshift; + blkcnt = 1 + (blkcnt >> epbs); + } + } + /* account for new level 1 indirect blocks that might show up */ if (skipped > 0) { txh->txh_fudge += skipped << dn->dn_indblkshift; @@ -676,7 +687,7 @@ dmu_tx_hold_zap(dmu_tx_t *tx, uint64_t object, int add, const char *name) return; } - ASSERT3P(dmu_ot[dn->dn_type].ot_byteswap, ==, zap_byteswap); + ASSERT3P(DMU_OT_BYTESWAP(dn->dn_type), ==, DMU_BSWAP_ZAP); if (dn->dn_maxblkid == 0 && !add) { blkptr_t *bp; @@ -900,7 +911,7 @@ dmu_tx_try_assign(dmu_tx_t *tx, uint64_t txg_how) uint64_t memory, asize, fsize, usize; uint64_t towrite, tofree, tooverwrite, tounref, tohold, fudge; - ASSERT3U(tx->tx_txg, ==, 0); + ASSERT0(tx->tx_txg); if (tx->tx_err) return (tx->tx_err); diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dnode.c b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dnode.c index ca2b69ab1..0fd01010f 100644 --- 
a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dnode.c +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dnode.c @@ -20,6 +20,7 @@ */ /* * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved. + * Copyright (c) 2012 by Delphix. All rights reserved. */ #include @@ -139,32 +140,32 @@ dnode_dest(void *arg, void *unused) ASSERT(!list_link_active(&dn->dn_dirty_link[i])); avl_destroy(&dn->dn_ranges[i]); list_destroy(&dn->dn_dirty_records[i]); - ASSERT3U(dn->dn_next_nblkptr[i], ==, 0); - ASSERT3U(dn->dn_next_nlevels[i], ==, 0); - ASSERT3U(dn->dn_next_indblkshift[i], ==, 0); - ASSERT3U(dn->dn_next_bonustype[i], ==, 0); - ASSERT3U(dn->dn_rm_spillblk[i], ==, 0); - ASSERT3U(dn->dn_next_bonuslen[i], ==, 0); - ASSERT3U(dn->dn_next_blksz[i], ==, 0); + ASSERT0(dn->dn_next_nblkptr[i]); + ASSERT0(dn->dn_next_nlevels[i]); + ASSERT0(dn->dn_next_indblkshift[i]); + ASSERT0(dn->dn_next_bonustype[i]); + ASSERT0(dn->dn_rm_spillblk[i]); + ASSERT0(dn->dn_next_bonuslen[i]); + ASSERT0(dn->dn_next_blksz[i]); } - ASSERT3U(dn->dn_allocated_txg, ==, 0); - ASSERT3U(dn->dn_free_txg, ==, 0); - ASSERT3U(dn->dn_assigned_txg, ==, 0); - ASSERT3U(dn->dn_dirtyctx, ==, 0); + ASSERT0(dn->dn_allocated_txg); + ASSERT0(dn->dn_free_txg); + ASSERT0(dn->dn_assigned_txg); + ASSERT0(dn->dn_dirtyctx); ASSERT3P(dn->dn_dirtyctx_firstset, ==, NULL); ASSERT3P(dn->dn_bonus, ==, NULL); ASSERT(!dn->dn_have_spill); ASSERT3P(dn->dn_zio, ==, NULL); - ASSERT3U(dn->dn_oldused, ==, 0); - ASSERT3U(dn->dn_oldflags, ==, 0); - ASSERT3U(dn->dn_olduid, ==, 0); - ASSERT3U(dn->dn_oldgid, ==, 0); - ASSERT3U(dn->dn_newuid, ==, 0); - ASSERT3U(dn->dn_newgid, ==, 0); - ASSERT3U(dn->dn_id_flags, ==, 0); - - ASSERT3U(dn->dn_dbufs_count, ==, 0); + ASSERT0(dn->dn_oldused); + ASSERT0(dn->dn_oldflags); + ASSERT0(dn->dn_olduid); + ASSERT0(dn->dn_oldgid); + ASSERT0(dn->dn_newuid); + ASSERT0(dn->dn_newgid); + ASSERT0(dn->dn_id_flags); + + ASSERT0(dn->dn_dbufs_count); list_destroy(&dn->dn_dbufs); } @@ -196,7 +197,7 @@ 
dnode_verify(dnode_t *dn) ASSERT(dn->dn_objset); ASSERT(dn->dn_handle->dnh_dnode == dn); - ASSERT(dn->dn_phys->dn_type < DMU_OT_NUMTYPES); + ASSERT(DMU_OT_IS_VALID(dn->dn_phys->dn_type)); if (!(zfs_flags & ZFS_DEBUG_DNODE_VERIFY)) return; @@ -215,7 +216,7 @@ dnode_verify(dnode_t *dn) ASSERT3U(1<<dn->dn_datablkshift, ==, dn->dn_datablksz); } ASSERT3U(dn->dn_nlevels, <=, 30); - ASSERT3U(dn->dn_type, <=, DMU_OT_NUMTYPES); + ASSERT(DMU_OT_IS_VALID(dn->dn_type)); ASSERT3U(dn->dn_nblkptr, >=, 1); ASSERT3U(dn->dn_nblkptr, <=, DN_MAX_NBLKPTR); ASSERT3U(dn->dn_bonuslen, <=, DN_MAX_BONUSLEN); @@ -281,8 +282,10 @@ dnode_byteswap(dnode_phys_t *dnp) */ int off = (dnp->dn_nblkptr-1) * sizeof (blkptr_t); size_t len = DN_MAX_BONUSLEN - off; - ASSERT3U(dnp->dn_bonustype, <, DMU_OT_NUMTYPES); - dmu_ot[dnp->dn_bonustype].ot_byteswap(dnp->dn_bonus + off, len); + ASSERT(DMU_OT_IS_VALID(dnp->dn_bonustype)); + dmu_object_byteswap_t byteswap = + DMU_OT_BYTESWAP(dnp->dn_bonustype); + dmu_ot_byteswap[byteswap].ob_func(dnp->dn_bonus + off, len); } /* Swap SPILL block if we have one */ @@ -361,7 +364,7 @@ dnode_rm_spill(dnode_t *dn, dmu_tx_t *tx) static void dnode_setdblksz(dnode_t *dn, int size) { - ASSERT3U(P2PHASE(size, SPA_MINBLOCKSIZE), ==, 0); + ASSERT0(P2PHASE(size, SPA_MINBLOCKSIZE)); ASSERT3U(size, <=, SPA_MAXBLOCKSIZE); ASSERT3U(size, >=, SPA_MINBLOCKSIZE); ASSERT3U(size >> SPA_MINBLOCKSHIFT, <, @@ -410,7 +413,7 @@ dnode_create(objset_t *os, dnode_phys_t *dnp, dmu_buf_impl_t *db, dmu_zfetch_init(&dn->dn_zfetch, dn); - ASSERT(dn->dn_phys->dn_type < DMU_OT_NUMTYPES); + ASSERT(DMU_OT_IS_VALID(dn->dn_phys->dn_type)); mutex_enter(&os->os_lock); list_insert_head(&os->os_dnodes, dn); @@ -499,31 +502,31 @@ dnode_allocate(dnode_t *dn, dmu_object_type_t ot, int blocksize, int ibs, ASSERT(bcmp(dn->dn_phys, &dnode_phys_zero, sizeof (dnode_phys_t)) == 0); ASSERT(dn->dn_phys->dn_type == DMU_OT_NONE); ASSERT(ot != DMU_OT_NONE); - ASSERT3U(ot, <, DMU_OT_NUMTYPES); + ASSERT(DMU_OT_IS_VALID(ot)); 
ASSERT((bonustype == DMU_OT_NONE && bonuslen == 0) || (bonustype == DMU_OT_SA && bonuslen == 0) || (bonustype != DMU_OT_NONE && bonuslen != 0)); - ASSERT3U(bonustype, <, DMU_OT_NUMTYPES); + ASSERT(DMU_OT_IS_VALID(bonustype)); ASSERT3U(bonuslen, <=, DN_MAX_BONUSLEN); ASSERT(dn->dn_type == DMU_OT_NONE); - ASSERT3U(dn->dn_maxblkid, ==, 0); - ASSERT3U(dn->dn_allocated_txg, ==, 0); - ASSERT3U(dn->dn_assigned_txg, ==, 0); + ASSERT0(dn->dn_maxblkid); + ASSERT0(dn->dn_allocated_txg); + ASSERT0(dn->dn_assigned_txg); ASSERT(refcount_is_zero(&dn->dn_tx_holds)); ASSERT3U(refcount_count(&dn->dn_holds), <=, 1); ASSERT3P(list_head(&dn->dn_dbufs), ==, NULL); for (i = 0; i < TXG_SIZE; i++) { - ASSERT3U(dn->dn_next_nblkptr[i], ==, 0); - ASSERT3U(dn->dn_next_nlevels[i], ==, 0); - ASSERT3U(dn->dn_next_indblkshift[i], ==, 0); - ASSERT3U(dn->dn_next_bonuslen[i], ==, 0); - ASSERT3U(dn->dn_next_bonustype[i], ==, 0); - ASSERT3U(dn->dn_rm_spillblk[i], ==, 0); - ASSERT3U(dn->dn_next_blksz[i], ==, 0); + ASSERT0(dn->dn_next_nblkptr[i]); + ASSERT0(dn->dn_next_nlevels[i]); + ASSERT0(dn->dn_next_indblkshift[i]); + ASSERT0(dn->dn_next_bonuslen[i]); + ASSERT0(dn->dn_next_bonustype[i]); + ASSERT0(dn->dn_rm_spillblk[i]); + ASSERT0(dn->dn_next_blksz[i]); ASSERT(!list_link_active(&dn->dn_dirty_link[i])); ASSERT3P(list_head(&dn->dn_dirty_records[i]), ==, NULL); - ASSERT3U(avl_numnodes(&dn->dn_ranges[i]), ==, 0); + ASSERT0(avl_numnodes(&dn->dn_ranges[i])); } dn->dn_type = ot; @@ -565,13 +568,13 @@ dnode_reallocate(dnode_t *dn, dmu_object_type_t ot, int blocksize, ASSERT3U(blocksize, >=, SPA_MINBLOCKSIZE); ASSERT3U(blocksize, <=, SPA_MAXBLOCKSIZE); - ASSERT3U(blocksize % SPA_MINBLOCKSIZE, ==, 0); + ASSERT0(blocksize % SPA_MINBLOCKSIZE); ASSERT(dn->dn_object != DMU_META_DNODE_OBJECT || dmu_tx_private_ok(tx)); ASSERT(tx->tx_txg != 0); ASSERT((bonustype == DMU_OT_NONE && bonuslen == 0) || (bonustype != DMU_OT_NONE && bonuslen != 0) || (bonustype == DMU_OT_SA && bonuslen == 0)); - ASSERT3U(bonustype, <, 
DMU_OT_NUMTYPES); + ASSERT(DMU_OT_IS_VALID(bonustype)); ASSERT3U(bonuslen, <=, DN_MAX_BONUSLEN); /* clean up any unreferenced dbufs */ @@ -1237,9 +1240,9 @@ dnode_setdirty(dnode_t *dn, dmu_tx_t *tx) ASSERT(!refcount_is_zero(&dn->dn_holds) || list_head(&dn->dn_dbufs)); ASSERT(dn->dn_datablksz != 0); - ASSERT3U(dn->dn_next_bonuslen[txg&TXG_MASK], ==, 0); - ASSERT3U(dn->dn_next_blksz[txg&TXG_MASK], ==, 0); - ASSERT3U(dn->dn_next_bonustype[txg&TXG_MASK], ==, 0); + ASSERT0(dn->dn_next_bonuslen[txg&TXG_MASK]); + ASSERT0(dn->dn_next_blksz[txg&TXG_MASK]); + ASSERT0(dn->dn_next_bonustype[txg&TXG_MASK]); dprintf_ds(os->os_dsl_dataset, "obj=%llu txg=%llu\n", dn->dn_object, txg); @@ -1589,7 +1592,7 @@ dnode_free_range(dnode_t *dn, uint64_t off, uint64_t len, dmu_tx_t *tx) else tail = P2PHASE(len, blksz); - ASSERT3U(P2PHASE(off, blksz), ==, 0); + ASSERT0(P2PHASE(off, blksz)); /* zero out any partial block data at the end of the range */ if (tail) { if (len < tail) @@ -1771,7 +1774,7 @@ dnode_diduse_space(dnode_t *dn, int64_t delta) space += delta; if (spa_version(dn->dn_objset->os_spa) < SPA_VERSION_DNODE_BYTES) { ASSERT((dn->dn_phys->dn_flags & DNODE_FLAG_USED_BYTES) == 0); - ASSERT3U(P2PHASE(space, 1<<DEV_BSHIFT), ==, 0); + ASSERT0(P2PHASE(space, 1<<DEV_BSHIFT)); dn->dn_phys->dn_used = space >> DEV_BSHIFT; } else { dn->dn_phys->dn_used = space; diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dnode_sync.c b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dnode_sync.c index 32afe7d74..4862dcbdf 100644 --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dnode_sync.c +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dnode_sync.c @@ -18,8 +18,10 @@ * * CDDL HEADER END */ + /* * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved. + * Copyright (c) 2012 by Delphix. All rights reserved. 
*/ #include @@ -272,7 +274,7 @@ free_children(dmu_buf_impl_t *db, uint64_t blkid, uint64_t nblks, int trunc, continue; rw_enter(&dn->dn_struct_rwlock, RW_READER); err = dbuf_hold_impl(dn, db->db_level-1, i, TRUE, FTAG, &subdb); - ASSERT3U(err, ==, 0); + ASSERT0(err); rw_exit(&dn->dn_struct_rwlock); if (free_children(subdb, blkid, nblks, trunc, tx) == ALL) { @@ -292,7 +294,7 @@ free_children(dmu_buf_impl_t *db, uint64_t blkid, uint64_t nblks, int trunc, continue; else if (i == end && !trunc) continue; - ASSERT3U(bp->blk_birth, ==, 0); + ASSERT0(bp->blk_birth); } #endif ASSERT(all || blocks_freed == 0 || db->db_last_dirty); @@ -348,7 +350,7 @@ dnode_sync_free_range(dnode_t *dn, uint64_t blkid, uint64_t nblks, dmu_tx_t *tx) continue; rw_enter(&dn->dn_struct_rwlock, RW_READER); err = dbuf_hold_impl(dn, dnlevel-1, i, TRUE, FTAG, &db); - ASSERT3U(err, ==, 0); + ASSERT0(err); rw_exit(&dn->dn_struct_rwlock); if (free_children(db, blkid, nblks, trunc, tx) == ALL) { @@ -472,7 +474,7 @@ dnode_sync_free(dnode_t *dn, dmu_tx_t *tx) * Our contents should have been freed in dnode_sync() by the * free range record inserted by the caller of dnode_free(). 
*/ - ASSERT3U(DN_USED_BYTES(dn->dn_phys), ==, 0); + ASSERT0(DN_USED_BYTES(dn->dn_phys)); ASSERT(BP_IS_HOLE(dn->dn_phys->dn_blkptr)); dnode_undirty_dbufs(&dn->dn_dirty_records[txgoff]); @@ -597,7 +599,7 @@ dnode_sync(dnode_t *dn, dmu_tx_t *tx) } if (dn->dn_next_bonustype[txgoff]) { - ASSERT(dn->dn_next_bonustype[txgoff] < DMU_OT_NUMTYPES); + ASSERT(DMU_OT_IS_VALID(dn->dn_next_bonustype[txgoff])); dnp->dn_bonustype = dn->dn_next_bonustype[txgoff]; dn->dn_next_bonustype[txgoff] = 0; } diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_dataset.c b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_dataset.c index 648f2f353..ccbaa5ed3 100644 --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_dataset.c +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_dataset.c @@ -20,7 +20,7 @@ */ /* * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved. - * Copyright (c) 2011 by Delphix. All rights reserved. + * Copyright (c) 2012 by Delphix. All rights reserved. * Copyright (c) 2012, Joyent, Inc. All rights reserved. * Copyright (c) 2011 Pawel Jakub Dawidek . * All rights reserved. @@ -38,6 +38,7 @@ #include #include #include +#include #include #include #include @@ -103,16 +104,10 @@ dsl_dataset_block_born(dsl_dataset_t *ds, const blkptr_t *bp, dmu_tx_t *tx) if (BP_IS_HOLE(bp)) return; ASSERT(BP_GET_TYPE(bp) != DMU_OT_NONE); - ASSERT3U(BP_GET_TYPE(bp), <, DMU_OT_NUMTYPES); + ASSERT(DMU_OT_IS_VALID(BP_GET_TYPE(bp))); if (ds == NULL) { - /* - * Account for the meta-objset space in its placeholder - * dsl_dir. 
- */ - ASSERT3U(compressed, ==, uncompressed); /* it's all metadata */ - dsl_dir_diduse_space(tx->tx_pool->dp_mos_dir, DD_USED_HEAD, - used, compressed, uncompressed, tx); - dsl_dir_dirty(tx->tx_pool->dp_mos_dir, tx); + dsl_pool_mos_diduse_space(tx->tx_pool, + used, compressed, uncompressed); return; } dmu_buf_will_dirty(ds->ds_dbuf, tx); @@ -120,7 +115,7 @@ dsl_dataset_block_born(dsl_dataset_t *ds, const blkptr_t *bp, dmu_tx_t *tx) mutex_enter(&ds->ds_dir->dd_lock); mutex_enter(&ds->ds_lock); delta = parent_delta(ds, used); - ds->ds_phys->ds_used_bytes += used; + ds->ds_phys->ds_referenced_bytes += used; ds->ds_phys->ds_compressed_bytes += compressed; ds->ds_phys->ds_uncompressed_bytes += uncompressed; ds->ds_phys->ds_unique_bytes += used; @@ -148,15 +143,9 @@ dsl_dataset_block_kill(dsl_dataset_t *ds, const blkptr_t *bp, dmu_tx_t *tx, ASSERT(used > 0); if (ds == NULL) { - /* - * Account for the meta-objset space in its placeholder - * dataset. - */ dsl_free(tx->tx_pool, tx->tx_txg, bp); - - dsl_dir_diduse_space(tx->tx_pool->dp_mos_dir, DD_USED_HEAD, - -used, -compressed, -uncompressed, tx); - dsl_dir_dirty(tx->tx_pool->dp_mos_dir, tx); + dsl_pool_mos_diduse_space(tx->tx_pool, + -used, -compressed, -uncompressed); return (used); } ASSERT3P(tx->tx_pool, ==, ds->ds_dir->dd_pool); @@ -214,8 +203,8 @@ dsl_dataset_block_kill(dsl_dataset_t *ds, const blkptr_t *bp, dmu_tx_t *tx, } } mutex_enter(&ds->ds_lock); - ASSERT3U(ds->ds_phys->ds_used_bytes, >=, used); - ds->ds_phys->ds_used_bytes -= used; + ASSERT3U(ds->ds_phys->ds_referenced_bytes, >=, used); + ds->ds_phys->ds_referenced_bytes -= used; ASSERT3U(ds->ds_phys->ds_compressed_bytes, >=, compressed); ds->ds_phys->ds_compressed_bytes -= compressed; ASSERT3U(ds->ds_phys->ds_uncompressed_bytes, >=, uncompressed); @@ -827,8 +816,8 @@ dsl_dataset_create_sync_dd(dsl_dir_t *dd, dsl_dataset_t *origin, dsphys->ds_prev_snap_obj = origin->ds_object; dsphys->ds_prev_snap_txg = origin->ds_phys->ds_creation_txg; - 
dsphys->ds_used_bytes = - origin->ds_phys->ds_used_bytes; + dsphys->ds_referenced_bytes = + origin->ds_phys->ds_referenced_bytes; dsphys->ds_compressed_bytes = origin->ds_phys->ds_compressed_bytes; dsphys->ds_uncompressed_bytes = @@ -981,7 +970,6 @@ dmu_snapshots_destroy_nvl(nvlist_t *snaps, boolean_t defer, char *failed) for (pair = nvlist_next_nvpair(snaps, NULL); pair != NULL; pair = nvlist_next_nvpair(snaps, pair)) { dsl_dataset_t *ds; - int err; err = dsl_dataset_own(nvpair_name(pair), B_TRUE, dstg, &ds); if (err == 0) { @@ -1116,56 +1104,55 @@ dsl_dataset_destroy(dsl_dataset_t *ds, void *tag, boolean_t defer) dummy_ds.ds_dir = dd; dummy_ds.ds_object = ds->ds_object; - /* - * Check for errors and mark this ds as inconsistent, in - * case we crash while freeing the objects. - */ - err = dsl_sync_task_do(dd->dd_pool, dsl_dataset_destroy_begin_check, - dsl_dataset_destroy_begin_sync, ds, NULL, 0); - if (err) - goto out; - - err = dmu_objset_from_ds(ds, &os); - if (err) - goto out; - - /* - * remove the objects in open context, so that we won't - * have too much to do in syncing context. - */ - for (obj = 0; err == 0; err = dmu_object_next(os, &obj, FALSE, - ds->ds_phys->ds_prev_snap_txg)) { + if (!spa_feature_is_enabled(dsl_dataset_get_spa(ds), + &spa_feature_table[SPA_FEATURE_ASYNC_DESTROY])) { /* - * Ignore errors, if there is not enough disk space - * we will deal with it in dsl_dataset_destroy_sync(). + * Check for errors and mark this ds as inconsistent, in + * case we crash while freeing the objects. */ - (void) dmu_free_object(os, obj); - } - if (err != ESRCH) - goto out; + err = dsl_sync_task_do(dd->dd_pool, + dsl_dataset_destroy_begin_check, + dsl_dataset_destroy_begin_sync, ds, NULL, 0); + if (err) + goto out; - /* - * Only the ZIL knows how to free log blocks. - */ - zil_destroy(dmu_objset_zil(os), B_FALSE); + err = dmu_objset_from_ds(ds, &os); + if (err) + goto out; - /* - * Sync out all in-flight IO. 
- */ - txg_wait_synced(dd->dd_pool, 0); + /* + * Remove all objects while in the open context so that + * there is less work to do in the syncing context. + */ + for (obj = 0; err == 0; err = dmu_object_next(os, &obj, FALSE, + ds->ds_phys->ds_prev_snap_txg)) { + /* + * Ignore errors, if there is not enough disk space + * we will deal with it in dsl_dataset_destroy_sync(). + */ + (void) dmu_free_object(os, obj); + } + if (err != ESRCH) + goto out; - /* - * If we managed to free all the objects in open - * context, the user space accounting should be zero. - */ - if (ds->ds_phys->ds_bp.blk_fill == 0 && - dmu_objset_userused_enabled(os)) { - uint64_t count; + /* + * Sync out all in-flight IO. + */ + txg_wait_synced(dd->dd_pool, 0); - ASSERT(zap_count(os, DMU_USERUSED_OBJECT, &count) != 0 || - count == 0); - ASSERT(zap_count(os, DMU_GROUPUSED_OBJECT, &count) != 0 || - count == 0); + /* + * If we managed to free all the objects in open + * context, the user space accounting should be zero. + */ + if (ds->ds_phys->ds_bp.blk_fill == 0 && + dmu_objset_userused_enabled(os)) { + uint64_t count; + + ASSERT(zap_count(os, DMU_USERUSED_OBJECT, + &count) != 0 || count == 0); + ASSERT(zap_count(os, DMU_GROUPUSED_OBJECT, + &count) != 0 || count == 0); + } } rw_enter(&dd->dd_pool->dp_config_rwlock, RW_READER); @@ -1271,6 +1258,17 @@ dsl_dataset_dirty(dsl_dataset_t *ds, dmu_tx_t *tx) } } +boolean_t +dsl_dataset_is_dirty(dsl_dataset_t *ds) +{ + for (int t = 0; t < TXG_SIZE; t++) { + if (txg_list_member(&ds->ds_dir->dd_pool->dp_dirty_datasets, + ds, t)) + return (B_TRUE); + } + return (B_FALSE); +} + /* * The unique space in the head dataset can be calculated by subtracting * the space used in the most recent snapshot, that is still being used @@ -1288,7 +1286,7 @@ dsl_dataset_recalc_head_uniq(dsl_dataset_t *ds) ASSERT(!dsl_dataset_is_snapshot(ds)); if (ds->ds_phys->ds_prev_snap_obj != 0) - mrs_used = ds->ds_prev->ds_phys->ds_used_bytes; + mrs_used = 
ds->ds_prev->ds_phys->ds_referenced_bytes; else mrs_used = 0; @@ -1296,7 +1294,7 @@ dsl_dataset_recalc_head_uniq(dsl_dataset_t *ds) ASSERT3U(dlused, <=, mrs_used); ds->ds_phys->ds_unique_bytes = - ds->ds_phys->ds_used_bytes - (mrs_used - dlused); + ds->ds_phys->ds_referenced_bytes - (mrs_used - dlused); if (spa_version(ds->ds_dir->dd_pool->dp_spa) >= SPA_VERSION_UNIQUE_ACCURATE) @@ -1551,7 +1549,7 @@ remove_from_next_clones(dsl_dataset_t *ds, uint64_t obj, dmu_tx_t *tx) * remove this one. */ if (err != ENOENT) { - VERIFY3U(err, ==, 0); + VERIFY0(err); } ASSERT3U(0, ==, zap_count(mos, ds->ds_phys->ds_next_clones_obj, &count)); @@ -1638,7 +1636,7 @@ process_old_deadlist(dsl_dataset_t *ds, dsl_dataset_t *ds_prev, poa.pio = zio_root(dp->dp_spa, NULL, NULL, ZIO_FLAG_MUSTSUCCEED); VERIFY3U(0, ==, bpobj_iterate(&ds_next->ds_deadlist.dl_bpobj, process_old_cb, &poa, tx)); - VERIFY3U(zio_wait(poa.pio), ==, 0); + VERIFY0(zio_wait(poa.pio)); ASSERT3U(poa.used, ==, ds->ds_phys->ds_unique_bytes); /* change snapused */ @@ -1655,6 +1653,30 @@ process_old_deadlist(dsl_dataset_t *ds, dsl_dataset_t *ds_prev, ds_next->ds_phys->ds_deadlist_obj); } +static int +old_synchronous_dataset_destroy(dsl_dataset_t *ds, dmu_tx_t *tx) +{ + int err; + struct killarg ka; + + /* + * Free everything that we point to (that's born after + * the previous snapshot, if we are a clone) + * + * NB: this should be very quick, because we already + * freed all the objects in open context. 
+ */ + ka.ds = ds; + ka.tx = tx; + err = traverse_dataset(ds, + ds->ds_phys->ds_prev_snap_txg, TRAVERSE_POST, + kill_blkptr, &ka); + ASSERT0(err); + ASSERT(!DS_UNIQUE_IS_ACCURATE(ds) || ds->ds_phys->ds_unique_bytes == 0); + + return (err); +} + void dsl_dataset_destroy_sync(void *arg1, void *tag, dmu_tx_t *tx) { @@ -1701,7 +1723,7 @@ dsl_dataset_destroy_sync(void *arg1, void *tag, dmu_tx_t *tx) psa.psa_effective_value = 0; /* predict default value */ dsl_dataset_set_reservation_sync(ds, &psa, tx); - ASSERT3U(ds->ds_reserved, ==, 0); + ASSERT0(ds->ds_reserved); } ASSERT(RW_WRITE_HELD(&dp->dp_config_rwlock)); @@ -1801,7 +1823,6 @@ dsl_dataset_destroy_sync(void *arg1, void *tag, dmu_tx_t *tx) tx); dsl_dir_diduse_space(tx->tx_pool->dp_free_dir, DD_USED_HEAD, used, comp, uncomp, tx); - dsl_dir_dirty(tx->tx_pool->dp_free_dir, tx); /* Merge our deadlist into next's and free it. */ dsl_deadlist_merge(&ds_next->ds_deadlist, @@ -1877,32 +1898,57 @@ dsl_dataset_destroy_sync(void *arg1, void *tag, dmu_tx_t *tx) } dsl_dataset_rele(ds_next, FTAG); } else { + zfeature_info_t *async_destroy = + &spa_feature_table[SPA_FEATURE_ASYNC_DESTROY]; + objset_t *os; + /* * There's no next snapshot, so this is a head dataset. * Destroy the deadlist. Unless it's a clone, the * deadlist should be empty. (If it's a clone, it's * safe to ignore the deadlist contents.) */ - struct killarg ka; - dsl_deadlist_close(&ds->ds_deadlist); dsl_deadlist_free(mos, ds->ds_phys->ds_deadlist_obj, tx); ds->ds_phys->ds_deadlist_obj = 0; - /* - * Free everything that we point to (that's born after - * the previous snapshot, if we are a clone) - * - * NB: this should be very quick, because we already - * freed all the objects in open context. 
- */ - ka.ds = ds; - ka.tx = tx; - err = traverse_dataset(ds, ds->ds_phys->ds_prev_snap_txg, - TRAVERSE_POST, kill_blkptr, &ka); - ASSERT3U(err, ==, 0); - ASSERT(!DS_UNIQUE_IS_ACCURATE(ds) || - ds->ds_phys->ds_unique_bytes == 0); + VERIFY3U(0, ==, dmu_objset_from_ds(ds, &os)); + + if (!spa_feature_is_enabled(dp->dp_spa, async_destroy)) { + err = old_synchronous_dataset_destroy(ds, tx); + } else { + /* + * Move the bptree into the pool's list of trees to + * clean up and update space accounting information. + */ + uint64_t used, comp, uncomp; + + zil_destroy_sync(dmu_objset_zil(os), tx); + + if (!spa_feature_is_active(dp->dp_spa, async_destroy)) { + spa_feature_incr(dp->dp_spa, async_destroy, tx); + dp->dp_bptree_obj = bptree_alloc(mos, tx); + VERIFY(zap_add(mos, + DMU_POOL_DIRECTORY_OBJECT, + DMU_POOL_BPTREE_OBJ, sizeof (uint64_t), 1, + &dp->dp_bptree_obj, tx) == 0); + } + + used = ds->ds_dir->dd_phys->dd_used_bytes; + comp = ds->ds_dir->dd_phys->dd_compressed_bytes; + uncomp = ds->ds_dir->dd_phys->dd_uncompressed_bytes; + + ASSERT(!DS_UNIQUE_IS_ACCURATE(ds) || + ds->ds_phys->ds_unique_bytes == used); + + bptree_add(mos, dp->dp_bptree_obj, + &ds->ds_phys->ds_bp, ds->ds_phys->ds_prev_snap_txg, + used, comp, uncomp, tx); + dsl_dir_diduse_space(ds->ds_dir, DD_USED_HEAD, + -used, -comp, -uncomp, tx); + dsl_dir_diduse_space(dp->dp_free_dir, DD_USED_HEAD, + used, comp, uncomp, tx); + } if (ds->ds_prev != NULL) { if (spa_version(dp->dp_spa) >= SPA_VERSION_DIR_CLONES) { @@ -1944,7 +1990,7 @@ dsl_dataset_destroy_sync(void *arg1, void *tag, dmu_tx_t *tx) err = dsl_dataset_snap_lookup(ds_head, ds->ds_snapname, &val); - ASSERT3U(err, ==, 0); + ASSERT0(err); ASSERT3U(val, ==, obj); } #endif @@ -2095,7 +2141,7 @@ dsl_dataset_snapshot_sync(void *arg1, void *arg2, dmu_tx_t *tx) dsphys->ds_creation_time = gethrestime_sec(); dsphys->ds_creation_txg = crtxg; dsphys->ds_deadlist_obj = ds->ds_phys->ds_deadlist_obj; - dsphys->ds_used_bytes = ds->ds_phys->ds_used_bytes; + 
dsphys->ds_referenced_bytes = ds->ds_phys->ds_referenced_bytes; dsphys->ds_compressed_bytes = ds->ds_phys->ds_compressed_bytes; dsphys->ds_uncompressed_bytes = ds->ds_phys->ds_uncompressed_bytes; dsphys->ds_flags = ds->ds_phys->ds_flags; @@ -2184,7 +2230,6 @@ dsl_dataset_sync(dsl_dataset_t *ds, zio_t *zio, dmu_tx_t *tx) dmu_buf_will_dirty(ds->ds_dbuf, tx); ds->ds_phys->ds_fsid_guid = ds->ds_fsid_guid; - dsl_dir_dirty(ds->ds_dir, tx); dmu_objset_sync(ds->ds_objset, zio, tx); } @@ -2219,10 +2264,22 @@ get_clones_stat(dsl_dataset_t *ds, nvlist_t *nv) zap_cursor_advance(&zc)) { dsl_dataset_t *clone; char buf[ZFS_MAXNAMELEN]; + /* + * Even though we hold the dp_config_rwlock, the dataset + * may fail to open, returning ENOENT. If there is a + * thread concurrently attempting to destroy this + * dataset, it will have the ds_rwlock held for + * RW_WRITER. Our call to dsl_dataset_hold_obj() -> + * dsl_dataset_hold_ref() will fail its + * rw_tryenter(&ds->ds_rwlock, RW_READER), drop the + * dp_config_rwlock, and wait for the destroy progress + * and signal ds_exclusive_cv. If the destroy was + * successful, we will see that + * DSL_DATASET_IS_DESTROYED(), and return ENOENT. + */ if (dsl_dataset_hold_obj(ds->ds_dir->dd_pool, - za.za_first_integer, FTAG, &clone) != 0) { - goto fail; - } + za.za_first_integer, FTAG, &clone) != 0) + continue; dsl_dir_name(clone->ds_dir, buf); VERIFY(nvlist_add_boolean(val, buf) == 0); dsl_dataset_rele(clone, FTAG); @@ -2286,7 +2343,6 @@ dsl_dataset_stats(dsl_dataset_t *ds, nvlist_t *nv) } } } - ratio = ds->ds_phys->ds_compressed_bytes == 0 ? 
100 : (ds->ds_phys->ds_uncompressed_bytes * 100 / ds->ds_phys->ds_compressed_bytes); @@ -2345,7 +2401,7 @@ dsl_dataset_space(dsl_dataset_t *ds, uint64_t *refdbytesp, uint64_t *availbytesp, uint64_t *usedobjsp, uint64_t *availobjsp) { - *refdbytesp = ds->ds_phys->ds_used_bytes; + *refdbytesp = ds->ds_phys->ds_referenced_bytes; *availbytesp = dsl_dir_space_available(ds->ds_dir, NULL, 0, TRUE); if (ds->ds_reserved > ds->ds_phys->ds_unique_bytes) *availbytesp += ds->ds_reserved - ds->ds_phys->ds_unique_bytes; @@ -2441,14 +2497,14 @@ dsl_dataset_snapshot_rename_sync(void *arg1, void *arg2, dmu_tx_t *tx) VERIFY(0 == dsl_dataset_get_snapname(ds)); err = dsl_dataset_snap_remove(hds, ds->ds_snapname, tx); - ASSERT3U(err, ==, 0); + ASSERT0(err); dsl_dataset_name(ds, oldname); mutex_enter(&ds->ds_lock); (void) strcpy(ds->ds_snapname, newsnapname); mutex_exit(&ds->ds_lock); err = zap_add(mos, hds->ds_phys->ds_snapnames_zapobj, ds->ds_snapname, 8, 1, &ds->ds_object, tx); - ASSERT3U(err, ==, 0); + ASSERT0(err); dsl_dataset_name(ds, newname); #ifdef _KERNEL zvol_rename_minors(oldname, newname); @@ -2688,7 +2744,7 @@ dsl_dataset_promote_check(void *arg1, void *arg2, dmu_tx_t *tx) * Note however, if we stop before we reach the ORIGIN we get: * uN + kN + kN-1 + ... + kM - uM-1 */ - pa->used = origin_ds->ds_phys->ds_used_bytes; + pa->used = origin_ds->ds_phys->ds_referenced_bytes; pa->comp = origin_ds->ds_phys->ds_compressed_bytes; pa->uncomp = origin_ds->ds_phys->ds_uncompressed_bytes; for (snap = list_head(&pa->shared_snaps); snap; @@ -2722,7 +2778,7 @@ dsl_dataset_promote_check(void *arg1, void *arg2, dmu_tx_t *tx) * so we need to subtract out the clone origin's used space. 
*/ if (pa->origin_origin) { - pa->used -= pa->origin_origin->ds_phys->ds_used_bytes; + pa->used -= pa->origin_origin->ds_phys->ds_referenced_bytes; pa->comp -= pa->origin_origin->ds_phys->ds_compressed_bytes; pa->uncomp -= pa->origin_origin->ds_phys->ds_uncompressed_bytes; } @@ -2906,7 +2962,7 @@ dsl_dataset_promote_sync(void *arg1, void *arg2, dmu_tx_t *tx) zap_cursor_fini(&zc); } - ASSERT3U(dsl_prop_numcb(ds), ==, 0); + ASSERT0(dsl_prop_numcb(ds)); } /* @@ -3238,8 +3294,8 @@ dsl_dataset_clone_swap_sync(void *arg1, void *arg2, dmu_tx_t *tx) dsl_deadlist_space(&csa->ohds->ds_deadlist, &odl_used, &odl_comp, &odl_uncomp); - dused = csa->cds->ds_phys->ds_used_bytes + cdl_used - - (csa->ohds->ds_phys->ds_used_bytes + odl_used); + dused = csa->cds->ds_phys->ds_referenced_bytes + cdl_used - + (csa->ohds->ds_phys->ds_referenced_bytes + odl_used); dcomp = csa->cds->ds_phys->ds_compressed_bytes + cdl_comp - (csa->ohds->ds_phys->ds_compressed_bytes + odl_comp); duncomp = csa->cds->ds_phys->ds_uncompressed_bytes + @@ -3268,8 +3324,8 @@ dsl_dataset_clone_swap_sync(void *arg1, void *arg2, dmu_tx_t *tx) } /* swap ds_*_bytes */ - SWITCH64(csa->ohds->ds_phys->ds_used_bytes, - csa->cds->ds_phys->ds_used_bytes); + SWITCH64(csa->ohds->ds_phys->ds_referenced_bytes, + csa->cds->ds_phys->ds_referenced_bytes); SWITCH64(csa->ohds->ds_phys->ds_compressed_bytes, csa->cds->ds_phys->ds_compressed_bytes); SWITCH64(csa->ohds->ds_phys->ds_uncompressed_bytes, @@ -3398,8 +3454,9 @@ dsl_dataset_check_quota(dsl_dataset_t *ds, boolean_t check_quota, * on-disk is over quota and there are no pending changes (which * may free up space for us). 
*/ - if (ds->ds_phys->ds_used_bytes + inflight >= ds->ds_quota) { - if (inflight > 0 || ds->ds_phys->ds_used_bytes < ds->ds_quota) + if (ds->ds_phys->ds_referenced_bytes + inflight >= ds->ds_quota) { + if (inflight > 0 || + ds->ds_phys->ds_referenced_bytes < ds->ds_quota) error = ERESTART; else error = EDQUOT; @@ -3426,7 +3483,7 @@ dsl_dataset_set_quota_check(void *arg1, void *arg2, dmu_tx_t *tx) if (psa->psa_effective_value == 0) return (0); - if (psa->psa_effective_value < ds->ds_phys->ds_used_bytes || + if (psa->psa_effective_value < ds->ds_phys->ds_referenced_bytes || psa->psa_effective_value < ds->ds_reserved) return (ENOSPC); @@ -3448,10 +3505,6 @@ dsl_dataset_set_quota_sync(void *arg1, void *arg2, dmu_tx_t *tx) if (ds->ds_quota != effective_value) { dmu_buf_will_dirty(ds->ds_dbuf, tx); ds->ds_quota = effective_value; - - spa_history_log_internal(LOG_DS_REFQUOTA, - ds->ds_dir->dd_pool->dp_spa, tx, "%lld dataset = %llu ", - (longlong_t)ds->ds_quota, ds->ds_object); } } @@ -3555,10 +3608,6 @@ dsl_dataset_set_reservation_sync(void *arg1, void *arg2, dmu_tx_t *tx) dsl_dir_diduse_space(ds->ds_dir, DD_USED_REFRSRV, delta, 0, 0, tx); mutex_exit(&ds->ds_dir->dd_lock); - - spa_history_log_internal(LOG_DS_REFRESERV, - ds->ds_dir->dd_pool->dp_spa, tx, "%lld dataset = %llu", - (longlong_t)effective_value, ds->ds_object); } int @@ -3907,6 +3956,11 @@ dsl_dataset_user_release_sync(void *arg1, void *tag, dmu_tx_t *tx) VERIFY(error == 0 || error == ENOENT); zapobj = ds->ds_phys->ds_userrefs_obj; VERIFY(0 == zap_remove(mos, zapobj, ra->htag, tx)); + + spa_history_log_internal(LOG_DS_USER_RELEASE, + dp->dp_spa, tx, "<%s> %lld dataset = %llu", + ra->htag, (longlong_t)refs, dsobj); + if (ds->ds_userrefs == 0 && ds->ds_phys->ds_num_children == 1 && DS_IS_DEFER_DESTROY(ds)) { struct dsl_ds_destroyarg dsda = {0}; @@ -3917,10 +3971,6 @@ dsl_dataset_user_release_sync(void *arg1, void *tag, dmu_tx_t *tx) /* We already did the destroy_check */ dsl_dataset_destroy_sync(&dsda, tag, tx); 
} - - spa_history_log_internal(LOG_DS_USER_RELEASE, - dp->dp_spa, tx, "<%s> %lld dataset = %llu", - ra->htag, (longlong_t)refs, dsobj); } static int @@ -4180,8 +4230,8 @@ dsl_dataset_space_written(dsl_dataset_t *oldsnap, dsl_dataset_t *new, dsl_pool_t *dp = new->ds_dir->dd_pool; *usedp = 0; - *usedp += new->ds_phys->ds_used_bytes; - *usedp -= oldsnap->ds_phys->ds_used_bytes; + *usedp += new->ds_phys->ds_referenced_bytes; + *usedp -= oldsnap->ds_phys->ds_referenced_bytes; *compp = 0; *compp += new->ds_phys->ds_compressed_bytes; @@ -4197,9 +4247,13 @@ dsl_dataset_space_written(dsl_dataset_t *oldsnap, dsl_dataset_t *new, dsl_dataset_t *snap; uint64_t used, comp, uncomp; - err = dsl_dataset_hold_obj(dp, snapobj, FTAG, &snap); - if (err != 0) - break; + if (snapobj == new->ds_object) { + snap = new; + } else { + err = dsl_dataset_hold_obj(dp, snapobj, FTAG, &snap); + if (err != 0) + break; + } if (snap->ds_phys->ds_prev_snap_txg == oldsnap->ds_phys->ds_creation_txg) { @@ -4228,7 +4282,8 @@ dsl_dataset_space_written(dsl_dataset_t *oldsnap, dsl_dataset_t *new, * was not a snapshot of/before new. */ snapobj = snap->ds_phys->ds_prev_snap_obj; - dsl_dataset_rele(snap, FTAG); + if (snap != new) + dsl_dataset_rele(snap, FTAG); if (snapobj == 0) { err = EINVAL; break; diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_deadlist.c b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_deadlist.c index dd6db2120..4f39c397a 100644 --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_deadlist.c +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_deadlist.c @@ -20,7 +20,7 @@ */ /* * Copyright (c) 2010, Oracle and/or its affiliates. All rights reserved. - * Copyright (c) 2011 by Delphix. All rights reserved. + * Copyright (c) 2012 by Delphix. All rights reserved. 
*/ #include @@ -163,12 +163,49 @@ dsl_deadlist_free(objset_t *os, uint64_t dlobj, dmu_tx_t *tx) for (zap_cursor_init(&zc, os, dlobj); zap_cursor_retrieve(&zc, &za) == 0; - zap_cursor_advance(&zc)) - bpobj_free(os, za.za_first_integer, tx); + zap_cursor_advance(&zc)) { + uint64_t obj = za.za_first_integer; + if (obj == dmu_objset_pool(os)->dp_empty_bpobj) + bpobj_decr_empty(os, tx); + else + bpobj_free(os, obj, tx); + } zap_cursor_fini(&zc); VERIFY3U(0, ==, dmu_object_free(os, dlobj, tx)); } +static void +dle_enqueue(dsl_deadlist_t *dl, dsl_deadlist_entry_t *dle, + const blkptr_t *bp, dmu_tx_t *tx) +{ + if (dle->dle_bpobj.bpo_object == + dmu_objset_pool(dl->dl_os)->dp_empty_bpobj) { + uint64_t obj = bpobj_alloc(dl->dl_os, SPA_MAXBLOCKSIZE, tx); + bpobj_close(&dle->dle_bpobj); + bpobj_decr_empty(dl->dl_os, tx); + VERIFY3U(0, ==, bpobj_open(&dle->dle_bpobj, dl->dl_os, obj)); + VERIFY3U(0, ==, zap_update_int_key(dl->dl_os, dl->dl_object, + dle->dle_mintxg, obj, tx)); + } + bpobj_enqueue(&dle->dle_bpobj, bp, tx); +} + +static void +dle_enqueue_subobj(dsl_deadlist_t *dl, dsl_deadlist_entry_t *dle, + uint64_t obj, dmu_tx_t *tx) +{ + if (dle->dle_bpobj.bpo_object != + dmu_objset_pool(dl->dl_os)->dp_empty_bpobj) { + bpobj_enqueue_subobj(&dle->dle_bpobj, obj, tx); + } else { + bpobj_close(&dle->dle_bpobj); + bpobj_decr_empty(dl->dl_os, tx); + VERIFY3U(0, ==, bpobj_open(&dle->dle_bpobj, dl->dl_os, obj)); + VERIFY3U(0, ==, zap_update_int_key(dl->dl_os, dl->dl_object, + dle->dle_mintxg, obj, tx)); + } +} + void dsl_deadlist_insert(dsl_deadlist_t *dl, const blkptr_t *bp, dmu_tx_t *tx) { @@ -197,7 +234,7 @@ dsl_deadlist_insert(dsl_deadlist_t *dl, const blkptr_t *bp, dmu_tx_t *tx) dle = avl_nearest(&dl->dl_tree, where, AVL_BEFORE); else dle = AVL_PREV(&dl->dl_tree, dle); - bpobj_enqueue(&dle->dle_bpobj, bp, tx); + dle_enqueue(dl, dle, bp, tx); } /* @@ -217,7 +254,7 @@ dsl_deadlist_add_key(dsl_deadlist_t *dl, uint64_t mintxg, dmu_tx_t *tx) dle = kmem_alloc(sizeof (*dle), KM_SLEEP); 
dle->dle_mintxg = mintxg; - obj = bpobj_alloc(dl->dl_os, SPA_MAXBLOCKSIZE, tx); + obj = bpobj_alloc_empty(dl->dl_os, SPA_MAXBLOCKSIZE, tx); VERIFY3U(0, ==, bpobj_open(&dle->dle_bpobj, dl->dl_os, obj)); avl_add(&dl->dl_tree, dle); @@ -243,8 +280,7 @@ dsl_deadlist_remove_key(dsl_deadlist_t *dl, uint64_t mintxg, dmu_tx_t *tx) dle = avl_find(&dl->dl_tree, &dle_tofind, NULL); dle_prev = AVL_PREV(&dl->dl_tree, dle); - bpobj_enqueue_subobj(&dle_prev->dle_bpobj, - dle->dle_bpobj.bpo_object, tx); + dle_enqueue_subobj(dl, dle_prev, dle->dle_bpobj.bpo_object, tx); avl_remove(&dl->dl_tree, dle); bpobj_close(&dle->dle_bpobj); @@ -302,7 +338,7 @@ dsl_deadlist_clone(dsl_deadlist_t *dl, uint64_t maxtxg, if (dle->dle_mintxg >= maxtxg) break; - obj = bpobj_alloc(dl->dl_os, SPA_MAXBLOCKSIZE, tx); + obj = bpobj_alloc_empty(dl->dl_os, SPA_MAXBLOCKSIZE, tx); VERIFY3U(0, ==, zap_add_int_key(dl->dl_os, newobj, dle->dle_mintxg, obj, tx)); } @@ -400,7 +436,7 @@ dsl_deadlist_insert_bpobj(dsl_deadlist_t *dl, uint64_t obj, uint64_t birth, dle = avl_find(&dl->dl_tree, &dle_tofind, &where); if (dle == NULL) dle = avl_nearest(&dl->dl_tree, where, AVL_BEFORE); - bpobj_enqueue_subobj(&dle->dle_bpobj, obj, tx); + dle_enqueue_subobj(dl, dle, obj, tx); } static int diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_deleg.c b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_deleg.c index 0b5fa0bcd..8f9d2c5f8 100644 --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_deleg.c +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_deleg.c @@ -20,7 +20,7 @@ */ /* * Copyright (c) 2007, 2010, Oracle and/or its affiliates. All rights reserved. - * Copyright (c) 2011 by Delphix. All rights reserved. + * Copyright (c) 2012 by Delphix. All rights reserved. 
*/ /* @@ -171,10 +171,8 @@ dsl_deleg_set_sync(void *arg1, void *arg2, dmu_tx_t *tx) VERIFY(nvpair_value_nvlist(whopair, &perms) == 0); if (zap_lookup(mos, zapobj, whokey, 8, 1, &jumpobj) != 0) { - jumpobj = zap_create(mos, DMU_OT_DSL_PERMS, - DMU_OT_NONE, 0, tx); - VERIFY(zap_update(mos, zapobj, - whokey, 8, 1, &jumpobj, tx) == 0); + jumpobj = zap_create_link(mos, DMU_OT_DSL_PERMS, + zapobj, whokey, tx); } while (permpair = nvlist_next_nvpair(perms, permpair)) { diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_dir.c b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_dir.c index 1213445da..4d954bd51 100644 --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_dir.c +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_dir.c @@ -195,7 +195,6 @@ errout: kmem_free(dd, sizeof (dsl_dir_t)); dmu_buf_rele(dbuf, tag); return (err); - } void @@ -229,7 +228,7 @@ dsl_dir_name(dsl_dir_t *dd, char *buf) } } -/* Calculate name legnth, avoiding all the strcat calls of dsl_dir_name */ +/* Calculate name length, avoiding all the strcat calls of dsl_dir_name */ int dsl_dir_namelen(dsl_dir_t *dd) { @@ -463,12 +462,14 @@ dsl_dir_destroy_check(void *arg1, void *arg2, dmu_tx_t *tx) /* * There should be exactly two holds, both from * dsl_dataset_destroy: one on the dd directory, and one on its - * head ds. Otherwise, someone is trying to lookup something - * inside this dir while we want to destroy it. The - * config_rwlock ensures that nobody else opens it after we - * check. + * head ds. If there are more holds, then a concurrent thread is + * performing a lookup inside this dir while we're trying to destroy + * it. To minimize this possibility, we perform this check only + * in syncing context and fail the operation if we encounter + * additional holds. The dp_config_rwlock ensures that nobody else + * opens it after we check. 
*/ - if (dmu_buf_refcount(dd->dd_dbuf) > 2) + if (dmu_tx_is_syncing(tx) && dmu_buf_refcount(dd->dd_dbuf) > 2) return (EBUSY); err = zap_count(mos, dd->dd_phys->dd_child_dir_zapobj, &count); @@ -502,10 +503,10 @@ dsl_dir_destroy_sync(void *arg1, void *tag, dmu_tx_t *tx) dsl_dir_set_reservation_sync(ds, &psa, tx); - ASSERT3U(dd->dd_phys->dd_used_bytes, ==, 0); - ASSERT3U(dd->dd_phys->dd_reserved, ==, 0); + ASSERT0(dd->dd_phys->dd_used_bytes); + ASSERT0(dd->dd_phys->dd_reserved); for (t = 0; t < DD_USED_NUM; t++) - ASSERT3U(dd->dd_phys->dd_used_breakdown[t], ==, 0); + ASSERT0(dd->dd_phys->dd_used_breakdown[t]); VERIFY(0 == zap_destroy(mos, dd->dd_phys->dd_child_dir_zapobj, tx)); VERIFY(0 == zap_destroy(mos, dd->dd_phys->dd_props_zapobj, tx)); @@ -593,10 +594,8 @@ dsl_dir_sync(dsl_dir_t *dd, dmu_tx_t *tx) { ASSERT(dmu_tx_is_syncing(tx)); - dmu_buf_will_dirty(dd->dd_dbuf, tx); - mutex_enter(&dd->dd_lock); - ASSERT3U(dd->dd_tempreserved[tx->tx_txg&TXG_MASK], ==, 0); + ASSERT0(dd->dd_tempreserved[tx->tx_txg&TXG_MASK]); dprintf_dd(dd, "txg=%llu towrite=%lluK\n", tx->tx_txg, dd->dd_space_towrite[tx->tx_txg&TXG_MASK] / 1024); dd->dd_space_towrite[tx->tx_txg&TXG_MASK] = 0; @@ -951,8 +950,6 @@ dsl_dir_diduse_space(dsl_dir_t *dd, dd_used_t type, ASSERT(dmu_tx_is_syncing(tx)); ASSERT(type < DD_USED_NUM); - dsl_dir_dirty(dd, tx); - if (needlock) mutex_enter(&dd->dd_lock); accounted_delta = parent_delta(dd, dd->dd_phys->dd_used_bytes, used); @@ -961,6 +958,7 @@ dsl_dir_diduse_space(dsl_dir_t *dd, dd_used_t type, dd->dd_phys->dd_compressed_bytes >= -compressed); ASSERT(uncompressed >= 0 || dd->dd_phys->dd_uncompressed_bytes >= -uncompressed); + dmu_buf_will_dirty(dd->dd_dbuf, tx); dd->dd_phys->dd_used_bytes += used; dd->dd_phys->dd_uncompressed_bytes += uncompressed; dd->dd_phys->dd_compressed_bytes += compressed; @@ -1002,13 +1000,13 @@ dsl_dir_transfer_space(dsl_dir_t *dd, int64_t delta, if (delta == 0 || !(dd->dd_phys->dd_flags & DD_FLAG_USED_BREAKDOWN)) return; - 
dsl_dir_dirty(dd, tx); if (needlock) mutex_enter(&dd->dd_lock); ASSERT(delta > 0 ? dd->dd_phys->dd_used_breakdown[oldtype] >= delta : dd->dd_phys->dd_used_breakdown[newtype] >= -delta); ASSERT(dd->dd_phys->dd_used_bytes >= ABS(delta)); + dmu_buf_will_dirty(dd->dd_dbuf, tx); dd->dd_phys->dd_used_breakdown[oldtype] -= delta; dd->dd_phys->dd_used_breakdown[newtype] += delta; if (needlock) @@ -1065,10 +1063,6 @@ dsl_dir_set_quota_sync(void *arg1, void *arg2, dmu_tx_t *tx) mutex_enter(&dd->dd_lock); dd->dd_phys->dd_quota = effective_value; mutex_exit(&dd->dd_lock); - - spa_history_log_internal(LOG_DS_QUOTA, dd->dd_pool->dp_spa, - tx, "%lld dataset = %llu ", - (longlong_t)effective_value, dd->dd_phys->dd_head_dataset_obj); } int @@ -1181,10 +1175,6 @@ dsl_dir_set_reservation_sync(void *arg1, void *arg2, dmu_tx_t *tx) delta, 0, 0, tx); } mutex_exit(&dd->dd_lock); - - spa_history_log_internal(LOG_DS_RESERVATION, dd->dd_pool->dp_spa, - tx, "%lld dataset = %llu", - (longlong_t)effective_value, dd->dd_phys->dd_head_dataset_obj); } int @@ -1340,7 +1330,7 @@ dsl_dir_rename_sync(void *arg1, void *arg2, dmu_tx_t *tx) dsl_dir_name(dd, oldname); err = zap_remove(mos, dd->dd_parent->dd_phys->dd_child_dir_zapobj, dd->dd_myname, tx); - ASSERT3U(err, ==, 0); + ASSERT0(err); (void) strcpy(dd->dd_myname, ra->mynewname); dsl_dir_close(dd->dd_parent, dd); @@ -1351,7 +1341,7 @@ dsl_dir_rename_sync(void *arg1, void *arg2, dmu_tx_t *tx) /* add to new parent zapobj */ err = zap_add(mos, ra->newparent->dd_phys->dd_child_dir_zapobj, dd->dd_myname, 8, 1, &dd->dd_object, tx); - ASSERT3U(err, ==, 0); + ASSERT0(err); dsl_dir_name(dd, newname); #ifdef _KERNEL zfsvfs_update_fromname(oldname, newname); diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_pool.c b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_pool.c index 7b15958cc..65b9665d6 100644 --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_pool.c +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_pool.c @@ -20,7 
+20,7 @@ */ /* * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved. - * Copyright (c) 2011 by Delphix. All rights reserved. + * Copyright (c) 2012 by Delphix. All rights reserved. */ #include @@ -40,6 +40,9 @@ #include #include #include +#include +#include +#include int zfs_no_write_throttle = 0; int zfs_write_limit_shift = 3; /* 1/8th of physical memory */ @@ -109,12 +112,12 @@ dsl_pool_open_impl(spa_t *spa, uint64_t txg) txg_list_create(&dp->dp_dirty_datasets, offsetof(dsl_dataset_t, ds_dirty_link)); + txg_list_create(&dp->dp_dirty_zilogs, + offsetof(zilog_t, zl_dirty_link)); txg_list_create(&dp->dp_dirty_dirs, offsetof(dsl_dir_t, dd_dirty_link)); txg_list_create(&dp->dp_sync_tasks, offsetof(dsl_sync_task_group_t, dstg_node)); - list_create(&dp->dp_synced_datasets, sizeof (dsl_dataset_t), - offsetof(dsl_dataset_t, ds_synced_link)); mutex_init(&dp->dp_lock, NULL, MUTEX_DEFAULT, NULL); @@ -125,20 +128,30 @@ dsl_pool_open_impl(spa_t *spa, uint64_t txg) } int -dsl_pool_open(spa_t *spa, uint64_t txg, dsl_pool_t **dpp) +dsl_pool_init(spa_t *spa, uint64_t txg, dsl_pool_t **dpp) { int err; dsl_pool_t *dp = dsl_pool_open_impl(spa, txg); + + err = dmu_objset_open_impl(spa, NULL, &dp->dp_meta_rootbp, + &dp->dp_meta_objset); + if (err != 0) + dsl_pool_close(dp); + else + *dpp = dp; + + return (err); +} + +int +dsl_pool_open(dsl_pool_t *dp) +{ + int err; dsl_dir_t *dd; dsl_dataset_t *ds; uint64_t obj; rw_enter(&dp->dp_config_rwlock, RW_WRITER); - err = dmu_objset_open_impl(spa, NULL, &dp->dp_meta_rootbp, - &dp->dp_meta_objset); - if (err) - goto out; - err = zap_lookup(dp->dp_meta_objset, DMU_POOL_DIRECTORY_OBJECT, DMU_POOL_ROOT_DATASET, sizeof (uint64_t), 1, &dp->dp_root_dir_obj); @@ -154,7 +167,7 @@ dsl_pool_open(spa_t *spa, uint64_t txg, dsl_pool_t **dpp) if (err) goto out; - if (spa_version(spa) >= SPA_VERSION_ORIGIN) { + if (spa_version(dp->dp_spa) >= SPA_VERSION_ORIGIN) { err = dsl_pool_open_special_dir(dp, ORIGIN_DIR_NAME, &dd); if (err) goto 
out; @@ -171,7 +184,7 @@ dsl_pool_open(spa_t *spa, uint64_t txg, dsl_pool_t **dpp) goto out; } - if (spa_version(spa) >= SPA_VERSION_DEADLISTS) { + if (spa_version(dp->dp_spa) >= SPA_VERSION_DEADLISTS) { err = dsl_pool_open_special_dir(dp, FREE_DIR_NAME, &dp->dp_free_dir); if (err) @@ -185,6 +198,24 @@ dsl_pool_open(spa_t *spa, uint64_t txg, dsl_pool_t **dpp) dp->dp_meta_objset, obj)); } + if (spa_feature_is_active(dp->dp_spa, + &spa_feature_table[SPA_FEATURE_ASYNC_DESTROY])) { + err = zap_lookup(dp->dp_meta_objset, DMU_POOL_DIRECTORY_OBJECT, + DMU_POOL_BPTREE_OBJ, sizeof (uint64_t), 1, + &dp->dp_bptree_obj); + if (err != 0) + goto out; + } + + if (spa_feature_is_active(dp->dp_spa, + &spa_feature_table[SPA_FEATURE_EMPTY_BPOBJ])) { + err = zap_lookup(dp->dp_meta_objset, DMU_POOL_DIRECTORY_OBJECT, + DMU_POOL_EMPTY_BPOBJ, sizeof (uint64_t), 1, + &dp->dp_empty_bpobj); + if (err != 0) + goto out; + } + err = zap_lookup(dp->dp_meta_objset, DMU_POOL_DIRECTORY_OBJECT, DMU_POOL_TMP_USERREFS, sizeof (uint64_t), 1, &dp->dp_tmp_userrefs_obj); @@ -193,15 +224,10 @@ dsl_pool_open(spa_t *spa, uint64_t txg, dsl_pool_t **dpp) if (err) goto out; - err = dsl_scan_init(dp, txg); + err = dsl_scan_init(dp, dp->dp_tx.tx_open_txg); out: rw_exit(&dp->dp_config_rwlock); - if (err) - dsl_pool_close(dp); - else - *dpp = dp; - return (err); } @@ -231,9 +257,9 @@ dsl_pool_close(dsl_pool_t *dp) dmu_objset_evict(dp->dp_meta_objset); txg_list_destroy(&dp->dp_dirty_datasets); + txg_list_destroy(&dp->dp_dirty_zilogs); txg_list_destroy(&dp->dp_sync_tasks); txg_list_destroy(&dp->dp_dirty_dirs); - list_destroy(&dp->dp_synced_datasets); arc_flush(dp->dp_spa); txg_fini(dp); @@ -263,7 +289,7 @@ dsl_pool_create(spa_t *spa, nvlist_t *zplprops, uint64_t txg) /* create the pool directory */ err = zap_create_claim(dp->dp_meta_objset, DMU_POOL_DIRECTORY_OBJECT, DMU_OT_OBJECT_DIRECTORY, DMU_OT_NONE, 0, tx); - ASSERT3U(err, ==, 0); + ASSERT0(err); /* Initialize scan structures */ VERIFY3U(0, ==, dsl_scan_init(dp, 
txg)); @@ -313,6 +339,21 @@ dsl_pool_create(spa_t *spa, nvlist_t *zplprops, uint64_t txg) return (dp); } +/* + * Account for the meta-objset space in its placeholder dsl_dir. + */ +void +dsl_pool_mos_diduse_space(dsl_pool_t *dp, + int64_t used, int64_t comp, int64_t uncomp) +{ + ASSERT3U(comp, ==, uncomp); /* it's all metadata */ + mutex_enter(&dp->dp_lock); + dp->dp_mos_used_delta += used; + dp->dp_mos_compressed_delta += comp; + dp->dp_mos_uncompressed_delta += uncomp; + mutex_exit(&dp->dp_lock); +} + static int deadlist_enqueue_cb(void *arg, const blkptr_t *bp, dmu_tx_t *tx) { @@ -331,11 +372,14 @@ dsl_pool_sync(dsl_pool_t *dp, uint64_t txg) dmu_tx_t *tx; dsl_dir_t *dd; dsl_dataset_t *ds; - dsl_sync_task_group_t *dstg; objset_t *mos = dp->dp_meta_objset; hrtime_t start, write_time; uint64_t data_written; int err; + list_t synced_datasets; + + list_create(&synced_datasets, sizeof (dsl_dataset_t), + offsetof(dsl_dataset_t, ds_synced_link)); /* * We need to copy dp_space_towrite() before doing @@ -358,7 +402,7 @@ dsl_pool_sync(dsl_pool_t *dp, uint64_t txg) * may sync newly-created datasets on pass 2. */ ASSERT(!list_link_active(&ds->ds_synced_link)); - list_insert_tail(&dp->dp_synced_datasets, ds); + list_insert_tail(&synced_datasets, ds); dsl_dataset_sync(ds, zio, tx); } DTRACE_PROBE(pool_sync__1setup); @@ -368,15 +412,20 @@ dsl_pool_sync(dsl_pool_t *dp, uint64_t txg) ASSERT(err == 0); DTRACE_PROBE(pool_sync__2rootzio); - for (ds = list_head(&dp->dp_synced_datasets); ds; - ds = list_next(&dp->dp_synced_datasets, ds)) + /* + * After the data blocks have been written (ensured by the zio_wait() + * above), update the user/group space accounting. + */ + for (ds = list_head(&synced_datasets); ds; + ds = list_next(&synced_datasets, ds)) dmu_objset_do_userquota_updates(ds->ds_objset, tx); /* * Sync the datasets again to push out the changes due to * userspace updates. 
This must be done before we process the - * sync tasks, because that could cause a snapshot of a dataset - * whose ds_bp will be rewritten when we do this 2nd sync. + * sync tasks, so that any snapshots will have the correct + * user accounting information (and we won't get confused + * about which blocks are part of the snapshot). */ zio = zio_root(dp->dp_spa, NULL, NULL, ZIO_FLAG_MUSTSUCCEED); while (ds = txg_list_remove(&dp->dp_dirty_datasets, txg)) { @@ -387,30 +436,42 @@ dsl_pool_sync(dsl_pool_t *dp, uint64_t txg) err = zio_wait(zio); /* - * Move dead blocks from the pending deadlist to the on-disk - * deadlist. + * Now that the datasets have been completely synced, we can + * clean up our in-memory structures accumulated while syncing: + * + * - move dead blocks from the pending deadlist to the on-disk deadlist + * - clean up zil records + * - release hold from dsl_dataset_dirty() */ - for (ds = list_head(&dp->dp_synced_datasets); ds; - ds = list_next(&dp->dp_synced_datasets, ds)) { + while (ds = list_remove_head(&synced_datasets)) { + objset_t *os = ds->ds_objset; bplist_iterate(&ds->ds_pending_deadlist, deadlist_enqueue_cb, &ds->ds_deadlist, tx); + ASSERT(!dmu_objset_is_dirty(os, txg)); + dmu_buf_rele(ds->ds_dbuf, ds); } - while (dstg = txg_list_remove(&dp->dp_sync_tasks, txg)) { - /* - * No more sync tasks should have been added while we - * were syncing. - */ - ASSERT(spa_sync_pass(dp->dp_spa) == 1); - dsl_sync_task_group_sync(dstg, tx); - } - DTRACE_PROBE(pool_sync__3task); - start = gethrtime(); while (dd = txg_list_remove(&dp->dp_dirty_dirs, txg)) dsl_dir_sync(dd, tx); write_time += gethrtime() - start; + /* + * The MOS's space is accounted for in the pool/$MOS + * (dp_mos_dir). We can't modify the mos while we're syncing + * it, so we remember the deltas and apply them here. 
+ */ + if (dp->dp_mos_used_delta != 0 || dp->dp_mos_compressed_delta != 0 || + dp->dp_mos_uncompressed_delta != 0) { + dsl_dir_diduse_space(dp->dp_mos_dir, DD_USED_HEAD, + dp->dp_mos_used_delta, + dp->dp_mos_compressed_delta, + dp->dp_mos_uncompressed_delta, tx); + dp->dp_mos_used_delta = 0; + dp->dp_mos_compressed_delta = 0; + dp->dp_mos_uncompressed_delta = 0; + } + start = gethrtime(); if (list_head(&mos->os_dirty_dnodes[txg & TXG_MASK]) != NULL || list_head(&mos->os_free_dnodes[txg & TXG_MASK]) != NULL) { @@ -426,6 +487,27 @@ dsl_pool_sync(dsl_pool_t *dp, uint64_t txg) hrtime_t, dp->dp_read_overhead); write_time -= dp->dp_read_overhead; + /* + * If we modify a dataset in the same txg that we want to destroy it, + * its dsl_dir's dd_dbuf will be dirty, and thus have a hold on it. + * dsl_dir_destroy_check() will fail if there are unexpected holds. + * Therefore, we want to sync the MOS (thus syncing the dd_dbuf + * and clearing the hold on it) before we process the sync_tasks. + * The MOS data dirtied by the sync_tasks will be synced on the next + * pass. + */ + DTRACE_PROBE(pool_sync__3task); + if (!txg_list_empty(&dp->dp_sync_tasks, txg)) { + dsl_sync_task_group_t *dstg; + /* + * No more sync tasks should have been added while we + * were syncing. 
+ */ + ASSERT(spa_sync_pass(dp->dp_spa) == 1); + while (dstg = txg_list_remove(&dp->dp_sync_tasks, txg)) + dsl_sync_task_group_sync(dstg, tx); + } + dmu_tx_commit(tx); dp->dp_space_towrite[txg & TXG_MASK] = 0; @@ -474,15 +556,14 @@ dsl_pool_sync(dsl_pool_t *dp, uint64_t txg) void dsl_pool_sync_done(dsl_pool_t *dp, uint64_t txg) { + zilog_t *zilog; dsl_dataset_t *ds; - objset_t *os; - while (ds = list_head(&dp->dp_synced_datasets)) { - list_remove(&dp->dp_synced_datasets, ds); - os = ds->ds_objset; - zil_clean(os->os_zil, txg); - ASSERT(!dmu_objset_is_dirty(os, txg)); - dmu_buf_rele(ds->ds_dbuf, ds); + while (zilog = txg_list_remove(&dp->dp_dirty_zilogs, txg)) { + ds = dmu_objset_ds(zilog->zl_os); + zil_clean(zilog, txg); + ASSERT(!dmu_objset_is_dirty(zilog->zl_os, txg)); + dmu_buf_rele(ds->ds_dbuf, zilog); } ASSERT(!dmu_objset_is_dirty(dp->dp_meta_objset, txg)); } @@ -495,7 +576,7 @@ int dsl_pool_sync_context(dsl_pool_t *dp) { return (curthread == dp->dp_tx.tx_sync_thread || - spa_get_dsl(dp->dp_spa) == NULL); + spa_is_initializing(dp->dp_spa)); } uint64_t @@ -813,11 +894,8 @@ dsl_pool_user_hold_create_obj(dsl_pool_t *dp, dmu_tx_t *tx) ASSERT(dp->dp_tmp_userrefs_obj == 0); ASSERT(dmu_tx_is_syncing(tx)); - dp->dp_tmp_userrefs_obj = zap_create(mos, DMU_OT_USERREFS, - DMU_OT_NONE, 0, tx); - - VERIFY(zap_add(mos, DMU_POOL_DIRECTORY_OBJECT, DMU_POOL_TMP_USERREFS, - sizeof (uint64_t), 1, &dp->dp_tmp_userrefs_obj, tx) == 0); + dp->dp_tmp_userrefs_obj = zap_create_link(mos, DMU_OT_USERREFS, + DMU_POOL_DIRECTORY_OBJECT, DMU_POOL_TMP_USERREFS, tx); } static int diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_scan.c b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_scan.c index 475b494c0..5fc37e210 100644 --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_scan.c +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_scan.c @@ -20,6 +20,7 @@ */ /* * Copyright (c) 2008, 2010, Oracle and/or its affiliates. All rights reserved. 
+ * Copyright (c) 2012 by Delphix. All rights reserved. */ #include @@ -44,6 +45,7 @@ #include #include #include +#include #ifdef _KERNEL #include #endif @@ -56,16 +58,47 @@ static scan_cb_t dsl_scan_remove_cb; static dsl_syncfunc_t dsl_scan_cancel_sync; static void dsl_scan_sync_state(dsl_scan_t *, dmu_tx_t *tx); -int zfs_top_maxinflight = 32; /* maximum I/Os per top-level */ -int zfs_resilver_delay = 2; /* number of ticks to delay resilver */ -int zfs_scrub_delay = 4; /* number of ticks to delay scrub */ -int zfs_scan_idle = 50; /* idle window in clock ticks */ +unsigned int zfs_top_maxinflight = 32; /* maximum I/Os per top-level */ +unsigned int zfs_resilver_delay = 2; /* number of ticks to delay resilver */ +unsigned int zfs_scrub_delay = 4; /* number of ticks to delay scrub */ +unsigned int zfs_scan_idle = 50; /* idle window in clock ticks */ -int zfs_scan_min_time_ms = 1000; /* min millisecs to scrub per txg */ -int zfs_free_min_time_ms = 1000; /* min millisecs to free per txg */ -int zfs_resilver_min_time_ms = 3000; /* min millisecs to resilver per txg */ +unsigned int zfs_scan_min_time_ms = 1000; /* min millisecs to scrub per txg */ +unsigned int zfs_free_min_time_ms = 1000; /* min millisecs to free per txg */ +unsigned int zfs_resilver_min_time_ms = 3000; /* min millisecs to resilver + per txg */ boolean_t zfs_no_scrub_io = B_FALSE; /* set to disable scrub i/o */ boolean_t zfs_no_scrub_prefetch = B_FALSE; /* set to disable srub prefetching */ + +SYSCTL_DECL(_vfs_zfs); +TUNABLE_INT("vfs.zfs.top_maxinflight", &zfs_top_maxinflight); +SYSCTL_UINT(_vfs_zfs, OID_AUTO, top_maxinflight, CTLFLAG_RW, + &zfs_top_maxinflight, 0, "Maximum I/Os per top-level vdev"); +TUNABLE_INT("vfs.zfs.resilver_delay", &zfs_resilver_delay); +SYSCTL_UINT(_vfs_zfs, OID_AUTO, resilver_delay, CTLFLAG_RW, + &zfs_resilver_delay, 0, "Number of ticks to delay resilver"); +TUNABLE_INT("vfs.zfs.scrub_delay", &zfs_scrub_delay); +SYSCTL_UINT(_vfs_zfs, OID_AUTO, scrub_delay, CTLFLAG_RW, + 
&zfs_scrub_delay, 0, "Number of ticks to delay scrub"); +TUNABLE_INT("vfs.zfs.scan_idle", &zfs_scan_idle); +SYSCTL_UINT(_vfs_zfs, OID_AUTO, scan_idle, CTLFLAG_RW, + &zfs_scan_idle, 0, "Idle scan window in clock ticks"); +TUNABLE_INT("vfs.zfs.scan_min_time_ms", &zfs_scan_min_time_ms); +SYSCTL_UINT(_vfs_zfs, OID_AUTO, scan_min_time_ms, CTLFLAG_RW, + &zfs_scan_min_time_ms, 0, "Min millisecs to scrub per txg"); +TUNABLE_INT("vfs.zfs.free_min_time_ms", &zfs_free_min_time_ms); +SYSCTL_UINT(_vfs_zfs, OID_AUTO, free_min_time_ms, CTLFLAG_RW, + &zfs_free_min_time_ms, 0, "Min millisecs to free per txg"); +TUNABLE_INT("vfs.zfs.resilver_min_time_ms", &zfs_resilver_min_time_ms); +SYSCTL_UINT(_vfs_zfs, OID_AUTO, resilver_min_time_ms, CTLFLAG_RW, + &zfs_resilver_min_time_ms, 0, "Min millisecs to resilver per txg"); +TUNABLE_INT("vfs.zfs.no_scrub_io", &zfs_no_scrub_io); +SYSCTL_INT(_vfs_zfs, OID_AUTO, no_scrub_io, CTLFLAG_RW, + &zfs_no_scrub_io, 0, "Disable scrub I/O"); +TUNABLE_INT("vfs.zfs.no_scrub_prefetch", &zfs_no_scrub_prefetch); +SYSCTL_INT(_vfs_zfs, OID_AUTO, no_scrub_prefetch, CTLFLAG_RW, + &zfs_no_scrub_prefetch, 0, "Disable scrub prefetching"); + enum ddt_class zfs_scrub_ddt_class_max = DDT_CLASS_DUPLICATE; #define DSL_SCAN_IS_SCRUB_RESILVER(scn) \ @@ -381,55 +414,6 @@ dsl_read_nolock(zio_t *pio, spa_t *spa, const blkptr_t *bpp, priority, zio_flags, arc_flags, zb)); } -static boolean_t -bookmark_is_zero(const zbookmark_t *zb) -{ - return (zb->zb_objset == 0 && zb->zb_object == 0 && - zb->zb_level == 0 && zb->zb_blkid == 0); -} - -/* dnp is the dnode for zb1->zb_object */ -static boolean_t -bookmark_is_before(const dnode_phys_t *dnp, const zbookmark_t *zb1, - const zbookmark_t *zb2) -{ - uint64_t zb1nextL0, zb2thisobj; - - ASSERT(zb1->zb_objset == zb2->zb_objset); - ASSERT(zb2->zb_level == 0); - - /* - * A bookmark in the deadlist is considered to be after - * everything else. 
- */ - if (zb2->zb_object == DMU_DEADLIST_OBJECT) - return (B_TRUE); - - /* The objset_phys_t isn't before anything. */ - if (dnp == NULL) - return (B_FALSE); - - zb1nextL0 = (zb1->zb_blkid + 1) << - ((zb1->zb_level) * (dnp->dn_indblkshift - SPA_BLKPTRSHIFT)); - - zb2thisobj = zb2->zb_object ? zb2->zb_object : - zb2->zb_blkid << (DNODE_BLOCK_SHIFT - DNODE_SHIFT); - - if (zb1->zb_object == DMU_META_DNODE_OBJECT) { - uint64_t nextobj = zb1nextL0 * - (dnp->dn_datablkszsec << SPA_MINBLOCKSHIFT) >> DNODE_SHIFT; - return (nextobj <= zb2thisobj); - } - - if (zb1->zb_object < zb2thisobj) - return (B_TRUE); - if (zb1->zb_object > zb2thisobj) - return (B_FALSE); - if (zb2->zb_object == DMU_META_DNODE_OBJECT) - return (B_FALSE); - return (zb1nextL0 <= zb2->zb_blkid); -} - static uint64_t dsl_scan_ds_maxtxg(dsl_dataset_t *ds) { @@ -452,7 +436,7 @@ static boolean_t dsl_scan_check_pause(dsl_scan_t *scn, const zbookmark_t *zb) { uint64_t elapsed_nanosecs; - int mintime; + unsigned int mintime; /* we never skip user/group accounting objects */ if (zb && (int64_t)zb->zb_object < 0) @@ -461,7 +445,7 @@ dsl_scan_check_pause(dsl_scan_t *scn, const zbookmark_t *zb) if (scn->scn_pausing) return (B_TRUE); /* we're already pausing */ - if (!bookmark_is_zero(&scn->scn_phys.scn_bookmark)) + if (!ZB_IS_ZERO(&scn->scn_phys.scn_bookmark)) return (B_FALSE); /* we're resuming */ /* We only know how to resume from level-0 blocks. */ @@ -616,13 +600,13 @@ dsl_scan_check_resume(dsl_scan_t *scn, const dnode_phys_t *dnp, /* * We never skip over user/group accounting objects (obj<0) */ - if (!bookmark_is_zero(&scn->scn_phys.scn_bookmark) && + if (!ZB_IS_ZERO(&scn->scn_phys.scn_bookmark) && (int64_t)zb->zb_object >= 0) { /* * If we already visited this bp & everything below (in * a prior txg sync), don't bother doing it again. 
*/ - if (bookmark_is_before(dnp, zb, &scn->scn_phys.scn_bookmark)) + if (zbookmark_is_before(dnp, zb, &scn->scn_phys.scn_bookmark)) return (B_TRUE); /* @@ -815,22 +799,6 @@ dsl_scan_visitbp(blkptr_t *bp, const zbookmark_t *zb, if (bp->blk_birth <= scn->scn_phys.scn_cur_min_txg) return; - if (BP_GET_TYPE(bp) != DMU_OT_USERGROUP_USED) { - /* - * For non-user-accounting blocks, we need to read the - * new bp (from a deleted snapshot, found in - * check_existing_xlation). If we used the old bp, - * pointers inside this block from before we resumed - * would be untranslated. - * - * For user-accounting blocks, we need to read the old - * bp, because we will apply the entire space delta to - * it (original untranslated -> translations from - * deleted snap -> now). - */ - bp_toread = *bp; - } - if (dsl_scan_recurse(scn, ds, ostype, dnp, &bp_toread, zb, tx, &buf) != 0) return; @@ -1395,19 +1363,28 @@ dsl_scan_visit(dsl_scan_t *scn, dmu_tx_t *tx) zap_cursor_fini(&zc); } -static int -dsl_scan_free_cb(void *arg, const blkptr_t *bp, dmu_tx_t *tx) +static boolean_t +dsl_scan_free_should_pause(dsl_scan_t *scn) { - dsl_scan_t *scn = arg; uint64_t elapsed_nanosecs; elapsed_nanosecs = gethrtime() - scn->scn_sync_start_time; - - if (elapsed_nanosecs / NANOSEC > zfs_txg_timeout || + return (elapsed_nanosecs / NANOSEC > zfs_txg_timeout || (elapsed_nanosecs / MICROSEC > zfs_free_min_time_ms && txg_sync_waiting(scn->scn_dp)) || - spa_shutting_down(scn->scn_dp->dp_spa)) - return (ERESTART); + spa_shutting_down(scn->scn_dp->dp_spa)); +} + +static int +dsl_scan_free_block_cb(void *arg, const blkptr_t *bp, dmu_tx_t *tx) +{ + dsl_scan_t *scn = arg; + + if (!scn->scn_is_bptree || + (BP_GET_LEVEL(bp) == 0 && BP_GET_TYPE(bp) != DMU_OT_OBJSET)) { + if (dsl_scan_free_should_pause(scn)) + return (ERESTART); + } zio_nowait(zio_free_sync(scn->scn_zio_root, scn->scn_dp->dp_spa, dmu_tx_get_txg(tx), bp, 0)); @@ -1432,6 +1409,10 @@ dsl_scan_active(dsl_scan_t *scn) if (scn->scn_phys.scn_state == 
DSS_SCANNING) return (B_TRUE); + if (spa_feature_is_active(spa, + &spa_feature_table[SPA_FEATURE_ASYNC_DESTROY])) { + return (B_TRUE); + } if (spa_version(scn->scn_dp->dp_spa) >= SPA_VERSION_DEADLISTS) { (void) bpobj_space(&scn->scn_dp->dp_free_bpobj, &used, &comp, &uncomp); @@ -1478,14 +1459,40 @@ dsl_scan_sync(dsl_pool_t *dp, dmu_tx_t *tx) * traversing it. */ if (spa_version(dp->dp_spa) >= SPA_VERSION_DEADLISTS) { + scn->scn_is_bptree = B_FALSE; scn->scn_zio_root = zio_root(dp->dp_spa, NULL, NULL, ZIO_FLAG_MUSTSUCCEED); err = bpobj_iterate(&dp->dp_free_bpobj, - dsl_scan_free_cb, scn, tx); + dsl_scan_free_block_cb, scn, tx); VERIFY3U(0, ==, zio_wait(scn->scn_zio_root)); + + if (err == 0 && spa_feature_is_active(spa, + &spa_feature_table[SPA_FEATURE_ASYNC_DESTROY])) { + scn->scn_is_bptree = B_TRUE; + scn->scn_zio_root = zio_root(dp->dp_spa, NULL, + NULL, ZIO_FLAG_MUSTSUCCEED); + err = bptree_iterate(dp->dp_meta_objset, + dp->dp_bptree_obj, B_TRUE, dsl_scan_free_block_cb, + scn, tx); + VERIFY3U(0, ==, zio_wait(scn->scn_zio_root)); + if (err != 0) + return; + + /* disable async destroy feature */ + spa_feature_decr(spa, + &spa_feature_table[SPA_FEATURE_ASYNC_DESTROY], tx); + ASSERT(!spa_feature_is_active(spa, + &spa_feature_table[SPA_FEATURE_ASYNC_DESTROY])); + VERIFY3U(0, ==, zap_remove(dp->dp_meta_objset, + DMU_POOL_DIRECTORY_OBJECT, + DMU_POOL_BPTREE_OBJ, tx)); + VERIFY3U(0, ==, bptree_free(dp->dp_meta_objset, + dp->dp_bptree_obj, tx)); + dp->dp_bptree_obj = 0; + } if (scn->scn_visited_this_txg) { zfs_dbgmsg("freed %llu blocks in %llums from " - "free_bpobj txg %llu", + "free_bpobj/bptree txg %llu", (longlong_t)scn->scn_visited_this_txg, (longlong_t) (gethrtime() - scn->scn_sync_start_time) / MICROSEC, @@ -1600,6 +1607,8 @@ count_block(zfs_all_blkstats_t *zab, const blkptr_t *bp) for (i = 0; i < 4; i++) { int l = (i < 2) ? BP_GET_LEVEL(bp) : DN_MAX_LEVELS; int t = (i & 1) ? 
BP_GET_TYPE(bp) : DMU_OT_TOTAL; + if (t & DMU_OT_NEWTYPE) + t = DMU_OT_OTHER; zfs_blkstat_t *zb = &zab->zab_type[l][t]; int equal; @@ -1660,7 +1669,7 @@ dsl_scan_scrub_cb(dsl_pool_t *dp, boolean_t needs_io; int zio_flags = ZIO_FLAG_SCAN_THREAD | ZIO_FLAG_RAW | ZIO_FLAG_CANFAIL; int zio_priority; - int scan_delay = 0; + unsigned int scan_delay = 0; if (phys_birth <= scn->scn_phys.scn_min_txg || phys_birth >= scn->scn_phys.scn_max_txg) @@ -1717,7 +1726,8 @@ dsl_scan_scrub_cb(dsl_pool_t *dp, if (needs_io && !zfs_no_scrub_io) { vdev_t *rvd = spa->spa_root_vdev; - uint64_t maxinflight = rvd->vdev_children * zfs_top_maxinflight; + uint64_t maxinflight = rvd->vdev_children * + MAX(zfs_top_maxinflight, 1); void *data = zio_data_buf_alloc(size); mutex_enter(&spa->spa_scrub_lock); @@ -1731,7 +1741,7 @@ dsl_scan_scrub_cb(dsl_pool_t *dp, * then throttle our workload to limit the impact of a scan. */ if (ddi_get_lbolt64() - spa->spa_last_io <= zfs_scan_idle) - delay(scan_delay); + delay(MAX((int)scan_delay, 0)); zio_nowait(zio_read(NULL, spa, bp, data, size, dsl_scan_scrub_done, NULL, zio_priority, diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_synctask.c b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_synctask.c index b0818ce27..b4a3798b4 100644 --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_synctask.c +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_synctask.c @@ -161,7 +161,7 @@ dsl_sync_task_group_sync(dsl_sync_task_group_t *dstg, dmu_tx_t *tx) dsl_pool_t *dp = dstg->dstg_pool; uint64_t quota, used; - ASSERT3U(dstg->dstg_err, ==, 0); + ASSERT0(dstg->dstg_err); /* * Check for sufficient space. 
We just check against what's @@ -228,12 +228,7 @@ dsl_sync_task_do_nowait(dsl_pool_t *dp, dsl_checkfunc_t *checkfunc, dsl_syncfunc_t *syncfunc, void *arg1, void *arg2, int blocks_modified, dmu_tx_t *tx) { - dsl_sync_task_group_t *dstg; - - if (!spa_writeable(dp->dp_spa)) - return; - - dstg = dsl_sync_task_group_create(dp); + dsl_sync_task_group_t *dstg = dsl_sync_task_group_create(dp); dsl_sync_task_create(dstg, checkfunc, syncfunc, arg1, arg2, blocks_modified); dsl_sync_task_group_nowait(dstg, tx); diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/metaslab.c b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/metaslab.c index 3a289da67..a0723a3c3 100644 --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/metaslab.c +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/metaslab.c @@ -775,7 +775,7 @@ metaslab_fini(metaslab_t *msp) for (int t = 0; t < TXG_DEFER_SIZE; t++) space_map_destroy(&msp->ms_defermap[t]); - ASSERT3S(msp->ms_deferspace, ==, 0); + ASSERT0(msp->ms_deferspace); mutex_exit(&msp->ms_lock); mutex_destroy(&msp->ms_lock); diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sa.c b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sa.c index 3b160278a..894bbc3d0 100644 --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sa.c +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sa.c @@ -18,9 +18,11 @@ * * CDDL HEADER END */ + /* * Copyright (c) 2010, Oracle and/or its affiliates. All rights reserved. * Portions Copyright 2011 iXsystems, Inc + * Copyright (c) 2012 by Delphix. All rights reserved. 
*/ #include @@ -427,10 +429,9 @@ sa_add_layout_entry(objset_t *os, sa_attr_type_t *attrs, int attr_count, char attr_name[8]; if (sa->sa_layout_attr_obj == 0) { - sa->sa_layout_attr_obj = zap_create(os, - DMU_OT_SA_ATTR_LAYOUTS, DMU_OT_NONE, 0, tx); - VERIFY(zap_add(os, sa->sa_master_obj, SA_LAYOUTS, 8, 1, - &sa->sa_layout_attr_obj, tx) == 0); + sa->sa_layout_attr_obj = zap_create_link(os, + DMU_OT_SA_ATTR_LAYOUTS, + sa->sa_master_obj, SA_LAYOUTS, tx); } (void) snprintf(attr_name, sizeof (attr_name), @@ -1555,10 +1556,9 @@ sa_attr_register_sync(sa_handle_t *hdl, dmu_tx_t *tx) } if (sa->sa_reg_attr_obj == 0) { - sa->sa_reg_attr_obj = zap_create(hdl->sa_os, - DMU_OT_SA_ATTR_REGISTRATION, DMU_OT_NONE, 0, tx); - VERIFY(zap_add(hdl->sa_os, sa->sa_master_obj, - SA_REGISTRY, 8, 1, &sa->sa_reg_attr_obj, tx) == 0); + sa->sa_reg_attr_obj = zap_create_link(hdl->sa_os, + DMU_OT_SA_ATTR_REGISTRATION, + sa->sa_master_obj, SA_REGISTRY, tx); } for (i = 0; i != sa->sa_num_attrs; i++) { if (sa->sa_attr_table[i].sa_registered) diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c index cddf24081..742e6373e 100644 --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c @@ -60,6 +60,7 @@ #include #include #include +#include #include #ifdef _KERNEL @@ -117,7 +118,10 @@ const zio_taskq_info_t zio_taskqs[ZIO_TYPES][ZIO_TASKQ_TYPES] = { { ZTI_ONE, ZTI_NULL, ZTI_ONE, ZTI_NULL }, }; +static dsl_syncfunc_t spa_sync_version; static dsl_syncfunc_t spa_sync_props; +static dsl_checkfunc_t spa_change_guid_check; +static dsl_syncfunc_t spa_change_guid_sync; static boolean_t spa_has_active_shared_spare(spa_t *spa); static int spa_load_impl(spa_t *spa, uint64_t, nvlist_t *config, spa_load_state_t state, spa_import_type_t type, boolean_t mosconfig, @@ -176,6 +180,7 @@ static void spa_prop_get_config(spa_t *spa, nvlist_t **nvp) { vdev_t *rvd = spa->spa_root_vdev; + dsl_pool_t 
*pool = spa->spa_dsl_pool; uint64_t size; uint64_t alloc; uint64_t space; @@ -222,6 +227,22 @@ spa_prop_get_config(spa_t *spa, nvlist_t **nvp) spa_prop_add_list(*nvp, ZPOOL_PROP_VERSION, NULL, version, src); } + if (pool != NULL) { + dsl_dir_t *freedir = pool->dp_free_dir; + + /* + * The $FREE directory was introduced in SPA_VERSION_DEADLISTS, + * when opening pools before this version freedir will be NULL. + */ + if (freedir != NULL) { + spa_prop_add_list(*nvp, ZPOOL_PROP_FREEING, NULL, + freedir->dd_phys->dd_used_bytes, src); + } else { + spa_prop_add_list(*nvp, ZPOOL_PROP_FREEING, + NULL, 0, src); + } + } + spa_prop_add_list(*nvp, ZPOOL_PROP_GUID, NULL, spa_guid(spa), src); if (spa->spa_comment != NULL) { @@ -361,25 +382,55 @@ spa_prop_validate(spa_t *spa, nvlist_t *props) nvpair_t *elem; int error = 0, reset_bootfs = 0; uint64_t objnum; + boolean_t has_feature = B_FALSE; elem = NULL; while ((elem = nvlist_next_nvpair(props, elem)) != NULL) { - zpool_prop_t prop; - char *propname, *strval; uint64_t intval; - objset_t *os; - char *slash, *check; + char *strval, *slash, *check, *fname; + const char *propname = nvpair_name(elem); + zpool_prop_t prop = zpool_name_to_prop(propname); - propname = nvpair_name(elem); + switch (prop) { + case ZPROP_INVAL: + if (!zpool_prop_feature(propname)) { + error = EINVAL; + break; + } - if ((prop = zpool_name_to_prop(propname)) == ZPROP_INVAL) - return (EINVAL); + /* + * Sanitize the input. 
+ */ + if (nvpair_type(elem) != DATA_TYPE_UINT64) { + error = EINVAL; + break; + } + + if (nvpair_value_uint64(elem, &intval) != 0) { + error = EINVAL; + break; + } + + if (intval != 0) { + error = EINVAL; + break; + } + + fname = strchr(propname, '@') + 1; + if (zfeature_lookup_name(fname, NULL) != 0) { + error = EINVAL; + break; + } + + has_feature = B_TRUE; + break; - switch (prop) { case ZPOOL_PROP_VERSION: error = nvpair_value_uint64(elem, &intval); if (!error && - (intval < spa_version(spa) || intval > SPA_VERSION)) + (intval < spa_version(spa) || + intval > SPA_VERSION_BEFORE_FEATURES || + has_feature)) error = EINVAL; break; @@ -416,6 +467,7 @@ spa_prop_validate(spa_t *spa, nvlist_t *props) error = nvpair_value_string(elem, &strval); if (!error) { + objset_t *os; uint64_t compress; if (strval == NULL || strval[0] == '\0') { @@ -565,33 +617,58 @@ int spa_prop_set(spa_t *spa, nvlist_t *nvp) { int error; - nvpair_t *elem; + nvpair_t *elem = NULL; boolean_t need_sync = B_FALSE; - zpool_prop_t prop; if ((error = spa_prop_validate(spa, nvp)) != 0) return (error); - elem = NULL; while ((elem = nvlist_next_nvpair(nvp, elem)) != NULL) { - if ((prop = zpool_name_to_prop( - nvpair_name(elem))) == ZPROP_INVAL) - return (EINVAL); + zpool_prop_t prop = zpool_name_to_prop(nvpair_name(elem)); if (prop == ZPOOL_PROP_CACHEFILE || prop == ZPOOL_PROP_ALTROOT || prop == ZPOOL_PROP_READONLY) continue; + if (prop == ZPOOL_PROP_VERSION || prop == ZPROP_INVAL) { + uint64_t ver; + + if (prop == ZPOOL_PROP_VERSION) { + VERIFY(nvpair_value_uint64(elem, &ver) == 0); + } else { + ASSERT(zpool_prop_feature(nvpair_name(elem))); + ver = SPA_VERSION_FEATURES; + need_sync = B_TRUE; + } + + /* Save time if the version is already set. */ + if (ver == spa_version(spa)) + continue; + + /* + * In addition to the pool directory object, we might + * create the pool properties object, the features for + * read object, the features for write object, or the + * feature descriptions object. 
+ */ + error = dsl_sync_task_do(spa_get_dsl(spa), NULL, + spa_sync_version, spa, &ver, 6); + if (error) + return (error); + continue; + } + need_sync = B_TRUE; break; } - if (need_sync) + if (need_sync) { return (dsl_sync_task_do(spa_get_dsl(spa), NULL, spa_sync_props, - spa, nvp, 3)); - else - return (0); + spa, nvp, 6)); + } + + return (0); } /* @@ -608,6 +685,56 @@ spa_prop_clear_bootfs(spa_t *spa, uint64_t dsobj, dmu_tx_t *tx) } } +/*ARGSUSED*/ +static int +spa_change_guid_check(void *arg1, void *arg2, dmu_tx_t *tx) +{ + spa_t *spa = arg1; + uint64_t *newguid = arg2; + vdev_t *rvd = spa->spa_root_vdev; + uint64_t vdev_state; + + spa_config_enter(spa, SCL_STATE, FTAG, RW_READER); + vdev_state = rvd->vdev_state; + spa_config_exit(spa, SCL_STATE, FTAG); + + if (vdev_state != VDEV_STATE_HEALTHY) + return (ENXIO); + + ASSERT3U(spa_guid(spa), !=, *newguid); + + return (0); +} + +static void +spa_change_guid_sync(void *arg1, void *arg2, dmu_tx_t *tx) +{ + spa_t *spa = arg1; + uint64_t *newguid = arg2; + uint64_t oldguid; + vdev_t *rvd = spa->spa_root_vdev; + + oldguid = spa_guid(spa); + + spa_config_enter(spa, SCL_STATE, FTAG, RW_READER); + rvd->vdev_guid = *newguid; + rvd->vdev_guid_sum += (*newguid - oldguid); + vdev_config_dirty(rvd); + spa_config_exit(spa, SCL_STATE, FTAG); + +#ifdef __FreeBSD__ + /* + * TODO: until recent illumos logging changes are merged + * log reguid as pool property change + */ + spa_history_log_internal(LOG_POOL_PROPSET, spa, tx, + "guid change old=%llu new=%llu", oldguid, *newguid); +#else + spa_history_log_internal(spa, "guid change", tx, "old=%lld new=%lld", + oldguid, *newguid); +#endif +} + /* * Change the GUID for the pool. This is done so that we can later * re-import a pool built from a clone of our own vdevs. 
We will modify @@ -620,29 +747,23 @@ spa_prop_clear_bootfs(spa_t *spa, uint64_t dsobj, dmu_tx_t *tx) int spa_change_guid(spa_t *spa) { - uint64_t oldguid, newguid; - uint64_t txg; - - if (!(spa_mode_global & FWRITE)) - return (EROFS); - - txg = spa_vdev_enter(spa); - - if (spa->spa_root_vdev->vdev_state != VDEV_STATE_HEALTHY) - return (spa_vdev_exit(spa, NULL, txg, ENXIO)); + int error; + uint64_t guid; - oldguid = spa_guid(spa); - newguid = spa_generate_guid(NULL); - ASSERT3U(oldguid, !=, newguid); + mutex_enter(&spa_namespace_lock); + guid = spa_generate_guid(NULL); - spa->spa_root_vdev->vdev_guid = newguid; - spa->spa_root_vdev->vdev_guid_sum += (newguid - oldguid); + error = dsl_sync_task_do(spa_get_dsl(spa), spa_change_guid_check, + spa_change_guid_sync, spa, &guid, 5); - vdev_config_dirty(spa->spa_root_vdev); + if (error == 0) { + spa_config_sync(spa, B_FALSE, B_TRUE); + spa_event_notify(spa, NULL, ESC_ZFS_POOL_REGUID); + } - spa_event_notify(spa, NULL, ESC_ZFS_POOL_REGUID); + mutex_exit(&spa_namespace_lock); - return (spa_vdev_exit(spa, NULL, txg, 0)); + return (error); } /* @@ -1630,7 +1751,7 @@ spa_load_verify_done(zio_t *zio) int error = zio->io_error; if (error) { - if ((BP_GET_LEVEL(bp) != 0 || dmu_ot[type].ot_metadata) && + if ((BP_GET_LEVEL(bp) != 0 || DMU_OT_IS_METADATA(type)) && type != DMU_OT_INTENT_LOG) atomic_add_64(&sle->sle_meta_count, 1); else @@ -1860,6 +1981,9 @@ spa_load(spa_t *spa, spa_load_state_t state, spa_import_type_t type, KM_SLEEP) == 0); } + nvlist_free(spa->spa_load_info); + spa->spa_load_info = fnvlist_alloc(); + gethrestime(&spa->spa_loaded_ts); error = spa_load_impl(spa, pool_guid, config, state, type, mosconfig, &ereport); @@ -1892,12 +2016,14 @@ spa_load_impl(spa_t *spa, uint64_t pool_guid, nvlist_t *config, { int error = 0; nvlist_t *nvroot = NULL; + nvlist_t *label; vdev_t *rvd; uberblock_t *ub = &spa->spa_uberblock; uint64_t children, config_cache_txg = spa->spa_config_txg; int orig_mode = spa->spa_mode; int parse; 
uint64_t obj; + boolean_t missing_feat_write = B_FALSE; /* * If this is an untrusted config, access the pool in read-only mode. @@ -1977,19 +2103,78 @@ spa_load_impl(spa_t *spa, uint64_t pool_guid, nvlist_t *config, /* * Find the best uberblock. */ - vdev_uberblock_load(NULL, rvd, ub); + vdev_uberblock_load(rvd, ub, &label); /* * If we weren't able to find a single valid uberblock, return failure. */ - if (ub->ub_txg == 0) + if (ub->ub_txg == 0) { + nvlist_free(label); return (spa_vdev_err(rvd, VDEV_AUX_CORRUPT_DATA, ENXIO)); + } /* - * If the pool is newer than the code, we can't open it. + * If the pool has an unsupported version we can't open it. */ - if (ub->ub_version > SPA_VERSION) + if (!SPA_VERSION_IS_SUPPORTED(ub->ub_version)) { + nvlist_free(label); return (spa_vdev_err(rvd, VDEV_AUX_VERSION_NEWER, ENOTSUP)); + } + + if (ub->ub_version >= SPA_VERSION_FEATURES) { + nvlist_t *features; + + /* + * If we weren't able to find what's necessary for reading the + * MOS in the label, return failure. + */ + if (label == NULL || nvlist_lookup_nvlist(label, + ZPOOL_CONFIG_FEATURES_FOR_READ, &features) != 0) { + nvlist_free(label); + return (spa_vdev_err(rvd, VDEV_AUX_CORRUPT_DATA, + ENXIO)); + } + + /* + * Update our in-core representation with the definitive values + * from the label. + */ + nvlist_free(spa->spa_label_features); + VERIFY(nvlist_dup(features, &spa->spa_label_features, 0) == 0); + } + + nvlist_free(label); + + /* + * Look through entries in the label nvlist's features_for_read. If + * there is a feature listed there which we don't understand then we + * cannot open a pool. 
+ */ + if (ub->ub_version >= SPA_VERSION_FEATURES) { + nvlist_t *unsup_feat; + + VERIFY(nvlist_alloc(&unsup_feat, NV_UNIQUE_NAME, KM_SLEEP) == + 0); + + for (nvpair_t *nvp = nvlist_next_nvpair(spa->spa_label_features, + NULL); nvp != NULL; + nvp = nvlist_next_nvpair(spa->spa_label_features, nvp)) { + if (!zfeature_is_supported(nvpair_name(nvp))) { + VERIFY(nvlist_add_string(unsup_feat, + nvpair_name(nvp), "") == 0); + } + } + + if (!nvlist_empty(unsup_feat)) { + VERIFY(nvlist_add_nvlist(spa->spa_load_info, + ZPOOL_CONFIG_UNSUP_FEAT, unsup_feat) == 0); + nvlist_free(unsup_feat); + return (spa_vdev_err(rvd, VDEV_AUX_UNSUP_FEAT, + ENOTSUP)); + } + + nvlist_free(unsup_feat); + } /* * If the vdev guid sum doesn't match the uberblock, we have an @@ -2023,7 +2208,7 @@ spa_load_impl(spa_t *spa, uint64_t pool_guid, nvlist_t *config, spa->spa_claim_max_txg = spa->spa_first_txg; spa->spa_prev_software_version = ub->ub_software_version; - error = dsl_pool_open(spa, spa->spa_first_txg, &spa->spa_dsl_pool); + error = dsl_pool_init(spa, spa->spa_first_txg, &spa->spa_dsl_pool); if (error) return (spa_vdev_err(rvd, VDEV_AUX_CORRUPT_DATA, EIO)); spa->spa_meta_objset = spa->spa_dsl_pool->dp_meta_objset; @@ -2031,6 +2216,89 @@ spa_load_impl(spa_t *spa, uint64_t pool_guid, nvlist_t *config, if (spa_dir_prop(spa, DMU_POOL_CONFIG, &spa->spa_config_object) != 0) return (spa_vdev_err(rvd, VDEV_AUX_CORRUPT_DATA, EIO)); + if (spa_version(spa) >= SPA_VERSION_FEATURES) { + boolean_t missing_feat_read = B_FALSE; + nvlist_t *unsup_feat, *enabled_feat; + + if (spa_dir_prop(spa, DMU_POOL_FEATURES_FOR_READ, + &spa->spa_feat_for_read_obj) != 0) { + return (spa_vdev_err(rvd, VDEV_AUX_CORRUPT_DATA, EIO)); + } + + if (spa_dir_prop(spa, DMU_POOL_FEATURES_FOR_WRITE, + &spa->spa_feat_for_write_obj) != 0) { + return (spa_vdev_err(rvd, VDEV_AUX_CORRUPT_DATA, EIO)); + } + + if (spa_dir_prop(spa, DMU_POOL_FEATURE_DESCRIPTIONS, + &spa->spa_feat_desc_obj) != 0) { + return (spa_vdev_err(rvd, 
VDEV_AUX_CORRUPT_DATA, EIO)); + } + + enabled_feat = fnvlist_alloc(); + unsup_feat = fnvlist_alloc(); + + if (!feature_is_supported(spa->spa_meta_objset, + spa->spa_feat_for_read_obj, spa->spa_feat_desc_obj, + unsup_feat, enabled_feat)) + missing_feat_read = B_TRUE; + + if (spa_writeable(spa) || state == SPA_LOAD_TRYIMPORT) { + if (!feature_is_supported(spa->spa_meta_objset, + spa->spa_feat_for_write_obj, spa->spa_feat_desc_obj, + unsup_feat, enabled_feat)) { + missing_feat_write = B_TRUE; + } + } + + fnvlist_add_nvlist(spa->spa_load_info, + ZPOOL_CONFIG_ENABLED_FEAT, enabled_feat); + + if (!nvlist_empty(unsup_feat)) { + fnvlist_add_nvlist(spa->spa_load_info, + ZPOOL_CONFIG_UNSUP_FEAT, unsup_feat); + } + + fnvlist_free(enabled_feat); + fnvlist_free(unsup_feat); + + if (!missing_feat_read) { + fnvlist_add_boolean(spa->spa_load_info, + ZPOOL_CONFIG_CAN_RDONLY); + } + + /* + * If the state is SPA_LOAD_TRYIMPORT, our objective is + * twofold: to determine whether the pool is available for + * import in read-write mode and (if it is not) whether the + * pool is available for import in read-only mode. If the pool + * is available for import in read-write mode, it is displayed + * as available in userland; if it is not available for import + * in read-only mode, it is displayed as unavailable in + * userland. If the pool is available for import in read-only + * mode but not read-write mode, it is displayed as unavailable + * in userland with a special note that the pool is actually + * available for open in read-only mode. + * + * As a result, if the state is SPA_LOAD_TRYIMPORT and we are + * missing a feature for write, we must first determine whether + * the pool can be opened read-only before returning to + * userland in order to know whether to display the + * abovementioned note. 
+ */ + if (missing_feat_read || (missing_feat_write && + spa_writeable(spa))) { + return (spa_vdev_err(rvd, VDEV_AUX_UNSUP_FEAT, + ENOTSUP)); + } + } + + spa->spa_is_initializing = B_TRUE; + error = dsl_pool_open(spa->spa_dsl_pool); + spa->spa_is_initializing = B_FALSE; + if (error != 0) + return (spa_vdev_err(rvd, VDEV_AUX_CORRUPT_DATA, EIO)); + if (!mosconfig) { uint64_t hostid; nvlist_t *policy = NULL, *nvconfig; @@ -2248,7 +2516,7 @@ spa_load_impl(spa_t *spa, uint64_t pool_guid, nvlist_t *config, nvlist_free(nvconfig); /* - * Now that we've validate the config, check the state of the + * Now that we've validated the config, check the state of the * root vdev. If it can't be opened, it indicates one or * more toplevel vdevs are faulted. */ @@ -2261,6 +2529,17 @@ spa_load_impl(spa_t *spa, uint64_t pool_guid, nvlist_t *config, } } + if (missing_feat_write) { + ASSERT(state == SPA_LOAD_TRYIMPORT); + + /* + * At this point, we know that we can open the pool in + * read-only mode but not read-write mode. We now have enough + * information and can return to userland. + */ + return (spa_vdev_err(rvd, VDEV_AUX_UNSUP_FEAT, ENOTSUP)); + } + /* * We've successfully opened the pool, verify that we're ready * to start pushing transactions. @@ -2370,10 +2649,18 @@ spa_load_retry(spa_t *spa, spa_load_state_t state, int mosconfig) return (spa_load(spa, state, SPA_IMPORT_EXISTING, mosconfig)); } +/* + * If spa_load() fails this function will try loading prior txg's. If + * 'state' is SPA_LOAD_RECOVER and one of these loads succeeds the pool + * will be rewound to that txg. If 'state' is not SPA_LOAD_RECOVER this + * function will not rewind the pool and will return the same error as + * spa_load(). 
+ */ static int spa_load_best(spa_t *spa, spa_load_state_t state, int mosconfig, uint64_t max_request, int rewind_flags) { + nvlist_t *loadinfo = NULL; nvlist_t *config = NULL; int load_error, rewind_error; uint64_t safe_rewind_txg; @@ -2402,9 +2689,18 @@ spa_load_best(spa_t *spa, spa_load_state_t state, int mosconfig, return (load_error); } - /* Price of rolling back is discarding txgs, including log */ - if (state == SPA_LOAD_RECOVER) + if (state == SPA_LOAD_RECOVER) { + /* Price of rolling back is discarding txgs, including log */ spa_set_log_state(spa, SPA_LOG_CLEAR); + } else { + /* + * If we aren't rolling back save the load info from our first + * import attempt so that we can restore it after attempting + * to rewind. + */ + loadinfo = spa->spa_load_info; + spa->spa_load_info = fnvlist_alloc(); + } spa->spa_load_max_txg = spa->spa_last_ubsync_txg; safe_rewind_txg = spa->spa_last_ubsync_txg - TXG_DEFER_SIZE; @@ -2428,7 +2724,20 @@ spa_load_best(spa_t *spa, spa_load_state_t state, int mosconfig, if (config && (rewind_error || state != SPA_LOAD_RECOVER)) spa_config_set(spa, config); - return (state == SPA_LOAD_RECOVER ? 
rewind_error : load_error); + if (state == SPA_LOAD_RECOVER) { + ASSERT3P(loadinfo, ==, NULL); + return (rewind_error); + } else { + /* Store the rewind info as part of the initial load info */ + fnvlist_add_nvlist(loadinfo, ZPOOL_CONFIG_REWIND_INFO, + spa->spa_load_info); + + /* Restore the initial load info */ + fnvlist_free(spa->spa_load_info); + spa->spa_load_info = loadinfo; + + return (load_error); + } } /* @@ -2707,8 +3016,50 @@ spa_add_l2cache(spa_t *spa, nvlist_t *config) } } +static void +spa_add_feature_stats(spa_t *spa, nvlist_t *config) +{ + nvlist_t *features; + zap_cursor_t zc; + zap_attribute_t za; + + ASSERT(spa_config_held(spa, SCL_CONFIG, RW_READER)); + VERIFY(nvlist_alloc(&features, NV_UNIQUE_NAME, KM_SLEEP) == 0); + + if (spa->spa_feat_for_read_obj != 0) { + for (zap_cursor_init(&zc, spa->spa_meta_objset, + spa->spa_feat_for_read_obj); + zap_cursor_retrieve(&zc, &za) == 0; + zap_cursor_advance(&zc)) { + ASSERT(za.za_integer_length == sizeof (uint64_t) && + za.za_num_integers == 1); + VERIFY3U(0, ==, nvlist_add_uint64(features, za.za_name, + za.za_first_integer)); + } + zap_cursor_fini(&zc); + } + + if (spa->spa_feat_for_write_obj != 0) { + for (zap_cursor_init(&zc, spa->spa_meta_objset, + spa->spa_feat_for_write_obj); + zap_cursor_retrieve(&zc, &za) == 0; + zap_cursor_advance(&zc)) { + ASSERT(za.za_integer_length == sizeof (uint64_t) && + za.za_num_integers == 1); + VERIFY3U(0, ==, nvlist_add_uint64(features, za.za_name, + za.za_first_integer)); + } + zap_cursor_fini(&zc); + } + + VERIFY(nvlist_add_nvlist(config, ZPOOL_CONFIG_FEATURE_STATS, + features) == 0); + nvlist_free(features); +} + int -spa_get_stats(const char *name, nvlist_t **config, char *altroot, size_t buflen) +spa_get_stats(const char *name, nvlist_t **config, + char *altroot, size_t buflen) { int error; spa_t *spa; @@ -2743,6 +3094,7 @@ spa_get_stats(const char *name, nvlist_t **config, char *altroot, size_t buflen) spa_add_spares(spa, *config); spa_add_l2cache(spa, *config); + 
spa_add_feature_stats(spa, *config); } } @@ -2963,6 +3315,7 @@ spa_create(const char *pool, nvlist_t *nvroot, nvlist_t *props, nvlist_t **spares, **l2cache; uint_t nspares, nl2cache; uint64_t version, obj; + boolean_t has_features; /* * If this pool already exists, return failure. @@ -2988,10 +3341,18 @@ spa_create(const char *pool, nvlist_t *nvroot, nvlist_t *props, return (error); } - if (nvlist_lookup_uint64(props, zpool_prop_to_name(ZPOOL_PROP_VERSION), - &version) != 0) + has_features = B_FALSE; + for (nvpair_t *elem = nvlist_next_nvpair(props, NULL); + elem != NULL; elem = nvlist_next_nvpair(props, elem)) { + if (zpool_prop_feature(nvpair_name(elem))) + has_features = B_TRUE; + } + + if (has_features || nvlist_lookup_uint64(props, + zpool_prop_to_name(ZPOOL_PROP_VERSION), &version) != 0) { version = SPA_VERSION; - ASSERT(version <= SPA_VERSION); + } + ASSERT(SPA_VERSION_IS_SUPPORTED(version)); spa->spa_first_txg = txg; spa->spa_uberblock.ub_txg = txg - 1; @@ -3067,8 +3428,10 @@ spa_create(const char *pool, nvlist_t *nvroot, nvlist_t *props, spa->spa_l2cache.sav_sync = B_TRUE; } + spa->spa_is_initializing = B_TRUE; spa->spa_dsl_pool = dp = dsl_pool_create(spa, zplprops, txg); spa->spa_meta_objset = dp->dp_meta_objset; + spa->spa_is_initializing = B_FALSE; /* * Create DDTs (dedup tables). 
@@ -3092,6 +3455,9 @@ spa_create(const char *pool, nvlist_t *nvroot, nvlist_t *props, cmn_err(CE_PANIC, "failed to add pool config"); } + if (spa_version(spa) >= SPA_VERSION_FEATURES) + spa_feature_create_zap_objects(spa, tx); + if (zap_add(spa->spa_meta_objset, DMU_POOL_DIRECTORY_OBJECT, DMU_POOL_CREATION_VERSION, sizeof (uint64_t), 1, &version, tx) != 0) { @@ -3283,7 +3649,7 @@ spa_import_rootpool(char *devpath, char *devid) } #endif if (config == NULL) { - cmn_err(CE_NOTE, "Can not read the pool label from '%s'", + cmn_err(CE_NOTE, "Cannot read the pool label from '%s'", devpath); return (EIO); } @@ -3726,6 +4092,8 @@ spa_tryimport(nvlist_t *tryconfig) state) == 0); VERIFY(nvlist_add_uint64(config, ZPOOL_CONFIG_TIMESTAMP, spa->spa_uberblock.ub_timestamp) == 0); + VERIFY(nvlist_add_nvlist(config, ZPOOL_CONFIG_LOAD_INFO, + spa->spa_load_info) == 0); /* * If the bootfs property exists on this pool then we @@ -4835,7 +5203,7 @@ spa_vdev_remove_evacuate(spa_t *spa, vdev_t *vd) * The evacuation succeeded. Remove any remaining MOS metadata * associated with this vdev, and wait for these changes to sync. */ - ASSERT3U(vd->vdev_stat.vs_alloc, ==, 0); + ASSERT0(vd->vdev_stat.vs_alloc); txg = spa_vdev_config_enter(spa); vd->vdev_removing = B_TRUE; vdev_dirty(vd, 0, NULL, txg); @@ -5451,7 +5819,7 @@ spa_sync_nvlist(spa_t *spa, uint64_t obj, nvlist_t *nv, dmu_tx_t *tx) * information. This avoids the dbuf_will_dirty() path and * saves us a pre-read to get data we don't actually care about. */ - bufsize = P2ROUNDUP(nvsize, SPA_CONFIG_BLOCKSIZE); + bufsize = P2ROUNDUP((uint64_t)nvsize, SPA_CONFIG_BLOCKSIZE); packed = kmem_alloc(bufsize, KM_SLEEP); VERIFY(nvlist_pack(nv, &packed, &nvsize, NV_ENCODE_XDR, @@ -5527,6 +5895,14 @@ spa_sync_config_object(spa_t *spa, dmu_tx_t *tx) config = spa_config_generate(spa, spa->spa_root_vdev, dmu_tx_get_txg(tx), B_FALSE); + /* + * If we're upgrading the spa version then make sure that + * the config object gets updated with the correct version. 
+ */ + if (spa->spa_ubsync.ub_version < spa->spa_uberblock.ub_version) + fnvlist_add_uint64(config, ZPOOL_CONFIG_VERSION, + spa->spa_uberblock.ub_version); + spa_config_exit(spa, SCL_STATE, FTAG); if (spa->spa_config_syncing) @@ -5536,6 +5912,24 @@ spa_sync_config_object(spa_t *spa, dmu_tx_t *tx) spa_sync_nvlist(spa, spa->spa_config_object, config, tx); } +static void +spa_sync_version(void *arg1, void *arg2, dmu_tx_t *tx) +{ + spa_t *spa = arg1; + uint64_t version = *(uint64_t *)arg2; + + /* + * Setting the version is special cased when first creating the pool. + */ + ASSERT(tx->tx_txg != TXG_INITIAL); + + ASSERT(version <= SPA_VERSION); + ASSERT(version >= spa_version(spa)); + + spa->spa_uberblock.ub_version = version; + vdev_config_dirty(spa->spa_root_vdev); +} + /* * Set zpool properties. */ @@ -5545,32 +5939,38 @@ spa_sync_props(void *arg1, void *arg2, dmu_tx_t *tx) spa_t *spa = arg1; objset_t *mos = spa->spa_meta_objset; nvlist_t *nvp = arg2; - nvpair_t *elem; - uint64_t intval; - char *strval; - zpool_prop_t prop; - const char *propname; - zprop_type_t proptype; + nvpair_t *elem = NULL; mutex_enter(&spa->spa_props_lock); - elem = NULL; while ((elem = nvlist_next_nvpair(nvp, elem))) { + uint64_t intval; + char *strval, *fname; + zpool_prop_t prop; + const char *propname; + zprop_type_t proptype; + zfeature_info_t *feature; + switch (prop = zpool_name_to_prop(nvpair_name(elem))) { + case ZPROP_INVAL: + /* + * We checked this earlier in spa_prop_validate(). + */ + ASSERT(zpool_prop_feature(nvpair_name(elem))); + + fname = strchr(nvpair_name(elem), '@') + 1; + VERIFY3U(0, ==, zfeature_lookup_name(fname, &feature)); + + spa_feature_enable(spa, feature, tx); + break; + case ZPOOL_PROP_VERSION: + VERIFY(nvpair_value_uint64(elem, &intval) == 0); /* - * Only set version for non-zpool-creation cases - * (set/import). spa_create() needs special care - * for version setting. + * The version is synced seperatly before other + * properties and should be correct by now. 
*/ - if (tx->tx_txg != TXG_INITIAL) { - VERIFY(nvpair_value_uint64(elem, - &intval) == 0); - ASSERT(intval <= SPA_VERSION); - ASSERT(intval >= spa_version(spa)); - spa->spa_uberblock.ub_version = intval; - vdev_config_dirty(spa->spa_root_vdev); - } + ASSERT3U(spa_version(spa), >=, intval); break; case ZPOOL_PROP_ALTROOT: @@ -5607,14 +6007,10 @@ spa_sync_props(void *arg1, void *arg2, dmu_tx_t *tx) * Set pool property values in the poolprops mos object. */ if (spa->spa_pool_props_object == 0) { - VERIFY((spa->spa_pool_props_object = - zap_create(mos, DMU_OT_POOL_PROPS, - DMU_OT_NONE, 0, tx)) > 0); - - VERIFY(zap_update(mos, + spa->spa_pool_props_object = + zap_create_link(mos, DMU_OT_POOL_PROPS, DMU_POOL_DIRECTORY_OBJECT, DMU_POOL_PROPS, - 8, 1, &spa->spa_pool_props_object, tx) - == 0); + tx); } /* normalize the property name */ @@ -5713,6 +6109,11 @@ spa_sync_upgrades(spa_t *spa, dmu_tx_t *tx) /* Keeping the freedir open increases spa_minref */ spa->spa_minref += 3; } + + if (spa->spa_ubsync.ub_version < SPA_VERSION_FEATURES && + spa->spa_uberblock.ub_version >= SPA_VERSION_FEATURES) { + spa_feature_create_zap_objects(spa, tx); + } } /* @@ -5803,7 +6204,7 @@ spa_sync(spa_t *spa, uint64_t txg) zio_t *zio = zio_root(spa, NULL, NULL, 0); VERIFY3U(bpobj_iterate(defer_bpo, spa_free_sync_cb, zio, tx), ==, 0); - VERIFY3U(zio_wait(zio), ==, 0); + VERIFY0(zio_wait(zio)); } /* @@ -5883,6 +6284,9 @@ spa_sync(spa_t *spa, uint64_t txg) rvd->vdev_children, txg, B_TRUE); } + if (error == 0) + spa->spa_last_synced_guid = rvd->vdev_guid; + spa_config_exit(spa, SCL_STATE, FTAG); if (error == 0) diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa_config.c b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa_config.c index a07223352..3dae0ba1e 100644 --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa_config.c +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa_config.c @@ -22,7 +22,7 @@ /* * Copyright (c) 2005, 2010, Oracle and/or its affiliates. 
All rights reserved. * Copyright 2011 Nexenta Systems, Inc. All rights reserved. - * Copyright (c) 2011 by Delphix. All rights reserved. + * Copyright (c) 2012 by Delphix. All rights reserved. */ #include @@ -35,6 +35,7 @@ #include #include #include +#include #ifdef _KERNEL #include #include @@ -407,6 +408,12 @@ spa_config_generate(spa_t *spa, vdev_t *vd, uint64_t txg, int getstats) VERIFY(nvlist_add_nvlist(config, ZPOOL_CONFIG_VDEV_TREE, nvroot) == 0); nvlist_free(nvroot); + /* + * Store what's necessary for reading the MOS in the label. + */ + VERIFY(nvlist_add_nvlist(config, ZPOOL_CONFIG_FEATURES_FOR_READ, + spa->spa_label_features) == 0); + if (getstats && spa_load_state(spa) == SPA_LOAD_NONE) { ddt_histogram_t *ddh; ddt_stat_t *dds; diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa_history.c b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa_history.c index a853de76e..5cd5a9dc7 100644 --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa_history.c +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa_history.c @@ -306,6 +306,9 @@ spa_history_log(spa_t *spa, const char *history_str, history_log_type_t what) ASSERT(what != LOG_INTERNAL); + if (spa_version(spa) < SPA_VERSION_ZPOOL_HISTORY || !spa_writeable(spa)) + return (EINVAL); + tx = dmu_tx_create_dd(spa_get_dsl(spa)->dp_mos_dir); err = dmu_tx_assign(tx, TXG_WAIT); if (err) { @@ -435,8 +438,9 @@ log_internal(history_internal_events_t event, spa_t *spa, /* * If this is part of creating a pool, not everything is * initialized yet, so don't bother logging the internal events. + * Likewise if the pool is not writeable. 
*/ - if (tx->tx_txg == TXG_INITIAL) + if (tx->tx_txg == TXG_INITIAL || !spa_writeable(spa)) return; va_copy(adx2, adx); diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa_misc.c b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa_misc.c index 63424527b..ae58ae939 100644 --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa_misc.c +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa_misc.c @@ -20,7 +20,7 @@ */ /* * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved. - * Copyright (c) 2011 by Delphix. All rights reserved. + * Copyright (c) 2012 by Delphix. All rights reserved. * Copyright 2011 Nexenta Systems, Inc. All rights reserved. */ @@ -48,6 +48,7 @@ #include #include #include "zfs_prop.h" +#include "zfeature_common.h" /* * SPA locking @@ -216,7 +217,7 @@ * Like spa_vdev_enter/exit, these are convenience wrappers -- the actual * locking is, always, based on spa_namespace_lock and spa_config_lock[]. * - * spa_rename() is also implemented within this file since is requires + * spa_rename() is also implemented within this file since it requires * manipulation of the namespace. 
*/ @@ -487,8 +488,22 @@ spa_add(const char *name, nvlist_t *config, const char *altroot) VERIFY(nvlist_alloc(&spa->spa_load_info, NV_UNIQUE_NAME, KM_SLEEP) == 0); - if (config != NULL) + if (config != NULL) { + nvlist_t *features; + + if (nvlist_lookup_nvlist(config, ZPOOL_CONFIG_FEATURES_FOR_READ, + &features) == 0) { + VERIFY(nvlist_dup(features, &spa->spa_label_features, + 0) == 0); + } + VERIFY(nvlist_dup(config, &spa->spa_config, 0) == 0); + } + + if (spa->spa_label_features == NULL) { + VERIFY(nvlist_alloc(&spa->spa_label_features, NV_UNIQUE_NAME, + KM_SLEEP) == 0); + } return (spa); } @@ -525,6 +540,7 @@ spa_remove(spa_t *spa) list_destroy(&spa->spa_config_list); + nvlist_free(spa->spa_label_features); nvlist_free(spa->spa_load_info); spa_config_set(spa, NULL); @@ -1033,6 +1049,20 @@ spa_vdev_state_exit(spa_t *spa, vdev_t *vd, int error) * ========================================================================== */ +void +spa_activate_mos_feature(spa_t *spa, const char *feature) +{ + (void) nvlist_add_boolean(spa->spa_label_features, feature); + vdev_config_dirty(spa->spa_root_vdev); +} + +void +spa_deactivate_mos_feature(spa_t *spa, const char *feature) +{ + (void) nvlist_remove_all(spa->spa_label_features, feature); + vdev_config_dirty(spa->spa_root_vdev); +} + /* * Rename a spa_t. */ @@ -1183,12 +1213,22 @@ spa_generate_guid(spa_t *spa) void sprintf_blkptr(char *buf, const blkptr_t *bp) { - char *type = NULL; + char type[256]; char *checksum = NULL; char *compress = NULL; if (bp != NULL) { - type = dmu_ot[BP_GET_TYPE(bp)].ot_name; + if (BP_GET_TYPE(bp) & DMU_OT_NEWTYPE) { + dmu_object_byteswap_t bswap = + DMU_OT_BYTESWAP(BP_GET_TYPE(bp)); + (void) snprintf(type, sizeof (type), "bswap %s %s", + DMU_OT_IS_METADATA(BP_GET_TYPE(bp)) ? 
+ "metadata" : "data", + dmu_ot_byteswap[bswap].ob_name); + } else { + (void) strlcpy(type, dmu_ot[BP_GET_TYPE(bp)].ot_name, + sizeof (type)); + } checksum = zio_checksum_table[BP_GET_CHECKSUM(bp)].ci_name; compress = zio_compress_table[BP_GET_COMPRESS(bp)].ci_name; } @@ -1270,6 +1310,12 @@ spa_get_dsl(spa_t *spa) return (spa->spa_dsl_pool); } +boolean_t +spa_is_initializing(spa_t *spa) +{ + return (spa->spa_is_initializing); +} + blkptr_t * spa_get_rootblkptr(spa_t *spa) { @@ -1306,16 +1352,29 @@ spa_name(spa_t *spa) uint64_t spa_guid(spa_t *spa) { + dsl_pool_t *dp = spa_get_dsl(spa); + uint64_t guid; + /* * If we fail to parse the config during spa_load(), we can go through * the error path (which posts an ereport) and end up here with no root * vdev. We stash the original pool guid in 'spa_config_guid' to handle * this case. */ - if (spa->spa_root_vdev != NULL) + if (spa->spa_root_vdev == NULL) + return (spa->spa_config_guid); + + guid = spa->spa_last_synced_guid != 0 ? + spa->spa_last_synced_guid : spa->spa_root_vdev->vdev_guid; + + /* + * Return the most recently synced out guid unless we're + * in syncing context. 
+ */ + if (dp && dsl_pool_sync_context(dp)) return (spa->spa_root_vdev->vdev_guid); else - return (spa->spa_config_guid); + return (guid); } uint64_t @@ -1545,6 +1604,19 @@ spa_init(int mode) spa_mode_global = mode; +#ifdef illumos +#ifndef _KERNEL + if (spa_mode_global != FREAD && dprintf_find_string("watch")) { + arc_procfd = open("/proc/self/ctl", O_WRONLY); + if (arc_procfd == -1) { + perror("could not enable watchpoints: " + "opening /proc/self/ctl failed: "); + } else { + arc_watch = B_TRUE; + } + } +#endif +#endif /* illumos */ refcount_sysinit(); unique_init(); zio_init(); @@ -1553,6 +1625,7 @@ spa_init(int mode) vdev_cache_stat_init(); zfs_prop_init(); zpool_prop_init(); + zpool_feature_init(); spa_config_load(); l2arc_start(); } diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/space_map.c b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/space_map.c index 51dd149ad..f5c79d193 100644 --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/space_map.c +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/space_map.c @@ -22,6 +22,9 @@ * Copyright 2009 Sun Microsystems, Inc. All rights reserved. * Use is subject to license terms. */ +/* + * Copyright (c) 2012 by Delphix. All rights reserved. 
+ */ #include #include @@ -73,7 +76,7 @@ void space_map_destroy(space_map_t *sm) { ASSERT(!sm->sm_loaded && !sm->sm_loading); - VERIFY3U(sm->sm_space, ==, 0); + VERIFY0(sm->sm_space); avl_destroy(&sm->sm_root); cv_destroy(&sm->sm_load_cv); } @@ -285,7 +288,7 @@ space_map_load(space_map_t *sm, space_map_ops_t *ops, uint8_t maptype, space = smo->smo_alloc; ASSERT(sm->sm_ops == NULL); - VERIFY3U(sm->sm_space, ==, 0); + VERIFY0(sm->sm_space); if (maptype == SM_FREE) { space_map_add(sm, sm->sm_start, sm->sm_size); @@ -474,7 +477,7 @@ space_map_sync(space_map_t *sm, uint8_t maptype, zio_buf_free(entry_map, bufsize); - VERIFY3U(sm->sm_space, ==, 0); + VERIFY0(sm->sm_space); } void diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/arc.h b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/arc.h index 8f189c62d..85d88d74f 100644 --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/arc.h +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/arc.h @@ -20,6 +20,7 @@ */ /* * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved. + * Copyright (c) 2012 by Delphix. All rights reserved. */ #ifndef _SYS_ARC_H @@ -135,6 +136,13 @@ void l2arc_fini(void); void l2arc_start(void); void l2arc_stop(void); +#ifdef illumos +#ifndef _KERNEL +extern boolean_t arc_watch; +extern int arc_procfd; +#endif +#endif /* illumos */ + #ifdef __cplusplus } #endif diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/bpobj.h b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/bpobj.h index 3771a9541..af975c734 100644 --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/bpobj.h +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/bpobj.h @@ -20,6 +20,7 @@ */ /* * Copyright (c) 2010, Oracle and/or its affiliates. All rights reserved. + * Copyright (c) 2012 by Delphix. All rights reserved. 
*/ #ifndef _SYS_BPOBJ_H @@ -67,7 +68,9 @@ typedef struct bpobj { typedef int bpobj_itor_t(void *arg, const blkptr_t *bp, dmu_tx_t *tx); uint64_t bpobj_alloc(objset_t *mos, int blocksize, dmu_tx_t *tx); +uint64_t bpobj_alloc_empty(objset_t *os, int blocksize, dmu_tx_t *tx); void bpobj_free(objset_t *os, uint64_t obj, dmu_tx_t *tx); +void bpobj_decr_empty(objset_t *os, dmu_tx_t *tx); int bpobj_open(bpobj_t *bpo, objset_t *mos, uint64_t object); void bpobj_close(bpobj_t *bpo); diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/bptree.h b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/bptree.h new file mode 100644 index 000000000..971507211 --- /dev/null +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/bptree.h @@ -0,0 +1,64 @@ +/* + * CDDL HEADER START + * + * The contents of this file are subject to the terms of the + * Common Development and Distribution License (the "License"). + * You may not use this file except in compliance with the License. + * + * You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE + * or http://www.opensolaris.org/os/licensing. + * See the License for the specific language governing permissions + * and limitations under the License. + * + * When distributing Covered Code, include this CDDL HEADER in each + * file and include the License file at usr/src/OPENSOLARIS.LICENSE. + * If applicable, add the following below this CDDL HEADER, with the + * fields enclosed by brackets "[]" replaced with your own identifying + * information: Portions Copyright [yyyy] [name of copyright owner] + * + * CDDL HEADER END + */ +/* + * Copyright (c) 2012 by Delphix. All rights reserved. 
+ */ + +#ifndef _SYS_BPTREE_H +#define _SYS_BPTREE_H + +#include +#include + +#ifdef __cplusplus +extern "C" { +#endif + +typedef struct bptree_phys { + uint64_t bt_begin; + uint64_t bt_end; + uint64_t bt_bytes; + uint64_t bt_comp; + uint64_t bt_uncomp; +} bptree_phys_t; + +typedef struct bptree_entry_phys { + blkptr_t be_bp; + uint64_t be_birth_txg; /* only delete blocks born after this txg */ + zbookmark_t be_zb; /* holds traversal resume point if needed */ +} bptree_entry_phys_t; + +typedef int bptree_itor_t(void *arg, const blkptr_t *bp, dmu_tx_t *tx); + +uint64_t bptree_alloc(objset_t *os, dmu_tx_t *tx); +int bptree_free(objset_t *os, uint64_t obj, dmu_tx_t *tx); + +void bptree_add(objset_t *os, uint64_t obj, blkptr_t *bp, uint64_t birth_txg, + uint64_t bytes, uint64_t comp, uint64_t uncomp, dmu_tx_t *tx); + +int bptree_iterate(objset_t *os, uint64_t obj, boolean_t free, + bptree_itor_t func, void *arg, dmu_tx_t *tx); + +#ifdef __cplusplus +} +#endif + +#endif /* _SYS_BPTREE_H */ diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dmu.h b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dmu.h index 4d24f5754..ef9d0ccc6 100644 --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dmu.h +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dmu.h @@ -18,11 +18,10 @@ * * CDDL HEADER END */ + /* * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved. - * Copyright (c) 2011 by Delphix. All rights reserved. - */ -/* + * Copyright (c) 2012 by Delphix. All rights reserved. * Copyright 2011 Nexenta Systems, Inc. All rights reserved. * Copyright (c) 2012, Joyent, Inc. All rights reserved. 
*/ @@ -75,6 +74,53 @@ typedef struct objset objset_t; typedef struct dmu_tx dmu_tx_t; typedef struct dsl_dir dsl_dir_t; +typedef enum dmu_object_byteswap { + DMU_BSWAP_UINT8, + DMU_BSWAP_UINT16, + DMU_BSWAP_UINT32, + DMU_BSWAP_UINT64, + DMU_BSWAP_ZAP, + DMU_BSWAP_DNODE, + DMU_BSWAP_OBJSET, + DMU_BSWAP_ZNODE, + DMU_BSWAP_OLDACL, + DMU_BSWAP_ACL, + /* + * Allocating a new byteswap type number makes the on-disk format + * incompatible with any other format that uses the same number. + * + * Data can usually be structured to work with one of the + * DMU_BSWAP_UINT* or DMU_BSWAP_ZAP types. + */ + DMU_BSWAP_NUMFUNCS +} dmu_object_byteswap_t; + +#define DMU_OT_NEWTYPE 0x80 +#define DMU_OT_METADATA 0x40 +#define DMU_OT_BYTESWAP_MASK 0x3f + +/* + * Defines a uint8_t object type. Object types specify if the data + * in the object is metadata (boolean) and how to byteswap the data + * (dmu_object_byteswap_t). + */ +#define DMU_OT(byteswap, metadata) \ + (DMU_OT_NEWTYPE | \ + ((metadata) ? DMU_OT_METADATA : 0) | \ + ((byteswap) & DMU_OT_BYTESWAP_MASK)) + +#define DMU_OT_IS_VALID(ot) (((ot) & DMU_OT_NEWTYPE) ? \ + ((ot) & DMU_OT_BYTESWAP_MASK) < DMU_BSWAP_NUMFUNCS : \ + (ot) < DMU_OT_NUMTYPES) + +#define DMU_OT_IS_METADATA(ot) (((ot) & DMU_OT_NEWTYPE) ? \ + ((ot) & DMU_OT_METADATA) : \ + dmu_ot[(ot)].ot_metadata) + +#define DMU_OT_BYTESWAP(ot) (((ot) & DMU_OT_NEWTYPE) ? \ + ((ot) & DMU_OT_BYTESWAP_MASK) : \ + dmu_ot[(ot)].ot_byteswap) + typedef enum dmu_object_type { DMU_OT_NONE, /* general: */ @@ -139,7 +185,35 @@ typedef enum dmu_object_type { DMU_OT_DEADLIST_HDR, /* UINT64 */ DMU_OT_DSL_CLONES, /* ZAP */ DMU_OT_BPOBJ_SUBOBJ, /* UINT64 */ - DMU_OT_NUMTYPES + /* + * Do not allocate new object types here. Doing so makes the on-disk + * format incompatible with any other format that uses the same object + * type number. + * + * When creating an object which does not have one of the above types + * use the DMU_OTN_* type with the correct byteswap and metadata + * values. 
+ * + * The DMU_OTN_* types do not have entries in the dmu_ot table, + * use the DMU_OT_IS_METADATA() and DMU_OT_BYTESWAP() macros instead + * of indexing into dmu_ot directly (this works for both DMU_OT_* types + * and DMU_OTN_* types). + */ + DMU_OT_NUMTYPES, + + /* + * Names for valid types declared with DMU_OT(). + */ + DMU_OTN_UINT8_DATA = DMU_OT(DMU_BSWAP_UINT8, B_FALSE), + DMU_OTN_UINT8_METADATA = DMU_OT(DMU_BSWAP_UINT8, B_TRUE), + DMU_OTN_UINT16_DATA = DMU_OT(DMU_BSWAP_UINT16, B_FALSE), + DMU_OTN_UINT16_METADATA = DMU_OT(DMU_BSWAP_UINT16, B_TRUE), + DMU_OTN_UINT32_DATA = DMU_OT(DMU_BSWAP_UINT32, B_FALSE), + DMU_OTN_UINT32_METADATA = DMU_OT(DMU_BSWAP_UINT32, B_TRUE), + DMU_OTN_UINT64_DATA = DMU_OT(DMU_BSWAP_UINT64, B_FALSE), + DMU_OTN_UINT64_METADATA = DMU_OT(DMU_BSWAP_UINT64, B_TRUE), + DMU_OTN_ZAP_DATA = DMU_OT(DMU_BSWAP_ZAP, B_FALSE), + DMU_OTN_ZAP_METADATA = DMU_OT(DMU_BSWAP_ZAP, B_TRUE), } dmu_object_type_t; typedef enum dmu_objset_type { @@ -221,6 +295,9 @@ typedef void dmu_buf_evict_func_t(struct dmu_buf *db, void *user_ptr); */ #define DMU_POOL_DIRECTORY_OBJECT 1 #define DMU_POOL_CONFIG "config" +#define DMU_POOL_FEATURES_FOR_WRITE "features_for_write" +#define DMU_POOL_FEATURES_FOR_READ "features_for_read" +#define DMU_POOL_FEATURE_DESCRIPTIONS "feature_descriptions" #define DMU_POOL_ROOT_DATASET "root_dataset" #define DMU_POOL_SYNC_BPOBJ "sync_bplist" #define DMU_POOL_ERRLOG_SCRUB "errlog_scrub" @@ -236,6 +313,8 @@ typedef void dmu_buf_evict_func_t(struct dmu_buf *db, void *user_ptr); #define DMU_POOL_CREATION_VERSION "creation_version" #define DMU_POOL_SCAN "scan" #define DMU_POOL_FREE_BPOBJ "free_bpobj" +#define DMU_POOL_BPTREE_OBJ "bptree_obj" +#define DMU_POOL_EMPTY_BPOBJ "empty_bpobj" /* * Allocate an object from this objset. The range of object numbers @@ -496,7 +575,7 @@ void dmu_tx_callback_register(dmu_tx_t *tx, dmu_tx_callback_func_t *dcb_func, /* * Free up the data blocks for a defined range of a file.
If size is - * zero, the range from offset to end-of-file is freed. + * -1, the range from offset to end-of-file is freed. */ int dmu_free_range(objset_t *os, uint64_t object, uint64_t offset, uint64_t size, dmu_tx_t *tx); @@ -566,12 +645,18 @@ typedef struct dmu_object_info { typedef void arc_byteswap_func_t(void *buf, size_t size); typedef struct dmu_object_type_info { - arc_byteswap_func_t *ot_byteswap; + dmu_object_byteswap_t ot_byteswap; boolean_t ot_metadata; char *ot_name; } dmu_object_type_info_t; +typedef struct dmu_object_byteswap_info { + arc_byteswap_func_t *ob_func; + char *ob_name; +} dmu_object_byteswap_info_t; + extern const dmu_object_type_info_t dmu_ot[DMU_OT_NUMTYPES]; +extern const dmu_object_byteswap_info_t dmu_ot_byteswap[DMU_BSWAP_NUMFUNCS]; /* * Get information on a DMU object. diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dmu_objset.h b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dmu_objset.h index d687642b3..fac0f9f2b 100644 --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dmu_objset.h +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dmu_objset.h @@ -161,7 +161,6 @@ timestruc_t dmu_objset_snap_cmtime(objset_t *os); /* called from dsl */ void dmu_objset_sync(objset_t *os, zio_t *zio, dmu_tx_t *tx); boolean_t dmu_objset_is_dirty(objset_t *os, uint64_t txg); -boolean_t dmu_objset_is_dirty_anywhere(objset_t *os); objset_t *dmu_objset_create_impl(spa_t *spa, struct dsl_dataset *ds, blkptr_t *bp, dmu_objset_type_t type, dmu_tx_t *tx); int dmu_objset_open_impl(spa_t *spa, struct dsl_dataset *ds, blkptr_t *bp, diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dmu_traverse.h b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dmu_traverse.h index 5b326cd99..3cbf42f56 100644 --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dmu_traverse.h +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dmu_traverse.h @@ -20,6 +20,7 @@ */ /* * Copyright (c) 2005, 2010, Oracle and/or its 
affiliates. All rights reserved. + * Copyright (c) 2012 by Delphix. All rights reserved. */ #ifndef _SYS_DMU_TRAVERSE_H @@ -54,6 +55,9 @@ typedef int (blkptr_cb_t)(spa_t *spa, zilog_t *zilog, const blkptr_t *bp, int traverse_dataset(struct dsl_dataset *ds, uint64_t txg_start, int flags, blkptr_cb_t func, void *arg); +int traverse_dataset_destroyed(spa_t *spa, blkptr_t *blkptr, + uint64_t txg_start, zbookmark_t *resume, int flags, + blkptr_cb_t func, void *arg); int traverse_pool(spa_t *spa, uint64_t txg_start, int flags, blkptr_cb_t func, void *arg); diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dnode.h b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dnode.h index 9ad4be36b..9f9134d8c 100644 --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dnode.h +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dnode.h @@ -20,6 +20,7 @@ */ /* * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved. + * Copyright (c) 2012 by Delphix. All rights reserved. */ #ifndef _SYS_DNODE_H @@ -276,7 +277,6 @@ void dnode_byteswap(dnode_phys_t *dnp); void dnode_buf_byteswap(void *buf, size_t size); void dnode_verify(dnode_t *dn); int dnode_set_blksz(dnode_t *dn, uint64_t size, int ibs, dmu_tx_t *tx); -uint64_t dnode_current_max_length(dnode_t *dn); void dnode_free_range(dnode_t *dn, uint64_t off, uint64_t len, dmu_tx_t *tx); void dnode_clear_range(dnode_t *dn, uint64_t blkid, uint64_t nblks, dmu_tx_t *tx); diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dsl_dataset.h b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dsl_dataset.h index 56d2f6b75..d030cd779 100644 --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dsl_dataset.h +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dsl_dataset.h @@ -22,7 +22,7 @@ * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved. * Copyright (c) 2011 Pawel Jakub Dawidek . * All rights reserved. - * Copyright (c) 2011 by Delphix. All rights reserved. 
+ * Copyright (c) 2012 by Delphix. All rights reserved. * Copyright (c) 2012, Joyent, Inc. All rights reserved. */ @@ -88,7 +88,12 @@ typedef struct dsl_dataset_phys { uint64_t ds_creation_time; /* seconds since 1970 */ uint64_t ds_creation_txg; uint64_t ds_deadlist_obj; /* DMU_OT_DEADLIST */ - uint64_t ds_used_bytes; + /* + * ds_referenced_bytes, ds_compressed_bytes, and ds_uncompressed_bytes + * include all blocks referenced by this dataset, including those + * shared with any other datasets. + */ + uint64_t ds_referenced_bytes; uint64_t ds_compressed_bytes; uint64_t ds_uncompressed_bytes; uint64_t ds_unique_bytes; /* only relevant to snapshots */ @@ -266,6 +271,7 @@ int dsl_dataset_space_written(dsl_dataset_t *oldsnap, dsl_dataset_t *new, uint64_t *usedp, uint64_t *compp, uint64_t *uncompp); int dsl_dataset_space_wouldfree(dsl_dataset_t *firstsnap, dsl_dataset_t *last, uint64_t *usedp, uint64_t *compp, uint64_t *uncompp); +boolean_t dsl_dataset_is_dirty(dsl_dataset_t *ds); int dsl_dsobj_to_dsname(char *pname, uint64_t obj, char *buf); diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dsl_pool.h b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dsl_pool.h index 7d25bd7c0..f8c98edc2 100644 --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dsl_pool.h +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dsl_pool.h @@ -20,6 +20,7 @@ */ /* * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved. + * Copyright (c) 2012 by Delphix. All rights reserved. */ #ifndef _SYS_DSL_POOL_H @@ -34,6 +35,7 @@ #include #include #include +#include #ifdef __cplusplus extern "C" { @@ -48,7 +50,8 @@ struct dsl_scan; /* These macros are for indexing into the zfs_all_blkstats_t. 
*/ #define DMU_OT_DEFERRED DMU_OT_NONE -#define DMU_OT_TOTAL DMU_OT_NUMTYPES +#define DMU_OT_OTHER DMU_OT_NUMTYPES /* place holder for DMU_OT() types */ +#define DMU_OT_TOTAL (DMU_OT_NUMTYPES + 1) typedef struct zfs_blkstat { uint64_t zb_count; @@ -79,12 +82,13 @@ typedef struct dsl_pool { /* No lock needed - sync context only */ blkptr_t dp_meta_rootbp; - list_t dp_synced_datasets; hrtime_t dp_read_overhead; uint64_t dp_throughput; /* bytes per millisec */ uint64_t dp_write_limit; uint64_t dp_tmp_userrefs_obj; bpobj_t dp_free_bpobj; + uint64_t dp_bptree_obj; + uint64_t dp_empty_bpobj; struct dsl_scan *dp_scan; @@ -92,10 +96,14 @@ typedef struct dsl_pool { kmutex_t dp_lock; uint64_t dp_space_towrite[TXG_SIZE]; uint64_t dp_tempreserved[TXG_SIZE]; + uint64_t dp_mos_used_delta; + uint64_t dp_mos_compressed_delta; + uint64_t dp_mos_uncompressed_delta; /* Has its own locking */ tx_state_t dp_tx; txg_list_t dp_dirty_datasets; + txg_list_t dp_dirty_zilogs; txg_list_t dp_dirty_dirs; txg_list_t dp_sync_tasks; @@ -110,7 +118,8 @@ typedef struct dsl_pool { zfs_all_blkstats_t *dp_blkstats; } dsl_pool_t; -int dsl_pool_open(spa_t *spa, uint64_t txg, dsl_pool_t **dpp); +int dsl_pool_init(spa_t *spa, uint64_t txg, dsl_pool_t **dpp); +int dsl_pool_open(dsl_pool_t *dp); void dsl_pool_close(dsl_pool_t *dp); dsl_pool_t *dsl_pool_create(spa_t *spa, nvlist_t *zplprops, uint64_t txg); void dsl_pool_sync(dsl_pool_t *dp, uint64_t txg); @@ -134,6 +143,8 @@ int dsl_read_nolock(zio_t *pio, spa_t *spa, const blkptr_t *bpp, void dsl_pool_create_origin(dsl_pool_t *dp, dmu_tx_t *tx); void dsl_pool_upgrade_clones(dsl_pool_t *dp, dmu_tx_t *tx); void dsl_pool_upgrade_dir_clones(dsl_pool_t *dp, dmu_tx_t *tx); +void dsl_pool_mos_diduse_space(dsl_pool_t *dp, + int64_t used, int64_t comp, int64_t uncomp); taskq_t *dsl_pool_vnrele_taskq(dsl_pool_t *dp); diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dsl_scan.h b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dsl_scan.h index 
c79666e67..5691f4d14 100644 --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dsl_scan.h +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dsl_scan.h @@ -20,6 +20,7 @@ */ /* * Copyright (c) 2010, Oracle and/or its affiliates. All rights reserved. + * Copyright (c) 2012 by Delphix. All rights reserved. */ #ifndef _SYS_DSL_SCAN_H @@ -79,6 +80,9 @@ typedef struct dsl_scan { uint64_t scn_sync_start_time; zio_t *scn_zio_root; + /* for freeing blocks */ + boolean_t scn_is_bptree; + /* for debugging / information */ uint64_t scn_visited_this_txg; diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/sa_impl.h b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/sa_impl.h index 6661e47cf..8ae05ce36 100644 --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/sa_impl.h +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/sa_impl.h @@ -20,6 +20,7 @@ */ /* * Copyright (c) 2010, Oracle and/or its affiliates. All rights reserved. + * Copyright (c) 2012 by Delphix. All rights reserved. */ #ifndef _SYS_SA_IMPL_H @@ -181,7 +182,7 @@ typedef struct sa_hdr_phys { */ #define SA_HDR_LAYOUT_NUM(hdr) BF32_GET(hdr->sa_layout_info, 0, 10) -#define SA_HDR_SIZE(hdr) BF32_GET_SB(hdr->sa_layout_info, 10, 16, 3, 0) +#define SA_HDR_SIZE(hdr) BF32_GET_SB(hdr->sa_layout_info, 10, 6, 3, 0) #define SA_HDR_LAYOUT_INFO_ENCODE(x, num, size) \ { \ BF32_SET_SB(x, 10, 6, 3, 0, size); \ diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/spa.h b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/spa.h index 1644345b0..71f329147 100644 --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/spa.h +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/spa.h @@ -20,7 +20,7 @@ */ /* * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved. - * Copyright (c) 2011 by Delphix. All rights reserved. + * Copyright (c) 2012 by Delphix. All rights reserved. * Copyright 2011 Nexenta Systems, Inc. All rights reserved. 
*/ @@ -94,7 +94,7 @@ struct dsl_pool; /* * Size of block to hold the configuration data (a packed nvlist) */ -#define SPA_CONFIG_BLOCKSIZE (1 << 14) +#define SPA_CONFIG_BLOCKSIZE (1ULL << 14) /* * The DVA size encodings for LSIZE and PSIZE support blocks up to 32MB. @@ -262,7 +262,7 @@ typedef struct blkptr { DVA_GET_ASIZE(&(bp)->blk_dva[2])) #define BP_GET_UCSIZE(bp) \ - ((BP_GET_LEVEL(bp) > 0 || dmu_ot[BP_GET_TYPE(bp)].ot_metadata) ? \ + ((BP_GET_LEVEL(bp) > 0 || DMU_OT_IS_METADATA(BP_GET_TYPE(bp))) ? \ BP_GET_PSIZE(bp) : BP_GET_LSIZE(bp)) #define BP_GET_NDVAS(bp) \ @@ -403,8 +403,8 @@ typedef struct blkptr { #include #define BP_GET_BUFC_TYPE(bp) \ - (((BP_GET_LEVEL(bp) > 0) || (dmu_ot[BP_GET_TYPE(bp)].ot_metadata)) ? \ - ARC_BUFC_METADATA : ARC_BUFC_DATA); + (((BP_GET_LEVEL(bp) > 0) || (DMU_OT_IS_METADATA(BP_GET_TYPE(bp)))) ? \ + ARC_BUFC_METADATA : ARC_BUFC_DATA) typedef enum spa_import_type { SPA_IMPORT_EXISTING, @@ -415,8 +415,8 @@ typedef enum spa_import_type { extern int spa_open(const char *pool, spa_t **, void *tag); extern int spa_open_rewind(const char *pool, spa_t **, void *tag, nvlist_t *policy, nvlist_t **config); -extern int spa_get_stats(const char *pool, nvlist_t **config, - char *altroot, size_t buflen); +extern int spa_get_stats(const char *pool, nvlist_t **config, char *altroot, + size_t buflen); extern int spa_create(const char *pool, nvlist_t *config, nvlist_t *props, const char *history_str, nvlist_t *zplprops); #if defined(sun) @@ -577,6 +577,7 @@ extern void spa_claim_notify(zio_t *zio); /* Accessor functions */ extern boolean_t spa_shutting_down(spa_t *spa); extern struct dsl_pool *spa_get_dsl(spa_t *spa); +extern boolean_t spa_is_initializing(spa_t *spa); extern blkptr_t *spa_get_rootblkptr(spa_t *spa); extern void spa_set_rootblkptr(spa_t *spa, const blkptr_t *bp); extern void spa_altroot(spa_t *, char *, size_t); @@ -608,6 +609,8 @@ extern uint64_t spa_delegation(spa_t *spa); extern objset_t *spa_meta_objset(spa_t *spa); /* 
Miscellaneous support routines */ +extern void spa_activate_mos_feature(spa_t *spa, const char *feature); +extern void spa_deactivate_mos_feature(spa_t *spa, const char *feature); extern int spa_rename(const char *oldname, const char *newname); extern spa_t *spa_by_guid(uint64_t pool_guid, uint64_t device_guid); extern boolean_t spa_guid_exists(uint64_t pool_guid, uint64_t device_guid); diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/spa_impl.h b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/spa_impl.h index 88d147739..da31a2243 100644 --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/spa_impl.h +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/spa_impl.h @@ -20,7 +20,7 @@ */ /* * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved. - * Copyright (c) 2011 by Delphix. All rights reserved. + * Copyright (c) 2012 by Delphix. All rights reserved. * Copyright 2011 Nexenta Systems, Inc. All rights reserved. */ @@ -127,6 +127,7 @@ struct spa { uint64_t spa_import_flags; /* import specific flags */ taskq_t *spa_zio_taskq[ZIO_TYPES][ZIO_TASKQ_TYPES]; dsl_pool_t *spa_dsl_pool; + boolean_t spa_is_initializing; /* true while opening pool */ metaslab_class_t *spa_normal_class; /* normal data class */ metaslab_class_t *spa_log_class; /* intent log data class */ uint64_t spa_first_txg; /* first txg after spa_open() */ @@ -140,10 +141,12 @@ struct spa { vdev_t *spa_root_vdev; /* top-level vdev container */ uint64_t spa_config_guid; /* config pool guid */ uint64_t spa_load_guid; /* spa_load initialized guid */ + uint64_t spa_last_synced_guid; /* last synced guid */ list_t spa_config_dirty_list; /* vdevs with dirty config */ list_t spa_state_dirty_list; /* vdevs with dirty state */ spa_aux_vdev_t spa_spares; /* hot spares */ spa_aux_vdev_t spa_l2cache; /* L2ARC cache devices */ + nvlist_t *spa_label_features; /* Features for reading MOS */ uint64_t spa_config_object; /* MOS object for pool config */ uint64_t 
spa_config_generation; /* config generation number */ uint64_t spa_syncing_txg; /* txg currently syncing */ @@ -220,7 +223,10 @@ struct spa { boolean_t spa_autoreplace; /* autoreplace set in open */ int spa_vdev_locks; /* locks grabbed */ uint64_t spa_creation_version; /* version at pool creation */ - uint64_t spa_prev_software_version; + uint64_t spa_prev_software_version; /* See ub_software_version */ + uint64_t spa_feat_for_write_obj; /* required to write to pool */ + uint64_t spa_feat_for_read_obj; /* required to read from pool */ + uint64_t spa_feat_desc_obj; /* Feature descriptions */ /* * spa_refcnt & spa_config_lock must be the last elements * because refcount_t changes size based on compilation options. diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/txg.h b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/txg.h index e323d5efa..1287f09c7 100644 --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/txg.h +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/txg.h @@ -22,6 +22,9 @@ * Copyright 2010 Sun Microsystems, Inc. All rights reserved. * Use is subject to license terms. */ +/* + * Copyright (c) 2012 by Delphix. All rights reserved. 
+ */ #ifndef _SYS_TXG_H #define _SYS_TXG_H @@ -115,7 +118,7 @@ extern boolean_t txg_sync_waiting(struct dsl_pool *dp); extern void txg_list_create(txg_list_t *tl, size_t offset); extern void txg_list_destroy(txg_list_t *tl); -extern int txg_list_empty(txg_list_t *tl, uint64_t txg); +extern boolean_t txg_list_empty(txg_list_t *tl, uint64_t txg); extern int txg_list_add(txg_list_t *tl, void *p, uint64_t txg); extern int txg_list_add_tail(txg_list_t *tl, void *p, uint64_t txg); extern void *txg_list_remove(txg_list_t *tl, uint64_t txg); diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/vdev.h b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/vdev.h index aa6559c80..7e34889b6 100644 --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/vdev.h +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/vdev.h @@ -18,6 +18,7 @@ * * CDDL HEADER END */ + /* * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved. * Copyright (c) 2012 by Delphix. All rights reserved. 
@@ -141,8 +142,8 @@ extern nvlist_t *vdev_config_generate(spa_t *spa, vdev_t *vd, struct uberblock; extern uint64_t vdev_label_offset(uint64_t psize, int l, uint64_t offset); extern int vdev_label_number(uint64_t psise, uint64_t offset); -extern nvlist_t *vdev_label_read_config(vdev_t *vd); -extern void vdev_uberblock_load(zio_t *zio, vdev_t *vd, struct uberblock *ub); +extern nvlist_t *vdev_label_read_config(vdev_t *vd, uint64_t txg); +extern void vdev_uberblock_load(vdev_t *, struct uberblock *, nvlist_t **); typedef enum { VDEV_LABEL_CREATE, /* create/add a new device */ diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/vdev_impl.h b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/vdev_impl.h index 019d2bef9..7eed4a188 100644 --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/vdev_impl.h +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/vdev_impl.h @@ -202,7 +202,7 @@ struct vdev { * For DTrace to work in userland (libzpool) context, these fields must * remain at the end of the structure. DTrace will use the kernel's * CTF definition for 'struct vdev', and since the size of a kmutex_t is - * larger in userland, the offsets for the rest fields would be + * larger in userland, the offsets for the rest of the fields would be * incorrect. 
*/ kmutex_t vdev_dtl_lock; /* vdev_dtl_{map,resilver} */ @@ -257,6 +257,7 @@ typedef struct vdev_label { #define VDEV_LABEL_START_SIZE (2 * sizeof (vdev_label_t) + VDEV_BOOT_SIZE) #define VDEV_LABEL_END_SIZE (2 * sizeof (vdev_label_t)) #define VDEV_LABELS 4 +#define VDEV_BEST_LABEL VDEV_LABELS #define VDEV_ALLOC_LOAD 0 #define VDEV_ALLOC_ADD 1 diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zap.h b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zap.h index a1130bbba..092669c8b 100644 --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zap.h +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zap.h @@ -20,6 +20,7 @@ */ /* * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved. + * Copyright (c) 2012 by Delphix. All rights reserved. */ #ifndef _SYS_ZAP_H @@ -132,6 +133,8 @@ uint64_t zap_create_norm(objset_t *ds, int normflags, dmu_object_type_t ot, uint64_t zap_create_flags(objset_t *os, int normflags, zap_flags_t flags, dmu_object_type_t ot, int leaf_blockshift, int indirect_blockshift, dmu_object_type_t bonustype, int bonuslen, dmu_tx_t *tx); +uint64_t zap_create_link(objset_t *os, dmu_object_type_t ot, + uint64_t parent_obj, const char *name, dmu_tx_t *tx); /* * Create a new zapobj with no attributes from the given (unallocated) @@ -297,15 +300,11 @@ int zap_increment_int(objset_t *os, uint64_t obj, uint64_t key, int64_t delta, /* Here the key is an int and the value is a different int. */ int zap_add_int_key(objset_t *os, uint64_t obj, uint64_t key, uint64_t value, dmu_tx_t *tx); +int zap_update_int_key(objset_t *os, uint64_t obj, + uint64_t key, uint64_t value, dmu_tx_t *tx); int zap_lookup_int_key(objset_t *os, uint64_t obj, uint64_t key, uint64_t *valuep); -/* - * They name is a stringified version of key; increment its value by - * delta. Zero values will be zap_remove()-ed. 
- */ -int zap_increment_int(objset_t *os, uint64_t obj, uint64_t key, int64_t delta, - dmu_tx_t *tx); int zap_increment(objset_t *os, uint64_t obj, const char *name, int64_t delta, dmu_tx_t *tx); diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfeature.h b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfeature.h new file mode 100644 index 000000000..481e85b1b --- /dev/null +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfeature.h @@ -0,0 +1,52 @@ +/* + * CDDL HEADER START + * + * The contents of this file are subject to the terms of the + * Common Development and Distribution License (the "License"). + * You may not use this file except in compliance with the License. + * + * You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE + * or http://www.opensolaris.org/os/licensing. + * See the License for the specific language governing permissions + * and limitations under the License. + * + * When distributing Covered Code, include this CDDL HEADER in each + * file and include the License file at usr/src/OPENSOLARIS.LICENSE. + * If applicable, add the following below this CDDL HEADER, with the + * fields enclosed by brackets "[]" replaced with your own identifying + * information: Portions Copyright [yyyy] [name of copyright owner] + * + * CDDL HEADER END + */ + +/* + * Copyright (c) 2012 by Delphix. All rights reserved. 
+ */ + +#ifndef _SYS_ZFEATURE_H +#define _SYS_ZFEATURE_H + +#include <sys/dmu.h> +#include <sys/nvpair.h> +#include "zfeature_common.h" + +#ifdef __cplusplus +extern "C" { +#endif + +extern boolean_t feature_is_supported(objset_t *os, uint64_t obj, + uint64_t desc_obj, nvlist_t *unsup_feat, nvlist_t *enabled_feat); + +struct spa; +extern void spa_feature_create_zap_objects(struct spa *, dmu_tx_t *); +extern void spa_feature_enable(struct spa *, zfeature_info_t *, dmu_tx_t *); +extern void spa_feature_incr(struct spa *, zfeature_info_t *, dmu_tx_t *); +extern void spa_feature_decr(struct spa *, zfeature_info_t *, dmu_tx_t *); +extern boolean_t spa_feature_is_enabled(struct spa *, zfeature_info_t *); +extern boolean_t spa_feature_is_active(struct spa *, zfeature_info_t *); + +#ifdef __cplusplus +} +#endif + +#endif /* _SYS_ZFEATURE_H */ diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_debug.h b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_debug.h index 31af18462..43d037d7b 100644 --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_debug.h +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_debug.h @@ -20,6 +20,7 @@ */ /* * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved. + * Copyright (c) 2012 by Delphix. All rights reserved. */ #ifndef _SYS_ZFS_DEBUG_H @@ -77,6 +78,12 @@ extern void zfs_dbgmsg_init(void); extern void zfs_dbgmsg_fini(void); extern void zfs_dbgmsg(const char *fmt, ...); +#ifdef illumos +#ifndef _KERNEL +extern int dprintf_find_string(const char *string); +#endif +#endif /* illumos */ + #ifdef __cplusplus } #endif diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zil.h b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zil.h index a4c5575b2..e52c65bb7 100644 --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zil.h +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zil.h @@ -20,6 +20,7 @@ */ /* * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved. 
+ * Copyright (c) 2012 by Delphix. All rights reserved. */ /* Portions Copyright 2010 Robert Milkowski */ @@ -395,6 +396,7 @@ extern void zil_replay(objset_t *os, void *arg, zil_replay_func_t *replay_func[TX_MAX_TYPE]); extern boolean_t zil_replaying(zilog_t *zilog, dmu_tx_t *tx); extern void zil_destroy(zilog_t *zilog, boolean_t keep_first); +extern void zil_destroy_sync(zilog_t *zilog, dmu_tx_t *tx); extern void zil_rollback_destroy(zilog_t *zilog, dmu_tx_t *tx); extern itx_t *zil_itx_create(uint64_t txtype, size_t lrsize); diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zil_impl.h b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zil_impl.h index 1d4c0cc6c..58566203b 100644 --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zil_impl.h +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zil_impl.h @@ -20,6 +20,7 @@ */ /* * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved. + * Copyright (c) 2012 by Delphix. All rights reserved. */ /* Portions Copyright 2010 Robert Milkowski */ @@ -130,6 +131,7 @@ struct zilog { zil_header_t zl_old_header; /* debugging aid */ uint_t zl_prev_blks[ZIL_PREV_BLKS]; /* size - sector rounded */ uint_t zl_prev_rotor; /* rotor for zl_prev[] */ + txg_node_t zl_dirty_link; /* protected by dp_dirty_zilogs list */ }; typedef struct zil_bp_node { diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zio.h b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zio.h index e8372f711..a554cc9a7 100644 --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zio.h +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zio.h @@ -21,6 +21,7 @@ /* * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved. + * Copyright (c) 2012 by Delphix. All rights reserved. 
*/ #ifndef _ZIO_H @@ -270,6 +271,14 @@ typedef struct zbookmark { #define ZB_ZIL_OBJECT (0ULL) #define ZB_ZIL_LEVEL (-2LL) +#define ZB_IS_ZERO(zb) \ + ((zb)->zb_objset == 0 && (zb)->zb_object == 0 && \ + (zb)->zb_level == 0 && (zb)->zb_blkid == 0) +#define ZB_IS_ROOT(zb) \ + ((zb)->zb_object == ZB_ROOT_OBJECT && \ + (zb)->zb_level == ZB_ROOT_LEVEL && \ + (zb)->zb_blkid == ZB_ROOT_BLKID) + typedef struct zio_prop { enum zio_checksum zp_checksum; enum zio_compress zp_compress; @@ -287,6 +296,7 @@ typedef void zio_cksum_finish_f(zio_cksum_report_t *rep, typedef void zio_cksum_free_f(void *cbdata, size_t size); struct zio_bad_cksum; /* defined in zio_checksum.h */ +struct dnode_phys; struct zio_cksum_report { struct zio_cksum_report *zcr_next; @@ -558,6 +568,10 @@ extern void zfs_ereport_post_checksum(spa_t *spa, vdev_t *vd, /* Called from spa_sync(), but primarily an injection handler */ extern void spa_handle_ignored_writes(spa_t *spa); +/* zbookmark functions */ +boolean_t zbookmark_is_before(const struct dnode_phys *dnp, + const zbookmark_t *zb1, const zbookmark_t *zb2); + #ifdef __cplusplus } #endif diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zio_impl.h b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zio_impl.h index d90bd8bd5..9a58ac048 100644 --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zio_impl.h +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zio_impl.h @@ -61,7 +61,7 @@ enum zio_stage { ZIO_STAGE_READY = 1 << 15, /* RWFCI */ ZIO_STAGE_VDEV_IO_START = 1 << 16, /* RW--I */ - ZIO_STAGE_VDEV_IO_DONE = 1 << 17, /* RW--I */ + ZIO_STAGE_VDEV_IO_DONE = 1 << 17, /* RW--- */ ZIO_STAGE_VDEV_IO_ASSESS = 1 << 18, /* RW--I */ ZIO_STAGE_CHECKSUM_VERIFY = 1 << 19, /* R---- */ diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/txg.c b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/txg.c index 2e35e9eb4..dac619bea 100644 --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/txg.c +++ 
b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/txg.c @@ -21,6 +21,7 @@ /* * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved. * Portions Copyright 2011 Martin Matuska + * Copyright (c) 2012 by Delphix. All rights reserved. */ #include @@ -596,7 +597,7 @@ txg_list_destroy(txg_list_t *tl) mutex_destroy(&tl->tl_lock); } -int +boolean_t txg_list_empty(txg_list_t *tl, uint64_t txg) { return (tl->tl_head[txg & TXG_MASK] == NULL); diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev.c b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev.c index ffcefb9eb..9eaf94ca4 100644 --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev.c +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev.c @@ -597,9 +597,9 @@ vdev_free(vdev_t *vd) metaslab_group_destroy(vd->vdev_mg); } - ASSERT3U(vd->vdev_stat.vs_space, ==, 0); - ASSERT3U(vd->vdev_stat.vs_dspace, ==, 0); - ASSERT3U(vd->vdev_stat.vs_alloc, ==, 0); + ASSERT0(vd->vdev_stat.vs_space); + ASSERT0(vd->vdev_stat.vs_dspace); + ASSERT0(vd->vdev_stat.vs_alloc); /* * Remove this vdev from its parent's child list. @@ -1328,8 +1328,9 @@ vdev_validate(vdev_t *vd, boolean_t strict) if (vd->vdev_ops->vdev_op_leaf && vdev_readable(vd)) { uint64_t aux_guid = 0; nvlist_t *nvl; + uint64_t txg = strict ? 
spa->spa_config_txg : -1ULL; - if ((label = vdev_label_read_config(vd)) == NULL) { + if ((label = vdev_label_read_config(vd, txg)) == NULL) { vdev_set_state(vd, B_TRUE, VDEV_STATE_CANT_OPEN, VDEV_AUX_BAD_LABEL); return (0); @@ -1511,7 +1512,7 @@ vdev_reopen(vdev_t *vd) !l2arc_vdev_present(vd)) l2arc_add_vdev(spa, vd); } else { - (void) vdev_validate(vd, B_TRUE); + (void) vdev_validate(vd, spa_last_synced_txg(spa)); } /* @@ -1805,7 +1806,7 @@ vdev_dtl_sync(vdev_t *vd, uint64_t txg) if (vd->vdev_detached) { if (smo->smo_object != 0) { int err = dmu_object_free(mos, smo->smo_object, tx); - ASSERT3U(err, ==, 0); + ASSERT0(err); smo->smo_object = 0; } dmu_tx_commit(tx); @@ -1970,14 +1971,14 @@ vdev_validate_aux(vdev_t *vd) if (!vdev_readable(vd)) return (0); - if ((label = vdev_label_read_config(vd)) == NULL) { + if ((label = vdev_label_read_config(vd, -1ULL)) == NULL) { vdev_set_state(vd, B_TRUE, VDEV_STATE_CANT_OPEN, VDEV_AUX_CORRUPT_DATA); return (-1); } if (nvlist_lookup_uint64(label, ZPOOL_CONFIG_VERSION, &version) != 0 || - version > SPA_VERSION || + !SPA_VERSION_IS_SUPPORTED(version) || nvlist_lookup_uint64(label, ZPOOL_CONFIG_GUID, &guid) != 0 || guid != vd->vdev_guid || nvlist_lookup_uint64(label, ZPOOL_CONFIG_POOL_STATE, &state) != 0) { @@ -2005,7 +2006,7 @@ vdev_remove(vdev_t *vd, uint64_t txg) tx = dmu_tx_create_assigned(spa_get_dsl(spa), txg); if (vd->vdev_dtl_smo.smo_object) { - ASSERT3U(vd->vdev_dtl_smo.smo_alloc, ==, 0); + ASSERT0(vd->vdev_dtl_smo.smo_alloc); (void) dmu_object_free(mos, vd->vdev_dtl_smo.smo_object, tx); vd->vdev_dtl_smo.smo_object = 0; } @@ -2017,7 +2018,7 @@ vdev_remove(vdev_t *vd, uint64_t txg) if (msp == NULL || msp->ms_smo.smo_object == 0) continue; - ASSERT3U(msp->ms_smo.smo_alloc, ==, 0); + ASSERT0(msp->ms_smo.smo_alloc); (void) dmu_object_free(mos, msp->ms_smo.smo_object, tx); msp->ms_smo.smo_object = 0; } @@ -2295,7 +2296,7 @@ top: (void) spa_vdev_state_exit(spa, vd, 0); goto top; } - ASSERT3U(tvd->vdev_stat.vs_alloc, ==, 0); + 
ASSERT0(tvd->vdev_stat.vs_alloc); } /* diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_label.c b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_label.c index c08ed8ba0..e573feb5d 100644 --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_label.c +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_label.c @@ -18,8 +18,10 @@ * * CDDL HEADER END */ + /* * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved. + * Copyright (c) 2012 by Delphix. All rights reserved. */ /* @@ -121,6 +123,8 @@ * txg Transaction group in which this label was written * pool_guid Unique identifier for this pool * vdev_tree An nvlist describing vdev tree. + * features_for_read + * An nvlist of the features necessary for reading the MOS. * * Each leaf device label also contains the following: * @@ -428,13 +432,23 @@ vdev_top_config_generate(spa_t *spa, nvlist_t *config) kmem_free(array, rvd->vdev_children * sizeof (uint64_t)); } +/* + * Returns the configuration from the label of the given vdev. For vdevs + * which don't have a txg value stored on their label (i.e. spares/cache) + * or have not been completely initialized (txg = 0) just return + * the configuration from the first valid label we find. Otherwise, + * find the most up-to-date label that does not exceed the specified + * 'txg' value. 
+ */ nvlist_t * -vdev_label_read_config(vdev_t *vd) +vdev_label_read_config(vdev_t *vd, uint64_t txg) { spa_t *spa = vd->vdev_spa; nvlist_t *config = NULL; vdev_phys_t *vp; zio_t *zio; + uint64_t best_txg = 0; + int error = 0; int flags = ZIO_FLAG_CONFIG_WRITER | ZIO_FLAG_CANFAIL | ZIO_FLAG_SPECULATIVE; @@ -447,6 +461,7 @@ vdev_label_read_config(vdev_t *vd) retry: for (int l = 0; l < VDEV_LABELS; l++) { + nvlist_t *label = NULL; zio = zio_root(spa, NULL, NULL, flags); @@ -456,12 +471,31 @@ retry: if (zio_wait(zio) == 0 && nvlist_unpack(vp->vp_nvlist, sizeof (vp->vp_nvlist), - &config, 0) == 0) - break; + &label, 0) == 0) { + uint64_t label_txg = 0; + + /* + * Auxiliary vdevs won't have txg values in their + * labels and newly added vdevs may not have been + * completely initialized so just return the + * configuration from the first valid label we + * encounter. + */ + error = nvlist_lookup_uint64(label, + ZPOOL_CONFIG_POOL_TXG, &label_txg); + if ((error || label_txg == 0) && !config) { + config = label; + break; + } else if (label_txg <= txg && label_txg > best_txg) { + best_txg = label_txg; + nvlist_free(config); + config = fnvlist_dup(label); + } + } - if (config != NULL) { - nvlist_free(config); - config = NULL; + if (label != NULL) { + nvlist_free(label); + label = NULL; } } @@ -496,7 +530,7 @@ vdev_inuse(vdev_t *vd, uint64_t crtxg, vdev_labeltype_t reason, /* * Read the label, if any, and perform some basic sanity checks. */ - if ((label = vdev_label_read_config(vd)) == NULL) + if ((label = vdev_label_read_config(vd, -1ULL)) == NULL) return (B_FALSE); (void) nvlist_lookup_uint64(label, ZPOOL_CONFIG_CREATE_TXG, @@ -833,7 +867,7 @@ retry: * come back up, we fail to see the uberblock for txg + 1 because, say, * it was on a mirrored device and the replica to which we wrote txg + 1 * is now offline. 
If we then make some changes and sync txg + 1, and then - * the missing replica comes back, then for a new seconds we'll have two + * the missing replica comes back, then for a few seconds we'll have two * conflicting uberblocks on disk with the same txg. The solution is simple: * among uberblocks with equal txg, choose the one with the latest timestamp. */ @@ -853,46 +887,47 @@ vdev_uberblock_compare(uberblock_t *ub1, uberblock_t *ub2) return (0); } +struct ubl_cbdata { + uberblock_t *ubl_ubbest; /* Best uberblock */ + vdev_t *ubl_vd; /* vdev associated with the above */ +}; + static void vdev_uberblock_load_done(zio_t *zio) { + vdev_t *vd = zio->io_vd; spa_t *spa = zio->io_spa; zio_t *rio = zio->io_private; uberblock_t *ub = zio->io_data; - uberblock_t *ubbest = rio->io_private; + struct ubl_cbdata *cbp = rio->io_private; - ASSERT3U(zio->io_size, ==, VDEV_UBERBLOCK_SIZE(zio->io_vd)); + ASSERT3U(zio->io_size, ==, VDEV_UBERBLOCK_SIZE(vd)); if (zio->io_error == 0 && uberblock_verify(ub) == 0) { mutex_enter(&rio->io_lock); if (ub->ub_txg <= spa->spa_load_max_txg && - vdev_uberblock_compare(ub, ubbest) > 0) - *ubbest = *ub; + vdev_uberblock_compare(ub, cbp->ubl_ubbest) > 0) { + /* + * Keep track of the vdev in which this uberblock + * was found. We will use this information later + * to obtain the config nvlist associated with + * this uberblock. 
+ */ + *cbp->ubl_ubbest = *ub; + cbp->ubl_vd = vd; + } mutex_exit(&rio->io_lock); } zio_buf_free(zio->io_data, zio->io_size); } -void -vdev_uberblock_load(zio_t *zio, vdev_t *vd, uberblock_t *ubbest) +static void +vdev_uberblock_load_impl(zio_t *zio, vdev_t *vd, int flags, + struct ubl_cbdata *cbp) { - spa_t *spa = vd->vdev_spa; - vdev_t *rvd = spa->spa_root_vdev; - int flags = ZIO_FLAG_CONFIG_WRITER | ZIO_FLAG_CANFAIL | - ZIO_FLAG_SPECULATIVE | ZIO_FLAG_TRYHARD; - - if (vd == rvd) { - ASSERT(zio == NULL); - spa_config_enter(spa, SCL_ALL, FTAG, RW_WRITER); - zio = zio_root(spa, NULL, ubbest, flags); - bzero(ubbest, sizeof (uberblock_t)); - } - - ASSERT(zio != NULL); - for (int c = 0; c < vd->vdev_children; c++) - vdev_uberblock_load(zio, vd->vdev_child[c], ubbest); + vdev_uberblock_load_impl(zio, vd->vdev_child[c], flags, cbp); if (vd->vdev_ops->vdev_op_leaf && vdev_readable(vd)) { for (int l = 0; l < VDEV_LABELS; l++) { @@ -905,11 +940,46 @@ vdev_uberblock_load(zio_t *zio, vdev_t *vd, uberblock_t *ubbest) } } } +} - if (vd == rvd) { - (void) zio_wait(zio); - spa_config_exit(spa, SCL_ALL, FTAG); - } +/* + * Reads the 'best' uberblock from disk along with its associated + * configuration. First, we read the uberblock array of each label of each + * vdev, keeping track of the uberblock with the highest txg in each array. + * Then, we read the configuration from the same vdev as the best uberblock. 
+ */ +void +vdev_uberblock_load(vdev_t *rvd, uberblock_t *ub, nvlist_t **config) +{ + zio_t *zio; + spa_t *spa = rvd->vdev_spa; + struct ubl_cbdata cb; + int flags = ZIO_FLAG_CONFIG_WRITER | ZIO_FLAG_CANFAIL | + ZIO_FLAG_SPECULATIVE | ZIO_FLAG_TRYHARD; + + ASSERT(ub); + ASSERT(config); + + bzero(ub, sizeof (uberblock_t)); + *config = NULL; + + cb.ubl_ubbest = ub; + cb.ubl_vd = NULL; + + spa_config_enter(spa, SCL_ALL, FTAG, RW_WRITER); + zio = zio_root(spa, NULL, &cb, flags); + vdev_uberblock_load_impl(zio, rvd, flags, &cb); + (void) zio_wait(zio); + + /* + * It's possible that the best uberblock was discovered on a label + * that has a configuration which was written in a future txg. + * Search all labels on this vdev to find the configuration that + * matches the txg for our uberblock. + */ + if (cb.ubl_vd != NULL) + *config = vdev_label_read_config(cb.ubl_vd, ub->ub_txg); + spa_config_exit(spa, SCL_ALL, FTAG); } /* diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_raidz.c b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_raidz.c index 030ea4293..efae53425 100644 --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_raidz.c +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_raidz.c @@ -281,7 +281,7 @@ vdev_raidz_map_free_vsd(zio_t *zio) { raidz_map_t *rm = zio->io_vsd; - ASSERT3U(rm->rm_freed, ==, 0); + ASSERT0(rm->rm_freed); rm->rm_freed = 1; if (rm->rm_reports == 0) @@ -1134,7 +1134,7 @@ vdev_raidz_matrix_invert(raidz_map_t *rm, int n, int nmissing, int *missing, */ for (i = 0; i < nmissing; i++) { for (j = 0; j < missing[i]; j++) { - ASSERT3U(rows[i][j], ==, 0); + ASSERT0(rows[i][j]); } ASSERT3U(rows[i][missing[i]], !=, 0); @@ -1175,7 +1175,7 @@ vdev_raidz_matrix_invert(raidz_map_t *rm, int n, int nmissing, int *missing, if (j == missing[i]) { ASSERT3U(rows[i][j], ==, 1); } else { - ASSERT3U(rows[i][j], ==, 0); + ASSERT0(rows[i][j]); } } } diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zap.c 
b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zap.c index 288a4d99a..009f0a4dd 100644 --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zap.c +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zap.c @@ -20,6 +20,7 @@ */ /* * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved. + * Copyright (c) 2012 by Delphix. All rights reserved. */ /* @@ -161,7 +162,7 @@ zap_table_grow(zap_t *zap, zap_table_phys_t *tbl, } else { newblk = zap_allocate_blocks(zap, tbl->zt_numblks * 2); tbl->zt_nextblk = newblk; - ASSERT3U(tbl->zt_blks_copied, ==, 0); + ASSERT0(tbl->zt_blks_copied); dmu_prefetch(zap->zap_objset, zap->zap_object, tbl->zt_blk << bs, tbl->zt_numblks << bs); } @@ -338,7 +339,7 @@ zap_grow_ptrtbl(zap_t *zap, dmu_tx_t *tx) ASSERT3U(zap->zap_f.zap_phys->zap_ptrtbl.zt_shift, ==, ZAP_EMBEDDED_PTRTBL_SHIFT(zap)); - ASSERT3U(zap->zap_f.zap_phys->zap_ptrtbl.zt_blk, ==, 0); + ASSERT0(zap->zap_f.zap_phys->zap_ptrtbl.zt_blk); newblk = zap_allocate_blocks(zap, 1); err = dmu_buf_hold(zap->zap_objset, zap->zap_object, @@ -474,7 +475,7 @@ zap_open_leaf(uint64_t blkid, dmu_buf_t *db) * chain. There should be no chained leafs (as we have removed * support for them). */ - ASSERT3U(l->l_phys->l_hdr.lh_pad1, ==, 0); + ASSERT0(l->l_phys->l_hdr.lh_pad1); /* * There should be more hash entries than there can be @@ -657,9 +658,9 @@ zap_expand_leaf(zap_name_t *zn, zap_leaf_t *l, dmu_tx_t *tx, zap_leaf_t **lp) zap_leaf_split(l, nl, zap->zap_normflags != 0); /* set sibling pointers */ - for (i = 0; i < (1ULL<l_blkid, tx); - ASSERT3U(err, ==, 0); /* we checked for i/o errors above */ + ASSERT0(err); /* we checked for i/o errors above */ } if (hash & (1ULL << (64 - l->l_phys->l_hdr.lh_prefix_len))) { @@ -946,6 +947,19 @@ fzap_prefetch(zap_name_t *zn) * Helper functions for consumers. 
*/ +uint64_t +zap_create_link(objset_t *os, dmu_object_type_t ot, uint64_t parent_obj, + const char *name, dmu_tx_t *tx) +{ + uint64_t new_obj; + + VERIFY((new_obj = zap_create(os, ot, DMU_OT_NONE, 0, tx)) > 0); + VERIFY(zap_add(os, parent_obj, name, sizeof (uint64_t), 1, &new_obj, + tx) == 0); + + return (new_obj); +} + int zap_value_search(objset_t *os, uint64_t zapobj, uint64_t value, uint64_t mask, char *name) @@ -1079,6 +1093,16 @@ zap_add_int_key(objset_t *os, uint64_t obj, return (zap_add(os, obj, name, 8, 1, &value, tx)); } +int +zap_update_int_key(objset_t *os, uint64_t obj, + uint64_t key, uint64_t value, dmu_tx_t *tx) +{ + char name[20]; + + (void) snprintf(name, sizeof (name), "%llx", (longlong_t)key); + return (zap_update(os, obj, name, 8, 1, &value, tx)); +} + int zap_lookup_int_key(objset_t *os, uint64_t obj, uint64_t key, uint64_t *valuep) { diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zap_micro.c b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zap_micro.c index 6e506a4ee..05f890b89 100644 --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zap_micro.c +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zap_micro.c @@ -20,7 +20,7 @@ */ /* * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved. - * Copyright (c) 2011 by Delphix. All rights reserved. + * Copyright (c) 2012 by Delphix. All rights reserved. 
*/ #include @@ -472,7 +472,7 @@ zap_lockdir(objset_t *os, uint64_t obj, dmu_tx_t *tx, { dmu_object_info_t doi; dmu_object_info_from_db(db, &doi); - ASSERT(dmu_ot[doi.doi_type].ot_byteswap == zap_byteswap); + ASSERT3U(DMU_OT_BYTESWAP(doi.doi_type), ==, DMU_BSWAP_ZAP); } #endif @@ -517,7 +517,7 @@ zap_lockdir(objset_t *os, uint64_t obj, dmu_tx_t *tx, return (mzap_upgrade(zapp, tx, 0)); } err = dmu_object_set_blocksize(os, obj, newsz, 0, tx); - ASSERT3U(err, ==, 0); + ASSERT0(err); zap->zap_m.zap_num_chunks = db->db_size / MZAP_ENT_LEN - 1; } @@ -596,7 +596,7 @@ mzap_create_impl(objset_t *os, uint64_t obj, int normflags, zap_flags_t flags, { dmu_object_info_t doi; dmu_object_info_from_db(db, &doi); - ASSERT(dmu_ot[doi.doi_type].ot_byteswap == zap_byteswap); + ASSERT3U(DMU_OT_BYTESWAP(doi.doi_type), ==, DMU_BSWAP_ZAP); } #endif diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfeature.c b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfeature.c new file mode 100644 index 000000000..d532ccc63 --- /dev/null +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfeature.c @@ -0,0 +1,424 @@ +/* + * CDDL HEADER START + * + * The contents of this file are subject to the terms of the + * Common Development and Distribution License (the "License"). + * You may not use this file except in compliance with the License. + * + * You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE + * or http://www.opensolaris.org/os/licensing. + * See the License for the specific language governing permissions + * and limitations under the License. + * + * When distributing Covered Code, include this CDDL HEADER in each + * file and include the License file at usr/src/OPENSOLARIS.LICENSE. + * If applicable, add the following below this CDDL HEADER, with the + * fields enclosed by brackets "[]" replaced with your own identifying + * information: Portions Copyright [yyyy] [name of copyright owner] + * + * CDDL HEADER END + */ + +/* + * Copyright (c) 2012 by Delphix. 
All rights reserved. + */ + +#include <sys/zfs_context.h> +#include <sys/zfeature.h> +#include <sys/dmu.h> +#include <sys/nvpair.h> +#include <sys/zap.h> +#include <sys/dmu_tx.h> +#include "zfeature_common.h" +#include <sys/spa.h> + +/* + * ZFS Feature Flags + * ----------------- + * + * ZFS feature flags are used to provide fine-grained versioning to the ZFS + * on-disk format. Once enabled on a pool feature flags replace the old + * spa_version() number. + * + * Each new on-disk format change will be given a uniquely identifying string + * guid rather than a version number. This avoids the problem of different + * organizations creating new on-disk formats with the same version number. To + * keep feature guids unique they should consist of the reverse dns name of the + * organization which implemented the feature and a short name for the feature, + * separated by a colon (e.g. com.delphix:async_destroy). + * + * Reference Counts + * ---------------- + * + * Within each pool features can be in one of three states: disabled, enabled, + * or active. These states are differentiated by a reference count stored on + * disk for each feature: + * + * 1) If there is no reference count stored on disk the feature is disabled. + * 2) If the reference count is 0 a system administrator has enabled the + * feature, but the feature has not been used yet, so no on-disk + * format changes have been made. + * 3) If the reference count is greater than 0 the feature is active. + * The format changes required by the feature are currently on disk. + * Note that if the feature's format changes are reversed the feature + * may choose to set its reference count back to 0. + * + * Feature flags makes no differentiation between non-zero reference counts + * for an active feature (e.g. a reference count of 1 means the same thing as a + * reference count of 27834721), but feature implementations may choose to use + * the reference count to store meaningful information. For example, a new RAID + * implementation might set the reference count to the number of vdevs using + * it. 
If all those disks are removed from the pool the feature goes back to + * having a reference count of 0. + * + * It is the responsibility of the individual features to maintain a non-zero + * reference count as long as the feature's format changes are present on disk. + * + * Dependencies + * ------------ + * + * Each feature may depend on other features. The only effect of this + * relationship is that when a feature is enabled all of its dependencies are + * automatically enabled as well. Any future work to support disabling of + * features would need to ensure that features cannot be disabled if other + * enabled features depend on them. + * + * On-disk Format + * -------------- + * + * When feature flags are enabled spa_version() is set to SPA_VERSION_FEATURES + * (5000). In order for this to work the pool is automatically upgraded to + * SPA_VERSION_BEFORE_FEATURES (28) first, so all pre-feature flags on disk + * format changes will be in use. + * + * Information about features is stored in 3 ZAP objects in the pool's MOS. + * These objects are linked to by the following names in the pool directory + * object: + * + * 1) features_for_read: feature guid -> reference count + * Features needed to open the pool for reading. + * 2) features_for_write: feature guid -> reference count + * Features needed to open the pool for writing. + * 3) feature_descriptions: feature guid -> descriptive string + * A human readable string. + * + * All enabled features appear in either features_for_read or + * features_for_write, but not both. + * + * To open a pool in read-only mode only the features listed in + * features_for_read need to be supported. + * + * To open the pool in read-write mode features in both features_for_read and + * features_for_write need to be supported. + * + * Some features may be required to read the ZAP objects containing feature + * information. 
To allow software to check for compatibility with these features + * before the pool is opened their names must be stored in the label in a + * new "features_for_read" entry (note that features that are only required + * to write to a pool never need to be stored in the label since the + * features_for_write ZAP object can be read before the pool is written to). + * To save space in the label features must be explicitly marked as needing to + * be written to the label. Also, reference counts are not stored in the label, + * instead any feature whose reference count drops to 0 is removed from the + * label. + * + * Adding New Features + * ------------------- + * + * Features must be registered in zpool_feature_init() function in + * zfeature_common.c using the zfeature_register() function. This function + * has arguments to specify if the feature should be stored in the + * features_for_read or features_for_write ZAP object and if it needs to be + * written to the label when active. + * + * Once a feature is registered it will appear as a "feature@<feature name>" + * property which can be set by an administrator. Feature implementors should + * use the spa_feature_is_enabled() and spa_feature_is_active() functions to + * query the state of a feature and the spa_feature_incr() and + * spa_feature_decr() functions to change an enabled feature's reference count. + * Reference counts may only be updated in the syncing context. + * + * Features may not perform enable-time initialization. Instead, any such + * initialization should occur when the feature is first used. This design + * enforces that on-disk changes be made only when features are used. Code + * should only check if a feature is enabled using spa_feature_is_enabled(), + * not by relying on any feature specific metadata existing. If a feature is + * enabled, but the feature's metadata is not on disk yet then it should be + * created as needed. + * + * As an example, consider the com.delphix:async_destroy feature. 
This feature + * relies on the existence of a bptree in the MOS that stores blocks for + * asynchronous freeing. This bptree is not created when async_destroy is + * enabled. Instead, when a dataset is destroyed spa_feature_is_enabled() is + * called to check if async_destroy is enabled. If it is and the bptree object + * does not exist yet, the bptree object is created as part of the dataset + * destroy and async_destroy's reference count is incremented to indicate it + * has made an on-disk format change. Later, after the destroyed dataset's + * blocks have all been asynchronously freed there is no longer any use for the + * bptree object, so it is destroyed and async_destroy's reference count is + * decremented back to 0 to indicate that it has undone its on-disk format + * changes. + */ + +typedef enum { + FEATURE_ACTION_ENABLE, + FEATURE_ACTION_INCR, + FEATURE_ACTION_DECR, +} feature_action_t; + +/* + * Checks that the features active in the specified object are supported by + * this software. Adds each unsupported feature (name -> description) to + * the supplied nvlist.
+ */ +boolean_t +feature_is_supported(objset_t *os, uint64_t obj, uint64_t desc_obj, + nvlist_t *unsup_feat, nvlist_t *enabled_feat) +{ + boolean_t supported; + zap_cursor_t zc; + zap_attribute_t za; + + supported = B_TRUE; + for (zap_cursor_init(&zc, os, obj); + zap_cursor_retrieve(&zc, &za) == 0; + zap_cursor_advance(&zc)) { + ASSERT(za.za_integer_length == sizeof (uint64_t) && + za.za_num_integers == 1); + + if (NULL != enabled_feat) { + fnvlist_add_uint64(enabled_feat, za.za_name, + za.za_first_integer); + } + + if (za.za_first_integer != 0 && + !zfeature_is_supported(za.za_name)) { + supported = B_FALSE; + + if (NULL != unsup_feat) { + char *desc = ""; + char buf[MAXPATHLEN]; + + if (zap_lookup(os, desc_obj, za.za_name, + 1, sizeof (buf), buf) == 0) + desc = buf; + + VERIFY(nvlist_add_string(unsup_feat, za.za_name, + desc) == 0); + } + } + } + zap_cursor_fini(&zc); + + return (supported); +} + +static int +feature_get_refcount(objset_t *os, uint64_t read_obj, uint64_t write_obj, + zfeature_info_t *feature, uint64_t *res) +{ + int err; + uint64_t refcount; + uint64_t zapobj = feature->fi_can_readonly ? write_obj : read_obj; + + /* + * If the pool is currently being created, the feature objects may not + * have been allocated yet. Act as though all features are disabled. + */ + if (zapobj == 0) + return (ENOTSUP); + + err = zap_lookup(os, zapobj, feature->fi_guid, sizeof (uint64_t), 1, + &refcount); + if (err != 0) { + if (err == ENOENT) + return (ENOTSUP); + else + return (err); + } + *res = refcount; + return (0); +} + +static int +feature_do_action(objset_t *os, uint64_t read_obj, uint64_t write_obj, + uint64_t desc_obj, zfeature_info_t *feature, feature_action_t action, + dmu_tx_t *tx) +{ + int error; + uint64_t refcount; + uint64_t zapobj = feature->fi_can_readonly ? 
write_obj : read_obj; + + ASSERT(0 != zapobj); + ASSERT(zfeature_is_valid_guid(feature->fi_guid)); + + error = zap_lookup(os, zapobj, feature->fi_guid, + sizeof (uint64_t), 1, &refcount); + + /* + * If we can't ascertain the status of the specified feature, an I/O + * error occurred. + */ + if (error != 0 && error != ENOENT) + return (error); + + switch (action) { + case FEATURE_ACTION_ENABLE: + /* + * If the feature is already enabled, ignore the request. + */ + if (error == 0) + return (0); + refcount = 0; + break; + case FEATURE_ACTION_INCR: + if (error == ENOENT) + return (ENOTSUP); + if (refcount == UINT64_MAX) + return (EOVERFLOW); + refcount++; + break; + case FEATURE_ACTION_DECR: + if (error == ENOENT) + return (ENOTSUP); + if (refcount == 0) + return (EOVERFLOW); + refcount--; + break; + default: + ASSERT(0); + break; + } + + if (action == FEATURE_ACTION_ENABLE) { + int i; + + for (i = 0; feature->fi_depends[i] != NULL; i++) { + zfeature_info_t *dep = feature->fi_depends[i]; + + error = feature_do_action(os, read_obj, write_obj, + desc_obj, dep, FEATURE_ACTION_ENABLE, tx); + if (error != 0) + return (error); + } + } + + error = zap_update(os, zapobj, feature->fi_guid, + sizeof (uint64_t), 1, &refcount, tx); + if (error != 0) + return (error); + + if (action == FEATURE_ACTION_ENABLE) { + error = zap_update(os, desc_obj, + feature->fi_guid, 1, strlen(feature->fi_desc) + 1, + feature->fi_desc, tx); + if (error != 0) + return (error); + } + + if (action == FEATURE_ACTION_INCR && refcount == 1 && feature->fi_mos) { + spa_activate_mos_feature(dmu_objset_spa(os), feature->fi_guid); + } + + if (action == FEATURE_ACTION_DECR && refcount == 0) { + spa_deactivate_mos_feature(dmu_objset_spa(os), + feature->fi_guid); + } + + return (0); +} + +void +spa_feature_create_zap_objects(spa_t *spa, dmu_tx_t *tx) +{ + /* + * We create feature flags ZAP objects in two instances: during pool + * creation and during pool upgrade. 
+ */ + ASSERT(dsl_pool_sync_context(spa_get_dsl(spa)) || (!spa->spa_sync_on && + tx->tx_txg == TXG_INITIAL)); + + spa->spa_feat_for_read_obj = zap_create_link(spa->spa_meta_objset, + DMU_OTN_ZAP_METADATA, DMU_POOL_DIRECTORY_OBJECT, + DMU_POOL_FEATURES_FOR_READ, tx); + spa->spa_feat_for_write_obj = zap_create_link(spa->spa_meta_objset, + DMU_OTN_ZAP_METADATA, DMU_POOL_DIRECTORY_OBJECT, + DMU_POOL_FEATURES_FOR_WRITE, tx); + spa->spa_feat_desc_obj = zap_create_link(spa->spa_meta_objset, + DMU_OTN_ZAP_METADATA, DMU_POOL_DIRECTORY_OBJECT, + DMU_POOL_FEATURE_DESCRIPTIONS, tx); +} + +/* + * Enable any required dependencies, then enable the requested feature. + */ +void +spa_feature_enable(spa_t *spa, zfeature_info_t *feature, dmu_tx_t *tx) +{ + ASSERT3U(spa_version(spa), >=, SPA_VERSION_FEATURES); + VERIFY3U(0, ==, feature_do_action(spa->spa_meta_objset, + spa->spa_feat_for_read_obj, spa->spa_feat_for_write_obj, + spa->spa_feat_desc_obj, feature, FEATURE_ACTION_ENABLE, tx)); +} + +/* + * If the specified feature has not yet been enabled, this function returns + * ENOTSUP; otherwise, this function increments the feature's refcount (or + * returns EOVERFLOW if the refcount cannot be incremented). This function must + * be called from syncing context. + */ +void +spa_feature_incr(spa_t *spa, zfeature_info_t *feature, dmu_tx_t *tx) +{ + ASSERT3U(spa_version(spa), >=, SPA_VERSION_FEATURES); + VERIFY3U(0, ==, feature_do_action(spa->spa_meta_objset, + spa->spa_feat_for_read_obj, spa->spa_feat_for_write_obj, + spa->spa_feat_desc_obj, feature, FEATURE_ACTION_INCR, tx)); +} + +/* + * If the specified feature has not yet been enabled, this function returns + * ENOTSUP; otherwise, this function decrements the feature's refcount (or + * returns EOVERFLOW if the refcount is already 0). This function must + * be called from syncing context. 
+ */ +void +spa_feature_decr(spa_t *spa, zfeature_info_t *feature, dmu_tx_t *tx) +{ + ASSERT3U(spa_version(spa), >=, SPA_VERSION_FEATURES); + VERIFY3U(0, ==, feature_do_action(spa->spa_meta_objset, + spa->spa_feat_for_read_obj, spa->spa_feat_for_write_obj, + spa->spa_feat_desc_obj, feature, FEATURE_ACTION_DECR, tx)); +} + +boolean_t +spa_feature_is_enabled(spa_t *spa, zfeature_info_t *feature) +{ + int err; + uint64_t refcount; + + if (spa_version(spa) < SPA_VERSION_FEATURES) + return (B_FALSE); + + err = feature_get_refcount(spa->spa_meta_objset, + spa->spa_feat_for_read_obj, spa->spa_feat_for_write_obj, + feature, &refcount); + ASSERT(err == 0 || err == ENOTSUP); + return (err == 0); +} + +boolean_t +spa_feature_is_active(spa_t *spa, zfeature_info_t *feature) +{ + int err; + uint64_t refcount; + + if (spa_version(spa) < SPA_VERSION_FEATURES) + return (B_FALSE); + + err = feature_get_refcount(spa->spa_meta_objset, + spa->spa_feat_for_read_obj, spa->spa_feat_for_write_obj, + feature, &refcount); + ASSERT(err == 0 || err == ENOTSUP); + return (err == 0 && refcount > 0); +} diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_debug.c b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_debug.c index d0f411a99..44824e15a 100644 --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_debug.c +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_debug.c @@ -20,6 +20,7 @@ */ /* * Copyright (c) 2010, Oracle and/or its affiliates. All rights reserved. + * Copyright (c) 2012 by Delphix. All rights reserved. 
*/ #include @@ -48,12 +49,12 @@ zfs_dbgmsg_fini(void) zfs_dbgmsg_size -= size; } mutex_destroy(&zfs_dbgmsgs_lock); - ASSERT3U(zfs_dbgmsg_size, ==, 0); + ASSERT0(zfs_dbgmsg_size); } /* * Print these messages by running: - * echo ::zfs_dbgmsg | mdb -k + * echo ::zfs_dbgmsg | mdb -k * * Monitor these messages by running: * dtrace -q -n 'zfs-dbgmsg{printf("%s\n", stringof(arg0))}' diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c index 36cc5e907..3f94e3f0e 100644 --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c @@ -18,6 +18,7 @@ * * CDDL HEADER END */ + /* * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved. * Copyright (c) 2011-2012 Pawel Jakub Dawidek . @@ -1157,6 +1158,8 @@ getzfsvfs(const char *dsname, zfsvfs_t **zfvp) /* * Find a zfsvfs_t for a mounted filesystem, or create our own, in which * case its z_vfs will be NULL, and it will be opened as the owner. + * If 'writer' is set, the z_teardown_lock will be held for RW_WRITER, + * which prevents all vnode ops from running. 
*/ static int zfsvfs_hold(const char *name, void *tag, zfsvfs_t **zfvp, boolean_t writer) @@ -1220,7 +1223,7 @@ zfs_ioc_pool_create(zfs_cmd_t *zc) (void) nvlist_lookup_uint64(props, zpool_prop_to_name(ZPOOL_PROP_VERSION), &version); - if (version < SPA_VERSION_INITIAL || version > SPA_VERSION) { + if (!SPA_VERSION_IS_SUPPORTED(version)) { error = EINVAL; goto pool_props_bad; } @@ -1344,6 +1347,15 @@ zfs_ioc_pool_configs(zfs_cmd_t *zc) return (error); } +/* + * inputs: + * zc_name name of the pool + * + * outputs: + * zc_cookie real errno + * zc_nvlist_dst config nvlist + * zc_nvlist_dst_size size of config nvlist + */ static int zfs_ioc_pool_stats(zfs_cmd_t *zc) { @@ -1445,7 +1457,8 @@ zfs_ioc_pool_upgrade(zfs_cmd_t *zc) if ((error = spa_open(zc->zc_name, &spa, FTAG)) != 0) return (error); - if (zc->zc_cookie < spa_version(spa) || zc->zc_cookie > SPA_VERSION) { + if (zc->zc_cookie < spa_version(spa) || + !SPA_VERSION_IS_SUPPORTED(zc->zc_cookie)) { spa_close(spa, FTAG); return (EINVAL); } @@ -1804,7 +1817,7 @@ zfs_ioc_objset_stats_impl(zfs_cmd_t *zc, objset_t *os) error = zvol_get_stats(os, nv); if (error == EIO) return (error); - VERIFY3S(error, ==, 0); + VERIFY0(error); } error = put_nvlist(zc, nv); nvlist_free(nv); @@ -4137,7 +4150,17 @@ zfs_ioc_pool_reopen(zfs_cmd_t *zc) return (error); spa_vdev_state_enter(spa, SCL_NONE); + + /* + * If a resilver is already in progress then set the + * spa_scrub_reopen flag to B_TRUE so that we don't restart + * the scan as a side effect of the reopen. Otherwise, let + * vdev_open() decide if a resilver is required.
+ */ + spa->spa_scrub_reopen = dsl_scan_resilvering(spa->spa_dsl_pool); vdev_reopen(spa->spa_root_vdev); + spa->spa_scrub_reopen = B_FALSE; + (void) spa_vdev_state_exit(spa, NULL, 0); spa_close(spa, FTAG); return (0); @@ -5453,7 +5476,7 @@ zfs_modevent(module_t mod, int type, void *unused __unused) tsd_create(&zfs_fsyncer_key, NULL); tsd_create(&rrw_tsd_key, NULL); - printf("ZFS storage pool version " SPA_VERSION_STRING "\n"); + printf("ZFS storage pool version: features support (" SPA_VERSION_STRING ")\n"); root_mount_rel(zfs_root_token); zfsdev_init(); diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_rlock.c b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_rlock.c index 7fd8f6020..08f88b80d 100644 --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_rlock.c +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_rlock.c @@ -22,6 +22,9 @@ * Copyright 2010 Sun Microsystems, Inc. All rights reserved. * Use is subject to license terms. */ +/* + * Copyright (c) 2012 by Delphix. All rights reserved. 
+ */ /* * This file contains the code to implement file range locking in @@ -481,9 +484,9 @@ zfs_range_unlock_reader(znode_t *zp, rl_t *remove) cv_destroy(&remove->r_rd_cv); } } else { - ASSERT3U(remove->r_cnt, ==, 0); - ASSERT3U(remove->r_write_wanted, ==, 0); - ASSERT3U(remove->r_read_wanted, ==, 0); + ASSERT0(remove->r_cnt); + ASSERT0(remove->r_write_wanted); + ASSERT0(remove->r_read_wanted); /* * Find start proxy representing this reader lock, * then decrement ref count on all proxies diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c index d00e8ca40..48a73a684 100644 --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c @@ -49,6 +49,7 @@ #include #include #include +#include #include #include #include @@ -59,7 +60,6 @@ #include #include #include -#include #include "zfs_comutil.h" struct mtx zfs_debug_mtx; @@ -549,7 +549,6 @@ static int zfs_space_delta_cb(dmu_object_type_t bonustype, void *data, uint64_t *userp, uint64_t *groupp) { - znode_phys_t *znp = data; int error = 0; /* @@ -568,20 +567,18 @@ zfs_space_delta_cb(dmu_object_type_t bonustype, void *data, return (EEXIST); if (bonustype == DMU_OT_ZNODE) { + znode_phys_t *znp = data; *userp = znp->zp_uid; *groupp = znp->zp_gid; } else { int hdrsize; + sa_hdr_phys_t *sap = data; + sa_hdr_phys_t sa = *sap; + boolean_t swap = B_FALSE; ASSERT(bonustype == DMU_OT_SA); - hdrsize = sa_hdrsize(data); - if (hdrsize != 0) { - *userp = *((uint64_t *)((uintptr_t)data + hdrsize + - SA_UID_OFFSET)); - *groupp = *((uint64_t *)((uintptr_t)data + hdrsize + - SA_GID_OFFSET)); - } else { + if (sa.sa_magic == 0) { /* * This should only happen for newly created * files that haven't had the znode data filled @@ -589,6 +586,25 @@ zfs_space_delta_cb(dmu_object_type_t bonustype, void *data, */ *userp = 0; *groupp = 0; + return (0); + } + if (sa.sa_magic == BSWAP_32(SA_MAGIC)) { 
+ sa.sa_magic = SA_MAGIC; + sa.sa_layout_info = BSWAP_16(sa.sa_layout_info); + swap = B_TRUE; + } else { + VERIFY3U(sa.sa_magic, ==, SA_MAGIC); + } + + hdrsize = sa_hdrsize(&sa); + VERIFY3U(hdrsize, >=, sizeof (sa_hdr_phys_t)); + *userp = *((uint64_t *)((uintptr_t)data + hdrsize + + SA_UID_OFFSET)); + *groupp = *((uint64_t *)((uintptr_t)data + hdrsize + + SA_GID_OFFSET)); + if (swap) { + *userp = BSWAP_64(*userp); + *groupp = BSWAP_64(*groupp); } } return (error); @@ -1870,9 +1886,9 @@ zfsvfs_teardown(zfsvfs_t *zfsvfs, boolean_t unmounting) /* * Evict cached data */ - if (dmu_objset_is_dirty_anywhere(zfsvfs->z_os)) - if (!(zfsvfs->z_vfs->vfs_flag & VFS_RDONLY)) - txg_wait_synced(dmu_objset_pool(zfsvfs->z_os), 0); + if (dsl_dataset_is_dirty(dmu_objset_ds(zfsvfs->z_os)) && + !(zfsvfs->z_vfs->vfs_flag & VFS_RDONLY)) + txg_wait_synced(dmu_objset_pool(zfsvfs->z_os), 0); (void) dmu_objset_evict_dbufs(zfsvfs->z_os); return (0); @@ -2305,7 +2321,7 @@ void zfs_init(void) { - printf("ZFS filesystem version " ZPL_VERSION_STRING "\n"); + printf("ZFS filesystem version: " ZPL_VERSION_STRING "\n"); /* * Initialize .zfs directory structures @@ -2389,7 +2405,7 @@ zfs_set_version(zfsvfs_t *zfsvfs, uint64_t newvers) error = zap_add(os, MASTER_NODE_OBJ, ZFS_SA_ATTRS, 8, 1, &sa_obj, tx); - ASSERT3U(error, ==, 0); + ASSERT0(error); VERIFY(0 == sa_set_sa_object(os, sa_obj)); sa_register_update_callback(os, zfs_sa_upgrade); diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c index 874bab80b..dda5879c0 100644 --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c @@ -20,6 +20,7 @@ */ /* * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved. + * Copyright (c) 2012 by Delphix. All rights reserved. 
*/ /* Portions Copyright 2007 Jeremy Teo */ @@ -1851,7 +1852,7 @@ top: &xattr_obj, sizeof (xattr_obj)); if (error == 0 && xattr_obj) { error = zfs_zget(zfsvfs, xattr_obj, &xzp); - ASSERT3U(error, ==, 0); + ASSERT0(error); dmu_tx_hold_sa(tx, zp->z_sa_hdl, B_TRUE); dmu_tx_hold_sa(tx, xzp->z_sa_hdl, B_FALSE); } @@ -1929,11 +1930,11 @@ top: error = sa_update(zp->z_sa_hdl, SA_ZPL_XATTR(zfsvfs), &null_xattr, sizeof (uint64_t), tx); - ASSERT3U(error, ==, 0); + ASSERT0(error); } VI_LOCK(vp); vp->v_count--; - ASSERT3U(vp->v_count, ==, 0); + ASSERT0(vp->v_count); VI_UNLOCK(vp); mutex_exit(&zp->z_lock); zfs_znode_delete(zp, tx); @@ -3369,7 +3370,7 @@ top: zp->z_mode = new_mode; ASSERT3U((uintptr_t)aclp, !=, 0); err = zfs_aclset_common(zp, aclp, cr, tx); - ASSERT3U(err, ==, 0); + ASSERT0(err); if (zp->z_acl_cached) zfs_acl_free(zp->z_acl_cached); zp->z_acl_cached = aclp; @@ -3897,7 +3898,7 @@ top: error = sa_update(szp->z_sa_hdl, SA_ZPL_FLAGS(zfsvfs), (void *)&szp->z_pflags, sizeof (uint64_t), tx); - ASSERT3U(error, ==, 0); + ASSERT0(error); error = zfs_link_destroy(sdl, szp, tx, ZRENAMING, NULL); if (error == 0) { diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c index b25108333..e047cdcbc 100644 --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c @@ -20,6 +20,7 @@ */ /* * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved. + * Copyright (c) 2012 by Delphix. All rights reserved. 
*/ /* Portions Copyright 2007 Jeremy Teo */ @@ -835,7 +836,7 @@ zfs_mknode(znode_t *dzp, vattr_t *vap, dmu_tx_t *tx, cred_t *cr, err = zap_create_claim_norm(zfsvfs->z_os, obj, zfsvfs->z_norm, DMU_OT_DIRECTORY_CONTENTS, obj_type, bonuslen, tx); - ASSERT3U(err, ==, 0); + ASSERT0(err); } else { obj = zap_create_norm(zfsvfs->z_os, zfsvfs->z_norm, DMU_OT_DIRECTORY_CONTENTS, @@ -846,7 +847,7 @@ zfs_mknode(znode_t *dzp, vattr_t *vap, dmu_tx_t *tx, cred_t *cr, err = dmu_object_claim(zfsvfs->z_os, obj, DMU_OT_PLAIN_FILE_CONTENTS, 0, obj_type, bonuslen, tx); - ASSERT3U(err, ==, 0); + ASSERT0(err); } else { obj = dmu_object_alloc(zfsvfs->z_os, DMU_OT_PLAIN_FILE_CONTENTS, 0, @@ -1028,7 +1029,7 @@ zfs_mknode(znode_t *dzp, vattr_t *vap, dmu_tx_t *tx, cred_t *cr, if (obj_type == DMU_OT_ZNODE || acl_ids->z_aclp->z_version < ZFS_ACL_VERSION_FUID) { err = zfs_aclset_common(*zpp, acl_ids->z_aclp, cr, tx); - ASSERT3P(err, ==, 0); + ASSERT0(err); } if (!(flag & IS_ROOT_NODE)) { vnode_t *vp; @@ -1524,7 +1525,7 @@ zfs_grow_blocksize(znode_t *zp, uint64_t size, dmu_tx_t *tx) if (error == ENOTSUP) return; - ASSERT3U(error, ==, 0); + ASSERT0(error); /* What blocksize did we actually get? */ dmu_object_size_from_db(sa_get_db(zp->z_sa_hdl), &zp->z_blksz, &dummy); @@ -2048,13 +2049,16 @@ zfs_release_sa_handle(sa_handle_t *hdl, dmu_buf_t *db, void *tag) * or not the object is an extended attribute directory. 
*/ static int -zfs_obj_to_pobj(sa_handle_t *hdl, sa_attr_type_t *sa_table, uint64_t *pobjp, - int *is_xattrdir) +zfs_obj_to_pobj(objset_t *osp, sa_handle_t *hdl, sa_attr_type_t *sa_table, + uint64_t *pobjp, int *is_xattrdir) { uint64_t parent; uint64_t pflags; uint64_t mode; + uint64_t parent_mode; sa_bulk_attr_t bulk[3]; + sa_handle_t *sa_hdl; + dmu_buf_t *sa_db; int count = 0; int error; @@ -2068,9 +2072,32 @@ zfs_obj_to_pobj(sa_handle_t *hdl, sa_attr_type_t *sa_table, uint64_t *pobjp, if ((error = sa_bulk_lookup(hdl, bulk, count)) != 0) return (error); - *pobjp = parent; + /* + * When a link is removed its parent pointer is not changed and will + * be invalid. There are two cases where a link is removed but the + * file stays around, when it goes to the delete queue and when there + * are additional links. + */ + error = zfs_grab_sa_handle(osp, parent, &sa_hdl, &sa_db, FTAG); + if (error != 0) + return (error); + + error = sa_lookup(sa_hdl, ZPL_MODE, &parent_mode, sizeof (parent_mode)); + zfs_release_sa_handle(sa_hdl, sa_db, FTAG); + if (error != 0) + return (error); + *is_xattrdir = ((pflags & ZFS_XATTR) != 0) && S_ISDIR(mode); + /* + * Extended attributes can be applied to files, directories, etc. + * Otherwise the parent must be a directory. 
+ */ + if (!*is_xattrdir && !S_ISDIR(parent_mode)) + return (EINVAL); + + *pobjp = parent; + return (0); } @@ -2119,7 +2146,7 @@ zfs_obj_to_path_impl(objset_t *osp, uint64_t obj, sa_handle_t *hdl, if (prevdb) zfs_release_sa_handle(prevhdl, prevdb, FTAG); - if ((error = zfs_obj_to_pobj(sa_hdl, sa_table, &pobj, + if ((error = zfs_obj_to_pobj(osp, sa_hdl, sa_table, &pobj, &is_xattrdir)) != 0) break; diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zil.c b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zil.c index 515e613c2..64230fff9 100644 --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zil.c +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zil.c @@ -20,7 +20,7 @@ */ /* * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved. - * Copyright (c) 2011 by Delphix. All rights reserved. + * Copyright (c) 2012 by Delphix. All rights reserved. */ /* Portions Copyright 2010 Robert Milkowski */ @@ -461,6 +461,37 @@ zil_alloc_lwb(zilog_t *zilog, blkptr_t *bp, uint64_t txg) return (lwb); } +/* + * Called when we create in-memory log transactions so that we know + * to cleanup the itxs at the end of spa_sync(). + */ +void +zilog_dirty(zilog_t *zilog, uint64_t txg) +{ + dsl_pool_t *dp = zilog->zl_dmu_pool; + dsl_dataset_t *ds = dmu_objset_ds(zilog->zl_os); + + if (dsl_dataset_is_snapshot(ds)) + panic("dirtying snapshot!"); + + if (txg_list_add(&dp->dp_dirty_zilogs, zilog, txg) == 0) { + /* up the hold count until we can be written out */ + dmu_buf_add_ref(ds->ds_dbuf, zilog); + } +} + +boolean_t +zilog_is_dirty(zilog_t *zilog) +{ + dsl_pool_t *dp = zilog->zl_dmu_pool; + + for (int t = 0; t < TXG_SIZE; t++) { + if (txg_list_member(&dp->dp_dirty_zilogs, zilog, t)) + return (B_TRUE); + } + return (B_FALSE); +} + /* * Create an on-disk intent log. 
*/ @@ -577,14 +608,21 @@ zil_destroy(zilog_t *zilog, boolean_t keep_first) kmem_cache_free(zil_lwb_cache, lwb); } } else if (!keep_first) { - (void) zil_parse(zilog, zil_free_log_block, - zil_free_log_record, tx, zh->zh_claim_txg); + zil_destroy_sync(zilog, tx); } mutex_exit(&zilog->zl_lock); dmu_tx_commit(tx); } +void +zil_destroy_sync(zilog_t *zilog, dmu_tx_t *tx) +{ + ASSERT(list_is_empty(&zilog->zl_lwb_list)); + (void) zil_parse(zilog, zil_free_log_block, + zil_free_log_record, tx, zilog->zl_header->zh_claim_txg); +} + int zil_claim(const char *osname, void *txarg) { @@ -998,6 +1036,8 @@ zil_lwb_commit(zilog_t *zilog, itx_t *itx, lwb_t *lwb) return (NULL); ASSERT(lwb->lwb_buf != NULL); + ASSERT(zilog_is_dirty(zilog) || + spa_freeze_txg(zilog->zl_spa) != UINT64_MAX); if (lrc->lrc_txtype == TX_WRITE && itx->itx_wr_state == WR_NEED_COPY) dlen = P2ROUNDUP_TYPED( @@ -1069,7 +1109,7 @@ zil_lwb_commit(zilog_t *zilog, itx_t *itx, lwb_t *lwb) lwb->lwb_nused += reclen + dlen; lwb->lwb_max_txg = MAX(lwb->lwb_max_txg, txg); ASSERT3U(lwb->lwb_nused, <=, lwb->lwb_sz); - ASSERT3U(P2PHASE(lwb->lwb_nused, sizeof (uint64_t)), ==, 0); + ASSERT0(P2PHASE(lwb->lwb_nused, sizeof (uint64_t))); return (lwb); } @@ -1218,7 +1258,7 @@ zil_itx_assign(zilog_t *zilog, itx_t *itx, dmu_tx_t *tx) if ((itx->itx_lr.lrc_txtype & ~TX_CI) == TX_RENAME) zil_async_to_sync(zilog, itx->itx_oid); - if (spa_freeze_txg(zilog->zl_spa) != UINT64_MAX) + if (spa_freeze_txg(zilog->zl_spa) != UINT64_MAX) txg = ZILTEST_TXG; else txg = dmu_tx_get_txg(tx); @@ -1269,6 +1309,7 @@ zil_itx_assign(zilog_t *zilog, itx_t *itx, dmu_tx_t *tx) } itx->itx_lr.lrc_txg = dmu_tx_get_txg(tx); + zilog_dirty(zilog, txg); mutex_exit(&itxg->itxg_lock); /* Release the old itxs now we've dropped the lock */ @@ -1278,7 +1319,10 @@ zil_itx_assign(zilog_t *zilog, itx_t *itx, dmu_tx_t *tx) /* * If there are any in-memory intent log transactions which have now been - * synced then start up a taskq to free them. 
+ * synced then start up a taskq to free them. We should only do this after we + * have written out the uberblocks (i.e. txg has been committed) so that we + * don't inadvertently clean out in-memory log records that would be required + * by zil_commit(). */ void zil_clean(zilog_t *zilog, uint64_t synced_txg) @@ -1746,6 +1790,7 @@ zil_close(zilog_t *zilog) mutex_exit(&zilog->zl_lock); if (txg) txg_wait_synced(zilog->zl_dmu_pool, txg); + ASSERT(!zilog_is_dirty(zilog)); taskq_destroy(zilog->zl_clean_taskq); zilog->zl_clean_taskq = NULL; diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c index 3eeeb58ee..0772f133e 100644 --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c @@ -126,11 +126,23 @@ zio_init(void) while (p2 & (p2 - 1)) p2 &= p2 - 1; +#ifdef illumos +#ifndef _KERNEL + /* + * If we are using watchpoints, put each buffer on its own page, + * to eliminate the performance overhead of trapping to the + * kernel when modifying a non-watched buffer that shares the + * page with a watched buffer.
+ */ + if (arc_watch && !IS_P2ALIGNED(size, PAGESIZE)) + continue; +#endif +#endif /* illumos */ if (size <= 4 * SPA_MINBLOCKSIZE) { align = SPA_MINBLOCKSIZE; - } else if (P2PHASE(size, PAGESIZE) == 0) { + } else if (IS_P2ALIGNED(size, PAGESIZE)) { align = PAGESIZE; - } else if (P2PHASE(size, p2 >> 2) == 0) { + } else if (IS_P2ALIGNED(size, p2 >> 2)) { align = p2 >> 2; } @@ -635,7 +647,7 @@ zio_write(zio_t *pio, spa_t *spa, uint64_t txg, blkptr_t *bp, zp->zp_checksum < ZIO_CHECKSUM_FUNCTIONS && zp->zp_compress >= ZIO_COMPRESS_OFF && zp->zp_compress < ZIO_COMPRESS_FUNCTIONS && - zp->zp_type < DMU_OT_NUMTYPES && + DMU_OT_IS_VALID(zp->zp_type) && zp->zp_level < 32 && zp->zp_copies > 0 && zp->zp_copies <= spa_max_replication(spa) && @@ -919,7 +931,7 @@ zio_read_bp_init(zio_t *zio) zio_push_transform(zio, cbuf, psize, psize, zio_decompress); } - if (!dmu_ot[BP_GET_TYPE(bp)].ot_metadata && BP_GET_LEVEL(bp) == 0) + if (!DMU_OT_IS_METADATA(BP_GET_TYPE(bp)) && BP_GET_LEVEL(bp) == 0) zio->io_flags |= ZIO_FLAG_DONT_CACHE; if (BP_GET_TYPE(bp) == DMU_OT_DDT_ZAP) @@ -2154,7 +2166,7 @@ zio_dva_allocate(zio_t *zio) } ASSERT(BP_IS_HOLE(bp)); - ASSERT3U(BP_GET_NDVAS(bp), ==, 0); + ASSERT0(BP_GET_NDVAS(bp)); ASSERT3U(zio->io_prop.zp_copies, >, 0); ASSERT3U(zio->io_prop.zp_copies, <=, spa_max_replication(spa)); ASSERT3U(zio->io_size, ==, BP_GET_PSIZE(bp)); @@ -3010,3 +3022,45 @@ static zio_pipe_stage_t *zio_pipeline[] = { zio_checksum_verify, zio_done }; + +/* dnp is the dnode for zb1->zb_object */ +boolean_t +zbookmark_is_before(const dnode_phys_t *dnp, const zbookmark_t *zb1, + const zbookmark_t *zb2) +{ + uint64_t zb1nextL0, zb2thisobj; + + ASSERT(zb1->zb_objset == zb2->zb_objset); + ASSERT(zb2->zb_level == 0); + + /* + * A bookmark in the deadlist is considered to be after + * everything else. + */ + if (zb2->zb_object == DMU_DEADLIST_OBJECT) + return (B_TRUE); + + /* The objset_phys_t isn't before anything. 
*/ + if (dnp == NULL) + return (B_FALSE); + + zb1nextL0 = (zb1->zb_blkid + 1) << + ((zb1->zb_level) * (dnp->dn_indblkshift - SPA_BLKPTRSHIFT)); + + zb2thisobj = zb2->zb_object ? zb2->zb_object : + zb2->zb_blkid << (DNODE_BLOCK_SHIFT - DNODE_SHIFT); + + if (zb1->zb_object == DMU_META_DNODE_OBJECT) { + uint64_t nextobj = zb1nextL0 * + (dnp->dn_datablkszsec << SPA_MINBLOCKSHIFT) >> DNODE_SHIFT; + return (nextobj <= zb2thisobj); + } + + if (zb1->zb_object < zb2thisobj) + return (B_TRUE); + if (zb1->zb_object > zb2thisobj) + return (B_FALSE); + if (zb2->zb_object == DMU_META_DNODE_OBJECT) + return (B_FALSE); + return (zb1nextL0 <= zb2->zb_blkid); +} diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c index 69ed1d4fe..89e74b864 100644 --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c @@ -689,8 +689,18 @@ zvol_last_close(zvol_state_t *zv) { zil_close(zv->zv_zilog); zv->zv_zilog = NULL; + dmu_buf_rele(zv->zv_dbuf, zvol_tag); zv->zv_dbuf = NULL; + + /* + * Evict cached data + */ + if (dsl_dataset_is_dirty(dmu_objset_ds(zv->zv_objset)) && + !(zv->zv_flags & ZVOL_RDONLY)) + txg_wait_synced(dmu_objset_pool(zv->zv_objset), 0); + (void) dmu_objset_evict_dbufs(zv->zv_objset); + dmu_objset_disown(zv->zv_objset, zvol_tag); zv->zv_objset = NULL; } diff --git a/sys/cddl/contrib/opensolaris/uts/common/sys/debug.h b/sys/cddl/contrib/opensolaris/uts/common/sys/debug.h index 6467781ce..1bba5ca7d 100644 --- a/sys/cddl/contrib/opensolaris/uts/common/sys/debug.h +++ b/sys/cddl/contrib/opensolaris/uts/common/sys/debug.h @@ -23,6 +23,10 @@ * Use is subject to license terms. */ +/* + * Copyright (c) 2012 by Delphix. All rights reserved. 
+ */
+
 /* Copyright (c) 1984, 1986, 1987, 1988, 1989 AT&T */
 /* All Rights Reserved */
@@ -113,14 +117,18 @@ _NOTE(CONSTCOND) } while (0)
 #define VERIFY3S(x, y, z) VERIFY3_IMPL(x, y, z, int64_t)
 #define VERIFY3U(x, y, z) VERIFY3_IMPL(x, y, z, uint64_t)
 #define VERIFY3P(x, y, z) VERIFY3_IMPL(x, y, z, uintptr_t)
+#define VERIFY0(x) VERIFY3_IMPL(x, ==, 0, uintmax_t)
+
 #ifdef DEBUG
 #define ASSERT3S(x, y, z) VERIFY3_IMPL(x, y, z, int64_t)
 #define ASSERT3U(x, y, z) VERIFY3_IMPL(x, y, z, uint64_t)
 #define ASSERT3P(x, y, z) VERIFY3_IMPL(x, y, z, uintptr_t)
+#define ASSERT0(x) VERIFY3_IMPL(x, ==, 0, uintmax_t)
 #else
 #define ASSERT3S(x, y, z) ((void)0)
 #define ASSERT3U(x, y, z) ((void)0)
 #define ASSERT3P(x, y, z) ((void)0)
+#define ASSERT0(x) ((void)0)
 #endif
 
 #ifdef _KERNEL
diff --git a/sys/cddl/contrib/opensolaris/uts/common/sys/fs/zfs.h b/sys/cddl/contrib/opensolaris/uts/common/sys/fs/zfs.h
index 1b23dc2be..64fd2e64b 100644
--- a/sys/cddl/contrib/opensolaris/uts/common/sys/fs/zfs.h
+++ b/sys/cddl/contrib/opensolaris/uts/common/sys/fs/zfs.h
@@ -172,6 +172,7 @@ typedef enum {
 	ZPOOL_PROP_READONLY,
 	ZPOOL_PROP_COMMENT,
 	ZPOOL_PROP_EXPANDSZ,
+	ZPOOL_PROP_FREEING,
 	ZPOOL_NUM_PROPS
 } zpool_prop_t;
 
@@ -245,6 +246,8 @@ const char *zpool_prop_to_name(zpool_prop_t);
 const char *zpool_prop_default_string(zpool_prop_t);
 uint64_t zpool_prop_default_numeric(zpool_prop_t);
 boolean_t zpool_prop_readonly(zpool_prop_t);
+boolean_t zpool_prop_feature(const char *);
+boolean_t zpool_prop_unsupported(const char *name);
 int zpool_prop_index_to_string(zpool_prop_t, uint64_t, const char **);
 int zpool_prop_string_to_index(zpool_prop_t, const char *, uint64_t *);
 uint64_t zpool_prop_random_value(zpool_prop_t, uint64_t seed);
@@ -352,6 +355,7 @@ typedef enum {
 #define SPA_VERSION_26 26ULL
 #define SPA_VERSION_27 27ULL
 #define SPA_VERSION_28 28ULL
+#define SPA_VERSION_5000 5000ULL
 
 /*
  * When bumping up SPA_VERSION, make sure GRUB ZFS understands the on-disk
@@ -359,8 +363,8 @@ typedef enum {
  * and do the appropriate changes. Also bump the version number in
  * usr/src/grub/capability.
  */
-#define SPA_VERSION SPA_VERSION_28
-#define SPA_VERSION_STRING "28"
+#define SPA_VERSION SPA_VERSION_5000
+#define SPA_VERSION_STRING "5000"
 
 /*
  * Symbolic names for the changes that caused a SPA_VERSION switch.
@@ -411,6 +415,12 @@ typedef enum {
 #define SPA_VERSION_DEADLISTS SPA_VERSION_26
 #define SPA_VERSION_FAST_SNAP SPA_VERSION_27
 #define SPA_VERSION_MULTI_REPLACE SPA_VERSION_28
+#define SPA_VERSION_BEFORE_FEATURES SPA_VERSION_28
+#define SPA_VERSION_FEATURES SPA_VERSION_5000
+
+#define SPA_VERSION_IS_SUPPORTED(v) \
+	(((v) >= SPA_VERSION_INITIAL && (v) <= SPA_VERSION_BEFORE_FEATURES) || \
+	((v) >= SPA_VERSION_FEATURES && (v) <= SPA_VERSION))
 
 /*
  * ZPL version - rev'd whenever an incompatible on-disk format change
@@ -508,6 +518,12 @@ typedef struct zpool_rewind_policy {
 #define ZPOOL_CONFIG_BOOTFS "bootfs" /* not stored on disk */
 #define ZPOOL_CONFIG_MISSING_DEVICES "missing_vdevs" /* not stored on disk */
 #define ZPOOL_CONFIG_LOAD_INFO "load_info" /* not stored on disk */
+#define ZPOOL_CONFIG_REWIND_INFO "rewind_info" /* not stored on disk */
+#define ZPOOL_CONFIG_UNSUP_FEAT "unsup_feat" /* not stored on disk */
+#define ZPOOL_CONFIG_ENABLED_FEAT "enabled_feat" /* not stored on disk */
+#define ZPOOL_CONFIG_CAN_RDONLY "can_rdonly" /* not stored on disk */
+#define ZPOOL_CONFIG_FEATURES_FOR_READ "features_for_read"
+#define ZPOOL_CONFIG_FEATURE_STATS "feature_stats" /* not stored on disk */
 /*
  * The persistent vdev state is stored as separate values rather than a single
  * 'vdev_state' entry. This is because a device can be in multiple states, such
@@ -586,6 +602,7 @@ typedef enum vdev_aux {
 	VDEV_AUX_BAD_LABEL, /* the label is OK but invalid */
 	VDEV_AUX_VERSION_NEWER, /* on-disk version is too new */
 	VDEV_AUX_VERSION_OLDER, /* on-disk version is too old */
+	VDEV_AUX_UNSUP_FEAT, /* unsupported features */
 	VDEV_AUX_SPARED, /* hot spare used in another pool */
 	VDEV_AUX_ERR_EXCEEDED, /* too many errors */
 	VDEV_AUX_IO_FAILURE, /* experienced I/O failure */
diff --git a/sys/cddl/contrib/opensolaris/uts/common/sys/nvpair.h b/sys/cddl/contrib/opensolaris/uts/common/sys/nvpair.h
index abf84cf59..3062dd95a 100644
--- a/sys/cddl/contrib/opensolaris/uts/common/sys/nvpair.h
+++ b/sys/cddl/contrib/opensolaris/uts/common/sys/nvpair.h
@@ -20,6 +20,7 @@
  */
 /*
  * Copyright (c) 2000, 2010, Oracle and/or its affiliates. All rights reserved.
+ * Copyright (c) 2012 by Delphix. All rights reserved.
  */
 
 #ifndef _SYS_NVPAIR_H
@@ -274,6 +275,73 @@ int nvpair_value_hrtime(nvpair_t *, hrtime_t *);
 int nvpair_value_double(nvpair_t *, double *);
 #endif
 
+nvlist_t *fnvlist_alloc(void);
+void fnvlist_free(nvlist_t *);
+size_t fnvlist_size(nvlist_t *);
+char *fnvlist_pack(nvlist_t *, size_t *);
+void fnvlist_pack_free(char *, size_t);
+nvlist_t *fnvlist_unpack(char *, size_t);
+nvlist_t *fnvlist_dup(nvlist_t *);
+void fnvlist_merge(nvlist_t *, nvlist_t *);
+
+void fnvlist_add_boolean(nvlist_t *, const char *);
+void fnvlist_add_boolean_value(nvlist_t *, const char *, boolean_t);
+void fnvlist_add_byte(nvlist_t *, const char *, uchar_t);
+void fnvlist_add_int8(nvlist_t *, const char *, int8_t);
+void fnvlist_add_uint8(nvlist_t *, const char *, uint8_t);
+void fnvlist_add_int16(nvlist_t *, const char *, int16_t);
+void fnvlist_add_uint16(nvlist_t *, const char *, uint16_t);
+void fnvlist_add_int32(nvlist_t *, const char *, int32_t);
+void fnvlist_add_uint32(nvlist_t *, const char *, uint32_t);
+void fnvlist_add_int64(nvlist_t *, const char *, int64_t);
+void fnvlist_add_uint64(nvlist_t *, const char *, uint64_t);
+void fnvlist_add_string(nvlist_t *, const char *, const char *);
+void fnvlist_add_nvlist(nvlist_t *, const char *, nvlist_t *);
+void fnvlist_add_nvpair(nvlist_t *, nvpair_t *);
+void fnvlist_add_boolean_array(nvlist_t *, const char *, boolean_t *, uint_t);
+void fnvlist_add_byte_array(nvlist_t *, const char *, uchar_t *, uint_t);
+void fnvlist_add_int8_array(nvlist_t *, const char *, int8_t *, uint_t);
+void fnvlist_add_uint8_array(nvlist_t *, const char *, uint8_t *, uint_t);
+void fnvlist_add_int16_array(nvlist_t *, const char *, int16_t *, uint_t);
+void fnvlist_add_uint16_array(nvlist_t *, const char *, uint16_t *, uint_t);
+void fnvlist_add_int32_array(nvlist_t *, const char *, int32_t *, uint_t);
+void fnvlist_add_uint32_array(nvlist_t *, const char *, uint32_t *, uint_t);
+void fnvlist_add_int64_array(nvlist_t *, const char *, int64_t *, uint_t);
+void fnvlist_add_uint64_array(nvlist_t *, const char *, uint64_t *, uint_t);
+void fnvlist_add_string_array(nvlist_t *, const char *, char * const *, uint_t);
+void fnvlist_add_nvlist_array(nvlist_t *, const char *, nvlist_t **, uint_t);
+
+void fnvlist_remove(nvlist_t *, const char *);
+void fnvlist_remove_nvpair(nvlist_t *, nvpair_t *);
+
+nvpair_t *fnvlist_lookup_nvpair(nvlist_t *nvl, const char *name);
+boolean_t fnvlist_lookup_boolean(nvlist_t *nvl, const char *name);
+boolean_t fnvlist_lookup_boolean_value(nvlist_t *nvl, const char *name);
+uchar_t fnvlist_lookup_byte(nvlist_t *nvl, const char *name);
+int8_t fnvlist_lookup_int8(nvlist_t *nvl, const char *name);
+int16_t fnvlist_lookup_int16(nvlist_t *nvl, const char *name);
+int32_t fnvlist_lookup_int32(nvlist_t *nvl, const char *name);
+int64_t fnvlist_lookup_int64(nvlist_t *nvl, const char *name);
+uint8_t fnvlist_lookup_uint8_t(nvlist_t *nvl, const char *name);
+uint16_t fnvlist_lookup_uint16(nvlist_t *nvl, const char *name);
+uint32_t fnvlist_lookup_uint32(nvlist_t *nvl, const char *name);
+uint64_t fnvlist_lookup_uint64(nvlist_t *nvl, const char *name);
+char *fnvlist_lookup_string(nvlist_t *nvl, const char *name);
+nvlist_t *fnvlist_lookup_nvlist(nvlist_t *nvl, const char *name);
+
+boolean_t fnvpair_value_boolean_value(nvpair_t *nvp);
+uchar_t fnvpair_value_byte(nvpair_t *nvp);
+int8_t fnvpair_value_int8(nvpair_t *nvp);
+int16_t fnvpair_value_int16(nvpair_t *nvp);
+int32_t fnvpair_value_int32(nvpair_t *nvp);
+int64_t fnvpair_value_int64(nvpair_t *nvp);
+uint8_t fnvpair_value_uint8_t(nvpair_t *nvp);
+uint16_t fnvpair_value_uint16(nvpair_t *nvp);
+uint32_t fnvpair_value_uint32(nvpair_t *nvp);
+uint64_t fnvpair_value_uint64(nvpair_t *nvp);
+char *fnvpair_value_string(nvpair_t *nvp);
+nvlist_t *fnvpair_value_nvlist(nvpair_t *nvp);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/sys/modules/zfs/Makefile b/sys/modules/zfs/Makefile
index 50a58e7eb..89a4e4c5f 100644
--- a/sys/modules/zfs/Makefile
+++ b/sys/modules/zfs/Makefile
@@ -13,6 +13,7 @@ SRCS+= avl.c
 .PATH: ${SUNW}/common/nvpair
 SRCS+= nvpair.c
 SRCS+= nvpair_alloc_fixed.c
+SRCS+= fnvpair.c
 
 .PATH: ${.CURDIR}/../../cddl/contrib/opensolaris/common/unicode
 SRCS+= u8_textprep.c
-- 
2.45.0