marius [Wed, 22 Sep 2010 20:17:33 +0000 (20:17 +0000)]
MFC: r212729
Merge from powerpc:
- Change putc_func_t to use a char instead of an int for the character.
- Make functions and variables not used outside of this source file static.
- Remove unused prototypes and variables.
- The OFW read and seek methods take 3 and not 4 input arguments.
marius [Wed, 22 Sep 2010 20:15:34 +0000 (20:15 +0000)]
MFC: r212725
Merge r207585 (MFC'ed to stable/8 in r208086) from cas(4):
- Don't probe for PHYs if we already know to use a SERDES. Unlike with
cas(4), this only serves to speed up the device attach though and can
only be determined via the OFW device tree but not from the VPD.
- Don't touch the MIF when using a SERDES.
- Add some missing bus space barriers, mainly in the PCS code path.
marius [Wed, 22 Sep 2010 20:03:59 +0000 (20:03 +0000)]
MFC: r212709, r212730
Add a VIS-based block copy function for SPARC64 V and later, which
additionally takes advantage of the prefetch cache of these CPUs.
Unlike the uncommitted US-III version, which provided no measurable
speedup or even resulted in a slight slowdown on certain CPU models
compared to using the US-I version with these, the SPARC64 version
actually results in a slight improvement.
marius [Wed, 22 Sep 2010 19:59:11 +0000 (19:59 +0000)]
MFC: r212676
Sync with other platforms:
- make dflt_lock() always panic,
- add kludge to use contigmalloc() when the alignment is larger than the size
and print a diagnostic when we didn't satisfy the alignment.
marius [Wed, 22 Sep 2010 19:55:37 +0000 (19:55 +0000)]
MFC: r212663
- Update the comment in swi_vm() regarding busdma bounce buffers; it's
unlikely that support for these ever will be implemented on sparc64 as
the IOMMUs are able to translate up to the maximum physical address
supported by the respective machine, bypassing the IOMMU is affected
by hardware errata, and being able to support DMA engines which cannot
do at least 32-bit DMA does not justify the costs.
- The page zeroing in uma_small_alloc() may use the VIS-based block zero
function so take advantage of it.
MFC r197804 (rwatson):
Add basename_r(3) to complement basename(3). basename_r(3) accepts
a caller-allocated buffer of at least MAXPATHLEN, rather than using a
global buffer.
Note about semantics: while this interface is not POSIXy, there's
another major platform that uses it (Android) and the semantics between
the two platforms are pretty much the same.
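The caller-allocated buffer contract can be sketched portably as follows.
This is an illustration only, not the libc code: sketch_basename_r() and
SKETCH_MAXPATHLEN are hypothetical stand-ins for basename_r(3) and
MAXPATHLEN, and the two-argument shape is assumed here.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for MAXPATHLEN; the real value comes from the system. */
#define SKETCH_MAXPATHLEN 1024

/*
 * Hypothetical re-entrant basename: the result is written into the
 * caller's buffer (at least SKETCH_MAXPATHLEN bytes), not a global one.
 */
char *
sketch_basename_r(const char *path, char *bname)
{
	const char *endp, *startp;
	size_t len;

	assert(bname != NULL);

	/* An empty or NULL path yields ".". */
	if (path == NULL || *path == '\0') {
		strcpy(bname, ".");
		return (bname);
	}

	/* Strip trailing slashes. */
	endp = path + strlen(path) - 1;
	while (endp > path && *endp == '/')
		endp--;

	/* A path made only of slashes yields "/". */
	if (endp == path && *endp == '/') {
		strcpy(bname, "/");
		return (bname);
	}

	/* Walk back to the start of the last component. */
	startp = endp;
	while (startp > path && *(startp - 1) != '/')
		startp--;

	len = (size_t)(endp - startp) + 1;
	if (len >= SKETCH_MAXPATHLEN) {
		errno = ENAMETOOLONG;
		return (NULL);
	}
	memcpy(bname, startp, len);
	bname[len] = '\0';
	return (bname);
}
```

Because each caller supplies its own buffer, two results can be held at
the same time without one overwriting the other.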
GCC defines built-ins for atomic instructions found on i486 and higher.
Because FreeBSD no longer supports the 80386 cpu all code targeting
FreeBSD/i386 necessarily runs on i486 or higher so the compiler
built-ins can be used by default inside libstdc++ and in C++ headers.
This allows newly compiled C++ code to inline some atomic operations.
Old binaries continue to use libstdc++ functions.
ed [Tue, 21 Sep 2010 07:01:00 +0000 (07:01 +0000)]
MFC r211598:
Add support for whiteouts on tmpfs.
Right now unionfs only allows filesystems to be mounted on top of
another if it supports whiteouts. Even though I have sent a patch to
daichi@ to let unionfs work without it, we'd better also add support for
whiteouts to tmpfs.
This patch implements .vop_whiteout and makes necessary changes to
lookup() and readdir() to take them into account. We must also make sure
that when adding or removing a file, we honour the componentname's
DOWHITEOUT and ISWHITEOUT, to prevent duplicate filenames.
Correct bioq_disksort so that bioq_insert_tail() offers barrier semantics.
Add the BIO_ORDERED flag for struct bio and update bio clients to use it.
The barrier semantics of bioq_insert_tail() were broken in two ways:
o In bioq_disksort(), an added bio could be inserted at the head of
the queue, even when a barrier was present, if the sort key for
the new entry was less than that of the last queued barrier bio.
o The last_offset used to generate the sort key for newly queued bios
did not stay at the position of the barrier until either the
barrier was de-queued, or a new barrier (which updates last_offset)
was queued. When a barrier is in effect, we know that the disk
will pass through the barrier position just before the
"blocked bios" are released, so using the barrier's offset for
last_offset is the optimal choice.
sys/geom/sched/subr_disk.c:
sys/kern/subr_disk.c:
o Update last_offset in bioq_insert_tail().
o Only update last_offset in bioq_remove() if the removed bio is
at the head of the queue (typically due to a call via
bioq_takefirst()) and no barrier is active.
o In bioq_disksort(), if we have a barrier (insert_point is non-NULL),
set prev to the barrier and cur to its next element. Now that
last_offset is kept at the barrier position, this change isn't
strictly necessary, but since we have to take a decision branch
anyway, it does avoid one, no-op, loop iteration in the while
loop that immediately follows.
o In bioq_disksort(), bypass the normal sort for bios with the
BIO_ORDERED attribute and instead insert them into the queue
with bioq_insert_tail(). bioq_insert_tail() not only gives
the desired command order during insertion, but also provides
barrier semantics so that commands disksorted in the future
cannot pass the just enqueued transaction.
sys/sys/bio.h:
Add BIO_ORDERED as bit 4 of the bio_flags field in struct bio.
sys/cam/ata/ata_da.c:
sys/cam/scsi/scsi_da.c
Use an ordered command for SCSI/ATA-NCQ commands issued in
response to bios with the BIO_ORDERED flag set.
sys/cam/scsi/scsi_da.c
Use an ordered tag when issuing a synchronize cache command.
Wrap some lines to 80 columns.
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
sys/geom/geom_io.c
Mark bios with the BIO_FLUSH command as BIO_ORDERED.
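The barrier behaviour described above can be modelled with a small
array-based queue. This is a simplified sketch, not the subr_disk.c code:
all sketch_* names and Q_MAX are hypothetical, removal is left out, and
the sort key is assumed to be the unsigned distance ahead of last_offset.

```c
#include <assert.h>
#include <stddef.h>

#define Q_MAX 16

struct sketch_bioq {
	long	offs[Q_MAX];	/* queued offsets, head at index 0 */
	size_t	len;		/* number of queued entries */
	size_t	barrier;	/* entries at or before the last barrier */
	long	last_offset;	/* position used to build sort keys */
};

/* Distance ahead of last_offset; offsets behind it wrap to large keys. */
static unsigned long
sketch_key(const struct sketch_bioq *q, long off)
{
	return ((unsigned long)(off - q->last_offset));
}

/* Append at the tail and make the new entry a barrier. */
static void
sketch_insert_tail(struct sketch_bioq *q, long off)
{
	assert(q->len < Q_MAX);
	q->offs[q->len++] = off;
	q->barrier = q->len;
	q->last_offset = off;	/* keys now start at the barrier position */
}

/* Sorted insert that never places an entry ahead of the last barrier. */
static void
sketch_disksort(struct sketch_bioq *q, long off, int ordered)
{
	size_t i, pos;

	if (ordered) {
		/* Ordered requests bypass the sort and become barriers. */
		sketch_insert_tail(q, off);
		return;
	}
	assert(q->len < Q_MAX);
	pos = q->barrier;	/* start the scan just after the barrier */
	while (pos < q->len &&
	    sketch_key(q, q->offs[pos]) <= sketch_key(q, off))
		pos++;
	for (i = q->len; i > pos; i--)
		q->offs[i] = q->offs[i - 1];
	q->offs[pos] = off;
	q->len++;
}
```

In this model a request with a small offset queued after a barrier stays
behind the barrier, which is the property the first bug listed above
violated.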
MFC 212293:
Store the full timestamp when caching timestamps of files and
directories for purposes of validating name cache entries. This
closes races where two updates to a file or directory within the same
second could result in stale entries in the name cache.
To preserve the ABI of 'struct nfsnode', the existing timestamp fields
are left with 'n_unusedX' placeholders along with the unused 'n_expiry'
field. The larger n_ctime and n_dmtime fields are added to the end of
the structure.
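The race being closed can be illustrated with two comparisons. This is a
minimal sketch with hypothetical helper names; it is not the NFS client
code and only shows why whole-second caching misses a second update.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Old-style validation: only whole seconds were cached. */
static bool
sketch_same_seconds(time_t cached_sec, const struct timespec *cur)
{
	assert(cur != NULL);
	return (cached_sec == cur->tv_sec);
}

/* New-style validation: the full timestamp is cached and compared. */
static bool
sketch_same_timespec(const struct timespec *cached,
    const struct timespec *cur)
{
	assert(cached != NULL && cur != NULL);
	return (cached->tv_sec == cur->tv_sec &&
	    cached->tv_nsec == cur->tv_nsec);
}
```

With two updates inside one second, the first helper still reports a
match (a stale entry would be trusted) while the second does not.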
MFC 209907,212326,212368,212749:
- Provide more defines for PCI-Express device ctrl.
- Add register definitions related to extended capability IDs in
PCI-express. I used PCIZ_* for ID constants (plain capability IDs use
PCIY_*).
- Add register definitions for the Advanced Error Reporting, Virtual
Channels, and Device Serial Number extended capabilities.
- Teach pciconf -c to list extended as well as plain capabilities for
PCI-express devices. Adds more detailed parsing for AER, VC, and
device serial numbers.
MFC 212369:
- Use 'sta' to hold the PCIR_STATUS register value instead of 'cmd' when
walking the capability list.
- Use constants for PCI header types instead of magic numbers.
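A capability list walk in this style might look like the sketch below.
It runs over an in-memory copy of configuration space rather than real
hardware; the SK_* names are hypothetical and only modelled on pcireg.h,
with values taken from the standard PCI configuration header layout.

```c
#include <assert.h>
#include <stdint.h>

#define SK_PCIR_STATUS			0x06
#define SK_PCIM_STATUS_CAPPRESENT	0x0010
#define SK_PCIR_HDRTYPE			0x0e
#define SK_PCIM_HDRTYPE			0x7f
#define SK_PCIM_HDRTYPE_NORMAL		0x00
#define SK_PCIM_HDRTYPE_BRIDGE		0x01
#define SK_PCIM_HDRTYPE_CARDBUS		0x02
#define SK_PCIR_CAP_PTR			0x34
#define SK_PCIR_CAP_PTR_2		0x14

/* Read a little-endian 16-bit register from the config space copy. */
static uint16_t
sk_read16(const uint8_t *cfg, unsigned off)
{
	return ((uint16_t)(cfg[off] | (cfg[off + 1] << 8)));
}

/* Return the offset of capability 'capid', or 0 if it is not present. */
static unsigned
sk_find_cap(const uint8_t cfg[256], uint8_t capid)
{
	uint16_t sta;
	unsigned ptr;
	int guard;

	/* 'sta' holds the status register, which flags the list's presence. */
	sta = sk_read16(cfg, SK_PCIR_STATUS);
	if ((sta & SK_PCIM_STATUS_CAPPRESENT) == 0)
		return (0);

	/* The location of the list head depends on the header type. */
	switch (cfg[SK_PCIR_HDRTYPE] & SK_PCIM_HDRTYPE) {
	case SK_PCIM_HDRTYPE_NORMAL:
	case SK_PCIM_HDRTYPE_BRIDGE:
		ptr = SK_PCIR_CAP_PTR;
		break;
	case SK_PCIM_HDRTYPE_CARDBUS:
		ptr = SK_PCIR_CAP_PTR_2;
		break;
	default:
		return (0);
	}

	/* Each entry is an ID byte followed by a next-pointer byte. */
	ptr = cfg[ptr];
	for (guard = 0; ptr >= 0x40 && ptr < 0xff && guard < 48; guard++) {
		if (cfg[ptr] == capid)
			return (ptr);
		ptr = cfg[ptr + 1];
	}
	return (0);
}
```

The guard counter is an addition of this sketch to bound the walk if a
malformed list loops back on itself.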
Correct logic bug in aicasm's undefined register bit access detection code.
The code in question verifies that all register write operations only change
bits that are defined (in the register definition file) for the affected
register. The bug effectively disabled this checking.
o Fix the check by testing the opcode against all supported read ("and" based)
operands.
o Add missing bit definitions to the aic7xxx and aic79xx register definition
files so that the warning (treated as a fatal error) does not spuriously
fire.
When using pf routing options, properly handle IP fragmentation
for interfaces with TSO enabled, otherwise one would see an extra
ICMP unreach, frag needed per matching packet on lo0.
This syncs pf code to ip_output.c r162084.
Submitted by: yongari via mlaier
Reviewed by: eri
Tested by: kib
PR: kern/144311
MFC r207831: sh(1): Fix "reserved word" vs "keyword" inconsistency.
Use "keyword" everywhere, like the output of the 'type' builtin, and only
mention "reserved word" once to say it is the same thing.
MFC r212417: sh(1): Remove xrefs for expr(1) and getopt(1).
expr(1) should usually not be used as various forms of parameter expansion
and arithmetic expansion replicate most of its functionality in an easier
way.
getopt(1) should not be used at all in new code. Instead, getopts(1) or
entirely manual parsing should be used.
MFC: r212217
Change the code in ncl_bioread() in the experimental NFS
client to return an error when rabp is not set, so it
behaves the same way as the regular NFS client for this
case. It does not affect NFSv4, since nfs_getcacheblk()
only fails for "intr" mounts and NFSv4 can't use the
"intr" mount option.
MFC: r212216
Disable use of the NLM in the experimental NFS client, since
it will crash the kernel because it uses the nfsmount and
nfsnode structures of the regular NFS client.
r210002:
In the example for how to create a VLAN, also include an example of
setting the IP address. While it is documented earlier in rc.conf(5)
that the '.' in the VLAN name becomes a '_' in rc.conf, this may not be
easy to find when just using rc.conf(5) as reference documentation.
r210676:
Fix a bunch of typos and spelling mistakes.
r210812:
Update references from nonexistent usbconfig(1) to usbconfig(8).
r210826:
Correctly sort usbconfig(8) within the SEE ALSO section.
PR: docs/149221
Submitted by: Lars Hartmann (lars at chaotika dot org)
Help with mergeinfo from: kib
Reviewed by: kib, hrs (IPv6 parts that should not be
merged to stable/8)
marius [Sat, 18 Sep 2010 08:25:12 +0000 (08:25 +0000)]
MFC: r212621
Use saner nsegments and maxsegsz parameters when creating certain DMA tags;
tags for 1-byte allocations cannot possibly be split across 2 segments and
maxsegsz must not exceed maxsize.
malo and mwl use the firmware framework to access firmware images.
Depending on the firmware modules themselves is not required and in this
case even wrong because no modules with those names exist.
marius [Sat, 18 Sep 2010 08:20:36 +0000 (08:20 +0000)]
MFC: r212620
Remove a KASSERT which will also trigger for perfectly valid combinations
of small maxsize and "large" (including BUS_SPACE_UNRESTRICTED) nsegments
parameters. Generally using a presz of 0 (which indeed might indicate the
use of bogus parameters for DMA tag creation) is not fatal, it just means
that no additional DVMA space will be preallocated.
MFC r212176:
When an asm location cannot be resolved to a function, the cost
will be spread as a small value and then filtered by the threshold.
As a first step solution, display the number of events that cannot
be resolved as a valid function location.
MFhead r210539:
Document that the "ngtee" action no longer accepts packets, and
thus doesn't depend on the one_pass flag anymore.
This is a POLA violation, but it is quite difficult to restore
the old behavior with new code. Also, the new behavior matches
behavior of the older "tee" action, and this is more intuitive.
MFhead 210529:
When installing a new ARP entry via 'arp -S', lla_lookup() will
either find an existing entry, or allocate a new one. In the latter
case the entry would have the flags that were supplied as an argument to
lla_lookup(). In case of an existing entry, flags aren't modified.
This led to losing LLE_PUB and/or LLE_PROXY flags.
We should apply these flags either in lla_rt_output() or in the
in.c:in_lltable_lookup(). It seems to me that lla_rt_output() is
a more correct choice.
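The lookup-or-allocate behaviour and the chosen place for the fix can be
sketched as follows. This is a toy model under stated assumptions: the
sk_* functions, the table layout and the flag values are hypothetical and
only stand in for lla_lookup(), lla_rt_output(), LLE_PUB and LLE_PROXY.

```c
#include <assert.h>
#include <stddef.h>

#define SK_LLE_PUB	0x1u	/* hypothetical value */
#define SK_LLE_PROXY	0x2u	/* hypothetical value */
#define SK_TABLE_SIZE	8

struct sk_lle {
	unsigned addr;
	unsigned flags;
	int	 used;
};

struct sk_lltable {
	struct sk_lle e[SK_TABLE_SIZE];
};

/* Find an existing entry or allocate one; only new entries get 'flags'. */
static struct sk_lle *
sk_lookup(struct sk_lltable *t, unsigned addr, unsigned flags)
{
	size_t i;

	assert(t != NULL);
	for (i = 0; i < SK_TABLE_SIZE; i++)
		if (t->e[i].used && t->e[i].addr == addr)
			return (&t->e[i]);	/* existing: flags untouched */
	for (i = 0; i < SK_TABLE_SIZE; i++)
		if (!t->e[i].used) {
			t->e[i].used = 1;
			t->e[i].addr = addr;
			t->e[i].flags = flags;
			return (&t->e[i]);
		}
	return (NULL);
}

/* Caller-side fix: apply the requested flags whether or not it existed. */
static struct sk_lle *
sk_rt_output(struct sk_lltable *t, unsigned addr, unsigned flags)
{
	struct sk_lle *lle;

	lle = sk_lookup(t, addr, flags);
	if (lle != NULL)
		lle->flags |= flags & (SK_LLE_PUB | SK_LLE_PROXY);
	return (lle);
}
```

Applying the flags in the caller keeps the lookup itself free of side
effects on entries it merely finds.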
MFC r212146:
Add a workaround for a bug in SiI3114 and SiI3512 chips, which caused
sending R_ERR in response to DMA activate FIS under certain circumstances.
This is the fix recommended by the chip datasheet. If triggered, this bug
most likely caused write command timeouts.
MFC r212145:
SATA1.x SiliconImage controllers reset the TFD Status register to the
value 0xff on power-on. On hot-plug this value confuses the device
presence detection logic in ata_generic_reset(). Since we already know
the drive is present from the SATA hard reset, hint ata_generic_reset()
to wait for the device signature until success or full timeout.
MFC: r212125, r212126
Modify lib/libstand/nfs.c to use NFSv3 instead of NFSv2.
This allows the nfs_getrootfh() function to return the
correct file handle size to pxe.c for pxeboot. It also
results in NFSv2 no longer being used by default anywhere
in FreeBSD. If built with OLD_NFSV2 defined, the old
code that predated this patch will be built and NFSv2
will be used. pxe.c is also modified to use this version
of nfs_getrootfh() so that pxeboot will use NFSv3 and
work for non-FreeBSD as well as FreeBSD NFS servers.
MFC: r212123
Modify nfs_diskless.c to recognize the environment variable
boot.nfsroot.nfshandlelen and set the diskless root fs to
use NFSv3 and this file handle length when it is set. If
this environment variable is not set, the diskless root fs
will use NFSv2 and the same defaults as before. This fixes
the problem where the diskless nfs root fs had to be on a
FreeBSD server for NFSv3 to work, because it did not know
the correct file handle length and assumed the size used
by FreeBSD.
marius [Wed, 15 Sep 2010 20:17:18 +0000 (20:17 +0000)]
MFC: r210601 (partial), r211071, r211073
- As it is not possible for sched_bind(9) to context switch with
td_critnest > 1 when not already running on the desired CPU read the
TICK counter of the BSP via a direct cross trap request in that case
instead.
- Provide a STICK based timecounter.
Make sure to only pick up hid_input items when parsing input reports.
As it turns out, libusbhid(3) also picks up hid_collection items even
though we explicitly requested hid_input items only.
marius [Wed, 15 Sep 2010 19:27:30 +0000 (19:27 +0000)]
MFC: r211050 (partial)
- Introduce a cpu_ipi_single() function pointer in order to send IPIs
to single CPUs more efficiently with Cheetah(-class) and Jalapeno CPUs.
- Factor out the Jalapeno support from the Cheetah IPI send functions
in order to be able to more easily and efficiently implement support
for more than 32 target CPUs as well as a workaround for Cheetah+
erratum 25 for the latter.
marius [Wed, 15 Sep 2010 18:51:14 +0000 (18:51 +0000)]
MFC: r211049, r211568
For CPUs which ignore TD_CV and support hardware unaliasing don't
bother doing page coloring. This results in a small but measurable
performance improvement in buildworld times.
marius [Wed, 15 Sep 2010 18:09:16 +0000 (18:09 +0000)]
MFC: r210176
Allocate proper amount of memory for interrupt names on sparc64 and
sun4v, same as done on other architectures. This removes garbage from
`vmstat -ia` output.
mm [Wed, 15 Sep 2010 16:20:24 +0000 (16:20 +0000)]
MFC r211932, r211947, r211948:
MFC r211932:
Import changes from OpenSolaris that provide
- better ACL caching and speedup of ACL permission checks
- faster handling of stat()
- lowered mutex contention in the read/writer lock (rrwlock)
- several related bugfixes
Detailed information (OpenSolaris onnv changesets and Bug IDs):
9749:105f407a2680 6802734 Support for Access Based Enumeration (not used on FreeBSD) 6844861 inconsistent xattr readdir behavior with too-small buffer
9866:ddc5f1d8eb4e 6848431 zfs with rstchown=0 or file_chown_self privilege allows user to "take" ownership
9981:b4907297e740 6775100 stat() performance on files on zfs should be improved 6827779 rrwlock is overly protective of its counters
10143:d2d432dfe597 6857433 memory leaks found at: zfs_acl_alloc/zfs_acl_node_alloc 6860318 truncate() on zfsroot succeeds when file has a component of its path set without access permission
10232:f37b85f7e03e 6865875 zfs sometimes incorrectly giving search access to a dir
10250:b179ceb34b62 6867395 zpool_upgrade_007_pos testcase panic'd with BAD TRAP: type=e (#pf Page fault)
10269:2788675568fd 6868276 zfs_rezget() can be hazardous when znode has a cached ACL
mm [Wed, 15 Sep 2010 16:10:38 +0000 (16:10 +0000)]
MFC r210398:
Enable fake resolving of SMB RIDs by using nulldomain and UID_NOBODY
- fixes panics when Solaris/OpenSolaris pools that contain files
uploaded with the SMB protocol are accessed
Enable setting/unsetting the sharesmb property (dummy action)
- allows users who import pools from Solaris/OpenSolaris to unset
the sharesmb property and get rid of annoying messages
8228:51e9ca9ee3a5 6572357 libzfs should do more to avoid mnttab lookups (141909-01) 6572376 zfs_iter_filesystems and zfs_iter_snapshots get objset stats twice (141909-01)
8241:5a60f16123ba 6328632 zpool offline is a bit too conservative (141445-01) 6739487 ASSERT: txg <= spa_final_txg due to scrub/export race (141445-01) 6767129 ASSERT: cvd->vdev_isspare, in spa_vdev_detach() (141445-01) 6747698 checksum failures after offline -t / export / import / scrub (141445-01) 6745863 ZFS writes to disk after it has been offlined (141445-01) 6722540 50% slowdown on scrub/resilver with certain vdev configurations (141445-01) 6759999 resilver logic rewrites ditto blocks on both source and destination (141445-01) 6758107 I/O should never suspend during spa_load() (141445-01) 6776548 codereview(1) runs off the page when faced with multi-line comments (N/A) 6761406 AMD errata 91 workaround doesn't work on 64-bit systems (141445-01)
8242:e46e4b2f0a03 6770866 GRUB/ZFS should require physical path or devid, but not both (141445-01)
8269:03a7e9050cfd 6674216 "zfs share" doesn't work, but "zfs set sharenfs=on" does (141445-01) 6621164 $SRC/cmd/zfs/zfs_main.c seems to have a syntax error in the translation note (141445-01) 6635482 i18n problems in libzfs_dataset.c and zfs_main.c (141445-01) 6595194 "zfs get" VALUE column is as wide as NAME (141445-01) 6722991 vdev_disk.c: error checking for ddi_pathname_to_dev_t() must test for NODEV (141445-01) 6396518 ASSERT strings shouldn't be pre-processed (141445-01)
8274:846b39508aff 6713916 scrub/resilver needlessly decompress data (141445-01)
8343:655db2375fed 6739553 libzfs_status msgid table is out of sync (141445-01) 6784104 libzfs unfairly rejects numerical values greater than 2^63 (141445-01) 6784108 zfs_realloc() should not free original memory on failure (141445-01)
8525:e0e0e525d0f8 6788830 set large value to reservation cause core dump (141445-01) 6791064 want sysevents for ZFS scrub (141445-01) 6791066 need to be able to set cachefile on faulted pools (141445-01) 6791071 zpool_do_import() should not enable datasets on faulted pools (141445-01) 6792134 getting multiple properties on a faulted pool leads to confusion (141445-01)
8845:91af0d9c0790 6800942 smb_session_create() incorrectly stores IP addresses (N/A) 6582163 Access Control List (ACL) for shares (141445-01) 6804954 smb_search - shortname field should be space padded following the NULL terminator (N/A) 6800184 Panic at smb_oplock_conflict+0x35() (N/A)
8876:59d2e67b4b65 6803822 Reboot after replacement of system disk in a ZFS mirror drops to grub> prompt (141445-01)
8924:5af812f84759 6789318 coredump when issue zdb -uuuu poolname/ (141445-01) 6790345 zdb -dddd -e poolname coredump (141445-01) 6797109 zdb: 'zdb -dddddd pool_name/fs_name inode' coredump if the file with inode was deleted (141445-01) 6797118 zdb: 'zdb -dddddd poolname inum' coredump if I miss the fs name (141445-01) 6803343 shareiscsi=on failed, iscsitgtd failed request to share (141445-01)
9030:243fd360d81f 6815893 hang mounting a dataset after booting into a new boot environment (141445-01)
9179:d8fbd96b79b3 6790064 zfs needs to determine uid and gid earlier in create process (141445-01)
9214:8d350e5d04aa 6604992 forced unmount + being in .zfs/snapshot/<snap1> = not happy (141909-01) 6810367 assertion failed: dvp->v_flag & VROOT, file: ../../common/fs/gfs.c, line: 426 (141909-01)
9229:e3f8b41e5db4 6807765 ztest_dsl_dataset_promote_busy needs to clean up after ENOSPC (141445-01)
9230:e4561e3eb1ef 6821169 offlining a device results in checksum errors (141445-01) 6821170 ZFS should not increment error stats for unavailable devices (141445-01) 6824006 need to increase issue and interrupt taskqs threads in zfs (141445-01)
9234:bffdc4fc05c4 6792139 recovering from a suspended pool needs some work (141445-01) 6794830 reboot command hangs on a failed zfs pool (141445-01)
9246:67c03c93c071 6824062 System panicked in zfs_mount due to NULL pointer dereference when running btts and svvs tests (141909-01)
9276:a8a7fc849933 6816124 System crash running zpool destroy on broken zpool (141445-03)
9355:09928982c591 6818183 zfs snapshot -r is slow due to set_snap_props() doing txg_wait_synced() for each new snapshot (141445-03)
9391:413d0661ef33 6710376 log device can show incorrect status when other parts of pool are degraded (141445-03)
9396:f41cf682d0d3 (part already merged) 6501037 want user/group quotas on ZFS (141445-03) 6827260 assertion failed in arc_read(): hdr == pbuf->b_hdr (141445-03) 6815592 panic: No such hold X on refcount Y from zfs_znode_move (141445-03) 6759986 zfs list shows temporary %clone when doing online zfs recv (141445-03)
9404:319573cd93f8 6774713 zfs ignores canmount=noauto when sharenfs property != off (141445-03)
9425:e7ffacaec3a8 6799895 spa_add_spares() needs to be protected by config lock (141445-03) 6826466 want to post sysevents on hot spare activation (141445-03) 6826468 spa 'allowfaulted' needs some work (141445-03) 6826469 kernel support for storing vdev FRU information (141445-03) 6826470 skip posting checksum errors from DTL regions of leaf vdevs (141445-03) 6826471 I/O errors after device remove probe can confuse FMA (141445-03) 6826472 spares should enjoy some of the benefits of cache devices (141445-03)
9443:2a96d8478e95 6833711 gang leaders shouldn't have to be logical (141445-03)
9463:d0bd231c7518 6764124 want zdb to be able to checksum metadata blocks only (141445-03)
10100:4a6965f6bef8 6856634 snv_117 not booting: zfs_parse_bootfs: error2 (141445-07)
10160:a45b03783d44 6861983 zfs should use new name <-> SID interfaces (N/A) 6862984 userquota commands can hang (141445-06)
10299:80845694147f 6696858 zfs receive of incremental replication stream can dereference NULL pointer and crash (N/A)
10302:a9e3d1987706 6696858 zfs receive of incremental replication stream can dereference NULL pointer and crash (fix lint) (N/A)
10575:2a8816c5173b (partial merge) 6882227 spa_async_remove() shouldn't do a full clear (142901-14)
10800:469478b180d9 6880764 fsync on zfs is broken if writes are greater than 32kb on a hard crash and no log attached (142901-09) 6793430 zdb -ivvvv assertion failure: bp->blk_cksum.zc_word[2] == dmu_objset_id(zilog->zl_os) (N/A)
10890:499786962772 6807339 spurious checksum errors when replacing a vdev (142901-13)
11249:6c30f7dfc97b 6906110 bad trap panic in zil_replay_log_record (142901-13) 6906946 zfs replay isn't handling uid/gid correctly (142901-13)
11454:6e69bacc1a5a 6898245 suspended zpool should not cause rest of the zfs/zpool commands to hang (142901-10)
11546:42ea6be8961b (partial merge) 6833999 3-way deadlock in dsl_dataset_hold_ref() and dsl_sync_task_group_sync() (142901-09)
MFC r211970:
Fix 'zfs allow' (maybe not only) returning:
cannot access dataset system/usr/home: Operation not supported
by including libzfs_impl.h. What libzfs_impl.h does is to redefine ioctl() to
be compatible with OpenSolaris. More specifically OpenSolaris returns ENOMEM
when buffer is too small and sets field zc_nvlist_dst_size to the size that
will be big enough for the data. In FreeBSD case ioctl() doesn't copy data
structure back in case of a failure. We work-around it in kernel and libzfs by
returning 0 from ioctl() and always checking if zc_nvlist_dst_size hasn't
changed. For this work-around to work in pyzfs we need this compatible ioctl()
which is implemented in libzfs_impl.h.
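The work-around can be sketched as a wrapper that turns a grown size
field back into the ENOMEM failure the callers expect. This is an
illustration under stated assumptions: struct sk_cmd, sk_ioctl() and the
payload are hypothetical stand-ins for zfs_cmd_t, the kernel ioctl and a
packed nvlist; only the size-check idea is taken from the text above.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct sk_cmd {
	void	 *dst;		/* destination buffer */
	uint64_t  dst_size;	/* in: buffer size; out: size needed or used */
};

static const char sk_payload[] = "packed nvlist bytes would go here";

/* Hypothetical kernel side: returns 0 even when the buffer is too small. */
static int
sk_ioctl(struct sk_cmd *zc)
{
	if (zc->dst_size < sizeof(sk_payload)) {
		zc->dst_size = sizeof(sk_payload);	/* report needed size */
		return (0);
	}
	memcpy(zc->dst, sk_payload, sizeof(sk_payload));
	zc->dst_size = sizeof(sk_payload);
	return (0);
}

/* Compatibility wrapper: a larger size after success means "too small". */
static int
sk_compat_ioctl(struct sk_cmd *zc)
{
	uint64_t oldsize;
	int ret;

	assert(zc != NULL);
	oldsize = zc->dst_size;
	ret = sk_ioctl(zc);
	if (ret == 0 && oldsize < zc->dst_size) {
		errno = ENOMEM;
		ret = -1;
	}
	return (ret);
}

/* Caller loop: grow the buffer until the request fits. */
static char *
sk_fetch(size_t initial)
{
	struct sk_cmd zc;
	void *n;

	assert(initial > 0);
	zc.dst_size = initial;
	zc.dst = malloc(initial);
	if (zc.dst == NULL)
		return (NULL);
	while (sk_compat_ioctl(&zc) != 0) {
		if (errno != ENOMEM ||
		    (n = realloc(zc.dst, (size_t)zc.dst_size)) == NULL) {
			free(zc.dst);
			return (NULL);
		}
		zc.dst = n;
	}
	return (zc.dst);
}
```

A caller that used the plain ioctl without this check would see success
and read an unfilled buffer, which matches the failure mode described
for pyzfs.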
MFC r211971:
Print errors on stderr.
MFC r211972:
Give user a hint what to do when /usr/lib/zfs/pyzfs.py is missing.
MFC r212050:
When upgrading a pool which contains the root file system, give the user
a hint that the boot code should be updated.
MFC r212605:
Add missing vop_vector zfsctl_ops_shares
Add missing locks around VOP_READDIR and VOP_GETATTR with z_shares_dir
Fix a bug with sched_affinity() where it checks td_pinned of another
thread in a racy manner, which can lead to attempting to migrate a
thread that is pinned to a CPU. Instead, have sched_switch() determine
which CPU a thread should run on if the current one is not allowed.
KASSERT in sched_bind() that the thread is not yet pinned to a CPU.
KASSERT in sched_switch() that only migratable threads or those moving
due to a sched_bind() are changing CPUs.
Note that this is direct commit as ipi_cpu() only exists in CURRENT.
Fix a variety of race conditions and errors in VSID allocation in both
the 32 and 64-bit PMAP modules, some dating back to the original PMAP
import from NetBSD. This fixes a variety of potential crashes and memory
corruption bugs, especially on SMP systems under heavy load.
MFC: r212043
Add a null_remove() function to nullfs, so that the v_usecount
of the lower level vnode is incremented to greater than 1 when
the upper level vnode's v_usecount is greater than one. This
is necessary for the NFS clients, so that they will do a silly
rename of the file instead of actually removing it when the
file is still in use. It is "racy", since the v_usecount is
incremented in many places in the kernel with
minimal synchronization, but an extraneous silly rename is
preferred to not doing a silly rename when it is required.
The only other file systems that currently check the value
of v_usecount in their VOP_REMOVE() functions are nwfs and
smbfs. These file systems choose to fail a remove when the
v_usecount is greater than 1 and I believe will function
more correctly with this patch, as well.
MFC r211213:
The buffer's b_vflags field is not always properly protected by the
bufobj lock. If b_bufobj is not NULL, then the bufobj lock should be
held when manipulating the flags. Not doing this sometimes leaves
BV_BKGRDINPROG erroneously set, causing softdep's getdirtybuf() to
get stuck indefinitely in the "getbuf" sleep, waiting for a background
write to finish which is not actually being performed.
Add INVARIANTS checking that numfreebufs values are sane. Also add a
per-buf flag to catch if a buf is double-counted in the free count.
This code was useful to debug an instance where a local patch at Isilon
was incorrectly managing numfreebufs for a new buf state.