FreeBSD/FreeBSD.git
asomers [Mon, 3 Jun 2019 20:45:32 +0000 (20:45 +0000)]
fusefs: don't require FUSE_EXPORT_SUPPORT for async invalidation

In r348560 I thought that FUSE_EXPORT_SUPPORT was required for cases where
the node to be invalidated (or the parent of the entry to be invalidated)
wasn't cached.  But I realize now that that's not the case.  During entry
invalidation, if the parent isn't in the vfs hash table, then it must've
been reclaimed.  And since fuse_vnop_reclaim does a cache_purge, that means
the entry to be invalidated has already been removed from the namecache.
And during inode invalidation, if the inode to be invalidated isn't in the
vfs hash table, then it too must've been reclaimed.  In that case it will
have no buffer cache to invalidate.

Sponsored by: The FreeBSD Foundation

asomers [Mon, 3 Jun 2019 17:34:01 +0000 (17:34 +0000)]
fusefs: support asynchronous cache invalidation

Protocol 7.12 adds a way for the server to notify the client that it should
invalidate an inode's data cache and/or attributes.  This commit implements
that mechanism.  Unlike Linux's implementation, ours requires that the file
system also supports FUSE_EXPORT_SUPPORT (NFS-style lookups).  Otherwise the
invalidation operation will return EINVAL.

Sponsored by: The FreeBSD Foundation

asomers [Sat, 1 Jun 2019 00:11:19 +0000 (00:11 +0000)]
fusefs: support name cache invalidation

Protocol 7.12 adds a way for the server to notify the client that it should
invalidate an entry from its name cache.  This commit implements that
mechanism.

Sponsored by: The FreeBSD Foundation

asomers [Fri, 31 May 2019 21:22:58 +0000 (21:22 +0000)]
fusefs: check the vnode cache when looking up files for the NFS server

FUSE allows entries to be cached for a limited amount of time.  fusefs's
vnop_lookup method already implements that using the timeout functionality
of cache_lookup/cache_enter_time.  However, lookups for the NFS server go
through a separate path: vfs_vget.  That path can't use the same timeout
functionality because cache_lookup/cache_enter_time only work on pathnames,
whereas vfs_vget works by inode number.

This commit adds entry timeout information to the fuse vnode structure, and
checks it during vfs_vget.  This allows the NFS server to take advantage of
cached entries.  It's also the same path that FUSE's asynchronous cache
invalidation operations will use.

Sponsored by: The FreeBSD Foundation
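
As a rough illustration of the entry-timeout check described in the commit above, the sketch below stores an absolute expiry time per node and compares it against the current time. The struct and function names are hypothetical stand-ins, not the actual fuse vnode structure or the code in vfs_vget.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for the per-vnode data; not the real structure. */
struct example_node {
    struct timespec entry_cache_timeout;    /* absolute expiry time */
};

/* Return true if the cached entry is still valid at time "now". */
static bool
example_entry_is_fresh(const struct example_node *node,
    const struct timespec *now)
{
    if (now->tv_sec != node->entry_cache_timeout.tv_sec)
        return (now->tv_sec < node->entry_cache_timeout.tv_sec);
    return (now->tv_nsec < node->entry_cache_timeout.tv_nsec);
}
```

A lookup by inode number could consult such a check first and only contact the server when the entry has expired.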

asomers [Fri, 31 May 2019 17:02:37 +0000 (17:02 +0000)]
fusefs: prefer FUSE_ROOT_ID to literal 1 in the tests

Sponsored by: The FreeBSD Foundation

asomers [Wed, 29 May 2019 16:39:52 +0000 (16:39 +0000)]
fusefs: raise protocol level to 7.12

This commit raises the protocol level and adds backwards-compatibility code
to handle structure size changes.  It doesn't implement any new features.
The new features added in protocol 7.12 are:

* server-side umask processing (which FreeBSD won't do)
* asynchronous inode and directory entry invalidation (which I'll do next)

Sponsored by: The FreeBSD Foundation

asomers [Wed, 29 May 2019 02:03:08 +0000 (02:03 +0000)]
fusefs: add comments explaining why 7.11 features aren't implemented

Protocol 7.11 adds two new features, but neither of them was defined
correctly.  FUSE_IOCTL messages don't work for 32-bit daemons on a 64-bit
host (fixed in protocol 7.16).  FUSE_POLL is basically unusable until 7.21.
Before 7.21, the client can't choose which events to register for; the
client registers for "something" and the server replies to say which events
the client is registered for.  Also, before 7.21 there was no way for a
client to deregister a file handle.

Sponsored by: The FreeBSD Foundation

asomers [Wed, 29 May 2019 00:54:49 +0000 (00:54 +0000)]
fusefs: raise protocol level to 7.11

This commit adds the definitions for protocol 7.11 but doesn't yet implement
the new features.  The new features are optional, so they can come later.

Sponsored by: The FreeBSD Foundation

asomers [Wed, 29 May 2019 00:01:36 +0000 (00:01 +0000)]
fusefs: raise protocol level to 7.10

Protocol version 7.10 has only one new feature, and I'm choosing not to
implement it, so this commit is basically a noop.  The sole new feature is
the FOPEN_NONSEEKABLE flag, which a fuse file system can return to indicate
that a certain file handle cannot be seeked.  However, I'm unaware of any
file system in ports that uses this flag.

Sponsored by: The FreeBSD Foundation

asomers [Tue, 28 May 2019 01:09:19 +0000 (01:09 +0000)]
fusefs: set the flags fields of fuse_write_in and fuse_read_in

These fields are supposed to contain the file descriptor flags as supplied
to open(2) or set by fcntl(2).  The feature is kind of useless on FreeBSD
since we don't supply all of these flags to fuse (because of the weak
relationship between struct file and struct vnode).  But we should at least
set the access mode flags (O_RDONLY, etc).

This is the last fusefs change needed to get full protocol 7.9 support.
There are still a few options we don't support for good reason (mandatory
file locking is dumb, flock support is broken in the protocol until 7.17,
etc), but there's nothing else to do at this protocol level.

Sponsored by: The FreeBSD Foundation
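
One way to express "at least the access mode flags" from the commit above is to mask the open(2) flags with O_ACCMODE. This is only a userland sketch with a hypothetical helper name; it does not reproduce how fusefs actually fills in fuse_write_in or fuse_read_in.

```c
#include <fcntl.h>

/* Keep only the access-mode bits (O_RDONLY, O_WRONLY or O_RDWR). */
static int
example_access_mode_only(int open_flags)
{
    return (open_flags & O_ACCMODE);
}
```

For instance, passing `O_WRONLY | O_APPEND` leaves just `O_WRONLY`.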

asomers [Tue, 28 May 2019 00:03:46 +0000 (00:03 +0000)]
fusefs: flock(2) locks must be implemented in-kernel

If a FUSE file system sets the FUSE_POSIX_LOCKS flag then it can support
fcntl(2)-style locks directly.  However, the protocol does not adequately
support flock(2)-style locks until revision 7.17.  They must be implemented
locally in-kernel instead.  This unfortunately breaks the interoperability
of fcntl(2) and flock(2) locks for file systems that support the former.
C'est la vie.

Prior to this commit flock(2) would get sent to the server as a
fcntl(2)-style lock with the lock owner field set to stack garbage.

Sponsored by: The FreeBSD Foundation

asomers [Mon, 27 May 2019 22:25:39 +0000 (22:25 +0000)]
fusefs: clear fuse_getattr_in.getattr_flags

Protocol 7.9 adds this field.  We could use it to store the file handle of
the file whose attributes we're requesting.  However, that requires extra
work at runtime to look up a file handle, and I'm not aware of any file
systems that care.  So it's easiest just to clear it.

Sponsored by: The FreeBSD Foundation

asomers [Mon, 27 May 2019 21:51:43 +0000 (21:51 +0000)]
fusefs: fix an alignment issue in the tests on arm

Sponsored by: The FreeBSD Foundation

asomers [Mon, 27 May 2019 21:36:28 +0000 (21:36 +0000)]
fusefs: set FUSE_WRITE_CACHE when writing from cache

This bit tells the server that we're not sure which uid, gid, and/or pid
originated the write.  I don't know of a single file system that cares, but
it's part of the protocol.

Sponsored by: The FreeBSD Foundation

asomers [Mon, 27 May 2019 17:14:46 +0000 (17:14 +0000)]
fusefs: remove obsolete comments in the tests

Sponsored by: The FreeBSD Foundation

asomers [Mon, 27 May 2019 17:08:16 +0000 (17:08 +0000)]
fusefs: make the tests more cplusplusy

* Prefer std::unique_ptr to raw pointers
* Prefer pass-by-reference to pass-by-pointer
* Prefer static_cast to C-style cast, unless it's too much typing

Reported by: ngie
Sponsored by: The FreeBSD Foundation

asomers [Sun, 26 May 2019 03:52:35 +0000 (03:52 +0000)]
fusefs: more build fixes

* Fix printf format strings on 32-bit OSes
* Fix -Wclass-memaccess violation on GCC-8 caused by using memset on an object
  of non-trivial type.
* Fix memory leak in MockFS::init
* Fix -Wcast-align error on i386 in expect_readdir
* Fix some heterogeneous comparison errors on 32-bit OSes.

Sponsored by: The FreeBSD Foundation

asomers [Sat, 25 May 2019 21:40:27 +0000 (21:40 +0000)]
fusefs: misc build fixes

* Only build the tests on platforms with C++14 support
* Fix an undefined symbol error on lint builds
* Remove an unused function: fiov_clear

Sponsored by: The FreeBSD Foundation

asomers [Fri, 24 May 2019 05:12:43 +0000 (05:12 +0000)]
fusefs: implement FUSE_ASYNC_READ

If a daemon sets the FUSE_ASYNC_READ flag during initialization, then the
client is allowed to issue multiple concurrent reads for the same file
handle.  Otherwise concurrent reads are not allowed.  This commit implements
it.  Previously we unconditionally disallowed concurrent reads.

Sponsored by: The FreeBSD Foundation

asomers [Fri, 24 May 2019 00:56:50 +0000 (00:56 +0000)]
fusefs: fix some garbage left behind by r348209

Sponsored by: The FreeBSD Foundation

asomers [Thu, 23 May 2019 23:06:26 +0000 (23:06 +0000)]
fusefs: fix exporting fuse filesystems with nfsd

A previous commit made fuse exportable via userland NFS servers.
Compatibility with the in-kernel nfsd required two more changes:

* During read and write operations, implicitly do a FUSE_OPEN if there isn't
  already a valid file handle.  That's because nfsd never calls VOP_OPEN.
* During VOP_READDIR, if an implicit open was necessary, directory offsets
  from a previous VOP_READDIR may not be valid, so VOP_READDIR may have to
  start from the beginning and read until it encounters the requested
  offset.

I've done only limited testing over NFS, so there are probably still some
more bugs.  Thanks to rmacklem for all of the readdir changes, which he had
made for his pnfs work.

Sponsored by: The FreeBSD Foundation

asomers [Thu, 23 May 2019 22:57:57 +0000 (22:57 +0000)]
fusefs: assume the mountpoint's generation is 0

This seems to be libfuse's behavior (its documentation notwithstanding).

Sponsored by: The FreeBSD Foundation

asomers [Thu, 23 May 2019 00:44:01 +0000 (00:44 +0000)]
fusefs: Make fuse file systems NFS-exportable

This commit adds the VOPs needed by userspace NFS servers (tested with
net/unfs3).  More work is needed to make the in-kernel nfsd work, because of
its stateless nature.  It doesn't open files prior to doing I/O.  Also, the
NFS-related VOPs currently ignore the entry cache.

Sponsored by: The FreeBSD Foundation

asomers [Thu, 23 May 2019 00:22:03 +0000 (00:22 +0000)]
fusefs: improve attribute caching

Consolidate all calls to fuse_vnode_setsize as a result of a file attribute
change to one location in fuse_internal_setattr.  There are still a few
calls elsewhere that happen as a result of a write.

Sponsored by: The FreeBSD Foundation

asomers [Wed, 22 May 2019 23:30:51 +0000 (23:30 +0000)]
fusefs: fix "recursing on non recursive lockmgr" panic

When mounted with -o default_permissions and when
vfs.fusefs.data_cache_mode=2, fuse_io_strategy would try to clear the suid
bit after a successful write by a non-owner.  When combined with a
not-yet-committed attribute-caching patch I'm working on, and if the
FUSE_SETATTR response indicates an unexpected filesize (legal, if the file
system has other clients), this would end up calling vtruncbuf.  That would
panic, because the buffer lock was already held by bufwrite or bufstrategy
or something else upstack from fuse_vnop_strategy.

Sponsored by: The FreeBSD Foundation

asomers [Wed, 22 May 2019 19:49:25 +0000 (19:49 +0000)]
fusefs: remove the vfs.fusefs.sync_resize sysctl, correctly this time

In r347547 I intended to remove the vfs.fusefs.sync_resize sysctl, leaving
fusefs's behavior as though sync_resize had its default value.  But I forgot
that I had already turned off sync_resize in my development system's
/etc/sysctl.conf.

This commit completely removes the optional behavior that was formerly
controlled by sync_resize.  There's no need for explicitly calling
FUSE_SETATTR after every FUSE_WRITE that extends a file.  The daemon can
infer that the file is being extended.  If this sysctl was added as a
workaround for a buggy daemon, there's no clue as to what that daemon may
have been.

Sponsored by: The FreeBSD Foundation

asomers [Tue, 21 May 2019 19:34:39 +0000 (19:34 +0000)]
fusefs: Allow update mounts

Allow "mount -u" to change some mount options for fusefs.

Sponsored by: The FreeBSD Foundation

asomers [Tue, 21 May 2019 15:59:17 +0000 (15:59 +0000)]
getvfsbyname: prefer sizeof to strlen even for constants

Clang is smart enough to evaluate strlen() of a constant at compile-time.
However, that won't work in the future if we compile libc with
-ffreestanding.

Reported by: kib
Dissenting: ngie, cem
Sponsored by: The FreeBSD Foundation
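
To illustrate the sizeof-versus-strlen point from the commit above: applying sizeof to a string literal yields its length plus the terminating NUL as a constant expression, with no dependence on the compiler folding a strlen() call. The macro and function names below are made up for this sketch and are not taken from libc.

```c
#include <stddef.h>
#include <string.h>

#define EXAMPLE_PREFIX      "fusefs"
/* sizeof counts the terminating NUL, so subtract one. */
#define EXAMPLE_PREFIX_LEN  (sizeof(EXAMPLE_PREFIX) - 1)

/* The length is a constant expression, so it can be checked at compile time. */
_Static_assert(EXAMPLE_PREFIX_LEN == 6, "unexpected prefix length");

/* Return nonzero if "name" begins with the prefix. */
static int
example_starts_with_prefix(const char *name)
{
    return (strncmp(name, EXAMPLE_PREFIX, EXAMPLE_PREFIX_LEN) == 0);
}
```

The compile-time assertion is the part that strlen() cannot offer in standard C, since a function call is not a constant expression.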

asomers [Mon, 20 May 2019 20:55:01 +0000 (20:55 +0000)]
fusefs: eliminate a superfluous fuse_node_setparent

Sponsored by: The FreeBSD Foundation

asomers [Mon, 20 May 2019 20:54:09 +0000 (20:54 +0000)]
fusefs: unset MNT_LOCAL

The kernel can't tell whether or not a fuse file system is truly local.  But
what really matters is two things:

1) Can I/O to a file system block indefinitely?
2) Can the file system bypass the O_BENEATH restriction during lookup?

For fuse, the answer to both of those questions is yes.  So as far as the
kernel is concerned, it's a non-local file system.

Sponsored by: The FreeBSD Foundation

asomers [Mon, 20 May 2019 19:36:36 +0000 (19:36 +0000)]
special-case getvfsbyname(3) for fusefs(5)

fusefs file systems may have a fsname subtype (set by mount_fusefs's "-o
subtype" option) that gets appended to the fsname as returned by statfs(2).
The subtype is set on a per-mount basis so it isn't part of the struct
vfsconf.  Special-case getvfsbyname to match either the full "fusefs.foobar"
or short "fusefs" fsname.

Sponsored by: The FreeBSD Foundation
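
The matching rule described in the commit above (accept either the short "fusefs" name or "fusefs." followed by a subtype) could be sketched as follows. The helper name is hypothetical and this is not the libc implementation of getvfsbyname.

```c
#include <string.h>

/*
 * Return nonzero if "fsname" is exactly "fusefs" or "fusefs." followed by
 * a subtype.  Illustrative only.
 */
static int
example_is_fusefs_name(const char *fsname)
{
    static const char prefix[] = "fusefs";
    const size_t len = sizeof(prefix) - 1;

    if (strncmp(fsname, prefix, len) != 0)
        return (0);
    return (fsname[len] == '\0' || fsname[len] == '.');
}
```

Checking the character after the prefix keeps an unrelated name such as "fusefsx" from matching.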

asomers [Mon, 20 May 2019 15:58:44 +0000 (15:58 +0000)]
mount_fusefs(8): document the -o subtype option.

Sponsored by: The FreeBSD Foundation

asomers [Thu, 16 May 2019 23:17:39 +0000 (23:17 +0000)]
fusefs: forward UTIME_NOW to the server

If a user sets both atime and mtime to UTIME_NOW when calling a syscall like
utimensat(2), allow the server to choose what "now" means.  Due to the
design of FreeBSD's VFS, it's not possible to do this for just one of atime
or mtime; it's all or none.

PR: 237181
Sponsored by: The FreeBSD Foundation
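
From the caller's side, the case this commit covers looks like the sketch below: both timestamps are set to UTIME_NOW in a single utimensat(2) call. The helper name is made up for illustration; what the file system does with the request is up to the kernel and the server.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/stat.h>
#include <time.h>

/* Ask the file system to set both atime and mtime to "now". */
static int
example_touch_now(const char *path)
{
    struct timespec times[2];

    times[0].tv_sec = 0;
    times[0].tv_nsec = UTIME_NOW;   /* atime */
    times[1].tv_sec = 0;
    times[1].tv_nsec = UTIME_NOW;   /* mtime */
    return (utimensat(AT_FDCWD, path, times, 0));
}
```

Setting only one of the two fields to UTIME_NOW is the case the commit says cannot be forwarded as-is.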

asomers [Thu, 16 May 2019 22:50:04 +0000 (22:50 +0000)]
fusefs: allow the server to specify st_blksize

If the server sets fuse_attr.blksize to a nonzero value in the response to
FUSE_GETATTR, then the client should use that as the value for
stat.st_blksize.

Sponsored by: The FreeBSD Foundation
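
The rule in the commit above amounts to a simple fallback. In the sketch below the default of 4096 is an assumption chosen for illustration, and the function name is hypothetical; the value the kernel actually falls back to is not stated in the commit message.

```c
#include <stdint.h>

/* Hypothetical default; the real fallback value is not given here. */
#define EXAMPLE_DEFAULT_BLKSIZE 4096u

/* Prefer the server-supplied block size when it is nonzero. */
static uint32_t
example_effective_blksize(uint32_t server_blksize)
{
    return (server_blksize != 0 ? server_blksize : EXAMPLE_DEFAULT_BLKSIZE);
}
```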

asomers [Thu, 16 May 2019 17:24:11 +0000 (17:24 +0000)]
fusefs: Upgrade FUSE protocol to version 7.9.

This commit upgrades the FUSE API to protocol 7.9 and adds unit tests for
backwards compatibility with servers built for version 7.8.  It doesn't
implement any of 7.9's new features yet.

Sponsored by: The FreeBSD Foundation

asomers [Wed, 15 May 2019 22:51:25 +0000 (22:51 +0000)]
fusefs: diff reduction vs the upstream sources

fuse_kernel.h defines the structures used by the FUSE protocol.  Originally
it came from libfuse, but the current source of truth is the Linux kernel.
This commit minimizes the diffs between our version and the Linux version as
of 21f3da95d (protocol version 7.8).

The flags field of struct fuse_listxattr_out and fuse_listxattr_in was an
error in our header.  Those fields don't exist in Linux or libfuse, and
they've never been used in FreeBSD.  In fact, those structs don't even exist
in Linux and libfuse; those projects confusingly overload the identical
fuse_getxattr_in and fuse_getxattr_out structs.

Sponsored by: The FreeBSD Foundation

asomers [Wed, 15 May 2019 20:01:41 +0000 (20:01 +0000)]
fusefs: fix more intermittency in the dev_fuse_poll tests

When using poll, kevent, or select there was a race window during which it
would be impossible to shut down the daemon.  The problem was that poll,
kevent, and select don't return when the file descriptor gets closed (or
maybe it was that the file descriptor got closed before those syscalls were
entered?).  The solution is to impose a timeout on those syscalls, and check
m_quit after they time out.

Sponsored by: The FreeBSD Foundation
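
The fix described above (time out the blocking call, then re-check a quit flag) can be sketched with poll(2) as follows. The flag `example_quit` stands in for the test daemon's m_quit and the function name is hypothetical; the real tests are written in C++ and are not reproduced here.

```c
#include <errno.h>
#include <poll.h>
#include <signal.h>
#include <stdbool.h>

/* Hypothetical stand-in for the test daemon's m_quit flag. */
static volatile sig_atomic_t example_quit;

/*
 * Wait until fd is readable or a shutdown has been requested.  Returns
 * true if fd is readable, false on shutdown, hangup or error.
 */
static bool
example_wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int n;

    while (!example_quit) {
        pfd.revents = 0;
        n = poll(&pfd, 1, timeout_ms);
        if (n > 0)
            return ((pfd.revents & POLLIN) != 0);
        if (n < 0 && errno != EINTR)
            return (false);
        /* Timed out or interrupted: loop and re-check the flag. */
    }
    return (false);
}
```

Because the flag is re-read after every timeout, the loop exits within one timeout period of a shutdown request even if the descriptor never becomes readable.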

asomers [Wed, 15 May 2019 19:23:29 +0000 (19:23 +0000)]
fusefs: fix some intermittency in the Kqueue.data test

Expect the FUSE_GETATTR operations for bar and baz to come in either order.

Sponsored by: The FreeBSD Foundation

asomers [Wed, 15 May 2019 00:38:52 +0000 (00:38 +0000)]
fusefs: don't track a file's size in two places

fuse_vnode_data.filesize was mostly redundant with
fuse_vnode_data.cached_attrs.st_size, but didn't have exactly the same
meaning.  It was very confusing.  This commit eliminates the former.  It
also eliminates fuse_vnode_refreshsize, which ignored the cache timeout
value.

Sponsored by: The FreeBSD Foundation

asomers [Wed, 15 May 2019 00:15:40 +0000 (00:15 +0000)]
mount_fusefs(8): fix inverted condition check from r347544

Sponsored by: The FreeBSD Foundation

asomers [Mon, 13 May 2019 23:30:06 +0000 (23:30 +0000)]
fusefs: eliminate superfluous FUSE_GETATTR when filesize=0

fuse_vnode_refreshsize was using 0 as a flag value for filesize meaning
"uninitialized" (thanks to the malloc(...M_ZERO) in fuse_vnode_alloc).  But
this led to unnecessary getattr operations when the filesize legitimately
happened to be zero.  Fix by adding a distinct flag value.

Sponsored by: The FreeBSD Foundation
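
The problem and its fix can be sketched as below: a sentinel that no real file size can take replaces 0 as the "uninitialized" marker. The sentinel value (UINT64_MAX) and all names here are assumptions made for this sketch; the commit message only says that a distinct flag value was added.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sentinel meaning "size not yet known". */
#define EXAMPLE_SIZE_UNKNOWN    UINT64_MAX

/* Hypothetical stand-in for the per-vnode data. */
struct example_vdata {
    uint64_t filesize;
};

static void
example_vdata_init(struct example_vdata *vd)
{
    vd->filesize = EXAMPLE_SIZE_UNKNOWN;
}

/* A legitimately empty file (size 0) no longer triggers a refresh. */
static bool
example_needs_getattr(const struct example_vdata *vd)
{
    return (vd->filesize == EXAMPLE_SIZE_UNKNOWN);
}
```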

asomers [Mon, 13 May 2019 20:57:21 +0000 (20:57 +0000)]
fusefs: remove the vfs.fusefs.data_cache_invalidate sysctl

This sysctl was added > 6.5 years ago and I don't know why.  The description
seems at odds with the code.  While it's supposed to "discard clean cached
data" during VOP_INACTIVE, it looks like it would discard any cached data,
clean or otherwise.

Sponsored by: The FreeBSD Foundation

asomers [Mon, 13 May 2019 20:42:09 +0000 (20:42 +0000)]
fusefs: remove the vfs.fusefs.mmap_enable sysctl

This sysctl was added > 6.5 years ago for no clear reason.  Perhaps it was
intended to gate an unstable feature?  But now there's no reason to globally
disable mmap.  I'm not deleting the -ono_mmap mount option just yet, because
it might be useful as a workaround for bug 237588.

Sponsored by: The FreeBSD Foundation

asomers [Mon, 13 May 2019 20:31:10 +0000 (20:31 +0000)]
fusefs: remove the vfs.fusefs.refresh_size sysctl

This was added > 6.5 years ago with no evident reason why.  It probably had
something to do with the incomplete cached attribute implementation.  But
cached attributes work now.  I see no reason to retain this sysctl.

Sponsored by: The FreeBSD Foundation

asomers [Mon, 13 May 2019 19:48:57 +0000 (19:48 +0000)]
fusefs: commit missing file from r347547

Sponsored by: The FreeBSD Foundation

asomers [Mon, 13 May 2019 19:47:31 +0000 (19:47 +0000)]
fusefs: remove the vfs.fusefs.sync_resize sysctl

This sysctl was added > 6.5 years ago for no clear purpose.  I'm guessing
that it may have had something to do with the incomplete attribute cache.
But the attribute cache works now.  Since there's no clear motivation for
this sysctl, it's best to remove it.

Sponsored by: The FreeBSD Foundation

asomers [Mon, 13 May 2019 19:31:09 +0000 (19:31 +0000)]
fusefs: remove the vfs.fusefs.fix_broken_io sysctl

This looks like it may have been a workaround for a specific buggy FUSE
filesystem.  However, there's no information about what that bug may have
been, and the workaround is > 6.5 years old, so I consider the sysctl to be
unmaintainable.

Sponsored by: The FreeBSD Foundation

asomers [Mon, 13 May 2019 19:03:46 +0000 (19:03 +0000)]
fusefs: reap dead sysctls

Remove the "sync_unmount" and "init_backgrounded" sysctls and the associated
options from mount_fusefs.  Add no backwards-compatibility hidden options to
mount_fusefs because these options never had any effect, and are therefore
unlikely to be used.

Sponsored by: The FreeBSD Foundation

asomers [Mon, 13 May 2019 18:25:55 +0000 (18:25 +0000)]
MFHead @347527

Sponsored by: The FreeBSD Foundation

asomers [Mon, 13 May 2019 15:39:54 +0000 (15:39 +0000)]
[skip ci] fusefs: remove an obsolete comment

Sponsored by: The FreeBSD Foundation

asomers [Mon, 13 May 2019 15:39:19 +0000 (15:39 +0000)]
fusefs: enhance an SDT probe added in r346998

Sponsored by: The FreeBSD Foundation

ae [Mon, 13 May 2019 14:07:02 +0000 (14:07 +0000)]
Do not leak memory used for binary filter.

ae [Mon, 13 May 2019 13:45:28 +0000 (13:45 +0000)]
Rework locking in BPF code to remove rwlock from fast path.

At high packet rates, contention on the rwlock in the bpf_*tap*() functions
can lead to dropped packets. To avoid this, migrate this code to use the
epoch(9) KPI and ConcurrencyKit's lists.

* all lists changed to use CK_LIST;
* reference counting added to bpf_if and bpf_d;
* now bpf_if references ifnet and releases this reference on destroy;
* each bpf_d descriptor references bpf_if when it is attached;
* new struct bpf_program_buffer introduced to keep BPF filter programs;
* bpf_program_buffer, bpf_d and bpf_if structures are freed by
  epoch_call();
* bpf_freelist and ifnet_departure event are no longer needed, thus
  both are removed;

Reviewed by: melifaro
Sponsored by: Yandex LLC
Differential Revision: https://reviews.freebsd.org/D20224

manu [Mon, 13 May 2019 12:38:33 +0000 (12:38 +0000)]
Revert r347356 and r347371

passwd-related files need to be tagged as config files so that pkg update
will attempt to merge them when we install a new package.
We should use CONFS for that.
Revert for now until I come up with a better version of this patch as
it breaks pkgbase for users.

ae [Mon, 13 May 2019 08:34:13 +0000 (08:34 +0000)]
Revert r347402. After r347429 symlink is no longer needed.

markj [Mon, 13 May 2019 01:18:17 +0000 (01:18 +0000)]
Catch up with r347241.

MFC with: r347241

br [Sun, 12 May 2019 16:17:05 +0000 (16:17 +0000)]
Add support for HiFive Unleashed -- the board with a multi-core RISC-V SoC
from SiFive, Inc.

The first core on this SoC (hart 0) is a 64-bit microcontroller.

o Pick a hart to run the boot process using a hart lottery.
  This allows hart 0 to be excluded from running the boot process.
  (BBL releases hart 0 after the main harts, so it never wins the lottery).
o Renumber CPUs early during boot.
  Exclude non-MMU cores. Store the original hart ID in struct pcpu. This
  makes it possible to find the correct destination for IPIs and remote
  sfence calls.

Thanks to SiFive, Inc for the board provided.

Reviewed by: markj
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D20225
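
The "hart lottery" idea from the commit above can be sketched with a single atomic counter: every hart increments it, and the one that observes the initial value wins. This is a portable C11 illustration with hypothetical names, written under the assumption that the lottery is a simple first-to-increment race; the actual early-boot code is not shown in the commit message and is not reproduced here.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Shared counter; starts at zero. */
static atomic_uint example_lottery;

/* The first caller to bump the counter wins and runs the boot path. */
static bool
example_hart_wins_lottery(void)
{
    return (atomic_fetch_add(&example_lottery, 1) == 0);
}
```

A hart that is released late, as the commit says of hart 0 under BBL, always sees a nonzero counter and therefore loses.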

asomers [Sun, 12 May 2019 15:27:18 +0000 (15:27 +0000)]
fusefs: Report the number of available ops in kevent(2)

Just like /dev/devctl, /dev/fuse will now report the number of operations
available for immediate read in the kevent.data field during kevent(2).

Sponsored by: The FreeBSD Foundation

manu [Sun, 12 May 2019 15:27:01 +0000 (15:27 +0000)]
arm: allwinner: aw_clk_nm: Don't reparent the clock if we didn't ask

When looking for the best frequency don't change the clock parent if the
clock wasn't configured to do that.

mjg [Sun, 12 May 2019 07:56:01 +0000 (07:56 +0000)]
cache: fix a brainfart in r347505

If bumping the counter goes over the limit we have to decrement it back.

Previous code would only bump the counter after adding the entry (thus allowing
the cache to go over the limit).

Sponsored by: The FreeBSD Foundation
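
The bump-then-undo pattern described above can be sketched with C11 atomics. The counter name, the limit of 2 and the function name are all invented for this example; the namecache's real counters and limits are not shown.

```c
#include <stdatomic.h>
#include <stdbool.h>

static atomic_long example_numcache;        /* entries currently reserved */
static const long example_limit = 2;        /* hypothetical limit */

/*
 * Reserve a slot before adding an entry.  If the bump takes the counter
 * over the limit, decrement it back and report failure.
 */
static bool
example_reserve_slot(void)
{
    long n = atomic_fetch_add(&example_numcache, 1) + 1;

    if (n > example_limit) {
        atomic_fetch_sub(&example_numcache, 1);
        return (false);
    }
    return (true);
}
```

Reserving before insertion, rather than bumping afterwards, is what keeps the counter from settling above the limit.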

mjg [Sun, 12 May 2019 07:13:25 +0000 (07:13 +0000)]
seqc: fix sed-introduced typos (seqcuence -> sequence)

Sponsored by: The FreeBSD Foundation

mjg [Sun, 12 May 2019 07:11:44 +0000 (07:11 +0000)]
amd64: tidy up pagezero*/pagecopy (movq -> movl)

Sponsored by: The FreeBSD Foundation

mjg [Sun, 12 May 2019 06:59:22 +0000 (06:59 +0000)]
cache: bump numcache on entry, while here fix lnumcache type

Sponsored by: The FreeBSD Foundation

mjg [Sun, 12 May 2019 06:42:17 +0000 (06:42 +0000)]
amd64: fixup MEMMOVE comment (10 -> r10)

Sponsored by: The FreeBSD Foundation

mjg [Sun, 12 May 2019 06:39:30 +0000 (06:39 +0000)]
cache: push sdt probes in cache_zap_locked to code doing the work

Avoids branching to check which probe to evaluate. The very same check was
being done later to do the actual work.

Sponsored by: The FreeBSD Foundation

mjg [Sun, 12 May 2019 06:36:54 +0000 (06:36 +0000)]
x86: store pending bitmapped IPIs in per-cpu areas

This gets rid of the global cpu_ipi_pending array.

While here, replace cmpset with fcmpset in the delivery code and
opportunistically check if the given IPI is already pending.

Sponsored by: The FreeBSD Foundation
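
The two ideas in the commit above (a compare-and-set that hands back the observed value on failure, plus an early exit when the bit is already pending) can be approximated in portable C11, where atomic_compare_exchange_weak updates its "expected" argument on failure. The variable and function names are hypothetical, and a single global word stands in for what the commit describes as per-CPU storage.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for one CPU's pending-IPI bitmap. */
static _Atomic uint32_t example_ipi_pending;

/*
 * Mark IPI "bit" (0..31) as pending.  Returns true if the bit was newly
 * set, false if it was already pending and nothing needed to change.
 */
static bool
example_post_ipi(unsigned bit)
{
    uint32_t mask = UINT32_C(1) << bit;
    uint32_t old = atomic_load(&example_ipi_pending);

    do {
        if ((old & mask) != 0)
            return (false);     /* already pending; skip the write */
        /* On failure, "old" is refreshed with the current value. */
    } while (!atomic_compare_exchange_weak(&example_ipi_pending, &old,
        old | mask));
    return (true);
}
```

Because the failed exchange already returns the fresh value, the loop needs no separate reload before retrying.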

mjg [Sun, 12 May 2019 06:34:58 +0000 (06:34 +0000)]
amd64: stop re-reading curpc in suword

Plugs re-reads missed in r341719

Sponsored by: The FreeBSD Foundation

mjg [Sun, 12 May 2019 06:32:46 +0000 (06:32 +0000)]
random(4): depessimize arc4random

- __predict_false reseeding on entry as it is almost never true.
- don't blindly atomic_cmpset as on x86 it ends up dirtying the cacheline.
  it almost never succeeds per above
- fetch the timestamp prior to getting the cpu number

Reviewed by: cem
Approved by: secteam (delphij)
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D20242
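
The first two bullets above describe a common pattern: read the flag with a plain load, hint to the compiler that it is rarely set, and only attempt the compare-and-set when the load says it could succeed. The sketch below uses C11 atomics and the GCC/Clang `__builtin_expect` builtin; the flag and function names are hypothetical and this is not the arc4random code.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Branch hint in the spirit of __predict_false (GCC/Clang builtin). */
#define example_predict_false(x)    __builtin_expect(!!(x), 0)

/* Nonzero when a reseed has been requested. */
static atomic_int example_reseed_pending;

/* Returns true if this caller claimed the pending reseed. */
static bool
example_claim_reseed(void)
{
    int expected = 1;

    /* Plain load first: the common case never writes the cache line. */
    if (example_predict_false(atomic_load_explicit(&example_reseed_pending,
        memory_order_relaxed) != 0))
        return (atomic_compare_exchange_strong(&example_reseed_pending,
            &expected, 0));
    return (false);
}
```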

asomers [Sat, 11 May 2019 22:58:25 +0000 (22:58 +0000)]
fusefs: support kqueue for /dev/fuse

/dev/fuse was already pollable with poll and select.  Add support for
kqueue, too.  And add tests for polling with poll, select, and kqueue.

Sponsored by: The FreeBSD Foundation

rmacklem [Sat, 11 May 2019 22:41:58 +0000 (22:41 +0000)]
Factor code into two new functions in preparation for a future commit.

Factor code into two functions.
read_exportfile() is a function which reads the exports file(s) and calls
get_exportlist_one() to process each of them.
delete_export() is a function which deletes the exports in the kernel for a
file system.
The contents of these functions are just the same code as was used to do the
operations, moved into separate functions. As such, there is no semantic change.
This is being done in preparation for a future commit that will add an
option to do incremental changes of kernel exports upon receiving SIGHUP.

MFC after: 1 month

schweikh [Sat, 11 May 2019 19:31:54 +0000 (19:31 +0000)]
Correct a handful of typos.

cy [Sat, 11 May 2019 17:59:13 +0000 (17:59 +0000)]
Support the use of the ipsec kld.

X-MFC with: r347410

dougm [Sat, 11 May 2019 16:15:13 +0000 (16:15 +0000)]
A new parameter to blist_alloc specifies an upper bound on the size of
the allocation request, so that the blocks allocated are from the next
set of free blocks big enough to satisfy the minimum requirements of
the request, and the number of blocks allocated are as many as
possible, up to the specified maximum. The implementation of
swp_pager_getswapspace uses this parameter to ask for a number of
blocks between the new halved request size and the previous failed
request size. Thus a request for 32 blocks may fail, but instead of
getting only 16 blocks, the caller asks for 16 to 31 next, and
might get 19 or 27, which is closer to what they originally wanted.

I expect this to lead to bigger block allocations and less block
fragmentation, at least in some cases.

Approved by: kib (mentor)
Differential Revision: https://reviews.freebsd.org/D20001
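
The retry strategy in the commit above can be sketched as a loop that, after each failure, asks for a range between half the failed minimum and one less than it. Everything here is hypothetical: the allocator signature, the toy allocator with a 19-block free run, and the function names are inventions for illustration and do not match blist_alloc or swp_pager_getswapspace.

```c
/* Hypothetical allocator: returns the blocks granted, or 0 on failure. */
typedef int (*example_alloc_fn)(int min_blocks, int max_blocks);

/* Toy allocator whose largest free run is 19 blocks. */
static int
example_alloc_run19(int min_blocks, int max_blocks)
{
    const int free_run = 19;

    if (free_run < min_blocks)
        return (0);
    return (free_run < max_blocks ? free_run : max_blocks);
}

/* Try the full request first, then progressively smaller ranges. */
static int
example_getswapspace(example_alloc_fn alloc, int want)
{
    int min_blocks = want, max_blocks = want, got;

    while (min_blocks >= 1) {
        got = alloc(min_blocks, max_blocks);
        if (got != 0)
            return (got);
        /* Next attempt: from half the failed minimum to just under it. */
        max_blocks = min_blocks - 1;
        min_blocks /= 2;
    }
    return (0);
}
```

With the toy allocator, a request for 32 blocks fails, the retry asks for 16 to 31, and the caller ends up with 19 blocks instead of the 16 a plain halving would have produced.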

jhibbits [Sat, 11 May 2019 15:17:42 +0000 (15:17 +0000)]
revert r346588 for now

The rewrite of strcmp in assembly uses an instruction added in PowerISA
2.05, making it SIGILL on CPUs older than the POWER6, such as the PPC970 in
the PowerMac G5.  Revert this until we get clang+lld, or retire the in-tree
binutils in favor of newer binutils with IFUNC support, whichever comes
first.

manu [Sat, 11 May 2019 15:03:51 +0000 (15:03 +0000)]
twsi: Calculate the clock param based on the bus frequency

Instead of precalculating the different speeds, respect the bus frequency
and calculate the clock register parameter based on it.
If the platform didn't register the core clk, fall back on the precomputed
values (this is likely to be the case on Marvell boards).

5 years agoallwinner: clk: sun8i_r: Correct resets
manu [Sat, 11 May 2019 15:02:55 +0000 (15:02 +0000)]
allwinner: clk: sun8i_r: Correct resets

The i2c reset wasn't defined and some bits were wrong; correct them.

5 years agoallwinner: clk: prediv_mux: Init the current parent
manu [Sat, 11 May 2019 15:02:20 +0000 (15:02 +0000)]
allwinner: clk: prediv_mux: Init the current parent

Do not init the first parent but read the clock register to find
its current parent and init this one.

5 years agoUpdate leap-seconds to leap-seconds.3757622400.
delphij [Sat, 11 May 2019 14:22:21 +0000 (14:22 +0000)]
Update leap-seconds to leap-seconds.3757622400.

As per https://datacenter.iers.org/data/latestVersion/16_BULLETIN_C16.txt:

     INTERNATIONAL EARTH ROTATION AND REFERENCE SYSTEMS SERVICE (IERS)

SERVICE INTERNATIONAL DE LA ROTATION TERRESTRE ET DES SYSTEMES DE REFERENCE

SERVICE DE LA ROTATION TERRESTRE DE L'IERS
OBSERVATOIRE DE PARIS
61, Av. de l'Observatoire 75014 PARIS (France)
Tel.      : +33 1 40 51 23 35
e-mail    : services.iers@obspm.fr
http://hpiers.obspm.fr/eop-pc

                                              Paris, 07 January 2019

                                              Bulletin C 57

                                              To authorities responsible
                                              for the measurement and
                                              distribution of time

                          INFORMATION ON UTC - TAI

 NO leap second will be introduced at the end of June 2019.
 The difference between Coordinated Universal Time UTC and the
 International Atomic Time TAI is :

     from 2017 January 1, 0h UTC, until further notice : UTC-TAI = -37 s

 Leap seconds can be introduced in UTC at the end of the months of December
 or June,  depending on the evolution of UT1-TAI. Bulletin C is mailed every
 six months, either to announce a time step in UTC, or to confirm that there
 will be no time step at the next possible date.

                                            Christian BIZOUARD
                                            Director
                                            Earth Orientation Center of IERS
                                            Observatoire de Paris, France

Requested by: rgrimes
Obtained from: ftp://tycho.usno.navy.mil/pub/ntp/leap-seconds.3757622400
MFC after: 3 days

5 years agoCallers of swp_pager_getswapspace get either as many blocks as they
dougm [Sat, 11 May 2019 10:16:43 +0000 (10:16 +0000)]
Callers of swp_pager_getswapspace get either as many blocks as they
requested, or none, and in the latter case it is up to them to pick a
smaller request to make - which they always do by halving the failed
request. This change to swp_pager_getswapspace leaves the task of
downsizing the request to the function and not its caller. It still
does so by halving the original request.
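
A minimal sketch of moving the halving loop from the caller into the allocating function. The all-or-nothing primitive toy_alloc_exact, the wrapper toy_getspace_halving, and the single "largest free run" counter are all hypothetical stand-ins, not the swap pager's data structures.

```c
#include <assert.h>

/*
 * Hypothetical all-or-nothing primitive: succeeds only if the whole
 * request fits in the (single) free run modeled by *largest_free_run.
 */
static int
toy_alloc_exact(int *largest_free_run, int npages)
{
	if (npages > *largest_free_run)
		return (0);
	*largest_free_run -= npages;
	return (1);
}

/*
 * Try to allocate *io_npages; on failure, halve the request and retry.
 * On success, *io_npages holds the size actually granted.  Returns 1 on
 * success and 0 if not even a single page could be allocated.
 */
static int
toy_getspace_halving(int *largest_free_run, int *io_npages)
{
	int npages;

	assert(*io_npages > 0);
	for (npages = *io_npages; npages > 0; npages /= 2) {
		if (toy_alloc_exact(largest_free_run, npages)) {
			*io_npages = npages;
			return (1);
		}
	}
	return (0);
}
```

The caller passes the size it wants and reads back the size it got, so it no longer needs its own retry loop.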

Approved by: kib (mentor)
Differential Revision: https://reviews.freebsd.org/D20228

5 years agoWhen bitpos can't be implemented with an inline ffs* instruction,
dougm [Sat, 11 May 2019 09:09:10 +0000 (09:09 +0000)]
When bitpos can't be implemented with an inline ffs* instruction,
change the binary search so that it does not depend on a single bit
only being set in the bitmask. Use bitpos more generally, and avoid
some clearing of bits to accommodate its current behavior.
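
One way to write a binary search for the lowest set bit that stays correct when several bits are set is to test the low half of the remaining window at each step. The sketch below assumes a 64-bit mask; the name lowest_set_bit is hypothetical and this is not the committed bitpos code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return the index of the lowest set bit in mask (mask must be nonzero).
 * Each step asks whether the low `width` bits are all clear; if so, the
 * answer lies in the upper part, so shift it down and advance the index.
 * Bits set above the lowest one never influence the outcome.
 */
static int
lowest_set_bit(uint64_t mask)
{
	int lo = 0, width;

	assert(mask != 0);
	for (width = 32; width > 0; width /= 2) {
		uint64_t low_half = ((uint64_t)1 << width) - 1;

		if ((mask & low_half) == 0) {
			mask >>= width;
			lo += width;
		}
	}
	return (lo);
}
```

For example, lowest_set_bit(0x28) returns 3 even though bit 5 is also set, so the caller does not have to clear the higher bits first.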

Approved by: kib (mentor)
Differential Revision: https://reviews.freebsd.org/D20237

5 years agotuntap: Improve style
kevans [Sat, 11 May 2019 04:18:06 +0000 (04:18 +0000)]
tuntap: Improve style

No functional change.

tun_flags of the tuntap_driver was renamed to ident_flags to reflect the
fact that it's a subset of the tun_flags that identifies a tuntap device.
This maps more easily (visually) to the TUN_DRIVER_IDENT_MASK that masks off
the bits of tun_flags that are applicable to tuntap driver ident. This is a
purely cosmetic change.

5 years agoRevert r347469.
dougm [Sat, 11 May 2019 02:13:52 +0000 (02:13 +0000)]
Revert r347469.

Approved by: kib (mentor)

5 years agoFactor out some exportlist list operations into separate functions.
rmacklem [Fri, 10 May 2019 23:52:17 +0000 (23:52 +0000)]
Factor out some exportlist list operations into separate functions.

This patch moves the code that removes and frees all exportlist elements
out into a separate function called free_exports().
It does the same for the insertion of a new exportlist entry into a list.
It also adds a second argument to ex_search() for the list to use.
None of these changes have any semantic effect. They are being done to
prepare the code for future patches that convert the singly linked list
for the exportlist to a hash table of lists and a patch that will do
incremental changes of exports in the kernel.
And it fixes the argument for SLIST_HEAD_INITIALIZER() to be a pointer,
which doesn't really matter, since SLIST_HEAD_INITIALIZER() doesn't use
the argument.
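
The kind of factoring described above can be sketched with the SLIST macros from <sys/queue.h>. The struct and function names here (toy_export, toy_insert_export, toy_search, toy_free_exports) are hypothetical and much simpler than mountd's real exportlist; the point is only that each helper takes the list as an argument, which is what a later hash table of lists would need.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/queue.h>

/* Hypothetical, simplified export entry keyed by a single integer. */
struct toy_export {
	int			 fsid;
	SLIST_ENTRY(toy_export)	 entries;
};
SLIST_HEAD(toy_exportlist, toy_export);

/* Insert an entry at the head of the given list. */
static void
toy_insert_export(struct toy_exportlist *list, struct toy_export *ep)
{
	assert(ep != NULL);
	SLIST_INSERT_HEAD(list, ep, entries);
}

/* Search the given list; the list is a parameter, not a global. */
static struct toy_export *
toy_search(struct toy_exportlist *list, int fsid)
{
	struct toy_export *ep;

	SLIST_FOREACH(ep, list, entries) {
		if (ep->fsid == fsid)
			return (ep);
	}
	return (NULL);
}

/* Remove and free every element of the given list. */
static void
toy_free_exports(struct toy_exportlist *list)
{
	struct toy_export *ep;

	while ((ep = SLIST_FIRST(list)) != NULL) {
		SLIST_REMOVE_HEAD(list, entries);
		free(ep);
	}
}
```

A list head can be declared as `struct toy_exportlist list = SLIST_HEAD_INITIALIZER(list);` before calling these helpers.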

MFC after: 1 month

5 years agonetdump: Ref the interface we're attached to
cem [Fri, 10 May 2019 23:12:59 +0000 (23:12 +0000)]
netdump: Ref the interface we're attached to

Serialize netdump configuration / deconfiguration, and discard our
configuration when the affiliated interface goes away by monitoring
ifnet_departure_event.

Reviewed by: markj, with input from vangyzen@ (earlier version)
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D20206

5 years agoDon't use _Generic, as many systems don't know about it. Go back to a lo-tech switch...
dougm [Fri, 10 May 2019 23:12:37 +0000 (23:12 +0000)]
Don't use _Generic, as many systems don't know about it.  Go back to a lo-tech switch statement.

Approved by: kib (mentor)
Differential Revision: https://reviews.freebsd.org/D20235

5 years agonetdump: Fix boot-time configuration typo
cem [Fri, 10 May 2019 23:10:22 +0000 (23:10 +0000)]
netdump: Fix boot-time configuration typo

Boot-time netdump configuration is much more useful if one can configure the
client and gateway addresses.  Fix trivial typo.

(Long-standing bug, I believe it dates to the original netdump commit.)

Spotted by: one of vangyzen@ or markj@
Sponsored by: Dell EMC Isilon

5 years agoImplement linux_pci_unregister_drm_driver in linuxkpi so that drm drivers
johalun [Fri, 10 May 2019 23:10:22 +0000 (23:10 +0000)]
Implement linux_pci_unregister_drm_driver in linuxkpi so that drm drivers
can be unloaded.

This patch is a part of D19565.

Reviewed by: hps
Approved by: imp (mentor), hps
MFC after: 1 week

5 years agoWhen bitpos can't be implemented with an inline ffs* instruction,
dougm [Fri, 10 May 2019 22:49:01 +0000 (22:49 +0000)]
When bitpos can't be implemented with an inline ffs* instruction,
change the binary search so that it does not depend on a single bit
only being set in the bitmask. Use bitpos more generally, and avoid
some clearing of bits to accommodate its current behavior.

Approved by: kib (mentor)
Differential Revision: https://reviews.freebsd.org/D20232

5 years agoAdd a (q)uit option to the subr_blist test program.
dougm [Fri, 10 May 2019 22:02:29 +0000 (22:02 +0000)]
Add a (q)uit option to the subr_blist test program.

Approved by: kib (mentor)
Differential Revision: https://reviews.freebsd.org/D20234

5 years agonetdump: Don't store sensitive key data we don't need
cem [Fri, 10 May 2019 21:55:11 +0000 (21:55 +0000)]
netdump: Don't store sensitive key data we don't need

Prior to this revision, struct diocskerneldump_arg (and struct netdump_conf
with embedded diocskerneldump_arg before r347192) was copied in its
entirety to the global 'nd_conf' variable.  Also prior to this revision,
de-configuring netdump would *not* remove the key material from global
nd_conf.

As part of Encrypted Kernel Crash Dumps (EKCD), which was developed
contemporaneously with netdump but happened to land first, the
diocskerneldump_arg structure will contain sensitive key material
(kda_key[]) when encrypted dumps are configured.

Netdump doesn't have any use for the key data -- encryption is handled in
the core dumper code -- so in this revision, we no longer store it.

Unfortunately, I think this leak dates to the initial import of netdump in
r333283; so it's present in FreeBSD 12.0.

Fortunately, the impact *seems* relatively minor.  Any new *netdump*
configuration would overwrite the key material; for active encrypted netdump
configurations, the key data stored was just a duplicate of the key material
already in the core dumper code; and no user interface (other than
/dev/kmem) actually exposed the leaked material to userspace.

Reviewed by: markj, rpokala (earlier commit message)
MFC after: 2 weeks
Security: yes (minor)
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D20233

5 years agoFix regression from r347375: do not panic when sending an IP multicast
glebius [Fri, 10 May 2019 21:51:17 +0000 (21:51 +0000)]
Fix regression from r347375: do not panic when sending an IP multicast
packet from an interface that doesn't have an IPv4 address.

Reported by: Michael Butler <imb protected-networks.net>

5 years agoApply r280991 to ip6_fragment.
jhb [Fri, 10 May 2019 20:15:40 +0000 (20:15 +0000)]
Apply r280991 to ip6_fragment.

This uses m_dup_pkthdr() to copy all of the metadata about a packet to
each of its fragments including VLAN tags, mbuf tags, etc. instead of
hand-copying a few fields.

Reviewed by: bz
MFC after: 1 month
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D20117

5 years agoReplace the expression "-mask & ~mask" with a function call that does
dougm [Fri, 10 May 2019 19:55:29 +0000 (19:55 +0000)]
Replace the expression "-mask & ~mask" with a function call that does
the same thing, but is commented so that it might be better
understood.
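
For readers unfamiliar with the idiom, the sketch below shows what "-mask & ~mask" computes on an unsigned 64-bit value, next to a spelled-out version. The function names are hypothetical and need not match the ones used in the commit.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return the complement of mask in every bit position above its lowest
 * set bit, and zero in that position and below.  A zero mask yields zero.
 */
static inline uint64_t
flip_above_lowest(uint64_t mask)
{
	return (-mask & ~mask);
}

/* The same result, computed step by step for comparison. */
static uint64_t
flip_above_lowest_slow(uint64_t mask)
{
	uint64_t lowest = mask & -mask;		/* isolate lowest set bit */
	uint64_t at_or_below = lowest | (lowest - 1);

	return (~mask & ~at_or_below);
}

/* Check that both forms agree for one input. */
static void
check_equivalence(uint64_t mask)
{
	assert(flip_above_lowest(mask) == flip_above_lowest_slow(mask));
}
```

For mask 0x28 (bits 3 and 5 set) the result is 0xFFFFFFFFFFFFFFD0: bits 0 through 3 are clear, bit 5 is clear because it was set in the input, and every other higher bit is set.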

Approved by: kib (mentor)
Differential Revision: https://reviews.freebsd.org/D20231

5 years agopowerpc: Initialize the Hardware Interrupt Offset Register (HIOR) earlier for ppc970
jhibbits [Fri, 10 May 2019 19:36:14 +0000 (19:36 +0000)]
powerpc: Initialize the Hardware Interrupt Offset Register (HIOR) earlier for ppc970

Since we now have a much larger KVA on powerpc64, it's possible to get SLB
traps earlier in boot, possibly even before the HIOR is properly configured
for us.  Move the HIOR setup to immediately after reset, so that we use our
exception handlers instead of Open Firmware's.

PR: 233863
Submitted by: Mark Millard (partial)
Reported by: Mark Millard
MFC after: 2 weeks

5 years agoblist_next_leaf_alloc walks over all the meta-nodes between one leaf
dougm [Fri, 10 May 2019 18:25:06 +0000 (18:25 +0000)]
blist_next_leaf_alloc walks over all the meta-nodes between one leaf
and the next one, and if blocks are allocated from the next leaf, it
walks back toward where it started, as long as there are interleaving
meta-nodes to be updated on account of the last free blocks under
those meta-nodes being allocated. Only if the walk goes all the way
back to the starting point must we calculate the position of the
meta-node that is the least-common parent of one leaf and the next,
and update a bit in that meta-node to indicate the allocation of its
last free block.

There's no need to start calculating the position of that least-common
parent until the walk back reaches the original starting point, and
there's no need for a calculation that updates 'radius' to tell us
when we've walked back to the beginning, since comparing scan to next
suffices for that.

Approved by: kib (mentor)
Differential Revision: https://reviews.freebsd.org/D20229

5 years agoReplace panic() with KASSERT() and provide more useful information when failure happens.
dougm [Fri, 10 May 2019 18:22:40 +0000 (18:22 +0000)]
Replace panic() with KASSERT() and provide more useful information when failure happens.

Approved by: kib (mentor)
Differential Revision: https://reviews.freebsd.org/D20226

5 years agofusefs: fix intermittency in the interrupt tests
asomers [Fri, 10 May 2019 18:18:41 +0000 (18:18 +0000)]
fusefs: fix intermittency in the interrupt tests

* In the fatal_signal test, wait for the daemon to receive FUSE_INTERRUPT
  before exiting.
* Explicitly disable restarting syscalls after SIGUSR2.  This fixes
  intermittency in the priority test.  I don't know why, but sometimes that
  test's mkdir would be restarted, and sometimes it would return EINTR.
  ERESTART should be the default.
* Remove a useless copy/pasted sleep in the priority test.
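
One portable way to make sure a signal interrupts a blocking syscall instead of restarting it is to install its handler with sigaction() and leave SA_RESTART out of sa_flags. The sketch below illustrates that approach only; the helper names are hypothetical and the tests may achieve the same effect differently.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Empty handler: delivery alone is enough to interrupt the syscall. */
static void
on_sigusr2(int signo)
{
	(void)signo;
}

/* Install a SIGUSR2 handler without SA_RESTART.  Returns 0 on success. */
static int
install_nonrestarting_sigusr2(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_sigusr2;
	sigemptyset(&sa.sa_mask);
	sa.sa_flags = 0;	/* SA_RESTART deliberately not set */
	return (sigaction(SIGUSR2, &sa, NULL));
}

/* Report whether the current SIGUSR2 disposition has SA_RESTART set. */
static int
sigusr2_restarts(void)
{
	struct sigaction cur;
	int rc = sigaction(SIGUSR2, NULL, &cur);

	assert(rc == 0);
	(void)rc;
	return ((cur.sa_flags & SA_RESTART) != 0);
}
```

With this disposition, a blocking call that is interrupted by SIGUSR2 fails with EINTR rather than being transparently restarted, which makes the outcome deterministic for a test.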

Sponsored by: The FreeBSD Foundation

5 years agofusefs: debuggability improvements in the tests
asomers [Fri, 10 May 2019 18:14:39 +0000 (18:14 +0000)]
fusefs: debuggability improvements in the tests

Fix a mislocated statement from r347431, and add more detail for FUSE_MKDIR

Sponsored by: The FreeBSD Foundation

5 years agoFix build race with machine links and genoffset.o.
bdrewery [Fri, 10 May 2019 18:09:27 +0000 (18:09 +0000)]
Fix build race with machine links and genoffset.o.

Generate the ilinks for all dependency objects not just the ones
in the CLEAN list.

Possibly related to r345351

Reported by: kmoore
MFC after: 2 weeks
X-MFC-with: r345351
Sponsored by: Dell EMC Isilon

5 years agoFix build issue with clang 8.0.1
luporl [Fri, 10 May 2019 17:05:40 +0000 (17:05 +0000)]
Fix build issue with clang 8.0.1

The algorithm header is needed to use std::remove_if