asomers [Mon, 27 May 2019 17:08:16 +0000 (17:08 +0000)]
fusefs: make the tests more cplusplusy
* Prefer std::unique_ptr to raw pointers
* Prefer pass-by-reference to pass-by-pointer
* Prefer static_cast to C-style cast, unless it's too much typing
Reported by: ngie
Sponsored by: The FreeBSD Foundation
asomers [Sun, 26 May 2019 03:52:35 +0000 (03:52 +0000)]
fusefs: more build fixes
* Fix printf format strings on 32-bit OSes
* Fix -Wclass-memaccess violation on GCC-8 caused by using memset on an object
of non-trivial type.
* Fix memory leak in MockFS::init
* Fix -Wcast-align error on i386 in expect_readdir
* Fix some heterogenous comparison errors on 32-bit OSes.
asomers [Fri, 24 May 2019 05:12:43 +0000 (05:12 +0000)]
fusefs: implement FUSE_ASYNC_READ
If a daemon sets the FUSE_ASYNC_READ flag during initialization, then the
client is allowed to issue multiple concurrent reads for the same file
handle. Otherwise concurrent reads are not allowed. This commit implements
it. Previously we unconditionally disallowed concurrent reads.
asomers [Thu, 23 May 2019 23:06:26 +0000 (23:06 +0000)]
fusefs: fix exporting fuse filesystems with nfsd
A previous commit made fuse exportable via userland NFS servers.
Compatibility with the in-kernel nfsd required two more changes:
* During read and write operations, implicitly do a FUSE_OPEN if there isn't
already a valid file handle. That's because nfsd never calls VOP_OPEN.
* During VOP_READDIR, if an implicit open was necessary, directory offsets
from a previous VOP_READDIR may not be valid, so VOP_READDIR may have to
start from the beginning and read until it encounters the requested
offset.
I've done only limited testing over NFS, so there are probably still some
more bugs. Thanks to rmacklem for all of the readdir changes, which he had
made for his pnfs work.
asomers [Thu, 23 May 2019 00:44:01 +0000 (00:44 +0000)]
fusefs: Make fuse file systems NFS-exportable
This commit adds the VOPs needed by userspace NFS servers (tested with
net/unfs3). More work is needed to make the in-kernel nfsd work, because of
its stateless nature. It doesn't open files prior to doing I/O. Also, the
NFS-related VOPs currently ignore the entry cache.
asomers [Thu, 23 May 2019 00:22:03 +0000 (00:22 +0000)]
fusefs: improve attribute cacheing
Consolidate all calls to fuse_vnode_setsize as a result of a file attribute
change to one location in fuse_internal_setattr. There are still a few
calls elsewhere that happen as a result of a write.
asomers [Wed, 22 May 2019 23:30:51 +0000 (23:30 +0000)]
fusefs: fix "recursing on non recursive lockmgr" panic
When mounted with -o default_permissions and when
vfs.fusefs.data_cache_mode=2, fuse_io_strategy would try to clear the suid
bit after a successful write by a non-owner. When combined with a
not-yet-committed attribute-caching patch I'm working on, and if the
FUSE_SETATTR response indicates an unexpected filesize (legal, if the file
system has other clients), this would end up calling vtruncbuf. That would
panic, because the buffer lock was already held by bufwrite or bufstrategy
or something else upstack from fuse_vnop_strategy.
asomers [Wed, 22 May 2019 19:49:25 +0000 (19:49 +0000)]
fusefs: remove the vfs.fusefs.sync_resize syctl, correctly this time
In r347547 I intended to remove the vfs.fusefs.sync_resize sysctl, leaving
fusefs's behavior as though sync_resize had its default value. But I forgot
that I had already turned off sync_resize in my development system's
/etc/sysctl.conf.
This commit complete removes the optional behavior that was formerly
controlled by sync_resize. There's no need for explicitly calling
FUSE_SETATTR after every FUSE_WRITE that extends a file. The daemon can
infer that the file is being extended. If this sysctl was added as a
workaround for a buggy daemon, there's no clue as to what that daemon may
have been.
asomers [Tue, 21 May 2019 15:59:17 +0000 (15:59 +0000)]
getvfsbyname: prefer sizeof to strlen even for constants
Clang is smart enough to evaluate strlen() of a constant at compile-time.
However, that won't work in the future if we compile libc with
-ffreestanding.
Reported by: kib
Dissenting: ngie, cem
Sponsored by: The FreeBSD Foundation
asomers [Mon, 20 May 2019 19:36:36 +0000 (19:36 +0000)]
special-case getvfsbyname(3) for fusefs(5)
fusefs file systems may have a fsname subtype (set by mount_fusefs's "-o
subtype" option) that gets appended to the fsname as returned by statfs(2).
The subtype is set on a per-mount basis so it isn't part of the struct
vfsconf. Special-case getvfsbyname to match either the full "fusefs.foobar"
or short "fusefs" fsname.
asomers [Thu, 16 May 2019 23:17:39 +0000 (23:17 +0000)]
fusefs: forward UTIME_NOW to the server
If a user sets both atime and mtime to UTIME_NOW when calling a syscall like
utimensat(2), allow the server to choose what "now" means. Due to the
design of FreeBSD's VFS, it's not possible to do this for just one of atime
or mtime; it's all or none.
asomers [Thu, 16 May 2019 22:50:04 +0000 (22:50 +0000)]
fusefs: allow the server to specify st_blksize
If the server sets fuse_attr.blksize to a nonzero value in the response to
FUSE_GETATTR, then the client should use that as the value for
stat.st_blksize .
asomers [Thu, 16 May 2019 17:24:11 +0000 (17:24 +0000)]
fusefs: Upgrade FUSE protocol to version 7.9.
This commit upgrades the FUSE API to protocol 7.9 and adds unit tests for
backwards compatibility with servers built for version 7.8. It doesn't
implement any of 7.9's new features yet.
asomers [Wed, 15 May 2019 22:51:25 +0000 (22:51 +0000)]
fusefs: diff reduction vs the upstream sources
fuse_kernel.h defines the structures used by the FUSE protocol. Originally
it came from libfuse, but the current source of truth is the Linux kernel.
This commit minimizes the diffs between our version and the Linux version as
of 21f3da95d (protocol version 7.8).
The flags field of struct fuse_listxattr_out and fuse_listxattr_in was an
error in our header. Those fields don't exist in Linux or libfuse, and
they've never been used in FreeBSD. In fact, those structs don't even exist
in Linux and libfuse; those projects confusingly overload the identical
fuse_getexattr_in and fuse_getxattr_out structs.
asomers [Wed, 15 May 2019 20:01:41 +0000 (20:01 +0000)]
fusefs: fix more intermittency in the dev_fuse_poll tests
When using poll, kevent, or select there was a race window during which it
would be impossible to shut down the daemon. The problem was that poll,
kevent, and select don't return when the file descriptor gets closed (or
maybe it was that the file descriptor got closed before those syscalls were
entered?). The solution is to impose a timeout on those syscalls, and check
m_quit after they time out.
asomers [Wed, 15 May 2019 00:38:52 +0000 (00:38 +0000)]
fusefs: don't track a file's size in two places
fuse_vnode_data.filesize was mostly redundant with
fuse_vnode_data.cached_attrs.st_size, but didn't have exactly the same
meaning. It was very confusing. This commit eliminates the former. It
also eliminates fuse_vnode_refreshsize, which ignored the cache timeout
value.
asomers [Mon, 13 May 2019 23:30:06 +0000 (23:30 +0000)]
fusefs: eliminate superfluous FUSE_GETATTR when filesize=0
fuse_vnode_refreshsize was using 0 as a flag value for filesize meaning
"uninitialized" (thanks to the malloc(...M_ZERO) in fuse_vnode_alloc. But
this led to unnecessary getattr operations when the filesize legitimately
happened to be zero. Fix by adding a distinct flag value.
asomers [Mon, 13 May 2019 20:57:21 +0000 (20:57 +0000)]
fusefs: remove the vfs.fusefs.data_cache_invalidate sysctl
This sysctl was added > 6.5 years ago and I don't know why. The description
seems at odds with the code. While it's supposed to "discard clean cached
data" during VOP_INACTIVE, it looks like it would discard any cached data,
clean or otherwise.
asomers [Mon, 13 May 2019 20:42:09 +0000 (20:42 +0000)]
fusefs: remove the vfs.fusefs.mmap_enable sysctl
This sysctl was added > 6.5 years ago for no clear reason. Perhaps it was
intended to gate an unstable feature? But now there's no reason to globally
disable mmap. I'm not deleting the -ono_mmap mount option just yet, because
it might be useful as a workaround for bug 237588.
asomers [Mon, 13 May 2019 20:31:10 +0000 (20:31 +0000)]
fusefs: remove the vfs.fusefs.refresh_size sysctl
This was added > 6.5 years ago with no evident reason why. It probably had
something to do with the incomplete cached attribute implementation. But
cache attributes work now. I see no reason to retain this sysctl.
asomers [Mon, 13 May 2019 19:47:31 +0000 (19:47 +0000)]
fusefs: remove the vfs.fusefs.sync_resize syctl
This sysctl was added > 6.5 years ago for no clear purpose. I'm guessing
that it may have had something to do with the incomplete attribute cache.
But the attribute cache works now. Since there's no clear motivation for
this sysctl, it's best to remove it.
asomers [Mon, 13 May 2019 19:31:09 +0000 (19:31 +0000)]
fusefs: remove the vfs.fusefs.fix_broken_io sysctl
This looks like it may have been a workaround for a specific buggy FUSE
filesystem. However, there's no information about what that bug may have
been, and the workaround is > 6.5 years old, so I consider the sysctl to be
unmaintainable.
asomers [Mon, 13 May 2019 19:03:46 +0000 (19:03 +0000)]
fusefs: reap dead sysctls
Remove the "sync_unmount" and "init_backgrounded" sysctls and the associated
options from mount_fusefs. Add no backwards-compatibility hidden options to
mount_fusefs because these options never had any effect, and are therefore
unlikely to be used.
ae [Mon, 13 May 2019 13:45:28 +0000 (13:45 +0000)]
Rework locking in BPF code to remove rwlock from fast path.
On high packets rate the contention on rwlock in bpf_*tap*() functions
can lead to packets dropping. To avoid this, migrate this code to use
epoch(9) KPI and ConcurrencyKit's lists.
* all lists changed to use CK_LIST;
* reference counting added to bpf_if and bpf_d;
* now bpf_if references ifnet and releases this reference on destroy;
* each bpf_d descriptor references bpf_if when it is attached;
* new struct bpf_program_buffer introduced to keep BPF filter programs;
* bpf_program_buffer, bpf_d and bpf_if structures are freed by
epoch_call();
* bpf_freelist and ifnet_departure event are no longer needed, thus
both are removed;
manu [Mon, 13 May 2019 12:38:33 +0000 (12:38 +0000)]
Revert r347356 and r347371
passwd related files need to be tagged as config file so pkg update
will attempt merging them when we install a new package.
We should use CONFS for that.
Revert for now until I come up with a better version of this patch as
it breaks pkgbase for users.
br [Sun, 12 May 2019 16:17:05 +0000 (16:17 +0000)]
Add support for HiFive Unleashed -- the board with a multi-core RISC-V SoC
from SiFive, Inc.
The first core on this SoC (hart 0) is a 64-bit microcontroller.
o Pick a hart to run boot process using hart lottery.
This allows to exclude hart 0 from running the boot process.
(BBL releases hart 0 after the main harts, so it never wins the lottery).
o Renumber CPUs early on boot.
Exclude non-MMU cores. Store the original hart ID in struct pcpu. This
allows to find out the correct destination for IPIs and remote sfence
calls.
mjg [Sun, 12 May 2019 06:32:46 +0000 (06:32 +0000)]
random(4): depessimize arc4random
- __predict_false reseeding on entry as it is almost never true.
- don't blindly atomic_cmpset as on x86 it ends up dirtying the cacheline.
it almost ever succeeds per above
- fetch the timestamp prior to getting the cpu number
Reviewed by: cem
Approved by: secteam (delphij)
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D20242
rmacklem [Sat, 11 May 2019 22:41:58 +0000 (22:41 +0000)]
Factor code into two new functions in preparation for a future commit.
Factor code into two functions.
read_exportfile() a functon which reads the exports file(s) and calls
get_exportlist_one() to process each of them.
delete_export() a function which deletes the exports in the kernel for a file
system.
The contents of these functions is just the same code as was used to do the
operations, moved into separate functions. As such, there is no semantic change.
This is being done in preparation for a future commit that will add an
option to do incremental changes of kernel exports upon receiving SIGHUP.
dougm [Sat, 11 May 2019 16:15:13 +0000 (16:15 +0000)]
A new parameter to blist_alloc specifies an upper bound on the size of
the allocation request, so that the blocks allocated are from the next
set of free blocks big enough to satisfy the minimum requirements of
the request, and the number of blocks allocated are as many as
possible, up to the specified maximum. The implementation of
swp_pager_getswapspace uses this parameter to ask for a number of
blocks between the new halved request size and the previous failed
request size. Thus a request for 32 blocks may fail, but instead of
getting only 16 blocks instead, the caller asks for 16 to 31 next, and
might get 19 or 27, which is closer to what they originally wanted.
I expect this to lead to bigger block allocations and less block
fragmentation, at least in some cases.
jhibbits [Sat, 11 May 2019 15:17:42 +0000 (15:17 +0000)]
revert r346588 for now
The rewrite of strcmp in assembly uses an instruction added in PowerISA
2.05, making it SIGILL on CPUs older than the POWER6, such as the PPC970 in
the PowerMac G5. Revert this until we get clang+lld, or retire the in-tree
binutils in favor of newer binutils with IFUNC support, whichever comes
first.
manu [Sat, 11 May 2019 15:03:51 +0000 (15:03 +0000)]
twsi: Calculate the clock param based on the bus frequency
Instead of precalculating the different speed, respect the bus frequency
and calculate the clock register parameter based on it.
If the platform didn't register the core clk, fallback on the precomputed
values (This is likely do be the case on Marvell boards).
As per https://datacenter.iers.org/data/latestVersion/16_BULLETIN_C16.txt:
INTERNATIONAL EARTH ROTATION AND REFERENCE SYSTEMS SERVICE (IERS)
SERVICE INTERNATIONAL DE LA ROTATION TERRESTRE ET DES SYSTEMES DE REFERENCE
SERVICE DE LA ROTATION TERRESTRE DE L'IERS
OBSERVATOIRE DE PARIS
61, Av. de l'Observatoire 75014 PARIS (France)
Tel. : +33 1 40 51 23 35
e-mail : services.iers@obspm.fr
http://hpiers.obspm.fr/eop-pc
Paris, 07 January 2019
Bulletin C 57
To authorities responsible
for the measurement and
distribution of time
INFORMATION ON UTC - TAI
NO leap second will be introduced at the end of June 2019.
The difference between Coordinated Universal Time UTC and the
International Atomic Time TAI is :
from 2017 January 1, 0h UTC, until further notice : UTC-TAI = -37 s
Leap seconds can be introduced in UTC at the end of the months of December
or June, depending on the evolution of UT1-TAI. Bulletin C is mailed every
six months, either to announce a time step in UTC, or to confirm that there
will be no time step at the next possible date.
Christian BIZOUARD
Director
Earth Orientation Center of IERS
Observatoire de Paris, France
Requested by: rgrimes
Obtained from: ftp://tycho.usno.navy.mil/pub/ntp/leap-seconds.3757622400
MFC after: 3 days
dougm [Sat, 11 May 2019 10:16:43 +0000 (10:16 +0000)]
Callers of swp_pager_getswapspace get either as many blocks as they
requested, or none, and in the latter case it is up to them to pick a
smaller request to make - which they always do by halving the failed
request. This change to swp_pager_getswapspace leaves the task of
downsizing the request to the function and not its caller. It still
does so by halving the original request.
dougm [Sat, 11 May 2019 09:09:10 +0000 (09:09 +0000)]
When bitpos can't be implemented with an inline ffs* instruction,
change the binary search so that it does not depend on a single bit
only being set in the bitmask. Use bitpos more generally, and avoid
some clearing of bits to accommodate its current behavior.
kevans [Sat, 11 May 2019 04:18:06 +0000 (04:18 +0000)]
tuntap: Improve style
No functional change.
tun_flags of the tuntap_driver was renamed to ident_flags to reflect the
fact that it's a subset of the tun_flags that identifies a tuntap device.
This maps more easily (visually) to the TUN_DRIVER_IDENT_MASK that masks off
the bits of tun_flags that are applicable to tuntap driver ident. This is a
purely cosmetic change.
rmacklem [Fri, 10 May 2019 23:52:17 +0000 (23:52 +0000)]
Factor out some exportlist list operations into separate functions.
This patch moves the code that removes and frees all exportlist elements
out into a separate function called free_exports().
It does the same for the insertion of a new exportlist entry into a list.
It also adds a second argument to ex_search() for the list to use.
None of these changes have any semantic effect. They are being done to
prepare the code for future patches that convert the single linked list
for the exportlist to a hash table of lists and a patch that will do
incremental changes of exports in the kernel.
And it fixes the argument for SLIST_HEAD_INITIALIZER() to be a pointer,
which doesn't really matter, since SLIST_HEAD_INITIALIZER() doesn't use
the argument.
cem [Fri, 10 May 2019 23:12:59 +0000 (23:12 +0000)]
netdump: Ref the interface we're attached to
Serialize netdump configuration / deconfiguration, and discard our
configuration when the affiliated interface goes away by monitoring
ifnet_departure_event.
Reviewed by: markj, with input from vangyzen@ (earlier version)
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D20206
dougm [Fri, 10 May 2019 22:49:01 +0000 (22:49 +0000)]
When bitpos can't be implemented with an inline ffs* instruction,
change the binary search so that it does not depend on a single bit
only being set in the bitmask. Use bitpos more generally, and avoid
some clearing of bits to accommodate its current behavior.
cem [Fri, 10 May 2019 21:55:11 +0000 (21:55 +0000)]
netdump: Don't store sensitive key data we don't need
Prior to this revision, struct diocskerneldump_arg (and struct netdump_conf
with embedded diocskerneldump_arg before r347192), were copied in their
entirety to the global 'nd_conf' variable. Also prior to this revision,
de-configuring netdump would *not* remove the the key material from global
nd_conf.
As part of Encrypted Kernel Crash Dumps (EKCD), which was developed
contemporaneously with netdump but happened to land first, the
diocskerneldump_arg structure will contain sensitive key material
(kda_key[]) when encrypted dumps are configured.
Netdump doesn't have any use for the key data -- encryption is handled in
the core dumper code -- so in this revision, we no longer store it.
Unfortunately, I think this leak dates to the initial import of netdump in
r333283; so it's present in FreeBSD 12.0.
Fortunately, the impact *seems* relatively minor. Any new *netdump*
configuration would overwrite the key material; for active encrypted netdump
configurations, the key data stored was just a duplicate of the key material
already in the core dumper code; and no user interface (other than
/dev/kmem) actually exposed the leaked material to userspace.
jhb [Fri, 10 May 2019 20:15:40 +0000 (20:15 +0000)]
Apply r280991 to ip6_fragment.
This uses m_dup_pkthdr() to copy all of the metadata about a packet to
each of its fragments including VLAN tags, mbuf tags, etc. instead of
hand-copying a few fields.
jhibbits [Fri, 10 May 2019 19:36:14 +0000 (19:36 +0000)]
powerpc: Initialize the Hardware Interrupt Offset Register (HIOR) earlier for ppc970
Since we now have a much larger KVA on powerpc64, it's possible to get SLB
traps earlier in boot, possibly even before the HIOR is properly configured
for us. Move the HIOR setup to immediately after reset, so that we use our
exception handlers instead of Open Firmware's.
PR: 233863
Submitted by: Mark Millard (partial)
Reported by: Mark Millard
MFC after: 2 weeks
dougm [Fri, 10 May 2019 18:25:06 +0000 (18:25 +0000)]
blist_next_leaf_alloc walks over all the meta-nodes between one leaf
and the next one, and if blocks are allocated from the next leaf, it
walks back toward where it started, as long as there are interleaving
meta-nodes to be updated on account of the last free blocks under
those meta-nodes being allocated. Only if the walk goes all the way
back to the starting point must we calculate the position of the
meta-node that is the least-comment parent of one leaf and the next,
and update a bit in that meta-node to indicate the allocation of its
last free block.
There's no need to start calculating the position of that least-common
parent until the walk back reaches the original starting point, and
there's no need for a calculation that updates 'radius' to tell us
when we've walked back to the beginning, since comparing scan to next
suffices for that.
asomers [Fri, 10 May 2019 18:18:41 +0000 (18:18 +0000)]
fusefs: fix intermittency in the interrupt tests
* In the fatal_signal test, wait for the daemon to receive FUSE_INTERRUPT
before exiting.
* Explicitly disable restarting syscalls after SIGUSR2. This fixes
intermittency in the priority test. I don't know why, but sometimes that
test's mkdir would be restarted, and sometimes it would return EINTR.
ERESTART should be the default.
* Remove a useless copy/pasted sleep in the priority test.
manu [Fri, 10 May 2019 16:45:17 +0000 (16:45 +0000)]
arm64: rockchip: Don't always put PLL to normal mode
We used to put every PLL in normal mode (meaning that the output would
be the result of the PLL configuration) instead of slow mode (the output
is equal to the external oscillator frequency, 24-26Mhz) but this doesn't
work for most of the PLLs as when we put them into normal mode the registers
configuring the output frequency haven't been set.
Add a normal_mode member in clk_pll_def/clk_pll_sc struct and if it's true
we then set the PLL to normal mode.
For now only set it to the LPLL and BPLL (Little cluster PLL and Big cluster
PLL respectively).
markj [Fri, 10 May 2019 16:43:47 +0000 (16:43 +0000)]
Atomically update the global gMsgId in libnetgraph.
Otherwise concurrently running threads may inadvertently use the same
token for different messages.
Preserve the behaviour of disallowing negative message tokens, but allow
a message token value of zero since this simplifies the code a bit and
tokens are documented to be non-negative.
PR: 234442
Reported and tested by: eugen
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
asomers [Fri, 10 May 2019 15:02:29 +0000 (15:02 +0000)]
fusefs: fix running multiple daemons concurrently
When a FUSE daemon dies or closes /dev/fuse, all of that daemon's pending
requests must be terminated. Previously that was done in /dev/fuse's
.d_close method. However, d_close only gets called on the *last* close of
the device. That means that if multiple daemons were running concurrently,
all but the last daemon to close would leave their I/O hanging around. The
problem was easily visible just by running "kyua -v parallelism=2 test" in
fusefs's test directory.
Fix this bug by terminating a daemon's pending I/O during /dev/fuse's
cdvpriv dtor method instead. That method runs on every close of a file.
Also, fix some potential races in the tests:
* Clear SA_RESTART when registering the daemon's signal handler so read(2)
will return EINTR.
* Wait for the daemon to die before unmounting the mountpoint, so we won't
see an unwanted FUSE_DESTROY operation in the mock file system.
gallatin [Fri, 10 May 2019 13:41:19 +0000 (13:41 +0000)]
Bind TCP HPTS (pacer) threads to NUMA domains
Bind the TCP pacer threads to NUMA domains and build per-domain
pacer-thread lookup tables. These tables allow us to use the
inpcb's NUMA domain information to match an inpcb with a pacer
thread on the same domain.
The motivation for this is to keep the TCP connection local to a
NUMA domain as much as possible.
Thanks to jhb for pre-reviewing an earlier version of the patch.
kevans [Fri, 10 May 2019 13:18:22 +0000 (13:18 +0000)]
ifconfig(8): Add kld mappings for ipsec/enc
Additionally, providing mappings makes the comparison for already loaded
modules a little more strict. This should have been done at initial
introduction, but there was no real reason- however, it proves necessary for
enc which has a standard enc -> if_enc mapping but there also exists an
'enc' module that's actually CAM. The mapping lets us unambiguously
determine the correct module.