John Baldwin [Fri, 11 Jun 2021 21:56:28 +0000 (14:56 -0700)]
Remove 'make update'.
In the CVS days this used be a wrapper around either CVS or CVSup and
used to support updating src, doc, and ports checkouts. With the move
to subversion this only supported updating src and was itself a
wrapper around 'svn update'. With Git, users are probably better off
using appropriate Git commands directly to update without needing an
explicit make target as a wrapper.
Warner Losh [Fri, 11 Jun 2021 20:13:31 +0000 (14:13 -0600)]
mtk: Add printing of CPU model
Add the line that's in other foo_machdep.c file where the CPU model is
reported.
This was part of github pull request 459, but in a different way. The
rest of that pull request was either committed or not relevant. I did it
in a more correct way.
Warner Losh [Fri, 11 Jun 2021 16:37:36 +0000 (10:37 -0600)]
style: Relax 80 column rule
Note that the 80 column rule has been relaxed for some time when things
are clearer when a little longer. Add in that things that people grep
for, such as error messages, shouldn't be broken up which is the most
common reason people exceed 80 columns intentionally.
Warner Losh [Fri, 11 Jun 2021 16:33:04 +0000 (10:33 -0600)]
style: tweak tab after #define advice
Once upon a time, #define<tab> was cultural thing. However, even when it
was promulgated, it was a minority usage. 20 years ago the split was
30k/69k (tab/space) and today the split is 80k/546k (tab/space). Update
guidance to allow either with the usual suggestion to be consistent
within a file.
Mariusz Zaborski [Fri, 11 Jun 2021 15:35:36 +0000 (17:35 +0200)]
libnv: optimize nvlist size calculation
If we had a multiple nvlist, during nvlist_pack, we calculated the size
of every nvlist separately. For example, if we had a nvlist with three
nodes each containing another (A contains B, and B contains C), we first
calculated the size of nvlist A (which contains B, C), then we calculate
the size of B (which contains C, notice that we already did the
calculation of B, when we calculate A), and finally C. This means that
this calculation was O(N!). This was done because each time we pack
nvlist, we have to put its size in the header
(the separate header for A, B, and C).
To not break the ABI and to reduce the complexity of nvlist_size,
instead of calculating the nvlist size when requested,
we track the size of each nvlist.
Randall Stewart [Fri, 11 Jun 2021 15:38:08 +0000 (11:38 -0400)]
tcp: Missing mfree in rack and bbr
Recently (Nov) we added logic that protects against a peer negotiating a timestamp, and
then not including a timestamp. This involved in the input path doing a goto done_with_input
label. Now I suspect the code was cribbed from one in Rack that has to do with the SYN.
This had a bug, i.e. it should have a m_freem(m) before going to the label (bbr had this
missing m_freem() but rack did not). This then caused the missing m_freem to show
up in both BBR and Rack. Also looking at the code referencing m->m_pkthdr.lro_nsegs
later (after processing) is not a good idea, even though its only for logging. Best to
copy that off before any frees can take place.
Lutz Donnerhacke [Sat, 15 May 2021 15:35:36 +0000 (17:35 +0200)]
libalias: tidy up housekeeping
Replace current expensive, but sparsly called housekeeping
by a single, repetive action.
This is part of a larger restructure of libalias in order to switch to
more efficient data structures. The whole restructure process is
split into 15 reviews to ease reviewing. All those steps will be
squashed into a single commit for MFC in order to hide the
intermediate states from production systems.
Randall Stewart [Thu, 10 Jun 2021 12:33:57 +0000 (08:33 -0400)]
tcp: Mbuf leak while holding a socket buffer lock.
When running at NF the current Rack and BBR changes with the recent
commits from Richard that cause the socket buffer lock to be held over
the ip_output() call and then finally culminating in a call to tcp_handle_wakeup()
we get a lot of leaked mbufs. I don't think that this leak is actually caused
by holding the lock or what Richard has done, but is exposing some other
bug that has probably been lying dormant for a long time. I will continue to
look (using his changes) at what is going on to try to root cause out the issue.
In the meantime I can't leave the leaks out for everyone else. So this commit
will revert all of Richards changes and move both Rack and BBR back to just
doing the old sorwakeup_locked() calls after messing with the so_rcv buffer.
We may want to look at adding back in Richards changes after I have pinpointed
the root cause of the mbuf leak and fixed it.
Dmitry Chagin [Thu, 10 Jun 2021 12:11:25 +0000 (15:11 +0300)]
Split kern_poll() on two counterparts.
The kern_poll_kfds() operates on clear kernel data, kfds points to an
array in the kernel, while kern_poll() operates on user supplied pollfd.
Move nfds check to kern_poll_maxfds().
No functional changes, it's for future use in the Linux emulation layer.
Casper services expect that the first 3 descriptors (stdin/stdout/stderr)
will point to /dev/null. Which Casper will ensure later. The Casper
services are forked from the original process. If the initial process
closes one of those descriptors, Casper may reuse one of them for it on
purpose. If this is the case, then renumarate the descriptors used by
Casper to higher numbers. This is done already after the fork, so it
doesn't break the parent process.
Warner Losh [Thu, 10 Jun 2021 00:10:12 +0000 (18:10 -0600)]
mk: WITH_FOO=no now generates a warning
Many people are used to gnu configure's behavior of changing
--with-foo=no to --without-foo. At the same time, several folks have
WITH_FOO=no in their config files to enable this ironic form of the
option because of an old meme from IRC, a mailing list or the forums (I
forget which). Add a warning to allow to alert people w/o breaking POLA.
Neel Chauhan [Wed, 9 Jun 2021 21:38:52 +0000 (14:38 -0700)]
linuxkpi: Add macros for might_lock_nested() and lockdep_(re/un/)pin_lock()
In Linux, these are macros to locks in the kernel for scheduling purposes.
But as with other macros in this header, we aren't doing anything with them
so we are doing `do {} while (0)` for now.
Randall Stewart [Wed, 9 Jun 2021 17:58:54 +0000 (13:58 -0400)]
tcp: LRO timestamps have lost their previous precision
Recently we had a rewrite to tcp_lro.c that was tested but one subtle change
was the move to a less precise timestamp. This causes all kinds of chaos
in tcp's that do pacing and needs to be fixed to use the more precise
time that was there before.
Rick Macklem [Wed, 9 Jun 2021 15:00:43 +0000 (08:00 -0700)]
nfscl: Add a "has acquired a delegation" flag for delegations
A problem was reported via email, where a large (130000+) accumulation
of NFSv4 opens on an NFSv4 mount caused significant lock contention
on the mutex used to protect the client mount's open/lock state.
Although the root cause for the accumulation of opens was not
resolved, it is obvious that the NFSv4 client is not designed to
handle 100000+ opens efficiently.
For a common case where delegations are not being issued by the
NFSv4 server, the code acquires the mutex lock for open/lock state,
finds the delegation list empty and just unlocks the mutex and returns.
This patch adds an NFS mount point flag that is set when a delegation
is issued for the mount. Then the patched code checks for this flag
before acquiring the open/lock mutex, avoiding the need to acquire
the lock for the case where delegations are not being issued by the
NFSv4 server.
This change appears to be performance neutral for a small number
of opens, but should reduce lock contention for a large number of opens
for the common case where server is not issuing delegations.
This commit should not affect the high level semantics of delegation
handling.
Test functionality of ng_vlan_rotate(4):
- Rotate 1 to 9 stagged vlans in any possible direction and length
- Rotate random combinations of ethertypes (8100, 88a8, 9100)
- Automatic reverse rotating for backward data flow
- Test too many and to few vlans
Test functionality of ng_hub(4):
- replicting traffic to anything but the sending hook
- persistence
- an unrestricted loop
- implementation limits with many hooks.
Jessica Clarke [Tue, 8 Jun 2021 17:30:59 +0000 (18:30 +0100)]
tip: Fix pointer-vs-integer confusion
Currently IREMOTE assumes that every value is (initially) a pointer to a
long. This is true for NUMBERs, but false for STRINGs, which are instead
pointers to pointers, though on ILP32 and LP64 systems these happen to
have the same representation, but this is still a strict aliasing
violation, and of course breaks on systems where the representations are
not the same, such as CHERI. We do not currently have any BOOLs (short,
curiously) or CHARs used with IREMOTE, though the code should not be
relying on that.
This removes the unused setaddress macro, and the now-unused address
macro due to the above issue.
Jessica Clarke [Tue, 8 Jun 2021 17:30:59 +0000 (18:30 +0100)]
tip: Cast via intptr_t not long when casting between pointer and int
Whilst all FreeBSD architectures have the same representation for
intptr_t and long (even if the former is int on ILP32 architectures),
this is more general and correct, and on CHERI they are not the same so
warnings are generated by default for integer-to-pointer casts that
aren't via (u)intptr_t.
Marcin Wojtas [Thu, 27 May 2021 08:09:04 +0000 (10:09 +0200)]
Add ofw interface support to PCI
Some arm64 SoCs have nodes in their fdts that describe devices
connected to the internal PCI bus. One such SoC is Freescale LS1028A.
In order to access information stored in them we need to add ofw bus
support to pci. Pass devinfo request up to our parent, which
is responsible for parsing all the information.
It allows to use ofw interface on PCI devices that support it.
This method is similar to sys/dev/acpica/acpi_pci.c.
Submitted by: Kornel Duleba <mindal@semihalf.com>
Reviewed by: andrew
Obtained from: Semihalf
Sponsored by: Alstom Group
Differential Revision: https://reviews.freebsd.org/D30181
Marcin Wojtas [Thu, 27 May 2021 08:07:19 +0000 (10:07 +0200)]
pci_host_generic_fdt.c: Add support for mapping dts nodes to PCI devices
Some arm64 SoCs have nodes in their fdts that describe devices
connected to the internal PCI bus. One such SoC is Freescale LS1028A.
It expects the nodes to be mapped to devices enumerated using the standard
PCI method. Mapping is done by reading device and function ids from "reg"
property. Information is dts is used to describe MDIO/PHY connected
to a given interface.
Submitted by: Kornel Duleba <mindal@semihalf.com>
Reviewed by: andrew
Obtained from: Semihalf
Sponsored by: Alstom Group
Differential Revision: https://reviews.freebsd.org/D30180
Warner Losh [Tue, 8 Jun 2021 14:57:42 +0000 (08:57 -0600)]
updating: add note about vendor/openzfs branch rename
Add a pointer to
https://lists.freebsd.org/archives/freebsd-current/2021-June/000153.html
to explain how to pull a new tree due to the vendor/openzfs branch
being renamed.
Martin Matuska [Tue, 8 Jun 2021 14:48:37 +0000 (16:48 +0200)]
zfs: merge openzfs/zfs@75b4cbf62 (master) into main
Notable upstream pull request merges:
#11710 Allow zfs to send replication streams with missing snapshots
#11751 Avoid taking global lock to destroy zfsdev state
#11786 Ratelimit deadman zevents as with delay zevents
#11803 ZFS traverse_visitbp optimization to limit prefetch
#11813 Allow pool names that look like Solaris disk names
#11822 Atomically check and set dropped zevent count
#11822 Don't scale zfs_zevent_len_max by CPU count
#11833 Refactor zfsdev state init/destroy to share common code
#11837 zfs get -p only outputs 3 columns if "clones" property is empty
#11843 libzutil: zfs_isnumber(): return false if input empty
#11849 Use dsl_scan_setup_check() to setup a scrub
#11861 Improvements to the 'compatibility' property
#11862 cmd/zfs receive: allow dry-run (-n) to check property args
#11864 receive: don't fail inheriting (-x) properties on wrong dataset type
#11877 Combine zio caches if possible
#11881 FreeBSD: use vnlru_free_vfsops if available
#11883 FreeBSD: add support for lockless symlink lookup
#11884 FreeBSD: add missing seqc write begin/end around zfs_acl_chown_setattr
#11896 Fix crash in zio_done error reporting
#11905 zfs-send(8): Restore sorting of flags
#11926 FreeBSD: damage control racing .. lookups in face of mkdir/rmdir
#11930 vdev_mirror: don't scrub/resilver devices that can't be read
#11938 Fix AVX512BW Fletcher code on AVX512-but-not-BW machines
#11955 zfs get: don't lookup mount options when using "-s local"
#11956 libzfs: add keylocation=https://, backed by fetch(3) or libcurl
#11959 vdev_id: variable not getting expanded under map_slot()
#11966 Scale worker threads and taskqs with number of CPUs
#11994 Clean up use of zfs_log_create in zfs_dir
#11997 FreeBSD: Don't force xattr mount option
#11997 FreeBSD: Implement xattr=sa
#11997 FreeBSD: Use SET_ERROR to trace xattr name errors
#11998 Simplify/fix dnode_move() for dn_zfetch
#12003 FreeBSD: Initialize/destroy zp->z_lock
#12010 Fix dRAID self-healing short columns
#12033 Revert "Fix raw sends on encrypted datasets when copying back snapshots"
#12040 Reinstate the old zpool read label logic as a fallback
#12046 Improve scrub maxinflight_bytes math
#12049 FreeBSD: avoid memory allocation in arc_prune_async
#12052 FreeBSD: incorporate changes to the VFS_QUOTACTL(9) KPI
#12061 Fix dRAID sequential resilver silent damage handling
#12072 Let zfs diff be more permissive
#12077 FreeBSD: Retry OCF ENOMEM errors.
#12088 Propagate vdev state due to invalid label corruption
#12091 libzfs: On FreeBSD, use MNT_NOWAIT with getfsstat
#12097 FreeBSD: Update dataset_kstats for zvols in dev mode
#12104 FreeBSD boot code reminder after zpool upgrade
#12114 Introduce write-mostly sums
Mark Johnston [Tue, 8 Jun 2021 13:40:30 +0000 (09:40 -0400)]
hyperv: Fix vmbus after the i386 4/4 split
The vmbus ISR needs to live in a trampoline. Dynamically allocating a
trampoline at driver initialization time poses some difficulties due to
the fact that the KENTER macro assumes that the offset relative to
tramp_idleptd is fixed at static link time. Another problem is that
native_lapic_ipi_alloc() uses setidt(), which assumes a fixed trampoline
offset.
Rather than fight this, move the Hyper-V ISR to i386/exception.s. Add a
new HYPERV kernel option to make this optional, and configure it by
default on i386. This is sufficient to make use of vmbus(4) after the
4/4 split. Note that vmbus cannot be loaded dynamically and both the
HYPERV option and device must be configured together. I think this is
not too onerous a requirement, since vmbus(4) was previously
non-functional.
Reported by: Harry Schmalzbauer <freebsd@omnilan.de>
Tested by: Harry Schmalzbauer <freebsd@omnilan.de>
Reviewed by: whu, kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D30577
Alan Somers [Tue, 8 Jun 2021 13:36:43 +0000 (07:36 -0600)]
libzfs: On FreeBSD, use MNT_NOWAIT with getfsstat
`getfsstat(2)` is used to retrieve the list of mounted file systems,
which libzfs uses when fetching properties like mountpoint, atime,
setuid, etc. The `mode` parameter may be `MNT_NOWAIT`, which uses
information in the VFS's cache, or `MNT_WAIT`, which effectively does a
`statfs` on every single mounted file system in order to fetch the most
up-to-date information. As far as I can tell, the only fields that
libzfs cares about are the filesystem's name, mountpoint, fstypename,
and mount flags. Those things are always updated on mount and unmount,
so they will always be accurate in the VFS's mount cache except in two
circumstances:
1) When a file system is busy unmounting
2) When a ZFS file system changes the value of a mount-overridable
property like atime or setuid, but doesn't remount the file system.
Right now that only happens when the property is changed by an
unprivileged user who has delegated authority to change the property
but not to mount the dataset. But perhaps libzfs could choose to do
it for other reasons in the future.
Switching to `MNT_NOWAIT` will greatly improve speed with no downside,
as long as we explicitly update the mount cache whenever we change a
mount-overridable property.
For comparison, Illumos gets this information using the native
`getmntany` and `getmntent` functions, which also use cached
information. The illumos function that would refresh the cache,
`resetmnttab`, is never called by libzfs.
And on GNU/Linux, `getmntany` and `getmntent` don't even communicate
with the kernel directly. They simply parse the file they are given,
which is usually /etc/mtab or /proc/mounts. Perhaps the implementation
of /proc/mounts is synchronous, ala MNT_WAIT; I don't know.
Sponsored-by: Axcient Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alan Somers <asomers@gmail.com> Closes: #12091
Test functionality of ng_bridge(4):
- replicating traffic to anything but the sending hook
- persistence
- detect loops
- unicast to only one link of many
- stretch to implementation limits on broadcast
Dmitry Chagin [Tue, 8 Jun 2021 05:18:00 +0000 (08:18 +0300)]
Fix copyright, remove "all rights reserved".
The eventfd code was written by me, rdivacky@ copyrigth applicable only
to epoll part of the Linuxulator code. Roman is ok to retire his copyright
from sys/kern/sys_eventfd.c and 'All rights reserved.' lines from
sys/compat/linux/linux_event.[c|h] and sys/kern/sys_eventfd.c files.
Fedor Korotkov [Fri, 29 Jan 2021 14:22:54 +0000 (09:22 -0500)]
Cirrus-CI: Use the default Git history depth
Which is `50`. I saw a few errors like
`Failed to force reset to SHA: object not found!` which seems is
happening because the SHA is not available because there were two
commits pushed almost simultaneously and the second from the top fails
with this error because the SHA is not in the history.
Originally committed as fcb4797c90f3 and reverted in 80a840b8ba03 due
to the clone operation taking significantly longer. However, I have
seen many failures due to the "object not found" issue recently.
7 of 37 recent runs failed because of this, and intermittent failures
like this makes CI much less useful.
Prefer longer-running runs to intermittent failures.
Rick Macklem [Mon, 7 Jun 2021 20:48:25 +0000 (13:48 -0700)]
nfscl: Fix generation of va_fsid for a tree of NFSv4 server file systems
Pre-r318997 the code looked like:
if (vp->v_mount->mnt_stat.f_fsid.val[0] != (uint32_t)np->n_vattr.na_filesid[0])
vap->va_fsid = (uint32_t)np->n_vattr.na_filesid[0];
Doing this assignment got lost by r318997 and, as such, NFSv4 mounts
of servers with trees of file systems on the server is broken, due to duplicate
fileno values for the same st_dev/va_fsid.
Although I could have re-introduced the assignment, since the value of
na_filesid[0] is not guaranteed to be unique across the server file systems,
I felt it was better to always do the hash for na_filesid[0,1].
Since dev_t (st_dev/va_fsid) is now 64bits, I switched to a 64bit hash.
There is a slight chance of a hash conflict where 2 different na_filesid
values map to same va_fsid, which will be documented in the BUGS
section of the man page for mount_nfs(8). Using a table to keep track
of mappings to catch conflicts would not easily scale to 10,000+ server file
systems and, when the conflict occurs, it only results in fts(3) reporting
a "directory cycle" under certain circumstances.
Rich Ercolani [Mon, 7 Jun 2021 19:29:27 +0000 (15:29 -0400)]
Force --enable-debug on FreeBSD if INVARIANTS is set
There's already logic to force INVARIANTS on for building if it's
present in the running kernel; however, not having DEBUG enabled
when DEBUG and INVARIANTS are can cause strange panics.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12185
Closes #12163
Update the logic to handle the dedup-case of consecutive
FREEs in the livelist code. The logic still ensures that
all the FREE entries are matched up with a respective
ALLOC by keeping a refcount for each FREE blkptr that we
encounter and ensuring that this refcount gets to zero
by the time we are done processing the livelist.
zdb -y no longer panics when encountering double frees
Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: Don Brady <don.brady@delphix.com> Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Closes #11480
Closes #12177
Alexander Motin [Mon, 7 Jun 2021 16:02:47 +0000 (12:02 -0400)]
More aggsum optimizations
- Avoid atomic_add() when updating as_lower_bound/as_upper_bound.
Previous code was excessively strong on 64bit systems while not
strong enough on 32bit ones. Instead introduce and use real
atomic_load() and atomic_store() operations, just an assignments
on 64bit machines, but using proper atomics on 32bit ones to avoid
torn reads/writes.
- Reduce number of buckets on large systems. Extra buckets not as
much improve add speed, as hurt reads. Unlike wmsum for aggsum
reads are still important.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored-By: iXsystems, Inc.
Closes #12145
Stefan Eßer [Mon, 7 Jun 2021 13:46:24 +0000 (15:46 +0200)]
usr.bin/calendar: do not treat // in text as comment
The C++-style comment marker "//" has been added with the rewrite of
the preprocessor features. Since this character sequence occurs in
ULRS, the reminder of the URL was considered a comment and stripped
from the calendar line.
Change parsing of "//" to only start a comment at the begin of a line
or when preceeded by a white-space character.
PR: 256455
Reported by: Philippe Michel (philippe.michel7 at free.fr)
MFC after: 3 days