Eugene Grosbein [Mon, 26 Nov 2018 11:28:34 +0000 (11:28 +0000)]
MFC r339810: ipfw: implement ngtee/netgraph actions for layer-2 frames.
Kernel part of ipfw does not support and ignores rules other than
"pass", "deny" and dummynet-related for layer-2 (ethernet frames).
Others are processed as "pass".
Make it support ngtee/netgraph rules just like they are supported
for IP packets. For example, this allows us to mirror some frames
selectively to another interface for delivery to remote network analyzer
over RSPAN vlan. Assuming ng_ipfw(4) netgraph node has a hook named "900"
attached to "lower" hook of vlan900's ng_ether(4) node, that would be
as simple as:
ipfw add ngtee 900 ip from any to 8.8.8.8 layer2 out xmit igb0
Eugene Grosbein [Mon, 26 Nov 2018 11:20:25 +0000 (11:20 +0000)]
MFC r339816: mount_msdosfs
mount_msdosfs: do not fail mounts requiring locale name conversion table
that is already present in a kernel statically.
For example, the command "mount_msdosfs -L ru_RU.KOI8-R" fails with error
"mount_msdosfs: msdosfs_iconv: File exists" for a kernel having
options LIBICONV and MSDOSFS_ICONV. After this change, it mounts
successfully.
Ed Maste [Sun, 25 Nov 2018 00:32:23 +0000 (00:32 +0000)]
MFC r340771: proto: change device permissions to 0600
C Turt reports that the driver is not thread safe and may have
exploitable races.
Note that the proto device is intended for prototyping and development,
and is not for use on production systems. From the man page:
SECURITY CONSIDERATIONS
Because programs have direct access to the hardware, the proto
driver is inherently insecure. It is not advisable to use this
driver on a production machine.
The proto device is not included in any of FreeBSD's kernel config files
(although the module is built).
The issues in the proto device still need to be fixed, and the device is
inherently (and intentionally) insecure, but it might as well be limited
to root only.
admbugs: 782
Reported by: C Turt <ecturt@gmail.com>
Sponsored by: The FreeBSD Foundation
MFC r340649:
Add a fortune describing how to upload a machine's dmesg information to the
NYCBUG database.
We want to encourage our users to upload their dmesgs so that the project can
get a better insight into what kind of hardware is run on. This helps in making
data-driven decisions about i.e., platform and driver support.
Note that dmesgs may contain sensitive information like hardware serial numbers,
hence uploading them without review is discouraged.
Ed Maste [Fri, 23 Nov 2018 20:32:41 +0000 (20:32 +0000)]
MFC r340663 (rmacklem):
Improve sanity checking for the dircount hint argument to
NFSv3's ReaddirPlus and NFSv4's Readdir operations. The code
checked for a zero argument, but did not check for a very large value.
This patch clips dircount at the server's maximum data size.
Ed Maste [Fri, 23 Nov 2018 20:31:27 +0000 (20:31 +0000)]
MFC r340662 (rmacklem):
nfsm_advance() would panic() when the offs argument was negative.
The code assumed that this would indicate a corrupted mbuf chain, but
it could simply be caused by bogus RPC message data.
This patch replaces the panic() with a printf() plus error return.
Ed Maste [Fri, 23 Nov 2018 20:29:47 +0000 (20:29 +0000)]
MFC r340661 (rmacklem):
r304026 added code that started statistics gathering for an operation
before the operation number (the variable called "op") was sanity checked.
This patch moves the code down to below the range sanity check for "op".
Marius Strobl [Thu, 22 Nov 2018 13:03:11 +0000 (13:03 +0000)]
MFC: r340656
Given that the idea of D15374 was to "make memmove a first class citizen",
provide a _MEMMOVE extension of _MEMCPY that deals with overlap based on
the previous bcopy(9) implementation and use the former for bcopy(9) and
memmove(9). This addresses my D15374 review comment, avoiding extra MOVs
in case of memmove(9) and trashing the stack pointer.
Tijl Coosemans [Thu, 22 Nov 2018 09:47:42 +0000 (09:47 +0000)]
MFC r340674:
Fix another user address dereference in linux_sendmsg syscall.
This was hidden behind the LINUX_CMSG_NXTHDR macro which dereferences its
second argument. Stop using the macro as well as LINUX_CMSG_FIRSTHDR. Use
the size field of the kernel copy of the control message header to obtain
the next control message.
Tijl Coosemans [Thu, 22 Nov 2018 09:41:45 +0000 (09:41 +0000)]
MFC r340631:
Do proper copyin of control message data in the Linux sendmsg syscall.
Instead of calling m_append with a user address, allocate an mbuf cluster
and copy data into it using copyin. For the SCM_CREDS case, instead of
zeroing a stack variable and appending that to the mbuf, zero part of the
mbuf cluster directly. One mbuf cluster is also the size limit used by
the FreeBSD sendmsg syscall (uipc_syscalls.c:sockargs()).
Kyle Evans [Thu, 22 Nov 2018 03:04:13 +0000 (03:04 +0000)]
MFC r339701: Update lualoader test script a little bit
Use userboot.so from the test directory if possible, fall back to .OBJDIR.
This avoids a problem that we've had since userboot coexistence was added,
where userboot.so alone no longer exists in the .OBJDIR but is instead just
a link installed later.
Update lualo
r340507:
libbe(3): rewrite init to support chroot usage
libbe(3) currently uses zfs_be_root and locates which of its children is
currently mounted at "/". This is reasonable, but not correct in the case of
a chroot, for two reasons:
- chroot root may be of a different zpool than zfs_be_root
- chroot root will not show up as mounted at "/"
Fix both of these by rewriting libbe_init to work from the rootfs down.
zfs_path_to_zhandle on / will resolve to the dataset mounted at the new
root, rather than the real root. From there, we can derive the BE root/pool
and grab the bootfs off of the new pool. This does no harm in the average
case, and opens up bectl to operating on different pools for scenarios where
one may be, for instance, updating a pool that generally gets re-rooted into
from a separate UFS root or zfs bootpool.
While here, I've also:
- Eliminated the check for /boot and / to be on the same partition. This
leaves one open to a setup where /boot (and consequently, kernel/modules)
are not included in the boot environment. This may very well be an
intentional setup done by someone that knows what they're doing, we should
not kill BE usage because of it.
- Eliminated the validation bits of BEs and snapshots that enforced
'mountpoint' to be "/" -- this broke when trying to operate on an imported
pool with an altroot, but we need not be this picky.
Reported by: philip
Reviewed by: philip, allanjude (previous version)
Tested by: philip
Differential Revision: https://reviews.freebsd.org/D18012
r340508:
libbe(3): Rewrite be_unmount to stop mucking with getmntinfo(2)
Go through the ZFS layer instead; given a BE, we can derive the dataset,
zfs_open it, then zfs_unmount. ZFS takes care of the dirty details and
likely gets it more correct than we did for more interesting setups.
r340592:
bectl(3)/libbe(3): Allow BE root to be specified
Add an undocumented -r option preceding the bectl subcommand to specify a BE
root to operate out of. This will remain undocumented for now, as some
caveats apply:
- BEs cannot be activated in the pool that doesn't contain the rootfs
- bectl create cannot work out of the box without the -e option right now,
since it defaults to the rootfs and cross-pool cloning doesn't work like
that (IIRC)
Plumb the BE root through to libbe(3) so that some things -can- be done to
it, e.g.
this aides in some upgrade setups where rootfs is not necessarily ZFS, and
also makes it easier/possible to regression-test bectl when combined with a
file-backed zpool.
r340593:
libbe(3): Properly account for altroot when creating new BEs
Previously we would blindly copy the 'mountpoint' property, which includes
the altroot. The altroot needs to be snipped off prior to setting it on the
new BE, though, or you'll end up with a new BE and a mountpoint of /mnt with
altroot=/mnt
r340594:
bectl(8): Add some regression tests
These tests operate on a file-backed zpool that gets created in the kyua
temp dir. root and ZFS support are both required for these tests. Current
tests cover create, destroy, export/import, jail, list (kind of), mount,
rename, and jail.
List tests should later be extended to cover formatting and the different
list flags, but for now only covers basic "are create/destroy actually
reflected properly"
r340635:
libbe(3): Handle non-ZFS rootfs better
If rootfs isn't ZFS, current version will emit an error claiming so and fail
to initialize libbe. As a consumer, bectl -r (undocumented) can be specified
to operate on a BE independently of whether on a UFS or ZFS root.
Unbreak this for the UFS case by only erroring out the init if we can't
determine a ZFS dataset for rootfs and no BE root was specified. Consumers
of libbe should take care to ensure that rootfs is non-empty if they're
trying to use it, because this could certainly be the case.
Some check is needed before zfs_path_to_zhandle because it will
unconditionally emit to stderr if the path isn't a ZFS filesystem, which is
unhelpful for our purposes.
This should also unbreak the bectl(8) tests on a UFS root, as is the case in
Jenkins' -test runs.
r340636:
bectl(8) tests: attempt to load the ZFS module
Observed in a CI test image, bectl_create test will run and be marked as
skipped because the module is not loaded. The first zpool invocation will
automagically load the module, but bectl_create is still skipped. Subsequent
tests all pass as expected because the module is now loaded and everything
is OK.
Marius Strobl [Wed, 21 Nov 2018 18:53:13 +0000 (18:53 +0000)]
MFC: r340495
- Restore setting the clock for devices which support the default/legacy
transfer mode only (lost with r321385). [1]
- Similarly, don't try to set the power class on MMC devices that comply
to version 4.0 of the system specification but are operated in default/
legacy transfer or 1-bit bus mode as no power class is specified for
these cases. Trying to set a power class nevertheless resulted in an -
albeit harmless - error message.
Ed Maste [Tue, 20 Nov 2018 20:16:03 +0000 (20:16 +0000)]
Introduce src.conf knob to build userland with retpoline
MFC r339511: Introduce src.conf knob to build userland with retpoline
WITH_RETPOLINE enables -mretpoline vulnerability mitigation in userland
for CVE-2017-5715.
MFC r340099: libcompat: disable retpoline when building build tools
These are built with the host toolchain which may not support retpoline.
While here, move the MK_ overrides to a separate line and sort them
alphabetically to support future changes.
MFC r340650: Avoid retpolineplt with static linking
Statically linked binaries linked with -zretpolineplt crash at startup
as lld produces a broken PLT.
MFC r340652: rescue: set NO_SHARED in Makefile
The rescue binary is built statically via the Makefile generated by
crunchgen, but that does not trigger other shared/static logic in
bsd.prog.mk - in particular
PR: 233336
Reported by: Peter Malcom (r339511), Charlie Li (r340652)
Approved by: re (gjb, early MFC)
Sponsored by: The FreeBSD Foundation
Niclas Zeising [Tue, 20 Nov 2018 19:37:09 +0000 (19:37 +0000)]
MFC r340387: Add evdev support to amd64 and i386
This merge is done sans sys/i386/conf/MINIMAL, since that doesn't exist in
stable/12, only current.
Include evdev support and drivers in the amd64 GENERIC and MINIMAL, and i386
GENERIC kernels. Evdev is used by X and wayland to handle input devices, and
this change, together with upcomming changes in ports will make us handle input
devices better in graphical UIs.
amd64: tidy up memset to have rax set earlier for small sizes
amd64: finish the tail in memset with an overlapping store
amd64: align memset buffers to 16 bytes before using rep stos
amd64: convert libc bzero to a C func to avoid future bloat
amd64: sync up libc memset with the kernel version
amd64: handle small memset buffers with overlapping stores
Fix -DNO_CLEAN amd64 build after r340463
Eugene Grosbein [Tue, 20 Nov 2018 10:43:18 +0000 (10:43 +0000)]
MFC r339558: New sysctl: net.inet.icmp.error_keeptags
Currently, icmp_error() function copies FIB number from original packet
into generated ICMP response but not mbuf_tags(9) chain.
This prevents us from easily matching ICMP responses corresponding
to tagged original packets by means of packet filter such as ipfw(8).
For example, ICMP "time-exceeded in-transit" packets usually generated
in response to traceroute probes lose tags attached to original packets.
This change adds new sysctl net.inet.icmp.error_keeptags
that defaults to 0 to avoid extra overhead when this feature not needed.
Set net.inet.icmp.error_keeptags=1 to make icmp_error() copy mbuf_tags
from original packet to generated ICMP response.
Kyle Evans [Mon, 19 Nov 2018 19:04:50 +0000 (19:04 +0000)]
MFC r340392: Add dynamic_kenv assertion to init_static_kenv
Both to formally document the requirement that this not be called after the
dynamic kenv is setup, and to perhaps help static analyzers figure out
what's going on. While calling init_static_kenv this late isn't fatal, there
are some caveats that the caller should be aware of:
- Late calls are effectively a no-op, as far as default FreeBSD is
concerned, as everything will switch to searching the dynamic kenv once it's
available.
- Each of the kern_getenv calls will leak memory, as it's assumed that
these are searching static environment and allocations will not be made.
As such, this usage is not sensible and should be detected.
Kyle Evans [Mon, 19 Nov 2018 18:59:06 +0000 (18:59 +0000)]
MFC r340390: Fix test-dts{,o} targets
There were two main problems here:
1.) sys/dts/Makefile.inc is not included from various */overlays directories
by default, only ../Makefile.inc
2.) When shelling out for DTS/DTSO, cwd != .CURDIR, so enumeration always
failed
These changes allow make test-dts and make test-dtso to function in their
respective directories.
Glen Barber [Mon, 19 Nov 2018 15:29:40 +0000 (15:29 +0000)]
Remove debugging options from amd64 MINIMAL [1] and riscv GENERIC
kernel configuration files. This should have been turned off when
stable/12 branched.
This is a direct commit to stable/12.
Submitted by: Harry Schmalzbauer [1]
Sponsored by: The FreeBSD Foundation
Stephen Hurd [Mon, 19 Nov 2018 15:18:30 +0000 (15:18 +0000)]
MFC r340434, r340445
r340434:
Fix leaks caused by ifc_nhwtxqs never being initialized
r333502 removed initialization of ifc_nhwtxqs, and it's not clear
there's a need to copy it into the struct iflib_ctx at all. Use
ctx->ifc_sctx->isc_ntxqs instead.
Further, iflib_stop() did not clear the last ring in the case where
isc_nfl != isc_nrxqs (such as when IFLIB_HAS_RXCQ is set). Use
ctx->ifc_sctx->isc_nrxqs here instead of isc_nfl.
r340445:
Clear RX completion queue state veriables in iflib_stop()
iflib_stop() was not resetting the rxq completion queue state variables.
This meant that for any driver that has receive completion queues, after a
reinit, iflib would start asking what's available on the rx side starting at
whatever the completion queue index was prior to the stop, instead of at 0.
Add the lb program, which is able to load-balance input traffic
received from a netmap port over M groups, with N netmap pipes in
each group. Each received packet is forwarded to one of the pipes
chosen from each group (using an L3/L4 connection-consistent hash function).
This also adds a man page for lb and some cross-references in related
man pages.
Eugene Grosbein [Mon, 19 Nov 2018 06:33:38 +0000 (06:33 +0000)]
MFC r339465: rc.initdiskless: add support for auxiliary NVRAM.
Currently, rc.inidiskless assumes that local system configuration
changes are kept in some mountable file system. For example,
nanobsd uses dedicated partition mounted as /cfg for this.
However, small embedded devices like MIPS routers may have no enough flash
space to keep full-blown file system but have only one or couple
small flash blocks to keep persistent local configuration overrides.
This change extends rc.initdiskless and introduces ability to run auxiliary
command /conf/T/M/extract that is supposed to extract configuration overrides
from such local storage.
For example, the command /conf/default/etc/extract may contain something like:
cd "$1" && bsdcpio --quiet -idu < /dev/map/cfg
bsdcpio command extracts compressed archive from the storage to /etc
assuming the storage is exposed by the kernel as /dev/map/cfg to userland.
Rick Macklem [Mon, 19 Nov 2018 00:49:08 +0000 (00:49 +0000)]
MFC: r339999
Fix NFS client vnode locking to avoid a crash during forced dismount.
A crash was reported where the crash occurred in nfs_advlock() when the
NFS_ISV4(vp) macro was being executed. This was caused by the vnode
being VI_DOOMED due to a forced dismount in progress.
This patch fixes the problem by locking the vnode before executing the
NFS_ISV4() macro.
Ed Maste [Sun, 18 Nov 2018 14:52:16 +0000 (14:52 +0000)]
MFC r340329: build(7): clarify buildenv target can be used for non-cross builds
make buildenv can be used for building for the same architecture as
the host (perhaps this is a degenerate case of cross-building).
TARGET and TARGET_ARCH do not need to be set in this case.
Kristof Provost [Sun, 18 Nov 2018 12:09:26 +0000 (12:09 +0000)]
MFC r340067:
pfsync: Ensure uninit is done before pf
pfsync touches pf memory (for pf_state and the pfsync callback
pointers), not the other way around. We need to ensure that pfsync is
torn down before pf.
Kristof Provost [Sun, 18 Nov 2018 12:04:24 +0000 (12:04 +0000)]
MFC r340066:
Notify that the ifnet will go away, even on vnet shutdown
pf subscribes to ifnet_departure_event events, so it can clean up the
ifg_pf_kif and if_pf_kif pointers in the ifnet.
During vnet shutdown interfaces could go away without sending the event,
so pf ends up cleaning these up as part of its shutdown sequence, which
happens after the ifnet has already been freed.
Send the ifnet_departure_event during vnet shutdown, allowing pf to
clean up correctly.
Kristof Provost [Sun, 18 Nov 2018 10:57:31 +0000 (10:57 +0000)]
MFC r339676:
pf: Fix copy/paste error in IPv6 address rewriting
We checked the destination address, but replaced the source address. This was
fixed in OpenBSD as part of their NAT rework, which we don't want to import
right now.
Kristof Provost [Sun, 18 Nov 2018 10:47:36 +0000 (10:47 +0000)]
MFC r339470:
pf synproxy will do the 3WHS on behalf of the target machine, and once
the 3WHS is completed, establish the backend connection. The trigger
for "3WHS completed" is the reception of the first ACK. However, we
should not proceed if that ACK also has RST or FIN set.
MFC r339554:
Rework if_ipsec(4) to use epoch(9) instead of rmlock.
* use CK_LIST and FNV hash to keep chains of softc;
* read access to softc is protected by epoch();
* write access is protected by ipsec_ioctl_sx. Changing of softc fields
is allowed only when softc is unlinked from CK_LIST chains.
* linking/unlinking of softc is allowed only when ipsec_ioctl_sx is
exclusive locked.
* the plain LIST of all softc is replaced by hash table that uses ingress
address of tunnels as a key.
* added support for appearing/disappearing of ingress address handling.
Now it is allowed configure non-local ingress IP address, and thus the
problem with if_ipsec(4) configuration that happens on boot, when
ingress address is not yet configured, is solved.
MFC r339555:
Follow the fix in r339532 (by glebius):
Fix exiting an epoch(9) we never entered. May happen only with MAC.
MFC r339642:
Remove softc from idhash when interface is destroyed.
MFC r339646:
Add the check that current VNET is ready and access to srchash is
allowed.
ipsec_srcaddr() callback can be called during VNET teardown, since
ingress address checking subsystem isn't VNET specific. And thus
callback can make access to already freed memory. To prevent this,
use V_ipsec_idhtbl pointer as indicator of VNET readiness. And make
epoch_wait() after resetting it to NULL in vnet_ipsec_uninit() to
be sure that ipsec_srcaddr() is finished its work.
MFC r339551:
Add handling for appearing/disappearing of ingress addresses to if_gif(4).
* register handler for ingress address appearing/disappearing;
* add new srcaddr hash table for fast softc lookup by srcaddr;
* when srcaddr disappears, clear IFF_DRV_RUNNING flag from interface,
and set it otherwise;
* remove the note about ingress address from BUGS section.
MFC r339552:
Add handling for appearing/disappearing of ingress addresses to if_gre(4).
* register handler for ingress address appearing/disappearing;
* add new srcaddr hash table for fast softc lookup by srcaddr;
* when srcaddr disappears, clear IFF_DRV_RUNNING flag from interface,
and set it otherwise;
MFC r339553:
Add handling for appearing/disappearing of ingress addresses to if_me(4).
* register handler for ingress address appearing/disappearing;
* add new srcaddr hash table for fast softc lookup by srcaddr;
* when srcaddr disappears, clear IFF_DRV_RUNNING flag from interface,
and set it otherwise;
Sponsored by: Yandex LLC
MFC r339649:
Add the check that current VNET is ready and access to srchash is allowed.
This change is similar to r339646. The callback that checks for appearing
and disappearing of tunnel ingress address can be called during VNET
teardown. To prevent access to already freed memory, add check to the
callback and epoch_wait() call to be sure that callback has finished its
work.
MFC r339550,339556:
Add KPI that can be used by tunneling interfaces to handle IP addresses
appearing and disappearing on the host system.
Such handling is need, because tunneling interfaces must use addresses,
that are configured on the host as ingress addresses for tunnels.
Otherwise the system can send spoofed packets with source address, that
belongs to foreign host.
The KPI uses ifaddr_event_ext event to implement addresses tracking.
Tunneling interfaces register event handlers and then they are
notified by the kernel, when an address disappears or appears.
ifaddr_event_compat() handler from if.c replaced by srcaddr_change_event()
in the ip_encap.c