dim [Mon, 18 Mar 2019 21:04:28 +0000 (21:04 +0000)]
Enable building libomp.so for 32-bit x86. This is done by selectively
enabling the functions that save and restore MXCSR, since access to this
register requires SSE support.
Note that you may run into other issues with OpenMP on i386, since this
*not* yet supported upstream, and certainly not extensively tested.
emaste [Mon, 18 Mar 2019 19:26:36 +0000 (19:26 +0000)]
makefs: Fix "time" mtree attribute handling
When processing mtree(5) MANIFEST files, makefs(8) previously threw an
error if it encountered an entry whose "time" attribute contained a
non-zero subsecond component (e.g. time=1551620152.987220000).
Update the handling logic to properly assign the subsecond component if
built with nanosecond support, or silently discard it otherwise.
Also, re-enable the time attribute for the kyua tests.
emaste [Mon, 18 Mar 2019 19:23:19 +0000 (19:23 +0000)]
sys/stat.h: Improve timespec compatibility with other BSDs
OpenBSD and NetBSD provide macros to directly reference the underlying
struct timespec's tv_nsec member. While FreeBSD has such macros for
tv_sec, the others are missing. Add the following macros:
Adding these fields will provide programs which reference them better
portability to FreeBSD. An example of such a program is makefs(8),
which has unused support for subseconds that it has inherited from
NetBSD.
dim [Mon, 18 Mar 2019 19:11:11 +0000 (19:11 +0000)]
Also explicitly link libomp.so against -lm, as it transitively depends
on scalbn and a few other math functions, via libcompiler-rt. This
should allow OpenMP programs to link with BFD linkers too.
ae [Mon, 18 Mar 2019 12:59:08 +0000 (12:59 +0000)]
Update NAT64LSN implementation:
o most of data structures and relations were modified to be able support
large number of translation states. Now each supported protocol can
use full ports range. Ports groups now are belongs to IPv4 alias
addresses, not hosts. Each ports group can keep several states chunks.
This is controlled with new `states_chunks` config option. States
chunks allow to have several translation states for single alias address
and port, but for different destination addresses.
o by default all hash tables now use jenkins hash.
o ConcurrencyKit and epoch(9) is used to make NAT64LSN lockless on fast path.
o one NAT64LSN instance now can be used to handle several IPv6 prefixes,
special prefix "::" value should be used for this purpose when instance
is created.
o due to modified internal data structures relations, the socket opcode
that does states listing was changed.
gallatin [Mon, 18 Mar 2019 12:41:42 +0000 (12:41 +0000)]
Fix a typo introduced in r344133
The line was misedited to change tt to st instead of
changing ut to st.
The use of st as the denominator in mul64_by_fraction() will lead
to an integer divide fault in the intr proc (the process holding
ithreads) where st will be 0. This divide by 0 happens after
the total runtime for all ithreads exceeds 76 hours.
vmaffione [Mon, 18 Mar 2019 12:22:23 +0000 (12:22 +0000)]
netmap: add support for multiple host rings
Some applications forward from/to host rings most or all the
traffic received or sent on a physical interface. In this
cases it is desirable to have more than a pair of RX/TX host
rings, and use multiple threads to speed up forwarding.
This change adds support for multiple host rings. On registering
a netmap port, the user can specify the number of desired receive
and transmit host rings in the nr_host_tx_rings and nr_host_rx_rings
fields of the nmreq_register structure.
ae [Mon, 18 Mar 2019 11:44:53 +0000 (11:44 +0000)]
Add NAT64 CLAT implementation as defined in RFC6877.
CLAT is customer-side translator that algorithmically translates 1:1
private IPv4 addresses to global IPv6 addresses, and vice versa.
It is implemented as part of ipfw_nat64 kernel module. When module
is loaded or compiled into the kernel, it registers "nat64clat" external
action. External action named instance can be created using `create`
command and then used in ipfw rules. The create command accepts two
IPv6 prefixes `plat_prefix` and `clat_prefix`. If plat_prefix is ommitted,
IPv6 NAT64 Well-Known prefix 64:ff9b::/96 will be used.
# ipfw nat64clat CLAT create clat_prefix SRC_PFX plat_prefix DST_PFX
# ipfw add nat64clat CLAT ip4 from IPv4_PFX to any out
# ipfw add nat64clat CLAT ip6 from DST_PFX to SRC_PFX in
Obtained from: Yandex LLC
Submitted by: Boris N. Lytochkin
MFC after: 1 month
Relnotes: yes
Sponsored by: Yandex LLC
ae [Mon, 18 Mar 2019 10:39:14 +0000 (10:39 +0000)]
Modify struct nat64_config.
Add second IPv6 prefix to generic config structure and rename another
fields to conform to RFC6877. Now it contains two prefixes and length:
PLAT is provider-side translator that translates N:1 global IPv6 addresses
to global IPv4 addresses. CLAT is customer-side translator (XLAT) that
algorithmically translates 1:1 IPv4 addresses to global IPv6 addresses.
Use PLAT prefix in stateless (nat64stl) and stateful (nat64lsn)
translators.
Modify nat64_extract_ip4() and nat64_embed_ip4() functions to accept
prefix length and use plat_plen to specify prefix length.
Retire net.inet.ip.fw.nat64_allow_private sysctl variable.
Add NAT64_ALLOW_PRIVATE flag and use "allow_private" config option to
configure this ability separately for each NAT64 instance.
markj [Sun, 17 Mar 2019 17:34:06 +0000 (17:34 +0000)]
Optimize lseek(SEEK_DATA) on UFS.
The old implementation, at the VFS layer, would map the entire range of
logical blocks between the starting offset and the first data block
following that offset. With large sparse files this is very
inefficient. The VFS currently doesn't provide an interface to improve
upon the current implementation in a generic way.
Add ufs_bmap_seekdata(), which uses the obvious algorithm of scanning
indirect blocks to look for data blocks. Use it instead of
vn_bmap_seekhole() to implement SEEK_DATA.
Reviewed by: kib, mckusick
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D19598
bz [Sun, 17 Mar 2019 09:31:09 +0000 (09:31 +0000)]
Fix legacy IP autoconfiguration.
It seems my subconcious plan in r345088 to not only prefer IPv6 autoconf
but to also slowly deteriorate legacy IP auto-configuration was uncovered
way too early.
In case IPv6 is a thing yet ipv6_autoconfif was not true, we would not
bring up the interface yet tell the follow-up DHCPv4 configuration in
ifconfig_up() that we did. So unless you were doing SYNCDHCP or IPv6
you would not get legacy-IP DHCPv4 configuration.
I see multiple problems here: (a) people not yet using IPv6 (obviously a
problem), and (b) the dhclient startup script not running dhclient in
that case despite configured to do so (needs to be investigated seperately).
Reported by: Pawel Biernacki (pawel.biernacki gmail.com)
Tested by: Pawel Biernacki
Differential Revision: https://reviews.freebsd.org/D19488
Pointyhat to: bz (not sure if it is for breaking or
for letting them notice it so easily)
dim [Sat, 16 Mar 2019 15:45:15 +0000 (15:45 +0000)]
Connect lib/libomp to the build.
* Set MK_OPENMP to yes by default only on amd64, for now.
* Bump __FreeBSD_version to signal this addition.
* Ensure gcc's conflicting omp.h is not installed if MK_OPENMP is yes.
* Update OptionalObsoleteFiles.inc to cope with the conflicting omp.h.
* Regenerate src.conf(5) with new WITH/WITHOUT fragments.
dim [Sat, 16 Mar 2019 15:01:36 +0000 (15:01 +0000)]
Add lib/libomp, with a Makefile, and generated configuration headers.
Not connected to the main build yet, as there is still the issue of the
GNU omp.h header conflicting with the LLVM one. (That is, if MK_GCC is
enabled.)
dim [Sat, 16 Mar 2019 13:40:27 +0000 (13:40 +0000)]
Add LLVM openmp trunk r351319 (just before the release_80 branch point)
to contrib/llvm. This is not yet connected to the build, the glue for
that will come in a follow-up commit.
kib [Sat, 16 Mar 2019 11:44:33 +0000 (11:44 +0000)]
amd64 KPTI: add control from procctl(2).
Add the infrastructure to allow MD procctl(2) commands, and use it to
introduce amd64 PTI control and reporting. PTI mode cannot be
modified for existing pmap, the knob controls PTI of the new vmspace
created on exec.
kib [Sat, 16 Mar 2019 11:31:01 +0000 (11:31 +0000)]
amd64: Add md process flags and first P_MD_PTI flag.
PTI mode for the process pmap on exec is activated iff P_MD_PTI is set.
On exec, the existing vmspace can be reused only if pti mode of the
pmap matches the P_MD_PTI flag of the process. Add MD
cpu_exec_vmspace_reuse() callback for exec_new_vmspace() which can
vetoed reuse of the existing vmspace.
MFC note: md_flags change struct proc KBI.
Reviewed by: jhb, markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D19514
kib [Sat, 16 Mar 2019 11:16:09 +0000 (11:16 +0000)]
amd64: fix switching to the pmap with pti disabled.
When the pmap with pti disabled (i.e. pm_ucr3 == PMAP_NO_CR3) is
activated, tss.rsp0 was not updated. Any interrupt that happen before
next context switch would use pti trampoline stack for hardware frame
but fault and interrupt handlers are not prepared to this. Correctly
update tss.rsp0 for both PMAP_NO_CR3 and pti pmaps.
Note that this case, pti = 1 but pmap->pm_ucr3 == PMAP_NO_CR3 is not
used at the moment.
Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D19514
ngie [Fri, 15 Mar 2019 21:49:19 +0000 (21:49 +0000)]
Integrate cddl/usr.sbin/zfds/tests into the FreeBSD test suite
This change integrates the unit tests for zfsd into the test suite using the
integration method described in r345203.
This change removes the `LOCALBASE` includes added for the port version of
googlemock/googletest, as well as unnecessary `LIBADD`/`DPADD` and `CXXFLAGS`
defines, which are included in the `GTEST_CXXFLAGS` variable, as part of
r345203.
ngie [Fri, 15 Mar 2019 21:43:52 +0000 (21:43 +0000)]
Initial googlemock/googletest integration into the build/FreeBSD test suite
This initial integration takes googlemock/googletest release 1.8.1, integrates
the library, tests, and sample unit tests into the build.
googlemock/googletest's inclusion is optionally available via `MK_GOOGLETEST`.
`MK_GOOGLETEST` is dependent on `MK_TESTS` and is enabled by default when
built with a C++11 capable toolchain.
Google tests can be specified via the `GTESTS` variable, which, in comparison
with the other test drivers, is more simplified/streamlined, as Googletest only
supports C++ tests; not raw C or shell tests (C tests can be written in C++
using the standard embedding methods).
No dependent libraries are assumed for the tests. One must specify `gmock`,
`gmock_main`, `gtest`, or `gtest_main`, via `LIBADD` for the program.
More information about googlemock and googletest can be found on the
Googletest [project page](https://github.com/google/googletest), and the
[GoogleMock](https://github.com/google/googletest/blob/v1.8.x/googlemock/docs/Documentation.md)
and
[GoogleTest](https://github.com/google/googletest/tree/v1.8.x/googletest/docs)
docs.
These tests are originally integrated into the build as plain driver tests, but
will be natively integrated into Kyua in a later version.
Known issues/Errata:
* [WhenDynamicCastToTest.AmbiguousCast fails on FreeBSD](https://github.com/google/googletest/issues/2172)
mav [Fri, 15 Mar 2019 18:59:04 +0000 (18:59 +0000)]
MFV r336930: 9284 arc_reclaim_thread has 2 jobs
`arc_reclaim_thread()` calls `arc_adjust()` after calling
`arc_kmem_reap_now()`; `arc_adjust()` signals `arc_get_data_buf()` to
indicate that we may no longer be `arc_is_overflowing()`.
The problem is, `arc_kmem_reap_now()` can take several seconds to
complete, has no impact on `arc_is_overflowing()`, but due to how the
code is structured, can impact how long the ARC will remain in the
`arc_is_overflowing()` state.
The fix is to use seperate threads to:
1. keep `arc_size` under `arc_c`, by calling `arc_adjust()`, which
improves `arc_is_overflowing()`
2. keep enough free memory in the system, by calling
`arc_kmem_reap_now()` plus `arc_shrink()`, which improves
`arc_available_memory()`.
glebius [Fri, 15 Mar 2019 18:18:05 +0000 (18:18 +0000)]
Deanonymize thread and proc state enums, so that a userland app can
use them without redefining the value names. New clang no longer
allows to redefine a enum value name to the same value.
Bump __FreeBSD_version, since ports depend on that.
kevans [Fri, 15 Mar 2019 17:19:36 +0000 (17:19 +0000)]
if_bridge(4): Drop pointless rtflush
At this point, all routes should've already been dropped by removing all
members from the bridge. This condition is in-fact KASSERT'd in the line
immediately above where this nop flush was added.
kevans [Fri, 15 Mar 2019 17:13:05 +0000 (17:13 +0000)]
if_bridge(4): Drop pointless rtflush
At this point, all routes should've already been dropped by removing all
members from the bridge. This condition is in-fact KASSERT'd in the line
immediately above where this nop flush was added.
kp [Fri, 15 Mar 2019 15:52:36 +0000 (15:52 +0000)]
bridge: Fix STP-related panic
After r345180 we need to have the appropriate vnet context set to delete an
rtnode in bridge_rtnode_destroy().
That's usually the case, but not when it's called by the STP code (through
bstp_notify_rtage()).
We have to set the vnet context in bridge_rtable_expire() just as we do in the
other STP callback bridge_state_change().
eugen [Fri, 15 Mar 2019 14:42:23 +0000 (14:42 +0000)]
trim(8): emit more user-friendly error message in verbose mode.
If underlying driver provides no TRIM/UNMAP support and operation fails
due to this reason, state it clearly in verbose mode (default)
instead of writing standard message that may be too cryptic for a user:
trim: ioctl(DIOCGDELETE) failed: nda0: Operation not supported
Now it would write:
trim: nda0: TRIM/UNMAP not supported by driver
But still use previous format including errno value for quiet mode.
Small candelete() function borrowed from diskinfo(8) code.
This function was committed by Alan Somers <asomers@FreeBSD.org>,
so give him some credit.
kevans [Fri, 15 Mar 2019 13:19:52 +0000 (13:19 +0000)]
if_bridge(4): Fix module teardown
bridge_rtnode_zone still has outstanding allocations at the time of
destruction in the current model because all of the interface teardown
happens in a VNET_SYSUNINIT, -after- the MOD_UNLOAD has already been
processed. The SYSUNINIT triggers destruction of the interfaces, which then
attempts to free the memory from the zone that's already been destroyed, and
we hit a panic.
Solve this by virtualizing the uma_zone we allocate the rtnodes from to fix
the ordering. bridge_rtable_fini should also take care to flush any
remaining routes that weren't taken care of when dynamic routes were flushed
in bridge_stop.
fsu [Fri, 15 Mar 2019 11:49:46 +0000 (11:49 +0000)]
Remove unneeded mount point unlock function calls.
The ext2_nodealloccg() function unlocks the mount point
in case of successful node allocation.
The additional unlocks are not required and should be removed.
kp [Fri, 15 Mar 2019 11:21:20 +0000 (11:21 +0000)]
bridge: Fix panic if the STP root is removed
If the spanning tree root interface is removed from the bridge we panic
on the next 'ifconfig'.
While the STP code is notified whenever a bridge member interface is
removed from the bridge it does not clear the bs_root_port. This means
bs_root_port can still point at an bridge_iflist which has been free()d.
The next access to it will panic.
Explicitly check if the interface we're removing in bstp_destroy() is
the root, and if so re-assign the roles, which clears bs_root_port.
kp [Fri, 15 Mar 2019 11:08:44 +0000 (11:08 +0000)]
pf :Use counter(9) in pf tables.
The counters of pf tables are updated outside the rule lock. That means state
updates might overwrite each other. Furthermore allocation and
freeing of counters happens outside the lock as well.
Use counter(9) for the counters, and always allocate the counter table
element, so that the race condition cannot happen any more.
chuck [Fri, 15 Mar 2019 02:11:27 +0000 (02:11 +0000)]
Fix bhyve's NVMe Identify Namespace data
The NVMe Identify Namespace data structure's Number of LBA Formats
(NLBAF) field is a 0's based value (i.e. 0x0 means 1). Since the
emulation only supports a single format, set NLBAF to 0x0, not 1.
glebius [Thu, 14 Mar 2019 22:52:16 +0000 (22:52 +0000)]
PFIL_MEMPTR for ipfw link level hook
With new pfil(9) KPI it is possible to pass a void pointer with length
instead of mbuf pointer to a packet filter. Until this commit no filters
supported that, so pfil run through a shim function pfil_fake_mbuf().
Now the ipfw(4) hook named "default-link", that is instantiated when
net.link.ether.ipfw sysctl is on, supports processing pointer/length
packets natively.
- ip_fw_args now has union for either mbuf or void *, and if flags have
non-zero length, then we use the void *.
- through ipfw_chk() we handle mem/mbuf cases differently.
- ether_header goes away from args. It is ipfw_chk() responsibility
to do parsing of Ethernet header.
- ipfw_log() now uses different bpf APIs to log packets.
Although ipfw_chk() is now capable to process pointer/length packets,
this commit adds support for the link level hook only, see
ipfw_check_frame(). Potentially the IP processing hook ipfw_check_packet()
can be improved too, but that requires more changes since the hook
supports more complex actions: NAT, divert, etc.
glebius [Thu, 14 Mar 2019 22:28:50 +0000 (22:28 +0000)]
- Add more flags to ip_fw_args. At this changeset only IPFW_ARGS_IN and
IPFW_ARGS_OUT are utilized. They are intented to substitute the "dir"
parameter that is often passes together with args.
- Rename ip_fw_args.oif to ifp and now it is set to either input or
output interface, depending on IPFW_ARGS_IN/OUT bit set.
kevans [Thu, 14 Mar 2019 19:48:43 +0000 (19:48 +0000)]
ether_fakeaddr: Use 'b' 's' 'd' for the prefix
This has the advantage of being obvious to sniff out the designated prefix
by eye and it has all the right bits set. Comment stolen from ffec.
I've removed bryanv@'s pending question of using the FreeBSD OUI range --
no one has followed up on this with a definitive action, and there's no
particular reason to shoot for it and the administrative overhead that comes
with deciding exactly how to use it.
kevans [Thu, 14 Mar 2019 17:18:00 +0000 (17:18 +0000)]
ether: centralize fake hwaddr generation
We currently have two places with identical fake hwaddr generation --
if_vxlan and if_bridge. Lift it into if_ethersubr for reuse in other
interfaces that may also need a fake addr.
Reviewed by: bryanv, kp, philip
Differential Revision: https://reviews.freebsd.org/D19573
0mp [Thu, 14 Mar 2019 14:34:36 +0000 (14:34 +0000)]
chroot.8: Add examples & clean up
- Sort arguments in synopsis.
- Clarify that it is possible to specify arguments to the command (and that
they could be passed as further arguments to chroot(1)).
- Standardize the description of the flags.
- Improve formatting (e.g., do not use macros in strings specifying width).
- Add examples.