John Baldwin [Tue, 6 Nov 2018 00:11:36 +0000 (00:11 +0000)]
Add a facility for transmitting "raw" work requests on regular NIC queues.
- Use PH_loc.eight[1] as a general 'cflags' (Chelsio flags) field to
describe properties of a queued packet. The MC_RAW_WR flag
indicates an mbuf holding a raw work request. mbuf_cflags() returns
the current flags.
- Raw work request mbufs are allocated via alloc_wr_mbuf() which will
allocate a single contiguous range to hold the mbuf data. The
consumer can use mtod() to obtain the start of the work request and
write the required work request in the buffer. The mbuf can then be
enqueued directly to the txq via mp_ring_enqueue().
- Since raw work requests might potentially send arbitrary work
requests, only set the EQUIQ and EQUEQ bits on work requests that
support them such as the normal tunneled Ethernet packet work
requests.
John Baldwin [Mon, 5 Nov 2018 22:54:03 +0000 (22:54 +0000)]
Add a custom implementation of cpu_lock_delay() for x86.
Avoid using DELAY() since it can try to use spin locks on CPUs without
a P-state invariant TSC. For cpu_lock_delay(), always use the TSC if
it exists (even if it is not P-state invariant) to delay for a
microsecond. If the TSC does not exist, read from I/O port 0x84 to
delay instead.
PR: 228768
Reported by: Roger Hammerstein <cheeky.m@live.com>
Reviewed by: kib
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D17851
Alex Richardson [Mon, 5 Nov 2018 22:51:44 +0000 (22:51 +0000)]
Keep inheriting $PATH when using system linker/compiler
I missed this case when testing r340157. For now just keep
$PATH when we aren't bootstrapping a compiler so that the build
can find cc/c++/ld without an absolute path.
Ed Maste [Mon, 5 Nov 2018 22:36:45 +0000 (22:36 +0000)]
revert r340156, restoring sys/sys/capability.h
More time is still needed for ports to accommodate the migration to
capsicum.h.
The header was renamed in 2014 due to concerns about conflicts with with
a draft POSIX.1e capability.h header on other systems and there is (now)
no need for complex autoconf tests for both capability.h and capsicum.h.
Any supported Capsicum-capable system has capsicum.h.
Reported by: antoine
Sponsored by: The FreeBSD Foundation
John Baldwin [Mon, 5 Nov 2018 21:34:17 +0000 (21:34 +0000)]
Add a KPI for the delay while spinning on a spin lock.
Replace a call to DELAY(1) with a new cpu_lock_delay() KPI. Currently
cpu_lock_delay() is defined to DELAY(1) on all platforms. However,
platforms with a DELAY() implementation that uses spin locks should
implement a custom cpu_lock_delay() doesn't use locks.
John Baldwin [Mon, 5 Nov 2018 20:00:36 +0000 (20:00 +0000)]
Rework setting PTE_D for kernel mappings.
Rather than unconditionally setting PTE_D for all writeable kernel
mappings, set PTE_D for writable mappings of unmanaged pages (whether
user or kernel). This matches what amd64 does and also matches what
the RISC-V spec suggests (preset the A and D bits on mappings where
the OS doesn't care about the state).
Alex Richardson [Mon, 5 Nov 2018 19:51:16 +0000 (19:51 +0000)]
Build the elftoolchain libraries as part of bootstrap-tools
It is not necessary to build libelf and libdwarf this early. Furthermore,
when building on Linux/MacOS, m4 will only be built during the bootstrap
tools phase and not be available in $PATH before.
Alex Richardson [Mon, 5 Nov 2018 19:51:10 +0000 (19:51 +0000)]
Allow building world without inheriting $PATH
Inheriting $PATH during the build phase can cause the build to fail when
compiling on a different system due to missing build tools or incompatible
versions somewhere in $PATH. This has cause build failures for us before
due to the jenkins slaves still running FreeBSD 10.
Listing the tools we depend on explicitly instead of just using whatever
happens to be in $PATH allows us to check that we don't accidentally add a
new build dependency.
All tools that do no need to be bootstrapped will now be symlinked to
${WORLDTMP}/legacy/bin and during the build phase $PATH will only contain
${WORLDTMP}. There is also a new variable "BOOTSTRAP_ALL_TOOLS" which can
be set to force compiling almost all bootstrap tools instead of symlinking
them. This will not bootstrap tools such as cp,mv, etc. since they may be
used during the build and for those we should really only be using POSIX
compatible options.
Furthermore, this change is required in order to be able to build on
non-FreeBSD hosts. While the same binaries may exist on Linux/MacOS they
often accept different flags or produce incompatible output.
Warner Losh [Mon, 5 Nov 2018 18:47:29 +0000 (18:47 +0000)]
Only assert locked for many async events.
Many async events that we see are called for this specific path. When
calling an async callback for a targetted device, XTP will lock that
specific device's path lock (same as what cam_periph_lock does). For
those AC_ events, assert we have the lock rather than trying to
recusrively take it (which causes panics since it's not recursive).
Add annotations about this and about the fact that AC_SCSI_AEN events
are generated now only in the ata stack (which cannot have a scsi_da
attachment). Leave it in place in case I've overlooked something as
the code is harmless.
This is fallout from my attempts to "fix" locking for softc->flags in
r330796 that's not been triggered often enough to get my attention
until now.
Sponsored by: Netflix
MFC After: 3 days
Differential Revision: https://reviews.freebsd.org/D17837
Matt Macy [Mon, 5 Nov 2018 08:11:16 +0000 (08:11 +0000)]
hwpmc: limit wait for user callchain collection to 1 tick
The hwpmc pcpu sample buffer is prone to head of line blocking
when waiting for user process to return to user space and
collect a pending callchain. If more than one tick has elapsed
between the time the sample entry was marked for collection and
the time that the hardclock pmc handler runs to copy the records
to a larger temporary buffer, mark the sample entry as not in
use.
This changes reduces the number of samples marked as not valid
when collecting under load from ~99.5% to 5-20%.
Justin Hibbits [Mon, 5 Nov 2018 01:53:20 +0000 (01:53 +0000)]
powerpc/SMP: Don't spam the console with AP bringup messages
Especially on new POWER9 systems, the console can be filled with
SMP: AP CPU #XX launched
messages. This can also slow down the console printing. Instead, do what
x86 now does, as of r333335, and print it all on one line, unless
bootverbose is set.
There is no need to check if capdns is NULL.
If we will build the system without casper all cap_gethostaddr will be
replaced by the standard functions.
The getaddrinfo(3) and gethostbyname(3) are used to return the address for a
given hostname. The getnameinfo(3) and gethostbyaddr(3) are used to return
hostname for a given address. Right now in casper, we have two limitations:
- NAME which allows resolving DNS names.
- ADDR which allows to do revert DNS lookups.
Before this change the rights was mixed up:
NAME - getnameinfo(3) and gethostbyname(3)
ADDR - gethostbyaddr(3) and getaddrinfo(3)
Which no matters on limitation allowed us to resolve DNS names and do DNS
lookups basically by using a different set of functions.
Now the NAME type allows getaddrinfo(3) and gethostbyname (3)functions,
and the ADDR names allow to use gethostbyaddr(3) and getnameinfo(3) functions.
Ed Maste [Sun, 4 Nov 2018 19:21:12 +0000 (19:21 +0000)]
rtld: move relro enforcement after ifunc processing
Previously the combination of relro (implicit), -z now and ifunc use
resulted in a segfault when applying ifuncs after relro (test binary
here just calls amd64_get_fsbase()):
Eugene Grosbein [Sun, 4 Nov 2018 19:10:44 +0000 (19:10 +0000)]
Make ng_pptpgre(8) netgraph node be able to restore order for packets
reordered in transit instead of dropping them altogether.
It uses sequence numbers of PPtPGRE packets.
A set of new sysctl(8) added to control this ability or disable it:
net.graph.pptpgre.reorder_max (1) defines maximum length of node's
private reorder queue used to keep data waiting for late packets.
Zero value disables reordering. Default value 1 allows the node to restore
the order for two packets swapped in transit. Greater values allow the node
to deliver packets being late after more packets in sequence
at cost of increased kernel memory usage.
net.graph.pptpgre.reorder_timeout (1) defines time value in miliseconds
used to wait for late packets. It may be useful to increase this
if reordering spot is distant.
The writeRandomBytes_arc4random is not used if the arc4random_buf
is available. This caused compiler to throw warnings which are treated as
an error in libexpact.
Conrad Meyer [Sun, 4 Nov 2018 17:56:16 +0000 (17:56 +0000)]
Drop ed(1) "crypto"
You should not be using DES. You should not have been using DES for the
past 30 years.
The ed DES-CBC scheme lacked several desirable properties of a sealed
document system, even ignoring DES itself. In particular, it did not
provide the "integrity" cryptographic property (detection of tampering), and
it treated ASCII passwords as 64-bit keys (instead of using a KDF like
scrypt or PBKDF2).
Some general approaches ed(1) users might consider to replace the removed
DES mode:
1. Full disk encryption with something like AES-XTS. This is easy to
conceptualize, design, and implement, and it provides confidentiality for
data at rest. Like CBC, it lacks tampering protection. Examples include
GELI, LUKS, FileVault2.
The idea behind those functions is not to force consumers to remember that there
is a need to check errno on failure. We already have a caph_enter(3) function
which does the same for cap_enter(2).
It is unused after r340102, and more important, I do not see how to
define textsize in both practically useful and correct way, for binaries
with more that one executable segments.
The buffer is already zeroed in compile_rule() function, and also it
may contain configured F_NOT flag in o.len field. This fixes the filling
for "not icmp6types" opcode.
Ed Maste [Sat, 3 Nov 2018 19:31:11 +0000 (19:31 +0000)]
libcompat: disable retpoline when building build tools
These are built with the host toolchain which may not support retpoline.
While here, move the MK_ overrides to a separate line and sort them
alphabetically to support future changes.
MFC with: r339511
Sponsored by: The FreeBSD Foundation
Bjoern A. Zeeb [Sat, 3 Nov 2018 18:03:24 +0000 (18:03 +0000)]
Update the "flag" for draft-ietf-6man-ipv6only-flag.
Having the flag named "6" can possibly be a problem for configurations
where parsing strings and numbers can produce ambivalent results.
Rename the "6" flag to the "S"ix (or Silence-IPv4) flag.
Matt Macy [Sat, 3 Nov 2018 03:43:32 +0000 (03:43 +0000)]
Convert epoch to read / write records per cpu
In discussing D17503 "Run epoch calls sooner and more reliably" with
sbahra@ we came to the conclusion that epoch is currently misusing the
ck_epoch API. It isn't safe to do a "write side" operation (ck_epoch_call
or ck_epoch_poll) in the middle of a "read side" section. Since, by definition,
it's possible to be preempted during the middle of an EPOCH_PREEMPT
epoch the GC task might call ck_epoch_poll or another thread might call
ck_epoch_call on the same section. The right solution is ultimately to change
the way that ck_epoch works for this use case. However, as a stopgap for
12 we agreed to simply have separate records for each use case.
Alexander Motin [Sat, 3 Nov 2018 03:10:06 +0000 (03:10 +0000)]
9952 Block size change during zfs receive drops spill block
Replication code in receive_object() falsely assumes that if received
object block size is different from local, then it must be a new object
and calls dmu_object_reclaim() to wipe it out. In most cases it is not a
problem, since all dnode, bonus buffer and data block(s) are immediately
rewritten any way, but the problem is that spill block (if used) is not.
This means loss of ACLs, extended attributes, etc.
This issue can be triggered in very simple way:
1. create 4KB file with 10+ ACL entries;
2. take snapshot and send it to different dataset;
3. append another 4KB to the file;
4. take another snapshot and send incrementally;
5. witness ACL loss on receive side.
Ed Maste [Sat, 3 Nov 2018 01:53:26 +0000 (01:53 +0000)]
Remove apparently unused 0-byte files that cause grief on Windows
r235274 added a sort regression test (it operates by comparing output
against GNU sort). The commit included a number of 0-byte files, one
of which ends in a trailing . which reportedly breaks svn/git checkouts
on Windows.
It appears these were added accidentally, so just remove them.
Warner Losh [Sat, 3 Nov 2018 00:37:51 +0000 (00:37 +0000)]
Implement ability to turn on/off PHYs for AHCI devices.
As part of Chuck's work on fixing kernel crashes caused by disk I/O
errors, it is useful to be able to trigger various kinds of
errors. This patch allows causing an AHCI-attached disk to disappear,
by having the driver keep the PHY disabled when the driver would
otherwise enable the PHY. It also allows making the disk reappear by
having the driver go back to setting the PHY enable/disable state as
it normal would and simulating the hardware event that causes a bus
rescan.
Warner Losh [Fri, 2 Nov 2018 22:15:30 +0000 (22:15 +0000)]
Document r226775: tell why we omit usbus[0-9]+
tcpdump can capture packet traces from the usb bus. usbus[0-9] are
registered as ifnet devices so this can work. When these devices come
up, devd was trying to run pccard_ether on those interfaces, which
didn't exist and generated an error.
Ed Maste [Fri, 2 Nov 2018 21:20:46 +0000 (21:20 +0000)]
newvers.sh: fix git false positive -dirty tag
Assuming that any output from `git diff-index --name-only` implies
changes in the working tree results in false positives: files with
metadata, but not content, changes are also listed.
Check that content differences exist before adding the -dirty tag to
the git hash.
PR: 229230
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D15968
Ed Maste [Fri, 2 Nov 2018 20:48:29 +0000 (20:48 +0000)]
sys/types.h: avoid using terse macro _M
Although _M is reserved for use by the implemenation it is rather non-
descriptive and conflicted with a libc++ test. Just rename to _Major
and _Minor to avoid conflicts.
Reviewed by: dim
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D16734
Kristof Provost [Fri, 2 Nov 2018 19:23:50 +0000 (19:23 +0000)]
pf: Fix build if INVARIANTS is not set
r340061 included a number of assertions pf_frent_remove(), but these assertions
were the only use of the 'prev' variable. As a result builds without
INVARIANTS had an unused variable, and failed.
Kristof Provost [Fri, 2 Nov 2018 16:53:15 +0000 (16:53 +0000)]
pfsync: Ensure uninit is done before pf
pfsync touches pf memory (for pf_state and the pfsync callback
pointers), not the other way around. We need to ensure that pfsync is
torn down before pf.
Kristof Provost [Fri, 2 Nov 2018 16:50:17 +0000 (16:50 +0000)]
Notify that the ifnet will go away, even on vnet shutdown
pf subscribes to ifnet_departure_event events, so it can clean up the
ifg_pf_kif and if_pf_kif pointers in the ifnet.
During vnet shutdown interfaces could go away without sending the event,
so pf ends up cleaning these up as part of its shutdown sequence, which
happens after the ifnet has already been freed.
Send the ifnet_departure_event during vnet shutdown, allowing pf to
clean up correctly.
Kristof Provost [Fri, 2 Nov 2018 15:32:04 +0000 (15:32 +0000)]
pf: Limit the fragment entry queue length to 64 per bucket.
So we have a global limit of 1024 fragments, but it is fine grained to
the region of the packet. Smaller packets may have less fragments.
This costs another 16 bytes of memory per reassembly and devides the
worst case for searching by 8.
Kristof Provost [Fri, 2 Nov 2018 15:26:51 +0000 (15:26 +0000)]
pf: Split the fragment reassembly queue into smaller parts
Remember 16 entry points based on the fragment offset. Instead of
a worst case of 8196 list traversals we now check a maximum of 512
list entries or 16 array elements.
Kristof Provost [Fri, 2 Nov 2018 15:23:57 +0000 (15:23 +0000)]
pf: Count holes rather than fragments for reassembly
Avoid traversing the list of fragment entris to check whether the
pf(4) reassembly is complete. Instead count the holes that are
created when inserting a fragment. If there are no holes left, the
fragments are continuous.
Brooks Davis [Fri, 2 Nov 2018 14:42:36 +0000 (14:42 +0000)]
Make vop_symlink take a const target path.
This will enable callers to take const paths as part of syscall
decleration improvements.
Where doing so is easy and non-distruptive carry the const through
implementations. In UFS the value is passed to an interface that must
take non-const values. In ZFS, const poisoning would touch code shared
with upstream and it's not worth adding diffs.
Bump __FreeBSD_version for external API consumers.
Toomas Soome [Fri, 2 Nov 2018 11:41:58 +0000 (11:41 +0000)]
loader: biosdisk should check if the media is present
The bd_print/bd_open/bd_strategy need to make sure the device does have
media, before getting into performing IO operations. Some systems can
hung if the device without a media is accessed.