rrs [Thu, 7 Jun 2018 19:57:55 +0000 (19:57 +0000)]
Fix build issue with const and volatile and the
myriad ways that the various compliers treat this. The
only safe prefetch appears to be for AMD. The other
compilers either are not volatile or are not const :(
benno [Thu, 7 Jun 2018 18:59:32 +0000 (18:59 +0000)]
Break recursion involving getnewvnode and zfs_rmnode.
When we're at our vnode limit, getnewvnode will call into the vnode LRU
cache to free up vnodes. If the vnode we try to recycle is a ZFS vnode we
end up, eventually, in zfs_rmnode. If the ZFS vnode we're recycling
represents something with extended attributes, zfs_rmnode will call
zfs_zget which will attempt to allocate another vnode. If the next vnode we
try to recycle is also a ZFS vnode representing something with extended
attributes we can recurse further. This ends up being unbounded and can end
up overflowing the stack.
In order to avoid this, restructure zfs_rmnode to simply add the extended
attribute directory's object ID to the unlinked set, thus not requiring the
allocation of a vnode. We then schedule a task that calls zfs_unlinked_drain
which will do the work of properly marking the vnodes for unlinking.
zfs_unlinked_drain is also called on mount so these will be cleaned up
there.
kevans [Thu, 7 Jun 2018 18:38:48 +0000 (18:38 +0000)]
bsdgrep(1): Don't initialize fts_flags twice
Admittedly, this is a clang-scan complaint... but it wasn't wrong. fts_flags
is initialized by all cases in the switch(), which should be fairly obvious.
Annotate this anyways.
kevans [Thu, 7 Jun 2018 18:27:58 +0000 (18:27 +0000)]
bsdgrep(1): Do some less dirty things with return types
Neither procfile nor grep_tree return anything meaningful to their callers.
None of the callers actually care about how many lines were matched in all
of the files they processed; it's all about "did anything match?"
This is generally just a light refactoring to remind me of what actually
matters as I'm rewriting these bits to care less about 'stuff'.
marius [Thu, 7 Jun 2018 18:24:25 +0000 (18:24 +0000)]
- Once we have shifted arguments up to thrice, base-bits-dir is $1 rather
than $4. Introduce $BASEBITSDIR for clarity and to avoid repeating this
mistake in the future. Fixing this ensures that we pick up newly built
boot bits native to the target rather for/from the host.
- Apply some of the argument quoting fixes done in r287635 but missing in
later revisions.
rrs [Thu, 7 Jun 2018 18:18:13 +0000 (18:18 +0000)]
This commit brings in a new refactored TCP stack called Rack.
Rack includes the following features:
- A different SACK processing scheme (the old sack structures are not used).
- RACK (Recent acknowledgment) where counting dup-acks is no longer done
instead time is used to knwo when to retransmit. (see the I-D)
- TLP (Tail Loss Probe) where we will probe for tail-losses to attempt
to try not to take a retransmit time-out. (see the I-D)
- Burst mitigation using TCPHTPS
- PRR (partial rate reduction) see the RFC.
Once built into your kernel, you can select this stack by either
socket option with the name of the stack is "rack" or by setting
the global sysctl so the default is rack.
Note that any connection that does not support SACK will be kicked
back to the "default" base FreeBSD stack (currently known as "default").
To build this into your kernel you will need to enable in your
kernel:
makeoptions WITH_EXTRA_TCP_STACKS=1
options TCPHPTS
Sponsored by: Netflix Inc.
Differential Revision: https://reviews.freebsd.org/D15525
kib [Thu, 7 Jun 2018 17:04:34 +0000 (17:04 +0000)]
Account for dmap limit when selecting the pages for the bootstrap
pagetables.
physmap[] can be inconsistent with the physical memory limit due to
buggy bios, or to the hw.physmem tunable. Since bootstrap pagetables
are initialized by accesses through the DMAP, we must ensure that DMAP
really cover the selected pages. This is only relevant when machine
has less than 4G RAM and buggy BIOS, which is the combination on Acer
Chromebook 720.
The call to mp_bootaddress() is moved later to have Maxmem initialized.
An alternative could be to always cover 4G for DMAP, but this change
seems to be simpler.
Reported and tested by: grembo
Reviewed by: royger
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D15675
bdrewery [Thu, 7 Jun 2018 16:16:22 +0000 (16:16 +0000)]
Stop using head(1) which is not available in installworld.
installworld should not be executing this anyhow but there is some
obscure case doing it still. The head(1) binary is not part of
ITOOLS and there's no need to add it.
avg [Thu, 7 Jun 2018 14:46:52 +0000 (14:46 +0000)]
x86: reorganize code that deals with unexpected NMI-s
Expected NMI-s are those than are either generated by the software (such
as a CPU sending NMI to other CPU) or generated by the hardware after
the software configured it to do so (such as NMI-s on PMC events).
Some unexpected NMI-s can be caused by hardware failures and it is
possible to inquire the hardware about them (somewhat like MCA but much
more primitive) using an EISA mechanism. In some cases the origin of
the NMI can remain truly unknown.
This commit should not change any functionality. It just reorganizes
the code, so that it is easier to extend with new checks for the origin
of the NMI. Also, it frees the code that has nothing to do with ISA
from DEV_ISA.
leitao [Thu, 7 Jun 2018 13:57:34 +0000 (13:57 +0000)]
md: use prestaged mfs_root
On PowerNV systems, the rootfs is passed through kexec, which loads the rootfs
into memory and set two fdt entries to describe where the file is located in
the memory;
I need to pass this memory region to the md device as a mfs_root, but, current
md driver does not support two things:
* Just getting a pointer from an external (bootloader) memory. If I need to
workaround it, I would need to declare a static array and memcopy from this
external memory to this static variable.
* The size of the image. The usage of mfs_root_end, which is not a pointer,
seems to be not possible for this prestaged scenario.
This patch simply adds a new way to load mfs_root from memory.
jtl [Thu, 7 Jun 2018 13:29:54 +0000 (13:29 +0000)]
Fix a typo in vm_domain_set(). When a domain crosses into the severe range,
we need to set the domain bit from the vm_severe_domains bitset (instead
of clearing it).
Reviewed by: jeff, markj
Sponsored by: Netflix, Inc.
If the ipfw module is not loaded the net.inet.ip.fw.enable OID does not exist,
which leads the script to report errors and incorrectly report that ipfw is
enabled.
jhibbits [Thu, 7 Jun 2018 04:02:09 +0000 (04:02 +0000)]
Print Maximum Data Transfer Size as a long rather than int
PowerPC has PAGE_SIZE as a long, not an int. This causes the compiler to throw
a format mismatch warning on this print. To work around the difference, print
it as a long instead of an int, and force the argument to a long.
alc [Thu, 7 Jun 2018 02:54:11 +0000 (02:54 +0000)]
pidctrl_daemon() implements a variation on the classical, discrete PID
controller that tries to handle early invocations of the controller,
in other words, invocations before the expected end of the interval.
However, there were some calculation errors in this early invocation
case. Notably, if an early invocation occurred while the error was
negative, the derivative term was off by a large amount. One visible
effect of this error was that processes were being killed by the
virtual memory system's OOM killer when in fact there was plentiful
free memory.
Correct a couple minor errors in the sysctl descriptions, and apply
some style fixes.
mmacy [Thu, 7 Jun 2018 02:03:22 +0000 (02:03 +0000)]
pmc: convert native to jsonl and track TSC value of samples
- add '-j' options to filter to enable converting native pmc
log format to json lines format to enable the use of scripts
and external tooling
% pmc filter -j pmc.log pmc.jsonl
- Record the tsc value in sampling interrupts as opposed to
recording nanotime when the sample is copied to a global log
in hardclock - potentially many milliseconds later.
- At initialize record the tsc_freq and the time of day to give
us an offset for translating the tsc values in callchain records
jhibbits [Thu, 7 Jun 2018 00:24:10 +0000 (00:24 +0000)]
Add partition scheme for mpc85xx devices
Some mpc85xx devices with u-boot need MBR partitioning with a FAT boot
partition. Since the infrastructure is already in place to have a dedicated
boot partition, this adds the necessary bits to use that infrastructure with
mpc85xx boards.
cem [Wed, 6 Jun 2018 20:09:21 +0000 (20:09 +0000)]
strcpy.3: Improve legibility and clarity
In the DESCRIPTION, put the more commonly used functions first in the
corresponding sentence, to help catch the eye.
Pull out the note about overlapping buffers to its own paragraph, as it
applies to all routines documented by this page.
Emphasize the potentially surprising strncpy(3) behavior of zero-filling the
remainder of a buffer larger than the source string.
Encourage strlcpy use; remove portability note about strlcpy(3). Adapting a
strlcpy-using code base to a platform that does not provide strlcpy in libc
is so trivial as to not be worth mentioning. (Just copy strlcpy.c out of
any BSD libc, or include and link the pre-packaged libbsd library on non-BSD
platforms.)
Likewise, expand the page's warning about ease of potential misuse to cover
all functions documented herein, and explicitly suggest using strlcpy most
of the time. The text was mostly cribbed from a similar suggestion in
gets(3).
Finally, document the remaining valid use of strncpy -- the rare
fixed-length record with no expectation of nul-termination.
pf: Return non-zero from 'status' if pf is not enabled
In the pf rc.d script the output of `/etc/rc.d/pf status` or `/etc/rc.d/pf
onestatus` always provided an exit status of zero. This made it fiddly to
programmatically determine if pf was running or not.
Return a non-zero status if the pf module is not loaded, extend pfctl to have
an option to return an error status if pf is not enabled.
PR: 228632
Submitted by: James Park-Watt <jimmypw AT gmail.com>
MFC after: 1 week
tuexen [Wed, 6 Jun 2018 19:27:06 +0000 (19:27 +0000)]
Improve compliance with RFC 4895 and RFC 6458.
Silently dicard SCTP chunks which have been requested to be
authenticated but are received unauthenticated no matter if support
for SCTP authentication has been negotiated. This improves compliance
with RFC 4895.
When the application uses the SCTP_AUTH_CHUNK socket option to
request a chunk to be received in an authenticated way, enable
the SCTP authentication extension for the end-point. This improves
compliance with RFC 6458.
kevans [Wed, 6 Jun 2018 18:28:17 +0000 (18:28 +0000)]
lualoader: Add a loaded hook for others to execute upon config load
This will not be executed on reload, though later work could allow for that.
It's intended/expected that later work won't generally need to happen on
every config load, just once (for, e.g., menu initialization) or just when
config is reloaded but not upon the initial load.
sbruno [Wed, 6 Jun 2018 15:45:57 +0000 (15:45 +0000)]
Load balance sockets with new SO_REUSEPORT_LB option.
This patch adds a new socket option, SO_REUSEPORT_LB, which allow multiple
programs or threads to bind to the same port and incoming connections will be
load balanced using a hash function.
Most of the code was copied from a similar patch for DragonflyBSD.
However, in DragonflyBSD, load balancing is a global on/off setting and can not
be set per socket. This patch allows for simultaneous use of both the current
SO_REUSEPORT and the new SO_REUSEPORT_LB options on the same system.
Required changes to structures:
Globally change so_options from 16 to 32 bit value to allow for more options.
Add hashtable in pcbinfo to hold all SO_REUSEPORT_LB sockets.
Limitations:
As DragonflyBSD, a load balance group is limited to 256 pcbs (256 programs or
threads sharing the same socket).
This is a substantially different contribution as compared to its original
incarnation at svn r332894 and reverted at svn r332967. Thanks to rwatson@
for the substantive feedback that is included in this commit.
hselasky [Wed, 6 Jun 2018 15:06:21 +0000 (15:06 +0000)]
Rename two structure field members while keeping backwards compatibility in
the LinuxKPI. Add a comment saying in which Linux version this change was made.
Make in_delayed_cksum() be similar to IPv6 implementation.
Use m_copyback() function to write checksum when it isn't located
in the first mbuf of the chain. Handmade analog doesn't handle the
case when parts of checksum are located in different mbufs.
Also in case when mbuf is too short, m_copyback() will allocate new
mbuf in the chain instead of making out of bounds write.
Also wrap long line and remove now useless KASSERTs.
thj [Wed, 6 Jun 2018 07:04:40 +0000 (07:04 +0000)]
Use UDP len when calculating UDP checksums
The length of the IP payload is normally equal to the UDP length, UDP Options
(draft-ietf-tsvwg-udp-options-02) suggests using the difference between IP
length and UDP length to create space for trailing data.
Correct checksum length calculation to use the UDP length rather than the IP
length when not offloading UDP checksums.
ian [Tue, 5 Jun 2018 22:13:45 +0000 (22:13 +0000)]
Remove comments and assertions that are no longer valid after r330809.
r330809 replaced duplication of devdesc struct fields with an embedded copy
of the devdesc struct, to avoid fragility. That means all the scattered
comments indicating that structs must match are no longer valid. Likewise
asserts that attempted to mitigate some of the old fragility.
Rework if_gif(4) to use new encap_lookup_t method to speedup lookup
of needed interface when many gif interfaces are present.
Remove rmlock from gif_softc, use epoch(9) and CK_LIST instead.
Move more AF-related code into AF-related locations.
Use hash table to speedup lookup of needed softc. Interfaces
with GIF_IGNORE_SOURCE flag are stored in plain CK_LIST.
Sysctl net.link.gif.parallel_tunnels is removed. The removal was planed
16 years ago, and actually it could work only for outbound direction.
Each protocol, that can be handled by if_gif(4) interface is registered
by separate encap handler, this helps avoid invoking the handler
for unrelated protocols (GRE, PIM, etc.).
This change allows dramatically improve performance when many gif(4)
interfaces are used.
Currently it has several disadvantages:
- it uses single mutex to protect internal structures. It is used by
data- and control- path, thus there are no parallelism at all.
- it uses single list to keep encap handlers for both INET and INET6
families.
- struct encaptab keeps unneeded information (src, dst, masks, protosw),
that isn't used by code in the source tree.
- matches are prioritized and when many tunneling interfaces are
registered, encapcheck handler of each interface is invoked for each
packet. The search takes O(n) for n interfaces. All this work is done
with exclusive lock held.
What this patch includes:
- the datapath is converted to be lockless using epoch(9) KPI.
- struct encaptab now linked using CK_LIST.
- all unused fields removed from struct encaptab. Several new fields
addedr: min_length is the minimum packet length, that encapsulation
handler expects to see; exact_match is maximum number of bits, that
can return an encapsulation handler, when it wants to consume a packet.
- IPv6 and IPv4 handlers are stored in separate lists;
- added new "encap_lookup_t" method, that will be used later. It is
targeted to speedup lookup of needed interface, when gif(4)/gre(4) have
many interfaces.
- the need to use protosw structure is eliminated. The only pr_input
method was used from this structure, so I don't see the need to keep
using it.
- encap_input_t method changed to avoid using mbuf tags to store softc
pointer. Now it is passed directly trough encap_input_t method.
encap_getarg() funtions is removed.
- all sockaddr structures and code that uses them removed. We don't have
any code in the tree that uses them. All consumers use encap_attach_func()
method, that relies on invoking of encapcheck() to determine the needed
handler.
- introduced struct encap_config, it contains parameters of encap handler
that is going to be registered by encap_attach() function.
- encap handlers are stored in lists ordered by exact_match value, thus
handlers that need more bits to match will be checked first, and if
encapcheck method returns exact_match value, the search will be stopped.
- all current consumers changed to use new KPI.
ian [Tue, 5 Jun 2018 17:18:10 +0000 (17:18 +0000)]
Make the v*printf() functions in libsa return int instead of void.
This makes them compatible with the C standard signatures, avoiding
spurious mismatch errors in the places where the oddball requirements
of standalone code end up putting two declarations of the same function
in play.
hselasky [Tue, 5 Jun 2018 15:37:28 +0000 (15:37 +0000)]
Add "access" function pointer to the "vm_operations_struct" structure
in the LinuxKPI. While at it document when to use the "virtual_address" or
the "address" field in the "vm_fault" structure.
ram [Tue, 5 Jun 2018 15:05:26 +0000 (15:05 +0000)]
Issue:
Utility hangs when OCS_IOCTL_CMD_MGMT_GET_ALL called in parallel on port 0 and port 1.
Fix: Using static structure for results is corrupting the second ioctl request. Removed static for results structure.
Approved by: ken
MFC after: 3 days
kevlo [Tue, 5 Jun 2018 05:24:42 +0000 (05:24 +0000)]
Since we don't enable BUF_TRACKING and FULL_BUF_TRACKING buffer debugging
options in GENERIC kernels on arm and arm64, there's no need to disable
them.
mmacy [Tue, 5 Jun 2018 04:26:40 +0000 (04:26 +0000)]
hwpmc: log name->pid, name->tid mappings
By logging all threads and processes 'pmc filter'
can now filter on process or thread name, relieving
the user of the burden of determining which tid or
pid was which when the sample was taken.