MFC r331758: makefs: sync fragment and block size with newfs
r222319 in newfs raised the default blocksize for UFS/FFS filesystems
from 16K to 32K and the default fragment size from 2K to 4K, with a
rationale that most disks were now running with 4K sectors.
Relnotes: Yes
Sponsored by: The FreeBSD Foundation
After multiple start/stop of netmap, ixgbe will get into a bad state
requiring a reboot to recover. Adding a delay before stopping the interface
appears to work around the issue.
The -CURRENT driver has diverged too far from -STABLE for an MFC.
MFC 328101,328911: Require SHF_ALLOC for kernel object module sections.
328101:
Require the SHF_ALLOC flag for program sections from kernel object modules.
ELF object files can contain program sections which are not supposed
to be loaded into memory (e.g. .comment). Normally the static linker
uses these flags to decide which sections are allocated to loadable
program segments in ELF binaries and shared objects (including kernels
on all architectures and kernel modules on architectures other than
amd64).
Mapping ELF object files (such as amd64 kernel modules) into memory
directly is a bit of a grey area. ELF object files are intended to be
used as inputs to the static linker. As a result, there is not a
standardized definition for what the memory layout of an ELF object
should be (none of the section headers have valid virtual memory
addresses for example).
The kernel and loader were not checking the SHF_ALLOC flag but loading
any program sections with certain types such as SHT_PROGBITS. As a
result, the kernel and loader would load into RAM some sections that
weren't marked with SHF_ALLOC such as .comment that are not loaded
into RAM for kernel modules on other architectures (which are
implemented as ELF shared objects). Aside from possibly requiring
slightly more RAM to hold a kernel module this does not affect runtime
correctness as the kernel relocates symbols based on the layout it
uses.
Debuggers such as gdb and lldb do not extract symbol tables from a
running process or kernel. Instead, they replicate the memory layout
of ELF executables and shared objects and use that to construct their
own symbol tables. For executables and shared objects this works
fine. For ELF objects the current logic in kgdb (and probably lldb
based on a simple reading) assumes that only sections with SHF_ALLOC
are memory resident when constructing a memory layout. If the
debugger constructs a different memory layout than the kernel, then it
will compute different addresses for symbols causing symbols in the
debugger to appear to have the wrong values (though the kernel itself
is working fine). The current port of mdb does not check SHF_ALLOC as
it replicates the kernel's logic in its existing kernel support.
The bfd linker sorts the sections in ELF object files such that all of
the allocated sections (sections with SHF_ALLOCATED) are placed first
followed by unallocated sections. As a result, when kgdb composed a
memory layout using only the allocated sections, this layout happened
to match the layout used by the kernel and loader. The lld linker
does not sort the sections in ELF object files and mixed allocated and
unallocated sections. This resulted in kgdb composing a different
memory layout than the kernel and loader.
We could either patch kgdb (and possibly in the future lldb) to use
custom handling when generating memory layouts for kernel modules that
are ELF objects, or we could change the kernel and loader to check
SHF_ALLOCATED. I chose the latter as I feel we shouldn't be loading
things into RAM that the module won't use. This should mostly be a
NOP when linking with bfd but will allow the existing kgdb to work
with amd64 kernel modules linked with lld.
Note that we only require SHF_ALLOC for "program" sections for types
like SHT_PROGBITS and SHT_NOBITS. Other section types such as symbol
tables, string tables, and relocations must also be loaded and are not
marked with SHF_ALLOC.
328911:
Ignore relocation tables for non-memory-resident sections.
As a followup to r328101, ignore relocation tables for ELF object
sections that are not memory resident. For modules loaded by the
loader, ignore relocation tables whose associated section was not
loaded by the loader (sh_addr is zero). For modules loaded at runtime
via kldload(2), ignore relocation tables whose associated section is
not marked with SHF_ALLOC.
MFC r328988,r328989:
Rework ipfw dynamic states implementation to be lockless on fast path.
o added struct ipfw_dyn_info that keeps all needed for ipfw_chk and
for dynamic states implementation information;
o added DYN_LOOKUP_NEEDED() macro that can be used to determine the
need of new lookup of dynamic states;
o ipfw_dyn_rule now becomes obsolete. Currently it used to pass
information from kernel to userland only.
o IPv4 and IPv6 states now described by different structures
dyn_ipv4_state and dyn_ipv6_state;
o IPv6 scope zones support is added;
o ipfw(4) now depends from Concurrency Kit;
o states are linked with "entry" field using CK_SLIST. This allows
lockless lookup and protected by mutex modifications.
o the "expired" SLIST field is used for states expiring.
o struct dyn_data is used to keep generic information for both IPv4
and IPv6;
o struct dyn_parent is used to keep O_LIMIT_PARENT information;
o IPv4 and IPv6 states are stored in different hash tables;
o O_LIMIT_PARENT states now are kept separately from O_LIMIT and
O_KEEP_STATE states;
o per-cpu dyn_hp pointers are used to implement hazard pointers and they
prevent freeing states that are locklessly used by lookup threads;
o mutexes to protect modification of lists in hash tables now kept in
separate arrays. 65535 limit to maximum number of hash buckets now
removed.
o Separate lookup and install functions added for IPv4 and IPv6 states
and for parent states.
o By default now is used Jenkinks hash function.
MFC r331668:
Rework ipfw rules parsing and printing code.
Introduce show_state structure to keep information about printed opcodes.
Split show_static_rule() function into several smaller functions. Make
parsing and printing opcodes into several passes. Each printed opcode
is marked in show_state structure and will be skipped in next passes.
Now show_static_rule() function is simple, it just prints each part
of rule separately: action, modifiers, proto, src and dst addresses,
options. The main goal of this change is avoiding occurrence of wrong
result of `ifpw show` command, that can not be parsed by ipfw(8).
Also now it is possible to make some simple static optimizations
by reordering of opcodes in the rule.
MFC r329388, r331441 and r331898, to bring the -CURRENT ck version.
r329388:
Define CK_MD_TSO for the relevant arches (i386, amd64 and sparc64).
Defaulting to CK_MD_RMO has the unfortunate side effect of generating
memory barriers that are useless on those arches, and the even more
unfortunate side effect of generating lfence/sfence/mfence on i386, even
if older CPUs don't support it.
This should fix the panic reported when using IPFW on a Pentium 3.
Note that mfence and sfence might still be used in a few case, but that
shouldn't happen in FreeBSD right now, and should be fixed upstream first.
r331441:
In __sync_bool_compare_and_swap(), return true if the returned value is the
same as the expected one, not the desired one.
r331898:
Import CK as of commit b19ed4c6a56ec93215ab567ba18ba61bf1cfbac8
It should fix ck_pr_[load|store]_ptr on mips and riscv, make sure no
*fence instructions are used on i386, as older cpus don't support it, and
make sure we don't rely on gcc builtins that can lead to calls to
libatomic when linked with -O0.
When /etc/rc runs all /etc/rc.d scripts, it has already loaded /etc/rc.subr
but each /etc/rc.d script sources it again (since /etc/rc.d scripts must
also work when started stand-alone).
Therefore, if rc.subr is already loaded, return so sh need not parse the
rest of the file.
A second effect is that there is no longer a compound command around most of
rc.subr. This reduces memory usage while sh is loading rc.subr for the first
time (but this memory is free()d once rc.subr is loaded).
For purposes of porting this to other systems, I do not recommend porting
this to systems with shells that do not have the change to the return
special builtin like in r255215 (before FreeBSD 10.0-RELEASE). This change
ensures that return in the top level of a dot script returns from the dot
script, even if the dot script was sourced from a function.
A comparison of CPU time on an amd64 bhyve virtual machine from a times
command added near the end of /etc/rc, all four values summed:
x orig1
+ quickreturn
+--------------------------------------------------------------------------+
| + + + x x x|
||______M__A_________| |______M___A__________| |
+--------------------------------------------------------------------------+
N Min Max Median Avg Stddev
x 3 1.704 1.802 1.726 1.744 0.051419841
+ 3 1.467 1.559 1.487 1.5043333 0.048387326
Difference at 95.0% confidence
-0.239667 +/- 0.113163
-13.7424% +/- 6.48873%
(Student's t, pooled s = 0.0499266)
r324625:
rc.subr: Remove test that is always true.
The code above always sets _pidcmd to a non-empty value.
Revert r331880, MFC of r328331 and bump FreeBSD_version
There are logistics issues that weren't considered when this was originally
MFC'd. All rc scripts in ports need audited (this is in progress) for usage
of ${name}_limits that doesn't line up with the new interpretation, and
individual rc.conf(5)'s need to be scrubbed of usage that doesn't line up.
It's since been decided that it should be left for a feature in 12.
1101514 introduced interpretation of ${name}_limits for rc scripts; this
feature no longer exists as of 1101515.
MFC r326641 by bapt: Split body of mails not respecting RFC2822
For mails which has a body not respecting RFC2822 (which often happen with
crontabs) try to split by words finding the last space before 1000's
character
If no spaces are found then consider the mail to be malformed anyway
I missed that this was an assignment (a bad pattern, use another
member) on i386. As wl(4) is i386 only and gone in head, just
expand the ifr_ifru member rather than adding an accessor.
ifconf(): correct handling of sockaddrs smaller than struct sockaddr.
Portable programs that use SIOCGIFCONF (e.g. traceroute) assume
that each pseudo ifreq is of length MAX(sizeof(struct ifreq),
sizeof(ifr_name) + ifr_addr.sa_len). For short sockaddrs we copied
too much from the source sockaddr resulting in a heap leak.
I believe only one such sockaddr exists (struct sockaddr_sco which
is 8 bytes) and it is unclear if such sockaddrs end up on interfaces
in practice. If it did, the result would be an 8 byte heap leak on
current architectures.
If a user attempts to add two tables with the same name the duplicate table
will not be added, but we forgot to free the duplicate table, leaking memory.
Ensure we free the duplicate table in the error path.
MFC r331969, r332035:
pthread.h: drop nullability attributes.
These have been found to be practically useless. We were actually
following the Android bionic library and had some interest in replicating
the same warnings and behaviour but Android has since removed them.
We are still keeping some uses of nullability attributes in other headers,
somewhat in line with Apple's libc.
MFC r330110: Add kernel retpoline option for amd64
Retpoline is a compiler-based mitigation for CVE-2017-5715, also known
as Spectre V2, that protects against speculative execution branch target
injection attacks.
In this commit it is disabled by default, but will be changed in a
followup commit.
MFC r330962: Remove KERNEL_RETPOLINE from BROKEN_OPTIONS on i386
Clang will compile both amd64 and i386 with retpoline.
The previous split of zeroing ifr_name and ifr_addr seperately is safe
on current architectures, but would be unsafe if pointers were larger
than 8 bytes. Combining the zeroing adds no real cost (a few
instructions) and makes the security property easier to verify.
GC never enabled support for SIOCGADDRROM and SIOCGCHIPID.
When de(4) was imported in 1997 the world was not ready for these ioctls.
In over 20 years that hasn't changed so it seems safe to assume their
time will never come.
Make all kernel accesses to ifru_buffer go via access functions
which take the process ABI into account and use an appropriate union
to access members in the correct place in struct ifreq.
r331654:
Don't access userspace directly from the kernel in nxge(4).
Update to what the previous code seemed to be doing via the correct
interfaces. Further issues exist in xge_ioctl_registers(), but this is
debugging code in a driver that has few users and they don't appear to
be crashes or leaks.
r331869:
Fix the build on arches with default unsigned char. Capture the fubyte()
return value in an int as well as the char, and test the full int value
for fubyte() failure.
Set the inp_vflag consistently for accepted TCP/IPv6 connections when
net.inet6.ip6.v6only=0.
Without this patch, the inp_vflag would have INP_IPV4 and the
INP_IPV6 flags for accepted TCP/IPv6 connections if the sysctl
variable net.inet6.ip6.v6only is 0. This resulted in netstat
to report the source and destination addresses as IPv4 addresses,
even they are IPv6 addresses.
When using SCTP for sending probe packets, use INIT chunks for payloads
larger than or equal to 32 bytes. For smaller probe packets, keep using
SHUTDOWN-ACK chunks, possibly bundled with a PAD chunk.
Packets with INIT chunks more likely pass through firewalls. Therefore,
use them when possible.
Ensure that the vnet is set when calling pru_sockaddr() and
pru_peeraddr().
This is already true when called via kern_getsockname() and
kern_getpeername(). This patch sets it also, when they arecalled
via soo_fill_kinfo(). This is necessary, since the corresponding
functions for SCTP require the vnet to be set. Without this,
if a process having an wildcard bound SCTP socket is
terminated and a core is written, the kernel panics.
Add to ipfw support for sending an SCTP packet containing an ABORT chunk.
This is similar to the TCP case. where a TCP RST segment can be sent.
There is one limitation: When sending an ABORT in response to an incoming
packet, it should be tested if there is no ABORT chunk in the received
packet. Currently, it is only checked if the first chunk is an ABORT
chunk to avoid parsing the whole packet, which could result in a DOS attack.
Thanks to Timo Voelker for helping me to test this patch.
MFC r327200:
When adding support for sending SCTP packets containing an ABORT chunk
to ipfw in https://svnweb.freebsd.org/changeset/base/326233,
a dependency on the SCTP stack was added to ipfw by accident.
This was noted by Kevel Bowling in https://reviews.freebsd.org/D13594
where also a solution was suggested. This patch is based on Kevin's
suggestion, but implements the required SCTP checksum computation
without any dependency on other SCTP sources.
While there, do some cleanups and improve comments.
Thanks to Kevin Kevin Bowling for reporting the issue and suggesting
a fix.
This option was used in the early days to allow performance measurements
extrapolating the use of SCTP checksum offloading. Since this feature
is now available, get rid of this option.
This also un-breaks the LINT kernel. Thanks to markj@ for making me
aware of the problem.
Fix the handling of ERROR chunks which a lot of error causes.
While there, clean up the code.
Thanks to Felix Weinrank who found the bug by using fuzz-testing
the SCTP userland stack.
Cleanup the handling of control chunks. While there fix some minor
bug related to clearing the assoc retransmit counter and the dup TSN
handling of NR-SACK chunks.
Fix an accounting bug where data was counted twice if on the read
queue and on the ordered or unordered queue.
While there, improve the checking in INVARIANTs when computing the
a_rwnd.
Fix parsing error when processing cmsg in SCTP send calls. The bug is
related to a signed/unsigned mismatch.
This should most likely fix the issue in sctp_sosend reported by
Dmitry Vyukov on the freebsd-hackers mailing list and found by
running syzkaller.
Fix a bug which avoided that rules for matching port numbers for SCTP
packets where actually matched.
While there, make clean in the man-page that SCTP port numbers are
supported in rules.
* Update function definitions.
* Ensure that the datalen always describes the length after the IPv6
header consistently, not matter which protocol us used for probes..
* Document that the default length is 20, not 12.
* Don't send inormation in probe packets which is not needed or
even checked when the responses are processed.
* Address CID 978587.
This is mainly a cleanup preparing the addition of SCTP and TCP
as possible probe packet protocols.