MFC 302899: Add documentation for the sigevent structure.
- Add a sigevent(3) manpage to give a general overview of the sigevent
structure and the available notification mechanisms.
- Document that AIO requests contain a nested sigevent structure that can
be used to request completion notification.
- Expand the sigevent details in other manuals to note details such as
the extra values stored in a queued signal's information or in a posted
kevent.
_Unwind_Exception is required to be double word aligned. GCC has
interpreted this to mean "use the maximum useful alignment for the
target" so follow that lead.
MFC 302860: Fix aio system call wrappers in librt.
- Update aio_return/waitcomplete wrappers for the ssize_t return type.
- Fix the aio_return() wrapper to fail with EINVAL on a pending job.
This matches the semantics of the in-kernel system call. Also,
aio_return() returns errors via errno, not via the return value.
Approved by: re (gjb)
Sponsored by: Chelsio Communications
MFC r303157: libcxxrt: add padding in __cxa_allocate_* to fix alignment
The addition of the referenceCount to __cxa_allocate_exception put the
unwindHeader at offset 0x58 in __cxa_exception, but it requires 16-byte
alignment. In order to avoid changing the current __cxa_exception ABI
(and thus breaking its consumers), add explicit padding in the
allocation routines (and account for it when freeing).
This is intended as a lower-risk change for FreeBSD 11. A "more correct"
fix should be prepared for upstream and -CURRENT.
Approved by: re (gjb)
Sponsored by: The FreeBSD Foundation
MFC r302567:
In vgonel(), postpone setting BO_DEAD until VOP_RECLAIM() is called,
if vnode is VMIO. For VMIO vnodes, set BO_DEAD in vm_object_terminate().
MFC r302904:
Fix a bug which results in a core dump when running netstat with
the -W option and having a listening SCTP socket.
The bug was introduced in r279122 when adding support for libxo.
MFC r302907:
When calling netstat -Laptcp the local address values are not aligned
with the corresponding entry in the table header. r295136
increased the value width from 14 to 32 without the corresponding
change to the table header. This commit adds the change to the table
header width.
MFC r302917:
Ensure that the -a, -W, -L options for SCTP behave similar
as for TCP.
MFC r302928:
Address a potential memory leak found a the clang static code analyzer
running on the userland stack.
MFC r302930:
Don't free a data chunk twice.
Found by the clang static code analyzer running for the userland stack.
MFC r302935:
Deal with a portential memory allocation failure, which was reported
by the clang static code analyzer.
Joint work with rrs@.
MFC r302942:
Add missing sctps_reasmusrmsgs counter.
Joint work with rrs@.
MFC r302945:
Don't duplicate code for SCTP, just use the ones used for UDP and TCP.
This fixes a bug with link local addresses. This will require and
upcoming change in the kernel to bring SCTP to the same behaviour
as UDP and TCP.
MFC r302949:
Fix the PR-SCTP behaviour.
This is done by rrs@.
MFC r302950:
Add a constant required by RFC 7496.
MFC r303024:
netstat and sockstat expect the IPv6 link local addresses to
have an embedded scope. So don't recover.
MFC r303025:
Use correct order of conditions to avoid NULL deref.
MFC r303073:
Fix a bug in deferred stream reset processing which results
in using a length field before it is set.
Thanks to Taylor Brandstetter for reporting the issue and
providing a fix.
r302450: libunwind: update to upstream snapshot r272680
The key improvement is that it may be built without cross-unwinding
support, which significantly reduces the stack space requirement.
r302456: libunwind: enable only the native unwinder by default
This significantly reduces stack space requirements, and runtimes
require only native unwinding.
r302475: libunwind: limit stack usage in unwind cursor
This may be reworked upstream but in the interim should address the
stack usage issue reported in the PR.
r303016: llvm-libunwind: use conventional (non-Darwin) X86 register numbers
For historical reasons Darwin/i386 has ebp and esp swapped in the
eh_frame register numbering. That is:
Darwin Other
Reg # eh_frame eh_frame DWARF
===== ======== ======== =====
4 ebp esp esp
5 esp ebp ebp
Although the UNW_X86_* constants are not supposed to be coupled to
DWARF / eh_frame numbering they are currently conflated in LLVM
libunwind, and thus we require the non-Darwin numbering.
PR: 206384
Approved by: re (kib)
Sponsored by: The FreeBSD Foundation
MFC r303031: clang++: Always use --eh-frame-hdr on FreeBSD, even for -static
FreeBSD uses LLVM's libunwind on FreeBSD/arm64 today (and we expect to
use it more widely in the future) and it requires the EH frame segment
in static binaries.
MFC r302980
Break up vm_fault()'s implementation of the read-ahead and delete-behind
optimizations into two distinct pieces. The first piece consists of the
code that should only be performed once per page fault and requires the
map to be locked. The second piece consists of the code that should be
performed each time a pager is called on an object in the shadow chain.
(This second piece expects the map to be unlocked.)
Previously, the entire implementation could be executed multiple times.
Moreover, the second and subsequent executions would occur with the map
unlocked. Usually, the ensuing unsynchronized accesses to the map were
harmless because the map was not changing. Nonetheless, it was possible
for a use-after-free error to occur, where vm_fault() wrote to a freed
map entry. This change corrects that problem.
Reduce the disc1.iso size from 850+M to just over 650M.
As a result of this change, the 'kernel-dbg.txz' distribution
is no longer provided on disc1.iso, and deselected by default
in bsdinstall(8). When 'kernel-dbg.txz' is selected, network
configuration happens before the installer proceeds, to fetch
the distribution from the mirrors.
This is a direct commit to stable/11, as there is intention
to solve this differently for 12.0-RELEASE.
Reviewed by: nwhitehorn (glanced at)
Approved by: re (hrs)
Sponsored by: The FreeBSD Foundation
Fix a copy/paste bug introduced during X86_64 Linuxulator work.
FreeBSD support NX bit on X86_64 processors out of the box, for i386 emulation
use READ_IMPLIES_EXEC flag, introduced in r302515.
While here move common part of mmap() and mprotect() code to the files in compat/linux
to reduce code dupcliation between Linuxulator's
Implement Linux personality() system call mainly due to READ_IMPLIES_EXEC flag.
In Linux if this flag is set, PROT_READ implies PROT_EXEC for mmap().
Linux/i386 set this flag automatically if the binary requires executable stack.
READ_IMPLIES_EXEC flag will be used in the next Linux mmap() commit.
MFC r302772: re-apply r299908: zfsctl_snapdir_lookup: clear VV_ROOT of
snapshot's root
The change has been undone in r301275 on the assumption that it was no
longer required. But that was incorrect, because in this case (and only
in this case) the snapshot root vnode is looked up before z_parent is
fixed up.
Fix problems in the FQ-PIE AQM cleanup code that could leak memory or
cause a crash.
Because dummynet calls pie_cleanup() while holding a mutex, pie_cleanup()
is not able to use callout_drain() to make sure that all callouts are
finished before it returns, and callout_stop() is not sufficient to make
that guarantee. After pie_cleanup() returns, dummynet will free a
structure that any remaining callouts will want to access.
Fix these problems by allocating a separate structure to contain the
data used by the callouts. In pie_cleanup(), call callout_reset_sbt()
to replace the normal callout with a cleanup callout that does the cleanup
work for each sub-queue. The instance of the cleanup callout that
destroys the last flow will also free the extra allocated block of memory.
Protect the reference count manipulation in the cleanup callout with
DN_BH_WLOCK() to be consistent with all of the other usage of the reference
count where this lock is held by the dummynet code.
Don't check the area that the host has not filled.
PR: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=209443
PR: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=210425
Submitted by: Hongjiang Zhang <honzhan microsoft com>
Reviewed by: sephe, Dexuan Cui <decui microsoft com>
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D6955
302605
hyperv/stor: Save the response status and xfer length properly.
The current command response handling discards status and xfer
length unconditionally, so that all of the commands would be
considered successful, even if errors happened. When errors
really happens, this causes all kinds of wiredness, since the
buffer will not be filled on the host side and sense data will
be ignored.
Most of the time, errors do not happen, however, error does
happen for the request sent immediately after the disk resizing.
Discarding the SCSI status (SCSI_STATUS_CHECK_COND) and sense
data (capacity changes) prevents the disk resizing from working
properly.
This commit saves the response status and xfer length properly
for later use.
Submitted by: Dexuan Cui <decui microsoft com>
Noticed by: sephe
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D7181
ed [Tue, 12 Jul 2016 06:25:28 +0000 (06:25 +0000)]
MFC r302448:
Don't forget to set sa->narg for CloudABI system calls.
It turns out that this value is not used within the system call code
under normal conditions, except when using tracing tools like ktrace.
If we forget to set this value, it is set to random garbage. This may
cause ktrace to hang indefinitely, making it impossible to kill.
Approved by: re@
Reported by: Michael Plass
PR: 210800
Allow - in distribution names. This is needed for freebsd-update to
work with 11.0+, where the debugging symbols use a new naming scheme
for release distribution files.
lang/gcc{48,49,5} lacks -fformat-extensions support (causing build errors, which
is what prompted r302403 to be committed). devel/amd64-gcc on the other hand
(which is used by Jenkins), has the support.
This fixes the Jenkins failure emails due to excessive warnings being produced
with "make buildkernel".
Fix ahci(4) driver attach to controller with 32 ports.
Incorrect sign expansion in variables that supposed to be a bit fields
caused infinite loop. Fixing this allows system properly detect maximal
possible 32 devices configured on AHCI HBA of BHyVe. That case did not
happen in a wild before due to lack of hardware AHCI HBAs with 32 ports.
fcntl(2): Document interrupt/restart for file locks.
Since r302216, thread suspension causes advisory file locks to restart
(instead of continuing to wait) and for a long time SA_RESTART has
affected advisory file locks. These are both not compliant to POSIX.1.
To clarify that restarting means something, add a paragraph about fair
queuing. Note that the network lock manager does not implement fair
queuing.
Reviewed by: kib (previous version)
Approved by: re (gjb)
Change the type of the map entry's next_read field from a vm_pindex_t to a
vm_offset_t. (This field is used to detect sequential access to the virtual
address range represented by the map entry.) There are three reasons to
make this change. First, a vm_offset_t is smaller on 32-bit architectures.
Consequently, a struct vm_map_entry is now smaller on 32-bit architectures.
Second, a vm_offset_t can be written atomically, whereas it may not be
possible to write a vm_pindex_t atomically on a 32-bit architecture. Third,
using a vm_pindex_t makes the next_read field dependent on which object in
the shadow chain is being read from.
Sometimes the software loses the race when appending more descriptors to
the tx ring and the tx queue stops.
This commit detects this condition and restart the tx queue whenever it stall.
Tested by: sobomax@, Keith White <kwhite@site.uottawa.ca>,
Paul Mather <paul@gromit.dlib.vt.edu>
Sponsored by: Rubicon Communications (Netgate)
Approved by: re (kib)
Autotune the number of pages set aside for UMA startup based on the number
of CPUs present. On amd64 this unbreaks the boot for systems with 92 or
more CPUs; the limit will vary on other systems depending on the size of
their uma_zone and uma_cache structures.
The major consumer of pages during UMA startup is the 19 zone structures
which are set up before UMA has bootstrapped itself sufficiently to use
the rest of the available memory: UMA Slabs, UMA Hash, 4 / 6 / 8 / 12 /
16 / 32 / 64 / 128 / 256 Bucket, vmem btag, VM OBJECT, RADIX NODE, MAP,
KMAP ENTRY, MAP ENTRY, VMSPACE, and fakepg. If the zone structures occupy
more than one page, they will not share pages and the number of pages
currently needed for startup is 19 * pages_per_zone + N, where N is the
number of pages used for allocating other structures; on amd64 N = 3 at
present (2 pages are allocated for UMA Kegs, and one page for UMA Hash).
This patch adds a new definition UMA_BOOT_PAGES_ZONES, currently set to 32,
and if a zone structure does not fit into a single page sets boot_pages to
UMA_BOOT_PAGES_ZONES * pages_per_zone instead of UMA_BOOT_PAGES (which
remains at 64). Consequently this patch has no effect on systems where the
zone structure fits into 2 or fewer pages (on amd64, 59 or fewer CPUs), but
increases boot_pages sufficiently on systems where the large number of CPUs
makes this structure larger. It seems safe to assume that systems with 60+
CPUs can afford to set aside an additional 128kB of memory per 32 CPUs.
The vm.boot_pages tunable continues to override this computation, but is
unlikely to be necessary in the future.
Tested on: EC2 x1.32xlarge
Relnotes: FreeBSD can now boot on 92+ CPU systems without requiring
vm.boot_pages to be manually adjusted.
Reviewed by: jeff, alc, adrian
Approved by: re (kib)
Add new unmount(2) flag, MNT_NONBUSY, to check whether there are
any open vnodes before proceeding. Make autounmound(8) use this flag.
Without it, even an unsuccessfull unmount causes filesystem flush,
which interferes with normal operation.
Due to ARC initial configuration not being done and kmem information
not being available we need to blindly set zfs_arc_max and zfs_arc_min
when configured via the tunable.
This fixes vfs.zfs.arc_(min|max) configuration via loader.conf broken by
r302265.
andrew [Wed, 6 Jul 2016 16:20:10 +0000 (16:20 +0000)]
Remove the old pre-INTRNG arm64 interrupt framework. GENERIC was switched
to INTRNG in r301565 with the old code no longer being built by default with
no reports of issues on any supported hardware.
Approved by: re (gjb)
Obtained from: ABT Systems Ltd
Sponsored by: The FreeBSD Foundation
The TCPPCAP debugging feature caches recently-used mbufs for use in
debugging TCP connections. This commit provides a mechanism to free those
mbufs when the system is under memory pressure.
Because this will result in lost debugging information, the behavior is
controllable by a sysctl. The default setting is to free the mbufs.
- Replace all CTASSERT macro instances with static_assert's.
- Remove the WRAPPED_CTASSERT macro; it's now an unnecessary obfuscation.
- Localize all static_assert's to the structures being tested.
- Sort some headers per-style(9).