John Baldwin [Mon, 18 Jul 2016 14:53:55 +0000 (14:53 +0000)]
Add PTRACE_VFORK to trace vfork events.
First, PL_FLAG_FORKED events now also set a PL_FLAG_VFORKED flag when
the new child was created via vfork() rather than fork(). Second, a
new PL_FLAG_VFORK_DONE event can now be enabled via the PTRACE_VFORK
event mask. This new stop is reported after the vfork parent resumes
due to the child calling exit or exec. Debuggers can use this stop to
reinsert breakpoints in the vfork parent process before it resumes.
The assertion re-added in r302614 was triggered when stopping signal
is delivered to vforked child. Issue is that we avoid stopping such
children in issignal() to not block parents. But executed AST, which
ignored stops, leaves the child with the signal pending but no AST
pending.
On first exec after vfork(), call signotify() to handle pending
reenabled signals. Adjust the assert to not check vfork children
until exec.
Reported and tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Relax checking if the privider size matches size recorded in the
superblock, allowing provider to be bit bigger, i.e. have some
extra padding after the FS image. That in some cases might be
a side-effect of using CLOOP format which enforces certain block
size and trying to compress image that is not exactly the number
of those blocks in size. The UFS itself does not have any issues
mounting such padded file systems, so it's what GEOM_LABEL should
do.
Alan Cox [Mon, 18 Jul 2016 04:20:26 +0000 (04:20 +0000)]
Break up vm_fault()'s implementation of the read-ahead and delete-behind
optimizations into two distinct pieces. The first piece consists of the
code that should only be performed once per page fault and requires the map
to be locked. The second piece consists of the code that should be
performed each time a pager is called on an object in the shadow chain.
(This second piece expects the map to be unlocked.)
Previously, the entire implementation could be executed multiple times.
Moreover, the second and subsequent executions would occur with the map
unlocked. Usually, the ensuing unsynchronized accesses to the map were
harmless because the map was not changing. Nonetheless, it was possible for
a use-after-free error to occur, where vm_fault() wrote to a freed map
entry. This change corrects that problem.
Don't print same value twice, one in decimal once in hex. This makes
output more cryptic than it needs to be and wastes cpu cycles and
console bandwidth.
Will Andrews [Mon, 18 Jul 2016 02:13:57 +0000 (02:13 +0000)]
Add my beinstall script.
This is meant to install a new BE (boot environment) given a fully built
world/kernel. In addition to installing world and kernel in the new BE,
it also automatically performs /etc updates (using etcupdate or mergemaster)
and package updates (using pkg).
Because this process is performed in a new BE, it reduces the need for a
second reboot. It also means a reboot into a partially updated system (due
to install or hardware failure) can't happen.
Inspired by and similar in function to Solaris/illumos-style upgrades.
Will Andrews [Mon, 18 Jul 2016 01:55:25 +0000 (01:55 +0000)]
libkvm: Improve physical address lookup scaling.
Instead of using a hash table to convert physical page addresses to offsets
in the sparse page array, cache the number of bits set for each 4MB chunk of
physical pages. Upon lookup, find the nearest cached population count, then
add/subtract the number of bits from that point to the page's PTE bit.
Then multiply by page size and add to the sparse page map's base offset.
This replaces O(n) worst-case lookup with O(1) (plus a small number of bits
to scan in the bitmap). Also, for a 128GB system, a typical kernel core of
about 8GB will now only require ~4.5MB of RAM for this approach instead of
~48MB as with the hash table.
More concretely, /usr/sbin/crashinfo against the same core improves from a
max RSS of 188MB and wall time of 43.72s (33.25 user 2.94 sys) to 135MB and
9.43s (2.58 user 1.47 sys). Running "thread apply all bt" in kgdb has a
similar RSS improvement, and wall time drops from 4.44s to 1.93s.
Remove booke_enable_l3_cache declaration and remaining definition.
L3 cache is not defined by Book-E, so is platform specific. Since it was
already moved for e500-based devices into mpc85xx in r292903, just eliminate it
altogether. Any device that supports L3 cache should have its own platform
means to enable it.
When we change nvl_array_next to NULL it means that we want to destroy or
take nvlist_array. The nvpair, which stores next nvlist of nvlist_array element
is no longer needed and can be freed.
Submitted by: Adam Starak <starak.adam@gmail.com>
MFC after: 1 week
Michal Meloun [Sun, 17 Jul 2016 13:43:00 +0000 (13:43 +0000)]
OFWPCI: Improve resource handling.
- add new rman for prefetchable memory. Is used only if given 'ranges'
property contains prefetchable memory range.
- not all ranges in 'ranges' property are subject for rman's filling.
Tegra for example, have two addition records which are used for
'pci 'register' -> 'assigned-address' -> 'ranges' machinery.
Add sc_ranges_mask for masking not rman related ranges.
- consistently pass unknown (not managed at this level) resources
allocation/release/adjust requests to parent.
Alexander Motin [Sun, 17 Jul 2016 12:45:58 +0000 (12:45 +0000)]
In AHCI_IRQ_MODE_AFTER mode do not clear interrupts below.
This is probably a NOP change since IS register is not activery used for
interrupts below the shared, but it looked odd to clear interrupts we did
not handle.
In g_Ctoc() apply CHAR() macro to *str to strip all flags. It gains nothing
right now, but some architectures theoretically may 64-bit wchar_t and the
code looks more correct.
1) This file full of direct char <-> wchar_t assignment, not converted, cut
them down. This hack still remains:
* 2. Illegal byte sequences in filenames are handled by treating them as
* single-byte characters with a values of such bytes of the sequence
* cast to wchar_t.
2) Reword the comment in the hack above to reflect implementation.
3) Protect signed wchar_t from sign extension when a signed char is assigned
to it in the hack above.
3) Corresponding backward hack in g_Ctoc() was not implemented, so all
pathes with illegal byte sequences are skipped as result, implement it now.
4) globtilde() forget to convert expanded user home dir from multibyte to
wchar.
5) Protect globtilde() from long expansion truncation.
6) Results was not sorted according to collate as POSIX requires.
Allan Jude [Sat, 16 Jul 2016 18:28:44 +0000 (18:28 +0000)]
Fix encrypted MBR install
The pools are exported and reimported in order to write the bootcode
This causes an error when the bootpool is later mounted by common code
The bootpool is now imported with the -N flag to prevent mounting
Jared McNeill [Sat, 16 Jul 2016 18:06:41 +0000 (18:06 +0000)]
Add support for Allwinner H3 EMAC.
H3 EMAC is the same as A83T/A64 except the SoC includes an (optional)
internal 10/100 PHY. Both internal and external PHYs are supported on H3
with this driver.
Another issue reported on http://seclists.org/oss-sec/2016/q3/68 is
that struct kevent member ident has uintptr_t type, which is silently
truncated to int in the call to fget(). Explicitely check for the
valid range.
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Alexander Motin [Sat, 16 Jul 2016 09:08:33 +0000 (09:08 +0000)]
Increase I82545_MAX_TXSEGS from 20 to 64 and add checks for it.
There seems no hard limit on number of segments per packet in the chip,
and 20 appeared insufficient. Hope 64 will be enough, but if not -- add
check to report that and drop the packet instead of corrupting stack.
Colin Percival [Sat, 16 Jul 2016 08:04:00 +0000 (08:04 +0000)]
Now that potentially buggy versions of Xen are automatically detected
(see r302635), there is no need to force msix interrupt migration off
via loader.conf.
Add a regression test to make sure the Russian collation is actually working
when importing collation support from Dragonfly/Illumos amdmi3@ tested the
collation branch and reported an issue with Russian collation. John Marino fixed
the issue in Dragonfly and I merged it back to FreeBSD.
Now that Illumos is working on merging our fixes they (Lauri Tirkkonen) found
issues with the commit that fixes the russian collation in UTF-8 that resulted
in a crash with strxfrm(3) and the ISO-8859-5 locale (fixed in FreeBSD r302916).
This small test was written to ensure we do not bring back the old issue with
russian collation while fixing the other issue.
Fix dlsym(RTLD_NEXT) handling to only return the next library in last library cases.
The root of the problem here is that TAILQ_FOREACH_FROM will default to
the head of the list if passed NULL, which will be the case if there are
no libraries loaded after this one. Thus all libraries, including the
current, were iterated in that case rather than none.
Michael Tuexen [Fri, 15 Jul 2016 17:40:34 +0000 (17:40 +0000)]
When calling netstat -Laptcp the local address values are not aligned
with the corresponding entry in the table header.
r295136 increased the value width from 14 to 32 without the corresponding
change to the table header. This commit adds the change to the table
header width.
Michael Tuexen [Fri, 15 Jul 2016 15:55:36 +0000 (15:55 +0000)]
Fix a bug which results in a core dump when running netstat with
the -W option and having a listening SCTP socket.
The bug was introduced in r279122 when adding support for libxo.
John Baldwin [Fri, 15 Jul 2016 15:32:09 +0000 (15:32 +0000)]
Add a mask of optional ptrace() events.
ptrace() now stores a mask of optional events in p_ptevents. Currently
this mask is a single integer, but it can be expanded into an array of
integers in the future.
Two new ptrace requests can be used to manipulate the event mask:
PT_GET_EVENT_MASK fetches the current event mask and PT_SET_EVENT_MASK
sets the current event mask.
The current set of events include:
- PTRACE_EXEC: trace calls to execve().
- PTRACE_SCE: trace system call entries.
- PTRACE_SCX: trace syscam call exits.
- PTRACE_FORK: trace forks and auto-attach to new child processes.
- PTRACE_LWP: trace LWP events.
The S_PT_SCX and S_PT_SCE events in the procfs p_stops flags have
been replaced by PTRACE_SCE and PTRACE_SCX. PTRACE_FORK replaces
P_FOLLOW_FORK and PTRACE_LWP replaces P2_LWP_EVENTS.
The PT_FOLLOW_FORK and PT_LWP_EVENTS ptrace requests remain for
compatibility but now simply toggle corresponding flags in the
event mask.
While here, document that PT_SYSCALL, PT_TO_SCE, and PT_TO_SCX both
modify the event mask and continue the traced process.
John Baldwin [Fri, 15 Jul 2016 15:12:56 +0000 (15:12 +0000)]
Add documentation for the sigevent structure.
- Add a sigevent(3) manpage to give a general overview of the sigevent
structure and the available notification mechanisms.
- Document that AIO requests contain a nested sigevent structure that can
be used to request completion notification.
- Expand the sigevent details in other manuals to note details such as
the extra values stored in a queued signal's information or in a posted
kevent.
Reviewed by: kib
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D7122
Add new System Hardening menu and options to bsdinstall.
This patch add new 'hardening' file responsible for new bsdinstall
'System Hardening' menu allowing users to set some sane and carefully
picked system security options (like random process id's, hiding
other users/groups processes and others).
All options are OFF by default in this patch due to POLA principle
with intention to turn change some of them to ON by default in future.
Fix regression introduced by r302350. The change of return value for a
callout that wasn't scheduled at all was unintentional and yielded in
several panics.
Do not allow creation of char or block special nodes with VNOVAL dev_t.
As was reported on http://seclists.org/oss-sec/2016/q3/68, tmpfs code
contains assertion that rdev != VNOVAL. On FreeBSD, there is no other
consequences except triggering the assert. To be compatible with
systems where device nodes have some significance, reject mknod(2)
call with dev == VNOVAL at the syscall level.
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Adrian Chadd [Fri, 15 Jul 2016 06:39:35 +0000 (06:39 +0000)]
[ath] [ath_hal] break out the duration calculation to optionally include SIFS.
The pre-11n calculations include SIFS, but the 11n ones don't.
The reason is that (mostly) the 11n hardware is doing the SIFS calculation
for us but the pre-11n hardware isn't. This means that we're over-shooting
the times in the duration field for non-11n frames on 11n hardware, which
is OK, if not a little inefficient.
Now, this is all fine for what the hardware needs for doing duration math
for ACK, RTS/CTS, frame length, etc, but it isn't useful for doing PHY
duration calculations. Ie, given a frame to TX and its timestamp, what
would the end of the actual transmission time be; and similar for an
RX timestamp and figuring out its original length.
So, this adds a new field to the duration routines which requests
SIFS or no SIFS to be included. All the callers currently will call
it requesting SIFS, so this /should/ be a glorious no-op. I'm however
planning some future work around airtime fairness and positioning which
requires these routines to have SIFS be optional.
Notably though, the 11n version doesn't do any SIFS addition at the moment.
I'll go and tweak and verify all of the packet durations before I go and
flip that part on.
Tested:
* AR9330, STA mode
* AR9330, AP mode
* AR9380, STA mode
When building multiple kernels using KERNCONF, non-existent KERNCONF
files will produce an error and buildkernel will fail. Previously missing
KERNCONF files silently failed giving no indication as to why, only to
subsequently discover during installkernel that the desired kernel was
never built in the first place.
John Baldwin [Thu, 14 Jul 2016 23:28:53 +0000 (23:28 +0000)]
Fix aio system call wrappers in librt.
- Update aio_return/waitcomplete wrappers for the ssize_t return type.
- Fix the aio_return() wrapper to fail with EINVAL on a pending job.
This matches the semantics of the in-kernel system call. Also,
aio_return() returns errors via errno, not via the return value.