Ed Schouten [Thu, 9 Jul 2015 12:04:45 +0000 (12:04 +0000)]
Don't clobber td->td_retval[0] in proc_reap().
While writing tests for CloudABI, I noticed that close() on process
descriptors returns the process ID of the child process. This is
interesting, as close() is only allowed to return 0 or -1. It turns out
that we clobber td->td_retval[0] in proc_reap(), so that wait*()
properly returns the process ID.
Change proc_reap() to leave td->td_retval[0] alone. Set the return value
in kern_wait6() instead, by keeping track of the PID before we
(potentially) reap the process.
Zbigniew Bodek [Thu, 9 Jul 2015 11:32:29 +0000 (11:32 +0000)]
Rework CPU identification on ARM64
This commit reworks the code responsible for identification of
the CPUs during runtime.
It is necessary to provide a way for workarounds and erratums
to be applied only for certain HW versions.
The copy of MIDR is now stored in pcpu to provide a fast and
convenient way for assambly code to read it (pcpu is used quite often
so there is a chance it's inside the cache).
The MIDR is also better way of identification than using user-friendly
cpu_desc structure, because it can be compiled into comparision of
single u32 with only one access to the memory - this is crucial
for some erratums which are called from performance-critical
places.
Changes in cpu_identify makes this function safe to be called
on non-boot CPUs.
New function CPU_MATCH was implemented which returns boolean
value based on mathing masked MIDR with chip identification.
Example of usage:
Cover a race between doselwakeup() and selfdfree(). If doselwakeup()
loop finds the selfd entry and clears its sf_si pointer, which is
handled by selfdfree() in parallel, NULL sf_si makes selfdfree() free
the memory. The result is the race and accesses to the freed memory.
Refcount the selfd ownership. One reference is for the sf_link
linkage, which is unconditionally dereferenced by selfdfree().
Another reference is for sf_threads, both selfdfree() and
doselwakeup() race to deref it, the winner unlinks and than frees the
selfd entry.
Reported by: Larry Rosenman <ler@lerctr.org>
Tested by: Larry Rosenman <ler@lerctr.org>, pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Ed Schouten [Thu, 9 Jul 2015 07:20:15 +0000 (07:20 +0000)]
Import the CloudABI datatypes and create a system call table.
CloudABI is a pure capability-based runtime environment for UNIX. It
works similar to Capsicum, except that processes already run in
capabilities mode on startup. All functionality that conflicts with this
model has been omitted, making it a compact binary interface that can be
supported by other operating systems without too much effort.
CloudABI is 'secure by default'; the idea is that it should be safe to
run arbitrary third-party binaries without requiring any explicit
hardware virtualization (Bhyve) or namespace virtualization (Jails). The
rights of an application are purely determined by the set of file
descriptors that you grant it on startup.
The datatypes and constants used by CloudABI's C library (cloudlibc) are
defined in separate files called syscalldefs_mi.h (pointer size
independent) and syscalldefs_md.h (pointer size dependent). We import
these files in sys/contrib/cloudabi and wrap around them in
cloudabi*_syscalldefs.h.
We then add stubs for all of the system calls in sys/compat/cloudabi or
sys/compat/cloudabi64, depending on whether the system call depends on
the pointer size. We only have nine system calls that depend on the
pointer size. If we ever want to support 32-bit binaries, we can simply
add sys/compat/cloudabi32 and implement these nine system calls again.
The next step is to send in code reviews for the individual system call
implementations, but also add a sysentvec, to allow CloudABI executabled
to be started through execve().
More information about CloudABI:
- GitHub: https://github.com/NuxiNL/cloudlibc
- Talk at BSDCan: https://www.youtube.com/watch?v=SVdF84x1EdA
Add IS_NOT_NEGATIVE macro.
Avoid these warnings:
- comparison of unsigned expression >= 0 is always true [-Wtype-limits],
- comparison is always true due to limited range of data type [-Wtype-limits].
upon further examination, it turns out that _unregister_all already
provides the guarantee that no threads will be in the _newsession code..
This is provided by the CRYPTODRIVER lock... This makes the pause
unneeded...
yet more documentation improvements... Many changes were made to the
OCF w/o documentation...
Document the new (8+ year old) device_t way of handling things, that
_unregister_all will leave no threads in newsession, the _SYNC flag,
the requirement that a flag be specified...
Other minor changes like breaking up a wall of text into paragraphs...
Add IS_NOT_NEGATIVE macro.
Avoid these warnings:
- comparison of unsigned expression >= 0 is always true [-Wtype-limits],
- comparison is always true due to limited range of data type [-Wtype-limits].
address an issue where consumers, like IPsec, can reuse the same
session in multiple threads w/o locking.. There was a single fpu
context shared per session, if multiple threads were using the session,
and both migrated away, they could corrupt each other's fpu context...
This patch adds a per cpu context and a lock to protect it...
It also tries to better address unloading of the aesni module...
The pause will be removed once the OpenCrypto Framework provides a
better method for draining callers into _newsession...
I first discovered the fpu context sharing issue w/ a flood ping over
an IPsec tunnel between two bhyve machines... The patch in D3015
was used to verify that this fix does fix the issue...
Reimplement the ordering requirements for the timehands updates, and
for timehands consumers, by using fences.
Ensure that the timehands->th_generation reset to zero is visible
before the data update is visible [*]. tc_setget() allowed data update
writes to become visible before generation (but not on TSO
architectures).
Remove tc_setgen(), tc_getgen() helpers, use atomics inline [**].
Use atomic_fence_fence_rel() to ensure ordering in the
seq_write_begin(), instead of the load_rmb/rbm_load functions. The
update does not need to be atomic due to the write lock owned.
Similarly, in seq_write_end(), update of *seqp needs not be atomic.
Only store must be atomic with release.
For seq_read(), the natural operation is the load acquire of the
sequence value, express this directly with atomic_load_acq_int()
instead of using custom partial fence implementation
atomic_load_rmb_int().
In seq_consistent, use atomic_thread_fence_acq() which provides the
desired semantic of ordering reads before fence before the re-reading
of *seqp, instead of custom atomic_rmb_load_int().
Reviewed by: alc, bde
Sponsored by: The FreeBSD Foundation
MFC after: 3 weeks
Add the atomic_thread_fence() family of functions with intent to
provide a semantic defined by the C11 fences with corresponding
memory_order.
atomic_thread_fence_acq() gives r | r, w, where r and w are read and
write accesses, and | denotes the fence itself.
atomic_thread_fence_rel() is r, w | w.
atomic_thread_fence_acq_rel() is the combination of the acquire and
release in single operation. Note that reads after the acq+rel fence
could be made visible before writes preceeding the fence.
atomic_thread_fence_seq_cst() orders all accesses before/after the
fence, and the fence itself is globally ordered against other
sequentially consistent atomic operations.
Reviewed by: alc
Discussed with: bde
Sponsored by: The FreeBSD Foundation
MFC after: 3 weeks
Alan Cox [Wed, 8 Jul 2015 17:45:59 +0000 (17:45 +0000)]
The intention of r254304 was to scan the active queue continuously.
However, I've observed the active queue scan stopping when there are
frequent free page shortages and the inactive queue is steadily refilled
by other mechanisms, such as the sequential access heuristic in vm_fault()
or madvise(2). To remedy this problem, record the time of the last active
queue scan, and always scan a number of pages proportional to the time
since the last scan, regardless of whether that last scan was a
timeout-triggered ("pass == 0") or free-page-shortage-triggered ("pass >
0") scan.
Also, on a timeout-triggered scan, allow a full scan of the active queue
when the system is short of inactive pages.
Hiroki Sato [Wed, 8 Jul 2015 16:37:48 +0000 (16:37 +0000)]
Implement PF_IMMUTABLE flag and apply it to "name" and "jid" in
jail.conf parameters. This flag disallows redefinition of the parameter.
"name" and/or "jid" are automatically defined in jail.conf by using
the jail names at the front of jail parameter definitions. However,
one could override them by using a variable with the same name like
$name = "foo". This confused the parser and could end up with SIGSEGV.
Note that this change also affects a case when all of parameters are
defined in the command line arguments, not in jail.conf. Specifically,
"jail -c name=j1 name=j2" no longer works. This should be harmless.
Start using the gcc sentinel attribute, which can be used to
mark varargs function that need a NULL pointer to mark argument
termination, like execl(3).
Zbigniew Bodek [Wed, 8 Jul 2015 13:52:59 +0000 (13:52 +0000)]
Add memory barrier to bus_dmamap_sync()
On platforms which are fully IO-coherent, the map might be null.
We need to guarantee that all data is observable after the
sync operation is called. Add a memory barrier to ensure that on ARM.
Reviewed by: andrew, kib
Obtained from: Semihalf
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3012
Rick Macklem [Tue, 7 Jul 2015 23:41:25 +0000 (23:41 +0000)]
Since the case where secflavor < 0 indicates the security flavor is
to be negotiated, it could be a Kerberized mount. As such, filling
in the "principal" argument using the canonized host name makes sense.
If it is negotiated as AUTH_SYS, the "principal" argument is meaningless
but harmless.
unroll the loop slightly... This improves performance enough to
justify, especially for CBC performance where we can't pipeline.. I
don't happen to have my measurements handy though...
Mark Johnston [Tue, 7 Jul 2015 19:29:18 +0000 (19:29 +0000)]
Fix an incorrect assertion in witness.
The number of available lock list entries for a thread is LOCK_CHILDCOUNT,
and each entry can record up to LOCK_NCHILDREN locks. When iterating over
the locks held by a thread, a bound on the loop index is therefore given
by LOCK_CHILDCOUNT * LOCK_NCHILDREN; WITNESS_COUNT is an unrelated
constant.
Ed Maste [Tue, 7 Jul 2015 18:44:27 +0000 (18:44 +0000)]
Avoid creating invalid UEFI device path
The UEFI loader on the 10.1 release install disk (disc1) modifies an
existing EFI_DEVICE_PATH_PROTOCOL instance in an apparent attempt to
truncate the device path. In doing so it creates an invalid device
path.
Perform the equivalent action without modification of structures
allocated by firmware.
Fix rfcomm_sppd regression I could reproduced.
To reproduce it, Two machine running FreeBSD and
run
rfcomm_sppd -c 3 -S
rfcomm_sppd -a ${PEER} -c 3
on each side.
Place sched_random nearer to where it's first used: moving the
code nearer to where it is used makes the code easier to read
and we can reduce the initial "#ifdef SMP" island.
Reword a little the comment and clean some whitespaces
while here.
Adrian Chadd [Tue, 7 Jul 2015 03:51:29 +0000 (03:51 +0000)]
Attempt to make 5GHz HT/40 work on the 6xxx series NICs.
The 6205 (Taylor Peak) in the Lenovo X230 works fine in 5GHz 11a and 11n HT20,
but not 11n HT40. The NIC goes RX deaf the moment HT40 is configured.
It's so RX deaf that it doesn't even hear beacons and the firmware sends
"BEACON MISS" events. That's pretty deaf.
I tried configuring up the HT40 flags in monitor mode and it worked - so
I assumed that doing the transition from 20 -> 40MHz channel configuration
when going auth->assoc (ie, after the NIC has been partially configured)
is a problem.
So for now, let's just always set them if they're available.
Tested:
* Intel 5300, STA mode, 5GHz HT/40 AP; 2GHz HT/20 AP
* Intel 6205, STA mode, 5GHz HT/40, HT20, 11a AP; 2GHz HT/20 AP
This was pointed out to me by coworkers trying to use FreeBSD-HEAD
in the office on their Thinkpad T420p laptops.
TODO:
* I don't like how the HT40 flags are configured - the whole interop/
protection config should be re-checked. Notably, I think curhtprotmode
is 0 in a lot of cases, which means "no interoperability" and i think
that's busted.
cxgbe(4): Do not override the the global defaults for congestion drops.
The hw.cxgbe.cong_drop knob is not affected by this change because the
driver sets up congestion drop on a per-queue basis.
Warner Losh [Mon, 6 Jul 2015 20:10:47 +0000 (20:10 +0000)]
The results of the vote are in. This reflects that vote. Single
line statements inside of braces is recognized as an acceptable
style.
http://reviews.freebsd.org/V3
As always, this isn't license for wholesale change, etc.
Neel Natu [Mon, 6 Jul 2015 19:41:43 +0000 (19:41 +0000)]
Move the 'devmem' device nodes from /dev/vmm to /dev/vmm.io
Some external tools just do a 'ls /dev/vmm' to figure out the bhyve virtual
machines on the host. These tools break if the devmem device nodes also
appear in /dev/vmm.
Neel Natu [Mon, 6 Jul 2015 19:33:29 +0000 (19:33 +0000)]
Always assert DCD and DSR in bhyve's uart emulation.
The /etc/ttys entry for a serial console in FreeBSD/x86 is as follows:
ttyu0 "/usr/libexec/getty 3wire" vt100 onifconsole secure
The initial terminal type passed to getty(8) is "3wire" which sets the
CLOCAL flag. However reset(1) clears this flag and any programs that try
to open the terminal will hang waiting for DCD to be asserted.
Fix this by always asserting DCD and DSR in the emulated uart.
The following discussion on virtualization@ has more details:
https://lists.freebsd.org/pipermail/freebsd-virtualization/2015-June/003666.html
Patrick Kelsey [Mon, 6 Jul 2015 16:07:21 +0000 (16:07 +0000)]
Don't acquire sysctlmemlock in userland_sysctl() when the old value
pointer is NULL, as in that case there are no userland pages that
could potentially be wired. It is common for old to be NULL and
oldlenp to be non-NULL in calls to userland_sysctl(), as this is used
to probe for the length of a variable-length sysctl entry before
retrieving a value. Note that it is typical for such calls to be made
with an uninitialized value in *oldlenp, so sysctlmemlock was
essentially being acquired at random (depending on the uninitialized
value in *oldlenp being > PAGE_SIZE or not) for these calls prior to
this patch.
Patrick Kelsey [Mon, 6 Jul 2015 02:12:49 +0000 (02:12 +0000)]
Fix if_loop so bpfwrite() can use it regardless of the state of
bd_hdrcmplt. As if_loop does not use link-level headers, its behavior
when used by bpfwrite() should be the same regardless of the state of
bd_hdrcmplt. Without this change, libpcap (and other BPF users that
work like it) fail when writing to loopback interfaces.
Mark Johnston [Sun, 5 Jul 2015 22:37:33 +0000 (22:37 +0000)]
Check suspendability on the mountpoint returned by VOP_GETWRITEMOUNT.
This obviates the need for a MNTK_SUSPENDABLE flag, since passthrough
filesystems like nullfs and unionfs no longer need to inherit this
information from their lower layer(s). This change also restores the
pre-r273336 behaviour of using the presence of a susp_clean VFS method to
request suspension support.