des [Mon, 7 Oct 2013 10:26:38 +0000 (10:26 +0000)]
Introduce the /libexec/freebsd-version script, which is intended to be
used by auditing tools to determine the userland patch level when it
differs from what `uname -r` reports. This can happen when the system
is kept up-to-date using freebsd-update and the last SA did not touch
the kernel, or when a new kernel has been installed but the system has
not yet rebooted.
gibbs [Sat, 5 Oct 2013 23:11:01 +0000 (23:11 +0000)]
Formalize the concept of virtual CPU ids by adding a per-cpu vcpu_id
field. Perform vcpu enumeration for Xen PV and HVM environments
and convert all Xen drivers to use vcpu_id instead of a hard coded
assumption of the mapping algorithm (acpi or apic ID) in use.
amd64/include/pcpu.h:
i386/include/pcpu.h:
Add vcpu_id to the amd64 and i386 pcpu structures.
dev/xen/timer/timer.c
x86/xen/xen_intr.c
Use new vcpu_id instead of assuming acpi_id == vcpu_id.
i386/xen/mp_machdep.c:
i386/xen/mptable.c
x86/xen/hvm.c:
Perform Xen HVM and Xen full PV vcpu_id mapping.
x86/xen/hvm.c:
x86/acpica/madt.c
Change SYSINIT ordering of acpi CPU enumeration so that it
is guaranteed to be available at the time of Xen HVM vcpu
id mapping.
neel [Sat, 5 Oct 2013 21:22:35 +0000 (21:22 +0000)]
Merge projects/bhyve_npt_pmap into head.
Make the amd64/pmap code aware of nested page table mappings used by bhyve
guests. This allows bhyve to associate each guest with its own vmspace and
deal with nested page faults in the context of that vmspace. This also
enables features like accessed/dirty bit tracking, swapping to disk and
transparent superpage promotions of guest memory.
Guest vmspace:
Each bhyve guest has a unique vmspace to represent the physical memory
allocated to the guest. Each memory segment allocated by the guest is
mapped into the guest's address space via the 'vmspace->vm_map' and is
backed by an object of type OBJT_DEFAULT.
pmap types:
The amd64/pmap now understands two types of pmaps: PT_X86 and PT_EPT.
The PT_X86 pmap type is used by the vmspace associated with the host kernel
as well as user processes executing on the host. The PT_EPT pmap is used by
the vmspace associated with a bhyve guest.
Page Table Entries:
The EPT page table entries as mostly similar in functionality to regular
page table entries although there are some differences in terms of what
bits are used to express that functionality. For e.g. the dirty bit is
represented by bit 9 in the nested PTE as opposed to bit 6 in the regular
x86 PTE. Therefore the bitmask representing the dirty bit is now computed
at runtime based on the type of the pmap. Thus PG_M that was previously a
macro now becomes a local variable that is initialized at runtime using
'pmap_modified_bit(pmap)'.
An additional wrinkle associated with EPT mappings is that older Intel
processors don't have hardware support for tracking accessed/dirty bits in
the PTE. This means that the amd64/pmap code needs to emulate these bits to
provide proper accounting to the VM subsystem. This is achieved by using
the following mapping for EPT entries that need emulation of A/D bits:
Bit Position Interpreted By
PG_V 52 software (accessed bit emulation handler)
PG_RW 53 software (dirty bit emulation handler)
PG_A 0 hardware (aka EPT_PG_RD)
PG_M 1 hardware (aka EPT_PG_WR)
The idea to use the mapping listed above for A/D bit emulation came from
Alan Cox (alc@).
The final difference with respect to x86 PTEs is that some EPT implementations
do not support superpage mappings. This is recorded in the 'pm_flags' field
of the pmap.
TLB invalidation:
The amd64/pmap code has a number of ways to do invalidation of mappings
that may be cached in the TLB: single page, multiple pages in a range or the
entire TLB. All of these funnel into a single EPT invalidation routine called
'pmap_invalidate_ept()'. This routine bumps up the EPT generation number and
sends an IPI to the host cpus that are executing the guest's vcpus. On a
subsequent entry into the guest it will detect that the EPT has changed and
invalidate the mappings from the TLB.
Guest memory access:
Since the guest memory is no longer wired we need to hold the host physical
page that backs the guest physical page before we can access it. The helper
functions 'vm_gpa_hold()/vm_gpa_release()' are available for this purpose.
PCI passthru:
Guest's with PCI passthru devices will wire the entire guest physical address
space. The MMIO BAR associated with the passthru device is backed by a
vm_object of type OBJT_SG. An IOMMU domain is created only for guest's that
have one or more PCI passthru devices attached to them.
Limitations:
There isn't a way to map a guest physical page without execute permissions.
This is because the amd64/pmap code interprets the guest physical mappings as
user mappings since they are numerically below VM_MAXUSER_ADDRESS. Since PG_U
shares the same bit position as EPT_PG_EXECUTE all guest mappings become
automatically executable.
Thanks to Alan Cox and Konstantin Belousov for their rigorous code reviews
as well as their support and encouragement.
Thanks for John Baldwin for reviewing the use of OBJT_SG as the backing
object for pci passthru mmio regions.
Special thanks to Peter Holm for testing the patch on short notice.
gibbs [Sat, 5 Oct 2013 19:51:09 +0000 (19:51 +0000)]
Correct panic caused by attaching both Xen PV and HyperV virtualization
aware drivers on Xen hypervisors that advertise support for some
HyperV features.
x86/xen/hvm.c:
When running in HVM mode on a Xen hypervisor, set vm_guest
to VM_GUEST_XEN so other virtualization aware components in
the FreeBSD kernel can detect this mode is active.
dev/hyperv/vmbus/hv_hv.c:
Use vm_guest to ignore Xen's HyperV emulation when Xen is
detected and Xen PV drivers are active.
alc [Sat, 5 Oct 2013 18:53:03 +0000 (18:53 +0000)]
Tidy up kmeminit(): Since r245575, 'nmbclusters' is calculated after
kmeminit() runs, so it contributes nothing to 'vm_kmem_size'; update a
comment to reflect that r254025 replaced the kmem submap with the kmem
arena.
grehan [Fri, 4 Oct 2013 23:29:07 +0000 (23:29 +0000)]
Remove obsolete cmd-line options and code associated with
these.
The mux-vcpus option may return at some point, given it's utility
in finding bhyve (and FreeBSD) bugs.
jilles [Fri, 4 Oct 2013 21:25:55 +0000 (21:25 +0000)]
kldxref: Do not depend on the directory order.
Sort the filenames to get a consistent result between machines of the same
architecture.
Also, sort FTS_D entries after other entries so kldxref -R works properly in
the uncommon case that a directory contains both subdirectories and modules.
Previously, this may have happened to work, depending on the order of files
in the directory.
nwhitehorn [Fri, 4 Oct 2013 18:27:02 +0000 (18:27 +0000)]
Disable use of compiler atomic builtins. For APR, this is limited to
architectures where they are known not to work. For SVN itself, use
the least common denominator and disable them across the board. This
allows svnlite to build and run on all FreeBSD architectures.
jchandra [Fri, 4 Oct 2013 11:11:51 +0000 (11:11 +0000)]
Fixes for the Netlogic XLP on-chip RSA block driver
The changes are to:
* Use contigmalloc/contigfree which handling microcode buffer
* Use a different buffer to send microcode to each engine
* Swap microcode in little-endian compilation
* Fix freeback message queue id field
* Simplify xlp_get_rsa_opsize() to remove unnecessary checks
* Fix NULL check after use in xlp_free_cmd_params()
* Do better error handling when the hardware returns error
* Fix error codes in few cases
Submitted by: Vekatesh J. V. <venkatesh.vivekanandan@broadcom.com>
Approved by: re (hrs)
jchandra [Fri, 4 Oct 2013 10:01:20 +0000 (10:01 +0000)]
Style fixes for the Netlogic XLP RSA driver
Updates to the Netlogic XLP on-chp RSA block driver. The changes are
to follow style(9) guidelines, to improve readability and to remove
unnecessary initialization.
No changes to logic have been introduced by this commit.
Submitted by: Venkatesh J. V. <venkatesh.vivekanandan@broadcom.com>
Approved by: re (hrs)
hrs [Fri, 4 Oct 2013 04:15:18 +0000 (04:15 +0000)]
Do not attempt to do AF-specific configurations on a interface when
noafif() is true. The following warning message was displayed when
pflog0 interface existed, for example:
ifconfig: ioctl(SIOCGIFINFO_IN6): Protocol family not supported
hrs [Fri, 4 Oct 2013 02:44:04 +0000 (02:44 +0000)]
Add epair(4) support in $cloned_interfaces. One should be specified
as "epair0" in $cloned_interfaces and "epair0[ab]" in the others in
rc.conf like the following:
dim [Thu, 3 Oct 2013 17:50:14 +0000 (17:50 +0000)]
Pull in r186338 from upstream llvm trunk:
Remove invalid assert in DAGTypeLegalizer::RemapValue
There is a comment at the top of DAGTypeLegalizer::PerformExpensiveChecks
which, in part, says:
// Note that these invariants may not hold momentarily when processing a node:
// the node being processed may be put in a map before being marked Processed.
Unfortunately, this assert would be valid only if the above-mentioned invariant
held unconditionally. This was causing llc to assert when, in fact,
everything was fine.
Thanks to Richard Sandiford for investigating this issue!
Fixes PR16562.
This fixes assertions which could occur in the multimedia/ffmpeg1 and
multimedia/ffmpeg2 ports.
Approved by: re (hrs)
Reported by: Matthias Apitz <guru@unixarea.de>
MFC after: 3 days
markj [Wed, 2 Oct 2013 17:14:12 +0000 (17:14 +0000)]
Add a separate translator for headers passed to the TCP probes in the
input path. These probes get some of the fields in host order, whereas the
output probes get them in network order, so a single translator isn't
enough. This workaround ensures that the problem is essentially invisble
to users: none of the probe arguments or their fields have changed.
kib [Wed, 2 Oct 2013 06:00:34 +0000 (06:00 +0000)]
When helping the bufdaemon from the buffer allocation context, there
is no sense to walk the whole dirty buffer queue. We are only
interested in, and can operate on, the buffers owned by the current
vnode [1]. Instead of calling generic queue flush routine, do
VOP_FSYNC() if possible.
Holding the dirty buffer queue lock in the bufdaemon, without dropping
it, can cause starvation of buffer writes from other threads. This is
esp. easy to reproduce on the big memory machines, where large files
are written, causing almost all dirty buffers accumulating in several
big files, which vnodes are locked by writers. Bufdaemon cannot flush
any buffer, but is iterating over the whole dirty queue
continuously. Since dirty queue mutex is not dropped, bufdone() in
g_up thread is starved, usually deadlocking the machine [2]. Mitigate
this by dropping the queue lock after the vnode is locked, allowing
other queue lock contenders to make a progress.
Discussed with: Jeff [1]
Reported by: pho [2]
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Approved by: re (hrs)
emaste [Wed, 2 Oct 2013 02:32:58 +0000 (02:32 +0000)]
Populate .rld_map on MIPS for debuggers
On MIPS the .dynamic section is read-only, so the pointer to rtld
information for debuggers cannot be stored there (in DT_DEBUG).
Instead, a special section .rld_map is used.
Sponsored by: DARPA, AFRL
Approved by: re (delphij)
emaste [Wed, 2 Oct 2013 00:50:27 +0000 (00:50 +0000)]
Use correct size for MIPS .rld_map section
On MIPS .dynamic is read-only and so a special section .rld_map is used
to store the pointer to the rtld information for debuggers. This
section had a hard coded size of 4 bytes which is not correct for
mips64. (Note that FreeBSD's rtld does not yet populate .rld_map.)
Sponsored by: DARPA, AFRL
Approved by: re (delphij)
jilles [Tue, 1 Oct 2013 21:17:18 +0000 (21:17 +0000)]
accept(2): Update portability note for accept4().
The accept(2) man page warns that O_NONBLOCK and other properties on the
new socket may vary across implementations. However, this issue only
applies to accept() and not to accept4(). On the other hand, accept4()
is not commonly available yet.
Reported by: pluknet
Reviewed by: bjk
Approved by: re (kib)
dim [Tue, 1 Oct 2013 19:14:24 +0000 (19:14 +0000)]
Pull in r191711 from upstream llvm trunk:
The X86FixupLEAs pass for Intel Atom must not call
convertToThreeAddress on ADD16rr opcodes, if src1 != src, since that
would cause convertToThreeAddress to try to create a virtual register.
This is not permitted after register allocation, which is when the
X86FixupLEAs pass runs.
This patch fixes PR16785.
Pull in r191715 from upstream llvm trunk:
Forgot to add a break statement.
This should enable building the x11-toolskits/libXaw port with
CPUTYPE=atom.
Approved by: re (gjb)
Reported by: Kenta Suzumoto <kentas@hush.com>
MFC after: 3 days
alfred [Tue, 1 Oct 2013 15:43:23 +0000 (15:43 +0000)]
Fixed kernel crash when running devinfo
When calling to ib_uverbs_cleanup_ucontext, there is a call to
mutex_lock of xrcd_table_mutex, which was not initialized.
Added missing initialization for xrcd_table_mutex.
alfred [Tue, 1 Oct 2013 15:36:51 +0000 (15:36 +0000)]
Fixed kernel crash when removing IPOIB_CM option from configuration file
Changed module init from module_init() to module_init_order() with
SI_ORDER_MIDDLE flag
Submitted by: Orit Moskovich (oritm mellanox.com)
Approved by: re
Update the Dutch calendar entries:
- prince Johan Friso passed away in 2013
- correct status of queen Maxima and crown princess Catharina-Amalia
- language fixes
Approved by: remko (mentor)
Approved by: re (gjb)
MFC after: 3 days